🧪 High Performance Computing (HPC) MCQ Quiz Hub
High Performance Computing (HPC) MCQ Set 1
Choose a topic to test your knowledge and improve your High Performance Computing (HPC) skills
1. A CUDA program is comprised of two primary components: a host and a _____.
gpu kernel
cpu kernel
os
none of the above
2. The kernel code is identified by the ________ qualifier with a void return type.
__host__
__global__
__device__
void
3. Calling a kernel is typically referred to as _________.
kernel thread
kernel initialization
kernel termination
kernel invocation
4. The BlockPerGrid and ThreadPerBlock parameters are related to the ________ model supported by CUDA.
host
kernel
thread abstraction
None of the above
5. _______ is callable from the device only.
__host__
__global__
__device__
None of the above
6. ____ is callable from the host.
__host__
__global__
__device__
none of the above
7. ______ is callable from the host.
__host__
__global__
__device__
None of the above
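For reference, questions 2 and 5 through 7 all turn on the three CUDA function-type qualifiers. A minimal illustrative sketch (the function names here are hypothetical, not part of the quiz):

// __global__: a kernel; runs on the device, callable from the host.
__global__ void my_kernel(void) { }

// __device__: runs on the device, callable from device code only.
__device__ int add_on_device(int a, int b) { return a + b; }

// __host__: runs on the host, callable from the host (the default for plain C functions).
__host__ void host_helper(void) { }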
8. CUDA supports ____________ in which code in a single thread is executed by all other threads.
thread division
thread termination
thread abstraction
None of the above
9. In CUDA, a single invoked kernel is referred to as a _____.
block
thread
grid
None of the above
10. A grid is comprised of ________ of threads.
block
bunch
host
None of the above
11. A block is comprised of multiple _______.
threads
bunch
host
None of the above
12. A solution to the problem of representing parallelism in an algorithm is
cud
pta
cda
cuda
13. Host code in a CUDA application cannot reset a device.
true
false
all
None of These
14. Any condition that causes a processor to stall is called as _____.
hazard
page fault
system error
none of the above
15. The time lost due to branch instruction is often referred to as _____.
latency
delay
branch penalty
None of the above
16. The ___ method is used in centralized systems to perform out-of-order execution.
scorecard
scoreboarding
optimizing
redundancy
17. The computer cluster architecture emerged as an alternative to ____.
isa
workstation
supercomputers
distributed systems
18. NVIDIA CUDA Warp is made up of how many threads?
512
1024
312
32
19. Out-of-order execution of instructions is not possible on GPUs.
true
false
20. CUDA supports programming in ....
c or c++ only
java, python, and more
c, c++, third party wrappers for java, python, and more
pascal
21. FADD, FMAD, FMIN, FMAX are ----- supported by Scalar Processors of NVIDIA GPU.
32-bit ieee floating point instructions
32-bit integer instructions
both
none of the above
22. Each streaming multiprocessor (SM) of CUDA hardware has ------ scalar processors (SP).
1024
128
512
8
23. Each NVIDIA GPU has ------ Streaming Multiprocessors
8
1024
512
16
24. CUDA provides ------- warp and thread scheduling. Also, the overhead of thread creation is on the order of ----.
programming-overhead, 2 clock
zero-overhead, 1 clock
64, 2 clock
32, 1 clock
25. Each warp of GPU receives a single instruction and “broadcasts” it to all of its threads. It is a ---- operation.
simd (single instruction multiple data)
simt (single instruction multiple thread)
sisd (single instruction single data)
sist (single instruction single thread)
26. Limitations of CUDA Kernel
recursion, call stack, static variable declaration
no recursion, no call stack, no static variable declarations
recursion, no call stack, static variable declaration
no recursion, call stack, no static variable declarations
27. What is Unified Virtual Machine?
it is a technique that allows both the cpu and the gpu to read from a single virtual machine, simultaneously.
it is a technique for managing separate host and device memory spaces.
it is a technique for executing device code on host and host code on device.
it is a technique for executing general purpose programs on device instead of host.
28. _____ became the first language specifically designed by a GPU Company to facilitate general purpose computing on ____.
python, gpus.
c, cpus.
cuda c, gpus.
java, cpus.
29. The CUDA architecture consists of --------- for parallel computing kernels and functions.
risc instruction set architecture
cisc instruction set architecture
zisc instruction set architecture
ptx instruction set architecture
30. CUDA stands for --------, designed by NVIDIA.
common union discrete architecture
complex unidentified device architecture
compute unified device architecture
complex unstructured distributed architecture
31. The host processor spawns multithreaded tasks (or kernels, as they are known in CUDA) onto the GPU device. State true or false.
true
false
32. The NVIDIA G80 is a ---- CUDA core device, the NVIDIA G200 is a ---- CUDA core device, and the NVIDIA Fermi is a ---- CUDA core device
28, 256, 512
32, 64, 128
64, 128, 256
256, 512, 1024
33. NVIDIA 8-series GPUs offer -------- .
50-200 gflops
200-400 gflops
400-800 gflops
800-1000 gflops
34. IADD, IMUL24, IMAD24, IMIN, IMAX are ----------- supported by Scalar Processors of NVIDIA GPU.
32-bit ieee floating point instructions
32-bit integer instructions
both
none of the above
35. CUDA Hardware programming model supports: a) fully general data-parallel architecture; b) General thread launch; c) Global load-store; d) Parallel data cache; e) Scalar architecture; f) Integer and bit operations
a,c,d,f
b,c,d,e
a,d,e,f
a,b,c,d,e,f
36. In the CUDA memory model, the following memory types are available: a) Registers; b) Local Memory; c) Shared Memory; d) Global Memory; e) Constant Memory; f) Texture Memory.
a, b, d, f
a, c, d, e, f
a, b, c, d, e, f
b, c, e, f
37. What is the CUDA C equivalent of this standard C program: int main(void) { printf("Hello, World! "); return 0; }
int main ( void ) { kernel <<<1,1>>>(); printf("hello, world!\n"); return 0; }
__global__ void kernel( void ) { } int main ( void ) { kernel <<<1,1>>>(); printf("hello, world!\n"); return 0; }
__global__ void kernel( void ) { kernel <<<1,1>>>(); printf("hello, world!\n"); return 0; }
_global__ int main ( void ) { kernel <<<1,1>>>(); printf("hello, world!\n"); return 0; }
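Formatted for readability, option (b) above, the conventional CUDA C counterpart of the plain C program, reads as follows (a sketch assuming stdio.h is included for printf):

#include <stdio.h>

// An empty kernel: __global__ marks it as device code launchable from the host.
__global__ void kernel(void) { }

int main(void) {
    kernel<<<1, 1>>>();          // launch one block containing one thread
    printf("hello, world!\n");   // ordinary host code
    return 0;
}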
38. Which function runs on the device (i.e., the GPU): a) __global__ void kernel (void ) { } b) int main ( void ) { ... return 0; }
a
b
both a,b
39. If a is a host variable and dev_a is a device (GPU) variable, select the correct statement to allocate memory for dev_a:
cudaMalloc( &dev_a, sizeof( int ) )
malloc( &dev_a, sizeof( int ) )
cudaMalloc( (void**) &dev_a, sizeof( int ) )
malloc( (void**) &dev_a, sizeof( int ) )
40. If a is a host variable and dev_a is a device (GPU) variable, select the correct statement to copy input from variable a to variable dev_a:
memcpy( dev_a, &a, size );
cudaMemcpy( dev_a, &a, size, cudaMemcpyHostToDevice );
memcpy( (void*) dev_a, &a, size );
cudaMemcpy( (void*) &dev_a, &a, size, cudaMemcpyDeviceToHost );
41. What does the triple angle bracket mark in a statement inside the main function indicate?
a call from host code to device code
a call from device code to host code
less than comparison
greater than comparison
42. What makes CUDA code run in parallel?
__global__ indicates parallel execution of code
main() function indicates parallel execution of code
kernel name outside the triple angle brackets indicates execution of the kernel n times in parallel
first parameter value inside the triple angle brackets (n) indicates execution of the kernel n times in parallel
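Questions 39 through 42 fit together in a single host-side sequence. The sketch below is illustrative only; the kernel name add and the size N are hypothetical and not part of the quiz:

#include <stdio.h>
#define N 16

// Hypothetical kernel: each block increments one element of the array.
__global__ void add(int *a) {
    int i = blockIdx.x;              // one block per element
    if (i < N) a[i] += 1;
}

int main(void) {
    int a[N] = {0};
    int *dev_a;

    cudaMalloc((void**)&dev_a, N * sizeof(int));                    // question 39: allocate device memory
    cudaMemcpy(dev_a, a, N * sizeof(int), cudaMemcpyHostToDevice);  // question 40: copy host to device
    add<<<N, 1>>>(dev_a);                                           // questions 41 and 42: host-to-device call, N parallel blocks
    cudaMemcpy(a, dev_a, N * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev_a);

    printf("a[0] = %d\n", a[0]);
    return 0;
}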
43. In ___________, the number of elements to be sorted is small enough to fit into the process's main memory.
internal sorting
internal searching
external sorting
external searching
44. _____________ algorithms use auxiliary storage (such as tapes and hard disks) for sorting because the number of elements to be sorted is too large to fit into memory.
internal sorting
internal searching
external sorting
external searching
45. ____ can be comparison-based or noncomparison-based.
searching
sorting
both a and b
none of the above
46. The fundamental operation of comparison-based sorting is ________.
compare-exchange
searching
sorting
swapping
47. The performance of quicksort depends critically on the quality of the ______.
non-pivot
pivot
center element
len of array
48. The main advantage of ______ is that its storage requirement is linear in the depth of the state space being searched.
bfs
dfs
a and b
None of the above
49. ___ algorithms use a heuristic to guide search.
bfs
dfs
a and b
none of the above
50. Graph search involves a closed list, where the major operation is a _______
sorting
searching
lookup
None of the above
51. Breadth First Search is equivalent to which of the traversal in the Binary Trees?
pre-order traversal
post-order traversal
level-order traversal
in-order traversal
52. What is the time complexity of Breadth First Search? (V – number of vertices, E – number of edges)
O(V + E)
O(V)
O(E)
O(V*E)
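For question 52, the standard adjacency-list argument is worth recalling: every vertex is enqueued and dequeued at most once and every edge is examined at most a constant number of times, so the running time of Breadth First Search is O(V + E).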
53. Which of the following is not an application of Breadth First Search?
when the graph is a binary tree
when the graph is a linked list
when the graph is a n-ary tree
when the graph is a ternary tree
54. In BFS, how many times is a node visited?
once
twice
equivalent to the indegree of the node
thrice
55. Which of the following is not a stable sorting algorithm in its typical implementation?
insertion sort
merge sort
quick sort
bubble sort
56. Which of the following is not true about comparison-based sorting algorithms?
the minimum possible time complexity of a comparison-based sorting algorithm is O(n log n) for a random input array
any comparison-based sorting algorithm can be made stable by using position as a criterion when two elements are compared
counting sort is not a comparison-based sorting algorithm
heap sort is not a comparison-based sorting algorithm
57. Mathematically, efficiency is
e=s/p
e=p/s
e*s=p/2
e=p+e/e
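For questions 57 and 58, the usual definitions from parallel performance analysis are: speedup S = Ts / Tp, efficiency E = S / p = Ts / (p · Tp), and cost C = p · Tp, also called the processor-time product. As a quick worked example, a program that takes 100 s serially and 20 s on 8 processors has S = 5 and E = 5/8 = 0.625.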
58. The cost of a parallel system is sometimes referred to as the ____.
work
processor-time product
both
None of the above
59. In the scaling characteristics of parallel programs, Ts (the serial runtime) is ____.
increase
constant
decreases
none
60. Speedup tends to saturate and efficiency _____ as a consequence of Amdahl’s law.
increase
constant
decreases
none
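The behaviour in question 60 follows from Amdahl's law: if a fraction f of the work is inherently serial, the speedup on p processing elements is bounded by S(p) = 1 / (f + (1 - f)/p) ≤ 1/f, so speedup saturates while efficiency E = S/p keeps decreasing as p grows.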
61. Speedup obtained when the problem size is _______ linearly with the number of processing elements.
increase
constant
decreases
depend on problem size
62. The n × n matrix is partitioned among n processors, with each processor storing a complete ___ of the matrix.
row
column
both
depend on processor
63. Cost-optimal parallel systems have an efficiency of ___.
1
n
logn
complex
64. The n × n matrix is partitioned among n² processors such that each processor owns a _____ element.
n
2n
single
double
65. How many basic communication operations are used in matrix-vector multiplication?
1
2
3
4
66. The DNS algorithm of matrix multiplication uses ____.
1d partition
2d partition
3d partition
both a,b
67. In pipelined execution, the steps include ____.
normalization
communication
elimination
all
68. The cost of the parallel algorithm is higher than the sequential run time by a factor of __.
3/2
2/3
3*2
2/3+3/2
69. The load imbalance problem in parallel Gaussian elimination can be alleviated by using a ____ mapping.
acyclic
cyclic
both
none
70. A parallel algorithm is evaluated by its runtime as a function of
the input size
the number of processors
the communication parameters
all
71. For a problem consisting of W units of work, p__W processors can be used optimally.
<=
>=
<
>
72. C(W)__Θ(W) for optimality (necessary condition).
>
<
<=
equals
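For questions 63 and 72: a parallel system is cost-optimal when its cost grows at the same asymptotic rate as the best sequential work, i.e. C(W) = p · Tp = Θ(W), which is equivalent to an efficiency of E = W / (p · Tp) = Θ(1).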
73. Many interactions in practical parallel programs occur in ____ patterns.
well-defined
zig-zag
reverse
straight
74. Efficient implementation of basic communication operations can improve ____.
performance
communication
algorithm
all
75. Efficient use of basic communication operations can reduce ____.
development effort
software quality
both
none
76. Group communication operations are built using ____ messaging primitives.
point-to-point
one-to-all
all-to-one
none
77. One processor has a piece of data that it needs to send to everyone; this is ____.
one-to-all
all-to-one
point-to-point
all of the above
78. The dual of one-to-all broadcast is ____.
all-to-one reduction
one-to-all reduction
point-to-point reduction
none
79. Data items must be combined piece-wise and the result made available at
target processor finally
target variable finally
both (a) and (b)
None of these