🧪 High Performance Computing (HPC) MCQ Quiz Hub

High Performance Computing (HPC) MCQ Set 1


1. A CUDA program is comprised of two primary components: a host and a _____.




2. The kernel code is identified by the ________ qualifier with a void return type.




3. Calling a kernel is typically referred to as _________.




4. The BlockPerGrid and ThreadPerBlock parameters are related to the ________ model supported by CUDA.




5. _______ is callable from the device only.




6. ____ is callable from the host.




7. ______ is callable from the host.




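Questions 1–7 all hinge on CUDA's function-type qualifiers and the kernel-launch syntax. A minimal sketch for review (the kernel and variable names are illustrative, not taken from the quiz):

```cuda
// __global__: a kernel; launched ("invoked") from the host, runs on the device.
__global__ void addKernel(const int *a, const int *b, int *c) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    c[i] = a[i] + b[i];
}

// __device__: callable from device code only.
__device__ int square(int x) { return x * x; }

// __host__: callable from the host (the default for unqualified functions).
__host__ void launch(const int *a, const int *b, int *c, int n) {
    int threadsPerBlock = 256;
    int blocksPerGrid = (n + threadsPerBlock - 1) / threadsPerBlock;
    // The <<<...>>> execution configuration is the "kernel invocation":
    // it creates one grid, made of blocks, each made of threads.
    addKernel<<<blocksPerGrid, threadsPerBlock>>>(a, b, c);
}
```

Each such launch corresponds to a single grid of thread blocks, which is the structure questions 9–11 ask about.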
8. CUDA supports ____________ in which code in a single thread is executed by all other threads.




9. In CUDA, a single invoked kernel is referred to as a _____.




10. A grid is comprised of ________ of threads.




11. A block is comprised of multiple _______.




12. A solution to the problem of representing the parallelism in an algorithm is ______.




13. Host code in a CUDA application cannot reset a device.




14. Any condition that causes a processor to stall is called as _____.




15. The time lost due to branch instruction is often referred to as _____.




16. ___ method is used in centralized systems to perform out-of-order execution.




17. The computer cluster architecture emerged as an alternative for ____.




18. NVIDIA CUDA Warp is made up of how many threads?




19. Out-of-order execution of instructions is not possible on GPUs.




20. CUDA supports programming in ______.




21. FADD, FMAD, FMIN, FMAX are ----- supported by the scalar processors of an NVIDIA GPU.




22. Each streaming multiprocessor (SM) of CUDA hardware has ------ scalar processors (SPs).




23. Each NVIDIA GPU has ------ streaming multiprocessors.




24. CUDA provides ------- warp and thread scheduling. Also, the overhead of thread creation is on the order of ----.




25. Each warp of GPU receives a single instruction and “broadcasts” it to all of its threads. It is a ---- operation.




26. What are the limitations of a CUDA kernel?




27. What is the Unified Virtual Machine?




28. _____ became the first language specifically designed by a GPU Company to facilitate general purpose computing on ____.




29. The CUDA architecture consists of --------- for parallel computing kernels and functions.




30. CUDA stands for --------, designed by NVIDIA.




31. The host processor spawns multithreaded tasks (or kernels, as they are known in CUDA) onto the GPU device. State true or false.




32. The NVIDIA G80 is a ---- CUDA core device, the NVIDIA G200 is a ---- CUDA core device, and the NVIDIA Fermi is a ---- CUDA core device.




33. NVIDIA 8-series GPUs offer -------- .




34. IADD, IMUL24, IMAD24, IMIN, IMAX are ----------- supported by the scalar processors of an NVIDIA GPU.




35. The CUDA hardware programming model supports: a) fully general data-parallel architecture; b) general thread launch; c) global load-store; d) parallel data cache; e) scalar architecture; f) integer and bit operations.




36. In the CUDA memory model, the following memory types are available: a) Registers; b) Local Memory; c) Shared Memory; d) Global Memory; e) Constant Memory; f) Texture Memory.




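Question 36's memory types map directly onto source-level qualifiers in CUDA C. A brief illustrative sketch (the kernel and names are invented for the example):

```cuda
__constant__ float coeff[16];          // constant memory: read-only on the device, cached

__global__ void smooth(const float *in, float *out) {  // in/out point into global memory
    __shared__ float tile[256];        // shared memory: visible to one thread block
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float v = in[i];                   // local scalars normally live in registers
    tile[threadIdx.x] = v * coeff[0];
    __syncthreads();                   // make the shared-memory write visible block-wide
    out[i] = tile[threadIdx.x];
}
```

Texture memory is accessed through the texture-fetch API rather than a variable qualifier, and "local memory" is where register spills land.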
37. What is the CUDA C equivalent of this general C program: int main(void) { printf("Hello, World! "); return 0; }




38. Which function runs on the device (i.e., the GPU): a) __global__ void kernel ( void ) { } b) int main ( void ) { ... return 0; }




39. If a is a host variable and dev_a is a device (GPU) variable, select the correct statement to allocate memory for dev_a:




40. If a is a host variable and dev_a is a device (GPU) variable, select the correct statement to copy input from a to dev_a:




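Questions 39–40 are easier to answer with the host-side boilerplate in front of you. A minimal sketch using the a/dev_a names from the questions (error checking omitted for brevity):

```cuda
#include <cuda_runtime.h>

int main(void) {
    int a = 42, result = 0;
    int *dev_a;
    cudaMalloc((void **)&dev_a, sizeof(int));                        // allocate device memory
    cudaMemcpy(dev_a, &a, sizeof(int), cudaMemcpyHostToDevice);      // copy host -> device
    cudaMemcpy(&result, dev_a, sizeof(int), cudaMemcpyDeviceToHost); // copy device -> host
    cudaFree(dev_a);
    return 0;
}
```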
41. What do the triple angle brackets in a statement inside the main function indicate?




42. What makes CUDA code run in parallel?




43. In ___________, the number of elements to be sorted is small enough to fit into the process's main memory.




44. _____________ algorithms use auxiliary storage (such as tapes and hard disks) for sorting because the number of elements to be sorted is too large to fit into memory.




45. ____ can be comparison-based or noncomparison-based.




46. The fundamental operation of comparison-based sorting is ________.




47. The performance of quicksort depends critically on the quality of the ______.




48. The main advantage of ______ is that its storage requirement is linear in the depth of the state space being searched.




49. ___ algorithms use a heuristic to guide search.




50. Graph search involves a closed list, where the major operation is a _______




51. Breadth First Search is equivalent to which of the traversal in the Binary Trees?




52. Time Complexity of Breadth First Search is? (V – number of vertices, E – number of edges)




53. Which of the following is not an application of Breadth First Search?




54. In BFS, how many times a node is visited?




55. Which of the following is not a stable sorting algorithm in its typical implementation?




56. Which of the following is not true about comparison based sorting algorithms?




57. Mathematically, efficiency is ______.




58. The cost of a parallel system is sometimes referred to as ____ of product.




59. In the scaling characteristics of parallel programs, Ts is ______.




60. Speedup tends to saturate and efficiency _____ as a consequence of Amdahl’s law.




61. Speedup obtained when the problem size is _______ linearly with the number of processing elements.




62. The n × n matrix is partitioned among n processors, with each processor storing complete ___ of the matrix.




63. Cost-optimal parallel systems have an efficiency of ___.




64. The n × n matrix is partitioned among n² processors such that each processor owns a _____ element.




65. How many basic communication operations are used in matrix-vector multiplication?




66. The DNS algorithm of matrix multiplication uses ______.




67. In pipelined execution, the steps contain ______.




68. The cost of the parallel algorithm is higher than the sequential run time by a factor of __.




69. The load imbalance problem in parallel Gaussian elimination can be alleviated by using a ____ mapping.




70. A parallel algorithm is evaluated by its runtime as a function of ______.




71. For a problem consisting of W units of work, p__W processors can be used optimally.




72. C(W)__Θ(W) for optimality (necessary condition).




73. Many interactions in practical parallel programs occur in _____ patterns.




74. Efficient implementation of basic communication operations can improve ______.




75. Efficient use of basic communication operations can reduce ______.




76. Group communication operations are built using _____ messaging primitives.




77. When one processor has a piece of data and needs to send it to everyone, it is ______.




78. The dual of one-to-all is ______.




79. Data items must be combined piece-wise and the result made available at ______.