🧪 Databricks MCQ Quiz Hub

Databricks MCQ Question Set 1


Which one of the following is not an operation that can be performed using Azure Databricks?





✅ Correct Answer: 3

To which one of the following sources does Azure Databricks connect to collect streaming data?





✅ Correct Answer: 1

Which one of the following is a Databricks concept?





✅ Correct Answer: 4

Which of the following ensures data reliability even after a cluster is terminated in Azure Databricks?





✅ Correct Answer: 2

Choose the correct option with respect to ETL operations on data in Azure Databricks.





✅ Correct Answer: 4

Which one of the following is incorrect regarding the Workspace concept in Azure Databricks?





✅ Correct Answer: 1

Which of the following Azure data sources can be connected to Azure Databricks?





✅ Correct Answer: 4

Streaming data can be captured by:





✅ Correct Answer: 3

Authentication and authorization in Databricks can be managed for:





✅ Correct Answer: 1

Which one of the following is a set of components that run on clusters of Azure Databricks?





✅ Correct Answer: 2

Spark was initially started by ______ at UC Berkeley AMPLab in 2009.





✅ Correct Answer: 2

______ is a component on top of Spark Core.





✅ Correct Answer: 2

Spark SQL provides a domain-specific language to manipulate ___________ in Scala, Java, or Python.





✅ Correct Answer: 3

_______ leverages Spark Core fast scheduling capability to perform streaming analytics.





✅ Correct Answer: 2

____ is a distributed machine learning framework on top of Spark.





✅ Correct Answer: 1

Given a dataframe df, select the code that returns its number of rows:





✅ Correct Answer: 3

Users can easily run Spark on top of Amazon’s _____





✅ Correct Answer: 2

Which of the following can be used to launch Spark jobs inside MapReduce?





✅ Correct Answer: 2

Which of the following languages is not supported by Spark?





✅ Correct Answer: 2

Spark is packaged with higher level libraries, including support for _________ queries.





✅ Correct Answer: 1

Spark includes a collection of over ________ operators for transforming data and familiar data frame APIs for manipulating semi-structured data.





✅ Correct Answer: 4

Given a DataFrame df that includes several columns, among them a column named quantity and a column named price, complete the code below so that it creates a DataFrame containing all the original columns plus a new column revenue defined as quantity*price:





✅ Correct Answer: 3

Spark is engineered from the bottom up for performance, running ______ faster than Hadoop by exploiting in-memory computing and other optimizations.





✅ Correct Answer: 1

Spark powers a stack of high-level tools including Spark SQL, MLlib for _____





✅ Correct Answer: 3

For a multiclass classification problem, which algorithm is not a solution?





✅ Correct Answer: 4

Which of the following is a tool of the machine learning library?





✅ Correct Answer: 4

Which of the following is true for Spark Core?





✅ Correct Answer: 1

Given a DataFrame df that has some null values in the column created_date, select the code that sorts rows in ascending order based on the column created_date, with null values appearing last.





✅ Correct Answer: 3

Which of the following is true for Spark MLlib?





✅ Correct Answer: 2

Which of the following is true for RDD?





✅ Correct Answer: 1

RDD is fault-tolerant and immutable





✅ Correct Answer: 1

The read operation on RDD is





✅ Correct Answer: 3

The write operation on RDD is





✅ Correct Answer: 2

Which one of the following commands does NOT trigger an eager evaluation?





✅ Correct Answer: 2

Which one of the following commands triggers an eager evaluation?





✅ Correct Answer: 3

Is it possible to mitigate stragglers in RDD?





✅ Correct Answer: 1

Fault Tolerance in RDD is achieved using





✅ Correct Answer: 2

What is an action in Spark RDD?





✅ Correct Answer: 1

The shortcomings of Hadoop MapReduce were overcome by Spark RDD by





✅ Correct Answer: 4

Spark is developed in which language?





✅ Correct Answer: 2

Which of the following is NOT an action?





✅ Correct Answer: 2

Which of the following is an action?





✅ Correct Answer: 1

Which of the following is a transformation?





✅ Correct Answer: 2

Which of the following is not a component of the Spark Ecosystem?





✅ Correct Answer: 1

Which of the following algorithms is not present in MLlib?





✅ Correct Answer: 3

Which of the following is not a feature of Spark?





✅ Correct Answer: 3

Which of the following is the reason Spark is faster than MapReduce?





✅ Correct Answer: 1

Which of the following statements is NOT true for broadcast variables?





✅ Correct Answer: 2

Broadcast variables are shared, immutable variables that are cached on every machine in the cluster instead of being serialized with every single task.





✅ Correct Answer: 1

Broadcast variables are ______ and lazily replicated across all nodes in the cluster when an action is triggered.





✅ Correct Answer: 2