Databricks MCQ Question Set 1

Question:

 Spark is packaged with higher level libraries, including support for _________ queries.

1.SQL

2. C

3.C++

4.None of the mentioned

Question:

Authentication and authorization in Databricks can be managed for:

1.User, Group, Access Control List

2. User, Group

3. Access Control List

4.Group, Access Control List

Question:

Broadcast variables are shared, immutable variables that are cached on every machine in the cluster instead of being serialized with every single task.

1.True

2.False

3.Can't Specify

4.None of the mentioned

Question:

Broadcast variables are ______ and lazily replicated across all nodes in the cluster when an action is triggered.

1.mutable

2. immutable

3. both

4.None of above

Question:

Choose the correct option with respect to ETL operations on data in Azure Databricks.

1.For loading of data, data is moved from Databricks to the data warehouse

2.For loading of data, Blob storage is used

3.Blob storage serves as a temporary storage

4. All of the above

Question:

Fault Tolerance in RDD is achieved using

1.Immutable nature of RDD

2.DAG (Directed Acyclic Graph)

3.Lazy-evaluation

4.none of the above

Question:

For a multiclass classification problem, which algorithm is not a solution?

1.Naive Bayes

2.Random Forests

3.Logistic Regression

4.Decision Trees

Question:

Given a DataFrame df that has some null values in the column created_date, select the code below that sorts rows in ascending order based on the column created_date, with null values appearing last.

1.orderBy(asc_nulls_last("created_date"))

2. sort(asc_nulls_last("created_date"))

3.orderBy(col("created_date").asc_nulls_last())

4.orderBy(col("created_date"), ascending=True)
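
The ordering this question asks about (ascending, with nulls at the end) can be modelled without a cluster. The following is a plain-Python sketch, not Spark code: rows are assumed to be dictionaries, `None` stands in for a null, and the helper name `sort_asc_nulls_last` and the sample data are hypothetical.

```python
from typing import Any, Dict, List


def sort_asc_nulls_last(rows: List[Dict[str, Any]], column: str) -> List[Dict[str, Any]]:
    """Return rows sorted ascending by `column`, with None values placed last."""
    # Split the rows so that None is never compared with a real value.
    non_null = [r for r in rows if r[column] is not None]
    nulls = [r for r in rows if r[column] is None]
    # Sort only the non-null rows, then append the null rows at the end.
    return sorted(non_null, key=lambda r: r[column]) + nulls


# Hypothetical sample data; ISO date strings sort correctly as text.
rows = [
    {"id": 1, "created_date": None},
    {"id": 2, "created_date": "2021-01-05"},
    {"id": 3, "created_date": "2021-03-01"},
    {"id": 4, "created_date": None},
]
result = sort_asc_nulls_last(rows, "created_date")
```

In this model the two dated rows come first in date order and the two rows without a date keep their relative order at the end.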

Question:

Given a DataFrame df that includes a number of columns among which a column named quantity and a column named price, complete the code below such that it will create a DataFrame including all the original columns and a new column revenue defined as quantity*price:

1.df.withColumnRenamed("revenue", expr("quantity*price"))

2.df.withColumn(revenue, expr("quantity*price"))

3.df.withColumn("revenue", expr("quantity*price"))

4.df.withColumn(expr("quantity*price"), "revenue")
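
The idea of keeping all original columns and adding a derived one can be sketched in plain Python. This is only a model of the behaviour the question describes, not the Spark API: rows are assumed to be dictionaries, and the helper `with_column` and the sample `orders` data are hypothetical.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def with_column(rows: List[Row], name: str, fn: Callable[[Row], Any]) -> List[Row]:
    """Return new rows that keep every original column and add `name`."""
    # A new dict is built per row, so the input rows are left unchanged.
    return [{**row, name: fn(row)} for row in rows]


# Hypothetical sample data.
orders = [
    {"item": "pen", "quantity": 3, "price": 2.0},
    {"item": "book", "quantity": 1, "price": 12.5},
]
with_revenue = with_column(orders, "revenue", lambda r: r["quantity"] * r["price"])
```

The input list is not modified; the result is a new collection, which mirrors the way DataFrame operations return a new DataFrame rather than changing the existing one.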

Question:

Given a dataframe df, select the code that returns its number of rows:

1.df.take('all')

2.df.collect()

3.df.count()

4.df.numRows()

Question:

Is it possible to mitigate stragglers in RDD?

1.Yes

2. No

3. Both

4.None of the mentioned

Question:

RDD is fault-tolerant and immutable

1.True

2. False

3.Both

4. none of the mentioned

Question:

Spark includes a collection of over ________ operators for transforming data and familiar data frame APIs for manipulating semi-structured data.

1.50

2.60

3.70

4.80

Question:

Spark is developed in which language?

1.Java

2.Scala

3.Python

4.R

Question:

Spark is engineered from the bottom up for performance, running ______ faster than Hadoop by exploiting in-memory computing and other optimizations.

1.100x

2.150x

3.200x

4.None of the mentioned

Question:

Spark powers a stack of high-level tools including Spark SQL and MLlib for _____

1.regression models

2. statistics

3.machine learning

4.reproductive research

Question:

Spark SQL provides a domain-specific language to manipulate ___________ in Scala, Java, or Python.

1.Spark Streaming

2.Spark SQL

3.RDDs

4.All of the Mentioned

Question:

Spark was initially started by ______ at UC Berkeley AMPLab in 2009.

1.Mahek Zaharia

2.Matei Zaharia

3.Doug Cutting

4.Stonebraker

Question:

Streaming data can be captured by:

1.Kafka

2.Event Hubs

3.Both 1 and 2

4.none of the above

Question:

The read operation on RDD is

1.Fine-grained

2.Coarse-grained

3.Either fine-grained or coarse-grained

4.Neither fine-grained nor coarse-grained

Question:

The shortcomings of Hadoop MapReduce were overcome by Spark RDD through:

1.Lazy-evaluation

2.DAG

3.In-memory processing

4.All of the above

Question:

The write operation on RDD is

1.Fine-grained

2.Coarse-grained

3.Either fine-grained or coarse-grained

4.Neither fine-grained nor coarse-grained

Question:

To which one of the following sources does Azure Databricks connect for collecting streaming data?

1.Kafka

2.Azure Data Lake

3.CosmosDB

4.none of the above

Question:

Users can easily run Spark on top of Amazon's _____

1. Infosphere

2.EC2

3.EMR

4.None of the mentioned

Question:

What is an action in Spark RDD?

1.The way to send results from executors to the driver

2.Takes RDD as input and produces one or more RDD as output.

3.Creates one or many new RDDs

4.All of the above

Question:

Which of the following algorithms is not present in MLlib?

1.Streaming Linear Regression

2. Streaming KMeans

3.Tanimoto distance

4.none of the above

Question:

Which of the following Azure data sources can be connected to Azure Databricks?

1.Azure Blob Storage

2.Azure Data Warehouse

3. Azure CosmosDB

4.All of the above

Question:

Which of the following can be used to launch Spark jobs inside MapReduce?

1.SIM

2.SIMR

3.SIR

4.RIS

Question:

Which of the following ensures data reliability even after termination of a cluster in Azure Databricks?

1.Databricks Runtime

2.Databricks File System

3.Dashboards

4.Workspace

Question:

Which of the following is a tool of Machine Learning Library?

1.Persistence

2.Utilities like linear algebra, statistics

3.Pipelines

4.All of the above.

Question:

Which of the following is a transformation?

1.foreach()

2. flatMap()

3.save()

4.count()
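
As background for the options above, the flatten-after-map behaviour that the name `flatMap` suggests can be modelled in plain Python. This is a sketch under assumptions, not Spark code: the helper `flat_map` and the sample `lines` are hypothetical, and a lazy iterator stands in for a deferred computation.

```python
from itertools import chain
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def flat_map(fn: Callable[[T], Iterable[U]], items: Iterable[T]) -> Iterator[U]:
    """Apply fn to each item and flatten the resulting iterables, lazily."""
    # map() and chain.from_iterable() both return lazy iterators,
    # so no element is processed until the result is consumed.
    return chain.from_iterable(map(fn, items))


# Hypothetical sample data.
lines = ["spark is fast", "rdds are immutable"]
words = flat_map(str.split, lines)  # describes the work, computes nothing yet
word_list = list(words)             # consuming the iterator does the work
```

Each input line yields several output elements, and the nested lists are flattened into a single sequence of words.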

Question:

Which of the following is an action?

1. foreach()

2.printSchema()

3.cache()

4.sort()

Question:

Which of the following is not a component of the Spark Ecosystem?

1.Sqoop

2.GraphX

3.MLlib

4.BlinkDB

Question:

Which of the following is NOT an action?

1.foreach()

2.printSchema()

3.first()

4.reduce()

Question:

Which of the following is not a feature of Spark?

1.Supports in-memory computation

2.Fault-tolerance

3. It is cost-efficient

4.Compatible with other file storage systems

Question:

Which of the following is the reason for Spark being faster than MapReduce?

1.DAG execution engine and in-memory computation

2.Support for different language APIs like Scala, Java, Python and R

3.RDDs are immutable and fault-tolerant

4.none of the above

Question:

Which of the following is true for RDD?

1.We can operate Spark RDDs in parallel with a low-level API

2.RDDs are similar to the table in a relational database

3.It allows processing of a large amount of structured data

4.It has built-in optimization engine

Question:

Which of the following is true for Spark core?

1.It is the kernel of Spark

2.It enables users to run SQL / HQL queries on the top of Spark.

3. It is the scalable machine learning library which delivers efficiencies

4. Improves the performance of iterative algorithms drastically.

Question:

Which of the following is true for Spark MLlib?

1. Provides an execution platform for all the Spark applications

2. It is the scalable machine learning library which delivers efficiencies

3.Enables powerful interactive and data analytics applications across live streaming data

4.All of the above

Question:

Which of the following languages is not supported by Spark?

1. Java

2.Pascal

3.Scala

4.Python

Question:

Which of the following statements are NOT true for broadcast variables?

1.Broadcast variables are shared, immutable variables that are cached on every machine in the cluster instead of being serialized with every single task.

2.A custom broadcast class can be defined by extending org.apache.spark.utilbroadcastV2 in Java or Scala or pyspark.Accumulatorparams in Python. -> CORRECT

3. It is a way of updating a value inside a variety of transformations and propagating that value to the driver node in an efficient and fault-tolerant way. -> CORRECT

4. It provides a mutable variable that Spark cluster can safely update on a per-row basis. -> CORRECT

Question:

Which one of the following commands triggers an eager evaluation?

1.df.filter()

2.df.select()

3.df.show()

4.df.limit()
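
The difference between recording a step and actually running it can be illustrated with a toy model in plain Python. This is a sketch under assumptions, not Spark code: the class `LazyRows`, its methods, the predicate `is_large`, and the sample data are all hypothetical and exist only to show the idea of deferred execution.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Row = Dict[str, Any]


class LazyRows:
    """Toy stand-in for a lazily evaluated dataset (hypothetical, not a Spark class)."""

    def __init__(self, source: List[Row], steps: Optional[List[Tuple[str, Any]]] = None):
        self._source = source
        self._steps = list(steps or [])

    def filter(self, predicate: Callable[[Row], bool]) -> "LazyRows":
        # Lazy step: only records the predicate and returns a new plan.
        return LazyRows(self._source, self._steps + [("filter", predicate)])

    def select(self, *columns: str) -> "LazyRows":
        # Lazy step: only records which columns to keep.
        return LazyRows(self._source, self._steps + [("select", columns)])

    def collect(self) -> List[Row]:
        # Eager step: runs every recorded step now and returns the rows.
        rows = self._source
        for kind, arg in self._steps:
            if kind == "filter":
                rows = [r for r in rows if arg(r)]
            else:
                rows = [{c: r[c] for c in arg} for r in rows]
        return rows


calls: List[int] = []


def is_large(row: Row) -> bool:
    calls.append(row["id"])  # side effect that reveals when the predicate runs
    return row["amount"] > 10


data = [{"id": 1, "amount": 5}, {"id": 2, "amount": 50}]
plan = LazyRows(data).filter(is_large).select("id")
calls_before_action = len(calls)  # the predicate has not run at this point
result = plan.collect()           # the recorded steps execute here
```

In this model, building the plan never calls the predicate; it runs only when `collect()` is invoked, which is the distinction the question draws between lazy and eager commands.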

Question:

Which one of the following commands does NOT trigger an eager evaluation?

1.df.collect()

2.df.take()

3.df.show()

4.df.join() -> CORRECT

Question:

Which one of the following is a Databricks concept?

1.Workspace

2.Authentication and authorization

3.Data Management

4.All of the above

Question:

Which one of the following is a set of components that run on clusters of Azure Databricks?

1.Databricks File System

2.Databricks Runtime

3.CosmosDB

4.Azure Data Lake

Question:

Which one of the following is incorrect regarding the Workspace concept in Azure Databricks?

1.It manages ETL operations of data

2.It can store notebooks, libraries and dashboards

3.It is the root folder of Azure Databricks

4.none of the above

Question:

Which one of the following is not an operation that can be performed using Azure Databricks?

1. It is Apache Spark based analytics platform

2. It helps to extract, transform and load the data

3.Visualization of data is not possible with it

4.All of the above

Question:

____ is a distributed machine learning framework on top of Spark.

1.MLlib

2.Spark Streaming

3.GraphX

4.RDDs

Question:

______ is a component on top of Spark Core.

1.Spark Streaming

2.Spark SQL

3. RDDs

4.All of the Mentioned

Question:

_______ leverages Spark Core's fast scheduling capability to perform streaming analytics.

1.MLlib

2.Spark Streaming

3.GraphX

4.RDDs

Posted by Online Exam Test Team


