
Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0: Databricks Certified Associate Developer for Apache Spark 3.0 Exam Questions and Answers

Questions 4

Which of the following code blocks reads the parquet file stored at filePath into DataFrame itemsDf, using a valid schema for the sample of itemsDf shown below?

Sample of itemsDf:

+------+-----------------------------+-------------------+
|itemId|attributes                   |supplier           |
+------+-----------------------------+-------------------+
|1     |[blue, winter, cozy]         |Sports Company Inc.|
|2     |[red, summer, fresh, cooling]|YetiX              |
|3     |[green, summer, travel]      |Sports Company Inc.|
+------+-----------------------------+-------------------+

Options:

A.

itemsDfSchema = StructType([
    StructField("itemId", IntegerType()),
    StructField("attributes", StringType()),
    StructField("supplier", StringType())])

itemsDf = spark.read.schema(itemsDfSchema).parquet(filePath)

B.

itemsDfSchema = StructType([
    StructField("itemId", IntegerType),
    StructField("attributes", ArrayType(StringType)),
    StructField("supplier", StringType)])

itemsDf = spark.read.schema(itemsDfSchema).parquet(filePath)

C.

itemsDf = spark.read.schema('itemId integer, attributes , supplier string').parquet(filePath)

D.

itemsDfSchema = StructType([
    StructField("itemId", IntegerType()),
    StructField("attributes", ArrayType(StringType())),
    StructField("supplier", StringType())])

itemsDf = spark.read.schema(itemsDfSchema).parquet(filePath)

E.

itemsDfSchema = StructType([
    StructField("itemId", IntegerType()),
    StructField("attributes", ArrayType([StringType()])),
    StructField("supplier", StringType())])

itemsDf = spark.read(schema=itemsDfSchema).parquet(filePath)
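For reference, here is a minimal sketch of defining an explicit schema and reading parquet with it, assuming an existing SparkSession named spark and a hypothetical path /tmp/items.parquet; the array-typed attributes column from the sample is modeled with ArrayType(StringType()).

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType, ArrayType

spark = SparkSession.builder.getOrCreate()

# Type instances are instantiated with parentheses, e.g. IntegerType(), not IntegerType
itemsDfSchema = StructType([
    StructField("itemId", IntegerType()),
    StructField("attributes", ArrayType(StringType())),
    StructField("supplier", StringType())])

# schema() is set on the DataFrameReader before the format-specific parquet() call
itemsDf = spark.read.schema(itemsDfSchema).parquet("/tmp/items.parquet")  # hypothetical path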

Questions 5

Which of the following code blocks reads all CSV files in directory filePath into a single DataFrame, with column names defined in the CSV file headers?

Content of directory filePath:

_SUCCESS
_committed_2754546451699747124
_started_2754546451699747124
part-00000-tid-2754546451699747124-10eb85bf-8d91-4dd0-b60b-2f3c02eeecaa-298-1-c000.csv.gz
part-00001-tid-2754546451699747124-10eb85bf-8d91-4dd0-b60b-2f3c02eeecaa-299-1-c000.csv.gz
part-00002-tid-2754546451699747124-10eb85bf-8d91-4dd0-b60b-2f3c02eeecaa-300-1-c000.csv.gz
part-00003-tid-2754546451699747124-10eb85bf-8d91-4dd0-b60b-2f3c02eeecaa-301-1-c000.csv.gz

Options:

A.

spark.read.format("csv").option("header",True).option("compression","zip").load(filePath)

B.

spark.read().option("header",True).load(filePath)

C.

spark.read.format("csv").option("header",True).load(filePath)

D.

spark.read.load(filePath)

E.

spark.option("header",True).csv(filePath)
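For reference, a minimal sketch of reading a directory of CSV files through the DataFrameReader with headers enabled, assuming an existing SparkSession named spark and a hypothetical directory /tmp/csv_dir; gzip-compressed part files are decompressed transparently by the CSV reader.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# read is a property (no parentheses); header=True takes column names from the first line of each file
df = spark.read.format("csv").option("header", True).load("/tmp/csv_dir")  # hypothetical path

# equivalent shorthand
df = spark.read.csv("/tmp/csv_dir", header=True)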

Questions 6

Which of the following code blocks displays the 10 rows with the smallest values of column value in DataFrame transactionsDf in a nicely formatted way?

Options:

A.

transactionsDf.sort(asc(value)).show(10)

B.

transactionsDf.sort(col("value")).show(10)

C.

transactionsDf.sort(col("value").desc()).head()

D.

transactionsDf.sort(col("value").asc()).print(10)

E.

transactionsDf.orderBy("value").asc().show(10)
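A minimal sketch of sorting by a column in ascending order and printing a formatted preview, assuming a DataFrame named transactionsDf with a numeric value column as in the question.

from pyspark.sql.functions import col

# sort() defaults to ascending order; show(10) prints the first 10 rows as a formatted table
transactionsDf.sort(col("value").asc()).show(10)
transactionsDf.sort("value").show(10)  # equivalent, relying on the ascending default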

Questions 7

The code block shown below should return a column that indicates through boolean values whether rows in DataFrame transactionsDf have a value greater than or equal to 20 and less than or equal to 30 in column storeId and have the value 2 in column productId. Choose the answer that correctly fills the blanks in the code block to accomplish this.

transactionsDf.__1__((__2__.__3__) __4__ (__5__))

Options:

A.

1. select

2. col("storeId")

3. between(20, 30)

4. and

5. col("productId")==2

B.

1. where

2. col("storeId")

3. geq(20).leq(30)

4. &

5. col("productId")==2

C.

1. select

2. "storeId"

3. between(20, 30)

4. &&

5. col("productId")==2

D.

1. select

2. col("storeId")

3. between(20, 30)

4. &&

5. col("productId")=2

E.

1. select

2. col("storeId")

3. between(20, 30)

4. &

5. col("productId")==2

Questions 8

Which of the following describes the conversion of a computational query into an execution plan in Spark?

Options:

A.

Spark uses the catalog to resolve the optimized logical plan.

B.

The catalog assigns specific resources to the optimized memory plan.

C.

The executed physical plan depends on a cost optimization from a previous stage.

D.

Depending on whether DataFrame API or SQL API are used, the physical plan may differ.

E.

The catalog assigns specific resources to the physical plan.

Questions 9

The code block shown below should return a copy of DataFrame transactionsDf with an added column cos. This column should contain the cosine of the values in column value after they have been converted to degrees, rounded to two decimals. Choose the answer that correctly fills the blanks in the code block to accomplish this.

Code block:

transactionsDf.__1__(__2__, round(__3__(__4__(__5__)),2))

Options:

A.

1. withColumn

2. col("cos")

3. cos

4. degrees

5. transactionsDf.value

B.

1. withColumnRenamed

2. "cos"

3. cos

4. degrees

5. "transactionsDf.value"

C.

1. withColumn

2. "cos"

3. cos

4. degrees

5. transactionsDf.value

D.

1. withColumn

2. col("cos")

3. cos

4. degrees

5. col("value")

E.

1. withColumn

2. "cos"

3. degrees

4. cos

5. col("value")

Questions 10

Which of the following describes characteristics of the Spark driver?

Options:

A.

The Spark driver requests the transformation of operations into DAG computations from the worker nodes.

B.

If set in the Spark configuration, Spark scales the Spark driver horizontally to improve parallel processing performance.

C.

The Spark driver processes partitions in an optimized, distributed fashion.

D.

In a non-interactive Spark application, the Spark driver automatically creates the SparkSession object.

E.

The Spark driver's responsibility includes scheduling queries for execution on worker nodes.

Questions 11

The code block shown below should return a new 2-column DataFrame that shows one attribute from column attributes per row next to the associated itemName, for all suppliers in column supplier whose name includes Sports. Choose the answer that correctly fills the blanks in the code block to accomplish this.

Sample of DataFrame itemsDf:

+------+----------------------------------+-----------------------------+-------------------+
|itemId|itemName                          |attributes                   |supplier           |
+------+----------------------------------+-----------------------------+-------------------+
|1     |Thick Coat for Walking in the Snow|[blue, winter, cozy]         |Sports Company Inc.|
|2     |Elegant Outdoors Summer Dress     |[red, summer, fresh, cooling]|YetiX              |
|3     |Outdoors Backpack                 |[green, summer, travel]      |Sports Company Inc.|
+------+----------------------------------+-----------------------------+-------------------+

Code block:

itemsDf.__1__(__2__).select(__3__, __4__)

Options:

A.

1. filter

2. col("supplier").isin("Sports")

3. "itemName"

4. explode(col("attributes"))

B.

1. where

2. col("supplier").contains("Sports")

3. "itemName"

4. "attributes"

C.

1. where

2. col(supplier).contains("Sports")

3. explode(attributes)

4. itemName

D.

1. where

2. "Sports".isin(col("Supplier"))

3. "itemName"

4. array_explode("attributes")

E.

1. filter

2. col("supplier").contains("Sports")

3. "itemName"

4. explode("attributes")

Questions 12

The code block displayed below contains an error. The code block is intended to join DataFrame itemsDf with the larger DataFrame transactionsDf on column itemId. Find the error.

Code block:

transactionsDf.join(itemsDf, "itemId", how="broadcast")

Options:

A.

The syntax is wrong, how= should be removed from the code block.

B.

The join method should be replaced by the broadcast method.

C.

Spark will only perform the broadcast operation if this behavior has been enabled on the Spark cluster.

D.

The larger DataFrame transactionsDf is being broadcasted, rather than the smaller DataFrame itemsDf.

E.

broadcast is not a valid join type.
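A minimal sketch of a broadcast join, assuming transactionsDf and itemsDf from the question; the broadcast hint is applied to the smaller DataFrame via the broadcast() function rather than through the how parameter, which only accepts join types such as inner or left.

from pyspark.sql.functions import broadcast

# Broadcast the smaller DataFrame so the join avoids shuffling the larger one
joined = transactionsDf.join(broadcast(itemsDf), "itemId")
joined.show()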

Questions 13

Which of the following describes slots?

Options:

A.

Slots are dynamically created and destroyed in accordance with an executor's workload.

B.

To optimize I/O performance, Spark stores data on disk in multiple slots.

C.

A Java Virtual Machine (JVM) working as an executor can be considered as a pool of slots for task execution.

D.

A slot is always limited to a single core.

E.

Slots are the communication interface for executors and are used for receiving commands and sending results to the driver.

Questions 14

Which of the following code blocks reads in the two-partition parquet file stored at filePath, making sure all columns are included exactly once even though each partition has a different schema?

Schema of first partition:

root
 |-- transactionId: integer (nullable = true)
 |-- predError: integer (nullable = true)
 |-- value: integer (nullable = true)
 |-- storeId: integer (nullable = true)
 |-- productId: integer (nullable = true)
 |-- f: integer (nullable = true)

Schema of second partition:

root
 |-- transactionId: integer (nullable = true)
 |-- predError: integer (nullable = true)
 |-- value: integer (nullable = true)
 |-- storeId: integer (nullable = true)
 |-- rollId: integer (nullable = true)
 |-- f: integer (nullable = true)
 |-- tax_id: integer (nullable = false)

Options:

A.

spark.read.parquet(filePath, mergeSchema='y')

B.

spark.read.option("mergeSchema", "true").parquet(filePath)

C.

spark.read.parquet(filePath)

D.

nx = 0
for file in dbutils.fs.ls(filePath):
    if not file.name.endswith(".parquet"):
        continue
    df_temp = spark.read.parquet(file.path)
    if nx == 0:
        df = df_temp
    else:
        df = df.union(df_temp)
    nx = nx+1
df

E.

nx = 0
for file in dbutils.fs.ls(filePath):
    if not file.name.endswith(".parquet"):
        continue
    df_temp = spark.read.parquet(file.path)
    if nx == 0:
        df = df_temp
    else:
        df = df.join(df_temp, how="outer")
    nx = nx+1
df
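For reference, a minimal sketch of reading a parquet directory with schema merging enabled, assuming filePath from the question and an existing SparkSession named spark; mergeSchema is passed as a reader option.

# Merge the per-partition schemas so every column appears exactly once in the result
df = spark.read.option("mergeSchema", "true").parquet(filePath)
df.printSchema()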

Questions 15

The code block displayed below contains an error. The code block should return a DataFrame in which column predErrorAdded contains the results of Python function add_2_if_geq_3 as applied to numeric and nullable column predError in DataFrame transactionsDf. Find the error.

Code block:

def add_2_if_geq_3(x):
    if x is None:
        return x
    elif x >= 3:
        return x+2
    return x

add_2_if_geq_3_udf = udf(add_2_if_geq_3)

transactionsDf.withColumnRenamed("predErrorAdded", add_2_if_geq_3_udf(col("predError")))

Options:

A.

The operator used to add the column does not add column predErrorAdded to the DataFrame.

B.

Instead of col("predError"), the actual DataFrame with the column needs to be passed, like so transactionsDf.predError.

C.

The udf() method does not declare a return type.

D.

UDFs are only available through the SQL API, but not in the Python API as shown in the code block.

E.

The Python function is unable to handle null values, resulting in the code block crashing on execution.
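A minimal sketch of registering a Python function as a UDF and adding its result as a new column, assuming transactionsDf from the question; declaring IntegerType() as the return type is an assumption for illustration, since udf() otherwise defaults to StringType.

from pyspark.sql.functions import udf, col
from pyspark.sql.types import IntegerType

def add_2_if_geq_3(x):
    if x is None:
        return x
    elif x >= 3:
        return x + 2
    return x

# Declaring a return type is optional but avoids the StringType default
add_2_if_geq_3_udf = udf(add_2_if_geq_3, IntegerType())

# withColumn (not withColumnRenamed) adds a new column from an expression
result = transactionsDf.withColumn("predErrorAdded", add_2_if_geq_3_udf(col("predError")))
result.show()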

Questions 16

The code block shown below should return a DataFrame with columns transactionId, predError, value, and f from DataFrame transactionsDf. Choose the answer that correctly fills the blanks in the code block to accomplish this.

transactionsDf.__1__(__2__)

Options:

A.

1. filter

2. "transactionId", "predError", "value", "f"

B.

1. select

2. "transactionId, predError, value, f"

C.

1. select

2. ["transactionId", "predError", "value", "f"]

D.

1. where

2. col("transactionId"), col("predError"), col("value"), col("f")

E.

1. select

2. col(["transactionId", "predError", "value", "f"])

Questions 17

Which of the following describes characteristics of the Spark UI?

Options:

A.

Via the Spark UI, workloads can be manually distributed across executors.

B.

Via the Spark UI, stage execution speed can be modified.

C.

The Scheduler tab shows how jobs that are run in parallel by multiple users are distributed across the cluster.

D.

There is a place in the Spark UI that shows the property spark.executor.memory.

E.

Some of the tabs in the Spark UI are named Jobs, Stages, Storage, DAGs, Executors, and SQL.

Questions 18

Which of the following code blocks reads in parquet file /FileStore/imports.parquet as a DataFrame?

Options:

A.

spark.mode("parquet").read("/FileStore/imports.parquet")

B.

spark.read.path("/FileStore/imports.parquet", source="parquet")

C.

spark.read().parquet("/FileStore/imports.parquet")

D.

spark.read.parquet("/FileStore/imports.parquet")

E.

spark.read().format('parquet').open("/FileStore/imports.parquet")

Questions 19

Which of the following DataFrame operators is never classified as a wide transformation?

Options:

A.

DataFrame.sort()

B.

DataFrame.aggregate()

C.

DataFrame.repartition()

D.

DataFrame.select()

E.

DataFrame.join()

Questions 20

The code block shown below should return all rows of DataFrame itemsDf that have at least 3 items in column itemNameElements. Choose the answer that correctly fills the blanks in the code block to accomplish this.

Example of DataFrame itemsDf:

+------+----------------------------------+-------------------+------------------------------------------+
|itemId|itemName                          |supplier           |itemNameElements                          |
+------+----------------------------------+-------------------+------------------------------------------+
|1     |Thick Coat for Walking in the Snow|Sports Company Inc.|[Thick, Coat, for, Walking, in, the, Snow]|
|2     |Elegant Outdoors Summer Dress     |YetiX              |[Elegant, Outdoors, Summer, Dress]        |
|3     |Outdoors Backpack                 |Sports Company Inc.|[Outdoors, Backpack]                      |
+------+----------------------------------+-------------------+------------------------------------------+

Code block:

itemsDf.__1__(__2__(__3__)__4__)

Options:

A.

1. select

2. count

3. col("itemNameElements")

4. >3

B.

1. filter

2. count

3. itemNameElements

4. >=3

C.

1. select

2. count

3. "itemNameElements"

4. >3

D.

1. filter

2. size

3. "itemNameElements"

4. >=3

(Correct)

E.

1. select

2. size

3. "itemNameElements"

4. >3
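A minimal sketch of filtering on the length of an array column with size(), assuming the itemsDf sample from the question; count() by contrast is an aggregate function and does not measure array length.

from pyspark.sql.functions import size

# Keep rows whose itemNameElements array has at least 3 elements
result = itemsDf.filter(size("itemNameElements") >= 3)
result.show()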

Questions 21

The code block displayed below contains an error. The code block should return a copy of DataFrame transactionsDf where the name of column transactionId has been changed to transactionNumber. Find the error.

Code block:

transactionsDf.withColumn("transactionNumber", "transactionId")

Options:

A.

The arguments to the withColumn method need to be reordered.

B.

The arguments to the withColumn method need to be reordered and the copy() operator should be appended to the code block to ensure a copy is returned.

C.

The copy() operator should be appended to the code block to ensure a copy is returned.

D.

Each column name needs to be wrapped in the col() method and method withColumn should be replaced by method withColumnRenamed.

E.

The method withColumn should be replaced by method withColumnRenamed and the arguments to the method need to be reordered.
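For reference, a minimal sketch of renaming a column, assuming transactionsDf from the question; withColumnRenamed takes the existing name first and the new name second, and since DataFrames are immutable the result is effectively a renamed copy.

# Rename transactionId to transactionNumber; the original DataFrame is left unchanged
renamed = transactionsDf.withColumnRenamed("transactionId", "transactionNumber")
renamed.printSchema()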

Questions 22

Which of the following code blocks returns a DataFrame showing the mean value of column "value" of DataFrame transactionsDf, grouped by its column storeId?

Options:

A.

transactionsDf.groupBy(col(storeId).avg())

B.

transactionsDf.groupBy("storeId").avg(col("value"))

C.

transactionsDf.groupBy("storeId").agg(avg("value"))

D.

transactionsDf.groupBy("storeId").agg(average("value"))

E.

transactionsDf.groupBy("value").average()

Questions 23

The code block displayed below contains an error. The code block should trigger Spark to cache DataFrame transactionsDf in executor memory where available, writing to disk where insufficient executor memory is available, in a fault-tolerant way. Find the error.

Code block:

transactionsDf.persist(StorageLevel.MEMORY_AND_DISK)

Options:

A.

Caching is not supported in Spark, data are always recomputed.

B.

Data caching capabilities can be accessed through the spark object, but not through the DataFrame API.

C.

The storage level is inappropriate for fault-tolerant storage.

D.

The code block uses the wrong operator for caching.

E.

The DataFrameWriter needs to be invoked.
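For reference, a minimal sketch of persisting with an explicit storage level, assuming transactionsDf from the question; the replicated MEMORY_AND_DISK_2 level used here is an assumption chosen to illustrate fault tolerance through a second replica of each partition.

from pyspark import StorageLevel

# MEMORY_AND_DISK_2 spills to disk when memory is short and keeps two replicas of each partition
transactionsDf.persist(StorageLevel.MEMORY_AND_DISK_2)
transactionsDf.count()  # persist is lazy; an action is needed before the cache is populated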

Questions 24

The code block displayed below contains multiple errors. The code block should remove column transactionDate from DataFrame transactionsDf and add a column transactionTimestamp in which dates that are expressed as strings in column transactionDate of DataFrame transactionsDf are converted into unix timestamps. Find the errors.

Sample of DataFrame transactionsDf:

+-------------+---------+-----+-------+---------+----+----------------+
|transactionId|predError|value|storeId|productId|   f| transactionDate|
+-------------+---------+-----+-------+---------+----+----------------+
|            1|        3|    4|     25|        1|null|2020-04-26 15:35|
|            2|        6|    7|      2|        2|null|2020-04-13 22:01|
|            3|        3| null|     25|        3|null|2020-04-02 10:53|
+-------------+---------+-----+-------+---------+----+----------------+

Code block:

transactionsDf = transactionsDf.drop("transactionDate")
transactionsDf["transactionTimestamp"] = unix_timestamp("transactionDate", "yyyy-MM-dd")

Options:

A.

Column transactionDate should be dropped after transactionTimestamp has been written. The string indicating the date format should be adjusted. The withColumn operator should be used instead of the existing column assignment. Operator to_unixtime() should be used instead of unix_timestamp().

B.

Column transactionDate should be dropped after transactionTimestamp has been written. The withColumn operator should be used instead of the existing column assignment. Column transactionDate should be wrapped in a col() operator.

C.

Column transactionDate should be wrapped in a col() operator.

D.

The string indicating the date format should be adjusted. The withColumnReplaced operator should be used instead of the drop and assign pattern in the code block to replace column transactionDate with the new column transactionTimestamp.

E.

Column transactionDate should be dropped after transactionTimestamp has been written. The string indicating the date format should be adjusted. The withColumn operator should be used instead of the existing column assignment.
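A minimal sketch of deriving a unix timestamp column before dropping the source column, assuming the transactionsDf sample from the question; the format string shown matches the sample data and is an assumption.

from pyspark.sql.functions import unix_timestamp, col

# Convert the string dates first, then drop the original column
transactionsDf = (transactionsDf
                  .withColumn("transactionTimestamp",
                              unix_timestamp(col("transactionDate"), "yyyy-MM-dd HH:mm"))
                  .drop("transactionDate"))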

Questions 25

Which of the elements that are labeled with a circle and a number contain an error or are misrepresented?

Options:

A.

1, 10

B.

1, 8

C.

10

D.

7, 9, 10

E.

1, 4, 6, 9

Questions 26

Which of the following code blocks prints out in how many rows the expression Inc. appears in the string-type column supplier of DataFrame itemsDf?

Options:

A.

counter = 0

for index, row in itemsDf.iterrows():
    if 'Inc.' in row['supplier']:
        counter = counter + 1

print(counter)

B.

counter = 0

def count(x):
    if 'Inc.' in x['supplier']:
        counter = counter + 1

itemsDf.foreach(count)
print(counter)

C.

print(itemsDf.foreach(lambda x: 'Inc.' in x))

D.

print(itemsDf.foreach(lambda x: 'Inc.' in x).sum())

E.

accum=sc.accumulator(0)

def check_if_inc_in_supplier(row):
    if 'Inc.' in row['supplier']:
        accum.add(1)

itemsDf.foreach(check_if_inc_in_supplier)
print(accum.value)
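For reference, a minimal sketch of counting matching rows with an accumulator, assuming itemsDf from the question and an existing SparkSession named spark; foreach runs on the executors, so a plain driver-side variable cannot be incremented there and an accumulator is used instead.

accum = spark.sparkContext.accumulator(0)

def check_if_inc_in_supplier(row):
    # Executed on the executors; updates are merged back on the driver via the accumulator
    if 'Inc.' in row['supplier']:
        accum.add(1)

itemsDf.foreach(check_if_inc_in_supplier)
print(accum.value)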

Questions 27

The code block displayed below contains an error. The code block should save DataFrame transactionsDf at path path as a parquet file, appending to any existing parquet file. Find the error.

Code block:

transactionsDf.format("parquet").option("mode", "append").save(path)

Options:

A.

The code block is missing a reference to the DataFrameWriter.

B.

save() is evaluated lazily and needs to be followed by an action.

C.

The mode option should be omitted so that the command uses the default mode.

D.

The code block is missing a bucketBy command that takes care of partitions.

E.

Given that the DataFrame should be saved as parquet file, path is being passed to the wrong method.
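A minimal sketch of appending a DataFrame to an existing parquet location through the DataFrameWriter, assuming transactionsDf and path from the question.

# The writer is reached through the write property; mode("append") adds to existing files
transactionsDf.write.format("parquet").mode("append").save(path)

# equivalent shorthand
transactionsDf.write.parquet(path, mode="append")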

Exam Name: Databricks Certified Associate Developer for Apache Spark 3.0 Exam
Last Update: May 4, 2024
Questions: 180
