Weekend Special Limited Time 65% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: netbudy65

Databricks-Certified-Data-Engineer-Associate Databricks Certified Data Engineer Associate Exam Questions and Answers

Questions 4

A new data engineering team team. has been assigned to an ELT project. The new data engineering team will need full privileges on the database customers to fully manage the project.

Which of the following commands can be used to grant full permissions on the database to the new data engineering team?

Options:

A.

GRANT USAGE ON DATABASE customers TO team;

B.

GRANT ALL PRIVILEGES ON DATABASE team TO customers;

C.

GRANT SELECT PRIVILEGES ON DATABASE customers TO teams;

D.

GRANT SELECT CREATE MODIFY USAGE PRIVILEGES ON DATABASE customers TO team;

E.

GRANT ALL PRIVILEGES ON DATABASE customers TO team;

Buy Now
Questions 5

In which of the following scenarios should a data engineer select a Task in the Depends On field of a new Databricks Job Task?

Options:

A.

When another task needs to be replaced by the new task

B.

When another task needs to fail before the new task begins

C.

When another task has the same dependency libraries as the new task

D.

When another task needs to use as little compute resources as possible

E.

When another task needs to successfully complete before the new task begins

Buy Now
Questions 6

Which of the following must be specified when creating a new Delta Live Tables pipeline?

Options:

A.

A key-value pair configuration

B.

The preferred DBU/hour cost

C.

A path to cloud storage location for the written data

D.

A location of a target database for the written data

E.

At least one notebook library to be executed

Buy Now
Questions 7

A data engineer wants to create a new table containing the names of customers who live in France.

They have written the following command:

CREATE TABLE customersInFrance

_____ AS

SELECT id,

firstName,

lastName

FROM customerLocations

WHERE country = ’FRANCE’;

A senior data engineer mentions that it is organization policy to include a table property indicating that the new table includes personally identifiable information (Pll).

Which line of code fills in the above blank to successfully complete the task?

Options:

A.

COMMENT "Contains PIT

B.

511

C.

"COMMENT PII"

D.

TBLPROPERTIES PII

Buy Now
Questions 8

An engineering manager wants to monitor the performance of a recent project using a Databricks SQL query. For the first week following the project’s release, the manager wants the query results to be updated every minute. However, the manager is concerned that the compute resources used for the query will be left running and cost the organization a lot of money beyond the first week of the project’s release.

Which of the following approaches can the engineering team use to ensure the query does not cost the organization any money beyond the first week of the project’s release?

Options:

A.

They can set a limit to the number of DBUs that are consumed by the SQL Endpoint.

B.

They can set the query’s refresh schedule to end after a certain number of refreshes.

C.

They cannot ensure the query does not cost the organization money beyond the first week of the project’s release.

D.

They can set a limit to the number of individuals that are able to manage the query’s refresh schedule.

E.

They can set the query’s refresh schedule to end on a certain date in the query scheduler.

Buy Now
Questions 9

A data engineer needs to use a Delta table as part of a data pipeline, but they do not know if they have the appropriate permissions.

In which location can the data engineer review their permissions on the table?

Options:

A.

Jobs

B.

Dashboards

C.

Catalog Explorer

D.

Repos

Buy Now
Questions 10

An engineering manager uses a Databricks SQL query to monitor ingestion latency for each data source. The manager checks the results of the query every day, but they are manually rerunning the query each day and waiting for the results.

Which of the following approaches can the manager use to ensure the results of the query are updated each day?

Options:

A.

They can schedule the query to refresh every 1 day from the SQL endpoint's page in Databricks SQL.

B.

They can schedule the query to refresh every 12 hours from the SQL endpoint's page in Databricks SQL.

C.

They can schedule the query to refresh every 1 day from the query's page in Databricks SQL.

D.

They can schedule the query to run every 1 day from the Jobs UI.

E.

They can schedule the query to run every 12 hours from the Jobs UI.

Buy Now
Questions 11

A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table.

The cade block used by the data engineer is below:

Databricks-Certified-Data-Engineer-Associate Question 11

If the data engineer only wants the query to execute a micro-batch to process data every 5 seconds, which of the following lines of code should the data engineer use to fill in the blank?

Options:

A.

trigger("5 seconds")

B.

trigger()

C.

trigger(once="5 seconds")

D.

trigger(processingTime="5 seconds")

E.

trigger(continuous="5 seconds")

Buy Now
Questions 12

A data engineer wants to schedule their Databricks SQL dashboard to refresh once per day, but they only want the associated SQL endpoint to be running when it is necessary.

Which of the following approaches can the data engineer use to minimize the total running time of the SQL endpoint used in the refresh schedule of their dashboard?

Options:

A.

They can ensure the dashboard’s SQL endpoint matches each of the queries’ SQL endpoints.

B.

They can set up the dashboard’s SQL endpoint to be serverless.

C.

They can turn on the Auto Stop feature for the SQL endpoint.

D.

They can reduce the cluster size of the SQL endpoint.

E.

They can ensure the dashboard’s SQL endpoint is not one of the included query’s SQL endpoint.

Buy Now
Questions 13

Which of the following data workloads will utilize a Gold table as its source?

Options:

A.

A job that enriches data by parsing its timestamps into a human-readable format

B.

A job that aggregates uncleaned data to create standard summary statistics

C.

A job that cleans data by removing malformatted records

D.

A job that queries aggregated data designed to feed into a dashboard

E.

A job that ingests raw data from a streaming source into the Lakehouse

Buy Now
Questions 14

Which of the following commands can be used to write data into a Delta table while avoiding the writing of duplicate records?

Options:

A.

DROP

B.

IGNORE

C.

MERGE

D.

APPEND

E.

INSERT

Buy Now
Questions 15

A Delta Live Table pipeline includes two datasets defined using streaming live table. Three datasets are defined against Delta Lake table sources using live table.

The table is configured to run in Production mode using the Continuous Pipeline Mode.

What is the expected outcome after clicking Start to update the pipeline assuming previously unprocessed data exists and all definitions are valid?

Options:

A.

All datasets will be updated once and the pipeline will shut down. The compute resources will be terminated.

B.

All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will persist to allow for additional testing.

C.

All datasets will be updated once and the pipeline will shut down. The compute resources will persist to allow for additional testing.

D.

All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will be deployed for the update and terminated when the pipeline is stopped.

Buy Now
Questions 16

Which of the following approaches should be used to send the Databricks Job owner an email in the case that the Job fails?

Options:

A.

Manually programming in an alert system in each cell of the Notebook

B.

Setting up an Alert in the Job page

C.

Setting up an Alert in the Notebook

D.

There is no way to notify the Job owner in the case of Job failure

E.

MLflow Model Registry Webhooks

Buy Now
Questions 17

A Delta Live Table pipeline includes two datasets defined using STREAMING LIVE TABLE. Three datasets are defined against Delta Lake table sources using LIVE TABLE.

The table is configured to run in Production mode using the Continuous Pipeline Mode.

Assuming previously unprocessed data exists and all definitions are valid, what is the expected outcome after clicking Start to update the pipeline?

Options:

A.

All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will persist to allow for additional testing.

B.

All datasets will be updated once and the pipeline will persist without any processing. The compute resources will persist but go unused.

C.

All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will be deployed for the update and terminated when the pipeline is stopped.

D.

All datasets will be updated once and the pipeline will shut down. The compute resources will be terminated.

E.

All datasets will be updated once and the pipeline will shut down. The compute resources will persist to allow for additional testing.

Buy Now
Questions 18

A data engineer needs to use a Delta table as part of a data pipeline, but they do not know if they have the appropriate permissions.

In which of the following locations can the data engineer review their permissions on the table?

Options:

A.

Databricks Filesystem

B.

Jobs

C.

Dashboards

D.

Repos

E.

Data Explorer

Buy Now
Questions 19

In which of the following file formats is data from Delta Lake tables primarily stored?

Options:

A.

Delta

B.

CSV

C.

Parquet

D.

JSON

E.

A proprietary, optimized format specific to Databricks

Buy Now
Questions 20

A data engineering team has noticed that their Databricks SQL queries are running too slowly when they are submitted to a non-running SQL endpoint. The data engineering team wants this issue to be resolved.

Which of the following approaches can the team use to reduce the time it takes to return results in this scenario?

Options:

A.

They can turn on the Serverless feature for the SQL endpoint and change the Spot Instance Policy to "Reliability Optimized."

B.

They can turn on the Auto Stop feature for the SQL endpoint.

C.

They can increase the cluster size of the SQL endpoint.

D.

They can turn on the Serverless feature for the SQL endpoint.

E.

They can increase the maximum bound of the SQL endpoint's scaling range

Buy Now
Questions 21

A data engineer wants to schedule their Databricks SQL dashboard to refresh every hour, but they only want the associated SQL endpoint to be running when It is necessary. The dashboard has multiple queries on multiple datasets associated with it. The data that feeds the dashboard is automatically processed using a Databricks Job.

Which approach can the data engineer use to minimize the total running time of the SQL endpoint used in the refresh schedule of their dashboard?

Options:

A.

O They can reduce the cluster size of the SQL endpoint.

B.

Q They can turn on the Auto Stop feature for the SQL endpoint.

C.

O They can set up the dashboard's SQL endpoint to be serverless.

D.

0 They can ensure the dashboard's SQL endpoint matches each of the queries' SQL endpoints.

Buy Now
Questions 22

A data engineer is attempting to drop a Spark SQL table my_table. The data engineer wants to delete all table metadata and data.

They run the following command:

DROP TABLE IF EXISTS my_table

While the object no longer appears when they run SHOW TABLES, the data files still exist.

Which of the following describes why the data files still exist and the metadata files were deleted?

Options:

A.

The table’s data was larger than 10 GB

B.

The table’s data was smaller than 10 GB

C.

The table was external

D.

The table did not have a location

E.

The table was managed

Buy Now
Questions 23

Which of the following benefits is provided by the array functions from Spark SQL?

Options:

A.

An ability to work with data in a variety of types at once

B.

An ability to work with data within certain partitions and windows

C.

An ability to work with time-related data in specified intervals

D.

An ability to work with complex, nested data ingested from JSON files

E.

An ability to work with an array of tables for procedural automation

Buy Now
Questions 24

A data analysis team has noticed that their Databricks SQL queries are running too slowly when connected to their always-on SQL endpoint. They claim that this issue is present when many members of the team are running small queries simultaneously.They ask the data engineering team for help. The data engineering team notices that each of the team’s queries uses the same SQL endpoint.

Which of the following approaches can the data engineering team use to improve the latency of the team’s queries?

Options:

A.

They can increase the cluster size of the SQL endpoint.

B.

They can increase the maximum bound of the SQL endpoint’s scaling range.

C.

They can turn on the Auto Stop feature for the SQL endpoint.

D.

They can turn on the Serverless feature for the SQL endpoint.

E.

They can turn on the Serverless feature for the SQL endpoint and change the Spot Instance Policy to “Reliability Optimized.”

Buy Now
Questions 25

Which SQL keyword can be used to convert a table from a long format to a wide format?

Options:

A.

TRANSFORM

B.

PIVOT

C.

SUM

D.

CONVERT

Buy Now
Questions 26

A data engineer that is new to using Python needs to create a Python function to add two integers together and return the sum?

Which of the following code blocks can the data engineer use to complete this task?

A)

Databricks-Certified-Data-Engineer-Associate Question 26

B)

Databricks-Certified-Data-Engineer-Associate Question 26

C)

Databricks-Certified-Data-Engineer-Associate Question 26

D)

Databricks-Certified-Data-Engineer-Associate Question 26

E)

Databricks-Certified-Data-Engineer-Associate Question 26

Options:

A.

Option A

B.

Option B

C.

Option C

D.

Option D

E.

Option E

Buy Now
Questions 27

Which of the following data lakehouse features results in improved data quality over a traditional data lake?

Options:

A.

A data lakehouse provides storage solutions for structured and unstructured data.

B.

A data lakehouse supports ACID-compliant transactions.

C.

A data lakehouse allows the use of SQL queries to examine data.

D.

A data lakehouse stores data in open formats.

E.

A data lakehouse enables machine learning and artificial Intelligence workloads.

Buy Now
Questions 28

A data engineer is running code in a Databricks Repo that is cloned from a central Git repository. A colleague of the data engineer informs them that changes have beenmade and synced to the central Git repository. The data engineer now needs to sync their Databricks Repo to get the changes from the central Git repository.

Which of the following Git operations does the data engineer need to run to accomplish this task?

Options:

A.

Merge

B.

Push

C.

Pull

D.

Commit

E.

Clone

Buy Now
Questions 29

Identify the impact of ON VIOLATION DROP ROW and ON VIOLATION FAIL UPDATE for a constraint violation.

A data engineer has created an ETL pipeline using Delta Live table to manage their company travel reimbursement detail, they want to ensure that the if the location details has not been provided by the employee, the pipeline needs to be terminated.

How can the scenario be implemented?

Options:

A.

CONSTRAINT valid_location EXPECT (location = NULL)

B.

CONSTRAINT valid_location EXPECT (location != NULL) ON VIOLATION FAIL UPDATE

C.

CONSTRAINT valid_location EXPECT (location != NULL) ON DROP ROW

D.

CONSTRAINT valid_location EXPECT (location != NULL) ON VIOLATION FAIL

Buy Now
Questions 30

A new data engineering team team has been assigned to an ELT project. The new data engineering team will need full privileges on the table sales to fully manage the project.

Which command can be used to grant full permissions on the database to the new data engineering team?

Options:

A.

grant all privileges on table sales TO team;

B.

GRANT SELECT ON TABLE sales TO team;

C.

GRANT SELECT CREATE MODIFY ON TABLE sales TO team;

D.

GRANT ALL PRIVILEGES ON TABLE team TO sales;

Buy Now
Questions 31

A data engineer needs access to a table new_table, but they do not have the correct permissions. They can ask the table owner for permission, but they do not know who the table owner is.

Which of the following approaches can be used to identify the owner of new_table?

Options:

A.

Review the Permissions tab in the table's page in Data Explorer

B.

All of these options can be used to identify the owner of the table

C.

Review the Owner field in the table's page in Data Explorer

D.

Review the Owner field in the table's page in the cloud storage solution

E.

There is no way to identify the owner of the table

Buy Now
Questions 32

A data engineer has created a new database using the following command:

CREATE DATABASE IF NOT EXISTS customer360;

In which of the following locations will the customer360 database be located?

Options:

A.

dbfs:/user/hive/database/customer360

B.

dbfs:/user/hive/warehouse

C.

dbfs:/user/hive/customer360

D.

More information is needed to determine the correct response

Buy Now
Exam Name: Databricks Certified Data Engineer Associate Exam
Last Update: Oct 16, 2025
Questions: 108

PDF + Testing Engine

$134.99

Testing Engine

$99.99

PDF (Q&A)

$84.99