Databricks-Machine-Learning-Professional Databricks Certified Machine Learning Professional Questions and Answers

Questions 4

A machine learning engineer wants to log and deploy a model as an MLflow pyfunc model. They have custom preprocessing that needs to be completed on feature variables prior to fitting the model or computing predictions using that model. They decide to wrap this preprocessing in a custom model class ModelWithPreprocess, where the preprocessing is performed when calling fit and when calling predict. They then log the fitted model of the ModelWithPreprocess class as a pyfunc model.

Which of the following is a benefit of this approach when loading the logged pyfunc model for downstream deployment?

Options:

The pyfunc model can be used to deploy models in a parallelizable fashion

The same preprocessing logic will automatically be applied when calling fit

The same preprocessing logic will automatically be applied when calling predict

This approach has no impact when loading the logged pyfunc model for downstream deployment

There is no longer a need for pipeline-like machine learning objects
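The wrapper pattern in this question can be sketched in plain Python. This is a minimal illustration, not the engineer's actual class: the clipping step and the one-feature linear fit are assumptions made up for the example. With MLflow installed, the class would subclass mlflow.pyfunc.PythonModel so that the fitted instance can be logged as a pyfunc model; the predict signature below mirrors that interface.

```python
from statistics import mean


class ModelWithPreprocess:
    """Hypothetical wrapper: the same preprocessing runs in fit and predict."""

    def preprocess(self, rows):
        # Illustrative preprocessing (an assumption): clip negatives to zero.
        return [max(0.0, float(x)) for x in rows]

    def fit(self, rows, targets):
        x = self.preprocess(rows)
        x_bar, y_bar = mean(x), mean(targets)
        # One-feature least-squares line fitted on the preprocessed values.
        var = sum((xi - x_bar) ** 2 for xi in x)
        cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, targets))
        self.slope = cov / var
        self.intercept = y_bar - self.slope * x_bar
        return self

    def predict(self, context, model_input):
        # Same signature shape as PythonModel.predict(context, model_input).
        x = self.preprocess(model_input)
        return [self.intercept + self.slope * xi for xi in x]


model = ModelWithPreprocess().fit([0, 1, 2, 3], [1, 3, 5, 7])
# The raw value -5 is clipped to 0 inside predict, with no extra caller code.
predictions = model.predict(None, [-5, 4])
```

Because preprocessing lives inside predict, whoever loads the logged model downstream only calls predict on raw features and gets the training-time preprocessing applied automatically.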

Questions 5

A data scientist is utilizing MLflow to track their machine learning experiments. After completing a series of runs for the experiment with experiment ID exp_id, the data scientist wants to programmatically work with the experiment run data in a Spark DataFrame. They have an active MLflow Client client and an active Spark session spark.

Which of the following lines of code can be used to obtain run-level results for exp_id in a Spark DataFrame?

Options:

client.list_run_infos(exp_id)

spark.read.format("delta").load(exp_id)

There is no way to programmatically return row-level results from an MLflow Experiment.

mlflow.search_runs(exp_id)

spark.read.format("mlflow-experiment").load(exp_id)

Questions 6

Which of the following is a reason for using Jensen-Shannon (JS) distance over a Kolmogorov-Smirnov (KS) test for numeric feature drift detection?

Options:

All of these reasons

JS is not normalized or smoothed

None of these reasons

JS is more robust when working with large datasets

JS does not require any manual threshold or cutoff determinations
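For reference, a Jensen-Shannon distance between a baseline and a current sample of a numeric feature might be computed as below. This is a sketch under assumptions: the feature is binned into a shared histogram first (the bin edges and sample values are made up), and base-2 logarithms are used so the distance falls between 0 and 1. The drift threshold is still something the user has to choose.

```python
import math


def js_distance(p, q):
    """Jensen-Shannon distance (base 2) between two discrete distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing; b_i > 0 whenever a_i > 0.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(divergence, 0.0))


def histogram(values, edges):
    """Bin values into normalized frequencies; values must lie within edges."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts]


edges = [0, 10, 20, 30]                # hypothetical shared bin edges
baseline = [1, 5, 12, 15, 18, 25]      # hypothetical training-time sample
current = [11, 14, 22, 25, 27, 29]     # hypothetical production sample
drift_score = js_distance(histogram(baseline, edges), histogram(current, edges))
```

The score is 0 for identical histograms and 1 for histograms with no overlap, which makes it comparable across features and sample sizes.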

Questions 7

A data scientist is using MLflow to track their machine learning experiment. As a part of each MLflow run, they are performing hyperparameter tuning. The data scientist would like to have one parent run for the tuning process with a child run for each unique combination of hyperparameter values.

They are using the following code block:

Databricks-Machine-Learning-Professional Question 7

The code block is not nesting the runs in MLflow as they expected.

Which of the following changes does the data scientist need to make to the above code block so that it successfully nests the child runs under the parent run in MLflow?

Options:

Indent the child run blocks within the parent run block

Add the nested=True argument to the parent run

Remove the nested=True argument from the child runs

Provide the same name to the run_name parameter for all three run blocks

Add the nested=True argument to the parent run and remove the nested=True arguments from the child runs

Questions 8

A data scientist has developed a scikit-learn model sklearn_model and they want to log the model using MLflow.

They write the following incomplete code block:

Databricks-Machine-Learning-Professional Question 8

Which of the following lines of code can be used to fill in the blank so the code block can successfully complete the task?

Options:

mlflow.spark.track_model(sklearn_model, "model")

mlflow.sklearn.log_model(sklearn_model, "model")

mlflow.spark.log_model(sklearn_model, "model")

mlflow.sklearn.load_model("model")

mlflow.sklearn.track_model(sklearn_model, "model")

Questions 9

Which of the following machine learning model deployment paradigms is the most common for machine learning projects?

Options:

On-device

Streaming

Real-time

Batch

None of these deployments

Questions 10

A machine learning engineer is in the process of implementing a concept drift monitoring solution. They are planning to use the following steps:

1. Deploy a model to production and compute predicted values

2. Obtain the observed (actual) label values

3. _____

4. Run a statistical test to determine if there are changes over time

Which of the following should be completed as Step #3?

Options:

Obtain the observed (actual) feature values

Measure the latency of the prediction time

Retrain the model

None of these should be completed as Step #3

Compute the evaluation metric using the observed and predicted values
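The four monitoring steps can be sketched end to end as follows. This is an illustration under assumptions: accuracy stands in for the evaluation metric, the two weekly windows are made-up data, and a two-proportion z-test is one possible choice for the statistical test in Step 4.

```python
import math


def accuracy(predicted, observed):
    # Step 3: evaluation metric from predicted and observed labels.
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


def two_proportion_z_test(acc_a, n_a, acc_b, n_b):
    # Step 4: test whether accuracy differs between two time windows.
    # Assumes the pooled accuracy is strictly between 0 and 1.
    pooled = (acc_a * n_a + acc_b * n_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (acc_a - acc_b) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p_value


# Steps 1-2: hypothetical predictions and later-observed labels per window.
week1_pred, week1_obs = [1] * 100, [1] * 90 + [0] * 10
week2_pred, week2_obs = [1] * 100, [1] * 70 + [0] * 30

acc1 = accuracy(week1_pred, week1_obs)
acc2 = accuracy(week2_pred, week2_obs)
z, p_value = two_proportion_z_test(acc1, len(week1_obs), acc2, len(week2_obs))
drift_suspected = p_value < 0.05  # significance level chosen for illustration
```

A significant drop in the metric between windows suggests the relationship between features and labels may have changed, which is the signal concept drift monitoring looks for.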

Questions 11

A machine learning engineer needs to deliver predictions of a machine learning model in real-time. However, the feature values needed for computing the predictions are available one week before the query time.

Which of the following is a benefit of using a batch serving deployment in this scenario rather than a real-time serving deployment where predictions are computed at query time?

Options:

Batch serving has built-in capabilities in Databricks Machine Learning

There is no advantage to using batch serving deployments over real-time serving deployments

Computing predictions in real-time provides more up-to-date results

Testing is not possible in real-time serving deployments

Querying stored predictions can be faster than computing predictions in real-time
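The trade-off in this scenario can be illustrated with a small sketch. Everything here is hypothetical: the scoring rule stands in for an expensive model, and a dictionary stands in for a table of stored predictions.

```python
def model_predict(features):
    # Stand-in for an expensive model call (hypothetical scoring rule).
    return 0.1 * features["visits"] + 0.5 * features["purchases"]


# Batch job: the features are known a week ahead, so every customer can be
# scored once, offline, and the results written to a prediction store.
weekly_features = {
    "c1": {"visits": 10, "purchases": 2},
    "c2": {"visits": 3, "purchases": 0},
}
prediction_store = {cid: model_predict(f) for cid, f in weekly_features.items()}


def serve(customer_id):
    # Query time: a key lookup, with no model computation on the request path.
    return prediction_store[customer_id]


served = serve("c1")
```

Since the inputs do not change between the batch run and the query, the stored value is the same as one computed on demand, and only the lookup cost is paid at query time.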

Questions 12

A machine learning engineer wants to log feature importance data from a CSV file at path importance_path with an MLflow run for model model.

Which of the following code blocks will accomplish this task inside of an existing MLflow run block?

Options:

Databricks-Machine-Learning-Professional Question 12 Option 1

mlflow.log_data(importance_path, "feature-importance.csv")

mlflow.log_artifact(importance_path, "feature-importance.csv")

None of these code blocks can accomplish the task.

Questions 13

A machine learning engineer wants to programmatically create a new Databricks Job whose schedule depends on the result of some automated tests in a machine learning pipeline.

Which of the following Databricks tools can be used to programmatically create the Job?

Options:

MLflow APIs

AutoML APIs

MLflow Client

Jobs cannot be created programmatically

Databricks REST APIs
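As an illustration, a request to the Jobs API create endpoint (POST /api/2.1/jobs/create) could be assembled as below. The workspace URL, token, notebook path, cluster ID, and both cron expressions are placeholders invented for the example, and the request is only built, not sent.

```python
import json
import urllib.request


def build_create_job_request(host, token, name, notebook_path, cluster_id, cron):
    """Build (but do not send) a Jobs API 2.1 create request."""
    payload = {
        "name": name,
        "tasks": [
            {
                "task_key": "main",
                "notebook_task": {"notebook_path": notebook_path},
                "existing_cluster_id": cluster_id,
            }
        ],
        "schedule": {"quartz_cron_expression": cron, "timezone_id": "UTC"},
    }
    return urllib.request.Request(
        url=f"{host}/api/2.1/jobs/create",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# The schedule depends on the outcome of the automated tests (hypothetical flag).
tests_passed = True
cron = "0 0 2 * * ?" if tests_passed else "0 0 2 ? * MON"  # daily vs. Mondays

request = build_create_job_request(
    host="https://example.cloud.databricks.com",  # placeholder workspace URL
    token="<personal-access-token>",              # placeholder credential
    name="retrain-model",
    notebook_path="/Repos/ml/retrain",            # placeholder notebook path
    cluster_id="<cluster-id>",                    # placeholder cluster ID
    cron=cron,
)
# To submit it: urllib.request.urlopen(request)
```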

Questions 14

A machine learning engineer has developed a model and registered it using the FeatureStoreClient fs. The model has model URI model_uri. The engineer now needs to perform batch inference on customer-level Spark DataFrame spark_df, but it is missing a few of the static features that were used when training the model. The customer_id column is the primary key of spark_df and the training set used when training and logging the model.

Which of the following code blocks can be used to compute predictions for spark_df when the missing feature values can be found in the Feature Store by searching for features by customer_id?

Options:

df = fs.get_missing_features(spark_df, model_uri)

fs.score_model(model_uri, df)

fs.score_model(model_uri, spark_df)

df = fs.get_missing_features(spark_df, model_uri)

fs.score_batch(model_uri, df)

df = fs.get_missing_features(spark_df)

fs.score_batch(model_uri, df)

fs.score_batch(model_uri, spark_df)

Answer:

Explanation:

To compute predictions for spark_df when the missing feature values can be found in the Feature Store by searching for features by customer_id, call score_batch directly on spark_df:

Python

# Score spark_df; features missing from it are looked up in the Feature Store by customer_id

fs.score_batch(model_uri, spark_df)

The fs.score_batch method takes a model URI and a Spark DataFrame as arguments. When the model was logged with the FeatureStoreClient, the feature lookup information is packaged with the model, so score_batch automatically retrieves any features that are missing from the DataFrame by joining to the feature tables on the lookup key (here, customer_id). It then applies the model and returns a new Spark DataFrame that contains the original columns plus a prediction column. The DataFrame only needs to contain the lookup key column, and the model URI must point to a model that was logged with features from the Feature Store.

The other options are incorrect because:

The options that call fs.get_missing_features: FeatureStoreClient has no get_missing_features method, and no separate lookup step is needed before scoring because score_batch performs the lookup itself.
The options that call fs.score_model: FeatureStoreClient has no score_model method. Batch inference is performed with fs.score_batch.

References: Score batch

Questions 15

A machine learning engineer is using the following code block as part of a batch deployment pipeline:

Databricks-Machine-Learning-Professional Question 15

Which of the following changes needs to be made so this code block will work when the inference table is a stream source?

Options:

Replace "inference" with the path to the location of the Delta table

Replace schema(schema) with option("maxFilesPerTrigger", 1)

Replace spark.read with spark.readStream

Replace format("delta") with format("stream")

Replace predict with a stream-friendly prediction function

Answer:

Explanation:

Because the inference table is now a stream source, it must be read with spark.readStream instead of spark.read. The spark.readStream method returns a streaming DataFrame that represents the unbounded input data stream. It is configured with the same builder-style calls as spark.read, such as format, schema, and option, and it can read a Delta table as a stream source by specifying format("delta") together with the path or table name of the Delta table [1][2][3]. Other streaming sources include Kafka, socket, and rate.

The other options are incorrect because:

A. Replacing "inference" with the path to the location of the Delta table does not change the fact that spark.read performs a batch read, which cannot consume a stream source. The spark.readStream method should be used instead, and either the path or the table name of the Delta table can be supplied to it.
B. Replacing schema(schema) with option("maxFilesPerTrigger", 1) still leaves a batch read through spark.read. The maxFilesPerTrigger option is an optional setting that limits the number of files processed in each trigger for file-based stream sources such as Delta, CSV, and JSON. It tunes a streaming read but does not turn a batch read into one [4].
D. Replacing format("delta") with format("stream") still leaves a batch read through spark.read, and "stream" is not a valid format. Supported streaming sources include delta, kafka, socket, and rate [1].
E. Replacing predict with a stream-friendly prediction function still leaves a batch read through spark.read. The predict function does not need to change, as long as it can be applied to the columns of a streaming DataFrame and return a column of predictions [5].

References:

[1] Input Sources - Structured Streaming Programming Guide - Spark 3.2.0 Documentation
[2] Structured Streaming + Delta Lake - Databricks
[3] Structured Streaming Programming Guide - Spark 3.2.0 Documentation
[4] Configuration - Structured Streaming Programming Guide - Spark 3.2.0 Documentation
[5] Machine Learning with Structured Streaming - Databricks

Questions 16

Which of the following describes concept drift?

Options:

Concept drift is when there is a change in the distribution of an input variable

Concept drift is when there is a change in the distribution of a target variable

Concept drift is when there is a change in the relationship between input variables and target variables

Concept drift is when there is a change in the distribution of the predicted target given by the model

None of these describe Concept drift

Questions 17

Which of the following describes label drift?

Options:

Label drift is when there is a change in the distribution of the predicted target given by the model

None of these describe label drift

Label drift is when there is a change in the distribution of an input variable

Label drift is when there is a change in the relationship between input variables and target variables

Label drift is when there is a change in the distribution of a target variable

Questions 18

Which of the following is a simple statistic to monitor for categorical feature drift?

Options:

Mode

None of these

Mode, number of unique values, and percentage of missing values

Percentage of missing values

Number of unique values
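The summary statistics named in these options are cheap to compute for a categorical column. The sketch below is illustrative only: None is assumed to mark a missing value, and the two samples are made up.

```python
from collections import Counter


def categorical_summary(values):
    """Mode, number of unique values, and percent missing for one column."""
    present = [v for v in values if v is not None]  # None marks a missing value
    counts = Counter(present)
    return {
        "mode": counts.most_common(1)[0][0] if counts else None,
        "unique_values": len(counts),
        "pct_missing": 100.0 * (len(values) - len(present)) / len(values),
    }


baseline = categorical_summary(["a", "a", "b", None])
current = categorical_summary(["b", "b", "c", "c", "b", None, None, None])

# Report only the statistics that differ between the two windows.
changed = {k: (baseline[k], current[k]) for k in baseline if baseline[k] != current[k]}
```

Comparing these values between a baseline window and a current window gives a first, simple signal of categorical feature drift before any formal test is run.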

Exam Code: Databricks-Machine-Learning-Professional

Exam Name: Databricks Certified Machine Learning Professional

Last Update: Jul 19, 2026

Questions: 60

DumpsBuddy. All Rights Reserved