Databricks-Machine-Learning-Associate Databricks Certified Machine Learning Associate Exam Questions and Answers

Questions 4

Which of the following machine learning algorithms typically uses bagging?

Options:

Gradient boosted trees

K-means

Random forest

Linear regression

Decision tree
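
As a rough illustration of what bagging means in this context, the sketch below trains several weak models on bootstrap resamples of a toy dataset and averages their predictions, which is the core idea behind a random forest. The dataset, the fit_stump helper, and the choice of 25 models are assumptions made for illustration; a real random forest also subsamples features at each split.

```python
import random

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (one bootstrap resample)."""
    return [rng.choice(rows) for _ in rows]

def fit_stump(rows):
    """Hypothetical weak learner: split x at its mean, predict the mean y per side."""
    threshold = sum(x for x, _ in rows) / len(rows)
    left = [y for x, y in rows if x <= threshold]
    right = [y for x, y in rows if x > threshold]
    overall = sum(y for _, y in rows) / len(rows)
    left_mean = sum(left) / len(left) if left else overall
    right_mean = sum(right) / len(right) if right else overall
    return lambda x: left_mean if x <= threshold else right_mean

def bagged_predict(models, x):
    """Average the predictions of independently trained models."""
    return sum(model(x) for model in models) / len(models)

rng = random.Random(0)
data = [(float(x), 2.0 * x) for x in range(10)]  # toy data, y = 2x

# Each model sees its own resample, so the models could be trained in parallel.
models = [fit_stump(bootstrap_sample(data, rng)) for _ in range(25)]
prediction = bagged_predict(models, 8.0)
```

Because no model depends on another, the list comprehension that builds the models could be distributed across workers without changing the result.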

Questions 5

A data scientist is using MLflow to track their machine learning experiment. As a part of each of their MLflow runs, they are performing hyperparameter tuning. The data scientist would like to have one parent run for the tuning process with a child run for each unique combination of hyperparameter values. All parent and child runs are being manually started with mlflow.start_run.

Which of the following approaches can the data scientist use to accomplish this MLflow run organization?

Options:

They can turn on Databricks Autologging

They can specify nested=True when starting the child run for each unique combination of hyperparameter values

They can start each child run inside the parent run's indented code block using mlflow.start_run()

They can start each child run with the same experiment ID as the parent run

They can specify nested=True when starting the parent run for the tuning process

Questions 6

A data scientist wants to parallelize the training of trees in a gradient boosted tree to speed up the training process. A colleague suggests that parallelizing a boosted tree algorithm can be difficult.

Which of the following describes why?

Options:

Gradient boosting is not a linear algebra-based algorithm which is required for parallelization

Gradient boosting requires access to all data at once which cannot happen during parallelization.

Gradient boosting calculates gradients in evaluation metrics using all cores which prevents parallelization.

Gradient boosting is an iterative algorithm that requires information from the previous iteration to perform the next step.

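Answer: Gradient boosting is an iterative algorithm that requires information from the previous iteration to perform the next step.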

Explanation:

Gradient boosting is fundamentally an iterative algorithm where each new tree is built based on the errors of the previous ones. This sequential dependency makes it difficult to parallelize the training of trees in gradient boosting, as each step relies on the results from the preceding step. Parallelization in this context would undermine the core methodology of the algorithm, which depends on sequentially improving the model's performance with each iteration.

References:

Machine Learning Algorithms (Challenges with Parallelizing Gradient Boosting).

Gradient boosting is an ensemble learning technique that builds models in a sequential manner. Each new model corrects the errors made by the previous ones. This sequential dependency means that each iteration requires the results of the previous iteration to make corrections. Here is a step-by-step explanation of why this makes parallelization challenging:

Sequential Nature: Gradient boosting builds one tree at a time. Each tree is trained to correct the residual errors of the previous trees. This requires the model to complete one iteration before starting the next.
Dependence on Previous Iterations: The gradient calculation at each step depends on the predictions made by the previous models. Therefore, the model must wait until the previous tree has been fully trained and evaluated before starting to train the next tree.
Difficulty in Parallelization: Because of this dependency, the trees cannot be trained at the same time. Unlike random forests, whose trees are built independently of one another, gradient boosting can only parallelize work inside a single iteration (for example, evaluating candidate splits for the current tree), not across iterations.

This iterative and dependent nature of the gradient boosting process makes it difficult to parallelize effectively.
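
The dependency described above can be sketched in a few lines of plain Python. This is a minimal illustration for squared-error loss, not any library's implementation; the toy data, the fit_stump helper, the learning rate, and the five iterations are all assumptions chosen for the example.

```python
def fit_stump(xs, targets):
    """Fit a one-split regression stump that minimizes squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((t - left_mean) ** 2 for t in left)
                 + sum((t - right_mean) ** 2 for t in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.3]
learning_rate = 0.5

preds = [sum(ys) / len(ys)] * len(xs)  # iteration 0: constant model
history = [mse(ys, preds)]
for _ in range(5):
    # The residuals depend on the predictions of every tree fitted so far,
    # so this iteration cannot start before the previous one has finished.
    residuals = [y - p for y, p in zip(ys, preds)]
    stump = fit_stump(xs, residuals)
    preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    history.append(mse(ys, preds))
```

The loop body reads preds, which the previous pass wrote, so the iterations form a chain. The search over thresholds inside fit_stump, by contrast, has no such dependency and is the kind of work that can be spread across cores.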

Questions 7

A data scientist wants to explore summary statistics for Spark DataFrame spark_df. The data scientist wants to see the count, mean, standard deviation, minimum, maximum, and interquartile range (IQR) for each numerical feature.

Which of the following lines of code can the data scientist run to accomplish the task?

Options:

spark_df.summary()

spark_df.stats()

spark_df.describe().head()

spark_df.printSchema()

spark_df.toPandas()

Questions 8

A data scientist has created two linear regression models. The first model uses price as a label variable and the second model uses log(price) as a label variable. When evaluating the RMSE of each model by comparing the label predictions to the actual price values, the data scientist notices that the RMSE for the second model is much larger than the RMSE of the first model.

Which of the following possible explanations for this difference is invalid?

Options:

The second model is much more accurate than the first model

The data scientist failed to exponentiate the predictions in the second model prior to computing the RMSE

The data scientist failed to take the log of the predictions in the first model prior to computing the RMSE

The first model is much more accurate than the second model

The RMSE is an invalid evaluation metric for regression problems
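
The scale mismatch this question turns on can be worked through numerically. The sketch below uses made-up prices and made-up predictions from a hypothetical model trained on log(price); it only shows how the RMSE changes depending on whether predictions are mapped back to the price scale before comparison.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

prices = [100.0, 200.0, 400.0]  # actual labels on the price scale

# Hypothetical predictions from a model whose label was log(price).
log_preds = [math.log(110.0), math.log(190.0), math.log(420.0)]

# Mistake: log-scale predictions compared directly with price-scale labels.
mismatched = rmse(prices, log_preds)

# Fix: exponentiate the predictions back to the price scale first.
price_preds = [math.exp(p) for p in log_preds]
comparable = rmse(prices, price_preds)
```

With these numbers the comparable RMSE is the square root of 200 (errors of 10, 10, and 20), while the mismatched value is far larger because each prediction is a number near 5 compared with a price in the hundreds.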

Questions 9

What is the name of the method that transforms categorical features into a series of binary indicator feature variables?

Options:

Leave-one-out encoding

Target encoding

One-hot encoding

Categorical

String indexing
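
For readers who want to see what "a series of binary indicator feature variables" looks like, here is a minimal plain-Python sketch. The one_hot_encode helper and the color values are hypothetical and not part of any library; Spark ML and scikit-learn provide their own encoders with different output formats.

```python
def one_hot_encode(values):
    """Return the sorted category list and one binary indicator row per value."""
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows

colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)
# categories -> ["blue", "green", "red"]
# encoded    -> [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

Each row contains exactly one 1, in the column of the category that the original value belonged to.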

Questions 10

Which of the following tools can be used to distribute large-scale feature engineering without the use of a UDF or pandas Function API for machine learning pipelines?

Options:

Keras

Scikit-learn

PyTorch

Spark ML

Questions 11

A team is developing guidelines on when to use various evaluation metrics for classification problems. The team needs to provide input on when to use the F1 score over accuracy.

Which of the following suggestions should the team include in their guidelines?

Options:

The F1 score should be utilized over accuracy when the number of actual positive cases is identical to the number of actual negative cases.

The F1 score should be utilized over accuracy when there are greater than two classes in the target variable.

The F1 score should be utilized over accuracy when there is significant imbalance between positive and negative classes and avoiding false negatives is a priority.

The F1 score should be utilized over accuracy when identifying true positives and true negatives are equally important to the business problem.
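
To make the trade-off between the two metrics concrete, the sketch below scores a deliberately useless classifier on a made-up imbalanced dataset (5 positives, 95 negatives). The helper functions and the data are assumptions for illustration only.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp, fp, fn, tn

def accuracy(actual, predicted):
    tp, fp, fn, tn = confusion_counts(actual, predicted)
    return (tp + tn) / (tp + fp + fn + tn)

def f1(actual, predicted):
    tp, fp, fn, _ = confusion_counts(actual, predicted)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

actual = [1] * 5 + [0] * 95
predicted = [0] * 100  # a model that always predicts the negative class

acc = accuracy(actual, predicted)   # 0.95, despite missing every positive
f1_value = f1(actual, predicted)    # 0.0, because recall is zero
```

Accuracy rewards the model for the 95 easy negatives, while the F1 score drops to zero because every positive case is a false negative.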

Questions 12

A machine learning engineer has grown tired of needing to install the MLflow Python library on each of their clusters. They ask a senior machine learning engineer how their notebooks can load the MLflow library without installing it each time. The senior machine learning engineer suggests that they use Databricks Runtime for Machine Learning.

Which of the following approaches describes how the machine learning engineer can begin using Databricks Runtime for Machine Learning?

Options:

They can add a line enabling Databricks Runtime ML in their init script when creating their clusters.

They can check the Databricks Runtime ML box when creating their clusters.

They can select a Databricks Runtime ML version from the Databricks Runtime Version dropdown when creating their clusters.

They can set the runtime-version variable in their Spark session to “ml”.

Questions 13

A data scientist wants to tune a set of hyperparameters for a machine learning model. They have wrapped a Spark ML model in the objective function objective_function and they have defined the search space search_space.

As a result, they have the following code block:

[Image: code block for Question 13]

Which of the following changes do they need to make to the above code block in order to accomplish the task?

Options:

Change SparkTrials() to Trials()

Reduce num_evals to be less than 10

Change fmin() to fmax()

Remove the trials=trials argument

Remove the algo=tpe.suggest argument

Questions 14

A data scientist wants to use Spark ML to one-hot encode the categorical features in their PySpark DataFrame features_df. A list of the names of the string columns is assigned to the input_columns variable.

They have developed this code block to accomplish this task:

[Image: code block for Question 14]

The code block is returning an error.

Which of the following adjustments does the data scientist need to make to accomplish this task?

Options:

They need to specify the method parameter to the OneHotEncoder.

They need to remove the line with the fit operation.

They need to use StringIndexer prior to one-hot encoding the features.

They need to use VectorAssembler prior to one-hot encoding the features.

Questions 15

A data scientist wants to efficiently tune the hyperparameters of a scikit-learn model. They elect to use the Hyperopt library's fmin operation to facilitate this process. Unfortunately, the final model is not very accurate. The data scientist suspects that there is an issue with the objective_function being passed as an argument to fmin.

They use the following code block to create the objective_function:

[Image: code block for Question 15]

Which of the following changes does the data scientist need to make to their objective_function in order to produce a more accurate model?

Options:

Add test set validation process

Add a random_state argument to the RandomForestRegressor operation

Remove the mean operation that is wrapping the cross_val_score operation

Replace the r2 return value with -r2

Replace the fmin operation with the fmax operation

Questions 16

A data scientist has been given an incomplete notebook from the data engineering team. The notebook uses a Spark DataFrame spark_df on which the data scientist needs to perform further feature engineering. Unfortunately, the data scientist has not yet learned the PySpark DataFrame API.

Which of the following blocks of code can the data scientist run to be able to use the pandas API on Spark?

Options:

import pyspark.pandas as ps
df = ps.DataFrame(spark_df)

import pyspark.pandas as ps
df = ps.to_pandas(spark_df)

spark_df.to_pandas()

import pandas as pd
df = pd.DataFrame(spark_df)

Questions 17

A data scientist has a Spark DataFrame spark_df. They want to create a new Spark DataFrame that contains only the rows from spark_df where the value in column price is greater than 0.

Which of the following code blocks will accomplish this task?

Options:

spark_df[spark_df["price"] > 0]

spark_df.filter(col("price") > 0)

SELECT * FROM spark_df WHERE price > 0

spark_df.loc[spark_df["price"] > 0,:]

spark_df.loc[:,spark_df["price"] > 0]

Questions 18

A data scientist wants to efficiently tune the hyperparameters of a scikit-learn model in parallel. They elect to use the Hyperopt library to facilitate this process.

Which of the following Hyperopt tools provides the ability to optimize hyperparameters in parallel?

Options:

fmin

SparkTrials

quniform

search_space

objective_function

Questions 19

Which of the following describes the relationship between native Spark DataFrames and pandas API on Spark DataFrames?

Options:

pandas API on Spark DataFrames are single-node versions of Spark DataFrames with additional metadata

pandas API on Spark DataFrames are more performant than Spark DataFrames

pandas API on Spark DataFrames are made up of Spark DataFrames and additional metadata

pandas API on Spark DataFrames are less mutable versions of Spark DataFrames

Questions 20

A machine learning engineer wants to parallelize the training of group-specific models using the Pandas Function API. They have developed the train_model function, and they want to apply it to each group of DataFrame df.

They have written the following incomplete code block:

[Image: code block for Question 20]

Which of the following pieces of code can be used to fill in the above blank to complete the task?

Options:

applyInPandas

mapInPandas

predict

train_model

groupedApplyIn

Questions 21

A data scientist has created a linear regression model that uses log(price) as a label variable. Using this model, they have performed inference and the predictions and actual label values are in Spark DataFrame preds_df.

They are using the following code block to evaluate the model:

regression_evaluator.setMetricName("rmse").evaluate(preds_df)

Which of the following changes should the data scientist make to evaluate the RMSE in a way that is comparable with price?

Options:

They should exponentiate the computed RMSE value

They should take the log of the predictions before computing the RMSE

They should evaluate the MSE of the log predictions to compute the RMSE

They should exponentiate the predictions before computing the RMSE

Questions 22

A health organization is developing a classification model to determine whether or not a patient currently has a specific type of infection. The organization's leaders want to maximize the number of positive cases identified by the model.

Which of the following classification metrics should be used to evaluate the model?

Options:

RMSE

Precision

Area under the residual operating curve

Accuracy

Recall

Exam Code: Databricks-Machine-Learning-Associate

Exam Name: Databricks Certified Machine Learning Associate Exam

Last Update: Jul 19, 2026

Questions: 74
