MLA-C01 AWS Certified Machine Learning Engineer - Associate Questions and Answers

Questions 4

An ML engineer is training an XGBoost regression model in Amazon SageMaker AI. The ML engineer conducts several rounds of hyperparameter tuning with random grid search. After these rounds of tuning, the error rate on the test hold-out dataset is much larger than the error rate on the training dataset.

The ML engineer needs to make changes before running the hyperparameter grid search again.

Which changes will improve the model ' s performance? (Select TWO.)

Options:

Increase the model complexity by increasing the number of features in the dataset.

Decrease the model complexity by reducing the number of features in the dataset.

Decrease the model complexity by reducing the number of samples in the dataset.

Increase the value of the L2 regularization parameter.

Decrease the value of the L2 regularization parameter.

Buy Now

Questions 5

A company uses Amazon SageMaker AI to create ML models. The data scientists need fine-grained control of ML workflows, DAG visualization, experiment history, and model governance for auditing and compliance.

Which solution will meet these requirements?

Options:

Use AWS CodePipeline with SageMaker Studio and SageMaker ML Lineage Tracking.

Use AWS CodePipeline with SageMaker Experiments.

Use SageMaker Pipelines with SageMaker Studio and SageMaker ML Lineage Tracking.

Use SageMaker Pipelines with SageMaker Experiments.

Buy Now

Questions 6

An ML engineer is using Amazon SageMaker to train a deep learning model that requires distributed training. After some training attempts, the ML engineer observes that the instances are not performing as expected. The ML engineer identifies communication overhead between the training instances.

What should the ML engineer do to MINIMIZE the communication overhead between the instances?

Options:

Place the instances in the same VPC subnet. Store the data in a different AWS Region from where the instances are deployed.

Place the instances in the same VPC subnet but in different Availability Zones. Store the data in a different AWS Region from where the instances are deployed.

Place the instances in the same VPC subnet. Store the data in the same AWS Region and Availability Zone where the instances are deployed.

Place the instances in the same VPC subnet. Store the data in the same AWS Region but in a different Availability Zone from where the instances are deployed.

Buy Now

Questions 7

A company has developed a computer vision model. The company needs to deploy the model into production on Amazon SageMaker AI. The company has not hosted a model on SageMaker AI previously.

An ML engineer needs to implement a solution to track model versions. The solution also must provide recommendations about which Amazon EC2 instance types to use to host the model.

Which solution will meet these requirements?

Options:

Register the model in Amazon Elastic Container Registry (Amazon ECR). Use AWS Compute Optimizer for recommendations about instance types.

Register the model in the SageMaker Model Registry. Use SageMaker Inference Recommender for recommendations about instance types.

Register the model in Amazon Elastic Container Registry (Amazon ECR). Use SageMaker Experiments for recommendations about instance types.

Buy Now

Questions 8

A company uses a training job on Amazon SageMaker Al to train a neural network. The job first trains a model and then evaluates the model ' s performance ag

test dataset. The company uses the results from the evaluation phase to decide if the trained model will go to production.

The training phase takes too long. The company needs solutions that can shorten training time without decreasing the model ' s final performance.

Select the correct solutions from the following list to meet the requirements for each description. Select each solution one time or not at all. (Select THREE.)

. Change the epoch count.

. Choose an Amazon EC2 Spot Fleet.

· Change the batch size.

. Use early stopping on the training job.

· Use the SageMaker Al distributed data parallelism (SMDDP) library.

. Stop the training job.

MLA-C01 Question 8

Options:

Buy Now

Questions 9

A hospital is using an ML model to validate x-ray results. The hospital runs a nightly batch inference job. The hospital needs to produce a daily report about model data quality and model performance.

Which solution will meet these requirements?

Options:

Schedule a monitoring job in Amazon SageMaker Model Monitor. Generate the monitoring results for the model and data.

Create an Amazon CloudWatch dashboard that includes the metrics for processing steps in the nightly batch inference job. Compare the baseline resource metrics. Share the dashboard link.

Use AWS Glue DataBrew to create a custom recipe job that uses the Numerical Statistics data quality check for the model file. Generate the results.

Create a SageMaker AI pipeline that includes a QualityCheck step to run monitoring jobs. Generate the monitoring results for the model and the data.

Buy Now

Questions 10

A company is creating an application that will recommend products for customers to purchase. The application will make API calls to Amazon Q Business. The company must ensure that responses from Amazon Q Business do not include the name of the company ' s main competitor.

Which solution will meet this requirement?

Options:

Configure the competitor ' s name as a blocked phrase in Amazon Q Business.

Configure an Amazon Q Business retriever to exclude the competitor’s name.

Configure an Amazon Kendra retriever for Amazon Q Business to build indexes that exclude the competitor ' s name.

Configure document attribute boosting in Amazon Q Business to deprioritize the competitor ' s name.

Buy Now

Questions 11

A financial company receives a high volume of real-time market data streams from an external provider. The streams consist of thousands of JSON records per second.

The company needs a scalable AWS solution to identify anomalous data points with the LEAST operational overhead.

Which solution will meet these requirements?

Options:

Ingest data into Amazon Kinesis Data Streams. Use the built-in RANDOM_CUT_FOREST function in Amazon Managed Service for Apache Flink to detect anomalies.

Ingest data into Kinesis Data Streams. Deploy a SageMaker AI endpoint and use AWS Lambda to detect anomalies.

Ingest data into Apache Kafka on Amazon EC2 and use SageMaker AI for detection.

Send data to Amazon SQS and use AWS Glue ETL jobs for batch anomaly detection.

Buy Now

Questions 12

A company regularly receives new training data from the vendor of an ML model. The vendor delivers cleaned and prepared data to the company ' s Amazon S3 bucket every 3-4 days.

The company has an Amazon SageMaker pipeline to retrain the model. An ML engineer needs to implement a solution to run the pipeline when new data is uploaded to the S3 bucket.

Which solution will meet these requirements with the LEAST operational effort?

Options:

Create an S3 Lifecycle rule to transfer the data to the SageMaker training instance and to initiate training.

Create an AWS Lambda function that scans the S3 bucket. Program the Lambda function to initiate the pipeline when new data is uploaded.

Create an Amazon EventBridge rule that has an event pattern that matches the S3 upload. Configure the pipeline as the target of the rule.

Use Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to orchestrate the pipeline when new data is uploaded.

Buy Now

Questions 13

A retail company is analyzing customer purchase data to develop personalized product recommendations. The company wants to use Amazon SageMaker Clarify to assess fairness metrics across different customer groups to avoid potential bias in the recommendation system.

The recommendation system needs to identify if certain customer segments are underrepresented in the training data. The company needs to choose a pre-training bias metric in SageMaker Clarify.

Which metric meets these requirements?

Options:

Prediction distribution skew

Feature attribution bias

Class imbalance ratio

Model performance gap

Buy Now

Questions 14

A company has an application that uses different APIs to generate embeddings for input text. The company needs to implement a solution to automatically rotate the API tokens every 3 months.

Which solution will meet this requirement?

Options:

Store the tokens in AWS Secrets Manager. Create an AWS Lambda function to perform the rotation.

Store the tokens in AWS Systems Manager Parameter Store. Create an AWS Lambda function to perform the rotation.

Store the tokens in AWS Key Management Service (AWS KMS). Use an AWS managed key to perform the rotation.

Store the tokens in AWS Key Management Service (AWS KMS). Use an AWS owned key to perform the rotation.

Buy Now

Questions 15

An ML engineer needs to use data with Amazon SageMaker Canvas to train an ML model. The data is stored in Amazon S3 and is complex in structure. The ML engineer must use a file format that minimizes processing time for the data.

Which file format will meet these requirements?

Options:

CSV files compressed with Snappy

JSON objects in JSONL format

JSON files compressed with gzip

Apache Parquet files

Buy Now

Questions 16

A company has an ML model in Amazon SageMaker AI. An ML engineer needs to implement a monitoring solution to automatically detect changes in the input data distribution of model features.

Which solution will meet this requirement with the LEAST operational overhead?

Options:

Configure SageMaker Model Monitor. Establish a data quality baseline. Ensure that the emit_metrics option is enabled in the baseline constraints file. Configure an Amazon CloudWatch alarm to notify the company about changes in specific metrics that are related to data quality.

Configure SageMaker Model Monitor. Establish a model quality baseline. Ensure that the comparison_method option is set to Robust in the baseline constraints file. Configure an Amazon CloudWatch alarm to notify the company about changes in model quality metrics.

Use SageMaker Debugger with custom rules to track shifts in feature distributions. Configure Amazon CloudWatch alarms to notify the company when the rules detect significant changes.

Use Amazon CloudWatch to directly observe the SageMaker AI endpoint ' s performance metrics. Manually analyze the CloudWatch logs for indicators of data drift or shifts in feature distribution.

Buy Now

Questions 17

A company wants to deploy an Amazon SageMaker AI model that can queue requests. The model needs to handle payloads of up to 1 GB that take up to 1 hour to process. The model must return an inference for each request. The model also must scale down when no requests are available to process.

Which inference option will meet these requirements?

Options:

Asynchronous inference

Batch transform

Serverless inference

Real-time inference

Buy Now

Questions 18

A company has an existing Amazon SageMaker AI model (v1) on a production endpoint. The company develops a new model version (v2) and needs to test v2 in production before substituting v2 for v1.

The company needs to minimize the risk of v2 generating incorrect output in production and must prevent any disruption of production traffic during the change.

Which solution will meet these requirements?

Options:

Create a second production variant for v2. Assign 1% of the traffic to v2 and 99% to v1. Collect all output of v2 in Amazon S3. If v2 performs as expected, switch all traffic to v2.

Create a second production variant for v2. Assign 10% of the traffic to v2 and 90% to v1. Collect all output of v2 in Amazon S3. If v2 performs as expected, switch all traffic to v2.

Deploy v2 to a new endpoint. Turn on data capture for the production endpoint. Send 100% of the input data to v2.

Deploy v2 into a shadow variant that samples 100% of the inference requests. Collect all output in Amazon S3. If v2 performs as expected, promote v2 to production.

Buy Now

Questions 19

An ML engineer needs to use an ML model to predict the price of apartments in a specific location.

Which metric should the ML engineer use to evaluate the model ' s performance?

Options:

Accuracy

Area Under the ROC Curve (AUC)

F1 score

Mean absolute error (MAE)

Buy Now

Questions 20

Case Study

A company is building a web-based AI application by using Amazon SageMaker. The application will provide the following capabilities and features: ML experimentation, training, a

central model registry, model deployment, and model monitoring.

The application must ensure secure and isolated use of training data during the ML lifecycle. The training data is stored in Amazon S3.

The company needs to use the central model registry to manage different versions of models in the application.

Which action will meet this requirement with the LEAST operational overhead?

Options:

Create a separate Amazon Elastic Container Registry (Amazon ECR) repository for each model.

Use Amazon Elastic Container Registry (Amazon ECR) and unique tags for each model version.

Use the SageMaker Model Registry and model groups to catalog the models.

Use the SageMaker Model Registry and unique tags for each model version.

Buy Now

Questions 21

An ML engineer needs to run intensive model training jobs each month that can take 48–72 hours. The jobs can be interrupted and resumed. The engineer has a fixed budget and needs the most cost-effective compute option.

Which solution will meet these requirements?

Options:

Purchase Reserved Instances with partial upfront payment.

Purchase On-Demand Instances.

Purchase SageMaker AI Savings Plans.

Purchase Spot Instances that use automated checkpoints.

Buy Now

Questions 22

A company uses a batching solution to process daily analytics. The company wants to provide near real-time updates, use open-source technology, and avoid managing or scaling infrastructure.

Which solution will meet these requirements?

Options:

Create Amazon Managed Streaming for Apache Kafka (Amazon MSK) Serverless clusters.

Create Amazon MSK Provisioned clusters.

Create Amazon Kinesis Data Streams with Application Auto Scaling.

Create self-hosted Apache Flink applications on Amazon EC2.

Buy Now

Questions 23

An ML engineer is deploying a generative AI model-based customer support agent that uses Amazon SageMaker AI for inference. The customer support agent must respond to customer questions about topics such as shipping policies, refund processes, and account management. The generative AI model generates one token at a time.

Customers report dissatisfaction with how long the customer support agent takes to generate lengthy responses to questions. The ML engineer must apply an inference optimization technique to improve the performance of the customer support agent.

Which solution will meet this requirement?

Options:

Compilation

Speculative decoding

Quantization

Fast model loading

Buy Now

Questions 24

An ML engineer receives datasets that contain missing values, duplicates, and extreme outliers. The ML engineer must consolidate these datasets into a single data frame and must prepare the data for ML.

Which solution will meet these requirements?

Options:

Use Amazon SageMaker Data Wrangler to import the datasets and to consolidate them into a single data frame. Use the cleansing and enrichment functionalities to prepare the data.

Use Amazon SageMaker Ground Truth to import the datasets and to consolidate them into a single data frame. Use the human-in-the-loop capability to prepare the data.

Manually import and merge the datasets. Consolidate the datasets into a single data frame. Use Amazon Q Developer to generate code snippets that will prepare the data.

Manually import and merge the datasets. Consolidate the datasets into a single data frame. Use Amazon SageMaker data labeling to prepare the data.

Buy Now

Questions 25

An ML engineer needs to organize a large set of text documents into topics. The ML engineer will not know what the topics are in advance. The ML engineer wants to use built-in algorithms or pre-trained models available through Amazon SageMaker AI to process the documents.

Which solution will meet these requirements?

Options:

Use the BlazingText algorithm to identify the relevant text and to create a set of topics based on the documents.

Use the Sequence-to-Sequence algorithm to summarize the text and to create a set of topics based on the documents.

Use the Object2Vec algorithm to create embeddings and to create a set of topics based on the embeddings.

Use the Latent Dirichlet Allocation (LDA) algorithm to process the documents and to create a set of topics based on the documents.

Buy Now

Questions 26

An ML engineer has trained an ML model by using Amazon SageMaker AI. The ML engineer determines that the model is overfitting and that the training data contains unnecessary features. The ML engineer must reduce the overfitting and the impact of the unnecessary features.

Which solution will meet these requirements?

Options:

Apply L1 regularization to the training data. Retrain the model.

Use SageMaker Debugger to apply L1 regularization to the running model.

Increase the number of training iterations. Retrain the model.

Decrease the number of training iterations. Retrain the model.

Buy Now

Questions 27

A company has trained an ML model in Amazon SageMaker. The company needs to host the model to provide inferences in a production environment.

The model must be highly available and must respond with minimum latency. The size of each request will be between 1 KB and 3 MB. The model will receive unpredictable bursts of requests during the day. The inferences must adapt proportionally to the changes in demand.

How should the company deploy the model into production to meet these requirements?

Options:

Create a SageMaker real-time inference endpoint. Configure auto scaling. Configure the endpoint to present the existing model.

Deploy the model on an Amazon Elastic Container Service (Amazon ECS) cluster. Use ECS scheduled scaling that is based on the CPU of the ECS cluster.

Install SageMaker Operator on an Amazon Elastic Kubernetes Service (Amazon EKS) cluster. Deploy the model in Amazon EKS. Set horizontal pod auto scaling to scale replicas based on the memory metric.

Use Spot Instances with a Spot Fleet behind an Application Load Balancer (ALB) for inferences. Use the ALBRequestCountPerTarget metric as the metric for auto scaling.

Buy Now

Questions 28

A company has a large collection of chat recordings from customer interactions after a product release. An ML engineer needs to create an ML model to analyze the chat data. The ML engineer needs to determine the success of the product by reviewing customer sentiments about the product.

Which action should the ML engineer take to complete the evaluation in the LEAST amount of time?

Options:

Use Amazon Rekognition to analyze sentiments of the chat conversations.

Train a Naive Bayes classifier to analyze sentiments of the chat conversations.

Use Amazon Comprehend to analyze sentiments of the chat conversations.

Use random forests to classify sentiments of the chat conversations.

Buy Now

Questions 29

A company has used Amazon SageMaker to deploy a predictive ML model in production. The company is using SageMaker Model Monitor on the model. After a model update, an ML engineer notices data quality issues in the Model Monitor checks.

What should the ML engineer do to mitigate the data quality issues that Model Monitor has identified?

Options:

Adjust the model ' s parameters and hyperparameters.

Initiate a manual Model Monitor job that uses the most recent production data.

Create a new baseline from the latest dataset. Update Model Monitor to use the new baseline for evaluations.

Include additional data in the existing training set for the model. Retrain and redeploy the model.

Buy Now

Questions 30

A company has significantly increased the amount of data that is stored as .csv files in an Amazon S3 bucket. Data transformation scripts and queries are now taking much longer than they used to take.

An ML engineer must implement a solution to optimize the data for query performance.

Which solution will meet this requirement with the LEAST operational overhead?

Options:

Configure an AWS Lambda function to split the .csv files into smaller objects in the S3 bucket.

Configure an AWS Glue job to drop columns that have string type values and to save the results to the S3 bucket.

Configure an AWS Glue extract, transform, and load (ETL) job to convert the .csv files to Apache Parquet format.

Configure an Amazon EMR cluster to process the data that is in the S3 bucket.

Buy Now

Questions 31

An ML engineer is working on an ML model to predict the prices of similarly sized homes. The model will base predictions on several features The ML engineer will use the following feature engineering techniques to estimate the prices of the homes:

• Feature splitting

• Logarithmic transformation

• One-hot encoding

• Standardized distribution

Select the correct feature engineering techniques for the following list of features. Each feature engineering technique should be selected one time or not at all (Select three.)

MLA-C01 Question 31

Options:

Buy Now

Questions 32

A company wants to migrate ML models from an on-premises environment to Amazon SageMaker AI. The models are based on the PyTorch algorithm. The company needs to reuse its existing custom scripts as much as possible.

Which SageMaker AI feature should the company use?

Options:

SageMaker AI built-in algorithms

SageMaker Canvas

SageMaker JumpStart

SageMaker AI script mode

Buy Now

Questions 33

A company is uploading thousands of PDF policy documents into Amazon S3 and Amazon Bedrock Knowledge Bases. Each document contains structured sections. Users often search for a small section but need the full section context. The company wants accurate section-level search with automatic context retrieval and minimal custom coding.

Which chunking strategy meets these requirements?

Options:

Hierarchical

Maximum tokens

Semantic

Fixed-size

Buy Now

Questions 34

A company wants to improve its customer retention ML model. The current model has 85% accuracy and a new model shows 87% accuracy in testing. The company wants to validate the new model’s performance in production.

Which solution will meet these requirements?

Options:

Deploy the new model for 4 weeks across all production traffic. Monitor performance metrics and validate improvements.

Run A/B testing on both models for 4 weeks. Route 20% of traffic to the new model. Monitor customer retention rates across both variants.

Run both models in parallel for 4 weeks. Analyze offline predictions weekly by using historical customer data analysis.

Implement alternating deployments for 4 weeks between the current model and the new model. Track performance metrics for comparison.

Buy Now

Questions 35

A credit card company has a fraud detection model in production on an Amazon SageMaker endpoint. The company develops a new version of the model. The company needs to assess the new model ' s performance by using live data and without affecting production end users.

Which solution will meet these requirements?

Options:

Set up SageMaker Debugger and create a custom rule.

Set up blue/green deployments with all-at-once traffic shifting.

Set up blue/green deployments with canary traffic shifting.

Set up shadow testing with a shadow variant of the new model.

Buy Now

Questions 36

An ML engineer is using Amazon SageMaker JumpStart to fine-tune a Llama 3.2 model for text generation. The ML engineer is using an instruction-based fine-tuning method. The model uses 70 billion parameters.

Select the correct fine-tuning term from the following list to match each description. Select each term one time or not at all. (Select THREE.)

• Hyperparameter tuning

• Low-rank adaptation (LoRA)

• Fully Sharded Data Parallel (FSDP)

• Learning rate

• Int8 quantization

MLA-C01 Question 36

Options:

Buy Now

Questions 37

Case Study

A company is building a web-based AI application by using Amazon SageMaker. The application will provide the following capabilities and features: ML experimentation, training, a

central model registry, model deployment, and model monitoring.

The application must ensure secure and isolated use of training data during the ML lifecycle. The training data is stored in Amazon S3.

The company needs to run an on-demand workflow to monitor bias drift for models that are deployed to real-time endpoints from the application.

Which action will meet this requirement?

Options:

Configure the application to invoke an AWS Lambda function that runs a SageMaker Clarify job.

Invoke an AWS Lambda function to pull the sagemaker-model-monitor-analyzer built-in SageMaker image.

Use AWS Glue Data Quality to monitor bias.

Use SageMaker notebooks to compare the bias.

Buy Now

Answer:

Explanation:

Monitoring bias drift in deployed machine learning models is crucial to ensure fairness and accuracy over time. Amazon SageMaker Clarify provides tools to detect bias in ML models, both during training and after deployment. To monitor bias drift for models deployed to real-time endpoints, an effective approach involves orchestrating SageMaker Clarify jobs using AWS Lambda functions.

Implementation Steps:

Set Up Data Capture:

Enable data capture on the SageMaker endpoint to record input data and model predictions. This captured data serves as the basis for bias analysis.

Develop a Lambda Function:

Create an AWS Lambda function configured to initiate a SageMaker Clarify job. This function will process the captured data to assess bias metrics.

Schedule or Trigger the Lambda Function:

Configure the Lambda function to run on-demand or at scheduled intervals using Amazon CloudWatch Events or EventBridge. This setup allows for regular bias monitoring as per the application ' s requirements.

Analyze and Respond to Results:

After each Clarify job completes, review the generated bias reports. If bias drift is detected, take appropriate actions, such as retraining the model or adjusting data preprocessing steps.

Advantages of This Approach:

Automation: Utilizing AWS Lambda for orchestrating Clarify jobs enables automated and scalable bias monitoring without manual intervention.

Cost-Effectiveness: AWS Lambda ' s serverless nature ensures that you only pay for the compute time consumed during the execution of the function, optimizing resource usage.

Flexibility: The solution can be tailored to specific monitoring needs, allowing for adjustments in monitoring frequency and analysis parameters.

By implementing this solution, the company can effectively monitor bias drift in real-time, ensuring that the AI application maintains fairness and accuracy throughout its lifecycle.

[References:, Bias drift for models in production - Amazon SageMaker, Schedule Bias Drift Monitoring Jobs - Amazon SageMaker, , , , ]

Questions 38

A company is using Amazon SageMaker to create ML models. The company ' s data scientists need fine-grained control of the ML workflows that they orchestrate. The data scientists also need the ability to visualize SageMaker jobs and workflows as a directed acyclic graph (DAG). The data scientists must keep a running history of model discovery experiments and must establish model governance for auditing and compliance verifications.

Which solution will meet these requirements?

Options:

Use AWS CodePipeline and its integration with SageMaker Studio to manage the entire ML workflows. Use SageMaker ML Lineage Tracking for the running history of experiments and for auditing and compliance verifications.

Use AWS CodePipeline and its integration with SageMaker Experiments to manage the entire ML workflows. Use SageMaker Experiments for the running history of experiments and for auditing and compliance verifications.

Use SageMaker Pipelines and its integration with SageMaker Studio to manage the entire ML workflows. Use SageMaker ML Lineage Tracking for the running history of experiments and for auditing and compliance verifications.

Use SageMaker Pipelines and its integration with SageMaker Experiments to manage the entire ML workflows. Use SageMaker Experiments for the running history of experiments and for auditing and compliance verifications.

Buy Now

Questions 39

A company uses Amazon SageMaker Studio to develop an ML model. The company has a single SageMaker Studio domain. An ML engineer needs to implement a solution that provides an automated alert when SageMaker compute costs reach a specific threshold.

Which solution will meet these requirements?

Options:

Add resource tagging by editing the SageMaker user profile in the SageMaker domain. Configure AWS Cost Explorer to send an alert when the threshold is reached.

Add resource tagging by editing the SageMaker user profile in the SageMaker domain. Configure AWS Budgets to send an alert when the threshold is reached.

Add resource tagging by editing each user ' s IAM profile. Configure AWS Cost Explorer to send an alert when the threshold is reached.

Add resource tagging by editing each user ' s IAM profile. Configure AWS Budgets to send an alert when the threshold is reached.

Buy Now

Questions 40

An ML engineer is setting up an Amazon SageMaker AI pipeline for an ML model. The pipeline must automatically initiate a re-training job if any data drift is detected.

How should the ML engineer set up the pipeline to meet this requirement?

Options:

Use an AWS Glue crawler and an AWS Glue extract, transform, and load (ETL) job to detect data drift. Use AWS Glue triggers to automate the re-training job.

Use Amazon Managed Service for Apache Flink to detect data drift. Use an AWS Lambda function to automate the re-training job.

Use SageMaker Model Monitor to detect data drift. Use an AWS Lambda function to automate the re-training job.

Use Amazon Quick Suite (previously known as Amazon QuickSight) anomaly detection to detect data drift. Use an AWS Step Functions workflow to automate the re-training job.

Buy Now

Questions 41

A company uses Amazon SageMaker for its ML workloads. The company ' s ML engineer receives a 50 MB Apache Parquet data file to build a fraud detection model. The file includes several correlated columns that are not required.

What should the ML engineer do to drop the unnecessary columns in the file with the LEAST effort?

Options:

Download the file to a local workstation. Perform one-hot encoding by using a custom Python script.

Create an Apache Spark job that uses a custom processing script on Amazon EMR.

Create a SageMaker processing job by calling the SageMaker Python SDK.

Create a data flow in SageMaker Data Wrangler. Configure a transform step.

Buy Now

Questions 42

A company wants to reduce the cost of its containerized ML applications. The applications use ML models that run on Amazon EC2 instances, AWS Lambda functions, and an Amazon Elastic Container Service (Amazon ECS) cluster. The EC2 workloads and ECS workloads use Amazon Elastic Block Store (Amazon EBS) volumes to save predictions and artifacts.

An ML engineer must identify resources that are being used inefficiently. The ML engineer also must generate recommendations to reduce the cost of these resources.

Which solution will meet these requirements with the LEAST development effort?

Options:

Create code to evaluate each instance ' s memory and compute usage.

Add cost allocation tags to the resources. Activate the tags in AWS Billing and Cost Management.

Check AWS CloudTrail event history for the creation of the resources.

Run AWS Compute Optimizer.

Buy Now

Questions 43

An ML engineer is using Amazon Quick Suite (previously known as Amazon QuickSight) anomaly detection to detect very high or very low machine operating temperatures compared to normal. The ML engineer sets the Severity parameter to Low and above. The ML engineer sets the Direction parameter to All.

What effect will the ML engineer observe in the anomaly detection results if the ML engineer changes the Direction parameter to Lower than expected?

Options:

Increased anomaly identification frequency and increased recall

Decreased anomaly identification frequency and decreased recall

Increased anomaly identification frequency and decreased recall

Decreased anomaly identification frequency and increased recall

Buy Now

Questions 44

A company that has hundreds of data scientists is using Amazon SageMaker to create ML models. The models are in model groups in the SageMaker Model Registry.

The data scientists are grouped into three categories: computer vision, natural language processing (NLP), and speech recognition. An ML engineer needs to implement a solution to organize the existing models into these groups to improve model discoverability at scale. The solution must not affect the integrity of the model artifacts and their existing groupings.

Which solution will meet these requirements?

Options:

Create a custom tag for each of the three categories. Add the tags to the model packages in the SageMaker Model Registry.

Create a model group for each category. Move the existing models into these category model groups.

Use SageMaker ML Lineage Tracking to automatically identify and tag which model groups should contain the models.

Create a Model Registry collection for each of the three categories. Move the existing model groups into the collections.

Buy Now

Questions 45

A company has deployed an XGBoost prediction model in production to predict if a customer is likely to cancel a subscription. The company uses Amazon SageMaker Model Monitor to detect deviations in the F1 score.

During a baseline analysis of model quality, the company recorded a threshold for the F1 score. After several months of no change, the model ' s F1 score decreases significantly.

What could be the reason for the reduced F1 score?

Options:

Concept drift occurred in the underlying customer data that was used for predictions.

The model was not sufficiently complex to capture all the patterns in the original baseline data.

The original baseline data had a data quality issue of missing values.

Incorrect ground truth labels were provided to Model Monitor during the calculation of the baseline.

Buy Now

Questions 46

A company is planning to use Amazon Redshift ML in its primary AWS account. The source data is in an Amazon S3 bucket in a secondary account.

An ML engineer needs to set up an ML pipeline in the primary account to access the S3 bucket in the secondary account. The solution must not require public IPv4 addresses.

Which solution will meet these requirements?

Options:

Provision a Redshift cluster and Amazon SageMaker Studio in a VPC with no public access enabled in the primary account. Create a VPC peering connection between the accounts. Update the VPC route tables to remove the route to 0.0.0.0/0.

Provision a Redshift cluster and Amazon SageMaker Studio in a VPC with no public access enabled in the primary account. Create an AWS Direct Connect connection and a transit gateway. Associate the VPCs from both accounts with the transit gateway. Update the VPC route tables to remove the route to 0.0.0.0/0.

Provision a Redshift cluster and Amazon SageMaker Studio in a VPC in the primary account. Create an AWS Site-to-Site VPN connection with two encrypted IPsec tunnels between the accounts. Set up interface VPC endpoints for Amazon S3.

Provision a Redshift cluster and Amazon SageMaker Studio in a VPC in the primary account. Create an S3 gateway endpoint. Update the S3 bucket policy to allow IAM principals from the primary account. Set up interface VPC endpoints for SageMaker and Amazon Redshift.

Buy Now

Questions 47

A company has multiple models that are hosted on Amazon SageMaker Al. The models need to be re-trained. The requirements for each model are different, so the company needs to choose different deployment strategies to transfer all requests to a new model.

Select the correct strategy from the following list for each requirement. Select each strategy one time. (Select THREE.)

. Canary traffic shifting

. Linear traffic shifting guardrail

. All at once traffic shifting

MLA-C01 Question 47

Options:

Buy Now

Questions 48

An ML engineer decides to use Amazon SageMaker AI automated model tuning (AMT) for hyperparameter optimization (HPO). The ML engineer requires a tuning strategy that uses regression to slowly and sequentially select the next set of hyperparameters based on previous runs. The strategy must work across small hyperparameter ranges.

Which solution will meet these requirements?

Options:

Grid search

Random search

Bayesian optimization

Hyperband

Buy Now

Questions 49

An ML engineer has a custom container that performs k-fold cross-validation and logs an average F1 score during training. The ML engineer wants Amazon SageMaker AI Automatic Model Tuning (AMT) to select hyperparameters that maximize the average F1 score.

How should the ML engineer integrate the custom metric into SageMaker AI AMT?

Options:

Define the average F1 score in the TrainingInputMode parameter.

Define a metric definition in the tuning job that uses a regular expression to capture the average F1 score from the training logs.

Publish the average F1 score as a custom Amazon CloudWatch metric.

Write the F1 score to a JSON file in Amazon S3 and reference it in ObjectiveMetricName.

Buy Now

Questions 50

A company plans to use Amazon SageMaker AI to build image classification models. The company has 6 TB of training data stored on Amazon FSx for NetApp ONTAP. The file system is in the same VPC as SageMaker AI.

An ML engineer must make the training data accessible to SageMaker AI training jobs.

Which solution will meet these requirements?

Options:

Mount the FSx for ONTAP file system as a volume to the SageMaker AI instance.

Create an Amazon S3 bucket and use Mountpoint for Amazon S3 to link the bucket to FSx for ONTAP.

Create a catalog connection from SageMaker Data Wrangler to the FSx for ONTAP file system.

Create a direct connection from SageMaker Data Wrangler to the FSx for ONTAP file system.

Buy Now

Questions 51

A company wants to host an ML model on Amazon SageMaker. An ML engineer is configuring a continuous integration and continuous delivery (Cl/CD) pipeline in AWS CodePipeline to deploy the model. The pipeline must run automatically when new training data for the model is uploaded to an Amazon S3 bucket.

Select and order the pipeline ' s correct steps from the following list. Each step should be selected one time or not at all. (Select and order three.)

• An S3 event notification invokes the pipeline when new data is uploaded.

• S3 Lifecycle rule invokes the pipeline when new data is uploaded.

• SageMaker retrains the model by using the data in the S3 bucket.

• The pipeline deploys the model to a SageMaker endpoint.

• The pipeline deploys the model to SageMaker Model Registry.

MLA-C01 Question 51

Options:

Buy Now

Questions 52

A company stores training data as a .csv file in an Amazon S3 bucket. The company must encrypt the data and must control which applications have access to the encryption key.

Which solution will meet these requirements?

Options:

Create a new SSH access key and use the AWS Encryption CLI to encrypt the file.

Create a new API key by using Amazon API Gateway and use it to encrypt the file.

Create a new IAM role with permissions for kms:GenerateDataKey and use the role to encrypt the file.

Create a new AWS Key Management Service (AWS KMS) key and use the AWS Encryption CLI with the KMS key to encrypt the file.

Buy Now

Questions 53

An ML engineer wants to deploy a workflow that processes streaming IoT sensor data and periodically retrains ML models. The most recent model versions must be deployed to production.

Which service will meet these requirements?

Options:

Amazon SageMaker Pipelines

Amazon Managed Workflows for Apache Airflow (MWAA)

AWS Lambda

Apache Spark

Buy Now

Questions 54

A construction company is using Amazon SageMaker AI to train specialized custom object detection models to identify road damage. The company uses images from multiple cameras. The images are stored as JPEG objects in an Amazon S3 bucket.

The images need to be pre-processed by using computationally intensive computer vision techniques before the images can be used in the training job. The company needs to optimize data loading and pre-processing in the training job. The solution cannot affect model performance or increase compute or storage resources.

Which solution will meet these requirements?

Options:

Use SageMaker AI file mode to load and process the images in batches.

Reduce the batch size of the model and increase the number of pre-processing threads.

Reduce the quality of the training images in the S3 bucket.

Convert the images into RecordIO format and use the lazy loading pattern.

Buy Now

Questions 55

A logistics company has installed in-vehicle cameras for basic monitoring of its drivers. The company wants to improve driver safety by identifying distractions that could lead to accidents.

Which solution will meet this requirement with the LEAST operational effort?

Options:

Use Amazon Rekognition eye gaze direction detection to monitor driver behavior and identify distractions.

Use Amazon SageMaker AI to customize an AI model to monitor driver behavior and identify distractions.

Integrate a third-party driver monitoring system with Amazon Rekognition to monitor driver behavior and identify distractions.

Use Amazon Comprehend to analyze text-based driver feedback and identify distractions.

Buy Now

Questions 56

An ML engineer is developing a fraud detection model by using the Amazon SageMaker XGBoost algorithm. The model classifies transactions as either fraudulent or legitimate.

During testing, the model excels at identifying fraud in the training dataset. However, the model is inefficient at identifying fraud in new and unseen transactions.

What should the ML engineer do to improve the fraud detection for new transactions?

Options:

Increase the learning rate.

Remove some irrelevant features from the training dataset.

Increase the value of the max_depth hyperparameter.

Decrease the value of the max_depth hyperparameter.

Buy Now

Questions 57

A travel company wants to create an ML model to recommend the next airport destination for its users. The company has collected millions of data records about user location, recent search history on the company ' s website, and 2,000 available airports. The data has several categorical features with a target column that is expected to have a high-dimensional sparse matrix.

The company needs to use Amazon SageMaker AI built-in algorithms for the model. An ML engineer converts the categorical features by using one-hot encoding.

Which algorithm should the ML engineer implement to meet these requirements?

Options:

Use the CatBoost algorithm to recommend the next airport destination.

Use the DeepAR forecasting algorithm to recommend the next airport destination.

Use the Factorization Machines algorithm to recommend the next airport destination.

Use the k-means algorithm to cluster users into groups and map each group to the next airport destination.

Buy Now

Questions 58

A digital media entertainment company needs real-time video content moderation to ensure compliance during live streaming events.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

Use Amazon Rekognition and AWS Lambda to extract and analyze the metadata from the videos ' image frames.

Use Amazon Rekognition and a large language model (LLM) hosted on Amazon Bedrock to extract and analyze the metadata from the videos’ image frames.

Use Amazon SageMaker AI to extract and analyze the metadata from the videos ' image frames.

Use Amazon Transcribe and Amazon Comprehend to extract and analyze the metadata from the videos ' image frames.

Buy Now

Questions 59

A healthcare company wants to detect irregularities in patient vital signs that could indicate early signs of a medical condition. The company has an unlabeled dataset that includes patient health records, medication history, and lifestyle changes.

Which algorithm and hyperparameter should the company use to meet this requirement?

Options:

Use the Amazon SageMaker AI XGBoost algorithm. Set max_depth to greater than 100 to regulate tree complexity.

Use the Amazon SageMaker AI k-means clustering algorithm. Set k to determine the number of clusters.

Use the Amazon SageMaker AI DeepAR algorithm. Set epochs to the number of training iterations.

Use the Amazon SageMaker AI Random Cut Forest (RCF) algorithm. Set num_trees to greater than 100.

Buy Now

Questions 60

A healthcare analytics company wants to segment patients into groups that have similar risk factors to develop personalized treatment plans. The company has a dataset that includes patient health records, medication history, and lifestyle changes. The company must identify the appropriate algorithm to determine the number of groups by using hyperparameters.

Which solution will meet these requirements?

Options:

Use the Amazon SageMaker AI XGBoost algorithm. Set max_depth to control tree complexity for risk groups.

Use the Amazon SageMaker k-means clustering algorithm. Set k to specify the number of clusters.

Use the Amazon SageMaker AI DeepAR algorithm. Set epochs to determine the number of training iterations for risk groups.

Use the Amazon SageMaker AI Random Cut Forest (RCF) algorithm. Set a contamination hyperparameter for risk anomaly detection.

Buy Now

Questions 61

An ML engineer uses an Amazon SageMaker AI notebook instance to run a training job that trains a neural network model with an estimator. The training job loads data iteratively from an Amazon S3 path that is configured as an environment variable. The ML engineer viewed a profiling report of the training job. The ML engineer discovered that a substantial amount of the training time is spent during data loading.

How can the ML engineer improve the training speed?

Options:

Provision Amazon Elastic Block Store (Amazon EBS) Provisioned IOPS SSD io1 storage during the estimator initialization. Download the training data from the S3 path to Amazon EBS. Point the data loader to the EBS location.

Provision Amazon Elastic File System (Amazon EFS) storage during the estimator initialization. Download the training data to Amazon EFS by using the S3 path. Point the data loader to the EFS location.

Download the training data to the estimator by using fast file mode. Point the data loader to the location specified by the S3 path.

Configure the path to the S3 bucket that contains the training data as a hyperparameter instead of an environment variable.

Buy Now

Questions 62

An ML engineer wants to run a training job on Amazon SageMaker AI by using multiple GPUs. The training dataset is stored in Apache Parquet format.

The Parquet files are too large to fit into the memory of the SageMaker AI training instances.

Which solution will fix the memory problem?

Options:

Attach an Amazon EBS Provisioned IOPS SSD volume and store the files on the EBS volume.

Repartition the Parquet files by using Apache Spark on Amazon EMR and use the repartitioned files for training.

Change to memory-optimized instance types with sufficient memory.

Use SageMaker distributed data parallelism (SMDDP) to split memory usage.

Buy Now

Questions 63

An ML engineer needs to use AWS CloudFormation to create an ML model that an Amazon SageMaker endpoint will host.

Which resource should the ML engineer declare in the CloudFormation template to meet this requirement?

Options:

AWS::SageMaker::Model

AWS::SageMaker::Endpoint

AWS::SageMaker::NotebookInstance

AWS::SageMaker::Pipeline

Buy Now

Questions 64

An ML engineer is setting up an Amazon SageMaker AI pipeline for an ML model. The pipeline must automatically initiate a re-training job if any data drift is detected.

How should the ML engineer set up the pipeline to meet this requirement?

Options:

Use an AWS Glue crawler and an AWS Glue extract, transform, and load (ETL) job to detect data drift. Use AWS Glue triggers to automate the retraining job.

Use Amazon Managed Service for Apache Flink to detect data drift. Use an AWS Lambda function to automate the re-training job.

Use SageMaker Model Monitor to detect data drift. Use an AWS Lambda function to automate the re-training job.

Use Amazon Quick Suite (previously known as Amazon QuickSight) anomaly detection to detect data drift. Use an AWS Step Functions workflow to automate the re-training job.

Buy Now

Questions 65

A company needs to deploy a custom-trained classification ML model on AWS. The model must make near real-time predictions with low latency and must handle variable request volumes.

Which solution will meet these requirements?

Options:

Create an Amazon SageMaker AI batch transform job to process inference requests in batches.

Use Amazon API Gateway to receive prediction requests. Use an Amazon S3 bucket to host and serve the model.

Deploy an Amazon SageMaker AI endpoint. Configure auto scaling for the endpoint.

Launch AWS Deep Learning AMIs (DLAMI) on two Amazon EC2 instances. Run the instances behind an Application Load Balancer.

Buy Now

Questions 66

A company is building an Amazon SageMaker AI pipeline for an ML model. The pipeline uses distributed processing and distributed training.

An ML engineer needs to encrypt network communication between instances that run distributed jobs. The ML engineer configures the distributed jobs to run in a private VPC.

What should the ML engineer do to meet the encryption requirement?

Options:

Enable network isolation.

Configure traffic encryption by using security groups.

Enable inter-container traffic encryption.

Enable VPC flow logs.

Buy Now

Questions 67

An ML engineer is using an Amazon SageMaker Studio notebook to train a neural network by creating an estimator. The estimator runs a Python training script that uses Distributed Data Parallel (DDP) on a single instance that has more than one GPU.

The ML engineer discovers that the training script is underutilizing GPU resources. The ML engineer must identify the point in the training script where resource utilization can be optimized.

Which solution will meet this requirement?

Options:

Use Amazon CloudWatch metrics to create a report that describes GPU utilization over time.

Add SageMaker Profiler annotations to the training script. Run the script and generate a report from the results.

Use AWS CloudTrail to create a report that describes GPU utilization and GPU memory utilization over time.

Create a default monitor in Amazon SageMaker Model Monitor and suggest a baseline. Generate a report based on the constraints and statistics the monitor generates.

Buy Now

Questions 68

An advertising company uses AWS Lake Formation to manage a data lake. The data lake contains structured data and unstructured data. The company ' s ML engineers are assigned to specific advertisement campaigns.

The ML engineers must interact with the data through Amazon Athena and by browsing the data directly in an Amazon S3 bucket. The ML engineers must have access to only the resources that are specific to their assigned advertisement campaigns.

Which solution will meet these requirements in the MOST operationally efficient way?

Options:

Configure IAM policies on an AWS Glue Data Catalog to restrict access to Athena based on the ML engineers ' campaigns.

Store users and campaign information in an Amazon DynamoDB table. Configure DynamoDB Streams to invoke an AWS Lambda function to update S3 bucket policies.

Use Lake Formation to authorize AWS Glue to access the S3 bucket. Configure Lake Formation tags to map ML engineers to their campaigns.

Configure S3 bucket policies to restrict access to the S3 bucket based on the ML engineers ' campaigns.

Buy Now

Questions 69

A company ' s ML engineer is creating a classification model. The ML engineer explores the dataset and notices a column named day_of_week. The column contains the following values: Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, and Sunday.

Which technique should the ML engineer use to convert this column’s data to binary values?

Options:

Binary encoding

Label encoding

One-hot encoding

Tokenization

Buy Now

Questions 70

A company uses a hybrid cloud environment. A model that is deployed on premises uses data in Amazon 53 to provide customers with a live conversational engine.

The model is using sensitive data. An ML engineer needs to implement a solution to identify and remove the sensitive data.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

Deploy the model on Amazon SageMaker. Create a set of AWS Lambda functions to identify and remove the sensitive data.

Deploy the model on an Amazon Elastic Container Service (Amazon ECS) cluster that uses AWS Fargate. Create an AWS Batch job to identify and remove the sensitive data.

Use Amazon Macie to identify the sensitive data. Create a set of AWS Lambda functions to remove the sensitive data.

Use Amazon Comprehend to identify the sensitive data. Launch Amazon EC2 instances to remove the sensitive data.

Buy Now

Questions 71

An ML engineer at a credit card company built and deployed an ML model by using Amazon SageMaker AI. The model was trained on transaction data that contained very few fraudulent transactions. After deployment, the model is underperforming.

What should the ML engineer do to improve the model’s performance?

Options:

Retrain the model with a different SageMaker built-in algorithm.

Use random undersampling to reduce the majority class and retrain the model.

Use Synthetic Minority Oversampling Technique (SMOTE) to generate synthetic minority samples and retrain the model.

Use random oversampling to duplicate minority samples and retrain the model.

Buy Now

Questions 72

An ML engineer is collecting data to train a classification ML model by using Amazon SageMaker AI. The target column can have two possible values: Class A or Class B. The ML engineer wants to ensure that the number of samples for both Class A and Class B are balanced, without losing any existing training data. The ML engineer must test the balance of the training data.

Which solution will meet this requirement?

Options:

Use SageMaker Clarify to check for class imbalance (CI). If the value is equal to 0, then use random undersampling in SageMaker Data Wrangler to balance the classes.

Use SageMaker Clarify to check for class imbalance (CI). If the value is greater than 0, then use synthetic minority oversampling technique (SMOTE) in SageMaker Data Wrangler to balance the classes.

Use SageMaker JumpStart to generate a class imbalance (CI) report. If the value is greater than 0, then use random undersampling in SageMaker Studio to balance the classes.

Use SageMaker JumpStart to generate a class imbalance (CI) report. If the value is equal to 0, then use synthetic minority oversampling technique (SMOTE) in SageMaker Studio to balance the classes.

Buy Now

Exam Code: MLA-C01

Exam Name: AWS Certified Machine Learning Engineer - Associate

Last Update: Jul 19, 2026

Questions: 241

PDF + Testing Engine

$134.99

Testing Engine

$99.99

PDF (Q&A)

$84.99

Weekend Certification Sale 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: best70

Dumpsbuddy logo

MLA-C01 AWS Certified Machine Learning Engineer - Associate Questions and Answers

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer: