MLA-C01 AWS Certified Machine Learning Engineer - Associate Questions and Answers

Questions 4

A company has a team of data scientists who use Amazon SageMaker notebook instances to test ML models. When the data scientists need new permissions, the company attaches the permissions to each individual role that was created during the creation of the SageMaker notebook instance.

The company needs to centralize management of the team's permissions.

Which solution will meet this requirement?

Options:

A.

Create a single IAM role that has the necessary permissions. Attach the role to each notebook instance that the team uses.

B.

Create a single IAM group. Add the data scientists to the group. Associate the group with each notebook instance that the team uses.

C.

Create a single IAM user. Attach the AdministratorAccess AWS managed IAM policy to the user. Configure each notebook instance to use the IAM user.

D.

Create a single IAM group. Add the data scientists to the group. Create an IAM role. Attach the AdministratorAccess AWS managed IAM policy to the role. Associate the role with the group. Associate the group with each notebook instance that the team uses.

Questions 5

Case study

An ML engineer is developing a fraud detection model on AWS. The training dataset includes transaction logs, customer profiles, and tables from an on-premises MySQL database. The transaction logs and customer profiles are stored in Amazon S3.

The dataset has a class imbalance that affects the learning of the model's algorithm. Additionally, many of the features have interdependencies. The algorithm is not capturing all the desired underlying patterns in the data.

Before the ML engineer trains the model, the ML engineer must resolve the issue of the imbalanced data.

Which solution will meet this requirement with the LEAST operational effort?

Options:

A.

Use Amazon Athena to identify patterns that contribute to the imbalance. Adjust the dataset accordingly.

B.

Use Amazon SageMaker Studio Classic built-in algorithms to process the imbalanced dataset.

C.

Use AWS Glue DataBrew built-in features to oversample the minority class.

D.

Use the Amazon SageMaker Data Wrangler balance data operation to oversample the minority class.
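
For reference, random oversampling of a minority class, the general technique named in options C and D, can be sketched in plain Python. This is only an illustration under stated assumptions: it does not call Data Wrangler or DataBrew, and the function name `oversample_minority` and the `amount`/`fraud` fields are hypothetical.

```python
import random


def oversample_minority(rows, label_key, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes
    have as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Sample with replacement to fill the gap up to the target size.
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced


# Toy dataset: 8 legitimate transactions and 2 fraudulent ones.
transactions = [{"amount": 10 * i, "fraud": 0} for i in range(8)] + [
    {"amount": 900, "fraud": 1},
    {"amount": 950, "fraud": 1},
]
balanced = oversample_minority(transactions, "fraud")
counts = {label: sum(1 for r in balanced if r["fraud"] == label) for label in (0, 1)}
print(counts)  # {0: 8, 1: 8}
```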

Questions 6

An ML engineer needs to use an Amazon EMR cluster to process large volumes of data in batches. Any data loss is unacceptable.

Which instance purchasing option will meet these requirements MOST cost-effectively?

Options:

A.

Run the primary node, core nodes, and task nodes on On-Demand Instances.

B.

Run the primary node, core nodes, and task nodes on Spot Instances.

C.

Run the primary node on an On-Demand Instance. Run the core nodes and task nodes on Spot Instances.

D.

Run the primary node and core nodes on On-Demand Instances. Run the task nodes on Spot Instances.

Questions 7

A company has an ML model that needs to run one time each night to predict stock values. The model input is 3 MB of data that is collected during the current day. The model produces the predictions for the next day. The prediction process takes less than 1 minute to finish running.

How should the company deploy the model on Amazon SageMaker to meet these requirements?

Options:

A.

Use a multi-model serverless endpoint. Enable caching.

B.

Use an asynchronous inference endpoint. Set the InitialInstanceCount parameter to 0.

C.

Use a real-time endpoint. Configure an auto scaling policy to scale the model to 0 when the model is not in use.

D.

Use a serverless inference endpoint. Set the MaxConcurrency parameter to 1.

Questions 8

A company launches a feature that predicts home prices. An ML engineer trained a regression model using the SageMaker AI XGBoost algorithm. The model performs well on training data but underperforms on real-world validation data.

Which solution will improve the validation score with the LEAST implementation effort?

Options:

A.

Create a larger training dataset with more real-world data and retrain.

B.

Increase the num_round hyperparameter.

C.

Change the eval_metric from RMSE to Error.

D.

Increase the lambda hyperparameter.

Questions 9

An ML engineer normalized training data by using min-max normalization in AWS Glue DataBrew. The ML engineer must normalize production inference data in the same way before passing the data to the model.

Which solution will meet this requirement?

Options:

A.

Apply statistics from a well-known dataset to normalize the production samples.

B.

Keep the min-max normalization statistics from the training set and use them to normalize the production samples.

C.

Calculate new min-max statistics from a batch of production samples and use them to normalize all production samples.

D.

Calculate new min-max statistics from each production sample and use them to normalize all production samples.
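
Min-max normalization maps a value x to (x - min) / (max - min), so the result depends on which statistics are supplied. The sketch below, in plain Python rather than DataBrew, scales the same production values with two different sets of statistics. The helper names and the sample numbers are assumptions made for illustration.

```python
def fit_min_max(values):
    """Return the (min, max) statistics of a list of numbers."""
    return min(values), max(values)


def min_max_scale(x, lo, hi):
    """Scale x using the given statistics; a constant feature maps to 0.0."""
    if hi == lo:
        return 0.0
    return (x - lo) / (hi - lo)


training = [10.0, 20.0, 30.0, 50.0]
production_batch = [20.0, 40.0]

train_lo, train_hi = fit_min_max(training)
batch_lo, batch_hi = fit_min_max(production_batch)

# Same inputs, different statistics, different scaled values.
with_training_stats = [min_max_scale(x, train_lo, train_hi) for x in production_batch]
with_batch_stats = [min_max_scale(x, batch_lo, batch_hi) for x in production_batch]

print(with_training_stats)  # [0.25, 0.75]
print(with_batch_stats)     # [0.0, 1.0]
```

Only statistics that match the ones used during training reproduce the feature scale that the model saw at training time.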

Questions 10

A company wants to improve its customer retention ML model. The current model has 85% accuracy and a new model shows 87% accuracy in testing. The company wants to validate the new model’s performance in production.

Which solution will meet these requirements?

Options:

A.

Deploy the new model for 4 weeks across all production traffic. Monitor performance metrics and validate improvements.

B.

Run A/B testing on both models for 4 weeks. Route 20% of traffic to the new model. Monitor customer retention rates across both variants.

C.

Run both models in parallel for 4 weeks. Analyze offline predictions weekly by using historical customer data analysis.

D.

Implement alternating deployments for 4 weeks between the current model and the new model. Track performance metrics for comparison.

Questions 11

An ML engineer is tuning an image classification model that performs poorly on one of two classes. The poorly performing class represents an extremely small fraction of the training dataset.

Which solution will improve the model’s performance?

Options:

A.

Optimize for accuracy. Use image augmentation on the less common images.

B.

Optimize for F1 score. Use image augmentation on the less common images.

C.

Optimize for accuracy. Use SMOTE to generate synthetic images.

D.

Optimize for F1 score. Use SMOTE to generate synthetic images.
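
The two metrics named in the options behave very differently when one class is rare. The following minimal sketch uses made-up labels (95 common images, 5 rare ones) and hand-written metric functions, not a SageMaker API, to compare them on two hypothetical classifiers.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


y_true = [0] * 95 + [1] * 5

# Classifier 1: always predicts the common class.
always_majority = [0] * 100
# Classifier 2: finds 4 of the 5 rare images but raises 6 false alarms.
detector = [1] * 6 + [0] * 89 + [1] * 4 + [0] * 1

print(accuracy(y_true, always_majority), f1_score(y_true, always_majority))  # 0.95 0.0
print(accuracy(y_true, detector), round(f1_score(y_true, detector), 3))      # 0.93 0.533
```

In this toy example, accuracy ranks the classifier that never detects the rare class higher, while the F1 score ranks it lower.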

Questions 12

An ML engineer has an Amazon Comprehend custom model in Account A in the us-east-1 Region. The ML engineer needs to copy the model to Account B in the same Region.

Which solution will meet this requirement with the LEAST development effort?

Options:

A.

Use Amazon S3 to make a copy of the model. Transfer the copy to Account B.

B.

Create a resource-based IAM policy. Use the Amazon Comprehend ImportModel API operation to copy the model to Account B.

C.

Use AWS DataSync to replicate the model from Account A to Account B.

D.

Create an AWS Site-to-Site VPN connection between Account A and Account B to transfer the model.

Questions 13

A company is developing a customer support AI assistant by using an Amazon Bedrock Retrieval Augmented Generation (RAG) pipeline. The AI assistant retrieves articles from a knowledge base stored in Amazon S3. The company uses Amazon OpenSearch Service to index the knowledge base. The AI assistant uses an Amazon Bedrock Titan Embeddings model for vector search.

The company wants to improve the relevance of the retrieved articles to improve the quality of the AI assistant's answers.

Which solution will meet these requirements?

Options:

A.

Use auto-summarization on the retrieved articles by using Amazon SageMaker JumpStart.

B.

Use a reranker model before passing the articles to the foundation model (FM).

C.

Use Amazon Athena to pre-filter the articles based on metadata before retrieval.

D.

Use Amazon Bedrock Provisioned Throughput to process queries more efficiently.

Questions 14

A company has significantly increased the amount of data stored as .csv files in an Amazon S3 bucket. Data transformation scripts and queries are now taking much longer than before.

An ML engineer must implement a solution to optimize the data for query performance with the LEAST operational overhead.

Which solution will meet this requirement?

Options:

A.

Configure an AWS Lambda function to split the .csv files into smaller objects.

B.

Configure an AWS Glue job to drop string-type columns and save the results to S3.

C.

Configure an AWS Glue ETL job to convert the .csv files to Apache Parquet format.

D.

Configure an Amazon EMR cluster to process the data in S3.

Questions 15

A company regularly receives new training data from a vendor of an ML model. The vendor delivers cleaned and prepared data to the company’s Amazon S3 bucket every 3–4 days.

The company has an Amazon SageMaker AI pipeline to retrain the model. An ML engineer needs to run the pipeline automatically when new data is uploaded to the S3 bucket.

Which solution will meet these requirements with the LEAST operational effort?

Options:

A.

Create an S3 lifecycle rule to transfer the data to the SageMaker AI training instance and initiate training.

B.

Create an AWS Lambda function that scans the S3 bucket and initiates the pipeline when new data is uploaded.

C.

Create an Amazon EventBridge rule that matches S3 upload events. Configure the SageMaker pipeline as the target of the rule.

D.

Use Amazon Managed Workflows for Apache Airflow (MWAA) to orchestrate the pipeline when new data is uploaded.

Questions 16

A company is planning to use Amazon SageMaker to make classification ratings that are based on images. The company has 6 TB of training data that is stored on an Amazon FSx for NetApp ONTAP storage virtual machine (SVM). The SVM is in the same VPC as SageMaker.

An ML engineer must make the training data accessible for ML models that are in the SageMaker environment.

Which solution will meet these requirements?

Options:

A.

Mount the FSx for ONTAP file system as a volume to the SageMaker instance.

B.

Create an Amazon S3 bucket. Use Mountpoint for Amazon S3 to link the S3 bucket to the FSx for ONTAP file system.

C.

Create a catalog connection from SageMaker Data Wrangler to the FSx for ONTAP file system.

D.

Create a direct connection from SageMaker Data Wrangler to the FSx for ONTAP file system.

Questions 17

A company wants to reduce the cost of its containerized ML applications. The applications use ML models that run on Amazon EC2 instances, AWS Lambda functions, and an Amazon Elastic Container Service (Amazon ECS) cluster. The EC2 workloads and ECS workloads use Amazon Elastic Block Store (Amazon EBS) volumes to save predictions and artifacts.

An ML engineer must identify resources that are being used inefficiently. The ML engineer also must generate recommendations to reduce the cost of these resources.

Which solution will meet these requirements with the LEAST development effort?

Options:

A.

Create code to evaluate each instance's memory and compute usage.

B.

Add cost allocation tags to the resources. Activate the tags in AWS Billing and Cost Management.

C.

Check AWS CloudTrail event history for the creation of the resources.

D.

Run AWS Compute Optimizer.

Questions 18

A company uses an ML model to recommend videos to users. The model is deployed on Amazon SageMaker AI. The model performed well initially after deployment, but the model's performance has degraded over time.

Which solution can the company use to identify model drift in the future?

Options:

A.

Create a monitoring job in SageMaker Model Monitor. Then create a baseline from the training dataset.

B.

Create a baseline from the training dataset. Then create a monitoring job in SageMaker Model Monitor.

C.

Create a baseline by using a built-in rule in SageMaker Clarify. Monitor the drift in Amazon CloudWatch.

D.

Retrain the model on new data. Compare the retrained model's performance to the original model's performance.

Questions 19

An ML engineer is using AWS CodeDeploy to deploy new container versions for inference on Amazon ECS.

The deployment must shift 10% of traffic initially, and the remaining 90% must shift within 10–15 minutes.

Which deployment configuration meets these requirements?

Options:

A.

CodeDeployDefault.LambdaLinear10PercentEvery10Minutes

B.

CodeDeployDefault.ECSAllAtOnce

C.

CodeDeployDefault.ECSCanary10Percent15Minutes

D.

CodeDeployDefault.LambdaCanary10Percent15Minutes

Questions 20

An ML engineer needs to use Amazon SageMaker to fine-tune a large language model (LLM) for text summarization. The ML engineer must follow a low-code no-code (LCNC) approach.

Which solution will meet these requirements?

Options:

A.

Use SageMaker Studio to fine-tune an LLM that is deployed on Amazon EC2 instances.

B.

Use SageMaker Autopilot to fine-tune an LLM that is deployed by a custom API endpoint.

C.

Use SageMaker Autopilot to fine-tune an LLM that is deployed on Amazon EC2 instances.

D.

Use SageMaker Autopilot to fine-tune an LLM that is deployed by SageMaker JumpStart.

Questions 21

A company has implemented a data ingestion pipeline for sales transactions from its ecommerce website. The company uses Amazon Data Firehose to ingest data into Amazon OpenSearch Service. The buffer interval of the Firehose stream is set for 60 seconds. An OpenSearch linear model generates real-time sales forecasts based on the data and presents the data in an OpenSearch dashboard.

The company needs to optimize the data ingestion pipeline to support sub-second latency for the real-time dashboard.

Which change to the architecture will meet these requirements?

Options:

A.

Use zero buffering in the Firehose stream. Tune the batch size that is used in the PutRecordBatch operation.

B.

Replace the Firehose stream with an AWS DataSync task. Configure the task with enhanced fan-out consumers.

C.

Increase the buffer interval of the Firehose stream from 60 seconds to 120 seconds.

D.

Replace the Firehose stream with an Amazon Simple Queue Service (Amazon SQS) queue.

Questions 22

An ML engineer needs to use data with Amazon SageMaker Canvas to train an ML model. The data is stored in Amazon S3 and is complex in structure. The ML engineer must use a file format that minimizes processing time for the data.

Which file format will meet these requirements?

Options:

A.

CSV files compressed with Snappy

B.

JSON objects in JSONL format

C.

JSON files compressed with gzip

D.

Apache Parquet files

Questions 23

A company is creating an application that will recommend products for customers to purchase. The application will make API calls to Amazon Q Business. The company must ensure that responses from Amazon Q Business do not include the name of the company's main competitor.

Which solution will meet this requirement?

Options:

A.

Configure the competitor's name as a blocked phrase in Amazon Q Business.

B.

Configure an Amazon Q Business retriever to exclude the competitor’s name.

C.

Configure an Amazon Kendra retriever for Amazon Q Business to build indexes that exclude the competitor's name.

D.

Configure document attribute boosting in Amazon Q Business to deprioritize the competitor's name.

Questions 24

A company wants to predict the success of advertising campaigns by considering the color scheme of each advertisement. An ML engineer is preparing data for a neural network model. The dataset includes color information as categorical data.

Which technique for feature engineering should the ML engineer use for the model?

Options:

A.

Apply label encoding to the color categories. Automatically assign each color a unique integer.

B.

Implement padding to ensure that all color feature vectors have the same length.

C.

Perform dimensionality reduction on the color categories.

D.

One-hot encode the color categories to transform the color scheme feature into a binary matrix.
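
Two of the encodings named in the options, label encoding and one-hot encoding, can be compared with a short plain-Python sketch. The helper names and the color values are hypothetical, and categories are ordered alphabetically only to make the output deterministic.

```python
def label_encode(values):
    """Map each category to an integer index (alphabetical order)."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    return [index[v] for v in values], categories


def one_hot_encode(values):
    """Map each category to a binary vector with a single 1."""
    categories = sorted(set(values))
    rows = [[1 if v == category else 0 for category in categories] for v in values]
    return rows, categories


colors = ["red", "blue", "green", "blue"]

labels, categories = label_encode(colors)
matrix, _ = one_hot_encode(colors)

print(categories)  # ['blue', 'green', 'red']
print(labels)      # [2, 0, 1, 0]
print(matrix)      # [[0, 0, 1], [1, 0, 0], [0, 1, 0], [1, 0, 0]]
```

The integer labels impose an ordering (blue < green < red) that the colors themselves do not have, whereas the binary rows treat every color as equally distant from the others.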

Questions 25

A company needs to deploy a custom-trained classification ML model on AWS. The model must make near real-time predictions with low latency and must handle variable request volumes.

Which solution will meet these requirements?

Options:

A.

Create an Amazon SageMaker AI batch transform job to process inference requests in batches.

B.

Use Amazon API Gateway to receive prediction requests. Use an Amazon S3 bucket to host and serve the model.

C.

Deploy an Amazon SageMaker AI endpoint. Configure auto scaling for the endpoint.

D.

Launch AWS Deep Learning AMIs (DLAMI) on two Amazon EC2 instances. Run the instances behind an Application Load Balancer.

Questions 26

A company is using Amazon SageMaker to create ML models. The company's data scientists need fine-grained control of the ML workflows that they orchestrate. The data scientists also need the ability to visualize SageMaker jobs and workflows as a directed acyclic graph (DAG). The data scientists must keep a running history of model discovery experiments and must establish model governance for auditing and compliance verifications.

Which solution will meet these requirements?

Options:

A.

Use AWS CodePipeline and its integration with SageMaker Studio to manage the entire ML workflows. Use SageMaker ML Lineage Tracking for the running history of experiments and for auditing and compliance verifications.

B.

Use AWS CodePipeline and its integration with SageMaker Experiments to manage the entire ML workflows. Use SageMaker Experiments for the running history of experiments and for auditing and compliance verifications.

C.

Use SageMaker Pipelines and its integration with SageMaker Studio to manage the entire ML workflows. Use SageMaker ML Lineage Tracking for the running history of experiments and for auditing and compliance verifications.

D.

Use SageMaker Pipelines and its integration with SageMaker Experiments to manage the entire ML workflows. Use SageMaker Experiments for the running history of experiments and for auditing and compliance verifications.

Questions 27

An ML engineer needs to implement a solution to host a trained ML model. The rate of requests to the model will be inconsistent throughout the day.

The ML engineer needs a scalable solution that minimizes costs when the model is not in use. The solution also must maintain the model's capacity to respond to requests during times of peak usage.

Which solution will meet these requirements?

Options:

A.

Create AWS Lambda functions that have fixed concurrency to host the model. Configure the Lambda functions to automatically scale based on the number of requests to the model.

B.

Deploy the model on an Amazon Elastic Container Service (Amazon ECS) cluster that uses AWS Fargate. Set a static number of tasks to handle requests during times of peak usage.

C.

Deploy the model to an Amazon SageMaker endpoint. Deploy multiple copies of the model to the endpoint. Create an Application Load Balancer to route traffic between the different copies of the model at the endpoint.

D.

Deploy the model to an Amazon SageMaker endpoint. Create SageMaker endpoint auto scaling policies that are based on Amazon CloudWatch metrics to adjust the number of instances dynamically.

Questions 28

A company wants to build an anomaly detection ML model. The model will use large-scale tabular data that is stored in an Amazon S3 bucket. The company does not have expertise in Python, Spark, or other languages for ML.

An ML engineer needs to transform and prepare the data for ML model training.

Which solution will meet these requirements?

Options:

A.

Prepare the data by using Amazon EMR Serverless applications that host Amazon SageMaker Studio notebooks.

B.

Prepare the data by using the Amazon SageMaker Data Wrangler visual interface in Amazon SageMaker Canvas.

C.

Run SQL queries from a JupyterLab space in Amazon SageMaker Studio. Process the data further by using pandas DataFrames.

D.

Prepare the data by using a JupyterLab notebook in Amazon SageMaker Studio.

Questions 29

An ML engineer is using an Amazon SageMaker Studio notebook to train a neural network by creating an estimator. The estimator runs a Python training script that uses Distributed Data Parallel (DDP) on a single instance that has more than one GPU.

The ML engineer discovers that the training script is underutilizing GPU resources. The ML engineer must identify the point in the training script where resource utilization can be optimized.

Which solution will meet this requirement?

Options:

A.

Use Amazon CloudWatch metrics to create a report that describes GPU utilization over time.

B.

Add SageMaker Profiler annotations to the training script. Run the script and generate a report from the results.

C.

Use AWS CloudTrail to create a report that describes GPU utilization and GPU memory utilization over time.

D.

Create a default monitor in Amazon SageMaker Model Monitor and suggest a baseline. Generate a report based on the constraints and statistics the monitor generates.

Questions 30

A company uses an Amazon EMR cluster to run a data ingestion process for an ML model. An ML engineer notices that the processing time is increasing.

Which solution will reduce the processing time MOST cost-effectively?

Options:

A.

Use Spot Instances to increase the number of primary nodes.

B.

Use Spot Instances to increase the number of core nodes.

C.

Use Spot Instances to increase the number of task nodes.

D.

Use On-Demand Instances to increase the number of core nodes.

Questions 31

A company needs to run a batch data-processing job on Amazon EC2 instances. The job will run during the weekend and will take 90 minutes to finish running. The processing can handle interruptions. The company will run the job every weekend for the next 6 months.

Which EC2 instance purchasing option will meet these requirements MOST cost-effectively?

Options:

A.

Spot Instances

B.

Reserved Instances

C.

On-Demand Instances

D.

Dedicated Instances

Questions 32

An ML engineer needs to use AWS services to identify and extract meaningful unique keywords from documents.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.

Use the Natural Language Toolkit (NLTK) library on Amazon EC2 instances for text pre-processing. Use the Latent Dirichlet Allocation (LDA) algorithm to identify and extract relevant keywords.

B.

Use Amazon SageMaker and the BlazingText algorithm. Apply custom pre-processing steps for stemming and removal of stop words. Calculate term frequency-inverse document frequency (TF-IDF) scores to identify and extract relevant keywords.

C.

Store the documents in an Amazon S3 bucket. Create AWS Lambda functions to process the documents and to run Python scripts for stemming and removal of stop words. Use bigram and trigram techniques to identify and extract relevant keywords.

D.

Use Amazon Comprehend custom entity recognition and key phrase extraction to identify and extract relevant keywords.

Questions 33

A company is using an Amazon Redshift database as its single data source. Some of the data is sensitive.

A data scientist needs to use some of the sensitive data from the database. An ML engineer must give the data scientist access to the data without transforming the source data and without storing anonymized data in the database.

Which solution will meet these requirements with the LEAST implementation effort?

Options:

A.

Configure dynamic data masking policies to control how sensitive data is shared with the data scientist at query time.

B.

Create a materialized view with masking logic on top of the database. Grant the necessary read permissions to the data scientist.

C.

Unload the Amazon Redshift data to Amazon S3. Use Amazon Athena to create schema-on-read with masking logic. Share the view with the data scientist.

D.

Unload the Amazon Redshift data to Amazon S3. Create an AWS Glue job to anonymize the data. Share the dataset with the data scientist.

Questions 34

Case study

An ML engineer is developing a fraud detection model on AWS. The training dataset includes transaction logs, customer profiles, and tables from an on-premises MySQL database. The transaction logs and customer profiles are stored in Amazon S3.

The dataset has a class imbalance that affects the learning of the model's algorithm. Additionally, many of the features have interdependencies. The algorithm is not capturing all the desired underlying patterns in the data.

The ML engineer needs to use an Amazon SageMaker built-in algorithm to train the model.

Which algorithm should the ML engineer use to meet this requirement?

Options:

A.

LightGBM

B.

Linear learner

C.

K-means clustering

D.

Neural Topic Model (NTM)

Questions 35

A company has multiple models that are hosted on Amazon SageMaker AI. The models need to be retrained. The requirements for each model are different, so the company needs to choose different deployment strategies to transfer all requests to a new model.

Select the correct strategy from the following list for each requirement. Select each strategy one time. (Select THREE.)

• Canary traffic shifting

• Linear traffic shifting guardrail

• All at once traffic shifting

MLA-C01 Question 35

Options:

Questions 36

A company is using Amazon SageMaker and millions of files to train an ML model. Each file is several megabytes in size. The files are stored in an Amazon S3 bucket. The company needs to improve training performance.

Which solution will meet these requirements in the LEAST amount of time?

Options:

A.

Transfer the data to a new S3 bucket that provides S3 Express One Zone storage. Adjust the training job to use the new S3 bucket.

B.

Create an Amazon FSx for Lustre file system. Link the file system to the existing S3 bucket. Adjust the training job to read from the file system.

C.

Create an Amazon Elastic File System (Amazon EFS) file system. Transfer the existing data to the file system. Adjust the training job to read from the file system.

D.

Create an Amazon ElastiCache (Redis OSS) cluster. Link the Redis OSS cluster to the existing S3 bucket. Stream the data from the Redis OSS cluster directly to the training job.

Questions 37

A company has used Amazon SageMaker to deploy a predictive ML model in production. The company is using SageMaker Model Monitor on the model. After a model update, an ML engineer notices data quality issues in the Model Monitor checks.

What should the ML engineer do to mitigate the data quality issues that Model Monitor has identified?

Options:

A.

Adjust the model's parameters and hyperparameters.

B.

Initiate a manual Model Monitor job that uses the most recent production data.

C.

Create a new baseline from the latest dataset. Update Model Monitor to use the new baseline for evaluations.

D.

Include additional data in the existing training set for the model. Retrain and redeploy the model.

Questions 38

A company runs an Amazon SageMaker domain in a public subnet of a newly created VPC. The network is configured properly, and ML engineers can access the SageMaker domain.

Recently, the company discovered suspicious traffic to the domain from a specific IP address. The company needs to block traffic from the specific IP address.

Which update to the network configuration will meet this requirement?

Options:

A.

Create a security group inbound rule to deny traffic from the specific IP address. Assign the security group to the domain.

B.

Create a network ACL inbound rule to deny traffic from the specific IP address. Assign the rule to the default network ACL for the subnet where the domain is located.

C.

Create a shadow variant for the domain. Configure SageMaker Inference Recommender to send traffic from the specific IP address to the shadow endpoint.

D.

Create a VPC route table to deny inbound traffic from the specific IP address. Assign the route table to the domain.

Questions 39

An ML engineer wants to deploy a workflow that processes streaming IoT sensor data and periodically retrains ML models. The most recent model versions must be deployed to production.

Which service will meet these requirements?

Options:

A.

Amazon SageMaker Pipelines

B.

Amazon Managed Workflows for Apache Airflow (MWAA)

C.

AWS Lambda

D.

Apache Spark

Questions 40

An ML engineer needs to use an ML model to predict the price of apartments in a specific location.

Which metric should the ML engineer use to evaluate the model’s performance?

Options:

A.

Accuracy

B.

Area Under the ROC Curve (AUC)

C.

F1 score

D.

Mean absolute error (MAE)
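
Of the listed metrics, mean absolute error is defined directly on numeric predictions: it is the average of the absolute differences between predicted and actual values. A minimal sketch with made-up apartment prices (the numbers are illustrative assumptions, not data from the question):

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


actual_prices = [250_000, 310_000, 180_000, 420_000]
predicted_prices = [240_000, 330_000, 185_000, 400_000]

# Absolute errors: 10,000 + 20,000 + 5,000 + 20,000 = 55,000 over 4 apartments.
mae = mean_absolute_error(actual_prices, predicted_prices)
print(mae)  # 13750.0
```

The result is expressed in the same unit as the target, here a price.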

Questions 41

An ML engineer is building a generative AI application on Amazon Bedrock by using large language models (LLMs).

Select the correct generative AI term from the following list for each description. Each term should be selected one time or not at all. (Select three.)

• Embedding

• Retrieval Augmented Generation (RAG)

• Temperature

• Token

MLA-C01 Question 41

Options:

Questions 42

A company has a Retrieval Augmented Generation (RAG) application that uses a vector database to store embeddings of documents. The company must migrate the application to AWS and must implement a solution that provides semantic search of text files. The company has already migrated the text repository to an Amazon S3 bucket.

Which solution will meet these requirements?

Options:

A.

Use an AWS Batch job to process the files and generate embeddings. Use AWS Glue to store the embeddings. Use SQL queries to perform the semantic searches.

B.

Use a custom Amazon SageMaker AI notebook to run a custom script to generate embeddings. Use SageMaker Feature Store to store the embeddings. Use SQL queries to perform the semantic searches.

C.

Use the Amazon Kendra S3 connector to ingest the documents from the S3 bucket into Amazon Kendra. Query Amazon Kendra to perform the semantic searches.

D.

Use an Amazon Textract asynchronous job to ingest the documents from the S3 bucket. Query Amazon Textract to perform the semantic searches.

Questions 43

An ML engineer decides to use Amazon SageMaker AI automated model tuning (AMT) for hyperparameter optimization (HPO). The ML engineer requires a tuning strategy that uses regression to slowly and sequentially select the next set of hyperparameters based on previous runs. The strategy must work across small hyperparameter ranges.

Which solution will meet these requirements?

Options:

A.

Grid search

B.

Random search

C.

Bayesian optimization

D.

Hyperband

Questions 44

A company is developing an application that reads animal descriptions from user prompts and generates images based on the information in the prompts. The application reads a message from an Amazon Simple Queue Service (Amazon SQS) queue. Then the application uses Amazon Titan Image Generator on Amazon Bedrock to generate an image based on the information in the message. Finally, the application removes the message from the SQS queue.

Which IAM permissions should the company assign to the application's IAM role? (Select TWO.)

Options:

A.

Allow the bedrock:InvokeModel action for the Amazon Titan Image Generator resource.

B.

Allow the bedrock:Get* action for the Amazon Titan Image Generator resource.

C.

Allow the sqs:ReceiveMessage action and the sqs:DeleteMessage action for the SQS queue resource.

D.

Allow the sqs:GetQueueAttributes action and the sqs:DeleteMessage action for the SQS queue resource.

E.

Allow the sagemaker:PutRecord* action for the Amazon Titan Image Generator resource.

Questions 45

Case study

An ML engineer is developing a fraud detection model on AWS. The training dataset includes transaction logs, customer profiles, and tables from an on-premises MySQL database. The transaction logs and customer profiles are stored in Amazon S3.

The dataset has a class imbalance that affects the learning of the model's algorithm. Additionally, many of the features have interdependencies. The algorithm is not capturing all the desired underlying patterns in the data.

The training dataset includes categorical data and numerical data. The ML engineer must prepare the training dataset to maximize the accuracy of the model.

Which action will meet this requirement with the LEAST operational overhead?

Options:

A.

Use AWS Glue to transform the categorical data into numerical data.

B.

Use AWS Glue to transform the numerical data into categorical data.

C.

Use Amazon SageMaker Data Wrangler to transform the categorical data into numerical data.

D.

Use Amazon SageMaker Data Wrangler to transform the numerical data into categorical data.

Questions 46

Case study

An ML engineer is developing a fraud detection model on AWS. The training dataset includes transaction logs, customer profiles, and tables from an on-premises MySQL database. The transaction logs and customer profiles are stored in Amazon S3.

The dataset has a class imbalance that affects the learning of the model's algorithm. Additionally, many of the features have interdependencies. The algorithm is not capturing all the desired underlying patterns in the data.

Which AWS service or feature can aggregate the data from the various data sources?

Options:

A.

Amazon EMR Spark jobs

B.

Amazon Kinesis Data Streams

C.

Amazon DynamoDB

D.

AWS Lake Formation

Questions 47

An ML engineer needs to use AWS CloudFormation to create an ML model that an Amazon SageMaker endpoint will host.

Which resource should the ML engineer declare in the CloudFormation template to meet this requirement?

Options:

A.

AWS::SageMaker::Model

B.

AWS::SageMaker::Endpoint

C.

AWS::SageMaker::NotebookInstance

D.

AWS::SageMaker::Pipeline
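
To make the resource types above more concrete, here is a minimal sketch of a CloudFormation template, generated from Python, that declares a model definition with a container image, model artifacts, and an execution role. The logical ID `ExampleModel` and all argument values are hypothetical; an endpoint that hosts the model would need additional resources that are not shown.

```python
import json


def model_template(image_uri, model_data_url, role_arn):
    """Build a template dict containing a single SageMaker model resource."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "ExampleModel": {  # hypothetical logical ID
                "Type": "AWS::SageMaker::Model",
                "Properties": {
                    "ExecutionRoleArn": role_arn,
                    "PrimaryContainer": {
                        "Image": image_uri,              # inference container image
                        "ModelDataUrl": model_data_url,  # S3 path to model artifacts
                    },
                },
            }
        },
    }


if __name__ == "__main__":
    template = model_template(
        image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/example:latest",
        model_data_url="s3://example-bucket/model.tar.gz",
        role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    )
    print(json.dumps(template, indent=2))
```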

Questions 48

An ML engineer at a credit card company built and deployed an ML model by using Amazon SageMaker AI. The model was trained on transaction data that contained very few fraudulent transactions. After deployment, the model is underperforming.

What should the ML engineer do to improve the model’s performance?

Options:

A.

Retrain the model with a different SageMaker built-in algorithm.

B.

Use random undersampling to reduce the majority class and retrain the model.

C.

Use Synthetic Minority Oversampling Technique (SMOTE) to generate synthetic minority samples and retrain the model.

D.

Use random oversampling to duplicate minority samples and retrain the model.
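
The difference between the two oversampling options can be seen in a small sketch. Random oversampling copies existing minority points, while a SMOTE-style approach interpolates between a minority point and one of its nearest minority neighbors. The function below is a simplified illustration of that interpolation idea, not a full SMOTE implementation; the function names and sample points are made up for the example.

```python
import math
import random


def random_oversample(minority, n_new, seed=0):
    """Duplicate existing minority samples (no new information is created)."""
    rng = random.Random(seed)
    return [rng.choice(minority) for _ in range(n_new)]


def smote_like(minority, n_new, k=2, seed=0):
    """Create synthetic samples on segments between minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of the chosen point (excluding itself)
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


if __name__ == "__main__":
    fraud = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (3.0, 3.0)]  # toy minority class
    print(random_oversample(fraud, 3))
    print(smote_like(fraud, 3))
```

Because the synthetic points are new rather than repeated, a model is less likely to memorize a handful of duplicated fraud records.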

Questions 49

A company's dataset for prediction analytics contains duplicate records, missing data, and unusually extreme high or low values. The company needs a solution to resolve the data quality issues quickly. The solution must maintain data integrity and have the LEAST operational overhead.

Which solution will meet these requirements?

Options:

A.

Use AWS Glue DataBrew to delete duplicate records, fill missing values with medians, and replace extreme values with values in a normal range.

B.

Configure an AWS Glue job to identify records with missing values and extreme measurements and delete them.

C.

Create an Amazon EMR Spark job to replace missing values with zeros and merge duplicate records.

D.

Use Amazon SageMaker Data Wrangler to delete duplicates, apply statistical modeling for missing values, and apply outlier detection algorithms.
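
The three data quality issues in this question (duplicates, missing values, extreme values) correspond to three standard cleaning steps. The sketch below shows them in plain Python so the logic is visible; it is not DataBrew or Data Wrangler code. The record layout `(id, value)` and the 1.5 x IQR clipping rule are assumptions made for the example.

```python
import statistics


def clean(records):
    """Deduplicate, median-impute, and clip outliers in (id, value) records."""
    # 1. Drop exact duplicate records while preserving order.
    seen = set()
    unique = []
    for rec in records:
        if rec not in seen:
            seen.add(rec)
            unique.append(rec)

    # 2. Fill missing values with the median of the observed values.
    observed = [v for _, v in unique if v is not None]
    median = statistics.median(observed)
    filled = [(k, median if v is None else v) for k, v in unique]

    # 3. Clip extreme values to the 1.5 * IQR fences instead of deleting rows.
    q1, _, q3 = statistics.quantiles(observed, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [(k, min(max(v, low), high)) for k, v in filled]


if __name__ == "__main__":
    raw = [("a", 10.0), ("b", 11.0), ("b", 11.0), ("c", None), ("d", 12.0),
           ("e", 13.0), ("f", 14.0), ("g", 15.0), ("h", 16.0), ("i", 500.0)]
    for row in clean(raw):
        print(row)
```

Imputing and clipping keep every distinct record in the dataset, whereas deleting rows with missing or extreme values discards data.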

Questions 50

A travel company has trained hundreds of geographic data models to answer customer questions by using Amazon SageMaker AI. Each model uses its own inferencing endpoint, which has become an operational challenge for the company.

The company wants to consolidate the models' inferencing endpoints to reduce operational overhead.

Which solution will meet these requirements?

Options:

A.

Use SageMaker AI multi-model endpoints. Deploy a single endpoint.

B.

Use SageMaker AI multi-container endpoints. Deploy a single endpoint.

C.

Use Amazon SageMaker Studio. Deploy a single-model endpoint.

D.

Use inference pipelines in SageMaker AI to combine tasks from hundreds of models to 15 models.

Questions 51

A company has an application that uses different APIs to generate embeddings for input text. The company needs to implement a solution to automatically rotate the API tokens every 3 months.

Which solution will meet this requirement?

Options:

A.

Store the tokens in AWS Secrets Manager. Create an AWS Lambda function to perform the rotation.

B.

Store the tokens in AWS Systems Manager Parameter Store. Create an AWS Lambda function to perform the rotation.

C.

Store the tokens in AWS Key Management Service (AWS KMS). Use an AWS managed key to perform the rotation.

D.

Store the tokens in AWS Key Management Service (AWS KMS). Use an AWS owned key to perform the rotation.

Questions 52

A company is gathering audio, video, and text data in various languages. The company needs to use a large language model (LLM) to summarize the gathered data that is in Spanish.

Which solution will meet these requirements in the LEAST amount of time?

Options:

A.

Train and deploy a model in Amazon SageMaker to convert the data into English text. Train and deploy an LLM in SageMaker to summarize the text.

B.

Use Amazon Transcribe and Amazon Translate to convert the data into English text. Use Amazon Bedrock with the Jurassic model to summarize the text.

C.

Use Amazon Rekognition and Amazon Translate to convert the data into English text. Use Amazon Bedrock with the Anthropic Claude model to summarize the text.

D.

Use Amazon Comprehend and Amazon Translate to convert the data into English text. Use Amazon Bedrock with the Stable Diffusion model to summarize the text.

Questions 53

A company is exploring generative AI and wants to add a new product feature. An ML engineer is making API calls from existing Amazon EC2 instances to Amazon Bedrock.

The EC2 instances are in a private subnet and must remain private during the implementation. The EC2 instances have a security group that allows access to all IP addresses in the private subnet.

What should the ML engineer do to establish a connection between the EC2 instances and Amazon Bedrock?

Options:

A.

Modify the security group to allow inbound and outbound traffic to and from Amazon Bedrock.

B.

Use AWS PrivateLink to access Amazon Bedrock through an interface VPC endpoint.

C.

Configure Amazon Bedrock to use the private subnet where the EC2 instances are deployed.

D.

Use AWS Direct Connect to link the VPC to Amazon Bedrock.

Questions 54

A company is developing ML models by using PyTorch and TensorFlow estimators with Amazon SageMaker AI. An ML engineer configures the SageMaker AI estimator and now needs to initiate a training job that uses a training dataset.

Which SageMaker AI SDK method can initiate the training job?

Options:

A.

fit method

B.

create_model method

C.

deploy method

D.

predict method

Questions 55

An ML engineer is analyzing a classification dataset before training a model in Amazon SageMaker AI. The ML engineer suspects that the dataset has a significant imbalance between class labels that could lead to biased model predictions. To confirm class imbalance, the ML engineer needs to select an appropriate pre-training bias metric.

Which metric will meet this requirement?

Options:

A.

Mean squared error (MSE)

B.

Difference in proportions of labels (DPL)

C.

Silhouette score

D.

Structural similarity index measure (SSIM)
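
For context, pre-training bias metrics are computed from the dataset alone, before any model exists. The sketch below computes two of them, assuming the usual SageMaker Clarify definitions: class imbalance (CI) compares the sizes of two facets, and difference in proportions of labels (DPL) compares the share of positive labels in each facet. The facet data is invented for the example.

```python
def class_imbalance(n_a, n_d):
    """CI = (n_a - n_d) / (n_a + n_d), where n_a and n_d are facet sizes."""
    return (n_a - n_d) / (n_a + n_d)


def dpl(labels_a, labels_d):
    """DPL = q_a - q_d, the gap in positive-label rates between two facets."""
    q_a = sum(labels_a) / len(labels_a)
    q_d = sum(labels_d) / len(labels_d)
    return q_a - q_d


if __name__ == "__main__":
    # Binary labels (1 = positive outcome) for two hypothetical facets.
    facet_a = [1, 1, 1, 1, 1, 1, 0, 0]  # 6 of 8 positive
    facet_d = [1, 0, 0, 0]              # 1 of 4 positive
    print("CI :", class_imbalance(len(facet_a), len(facet_d)))
    print("DPL:", dpl(facet_a, facet_d))
```

Values near zero indicate balance under either metric; values far from zero suggest that one facet is over-represented or receives positive labels at a different rate.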

Questions 56

An ML engineer is tuning an image classification model that shows poor performance on one of two available classes during prediction. Analysis reveals that the images whose class the model performed poorly on represent an extremely small fraction of the whole training dataset.

The ML engineer must improve the model's performance.

Which solution will meet this requirement?

Options:

A.

Optimize for accuracy. Use image augmentation on the less common images to generate new samples.

B.

Optimize for F1 score. Use image augmentation on the less common images to generate new samples.

C.

Optimize for accuracy. Use Synthetic Minority Oversampling Technique (SMOTE) on the less common images to generate new samples.

D.

Optimize for F1 score. Use Synthetic Minority Oversampling Technique (SMOTE) on the less common images to generate new samples.
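
The choice of optimization metric matters when one class is rare. As a small worked example, the sketch below scores a classifier that always predicts the majority class on a hypothetical 95/5 split: accuracy looks strong while F1 for the rare class is zero.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    y_true = [0] * 95 + [1] * 5  # rare class is 5% of the data
    y_pred = [0] * 100           # model that never predicts the rare class
    print("accuracy:", accuracy(y_true, y_pred))  # 0.95
    print("F1      :", f1(y_true, y_pred))        # 0.0
```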

Questions 57

A company plans to use Amazon SageMaker AI to build image classification models. The company has 6 TB of training data stored on Amazon FSx for NetApp ONTAP. The file system is in the same VPC as SageMaker AI.

An ML engineer must make the training data accessible to SageMaker AI training jobs.

Which solution will meet these requirements?

Options:

A.

Mount the FSx for ONTAP file system as a volume to the SageMaker AI instance.

B.

Create an Amazon S3 bucket and use Mountpoint for Amazon S3 to link the bucket to FSx for ONTAP.

C.

Create a catalog connection from SageMaker Data Wrangler to the FSx for ONTAP file system.

D.

Create a direct connection from SageMaker Data Wrangler to the FSx for ONTAP file system.

Questions 58

A company has an ML model that generates text descriptions based on images that customers upload to the company's website. The images can be up to 50 MB in total size.

An ML engineer decides to store the images in an Amazon S3 bucket. The ML engineer must implement a processing solution that can scale to accommodate changes in demand.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.

Create an Amazon SageMaker batch transform job to process all the images in the S3 bucket.

B.

Create an Amazon SageMaker Asynchronous Inference endpoint and a scaling policy. Run a script to make an inference request for each image.

C.

Create an Amazon Elastic Kubernetes Service (Amazon EKS) cluster that uses Karpenter for auto scaling. Host the model on the EKS cluster. Run a script to make an inference request for each image.

D.

Create an AWS Batch job that uses an Amazon Elastic Container Service (Amazon ECS) cluster. Specify a list of images to process for each AWS Batch job.

Questions 59

A company needs to give its ML engineers appropriate access to training data. The ML engineers must access training data from only their own business group. The ML engineers must not be allowed to access training data from other business groups.

The company uses a single AWS account and stores all the training data in Amazon S3 buckets. All ML model training occurs in Amazon SageMaker.

Which solution will provide the ML engineers with the appropriate access?

Options:

A.

Enable S3 bucket versioning.

B.

Configure S3 Object Lock settings for each user.

C.

Add cross-origin resource sharing (CORS) policies to the S3 buckets.

D.

Create IAM policies. Attach the policies to IAM users or IAM roles.

Questions 60

An ML engineer is working on an ML model to predict the prices of similarly sized homes. The model will base predictions on several features. The ML engineer will use the following feature engineering techniques to estimate the prices of the homes:

• Feature splitting

• Logarithmic transformation

• One-hot encoding

• Standardized distribution

Select the correct feature engineering techniques for the following list of features. Each feature engineering technique should be selected one time or not at all. (Select THREE.)

[Image: MLA-C01 Question 60, the list of features to match with the techniques above]

Questions 61

A company uses the Amazon SageMaker AI Object2Vec algorithm to train an ML model. The model performs well on training data but underperforms after deployment. The company wants to avoid overfitting the model and maintain the model's ability to generalize.

Which solution will meet these requirements?

Options:

A.

Decrease the early_stopping_patience hyperparameter.

B.

Increase the mini_batch_size hyperparameter.

C.

Decrease the dropout rate.

D.

Increase the number of epochs.

Questions 62

An ML engineer needs to run intensive model training jobs each month that can take 48–72 hours. The jobs can be interrupted and resumed. The engineer has a fixed budget and needs the most cost-effective compute option.

Which solution will meet these requirements?

Options:

A.

Purchase Reserved Instances with partial upfront payment.

B.

Purchase On-Demand Instances.

C.

Purchase SageMaker AI Savings Plans.

D.

Purchase Spot Instances that use automated checkpoints.

Exam Code: MLA-C01
Exam Name: AWS Certified Machine Learning Engineer - Associate
Last Update: Mar 1, 2026
Questions: 207
