A Slurm user needs to submit a batch job script for execution tomorrow.
Which command should be used to complete this task?
An administrator needs to submit a script named “my_script.sh” to Slurm and specify a custom output file named “output.txt” for storing the job's standard output and error.
Which ‘sbatch’ option should be used?
A system administrator is experiencing issues with Docker containers failing to start due to volume mounting problems. They suspect the issue is related to incorrect file permissions on shared volumes between the host and containers.
How should the administrator troubleshoot this issue?
A system administrator is looking to set up virtual machines in an HGX environment with NVIDIA Fabric Manager.
Which three tasks will Fabric Manager accomplish? (Choose three.)
An administrator requires full access to the NGC Base Command Platform CLI.
Which command should be used to accomplish this action?
What steps should an administrator take if they encounter errors related to RDMA (Remote Direct Memory Access) when using Magnum IO?
A data scientist is training a deep learning model and notices slower-than-expected training times. The data scientist asks a system administrator to investigate. The system administrator suspects that disk I/O is the bottleneck.
Which command should be used to check disk I/O?
When troubleshooting Slurm job scheduling issues, a common source of problems is jobs getting stuck in a pending state indefinitely.
Which Slurm command can be used to view detailed information about all pending jobs and identify the cause of the delay?
A system administrator wants to run these two commands in Base Command Manager.
main showprofile
device status apc01
What command should the system administrator use from the management node system shell?
A Slurm user frequently finds that a Slurm job gets stuck in the “PENDING” state and is unable to progress to the “RUNNING” state.
Which Slurm command can help the user identify the reason for the job’s pending status?
An organization only needs basic network monitoring and validation tools.
Which UFM platform should they use?
You are deploying an AI workload on a Kubernetes cluster that requires access to GPUs for training deep learning models. However, the pods are not able to detect the GPUs on the nodes.
What would be the first step to troubleshoot this issue?
Your organization is deploying an AI workload that requires high-throughput access to shared storage across multiple servers. The workload involves both training and inference tasks that need fast read and write speeds.
Which storage architecture would best support this AI workload?
You are setting up a Kubernetes cluster on NVIDIA DGX systems using BCM, and you need to initialize the control-plane nodes.
What is the most important step to take before initializing these nodes?
You are tasked with deploying a deep learning framework container from NVIDIA NGC on a stand-alone GPU-enabled server.
What must you complete before pulling the container? (Choose two.)