CPU Memory Best Practices
Efficient CPU memory usage helps ensure that the shared cluster resources remain available for all users. Requesting too much memory can lead to longer queue times (for you and others), while requesting too little may cause jobs to fail.
Aim to request an appropriate amount of memory for all of your jobs.
• Target utilization: ~80–90% of requested memory
If you are running many similar jobs (e.g., job arrays, parameter sweeps, workflows processing many different samples, etc.), it is especially important to estimate memory needs before scaling up.
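As a rough illustration of the ~80–90% target, the sketch below computes the utilization of a single test job and a request size that would put the same peak usage near the middle of that range. The function names, the 0.85 default, and the example numbers are assumptions made for illustration. The peak value would come from whatever accounting tool you use (for example, the `MaxRSS` field reported by SLURM's `sacct`).

```python
import math


def memory_utilization(peak_used_gb: float, requested_gb: float) -> float:
    """Fraction of the requested memory that the job used at its peak."""
    if requested_gb <= 0:
        raise ValueError("requested_gb must be positive")
    return peak_used_gb / requested_gb


def suggested_request_gb(peak_used_gb: float, target: float = 0.85) -> int:
    """Smallest whole-GB request that keeps peak usage at or below `target`.

    The 0.85 default is an assumption chosen from the ~80-90% range.
    """
    if not 0 < target <= 1:
        raise ValueError("target must be in (0, 1]")
    return math.ceil(peak_used_gb / target)


# Hypothetical test job: requested 64 GB, peaked at 18 GB.
requested_gb, peak_gb = 64, 18
util = memory_utilization(peak_gb, requested_gb)
new_request = suggested_request_gb(peak_gb)
new_util = memory_utilization(peak_gb, new_request)

print(f"current utilization: {util:.0%}")
print(f"suggested request:   {new_request} GB ({new_util:.0%} utilization)")
```

Running one representative job this way before submitting an array gives a measured starting point. If memory use varies between samples, base the estimate on the largest input you expect.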
Why this matters
Submitting hundreds or thousands of jobs with overestimated memory can:
GPU Best Practices
Core Principles
• Measure before you scale. Always take a short, single‑GPU baseline and record simple metrics.
• Right‑size, don’t over‑ask. Request only the GPUs/CPUs/RAM and walltime your measurements justify.
• Keep GPUs busy. If utilization is low, fix input/data issues before adding more GPUs.
• Short interactive, long batch. Use OOD for quick experiments; move long work to SLURM.
• Be a good citizen. Release idle sessions, clean up scratch, and prefer storage patterns that reduce system load.
Right‑Sizing in 5 Steps
1. Baseline (≤5 minutes): Run a tiny slice on 1 GPU. Note:
   • Throughput (samples/s or tokens/s)
   • GPU utilization and memory usage
   • Any stalls from CPU or I/O
2. Find the knee: Increase batch size and enable mixed precision if supported.
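One way to make "find the knee" concrete is to record throughput at several batch sizes and stop where the gain from the next increase becomes small. The sketch below is only an illustration: the `find_knee` helper, the 10% cutoff, and the sample measurements are assumptions, not part of the guidance above.

```python
def find_knee(measurements, min_gain=0.10):
    """Return the last batch size that still improved throughput by `min_gain`.

    measurements: list of (batch_size, throughput) pairs sorted by batch size,
                  with positive throughput values.
    min_gain:     relative improvement over the previous step that counts as
                  worthwhile (0.10 is an assumed cutoff, not an official one).
    """
    if not measurements:
        raise ValueError("need at least one measurement")
    knee_batch, prev_tp = measurements[0]
    for batch, tp in measurements[1:]:
        if prev_tp <= 0:
            raise ValueError("throughput values must be positive")
        gain = (tp - prev_tp) / prev_tp
        if gain < min_gain:
            break  # larger batches no longer pay off
        knee_batch, prev_tp = batch, tp
    return knee_batch


# Hypothetical samples/s from short single-GPU baseline runs.
runs = [(16, 400.0), (32, 760.0), (64, 1300.0), (128, 1380.0), (256, 1390.0)]
print("knee batch size:", find_knee(runs))
```

In this made-up data, throughput nearly doubles up to a batch size of 64 and then flattens, so the sketch reports 64. Real measurements are noisier, so repeat each run or average over a few minutes before trusting the result.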
Policy on Zero GPU Utilization Jobs
User Notification and Enforcement
Research Computing expects all users of the UVA HPC clusters to make efficient use of the resources allocated to their jobs. Recently, we’ve observed significant queuing delays in the gpu partition, an issue frequently echoed in user feedback. At the same time, many submitted GPU jobs either show very low utilization or, in some cases, no GPU usage at all, despite consuming high-demand resources.
To improve system efficiency and reduce wait times, we are introducing the following policy changes:
Alert Emails Begin August 12, 2025
Starting August 12, Research Computing will begin sending informational emails to users whose jobs in the gpu partition meet the following conditions: