Compiling for a GPU
A GPU can accelerate a program, but the code must be written and compiled specifically for it. Several options are available for building GPU-enabled programs.
OpenACC
OpenACC is a standard for directive-based programming of accelerators such as GPUs.
Available NVIDIA CUDA Compilers
Module | Version | Module Load Command |
---|---|---|
cuda | 11.4.2 | module load cuda/11.4.2 |
cuda | 11.8.0 | module load cuda/11.8.0 |
cuda | 12.2.2 | module load cuda/12.2.2 |
cuda | 12.4.1 | module load cuda/12.4.1 |
cuda | 12.8.0 | module load cuda/12.8.0 |
Available NVIDIA HPC SDK Compilers
Module | Version | Module Load Command |
---|---|---|
nvhpc | 24.1 | module load nvhpc/24.1 |
nvhpc | 24.5 | module load nvhpc/24.5 |
nvhpc | 25.3 | module load nvhpc/25.3 |
GPU architecture
According to the CUDA documentation, “in the CUDA naming scheme, GPUs are named `sm_xy`, where `x` denotes the GPU generation number, and `y` the version in that generation.” The documentation contains details about the architecture and the corresponding `xy` value. The compute capability is `x.y`.
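The relationship between the compute capability `x.y` and the `sm_xy` name can be sketched with a small shell helper. This is only an illustration: the function name `cc_to_sm` is made up for this example and is not part of the CUDA toolkit.

```shell
# Hypothetical helper (not a CUDA tool): convert a compute capability
# "x.y" into the corresponding "sm_xy" name by dropping the dot.
cc_to_sm() {
    echo "sm_$(echo "$1" | tr -d '.')"
}

cc_to_sm 7.0   # V100
cc_to_sm 8.6   # A40, A6000, RTX3090
```

The same digits appear in the `compute_xy` names passed to the compiler.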
Please use the following values when compiling CUDA code on the HPC system.
Type | GPU | Architecture | Compute Capability | CUDA Version |
---|---|---|---|---|
Datacenter | V100 | Volta | 7.0 | 9+ |
Datacenter | A100 | Ampere | 8.0 | 11+ |
Datacenter | A40 | Ampere | 8.6 | 11+ |
Datacenter | H200 | Hopper | 9.0 | 11.8+ |
RTX | A6000 | Ampere | 8.6 | 11+ |
GeForce | RTX2080Ti | Turing | 7.5 | 10+ |
GeForce | RTX3090 | Ampere | 8.6 | 11+ |
As an example, if you are only interested in V100 and A100:
-gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80
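For more than a couple of targets, the flag list can be generated rather than typed by hand. The following is a minimal sketch, assuming a POSIX shell. The function `gencode_flags` and the source file `saxpy.cu` are hypothetical names chosen for this example, and the `nvcc` call is left commented out because it requires a loaded `cuda` module.

```shell
# Hypothetical helper: emit one -gencode pair per compute capability
# given on the command line (e.g. 7.0 8.0).
gencode_flags() {
    flags=""
    for cc in "$@"; do
        xy=$(echo "$cc" | tr -d '.')   # 7.0 -> 70
        flags="$flags -gencode arch=compute_$xy,code=sm_$xy"
    done
    # Drop the leading space before printing.
    printf '%s\n' "${flags# }"
}

GENCODE=$(gencode_flags 7.0 8.0)   # V100 and A100
printf '%s\n' "$GENCODE"

# Possible usage on the cluster (saxpy.cu is a placeholder file name):
# module load cuda/12.4.1
# nvcc $GENCODE -o saxpy saxpy.cu
```

Each additional target adds compiled code for that architecture to the binary, so it is usually worth listing only the GPUs you actually intend to run on.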