CUDA
CUDA (Compute Unified Device Architecture) is NVIDIA's parallel computing platform and programming model for running general-purpose (GPGPU) workloads on NVIDIA GPUs.
The CUDA toolchain includes:
- C/C++ Language Extensions: keywords (e.g. `__global__`, `__device__`) to define GPU kernels in C/C++
- Compiler: `nvcc` separates host code (CPU-side) from device code (GPU-side)
- Runtime API: manages GPU resources, kernel launches, memory allocation, and host/device transfers
- Libraries: linear algebra (`cuBLAS`), deep learning (`cuDNN`), image processing
- Profiling and Debugging Tools: NVIDIA Nsight and Visual Profiler
Code Examples
```cuda
#include <stdio.h>

/* __global__, __device__, and __host__ specify where functions are executed */
__global__ void helloFromGPU() {
    // kernel function running on the GPU
    printf("Hello from GPU! Thread ID: %d\n", threadIdx.x);
}

int main() {
    printf("Hello from CPU!\n");
    helloFromGPU<<<1, 10>>>();  // launch kernel with 1 block of 10 threads
    cudaDeviceSynchronize();    // wait for the GPU to finish
    return 0;
}
```
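Beyond kernel launches, the Runtime API also handles device memory and host/device transfers. A minimal vector-addition sketch of that workflow (the array size, kernel name, and launch configuration are illustrative choices):

```cuda
#include <stdio.h>

__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];                  // guard against out-of-range threads
}

int main() {
    const int n = 1024;
    size_t bytes = n * sizeof(float);
    float h_a[n], h_b[n], h_c[n];                   // host arrays
    for (int i = 0; i < n; i++) { h_a[i] = i; h_b[i] = 2.0f * i; }

    float *d_a, *d_b, *d_c;                         // device arrays
    cudaMalloc(&d_a, bytes); cudaMalloc(&d_b, bytes); cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;       // round up to cover all n elements
    vecAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);

    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);  // implicitly syncs
    printf("c[10] = %f\n", h_c[10]);                // 10 + 20 = 30.000000

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    return 0;
}
```

Note the standard pattern: allocate on the device, copy in, launch, copy out, free. The block count is rounded up, so the bounds check inside the kernel is required whenever `n` is not a multiple of the block size.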
```sh
nvcc hello_world.cu -o hello_world
./hello_world
```
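Kernel launches are asynchronous and do not raise errors directly; a common pattern is to check `cudaGetLastError()` right after the launch (for configuration errors) and the return value of `cudaDeviceSynchronize()` (for errors during execution). A sketch:

```cuda
#include <stdio.h>

__global__ void kernel() { }  // trivial kernel for illustration

int main() {
    kernel<<<1, 1>>>();
    cudaError_t err = cudaGetLastError();   // catches invalid launch configurations
    if (err != cudaSuccess)
        printf("launch failed: %s\n", cudaGetErrorString(err));
    err = cudaDeviceSynchronize();          // catches errors raised while the kernel runs
    if (err != cudaSuccess)
        printf("kernel failed: %s\n", cudaGetErrorString(err));
    return 0;
}
```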