CUDA
CUDA (Compute Unified Device Architecture) is NVIDIA's parallel computing platform and programming model for its GPUs, allowing general-purpose GPU (GPGPU) code to be written in familiar languages such as C and C++.
The CUDA toolchain includes:
- C/C++ Language Extensions: Keywords to define GPU kernels in C/C++.
- Compiler: nvcc separates host code (CPU-side) from device code (GPU-side).
- Runtime API: Manages GPU resources, kernels, memory, and host/device transfers.
- Libraries: Linear algebra (cuBLAS), deep learning (cuDNN), image processing.
- Profiling and Debugging Tools: NVIDIA Nsight and Visual Profiler.
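To sketch how the Runtime API pieces fit together (allocation, transfer, launch, cleanup), here is a minimal vector-addition program; the kernel and variable names are illustrative, not taken from any particular library:

```cuda
#include <stdio.h>
#include <stdlib.h>

// Illustrative kernel: each thread adds one pair of elements.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];                  // guard against overshoot
}

int main() {
    const int n = 1024;
    size_t bytes = n * sizeof(float);

    // Host (CPU) buffers
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) { h_a[i] = (float)i; h_b[i] = 2.0f * i; }

    // Runtime API: allocate device (GPU) memory
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);

    // Runtime API: host -> device transfer
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Launch enough blocks to cover all n elements
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);

    // Runtime API: device -> host transfer (implicitly synchronizes)
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[10] = %f\n", h_c[10]);  // expect 10 + 20 = 30

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```

The `(n + threads - 1) / threads` rounding-up idiom plus the `if (i < n)` guard is the standard way to handle sizes that are not an exact multiple of the block size.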
Code Examples
#include <stdio.h>

/* __global__, __device__, and __host__ specify where functions are executed */
__global__ void helloFromGPU() {  // kernel function running on the GPU
    printf("Hello from GPU! Thread ID: %d\n", threadIdx.x);
}

int main() {
    printf("Hello from CPU!\n");
    helloFromGPU<<<1, 10>>>();   // launch kernel with 1 block of 10 threads
    cudaDeviceSynchronize();     // wait for the GPU to finish
    return 0;
}
nvcc hello_world.cu -o hello_world
./hello_world
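Kernel launches and most Runtime API calls can fail silently if their return codes are ignored. As a sketch of basic error checking around the hello-world launch (the structure is illustrative; `cudaGetLastError`, `cudaDeviceSynchronize`, and `cudaGetErrorString` are real Runtime API functions):

```cuda
#include <stdio.h>

__global__ void helloFromGPU() {
    printf("Hello from GPU! Thread ID: %d\n", threadIdx.x);
}

int main() {
    helloFromGPU<<<1, 10>>>();

    // Check the launch itself (e.g. bad configuration is reported here)
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "Launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Check errors that occur while the kernel runs
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "Kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Because kernel launches are asynchronous, launch-configuration errors surface via `cudaGetLastError` immediately, while runtime faults inside the kernel only surface at the next synchronizing call.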