CUDA
CUDA (Compute Unified Device Architecture) is NVIDIA's parallel computing platform and programming model for running general-purpose (GPGPU) workloads on NVIDIA GPUs.
The CUDA toolchain includes:
- C/C++ Language Extensions: keywords (e.g. `__global__`, `__device__`) to define GPU kernels in C/C++
- Compiler: `nvcc` separates host code (CPU-side) from device code (GPU-side)
- Runtime API: manages GPU resources, kernel launches, memory allocation, and host/device transfers
- Libraries: linear algebra (`cuBLAS`), deep learning (`cuDNN`), image processing
- Profiling and Debugging Tools: NVIDIA Nsight and Visual Profiler
Code Examples
```cuda
#include <stdio.h>

/* __global__, __device__, and __host__ specify where functions are executed */
__global__ void helloFromGPU() {
    // kernel function running on the GPU
    printf("Hello from GPU! Thread ID: %d\n", threadIdx.x);
}

int main() {
    printf("Hello from CPU!\n");
    helloFromGPU<<<1, 10>>>();  // launch kernel with 1 block of 10 threads
    cudaDeviceSynchronize();    // wait for the GPU to finish
    return 0;
}
```
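Beyond kernel launches, the Runtime API also handles device memory and host/device transfers. A minimal vector-addition sketch of that workflow (the array size, kernel name, and launch configuration are illustrative choices):

```cuda
#include <stdio.h>

__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];                  // guard against out-of-range threads
}

int main() {
    const int n = 1024;
    size_t bytes = n * sizeof(float);
    float h_a[n], h_b[n], h_c[n];                   // host arrays
    for (int i = 0; i < n; i++) { h_a[i] = i; h_b[i] = 2.0f * i; }

    float *d_a, *d_b, *d_c;                         // device arrays
    cudaMalloc(&d_a, bytes); cudaMalloc(&d_b, bytes); cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;       // round up to cover all n elements
    vecAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);

    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);  // implicitly syncs
    printf("c[10] = %f\n", h_c[10]);                // 10 + 20 = 30.000000

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    return 0;
}
```

Note the standard pattern: allocate on the device, copy in, launch, copy out, free. The block count is rounded up, so the bounds check inside the kernel is required whenever `n` is not a multiple of the block size.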
```sh
nvcc hello_world.cu -o hello_world
./hello_world
```
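Kernel launches are asynchronous and do not raise errors directly; a common pattern is to check `cudaGetLastError()` right after the launch (for configuration errors) and the return value of `cudaDeviceSynchronize()` (for errors during execution). A sketch:

```cuda
#include <stdio.h>

__global__ void kernel() { }  // trivial kernel for illustration

int main() {
    kernel<<<1, 1>>>();
    cudaError_t err = cudaGetLastError();   // catches invalid launch configurations
    if (err != cudaSuccess)
        printf("launch failed: %s\n", cudaGetErrorString(err));
    err = cudaDeviceSynchronize();          // catches errors raised while the kernel runs
    if (err != cudaSuccess)
        printf("kernel failed: %s\n", cudaGetErrorString(err));
    return 0;
}
```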