Preface

This guide deploys a GPU compute server for private AI model training, using Ubuntu as the example system.

Name            Version           Arch
Ubuntu          22.04             x86_64
NVIDIA Drivers  570.124.06        x86_64
CUDA            11.8.0_520.61.05  x86_64
CUDNN           9.8.0.87          x86_64

⚠️ Note: Before configuring the service, check that the versions above are compatible with one another; otherwise, errors may occur when deploying the training environment.
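
One way to catch a mismatch early is to compare the installed driver version with the minimum the CUDA toolkit expects. The sketch below assumes GNU `sort -V` is available (it is on Ubuntu). The helper name `version_ge` is hypothetical, and the 520.61.05 minimum is an assumption taken from the driver version embedded in the CUDA 11.8 installer file name; the CUDA release notes are the authoritative source.

```shell
#!/usr/bin/env bash
# version_ge A B: succeed when dotted version A >= B (relies on GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Versions used in this guide; MIN_DRIVER is an assumption (see the lead-in).
DRIVER="570.124.06"
MIN_DRIVER="520.61.05"

if version_ge "$DRIVER" "$MIN_DRIVER"; then
    echo "driver $DRIVER satisfies minimum $MIN_DRIVER"
else
    echo "driver $DRIVER is older than $MIN_DRIVER"
fi
```

The comparison sorts the two versions and checks which one comes first, so it handles multi-part versions without manual field splitting.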

  • NVIDIA graphics card driver download
  • CUDA driver version download list
  • CUDNN library version download list

Prepare Ubuntu for the NVIDIA graphics card environment

2.1 Install system-based dependencies

koevn@localhost:~$ sudo apt install -y build-essential dracut-core linux-headers-$(uname -r)

2.2 Check if Linux recognizes the NVIDIA graphics card

koevn@localhost:~$ sudo lspci | grep -i nvidia
03:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)

2.3 Check if Linux Nouveau is disabled

koevn@localhost:~$ sudo lsmod | grep nouveau
nouveau 2306048 0
mxm_wmi 16384 1 nouveau
i2c_algo_bit 16384 1 nouveau
drm_ttm_helper 16384 1 nouveau
ttm 86016 3 vmwgfx,drm_ttm_helper,nouveau
drm_kms_helper 311296 2 vmwgfx,nouveau
video 65536 1 nouveau
wmi 32768 2 mxm_wmi,nouveau
drm 622592 7 vmwgfx,drm_kms_helper,drm_ttm_helper,ttm,nouveau

If output like the above appears, the nouveau module is loaded. Disable it as follows:

koevn@localhost:~$ sudo tee /etc/modprobe.d/blacklist-nouveau.conf > /dev/null << EOF
blacklist nouveau
options nouveau modeset=0
EOF
koevn@localhost:~$ sudo dracut --force
koevn@localhost:~$ sudo reboot

Nouveau must be disabled because this guide installs NVIDIA's official closed-source driver, whereas nouveau is the open-source driver that Linux loads by default. If nouveau stays enabled, the two drivers conflict and cause hard-to-diagnose problems.

After the system restarts, run sudo lsmod | grep nouveau again. If there is no output, nouveau has been disabled successfully.
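
The same check can be scripted so it gives an explicit result instead of relying on empty output. This is a minimal sketch; the function name `nouveau_loaded` is hypothetical, and it matches the module name exactly in the first column so that modules which merely depend on nouveau are not counted.

```shell
#!/usr/bin/env bash
# nouveau_loaded: read lsmod-style text on stdin and succeed if a module
# named exactly "nouveau" appears in the first column.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# Guarded so the sketch also runs on machines without lsmod.
if command -v lsmod >/dev/null 2>&1; then
    if lsmod | nouveau_loaded; then
        echo "nouveau is still loaded; recheck the blacklist file and initramfs"
    else
        echo "nouveau is not loaded"
    fi
fi
```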

Install NVIDIA Driver

Upload the downloaded NVIDIA driver package to the Linux server, then install it:

koevn@localhost:~$ cd /tmp
koevn@localhost:/tmp$ sudo chmod +x NVIDIA-Linux-x86_64-570.124.06.run
koevn@localhost:/tmp$ sudo ./NVIDIA-Linux-x86_64-570.124.06.run --no-opengl-files --no-nouveau-check

  • --no-opengl-files: Do not install the OpenGL libraries provided by NVIDIA, because this system has no GUI
  • --no-nouveau-check: Skip the nouveau check

Verify that the NVIDIA driver is installed successfully

koevn@localhost:~$ sudo nvidia-smi
Tue Apr 8 16:12:06 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.124.06             Driver Version: 570.124.06     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla T4                       Off |   00000000:03:00.0 Off |                    0 |
| N/A   50C    P0             25W /   70W |       1MiB /  15360MiB |      9%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
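
For scripted checks, the table layout is awkward to parse. nvidia-smi can also print selected fields as CSV through its `--query-gpu` and `--format` options. The sketch below is one possible way to pull out the driver version; the helper name `driver_from_csv` is hypothetical, and it assumes the fields are requested in the order name, driver_version, memory.total.

```shell
#!/usr/bin/env bash
# driver_from_csv: print the driver_version field (2nd column) of the first
# "name, driver_version, memory.total" CSV line read from stdin.
driver_from_csv() {
    awk -F', ' 'NR == 1 { print $2 }'
}

if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader \
        | driver_from_csv
else
    echo "nvidia-smi not found; the driver is probably not installed" >&2
fi
```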

Install CUDA

From the CUDA driver version download list, select the system version and architecture, choose the runfile (local) installer type, then download the installation package (or upload it to the Linux server) and install it.

koevn@localhost:/tmp$ wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
koevn@localhost:/tmp$ sudo chmod +x cuda_11.8.0_520.61.05_linux.run
koevn@localhost:/tmp$ sudo ./cuda_11.8.0_520.61.05_linux.run --no-opengl-libs --toolkit

CUDA Installation Steps

⚠️ Note: Because the NVIDIA graphics driver is already installed, press the space bar in this step to deselect the graphics driver installation, and then select Install.
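
On a headless server, the interactive menu can be skipped. The runfile installer accepts `--silent` together with `--toolkit`, which installs only the toolkit and leaves the existing driver alone. The sketch below only prints the command (a dry run) so it can be reviewed first; the helper name `cuda_install_cmd` is hypothetical, and the runfile path is an assumption based on the download step above.

```shell
#!/usr/bin/env bash
# Assumed location of the runfile downloaded earlier in this section.
RUNFILE="/tmp/cuda_11.8.0_520.61.05_linux.run"

# cuda_install_cmd: print the non-interactive install command for a runfile.
# --silent suppresses the menu; --toolkit limits the install to the toolkit,
# so the bundled 520.61.05 driver is not installed over the existing one.
cuda_install_cmd() {
    printf 'sudo sh %s --silent --toolkit --no-opengl-libs\n' "$1"
}

# Dry run: print the command instead of executing it.
cuda_install_cmd "$RUNFILE"
```

Copy the printed command into the terminal once it looks right for your system.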

After the installation completes, configure the system environment variables as prompted:

koevn@localhost:~$ sudo tee /etc/profile.d/cuda.sh > /dev/null << 'EOF'
export PATH=/usr/local/cuda-11.8/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64:$LD_LIBRARY_PATH
EOF
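
If the profile script may be sourced more than once, or LD_LIBRARY_PATH may be empty, a slightly more defensive variant can help. The sketch below adds each directory only once and avoids a trailing colon, which the dynamic linker would treat as the current directory. The helper name `prepend_once` is hypothetical; the install prefix is the one used in this guide.

```shell
#!/usr/bin/env bash
# prepend_once DIR LIST: print LIST with DIR at the front, unless DIR is
# already one of the colon-separated entries.
prepend_once() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;
        *)        printf '%s\n' "${1}${2:+:$2}" ;;
    esac
}

CUDA_HOME="/usr/local/cuda-11.8"   # install prefix used in this guide
PATH="$(prepend_once "$CUDA_HOME/bin" "$PATH")"
LD_LIBRARY_PATH="$(prepend_once "$CUDA_HOME/lib64" "${LD_LIBRARY_PATH:-}")"
export PATH LD_LIBRARY_PATH
```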

Verify that CUDA is installed successfully

koevn@localhost:~$ source /etc/profile.d/cuda.sh
koevn@localhost:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
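
To confirm in a script that the toolkit on PATH is the expected release, the release number can be extracted from the `nvcc -V` output shown above. This is a minimal sketch; the helper name `nvcc_release` is hypothetical, and it assumes the output keeps the "release X.Y," wording.

```shell
#!/usr/bin/env bash
# nvcc_release: read `nvcc -V` output on stdin and print the release number.
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

EXPECTED="11.8"
if command -v nvcc >/dev/null 2>&1; then
    found="$(nvcc -V | nvcc_release)"
    if [ "$found" = "$EXPECTED" ]; then
        echo "CUDA toolkit $found is on PATH"
    else
        echo "expected CUDA $EXPECTED but nvcc reports '$found'" >&2
    fi
fi
```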

Add CUDNN

Download the cuDNN version that matches your CUDA version, upload it to the Linux server, and run the following:

koevn@localhost:/tmp$ tar -xvf cudnn-linux-x86_64-9.8.0.87_cuda11-archive.tar.xz
koevn@localhost:/tmp$ mv cudnn-linux-x86_64-9.8.0.87_cuda11-archive cudnn
koevn@localhost:/tmp$ cd cudnn
koevn@localhost:/tmp/cudnn$ sudo cp -P lib/* /usr/local/cuda-11.8/lib64/
koevn@localhost:/tmp/cudnn$ sudo cp include/* /usr/local/cuda-11.8/include/
koevn@localhost:/tmp/cudnn$ sudo chmod a+r /usr/local/cuda-11.8/lib64/*
koevn@localhost:/tmp/cudnn$ sudo chmod a+r /usr/local/cuda-11.8/include/*

Verify CUDNN version

koevn@localhost:/tmp/cudnn$ grep -A 2 CUDNN_MAJOR /usr/local/cuda-11.8/include/cudnn_version.h
#define CUDNN_MAJOR 9
#define CUDNN_MINOR 8
#define CUDNN_PATCHLEVEL 0
--
#define CUDNN_VERSION (CUDNN_MAJOR * 10000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
/* cannot use constexpr here since this is a C-only file */
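
The three defines can also be combined into a single version string, which is easier to compare in scripts. This is a minimal sketch; the helper name `cudnn_version` is hypothetical, and the header path assumes the install prefix used in this guide.

```shell
#!/usr/bin/env bash
# cudnn_version FILE: print MAJOR.MINOR.PATCHLEVEL from a cudnn_version.h file.
cudnn_version() {
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { if (major != "") print major "." minor "." patch }
    ' "$1"
}

HEADER="/usr/local/cuda-11.8/include/cudnn_version.h"
if [ -r "$HEADER" ]; then
    cudnn_version "$HEADER"
fi
```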

That’s it!