Originally, my Ubuntu Server already had the NVIDIA graphics driver installed, and nvidia-smi reported a normal status. After installing the CUDA driver, I ran nvidia-smi again to check the status, and this error appeared:
root@localhost:~# nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

I thought the error was caused by the system not recognizing the graphics card, so I checked the PCI information.
root@localhost:~# lspci | grep -i nvidia
0b:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)

The graphics card is still listed, so the hardware is recognized and the problem lies with the driver. In this case, use dkms to compile and install the NVIDIA driver module.
Dynamic Kernel Module Support (DKMS) is a program/framework that enables generating Linux kernel modules whose sources generally reside outside the kernel source tree. The concept is to have DKMS modules automatically rebuilt when a new kernel is installed.
—— From Wikipedia
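Before rebuilding, it can help to confirm that the running kernel really has no NVIDIA module. The following is only a minimal sketch and not part of the original steps: the function name kernel_has_nvidia_module is hypothetical, and it assumes GNU find and that modules live under /lib/modules/<kernel release>, as on a stock Ubuntu install.

```shell
# Hypothetical helper: succeeds if an nvidia.ko file (plain or compressed)
# exists for the given kernel release.
# $1: kernel release (defaults to the running kernel)
# $2: modules root (defaults to /lib/modules; overridable for testing)
kernel_has_nvidia_module() {
    local kernel="${1:-$(uname -r)}"
    local modules_root="${2:-/lib/modules}"
    # -print -quit stops at the first match; a missing directory yields no output.
    [ -n "$(find "$modules_root/$kernel" -name 'nvidia.ko*' -print -quit 2>/dev/null)" ]
}

# Example usage (commented out so the block only defines the function):
# if ! kernel_has_nvidia_module; then
#     echo "No nvidia module found for kernel $(uname -r)"
# fi
```

If the check fails while lspci still lists the card, the module is probably missing for the current kernel, which is the situation dkms is meant to repair.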
Install dkms
root@localhost:~# apt-get install dkms

Check the NVIDIA driver version
root@localhost:~# ls /usr/src | grep nvidia
nvidia-550.25.65

Execute dkms to compile and install the NVIDIA driver module
root@localhost:~# dkms install -m nvidia -v 550.25.65
/bin/bash: /usr/local/anaconda/lib/libtinfo.so.6: no version information available (required by /bin/bash)
Creating symlink /var/lib/dkms/nvidia/550.25.65/source -> /usr/src/nvidia-550.25.65
Kernel preparation unnecessary for this kernel. Skipping...
Building module:
cleaning build area...
'make' -j8 NV_EXCLUDE_BUILD_MODULES='' KERNEL_UNAME=5.15.0-131-generic modules.....................
cleaning build area...
nvidia.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/5.15.0-131-generic/updates/dkms/
nvidia-uvm.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/5.15.0-131-generic/updates/dkms/
nvidia-modeset.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/5.15.0-131-generic/updates/dkms/
nvidia-drm.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/5.15.0-131-generic/updates/dkms/
nvidia-peermem.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/5.15.0-131-generic/updates/dkms/
depmod....

View NVIDIA driver information
root@localhost:~# nvidia-smi
Thu Feb 20 15:11:42 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.25.65              Driver Version: 550.25.65      CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla T4                       Off |   00000000:0B:00.0 Off |                    0 |
| N/A   56C    P0             26W /   70W |       1MiB /  15360MiB |      9%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

Displays normally, perfect!
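The version lookup and the dkms call used above can be tied together so the version number does not have to be typed by hand. This is only a minimal sketch: nvidia_src_version is a hypothetical helper, and it assumes a single nvidia-<version> directory exists under /usr/src, as in the listing earlier in this post.

```shell
# Hypothetical helper: prints the driver version taken from the first
# nvidia-<version> directory under the given source root.
# $1: source root (defaults to /usr/src; overridable for testing)
nvidia_src_version() {
    local src_root="${1:-/usr/src}"
    local dir
    for dir in "$src_root"/nvidia-*/; do
        # If nothing matches, the glob stays literal and this test skips it.
        [ -d "$dir" ] || continue
        dir="${dir%/}"                 # drop the trailing slash
        echo "${dir##*/nvidia-}"       # keep only the part after "nvidia-"
        return 0
    done
    return 1
}

# Example usage (commented out; run as root, same dkms command as above):
# version="$(nvidia_src_version)" && dkms install -m nvidia -v "$version"
```

If several nvidia-* directories are present, the sketch simply takes the first one in glob order, so check the result before passing it to dkms.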