Ubuntu 16.04.7 LTS
4.15.0-142-generic
On the server, running nvidia-smi suddenly failed with a driver error:
nvidia-smi
NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
Solution:
1. Check which NVIDIA driver version is installed on the server
ls /usr/src | grep nvidia
nvidia-455.23.05
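The lookup above can be scripted so the version string is available to later commands. This is only a minimal sketch: the function name nvidia_src_versions is made up for illustration, and it assumes the driver sources live in directories named nvidia-<version> under /usr/src, as in the listing above.

```shell
# Print the version part of every nvidia-<version> directory under a source root.
# $1: source root; defaults to /usr/src, the location used in this post.
nvidia_src_versions() {
    nsv_root="${1:-/usr/src}"
    for nsv_dir in "$nsv_root"/nvidia-*; do
        # If nothing matches, the glob stays unexpanded and is not a directory.
        [ -d "$nsv_dir" ] || continue
        nsv_name="${nsv_dir##*/}"            # strip the leading path
        printf '%s\n' "${nsv_name#nvidia-}"  # strip the nvidia- prefix
    done
}
```

Calling nvidia_src_versions with no argument reads /usr/src. If several driver versions are present, it prints one per line and you still have to pick the one you want.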
2. Run apt install dkms
apt install dkms
Reading package lists... Done
Building dependency tree
Reading state information... Done
dkms is already the newest version (2.2.0.3-2ubuntu11.8).
0 upgraded, 0 newly installed, 0 to remove and 62 not upgraded.
3. Run dkms install -m nvidia -v 455.23.05 (replace 455.23.05 with your own driver version)
dkms install -m nvidia -v 455.23.05
Creating symlink /var/lib/dkms/nvidia/455.23.05/source -> /usr/src/nvidia-455.23.05
DKMS: add completed.
Kernel preparation unnecessary for this kernel. Skipping...
Building module:
cleaning build area....
'make' -j32 NV_EXCLUDE_BUILD_MODULES='' KERNEL_UNAME=4.15.0-142-generic IGNORE_CC_MISMATCH='1' modules........
cleaning build area....
DKMS: build completed.
nvidia.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/4.15.0-142-generic/updates/dkms/
nvidia-uvm.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/4.15.0-142-generic/updates/dkms/
nvidia-modeset.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/4.15.0-142-generic/updates/dkms/
nvidia-drm.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/4.15.0-142-generic/updates/dkms/
depmod....
DKMS: install completed.
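The install command in step 3 builds for the running kernel by default. A small helper can make the target kernel explicit, which is convenient when several kernels are installed. This is a sketch under assumptions: dkms_install_cmd is a hypothetical name, and it only prints the command (using the dkms -k option to select the kernel) so that you can review it before running it as root.

```shell
# Print the dkms command that builds and installs the nvidia module for one kernel.
# $1: driver version, for example 455.23.05 as found under /usr/src
# $2: kernel release; defaults to the running kernel reported by uname -r
dkms_install_cmd() {
    dic_version="$1"
    dic_kernel="${2:-$(uname -r)}"
    printf 'dkms install -m nvidia -v %s -k %s\n' "$dic_version" "$dic_kernel"
}

# Example: review the command for the running kernel before executing it.
# dkms_install_cmd 455.23.05
```

Printing the command rather than executing it is a deliberate choice here, since a wrong version string would otherwise fail only partway through the build.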
4. Run nvidia-smi again and check that it works
nvidia-smi
Fri Mar 25 09:03:41 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.23.05    Driver Version: 455.23.05    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:08:00.0 Off |                  N/A |
| 24%   32C    P0    59W / 250W |      0MiB / 11178MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:09:00.0 Off |                  N/A |
| 21%   37C    P0    59W / 250W |      0MiB / 11178MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 108...  Off  | 00000000:88:00.0 Off |                  N/A |
| 15%   30C    P0    57W / 250W |      0MiB / 11178MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 108...  Off  | 00000000:89:00.0 Off |                  N/A |
| 12%   29C    P0    53W / 250W |      0MiB / 11178MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
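Besides nvidia-smi, the output of dkms status can confirm that the module is registered as installed for the running kernel. The following is a rough sketch, not part of the original fix: nvidia_module_installed is a hypothetical helper, and it assumes status lines that start with nvidia and contain the version, the kernel release, and the word installed (for example "nvidia, 455.23.05, 4.15.0-142-generic, x86_64: installed"). The exact format may differ between dkms versions.

```shell
# Succeed if dkms status text on stdin lists the nvidia module at the given
# version as installed for the given kernel release.
# $1: driver version    $2: kernel release
nvidia_module_installed() {
    nmi_version="$1"
    nmi_kernel="$2"
    # Keep only nvidia lines, then require version, kernel and state on one line.
    grep '^nvidia[,/]' \
        | grep -F "$nmi_version" \
        | grep -F "$nmi_kernel" \
        | grep -q 'installed'
}

# Example:
# dkms status | nvidia_module_installed 455.23.05 "$(uname -r)" && echo "module installed"
```

If this check fails after a kernel update, repeating step 3 for the new kernel is the first thing to try.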