Installing NVIDIA Drivers on Ubuntu: A Practical Engineering Workflow
NVIDIA GPUs demand correct, up-to-date drivers for full hardware utilization on Ubuntu. Deploying the wrong driver, or leaving traces of the default Nouveau module behind, commonly leads to degraded performance or system instability. This workflow outlines the standard, reliable process for a modern Ubuntu workstation (tested on 22.04 LTS, kernel 5.15+) with PCIe or laptop-integrated NVIDIA cards.
Recurring Problem: Erratic CUDA Kernel Failures
A common real-world scenario: deploying TensorRT workloads on a fresh Ubuntu install, only to encounter cryptic CUDA_ERROR_UNKNOWN failures. In 7 of 10 cases, this traces back to mismatched or conflicting NVIDIA driver installations, typically due to automatic Nouveau loading or partial legacy drivers left by an upgrade.
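A quick audit of installed NVIDIA packages and loaded kernel modules usually reveals such leftovers; a minimal sketch using standard tooling:
dpkg -l | grep -i nvidia
lsmod | grep -E 'nouveau|nvidia'
Any mix of nouveau and partial nvidia entries here is a sign to clean up before proceeding.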
Step 1: Confirm GPU Model & Existing Driver State
lspci | grep -i nvidia
Typical output:
01:00.0 VGA compatible controller: NVIDIA Corporation GP106GL [Quadro P2000] (rev a1)
Check driver status (returns error if not present):
nvidia-smi
Expected for working driver:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.2 |
|-------------------------------...
If you see:
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver...
proceed with installation.
Step 2: Update System, Avoiding Old Kernel Issues
sudo apt update
sudo apt full-upgrade -y
full-upgrade ensures dependency shifts (common after PPA addition) are handled cleanly.
Step 3: Explicitly Blacklist Nouveau (Hard Failures Without This)
Edit/create the modprobe config:
sudo nano /etc/modprobe.d/blacklist-nouveau.conf
Insert:
blacklist nouveau
options nouveau modeset=0
Then:
sudo update-initramfs -u
sudo reboot
Known issue: Failing to reboot here often leaves Nouveau in memory, breaking DKMS module insertion later.
Post-reboot, validate:
lsmod | grep nouveau
No output confirms success.
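For scripted or repeated provisioning, the same blacklist file can be written non-interactively; a minimal sketch equivalent to the manual edit above:
printf 'blacklist nouveau\noptions nouveau modeset=0\n' | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
sudo update-initramfs -u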
Step 4: Enable Latest Driver Repository
Sometimes, apt sources lag behind NVIDIA's own releases (critical for recent cards, e.g., RTX 40xx):
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
For hardened environments, skip this and use the stock repos. Otherwise, the PPA offers bleeding-edge driver builds.
Step 5: Identify and Install the Recommended Driver
Use:
ubuntu-drivers devices
Look for recommended in the output, such as:
driver : nvidia-driver-535 - third-party free recommended
Then install:
sudo apt install nvidia-driver-535 -y
(Replace with whatever is current/recommended for your GPU. Quadro and legacy users may need other versions—check compatibility on NVIDIA’s official matrix.)
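Alternatively, the stock tooling can select and install the recommended driver in one step; this is a shortcut for the two commands above:
sudo ubuntu-drivers autoinstall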
Step 6: Reboot, Then Hard Verification
sudo reboot
Post-reboot checks:
nvidia-smi
lsmod | grep nvidia
glxinfo | grep "OpenGL renderer"
(glxinfo is provided by the mesa-utils package.)
If nvidia-smi still fails, review /var/log/syslog for kernel module insertion errors:
grep NVRM /var/log/syslog
Practical example:
If Secure Boot is enabled in firmware and the NVIDIA module is not signed with an enrolled key, the DKMS build succeeds but the kernel silently refuses to load the module. Either enroll a MOK key when the installer prompts for one, or disable Secure Boot in firmware, then reinstall the driver and retest.
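Before rebooting into firmware setup, you can confirm whether Secure Boot is actually enabled with mokutil (from the mokutil package); it simply reports whether SecureBoot is enabled or disabled:
mokutil --sb-state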
Step 7: Systems with Hybrid GPUs (Optimus/PRIME)
For laptops (e.g., ThinkPad P1, Dell XPS with both NVIDIA and Intel GPUs):
sudo apt install nvidia-prime
sudo prime-select nvidia
sudo reboot
Switch back to Intel as needed:
sudo prime-select intel
Performance and power trade-off: selecting NVIDIA keeps the discrete GPU active for all rendering, increasing power draw; selecting Intel powers it down, which also makes CUDA unavailable.
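To see which GPU is currently selected before or after switching:
prime-select query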
Step 8: (Optional) Install CUDA Toolkit—Version Matching Required
Non-obvious tip: Always install CUDA after verifying the kernel module loads cleanly.
Visit the NVIDIA CUDA Toolkit Archive, download the local repository installer .deb for the desired version (e.g., CUDA 12.2, which requires driver >= 535), then:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo dpkg -i cuda-repo-ubuntu2204-12-2-local_12.2.0-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-12-2-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt update
sudo apt install cuda
Add to ~/.bashrc:
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
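After reloading the shell, a quick sanity check that the toolkit is on PATH:
source ~/.bashrc
nvcc --version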
Gotcha: Installing mismatched major CUDA and driver versions results in this error:
CUDA driver version is insufficient for CUDA runtime version
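A quick way to spot a mismatch before it bites: the nvidia-smi header reports the highest CUDA version the installed driver supports, which must be greater than or equal to the toolkit release reported by nvcc:
nvidia-smi | grep "CUDA Version"
nvcc --version | grep release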
Troubleshooting
| Symptom | Likely Cause | Solution |
|---|---|---|
| Black screen after driver install | Secure Boot, Wayland, old Xorg config | Boot with nomodeset, disable Secure Boot, restore /etc/X11/xorg.conf.backup |
| modprobe nvidia fails | Nouveau still loaded | Repeat the blacklisting step, check for typos |
| nvidia-smi hangs, dmesg shows RM: version... | PCIe ASPM bug or old BIOS | Upgrade motherboard firmware, check PCIe settings |
Side note: For cloud images (AWS, GCP), use distro-specific CUDA/NVIDIA guides; kernel flavors may break PPA modules.
Summary
Flawless NVIDIA GPU operation on Ubuntu hinges on three factors: eradicating Nouveau, syncing driver and CUDA versions, and rebooting at appropriate points. For advanced installations (multiple cards, GRID/virtualization), vendor scripts or custom DKMS builds may be justified, but for most workstations and development rigs, the outlined method is robust. Alternative approaches—like .run file installs—exist, but are harder to maintain and best avoided unless integration with nvidia-docker or experimental hardware is required.
If unusual failures arise (kernel panic, missing /dev/nvidia*), always check kernel logs and DKMS status before reinstalling drivers.
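A minimal check sequence for that situation (sudo may be needed to read the kernel ring buffer):
dkms status
sudo dmesg | grep -i -e nvrm -e nvidia
ls -l /dev/nvidia*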
No approach is truly “one and done” as upstream changes can break dependencies. Document your working config for future reference.
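One lightweight way to do that is to snapshot kernel, driver, and package state into a dated file kept with your notes (the filename here is arbitrary):
{ uname -r; nvidia-smi; dpkg -l | grep -i nvidia; } > nvidia-config-$(date +%F).txt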