The GPU Part I Glossed Over
GPU acceleration in Proxmox LXC, what breaks, and why the 1080 still delivers

In the previous post about setting up Ollama and Open WebUI on Proxmox, I mentioned GPU passthrough as “fiddly” and recommended getting CPU mode working first. That was honest, if abbreviated. Getting the GTX 1080 working inside an LXC container took longer than expected, and nearly all of that time went to two problems I hadn’t anticipated.
- The IOMMU dead end that led to discovering LXC was actually the better path anyway.
- A driver library mismatch that produces a CUDA error cryptic enough to cost an afternoon.
This post covers both. Environment for context is Proxmox VE 9, an old NVIDIA GTX 1080 (Pascal architecture, 8 GB GDDR5X) I found in a storage closet, and the Debian-based LXC container I set up in the previous post.
- The IOMMU Dead End — why LXC instead of VM passthrough
- Starting on the Host Side — driver setup and persistence mode
- Container Configuration — device nodes and cgroup2 permissions
- The Version Mismatch Problem — error 803 and the library fix
- Confirming GPU Acceleration — testing and verification
- What Eight Gigabytes Gets You — model sizing and performance
- Quick Reference — commands for when things break
The IOMMU Dead End
The original plan was VM PCI passthrough. The Proxmox PCI(e) Passthrough documentation describes it well, and it’s the cleaner approach when the hardware supports it — give the VM exclusive GPU access, install drivers inside, done.
PCI passthrough requires the host’s IOMMU to be functional. Intel VT-d and AMD-Vi are the hardware implementations; the kernel discovers them through DMAR ACPI tables that the motherboard firmware provides. Without those tables, the kernel can’t initialize IOMMU, and Proxmox refuses to assign PCI devices to VMs.
My BIOS reported VT-d enabled. The ACPI tables weren’t there:
ls /sys/firmware/acpi/tables
# APIC DSDT FACP HPET MCFG — no DMAR
ls /sys/kernel/iommu_groups
# (empty)
Older consumer motherboards frequently have this gap. VT-x (CPU virtualization) and VT-d (IOMMU) are separate features, and boards often expose the CPU bits without implementing the firmware tables that make VT-d actually work for PCI passthrough. Server-grade hardware tends to be more reliable here. After a bit of digging with the motherboard I had on hand, it was clear the IOMMU route was a dead end.
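The same two checks can be scripted into one quick readiness test. A sketch (AMD systems expose an IVRS table instead of Intel's DMAR):

```shell
# Check IOMMU readiness before attempting VM PCI passthrough.
# Intel firmware provides a DMAR ACPI table, AMD an IVRS table; without
# one, the kernel cannot build IOMMU groups and passthrough is out.
if ls /sys/firmware/acpi/tables | grep -qE 'DMAR|IVRS'; then
  echo "IOMMU ACPI table present"
else
  echo "no DMAR/IVRS table: VM PCI passthrough will not work"
fi

# Even with the table, the kernel must have populated at least one group.
if [ -n "$(ls -A /sys/kernel/iommu_groups 2>/dev/null)" ]; then
  echo "IOMMU groups populated"
else
  echo "no IOMMU groups"
fi
```

If either check fails, skip straight to the LXC approach below.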
LXC passthrough sidesteps this entirely. The GPU driver runs on the Proxmox host. The container accesses it through bind-mounted device nodes with cgroup2 permissions — no IOMMU required, no VFIO modules, no firmware table dependencies. Multiple containers can also share the same GPU simultaneously, which PCI passthrough doesn’t allow.
The trade-off is that the driver stack stays on the host, which creates a version alignment requirement covered below. The upside is that it works on hardware that PCI passthrough can’t touch.
Starting on the Host Side
Before configuring any container, the Proxmox host needs a working NVIDIA driver stack. Everything the container uses comes from the host, so this is the foundation everything else depends on.
Verify the GPU is detected:
lspci | grep -i nvidia
Confirm the driver is running:
nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 580.xx Driver Version: 580.xx CUDA Version: 13.x |
| GPU Name Temp Power Usage | Memory Usage | GPU Util |
| 0 GTX 1080 35C 15W / 240W | 0MiB / 8192MiB | 0% |
+-----------------------------------------------------------------------------+
If this works, the host GPU stack is operational.
A GPU status table means the host side is ready. An error means driver installation needs attention before any container configuration will help.
Install the NVIDIA driver on the Proxmox host using the standard NVIDIA installer or via the Debian non-free repository. Do not install the full driver package inside the container — that’s the setup for the version mismatch problem (speaking from experience).
In addition, keep in mind the age of your card. The current (as of March 2026) NVIDIA driver series is 590.xx, which no longer supports the GTX 1080. The last driver series with support for Pascal architecture is 580.xx, which is still compatible with CUDA 13.x. Make sure to install a driver version that supports your specific GPU model.
One additional step worth taking: persistence mode. The NVIDIA driver by default tears down GPU state when the last client process exits, then reinitializes on the next connection. That initialization takes roughly 3 seconds. With persistence running as a systemd service, the driver stays loaded between requests and that cold start drops to under 100ms.
Create a systemd unit file for the persistence daemon in /etc/systemd/system. In my example, I simply named it nvidia-persistence.service:
[Unit]
Description=Enable NVIDIA Persistence Mode
After=multi-user.target
[Service]
Type=oneshot
ExecStart=/usr/bin/nvidia-smi -pm 1
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
Enable it with:
sudo systemctl enable --now nvidia-persistence.service
Now, when you run nvidia-smi you should see that persistence is on:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.126.18 Driver Version: 580.126.18 CUDA Version: 13.0 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce GTX 1080 --> On | 00000000:01:00.0 Off | N/A |
| 0% 27C P8 9W / 240W | 0MiB / 8192MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
For Ollama running as a long-lived service, this matters less — Ollama itself keeps the GPU initialized while running. Where the delay shows up is on the first request after a container restart or an extended idle period. Worth enabling regardless.
Before moving to container configuration, confirm the device nodes are present on the host:
ls /dev/nvidia*
# /dev/nvidia0
# /dev/nvidiactl
# /dev/nvidia-uvm
# /dev/nvidia-uvm-tools
# /dev/nvidia-caps/nvidia-cap1
# /dev/nvidia-caps/nvidia-cap2
Missing nodes mean the kernel modules haven’t loaded cleanly. Don’t proceed until those are all present.
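If nodes are missing, reloading the modules by hand often recovers things without a reboot. A sketch, assuming the driver itself installed cleanly:

```shell
# nvidia-uvm in particular is loaded on demand and can be absent after
# boot. Reload the modules, then let nvidia-smi trigger creation of the
# /dev/nvidia* device nodes and list what came back.
modprobe nvidia nvidia_uvm
nvidia-smi > /dev/null && ls /dev/nvidia*
```

If modprobe itself errors, the driver installation on the host needs fixing first.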
Container Configuration
The container config lives at /etc/pve/lxc/<CTID>.conf. Two things need to happen: expose the GPU device nodes inside the container, and grant the cgroup2 permissions to use them.
Device nodes
Proxmox 8.1+ supports a dev* syntax that handles both the bind mount and cgroup permissions in a single directive. Use this when available:
dev0: /dev/nvidia0,gid=44
dev1: /dev/nvidiactl,gid=44
dev2: /dev/nvidia-uvm,gid=44
dev3: /dev/nvidia-uvm-tools,gid=44
dev4: /dev/nvidia-caps/nvidia-cap1,gid=44
dev5: /dev/nvidia-caps/nvidia-cap2,gid=44
Group 44 is the video group on Debian-based systems. Verify with getent group video inside the container if you’re unsure.
For older Proxmox versions, the explicit approach with separate cgroup2 directives:
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c <UVM_MAJOR>:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
Major number 195 is always the NVIDIA GPU device. The UVM major number is not fixed — it’s dynamically assigned and can change between reboots. Find the current value before filling it in:
grep nvidia-uvm /proc/devices
Community reports show values ranging anywhere from 234 to 511 across different hardware. Don’t assume it matches whatever value a guide happened to use (also, again, speaking from experience).
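Since the UVM major can change between reboots, it's safer to read it at configuration time than to copy a number from a guide. A hypothetical helper that prints the legacy allow lines with the live value filled in:

```shell
# Read the current nvidia-uvm major number from /proc/devices and emit
# the legacy cgroup2 allow lines, so the value is never stale.
UVM_MAJOR=$(awk '/nvidia-uvm$/ {print $1; exit}' /proc/devices)
echo "lxc.cgroup2.devices.allow: c 195:* rwm"
echo "lxc.cgroup2.devices.allow: c ${UVM_MAJOR}:* rwm"
```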
On a cgroup v2 host (the Proxmox default), lxc.cgroup2 directives are the only option. The older lxc.cgroup.devices.allow syntax is silently ignored, and you'll wonder why the GPU isn't accessible.
Boot order
Containers can start before the NVIDIA kernel modules finish initializing during host boot. If Ollama starts before the GPU is ready, it falls back to CPU and stays there until restarted. A small shim service avoids this by blocking container startup until the GPU actually responds, ordered after the persistence service created earlier. In /etc/systemd/system, create a new service file named pve-gpu-ready.service:
[Unit]
Description=Wait for NVIDIA GPU to be ready
After=nvidia-persistence.service
Before=pve-container@<CTID>.service
[Service]
Type=oneshot
ExecStart=/bin/bash -c 'until nvidia-smi >/dev/null 2>&1; do sleep 2; done'
[Install]
WantedBy=multi-user.target
Same as before, enable it with:
sudo systemctl enable --now pve-gpu-ready.service
The Version Mismatch Problem
This is where setups that “almost work” actually break, and where the error message is the least helpful it could be.
Even with device nodes correctly mapped and cgroup permissions in place, CUDA initialization fails if the container has NVIDIA libraries installed from the Debian package repositories. Those packages provide a version of libcuda.so and libnvidia-ml.so that almost certainly won’t match the driver version running on the Proxmox host.
CUDA’s kernel interface changes with every minor driver release. The ioctl protocol between userspace libraries and kernel modules is not stable across versions — NVIDIA’s own documentation is explicit about this. When the library version in the container and the kernel module version on the host don’t match, cuInit() returns error 803: CUDA_ERROR_SYSTEM_DRIVER_MISMATCH. The error message isn’t helpful. It doesn’t mention driver versions. It just fails and it’s time to reach for the coffee.
Quick diagnostic from inside the container (via Python):
import ctypes
cuda = ctypes.CDLL("libcuda.so.1")
print(cuda.cuInit(0))
Output of 0 means clean initialization. Output of 803 means version mismatch.
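For a friendlier readout, the driver API's cuGetErrorName call translates the numeric code into its symbolic name. An extended sketch of the same diagnostic:

```python
import ctypes

# Loading libcuda.so.1 raises OSError if the library isn't visible to
# the linker at all -- a different failure mode than error 803.
try:
    cuda = ctypes.CDLL("libcuda.so.1")
except OSError:
    print("libcuda.so.1 not found")
else:
    rc = cuda.cuInit(0)
    # cuGetErrorName fills a pointer to a static string such as
    # CUDA_SUCCESS or CUDA_ERROR_SYSTEM_DRIVER_MISMATCH.
    name = ctypes.c_char_p()
    cuda.cuGetErrorName(rc, ctypes.byref(name))
    print(rc, name.value.decode() if name.value else "unknown code")
```

Seeing the symbolic name spelled out makes the driver-mismatch case much harder to misread.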
The fix is to remove any container-side NVIDIA packages and mount the host’s libraries into the container instead, guaranteeing they match the host driver exactly.
Inside the container:
apt remove libcuda1 libnvidia-ml1 2>/dev/null
rm -f /lib/x86_64-linux-gnu/libcuda.so.1 # removes any zero-byte placeholder left behind
ldconfig
Then add bind mounts in the container config, pointing at the host’s actual library files. Mounting to a neutral path avoids symbolic link conflicts inside /lib:
lxc.mount.entry: /lib/x86_64-linux-gnu/libcuda.so.1 usr/local/nvidia-host-libs/libcuda.so.1 none bind,ro,create=file
lxc.mount.entry: /lib/x86_64-linux-gnu/libnvidia-ml.so.1 usr/local/nvidia-host-libs/libnvidia-ml.so.1 none bind,ro,create=file
Inside the container, tell the linker where to find them:
echo "/usr/local/nvidia-host-libs" > /etc/ld.so.conf.d/99-nvidia-host-libs.conf
ldconfig
Restart the container and run the CUDA diagnostic again. It should return 0.
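Before restarting, the linker cache can confirm which copies will actually be picked up:

```shell
# List the linker cache entries for the two NVIDIA libraries. After the
# bind mounts and ldconfig, the resolved paths should point into
# /usr/local/nvidia-host-libs, not /lib/x86_64-linux-gnu.
ldconfig -p | grep -E 'libcuda\.so|libnvidia-ml\.so'
```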
If libcuda.so.1 still exists inside the container as a zero-byte file after removing the package, the linker can resolve to that empty file instead of the bind-mounted copy, and CUDA fails silently. The rm -f step above handles this; it's worth checking with ls -la /lib/x86_64-linux-gnu/libcuda* inside the container before restarting.
Confirming GPU Acceleration
With device nodes mapped and host libraries mounted, re-verify the stack end-to-end.
CUDA diagnostic from inside the container (via Python):
import ctypes
cuda = ctypes.CDLL("libcuda.so.1")
print("cuInit:", cuda.cuInit(0))
# Expected: cuInit: 0
Then pull a model and watch GPU activity from the Proxmox host in a separate terminal:
# Inside the container
ollama run llama3.2
# On the Proxmox host
watch -n1 nvidia-smi
VRAM usage should increase as the model loads. If it stays at 0 MiB while inference is running, Ollama fell back to CPU. Confirm which mode it’s using with:
ollama ps
The PROCESSOR column shows 100% GPU, 100% CPU, or a mixed percentage for partially offloaded models. A 7B model at Q4_K_M quantization should fit entirely in 8 GB VRAM with room for the KV cache.
If the model loads but PROCESSOR shows CPU, enable debug logging to see the GPU discovery output:
OLLAMA_DEBUG=1 ollama serve
The output will show whether CUDA initialized, what VRAM was detected, and whether the device was accepted or filtered during Ollama’s GPU discovery process. Missing device nodes and mismatched libraries produce different log entries, which helps narrow down where the configuration broke.
What Eight Gigabytes Gets You
The GTX 1080 is Pascal architecture from 2016. NVIDIA classifies it as legacy; the drivers still support it through CUDA Compute Capability 6.1, which meets Ollama’s minimum requirement of CC 5.0. For inference workloads within the VRAM ceiling, it holds up well.
For 7B and 8B parameter models at Q4_K_M quantization, community benchmarks from LocalScore show roughly 15–20 tokens per second — faster than typical reading speed and fast enough for interactive chat. Prompt processing is slower than modern architectures because Pascal lacks Tensor Cores, but generation speed is workable for the homelab and development use case.
The VRAM ceiling is the real constraint.
[Table: Model Sizes on 8 GB VRAM. Speed estimates based on LocalScore benchmarks; actual results vary by model and workload.]
A 13B model at Q4_K_M needs over 10 GB and won’t fit without CPU offloading, which drops performance to the point where the GPU isn’t doing much useful work. The practical ceiling is 7B–8B class models with sensible quantization.
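The ceiling follows from simple arithmetic. A back-of-envelope sketch, assuming Q4_K_M averages roughly 4.5 bits per weight (an approximation; real GGUF files vary, and the KV cache plus CUDA overhead add more on top of the weights):

```python
# Rough weight-memory estimate for a quantized model: parameter count
# times average bits per weight, converted to GiB.
def est_weight_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1024**3

for size in (7, 8, 13):
    print(f"{size}B at ~4.5 bpw: ~{est_weight_gb(size):.1f} GB for weights alone")
```

The 7B and 8B figures leave headroom on 8 GB; a 13B's weights alone eat most of the card before any context is allocated.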
Two SLM models worth trying out:
- phi-4-mini runs at 3.8B with a 128K context window and performs well as long as you don't need broad world knowledge (great paired with RAG).
- gemma-3n-e2b is an “effective 2B” model that, despite a smaller 32K context window, does a surprisingly good job at text generation, summarization, and grammar correction.
For homelab use — testing ideas and running code review pipelines against local code — 15–20 t/s on hardware already in the rack is genuinely useful. The prior post covered why zero-friction local inference changes how you work with AI during the exploratory phase. GPU acceleration closes the performance gap enough that it no longer feels like a compromise.
OLLAMA_FLASH_ATTENTION=1 won't help here. Flash Attention requires Ampere architecture or newer. Leave that environment variable off for the GTX 1080 and earlier generations.
Quick Reference
For anyone landing here mid-troubleshoot:
Verify host GPU stack:
lspci | grep -i nvidia && nvidia-smi
Find the UVM major number (needed for legacy lxc.cgroup2 config):
grep nvidia-uvm /proc/devices
Test CUDA from inside the container:
import ctypes
cuda = ctypes.CDLL("libcuda.so.1")
print(cuda.cuInit(0)) # 0 = success, 803 = driver mismatch
Confirm Ollama is using the GPU:
ollama ps # check the PROCESSOR column
Remove container-side NVIDIA libs (if getting error 803):
apt remove libcuda1 libnvidia-ml1 2>/dev/null
rm -f /lib/x86_64-linux-gnu/libcuda.so.1
ldconfig
Check the container sees GPU device nodes:
ls /dev/nvidia*
Enable persistence mode on the host:
Create /etc/systemd/system/nvidia-persistence.service:
[Unit]
Description=Enable NVIDIA Persistence Mode
After=multi-user.target
[Service]
Type=oneshot
ExecStart=/usr/bin/nvidia-smi -pm 1
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
Enable it with:
sudo systemctl enable --now nvidia-persistence.service
The whole setup is less complex than it sounds written out.
The IOMMU detour took the most time because the BIOS confidently reported VT-d working while the ACPI tables quietly disagreed. Once LXC was the path, the configuration was straightforward — with the one exception that the library mismatch error gives you nothing to go on.
Hopefully cuInit: 0 shows up quickly on your end.







