The GPU Part I Glossed Over

GPU acceleration in Proxmox LXC, what breaks, and why the 1080 still delivers

12 minute read

In the previous post about setting up Ollama and Open WebUI on Proxmox, I mentioned GPU passthrough as “fiddly” and recommended getting CPU mode working first. That was honest, if abbreviated. Getting the GTX 1080 working inside an LXC container took longer than expected, and nearly all of that time went to two problems I hadn’t anticipated.

  • The IOMMU dead end that led to discovering LXC was actually the better path anyway.
  • A driver library mismatch that produces a CUDA error cryptic enough to cost an afternoon.

This post covers both. Environment for context: Proxmox VE 9, an old NVIDIA GTX 1080 (Pascal architecture, 8 GB GDDR5X) I found in a storage closet, and the Debian-based LXC container I set up in the previous post.


The IOMMU Dead End

The original plan was VM PCI passthrough. The Proxmox PCI(e) Passthrough documentation describes it well, and it’s the cleaner approach when the hardware supports it — give the VM exclusive GPU access, install drivers inside, done.

PCI passthrough requires the host’s IOMMU to be functional. Intel VT-d and AMD-Vi are the hardware implementations; the kernel discovers them through DMAR ACPI tables that the motherboard firmware provides. Without those tables, the kernel can’t initialize IOMMU, and Proxmox refuses to assign PCI devices to VMs.

My BIOS reported VT-d enabled. The ACPI tables weren’t there:

ls /sys/firmware/acpi/tables
# APIC  DSDT  FACP  HPET  MCFG  — no DMAR
ls /sys/kernel/iommu_groups
# (empty)

Older consumer motherboards frequently have this gap. VT-x (CPU virtualization) and VT-d (IOMMU) are separate features, and boards often expose the CPU bits without implementing the firmware tables that make VT-d actually work for PCI passthrough. Server-grade hardware tends to be more reliable here. After a bit of digging, it was clear that with the motherboard I had on hand, the IOMMU route was a dead end.
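For checking other hardware, the same verification can be scripted. DMAR is the Intel VT-d table and IVRS is its AMD-Vi equivalent; the ls/grep approach below is a sketch against standard sysfs paths, not Proxmox tooling:

```shell
#!/bin/sh
# Firmware side: a DMAR (Intel VT-d) or IVRS (AMD-Vi) ACPI table must exist.
if ls /sys/firmware/acpi/tables/ | grep -qE '^(DMAR|IVRS)'; then
    echo "firmware provides an IOMMU ACPI table"
else
    echo "no DMAR/IVRS table: the firmware never describes an IOMMU"
fi

# Kernel side: populated iommu_groups means the IOMMU actually initialized.
if [ -n "$(ls -A /sys/kernel/iommu_groups 2>/dev/null)" ]; then
    echo "IOMMU groups present: PCI passthrough is possible"
else
    echo "no IOMMU groups: VM PCI passthrough will not work"
fi
```

Both halves have to pass; firmware that advertises the table but fails to populate IOMMU groups is the same dead end.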

LXC passthrough sidesteps this entirely. The GPU driver runs on the Proxmox host. The container accesses it through bind-mounted device nodes with cgroup2 permissions — no IOMMU required, no VFIO modules, no firmware table dependencies. Multiple containers can also share the same GPU simultaneously, which PCI passthrough doesn’t allow.

The trade-off is that the driver stack stays on the host, which creates a version alignment requirement covered below. The upside is that it works on hardware that PCI passthrough can’t touch.

Starting on the Host Side

Before configuring any container, the Proxmox host needs a working NVIDIA driver stack. Everything the container uses comes from the host, so this is the foundation everything else depends on.

Verify the GPU is detected:

lspci | grep -i nvidia

Confirm the driver is running:

nvidia-smi

    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 580.xx        Driver Version: 580.xx        CUDA Version: 13.x  |
    | GPU  Name      Temp   Power Usage | Memory Usage | GPU Util                |
    | 0    GTX 1080  35C    15W / 240W  | 0MiB / 8192MiB | 0%                    |
    +-----------------------------------------------------------------------------+

If a GPU status table appears, the host side is ready. An error means driver installation needs attention before any container configuration will help.

 

Install the NVIDIA driver on the Proxmox host using the standard NVIDIA installer or via the Debian non-free repository. Do not install the full driver package inside the container — that’s the setup for the version mismatch problem (speaking from experience).

In addition, keep in mind the age of your card. The current (as of March 2026) NVIDIA driver series is 590.xx, which no longer supports the GTX 1080. The last driver series with support for Pascal architecture is 580.xx, which is still compatible with CUDA 13.x. Make sure to install a driver version that supports your specific GPU model.

One additional step worth taking: persistence mode. The NVIDIA driver by default tears down GPU state when the last client process exits, then reinitializes on the next connection. That initialization takes roughly 3 seconds. With persistence running as a systemd service, the driver stays loaded between requests and that cold start drops to under 100ms.

Using systemd, create a persistence service unit in /etc/systemd/system. In my example, I named it nvidia-persistence.service:

[Unit]
Description=Enable NVIDIA Persistence Mode
After=multi-user.target

[Service]
Type=oneshot
ExecStart=/usr/bin/nvidia-smi -pm 1
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target

Enable it with:

sudo systemctl enable --now nvidia-persistence.service

Now, when you run nvidia-smi you should see that persistence is on:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.126.18             Driver Version: 580.126.18     CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1080        On  |   00000000:01:00.0 Off |                  N/A |
|  0%   27C    P8              9W /  240W |       0MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

For Ollama running as a long-lived service, this matters less, since Ollama itself keeps the GPU initialized while running. The delay shows up on the first request after a container restart or an extended idle period. Worth enabling regardless.

Before moving to container configuration, confirm the device nodes are present on the host:

ls /dev/nvidia*
# /dev/nvidia0 
# /dev/nvidiactl
# /dev/nvidia-uvm
# /dev/nvidia-uvm-tools
# /dev/nvidia-caps/nvidia-cap1
# /dev/nvidia-caps/nvidia-cap2

Missing nodes mean the kernel modules haven’t loaded cleanly. Don’t proceed until those are all present.
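If nodes are missing, check whether the kernel modules actually loaded. One detail worth knowing: nvidia-uvm is often not loaded automatically at boot, so an explicit modprobe may be needed (module names below are the standard ones; verify against your driver install):

```shell
# Expect nvidia, nvidia_uvm and nvidia_modeset in the loaded-module list.
lsmod | grep -E '^nvidia' || echo "no nvidia modules loaded"

# nvidia-uvm in particular is not always auto-loaded; load it explicitly
# and confirm the missing device nodes appear.
modprobe nvidia-uvm && ls /dev/nvidia-uvm*
```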

Container Configuration

The container config lives at /etc/pve/lxc/<CTID>.conf. Two things need to happen: expose the GPU device nodes inside the container, and grant the cgroup2 permissions to use them.

Device nodes

Proxmox 8.1+ supports a dev* syntax that handles both the bind mount and cgroup permissions in a single directive. Use this when available:

dev0: /dev/nvidia0,gid=44
dev1: /dev/nvidiactl,gid=44
dev2: /dev/nvidia-uvm,gid=44
dev3: /dev/nvidia-uvm-tools,gid=44
dev4: /dev/nvidia-caps/nvidia-cap1,gid=44
dev5: /dev/nvidia-caps/nvidia-cap2,gid=44

Group 44 is the video group on Debian-based systems. Verify with getent group video inside the container if you’re unsure.

For older Proxmox versions, the explicit approach with separate cgroup2 directives:

lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c <UVM_MAJOR>:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file

Major number 195 is always the NVIDIA GPU device. The UVM major number is not fixed — it’s dynamically assigned and can change between reboots. Find the current value before filling it in:

grep nvidia-uvm /proc/devices

Community reports show values ranging anywhere from 234 to 511 across different hardware. Don’t assume it matches whatever value a guide happened to use (also, again, speaking from experience).

 
On Proxmox 9, cgroup v1 was removed entirely. The lxc.cgroup2 directives are the only option. The older lxc.cgroup.devices.allow syntax will be silently ignored, and you’ll wonder why the GPU isn’t accessible.

Boot order

Containers can start before the NVIDIA kernel modules finish initializing during host boot. If Ollama starts before the GPU is ready, it falls back to CPU and stays there until restarted. A small shim service avoids this: it orders itself after the persistence service and holds container startup until nvidia-smi succeeds. In /etc/systemd/system, create a new service file named pve-gpu-ready.service:

[Unit]
Description=Wait for NVIDIA GPU to be ready
After=nvidia-persistence.service
Before=pve-container@<CTID>.service

[Service]
Type=oneshot
ExecStart=/bin/bash -c 'until nvidia-smi >/dev/null 2>&1; do sleep 2; done'

[Install]
WantedBy=multi-user.target

Same as before, enable it with:

sudo systemctl enable --now pve-gpu-ready.service

The Version Mismatch Problem

This is where setups that “almost work” actually break, and where the error message is the least helpful it could be.

Even with device nodes correctly mapped and cgroup permissions in place, CUDA initialization fails if the container has NVIDIA libraries installed from the Debian package repositories. Those packages provide a version of libcuda.so and libnvidia-ml.so that almost certainly won’t match the driver version running on the Proxmox host.

[Diagram: GPU driver stack from Proxmox host to LXC container. Host side: the GTX 1080 (Pascal GP104, 8 GB VRAM, CC 6.1), the NVIDIA kernel modules (nvidia.ko, nvidia_uvm.ko, nvidia_modeset.ko), the device nodes (/dev/nvidia0, /dev/nvidiactl, /dev/nvidia-uvm), and the host libraries (libcuda.so.1, libnvidia-ml.so.1). Across the bind-mount boundary, the container has no GPU driver installed: only the mounted device nodes (read-write, cgroup2-permitted, dev0 through dev5 in the container config) and the mounted libraries (read-only, version-locked to the host driver, so no mismatch), with Ollama on top running GPU inference (cuInit: 0).]

CUDA’s kernel interface changes with every minor driver release. The ioctl protocol between userspace libraries and kernel modules is not stable across versions — NVIDIA’s own documentation is explicit about this. When the library version in the container and the kernel module version on the host don’t match, cuInit() returns error 803: CUDA_ERROR_SYSTEM_DRIVER_MISMATCH. The error message isn’t helpful. It doesn’t mention driver versions. It just fails and it’s time to reach for the coffee.

Quick diagnostic from inside the container (via python):

import ctypes
cuda = ctypes.CDLL("libcuda.so.1")
print(cuda.cuInit(0))

Output of 0 means clean initialization. Output of 803 means version mismatch.

The fix is to remove any container-side NVIDIA packages and mount the host’s libraries into the container instead, guaranteeing they match the host driver exactly.

Inside the container:

apt remove libcuda1 libnvidia-ml1 2>/dev/null
rm -f /lib/x86_64-linux-gnu/libcuda.so.1   # removes any zero-byte placeholder left behind
ldconfig

Then add bind mounts in the container config, pointing at the host’s actual library files. Mounting to a neutral path avoids symbolic link conflicts inside /lib:

lxc.mount.entry: /lib/x86_64-linux-gnu/libcuda.so.1 usr/local/nvidia-host-libs/libcuda.so.1 none bind,ro,create=file
lxc.mount.entry: /lib/x86_64-linux-gnu/libnvidia-ml.so.1 usr/local/nvidia-host-libs/libnvidia-ml.so.1 none bind,ro,create=file

Inside the container, tell the linker where to find them:

echo "/usr/local/nvidia-host-libs" > /etc/ld.so.conf.d/99-nvidia-host-libs.conf
ldconfig
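To confirm the linker now resolves libcuda.so.1 from the mounted path rather than a stale copy:

```shell
# ldconfig -p dumps the linker cache; the resolved path should point at
# /usr/local/nvidia-host-libs, and there should be exactly one entry.
ldconfig -p | grep libcuda
```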

Restart the container and run the CUDA diagnostic again. It should return 0.

 
If libcuda.so.1 exists inside the container as a zero-byte file after removing the package, the bind mount will silently fail over it. The rm -f step above handles this. Worth checking with ls -la /lib/x86_64-linux-gnu/libcuda* inside the container before restarting.
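The placeholder check and cleanup can be rolled into one guard to run inside the container before restarting (a sketch; the path assumes the standard Debian multiarch layout):

```shell
#!/bin/sh
# A zero-byte libcuda.so.1 left behind by package removal will sit under
# the bind mount; delete it so the mounted host library wins cleanly.
lib=/lib/x86_64-linux-gnu/libcuda.so.1

if [ -f "$lib" ] && [ ! -s "$lib" ]; then
    echo "removing zero-byte placeholder: $lib"
    rm -f "$lib"
fi

# Rebuild the linker cache after any removal.
ldconfig
```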

Confirming GPU Acceleration

With device nodes mapped and host libraries mounted, re-verify the stack end-to-end.

CUDA diagnostic from inside the container (via python):

import ctypes
cuda = ctypes.CDLL("libcuda.so.1")
print("cuInit:", cuda.cuInit(0))
# Expected: cuInit: 0

Then pull a model and watch GPU activity from the Proxmox host in a separate terminal:

# Inside the container
ollama run llama3.2

# On the Proxmox host
watch -n1 nvidia-smi

VRAM usage should increase as the model loads. If it stays at 0 MiB while inference is running, Ollama fell back to CPU. Confirm which mode it’s using with:

ollama ps

The PROCESSOR column shows 100% GPU, 100% CPU, or a mixed percentage for partially offloaded models. A 7B model at Q4_K_M quantization should fit entirely in 8 GB VRAM with room for the KV cache.
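For scripting the same check, say in a health probe, the ollama ps output can be filtered. The awk sketch below assumes a fully offloaded model literally shows "100% GPU" in the PROCESSOR column, as described above:

```shell
# Exit non-zero if any loaded model is not running fully on the GPU.
ollama ps | awk 'NR > 1 && $0 !~ /100% GPU/ {print "not fully on GPU: " $1; bad = 1} END {exit bad}'
```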

If the model loads but PROCESSOR shows CPU, enable debug logging to see the GPU discovery output:

OLLAMA_DEBUG=1 ollama serve

The output will show whether CUDA initialized, what VRAM was detected, and whether the device was accepted or filtered during Ollama’s GPU discovery process. Missing device nodes and mismatched libraries produce different log entries, which helps narrow down where the configuration broke.

What Eight Gigabytes Gets You

The GTX 1080 is Pascal architecture from 2016. NVIDIA classifies it as legacy; the drivers still support it through CUDA Compute Capability 6.1, which meets Ollama’s minimum requirement of CC 5.0. For inference workloads within the VRAM ceiling, it holds up well.

For 7B and 8B parameter models at Q4_K_M quantization, community benchmarks from LocalScore show roughly 15–20 tokens per second — faster than typical reading speed and fast enough for interactive chat. Prompt processing is slower than modern architectures because Pascal lacks Tensor Cores, but generation speed is workable for the homelab and development use case.

The VRAM ceiling is the real constraint.

Model Sizes on 8 GB VRAM

  Model    Quant    VRAM      Speed                   Fit
  1–3B     Q4_K_M   ~1–2 GB   30–60+ t/s              ✓ Fits easily
  7B–8B    Q4_K_M   ~4–5 GB   15–20 t/s               ✓ Sweet spot
  7B–8B    Q8_0     ~7–8 GB   7–12 t/s                ~ Tight on VRAM
  13B+     Q4_K_M   ~10+ GB   1–3 t/s (CPU offload)   ✗ Exceeds VRAM

Speed estimates based on LocalScore benchmarks · actual results vary by model and workload

A 13B model at Q4_K_M needs over 10 GB and won’t fit without CPU offloading, which drops performance to the point where the GPU isn’t doing much useful work. The practical ceiling is 7B–8B class models with sensible quantization.

Two small language models worth trying:

  • phi-4-mini runs at 3.8B with a 128K context window and performs well as long as you don't need broad world knowledge (great paired with RAG).
  • gemma-3n-e2b is an “effective 2B” model that, with a smaller 32K context window, does a surprisingly good job at text generation, summarization, and grammar correction.

For homelab use — testing ideas and running code review pipelines against local code — 15–20 t/s on hardware already in the rack is genuinely useful. The prior post covered why zero-friction local inference changes how you work with AI during the exploratory phase. GPU acceleration closes the performance gap enough that it no longer feels like a compromise.

 
One thing specific to Pascal architecture cards: OLLAMA_FLASH_ATTENTION=1 won’t help here. Flash Attention requires Ampere architecture or newer. Leave that environment variable off for GTX 1080 and earlier generations.

Quick Reference

For anyone landing here mid-troubleshoot:

Verify host GPU stack:

lspci | grep -i nvidia && nvidia-smi

Find the UVM major number (needed for legacy lxc.cgroup2 config):

grep nvidia-uvm /proc/devices

Test CUDA from inside the container:

import ctypes
cuda = ctypes.CDLL("libcuda.so.1")
print(cuda.cuInit(0))   # 0 = success, 803 = driver mismatch

Confirm Ollama is using the GPU:

ollama ps   # check the PROCESSOR column

Remove container-side NVIDIA libs (if getting error 803):

apt remove libcuda1 libnvidia-ml1 2>/dev/null
rm -f /lib/x86_64-linux-gnu/libcuda.so.1
ldconfig

Check the container sees GPU device nodes:

ls /dev/nvidia*

Enable persistence mode on the host: Create /etc/systemd/system/nvidia-persistence.service:

[Unit]
Description=Enable NVIDIA Persistence Mode
After=multi-user.target

[Service]
Type=oneshot
ExecStart=/usr/bin/nvidia-smi -pm 1
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target

Enable it with:

sudo systemctl enable --now nvidia-persistence.service

The whole setup is less complex than it sounds written out.

The IOMMU detour took the most time because the BIOS confidently reported VT-d enabled while the ACPI tables quietly disagreed. Once LXC was the path, the configuration was straightforward, with the one exception that the library mismatch error gives you nothing to go on.

Hopefully cuInit: 0 shows up quickly on your end.