Self-Hosted AI Stack on Rocky Linux: Ollama, Open WebUI, and GPU Monitoring from Bare Metal to Production
Stop cobbling together a setup from outdated blog posts and half-finished YouTube tutorials. Most "self-host Ollama" guides get you to a running container and stop. They don't cover GPU passthrough in a hypervisor, NGINX reverse proxying with TLS, Prometheus/Grafana monitoring for GPU thermals and inference latency, or what to do when nvidia-smi shows your VRAM but Ollama can't see it.
This guide does.
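As a taste of the troubleshooting style, here is a minimal first-pass diagnostic for that last failure mode. It is a sketch, not the guide's full procedure: it assumes an NVIDIA card and the stock `ollama` systemd unit name, so adjust to your install.

```bash
# Is the driver healthy and the GPU visible to the host at all?
nvidia-smi

# Ollama logs at startup whether it found a usable GPU; scan for CUDA/GPU lines
# (assumes the default systemd unit name "ollama")
journalctl -u ollama --no-pager | grep -iE 'cuda|gpu|vram'

# Common culprit: CUDA_VISIBLE_DEVICES set to "" or a stale index in a
# systemd override hides the card from Ollama while nvidia-smi still works
systemctl cat ollama | grep -i cuda_visible_devices

# Watch VRAM while a model loads; if usage never moves, Ollama fell back to CPU
watch -n 1 nvidia-smi
```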
What you'll have at the end: A production-ready AI stack on Rocky Linux 9, with Ollama serving local LLMs on the GPU, Open WebUI behind NGINX with TLS, and a complete monitoring stack (Prometheus, Grafana, NVIDIA GPU exporter) with a pre-built dashboard. Plus fail2ban, automated backups, and log rotation so the logs don't quietly eat your disk.
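To make "behind NGINX with TLS" concrete, here is a minimal sketch of the reverse-proxy piece, not the full hardened config from the guide. It assumes Open WebUI is listening on 127.0.0.1:8080, that a Let's Encrypt certificate already exists, and uses ai.example.com as a placeholder hostname.

```bash
# Minimal reverse-proxy config for Open WebUI (sketch only; a production
# config also adds an HTTP->HTTPS redirect and hardened TLS settings)
cat > /etc/nginx/conf.d/open-webui.conf <<'EOF'
server {
    listen 443 ssl;
    server_name ai.example.com;  # placeholder hostname

    ssl_certificate     /etc/letsencrypt/live/ai.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/ai.example.com/privkey.pem;

    location / {
        proxy_pass http://127.0.0.1:8080;  # assumes Open WebUI's port mapping
        # Open WebUI streams chat over WebSockets, so upgrade headers are required
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
EOF
nginx -t && systemctl reload nginx
```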
What's inside:
- 7 chapters + appendix (~2,200 lines of markdown)
- Chapter-by-chapter walkthrough from GPU driver installation through monitoring and hardening (a sample Prometheus scrape job follows this list)
- Architecture decisions explained — why each component, why this order, what the trade-offs are
- A full troubleshooting chapter based on real failures, not theoretical ones
- Quick-reference appendix: every file path, port, systemd unit, and common command in one place
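And to show the register of the monitoring material, a sketch of wiring a GPU exporter into Prometheus. The exporter choice and ports here are assumptions: utkuozdemir's nvidia_gpu_exporter defaults to :9835, while dcgm-exporter uses :9400, and the config path assumes the common /etc/prometheus layout.

```bash
# Append a GPU scrape job; this only works if scrape_configs: is the last
# top-level block in prometheus.yml, otherwise edit the file by hand
cat >> /etc/prometheus/prometheus.yml <<'EOF'
  - job_name: 'gpu'
    static_configs:
      - targets: ['localhost:9835']  # nvidia_gpu_exporter default (assumed)
EOF

# Validate the merged config before restarting
promtool check config /etc/prometheus/prometheus.yml && systemctl restart prometheus
```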
Who this is for: Intermediate sysadmins and home lab enthusiasts who know their way around Linux but want an opinionated, end-to-end guide that actually gets them to production — not just "installed."