GPU-Enabled Platforms on Kubernetes
In Linux, user-space applications can't interact with hardware directly: every interaction goes through the kernel via system calls. GPUs break this model -- once a process has a device handle, work is submitted largely outside the kernel's view, the GPU manages its own memory, and none of the isolation mechanisms containers rely on apply to it.
This ebook starts from first principles -- how containers actually work at the kernel level -- and builds up to why GPU multi-tenancy is fundamentally harder than anything else in Kubernetes.

Get your free copy
What's Inside
Six parts that take you from Linux kernel fundamentals to production GPU platform architecture:
- Foundations -- How GPUs Meet Kubernetes: How containers work through syscalls, cgroups, and namespaces. Why GPUs break every assumption about resource isolation. How device plugins bridge GPUs into Kubernetes.
- Why GPU Multi-Tenancy Is Hard: The trust problem -- when two teams share a GPU, one can crash the other's workloads. CUDA memory isn't isolated. There's no cgroup for GPU compute.
- Orchestrating GPU Sharing: Time-slicing, MPS, and how Kubernetes manages turn-taking on GPU hardware. What happens when two pods try to use the same GPU simultaneously.
- Hardware Isolation and Enforcement: MIG (Multi-Instance GPU), HAMi, and the trade-offs between software-level and hardware-level isolation. Why MIG profiles can't be changed without draining the node.
- Monitoring GPU Clusters: Why nvidia-smi shows 87% utilisation when you're doing almost nothing. The metrics that actually matter: SM activity, tensor core utilisation, memory bandwidth.
- Multi-Tenant GPU Platforms with vCluster: Architecting GPU infrastructure with virtual Kubernetes clusters for isolation and efficiency.
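To ground the device-plugin bridge mentioned above: once the NVIDIA device plugin is installed, GPUs appear as an extended resource that pods request like CPU or memory. A minimal sketch (pod name and image tag are illustrative; the `nvidia.com/gpu` resource name is the one the standard NVIDIA device plugin advertises):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.1-base-ubuntu22.04   # illustrative tag
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1   # one whole GPU via the device plugin
```

Note that this requests a whole GPU -- Kubernetes has no native way to ask for a fraction of one, which is exactly the gap the sharing and isolation chapters address.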
Who This Is For
This ebook is for platform engineers building internal GPU platforms and infrastructure teams running AI/ML workloads on Kubernetes. If you need to give multiple teams access to expensive GPU hardware without them stepping on each other, this is for you.