Beginner · Edge / Local · 8 min read

8GB GPU Starter Guide: 3060 / 4060 / 3070

The most common local LLM hardware tier — which models, quants, and context lengths actually fit in 8GB VRAM.

8GB VRAM · RTX 3060 · RTX 4060 · GGUF · Ollama

What fits comfortably

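At 4K context, 8GB cards handle 7–8B models at Q4_K_M, though at ~7.4–7.7 GB they use most of the card. Smaller models, such as a 3B at Q8, fit easily. A 14B model at Q3 is only worth trying if you lower context to 2K, and even then it is tight. Avoid 32B+ entirely on 8GB.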

```text
✓ Qwen2.5 7B Q4_K_M       → ~7.4 GB @ 4K ctx
✓ Llama 3.1 8B Q4_K_M     → ~7.7 GB @ 4K ctx
✓ Phi-4 Mini Q4_K_M       → ~4.8 GB @ 4K ctx
△ DeepSeek-R1 14B Q3_K_M  → ~9.5 GB @ 2K ctx (tight; expect partial CPU offload)
✗ Qwen2.5 32B any quant   → needs 16GB+
```
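
To see why context length matters so much, here is a rough back-of-envelope sketch in Python. It assumes the footprint is quantized weights plus an fp16 KV cache and ignores runtime overhead (CUDA context, compute buffers), so treat its output as a lower bound rather than a prediction of the figures above. The bits-per-weight values are approximate averages (K-quants mix block types), and the helper names are hypothetical, not part of Ollama or llama.cpp.

```python
# Rough VRAM floor for a GGUF model: quantized weights + fp16 KV cache.
# Ignores runtime overhead (CUDA context, compute buffers), so real usage is higher.

GIB = 1024 ** 3

# Approximate average bits per weight (assumed ballpark values, not exact file sizes).
BITS_PER_WEIGHT = {
    "Q3_K_M": 3.9,
    "Q4_K_M": 4.85,
    "Q8_0": 8.5,
}


def weights_gib(params_billion: float, quant: str) -> float:
    """Approximate size of the quantized weights in GiB."""
    return params_billion * 1e9 * BITS_PER_WEIGHT[quant] / 8 / GIB


def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 ctx_tokens: int, bytes_per_elem: int = 2) -> float:
    """KV cache: one K and one V vector per layer, per KV head, per token."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * ctx_tokens / GIB


def vram_floor_gib(params_billion: float, quant: str, n_layers: int,
                   n_kv_heads: int, head_dim: int, ctx_tokens: int) -> float:
    """Weights plus KV cache, with no runtime overhead included."""
    return weights_gib(params_billion, quant) + kv_cache_gib(
        n_layers, n_kv_heads, head_dim, ctx_tokens)


if __name__ == "__main__":
    # Llama 3.1 8B architecture: 32 layers, 8 KV heads, head_dim 128.
    for ctx in (2048, 4096, 8192):
        total = vram_floor_gib(8.0, "Q4_K_M", 32, 8, 128, ctx)
        print(f"Llama 3.1 8B Q4_K_M @ {ctx:>5} ctx: >= {total:.2f} GiB")
```

The weights term is fixed once you pick a quant, while the KV term grows linearly with context. Halving context from 4K to 2K halves the cache, which is the headroom the 14B row depends on. For Qwen2.5 7B, swap in 7.6B parameters, 28 layers, and 4 KV heads.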

Quick start with Ollama

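Ollama's default tags (for example qwen2.5:7b) pull a fixed 4-bit build, typically Q4_K_M. Ollama does not pick a quant for your card, but it does automatically offload layers to the CPU when a model doesn't fit in VRAM. Start small, verify speed, then try a larger model. Use /api/ps or ollama ps to check live VRAM usage. If part of the model is running on the CPU, generation slows down noticeably.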

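```bash
ollama pull qwen2.5:7b
ollama run qwen2.5:7b   # interactive chat; type /bye to exit

# Check loaded models and VRAM (compare size with size_vram)
curl http://localhost:11434/api/ps

# Same info as a table; the PROCESSOR column shows the GPU/CPU split
ollama ps
```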
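
To script the "verify speed" step, here is a minimal Python sketch using only the standard library. It assumes a local Ollama server on the default port 11434. It sends one non-streaming /api/generate request, computes tokens per second from eval_count and eval_duration (reported in nanoseconds), then reads /api/ps and reports what fraction of each loaded model sits in VRAM. The model tag, prompt, and helper names are illustrative. num_ctx is Ollama's context-length option, set here to the 4K used in the table.

```python
import json
import urllib.error
import urllib.request

OLLAMA = "http://localhost:11434"  # default Ollama address (assumed local install)


def post_json(path: str, payload: dict) -> dict:
    req = urllib.request.Request(
        OLLAMA + path,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    # Generous timeout: the first request may have to load the model.
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)


def get_json(path: str) -> dict:
    with urllib.request.urlopen(OLLAMA + path, timeout=10) as resp:
        return json.load(resp)


def tokens_per_second(gen: dict) -> float:
    # eval_count = generated tokens; eval_duration is in nanoseconds.
    return gen["eval_count"] / (gen["eval_duration"] / 1e9)


def gpu_share(entry: dict) -> float:
    # Fraction of the loaded model held in VRAM; below 1.0 means CPU offload.
    return entry["size_vram"] / entry["size"] if entry["size"] else 0.0


def main() -> None:
    try:
        gen = post_json("/api/generate", {
            "model": "qwen2.5:7b",
            "prompt": "Explain a KV cache in one sentence.",
            "stream": False,
            "options": {"num_ctx": 4096},
        })
        print(f"speed: {tokens_per_second(gen):.1f} tok/s")
        for m in get_json("/api/ps")["models"]:
            print(f"{m['name']}: {m['size'] / 1e9:.1f} GB total, "
                  f"{gpu_share(m):.0%} on GPU")
    except urllib.error.URLError as exc:
        print(f"Could not reach Ollama at {OLLAMA}: {exc.reason}")


if __name__ == "__main__":
    main()
```

Run it once per model you try. If the GPU share drops below 100% after you step up a size, the drop in tokens per second shows whether the larger model is worth the offload.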
Deployment guides are educational. Each model is subject to its own license — read the official Hugging Face model card before downloading or deploying.