
Llama 3.2 3B Instruct

3B

Meta Llama 3.2

Tiny but capable. Runs in 4 GB of VRAM or 8 GB of system RAM, and even on phones via llama.cpp.

228.7K Hugging Face downloads · 215 likes · bartowski/Llama-3.2-3B-Instruct-GGUF · stats from 6/24/2026

Consumer GPU · Mac / Apple Silicon · CPU / VPS

Max Context: 131K

Quant Variants: 3

Best Quality: GGUF Q8_0

Accuracy Retained: 99.8%


Quantization Variants

Per-quant VRAM, quality loss, and inference speed on RTX 4090

Format	Level	BPW	VRAM	PPL Loss	Speed
GGUF	Q4_K_M	4.85	2.2 GB	3.5%	320 tok/s
GGUF	Q8_0	8.5	3.6 GB	0.2%	285 tok/s
AWQ	INT4	4	2.0 GB	4.8%	420 tok/s
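
As a rough sanity check on the VRAM column, the size of the quantized weights alone can be estimated as parameters × bits per weight ÷ 8. The sketch below applies that to the BPW values in the table above. The parameter count `PARAMS` is an assumption (about 3.2 billion for a "3B" model; the page does not state it), and the helper names `weight_gb` and `estimates` are hypothetical, chosen only for illustration.

```python
# Weights-only memory estimate: parameters * bits-per-weight / 8 bytes.
# ASSUMPTION: about 3.2 billion parameters; the page only says "3B".
PARAMS = 3.2e9

# (format, level) -> bits per weight, copied from the table above.
BPW = {
    ("GGUF", "Q4_K_M"): 4.85,
    ("GGUF", "Q8_0"): 8.5,
    ("AWQ", "INT4"): 4.0,
}


def weight_gb(params: float, bpw: float) -> float:
    """Size of the quantized weights in GB (10^9 bytes), excluding any runtime overhead."""
    return params * bpw / 8 / 1e9


def estimates(params: float = PARAMS) -> dict:
    """Weights-only estimate for every quant variant listed in BPW."""
    return {key: round(weight_gb(params, bpw), 2) for key, bpw in BPW.items()}


if __name__ == "__main__":
    for (fmt, level), gb in estimates().items():
        print(f"{fmt} {level}: ~{gb} GB of weights")
```

Under that assumption the weights come to roughly 1.9 GB (Q4_K_M), 3.4 GB (Q8_0), and 1.6 GB (AWQ INT4). The table's VRAM figures are 0.2 to 0.4 GB higher, which would be consistent with extra memory for the KV cache and runtime buffers, though the page does not say how its numbers were measured.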