
Llama 3.2 3B Instruct

3B

Meta Llama 3.2

Tiny but capable. Runs on 4GB VRAM or 8GB RAM, even on phones via llama.cpp.

228.7K HF downloads · 215 likes · bartowski/Llama-3.2-3B-Instruct-GGUF · stats from 6/24/2026
Consumer GPU · Mac / Apple Silicon · CPU / VPS

Max Context: 131K

Quant Variants: 3

Best Quality: GGUF Q8_0

Accuracy Retained: 99.8%
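The 131K maximum context comes with a memory cost of its own, separate from the quantized weights. The sketch below estimates the key/value cache size at a few context lengths. The layer count, KV-head count, and head dimension are assumptions not stated on this page (check the model's config.json), as is the 16-bit cache precision.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_tokens, bytes_per_value=2):
    """Size of the KV cache: one key and one value vector per layer, per KV head, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_value


# Assumed architecture values (not given on this page); verify against config.json.
N_LAYERS, N_KV_HEADS, HEAD_DIM = 28, 8, 128

for n_tokens in (4_096, 32_768, 131_072):
    gib = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, n_tokens) / 2**30
    print(f"{n_tokens:>7} tokens: {gib:.2f} GiB of 16-bit KV cache")
```

Under these assumptions the cache grows linearly with context, so the small-memory figures quoted for this model are best read as applying to short contexts rather than the full window.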

Quantization Variants

Per-quant VRAM, quality loss, and inference speed on RTX 4090

Format  Level   BPW   VRAM    PPL Loss  Speed
GGUF    Q4_K_M  4.85  2.2 GB  3.5%      320 tok/s
GGUF    Q8_0    8.5   3.6 GB  0.2%      285 tok/s
AWQ     INT4    4     2.0 GB  4.8%      420 tok/s
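As a rough sanity check on the table above, weight storage can be approximated as parameters × bits per weight ÷ 8. The sketch below applies that to each row. The parameter count is a nominal 3.0 billion taken from the "3B" label (the exact count is an assumption), and reading the listed VRAM as decimal gigabytes is also an assumption.

```python
# Nominal parameter count from the "3B" label; the exact figure is an assumption.
PARAMS = 3.0e9

# (format, level, bits per weight, listed VRAM in GB), copied from the table.
VARIANTS = [
    ("GGUF", "Q4_K_M", 4.85, 2.2),
    ("GGUF", "Q8_0", 8.5, 3.6),
    ("AWQ", "INT4", 4.0, 2.0),
]


def weight_gb(params: float, bpw: float) -> float:
    """Approximate weight storage in decimal gigabytes: params * bits / 8 bytes."""
    return params * bpw / 8 / 1e9


for fmt, level, bpw, listed in VARIANTS:
    est = weight_gb(PARAMS, bpw)
    # Whatever the listed VRAM exceeds the weights by is taken to be
    # KV cache, activations, and runtime buffers.
    print(f"{fmt} {level}: ~{est:.2f} GB weights, {listed - est:.2f} GB remaining")
```

In each row the estimated weights come in below the listed VRAM, which fits the listed figure including some runtime overhead on top of the weights.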