Gemma 2 2B Instruct

2B

Google Gemma 2

Ultra-compact Gemma 2. Runs on 4GB VRAM; great for edge prototyping.

301.8K HF downloads · 98 likes · bartowski/gemma-2-2b-it-GGUF · stats from 6/24/2026
Consumer GPU · Mac / Apple Silicon · CPU / VPS

Max Context: 8K

Quant Variants: 3

Best Quality: GGUF Q8_0

Accuracy Retained: 99.7%

Quantization Variants

Per-quant VRAM, quality loss, and inference speed on RTX 4090

Format | Level  | BPW  | VRAM   | PPL Loss | Speed
GGUF   | Q4_K_M | 4.85 | 2.0 GB | 3.8%     | 380 tok/s
GGUF   | Q8_0   | 8.5  | 3.2 GB | 0.3%     | 320 tok/s
AWQ    | INT4   | 4    | 1.8 GB | 5.0%     | 450 tok/s
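
The BPW and VRAM columns can be related with a rough calculation: weight size is approximately parameter count × bits per weight ÷ 8. The sketch below is illustrative only. It assumes about 2.6 billion parameters for Gemma 2 2B, a figure not stated on this page, and the helper names `weight_gb` and `best_variant` are hypothetical. The listed VRAM figures come out higher than the weight-only estimates; the difference is presumably KV cache and runtime overhead.

```python
# Assumption: ~2.6e9 parameters for Gemma 2 2B (not stated on this page).
ASSUMED_PARAMS = 2.6e9

# (format, level, bits per weight, listed VRAM in GB, PPL loss in %),
# copied from the table above.
VARIANTS = [
    ("GGUF", "Q4_K_M", 4.85, 2.0, 3.8),
    ("GGUF", "Q8_0", 8.5, 3.2, 0.3),
    ("AWQ", "INT4", 4.0, 1.8, 5.0),
]


def weight_gb(bpw, params=ASSUMED_PARAMS):
    """Approximate size of the weights alone, in decimal GB."""
    return params * bpw / 8 / 1e9


def best_variant(vram_budget_gb, variants=VARIANTS):
    """Return the lowest-PPL-loss variant whose listed VRAM fits, or None."""
    fitting = [v for v in variants if v[3] <= vram_budget_gb]
    return min(fitting, key=lambda v: v[4]) if fitting else None


if __name__ == "__main__":
    for fmt, level, bpw, vram, loss in VARIANTS:
        print(f"{fmt} {level}: ~{weight_gb(bpw):.2f} GB weights, "
              f"{vram} GB listed, {loss}% PPL loss")
    print("Pick for a 4 GB card:", best_variant(4.0))
```

With a 4 GB budget, `best_variant(4.0)` selects Q8_0, which matches the "Best Quality" entry and the 4 GB VRAM claim above. On a tighter budget such as 2.5 GB it falls back to Q4_K_M.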