Gemma 2 2B Instruct

Google Gemma 2

Ultra-compact Gemma 2 model. Runs on 4 GB of VRAM; well suited to edge prototyping.

⬇ 301.8K HF downloads · ♥ 98 likes · bartowski/gemma-2-2b-it-GGUF · stats from 6/24/2026

Consumer GPU · Mac / Apple Silicon · CPU / VPS

Max Context

Quant Variants

Best Quality: GGUF Q8_0

Accuracy Retained: 99.7%

Quantization Variants

Per-quant VRAM, quality loss, and inference speed on RTX 4090

Format	Level	Bits per Weight (BPW)	VRAM	Perplexity (PPL) Loss	Speed
GGUF	Q4_K_M	4.85	2.0 GB	3.8%	380 tok/s
GGUF	Q8_0	8.5	3.2 GB	0.3%	320 tok/s
AWQ	INT4	4	1.8 GB	5.0%	450 tok/s
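
The BPW and VRAM columns can be related with simple arithmetic: weight memory is roughly parameters × bits per weight ÷ 8. The sketch below applies that to the three rows and also picks the lowest-perplexity-loss variant that fits a given VRAM budget. The parameter count (about 2.6 billion) is an assumption that this page does not state, and the names Quant, weight_gb, and best_fit are hypothetical helpers for illustration only. The gap between the estimate and the listed VRAM is presumably runtime overhead such as the KV cache and activations, which this sketch does not model.

```python
from dataclasses import dataclass

# Assumption (not stated on this page): Gemma 2 2B has roughly 2.6e9 parameters.
ASSUMED_PARAMS = 2.6e9


@dataclass(frozen=True)
class Quant:
    fmt: str             # file/quantization format
    level: str           # quantization level
    bpw: float           # bits per weight, from the table
    vram_gb: float       # listed VRAM in GB, from the table
    ppl_loss_pct: float  # listed perplexity loss in percent, from the table
    tok_per_s: int       # listed speed on RTX 4090, from the table


# Rows copied from the table above.
QUANTS = [
    Quant("GGUF", "Q4_K_M", 4.85, 2.0, 3.8, 380),
    Quant("GGUF", "Q8_0", 8.5, 3.2, 0.3, 320),
    Quant("AWQ", "INT4", 4.0, 1.8, 5.0, 450),
]


def weight_gb(bpw, params=ASSUMED_PARAMS):
    """Weight-only footprint in GB (10^9 bytes): params * bits / 8."""
    return params * bpw / 8 / 1e9


def best_fit(budget_gb, quants=QUANTS):
    """Return the lowest-PPL-loss variant whose listed VRAM fits, else None."""
    fitting = [q for q in quants if q.vram_gb <= budget_gb]
    return min(fitting, key=lambda q: q.ppl_loss_pct) if fitting else None


if __name__ == "__main__":
    for q in QUANTS:
        w = weight_gb(q.bpw)
        print(f"{q.fmt} {q.level}: weights ~{w:.2f} GB, "
              f"listed {q.vram_gb} GB, remainder ~{q.vram_gb - w:.2f} GB")
    choice = best_fit(4.0)
    print("Best fit for 4 GB:", choice.level if choice else "none")
```

With a 4 GB budget every row fits, so the helper returns the Q8_0 variant, which matches the "Best Quality" entry above; tightening the budget to 2.5 GB would select Q4_K_M instead.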