Qwen2.5 32B Instruct

32B

Alibaba Qwen2.5

Near-GPT-4 reasoning on a 24GB VRAM card (Q4_K_S). Groundbreaking value.

32.5K HF downloads · 45 likes · Qwen/Qwen2.5-32B-Instruct-GGUF · stats from 6/24/2026
Consumer GPU · Pro GPU

Max Context: 131K
Quant Variants: 3
Best Quality: GGUF Q4_K_M
Accuracy Retained: 97.3%

Quantization Variants

Per-quant VRAM, quality loss, and inference speed on RTX 4090

Format  Level    BPW   VRAM     PPL Loss  Speed
GGUF    Q3_K_M   3.87  17.5 GB  7.8%      52 tok/s
GGUF    Q4_K_M   4.85  22.0 GB  2.7%      44 tok/s
EXL2    3.5bpw   3.5   16.4 GB  4.8%      68 tok/s
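
The BPW and VRAM columns are linked by simple arithmetic: weight storage is roughly parameters × bits per weight ÷ 8. The sketch below applies that to the three rows above. It assumes a nominal 32 billion parameters (taken from the "32B" label, not an exact count) and decimal gigabytes; the page does not say how the VRAM figures were measured, so the "remainder" is only an indication of what is left for KV cache, activations, and runtime overhead. The names `weight_gb` and `estimate` are illustrative.

```python
# Rough weight-memory estimate from bits per weight (BPW).
# Assumption: nominal parameter count from the page's "32B" label.
PARAMS = 32e9

# (format, level, bpw, reported_vram_gb), copied from the table above
VARIANTS = [
    ("GGUF", "Q3_K_M", 3.87, 17.5),
    ("GGUF", "Q4_K_M", 4.85, 22.0),
    ("EXL2", "3.5bpw", 3.5, 16.4),
]


def weight_gb(params: float, bpw: float) -> float:
    """Weight storage in decimal GB: params * bits / 8 bits per byte / 1e9."""
    return params * bpw / 8 / 1e9


def estimate(variants=VARIANTS, params=PARAMS):
    """Return (format, level, weight_gb, reported_vram_minus_weights_gb) per row."""
    rows = []
    for fmt, level, bpw, vram in variants:
        w = weight_gb(params, bpw)
        rows.append((fmt, level, round(w, 2), round(vram - w, 2)))
    return rows


if __name__ == "__main__":
    for fmt, level, w, rest in estimate():
        print(f"{fmt} {level}: ~{w} GB weights, ~{rest} GB remainder")
```

Under these assumptions the weights alone account for most of each reported VRAM figure, which is why the Q4_K_M row sits close to the limit of a 24 GB card.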