Qwen2.5 72B Instruct

72B

Alibaba Qwen2.5

Flagship model of the Qwen2.5 family. Requires dual RTX 4090s or a single A100 80 GB. Exceptional reasoning at scale.

3.0K HF downloads · 44 likes · Qwen/Qwen2.5-72B-Instruct-GGUF · stats from 6/24/2026
Pro GPU

Max Context: 131K

Quant Variants: 4

Best Quality: GGUF Q5_K_M

Accuracy Retained: 98.9%

Quantization Variants

Per-quant VRAM, quality loss, and inference speed on RTX 4090

Format  Level    BPW   VRAM     PPL Loss  Speed
GGUF    Q4_K_M   4.85  43.6 GB  2.5%      28 tok/s
GGUF    Q5_K_M   5.68  50.1 GB  1.1%      24 tok/s
AWQ     INT4     4     38.5 GB  3.5%      42 tok/s
EXL2    3.5bpw   3.5   33.8 GB  4.8%      48 tok/s
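
The BPW and VRAM columns above can be related with a back-of-the-envelope calculation. The sketch below is only an illustration, not how this page computes its figures: it assumes a nominal 72e9 parameters (taken from the "72B" label), counts weights only, and ignores KV cache and runtime overhead. The names `weight_gb` and `fits` are hypothetical helpers introduced here.

```python
# Rough weights-only memory estimate from bits per weight (BPW).
# Assumption: 72e9 parameters, taken from the "72B" label; the exact
# parameter count of the model may differ.
PARAMS = 72e9


def weight_gb(bpw: float, params: float = PARAMS) -> float:
    """Decimal gigabytes needed to hold the quantized weights alone."""
    return params * bpw / 8 / 1e9


def fits(listed_gb: float, gpu_gb: float, n_gpus: int = 1) -> bool:
    """True if the listed VRAM figure fits in the combined GPU memory.

    Simplification: treats memory across GPUs as one pool and ignores
    per-device overhead.
    """
    return listed_gb <= gpu_gb * n_gpus


# (format, level, BPW, listed VRAM in GB), copied from the table above
VARIANTS = [
    ("GGUF", "Q4_K_M", 4.85, 43.6),
    ("GGUF", "Q5_K_M", 5.68, 50.1),
    ("AWQ", "INT4", 4.0, 38.5),
    ("EXL2", "3.5bpw", 3.5, 33.8),
]

for fmt, level, bpw, listed in VARIANTS:
    est = weight_gb(bpw)
    print(
        f"{fmt:5s} {level:7s} est. weights {est:5.2f} GB | listed {listed:4.1f} GB | "
        f"2x24 GB: {fits(listed, 24, 2)} | 1x80 GB: {fits(listed, 80)}"
    )
```

Under these assumptions the Q4_K_M estimate (43.65 GB) lands close to the listed 43.6 GB, while the other rows differ by a GB or more, so the listed figures are probably not a pure weights-only calculation. Going by the listed VRAM alone, every variant except Q5_K_M fits within 48 GB (two 24 GB cards), and all four fit within 80 GB.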