Quantize Everything.
Bridge the gap between research papers and real-world deployment. Run state-of-the-art LLMs on consumer hardware.
Editor's Picks
Curated standout quant releases — click to view model details
- NEW · Qwen2.5 72B Instruct · GGUF Q4_K_M · 43.6 GB · bartowski · RTX 4090 ×2
- HOT · DeepSeek-R1-Distill-Qwen-14B · EXL2 4.65bpw · 9.8 GB · turboderp · RTX 4090
- NEW · Llama 3.3 70B Instruct · GGUF Q5_K_M · 50.1 GB · unsloth · A100 80G
- UPD · Mistral Small 24B Instruct · AWQ INT4 · 14.2 GB · city96 · RTX 3090
- HOT · Qwen2.5-Coder 32B Instruct · GGUF Q4_K_M · 22.0 GB · bartowski · RTX 4090
Format Heat Index
Community adoption · this week
- 1 · GGUF · 89% · +3%
- 2 · AWQ · 45% · +7%
- 3 · EXL2 · 32% · 0%
- 4 · GPTQ · 28% · -2%
- 5 · HQQ · 18% · +12%
Change shown vs last week
Editorial estimate based on Hugging Face GGUF download share and r/LocalLLaMA discussion volume — not a live feed
Quick Tools
VRAM Calculator
Precise memory requirements for any model × quant × context combination. Red/yellow/green hardware verdict.
CLI Generator
Generate ready-to-run llama.cpp, Ollama, vLLM, ExLlamaV2 commands. One-liner or Docker Compose.
Format Wizard
Answer 3 questions — get a personalised GGUF, AWQ, or EXL2 recommendation.
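As a rough illustration of the arithmetic behind a VRAM estimate like the one the VRAM Calculator reports, the sketch below adds the quantized weight size, a KV-cache term that grows with context length, and a fixed overhead, then maps the total to a red/yellow/green verdict. This is not the site's actual formula: the names `ModelShape`, `estimate_vram_gb` and `verdict`, the 1 GB overhead, the 90% yellow threshold, and the layer/head counts in the example are all assumptions made for illustration. Take real values from the model's config on Hugging Face.

```python
from dataclasses import dataclass


@dataclass
class ModelShape:
    """Architecture values needed for the KV-cache term (hypothetical container)."""
    n_layers: int
    n_kv_heads: int
    head_dim: int


def kv_cache_bytes(shape: ModelShape, context_len: int, bytes_per_elem: int = 2) -> int:
    # One K and one V tensor per layer, each n_kv_heads * head_dim wide per token.
    # bytes_per_elem=2 assumes an FP16 cache.
    return 2 * shape.n_layers * shape.n_kv_heads * shape.head_dim * context_len * bytes_per_elem


def estimate_vram_gb(weights_gb: float, shape: ModelShape, context_len: int,
                     overhead_gb: float = 1.0) -> float:
    # weights_gb is the quantized file size in decimal GB (1e9 bytes).
    # overhead_gb is an assumed allowance for runtime buffers, not a measured value.
    return weights_gb + kv_cache_bytes(shape, context_len) / 1e9 + overhead_gb


def verdict(required_gb: float, available_gb: float, yellow_from: float = 0.9) -> str:
    # red: does not fit; yellow: fits with under 10% headroom (assumed threshold).
    if required_gb > available_gb:
        return "red"
    if required_gb > available_gb * yellow_from:
        return "yellow"
    return "green"


if __name__ == "__main__":
    # Assumed shape for a 72B-class model; verify against the model's config.json.
    shape = ModelShape(n_layers=80, n_kv_heads=8, head_dim=128)
    need = estimate_vram_gb(weights_gb=43.6, shape=shape, context_len=8192)
    print(f"{need:.2f} GB needed -> {verdict(need, available_gb=48.0)}")
```

The KV-cache term is the only part that depends on context length, so doubling the context doubles that term while the weight size stays fixed.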
Format Intelligence Radar
Compare quantization formats across 6 key dimensions
Data Changelog
Last updated 2026-06-24
- 2026-06-24 · Expanded model index to 30+ entries; added format wizard, hardware profile, ExLlamaV2 CLI, SEO (sitemap/OG), data transparency
- 2026-06-24 · Model detail pages, GPU reverse lookup, shareable VRAM calculator URLs, real homepage stats
- 2025-06-10 · Initial launch: Quant Hub, VRAM calculator, CLI generator, benchmarks, cookbook
All data is manually curated and verified against community sources. Always cross-check with official Hugging Face model cards before deployment.