Open Source · Edge Deployment · Geek-First

Quantize Everything.

Bridge the gap between research papers and real-world deployment. Run state-of-the-art LLMs on consumer hardware.

GGUF · AWQ · EXL2 · GPTQ · HQQ · & more

30 Models Indexed

5 Formats Tracked

33 GPUs in Database

98.3% Avg Accuracy Retained

Editor's Picks

Curated standout quant releases — click to view model details

Community adoption · this week vs last week

Editorial estimate based on Hugging Face GGUF download share and r/LocalLLaMA discussion volume — not a live feed

Precise memory requirements for any model × quant × context combination. Red/yellow/green hardware verdict.
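As a rough illustration of what such an estimate involves, the sketch below adds quantized weight memory to KV-cache memory and maps the total to a traffic-light verdict. It is not the site's actual calculator: the names `ModelShape`, `estimate_bytes`, and `verdict`, the 90% headroom threshold, and the example model dimensions are all assumptions made for this example, and real runtimes add overhead (activations, compute buffers) that is only modelled here as an optional constant.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class ModelShape:
    """Hypothetical container for the numbers the estimate needs."""
    n_params: float   # total parameter count
    n_layers: int     # transformer layers
    n_kv_heads: int   # key/value heads (fewer than query heads under GQA)
    head_dim: int     # dimension per attention head


def estimate_bytes(shape, bits_per_weight, context_len,
                   kv_bytes_per_elem=2, overhead_bytes=0):
    """Approximate memory: quantized weights + KV cache + fixed overhead."""
    weight_bytes = shape.n_params * bits_per_weight / 8
    # Keys and values (factor 2) are cached per layer, per KV head, per token.
    kv_bytes = (2 * shape.n_layers * shape.n_kv_heads * shape.head_dim
                * context_len * kv_bytes_per_elem)
    return weight_bytes + kv_bytes + overhead_bytes


def verdict(required_bytes, vram_bytes, headroom=0.9):
    """Assumed thresholds: green within headroom, yellow if it barely fits."""
    if required_bytes <= headroom * vram_bytes:
        return "green"
    if required_bytes <= vram_bytes:
        return "yellow"
    return "red"


# Illustrative shape only; not taken from any specific model card.
shape = ModelShape(n_params=8e9, n_layers=32, n_kv_heads=8, head_dim=128)
need = estimate_bytes(shape, bits_per_weight=4.0, context_len=8192)
for vram_gib in (4, 5, 8):
    print(vram_gib, "GiB:", verdict(need, vram_gib * GIB))
```

Effective bits per weight vary by format and quant level, so the `bits_per_weight` value should come from the specific release rather than its nominal name.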

Generate ready-to-run llama.cpp, Ollama, vLLM, ExLlamaV2 commands. One-liner or Docker Compose.
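A minimal sketch of how a one-liner for llama.cpp might be assembled is shown below. The helper name `llama_cpp_command`, its defaults, and the example model path are hypothetical; the flags used (`-m`, `-c`, `-ngl`, `-p`) are common `llama-cli` options, but flag sets change between builds, so check `llama-cli --help` for your version before relying on the output.

```python
import shlex


def llama_cpp_command(model_path, context_len=4096, gpu_layers=99, prompt=None):
    """Build a shell-safe llama-cli one-liner (hypothetical helper)."""
    args = [
        "llama-cli",
        "-m", model_path,          # path to the GGUF file
        "-c", str(context_len),    # context size in tokens
        "-ngl", str(gpu_layers),   # number of layers to offload to the GPU
    ]
    if prompt is not None:
        args += ["-p", prompt]     # initial prompt
    # shlex.join quotes any argument containing spaces or shell metacharacters.
    return shlex.join(args)


# The model path is a placeholder, not a real release.
print(llama_cpp_command("models/example.Q4_K_M.gguf", context_len=8192,
                        prompt="Hello there"))
```

Building the command as an argument list and quoting it at the end keeps prompts with spaces or quotes from breaking the generated line.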

Answer 3 questions — get a personalised GGUF, AWQ, or EXL2 recommendation.

Compare quantization formats across 6 key dimensions

Last updated 2026-06-24

2026-06-24 Expanded model index to 30+ entries; added format wizard, hardware profile, ExLlamaV2 CLI, SEO (sitemap/OG), data transparency
2026-06-24 Model detail pages, GPU reverse lookup, shareable VRAM calculator URLs, real homepage stats
2025-06-10 Initial launch: Quant Hub, VRAM calculator, CLI generator, benchmarks, cookbook

All data is manually curated and verified against community sources. Always cross-check with official Hugging Face model cards before deployment.