Back to Quant Hub

Mixtral 8x7B Instruct

47B MoE

Mistral AI

Classic MoE model. ~13B active params per token; needs 32GB+ VRAM for Q4.

Pro GPU

33K

Max Context

2

Quant Variants

GGUF Q4_K_M

Best Quality

97.2%

Accuracy Retained

Calculate VRAM Hugging Face Compare

Quantization Variants

Per-quant VRAM, quality loss, and inference speed on RTX 4090

Format	Level	BPW	VRAM	PPL Loss	Speed	Actions
GGUF	Q4_K_M	4.85	28.5 GB	2.8%	48 tok/s	Calc HF
AWQ	INT4	4	25.2 GB	3.8%	62 tok/s	Calc HF