Back to Quant Hub

Mixtral 8x7B Instruct

47B MoE

Mistral AI

Classic MoE model. ~13B active params per token; needs 32GB+ VRAM for Q4.

Pro GPU

33K

Max Context

2

Quant Variants

GGUF Q4_K_M

Best Quality

97.2%

Accuracy Retained

Quantization Variants

Per-quant VRAM, quality loss, and inference speed on RTX 4090

FormatLevelBPWVRAMPPL LossSpeedActions
GGUFQ4_K_M4.8528.5 GB2.8%48 tok/s
AWQINT4425.2 GB3.8%62 tok/s