
DeepSeek-V2-Lite Chat

16B

DeepSeek

Mixture-of-experts (MoE) general-purpose model with ~2.4B parameters active per token. Long context window and strong multilingual chat.
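A mixture-of-experts layer sends each token through only a few of its expert sub-networks. That is why only ~2.4B of the 16B parameters are active per token. The toy sketch below shows generic top-k routing in plain Python. The expert count, k, router logits, and the scalar "experts" are all hypothetical, and DeepSeek's actual MoE design differs in its details.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k):
    """Pick the k highest-probability experts and renormalize their weights to sum to 1."""
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def moe_forward(x, experts, router_logits, k):
    """Weighted sum of the k routed experts' outputs; unselected experts are never called."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))


if __name__ == "__main__":
    calls = []

    def make_expert(idx):
        def expert(x):
            calls.append(idx)      # record which experts actually ran
            return (idx + 1) * x   # toy "expert": just scales its input
        return expert

    experts = [make_expert(i) for i in range(8)]  # hypothetical: 8 experts
    logits = [0.1, 2.0, -1.0, 1.5, 0.0, -0.5, 0.3, 0.2]
    y = moe_forward(1.0, experts, logits, k=2)
    print(f"output={y:.3f}, experts run={sorted(calls)} of {len(experts)}")
```

Compute per token scales with the experts that run. Every expert still has to be resident in memory, though, which is why the VRAM figures track the full 16B rather than the active 2.4B.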

Consumer GPU | Mac / Apple Silicon

Max Context: 164K

Quant Variants: 2

Best Quality: GGUF Q4_K_M

Accuracy Retained: 97.0%
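The 97.0% figure matches 100% minus the Q4_K_M variant's 3.0% perplexity (PPL) loss. The sketch below therefore assumes that is how it is derived, although the hub does not state this. The baseline and quantized perplexities in the demo are hypothetical numbers chosen to reproduce a 3.0% loss. The sketch also assumes "best quality" means the variant with the smallest PPL loss.

```python
def ppl_loss_pct(ppl_baseline, ppl_quant):
    """Relative perplexity increase of a quantized model over its baseline, in percent."""
    return (ppl_quant - ppl_baseline) / ppl_baseline * 100.0


def retained_pct(loss_pct):
    """Assumed 'retained' metric: 100 minus the relative PPL loss."""
    return 100.0 - loss_pct


# PPL loss per variant, taken from the Quantization Variants table.
VARIANTS = {"GGUF Q4_K_M": 3.0, "AWQ INT4": 4.0}


def best_quality(variants):
    """Variant with the smallest perplexity loss."""
    return min(variants, key=variants.get)


if __name__ == "__main__":
    # Hypothetical perplexities, not measured values.
    loss = ppl_loss_pct(6.00, 6.18)
    print(f"PPL loss {loss:.1f}% -> retained {retained_pct(loss):.1f}%")
    print("best quality:", best_quality(VARIANTS))
```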

Quantization Variants

Per-variant VRAM use, perplexity (PPL) loss, and inference speed on an RTX 4090

Format  Level   BPW   VRAM     PPL Loss  Speed
GGUF    Q4_K_M  4.85  11.0 GB  3.0%      142 tok/s
AWQ     INT4    4     9.6 GB   4.0%      188 tok/s
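Weight storage scales with bits per weight (BPW), roughly parameters × BPW / 8 bytes. The sketch below applies that rule to the table. It assumes the card's rounded 16B parameter count and decimal gigabytes (1 GB = 1e9 bytes). Under those assumptions, the gap between the estimate and the reported VRAM would cover KV cache, activations, and runtime buffers. For AWQ, it may also cover per-group scales that a nominal 4 BPW leaves out. The hub does not break its VRAM figures down, so this split is an assumption.

```python
# Values from the Quantization Variants table; 16e9 is the card's rounded parameter count.
N_PARAMS = 16e9
VARIANTS = {
    # name: (bits per weight, reported VRAM in GB, reported tok/s on RTX 4090)
    "GGUF Q4_K_M": (4.85, 11.0, 142),
    "AWQ INT4": (4.0, 9.6, 188),
}


def weight_gb(n_params, bpw):
    """Approximate weight storage in decimal GB: params * bits per weight / 8 bytes."""
    return n_params * bpw / 8 / 1e9


if __name__ == "__main__":
    for name, (bpw, vram_gb, tok_s) in VARIANTS.items():
        w = weight_gb(N_PARAMS, bpw)
        # Whatever the reported VRAM holds beyond the weights themselves.
        print(f"{name}: weights ~{w:.1f} GB, other ~{vram_gb - w:.1f} GB, {tok_s} tok/s")
    speedup = VARIANTS["AWQ INT4"][2] / VARIANTS["GGUF Q4_K_M"][2]
    print(f"AWQ INT4 speed vs Q4_K_M: {speedup:.2f}x")
```

Under these assumptions, the estimates come to about 9.7 GB and 8.0 GB of weights, leaving roughly 1.3 GB and 1.6 GB for everything else.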