Llama 3.1 8B Instruct
8B · Meta Llama 3.1
Meta's 8B instruction-tuned model with a 128K-token context window. Well suited to local deployment.
Runs on: Consumer GPU · Mac / Apple Silicon · CPU / VPS
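Whether the full context window fits on these targets depends heavily on KV-cache memory, which grows linearly with the number of tokens. The sketch below estimates fp16 KV-cache size, assuming the architecture values from the published Llama 3.1 8B config (32 layers, 8 key/value heads under grouped-query attention, head dimension 128). It ignores model weights and runtime buffers, and a quantized KV cache would need less.

```python
# Sketch: fp16 KV-cache size for Llama 3.1 8B at a given context length.
# Architecture values are assumed from the published model config.
N_LAYERS = 32
N_KV_HEADS = 8        # grouped-query attention: fewer K/V heads than query heads
HEAD_DIM = 128
BYTES_PER_ELEM = 2    # fp16; a quantized KV cache would use fewer bytes


def kv_cache_bytes(n_tokens: int) -> int:
    """Bytes needed to cache keys and values for n_tokens tokens."""
    # Factor 2 = one key tensor plus one value tensor per layer
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * BYTES_PER_ELEM
    return per_token * n_tokens


if __name__ == "__main__":
    for ctx in (8_192, 32_768, 131_072):
        gib = kv_cache_bytes(ctx) / 2**30
        print(f"{ctx:>7} tokens -> {gib:5.1f} GiB of KV cache")
```

Under these assumptions each token costs 128 KiB, so 8K, 32K, and 131,072 tokens need 1, 4, and 16 GiB of KV cache respectively, on top of the quantized weights.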
Max context: 131K tokens (131,072)
Quant variants: 6
Best quality: GGUF Q8_0
Accuracy retained: 99.9%
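Q8_0 keeps quality loss small because it uses fine-grained block scaling. A simplified sketch of that scheme, modeled on llama.cpp's reference approach, is shown below: each block of 32 weights shares one scale stored in half precision, chosen so the largest magnitude maps to 127, and every weight is rounded to a signed 8-bit integer. This is an illustration, not the library's code; for example, Python's `round` uses round-half-to-even, whereas the C reference rounds halves away from zero.

```python
import random
import struct

BLOCK_SIZE = 32  # Q8_0 groups weights into blocks of 32


def to_fp16(value: float) -> float:
    """Round a float to IEEE half precision, the format used for the block scale."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def quantize_block(weights: list[float]) -> tuple[float, list[int]]:
    """Quantize one block of 32 floats to (fp16 scale, 32 int8 values)."""
    assert len(weights) == BLOCK_SIZE
    amax = max(abs(w) for w in weights)
    d = amax / 127.0                  # scale maps the largest |w| to +/-127
    inv_d = 1.0 / d if d else 0.0     # an all-zero block quantizes to zeros
    qs = [round(w * inv_d) for w in weights]
    return to_fp16(d), qs             # 2-byte scale + 32 one-byte ints = 34 bytes


def dequantize_block(d: float, qs: list[int]) -> list[float]:
    """Reconstruct approximate weights from the scale and integer codes."""
    return [q * d for q in qs]


if __name__ == "__main__":
    random.seed(0)
    w = [random.gauss(0.0, 0.02) for _ in range(BLOCK_SIZE)]
    d, qs = quantize_block(w)
    max_err = max(abs(a - b) for a, b in zip(w, dequantize_block(d, qs)))
    print(f"scale={d:.6g}  max abs error={max_err:.2e}")
```

The per-weight reconstruction error stays near half a quantization step (d/2), and because each block of 32 gets its own scale, a single outlier only coarsens its own block rather than the whole tensor.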
Quantization Variants
Per-variant VRAM usage, quality loss, and inference speed, measured on an RTX 4090
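A large part of each variant's VRAM footprint is the weights themselves, roughly parameter count × bits per weight. The sketch below makes that estimate, assuming about 8.03 billion parameters and using F16 plus the fixed block layouts of Q8_0 and Q4_0 as examples (each block holds one fp16 scale and 32 quantized values). It assumes every tensor is stored at the same precision; real GGUF files may keep some tensors at other precisions, and runtime VRAM also includes the KV cache and compute buffers, so measured figures will be higher.

```python
# Rough weight-only memory estimate for a quantized model.
# Assumptions: ~8.03e9 parameters, all tensors at one bits-per-weight value.
N_PARAMS = 8.03e9

# Bits per weight from block layouts: (bytes per block * 8) / 32 weights
BITS_PER_WEIGHT = {
    "F16": 16.0,
    "Q8_0": (2 + 32) * 8 / 32,   # fp16 scale + 32 int8 values  -> 8.5
    "Q4_0": (2 + 16) * 8 / 32,   # fp16 scale + 32 4-bit values -> 4.5
}


def weight_gb(n_params: float, bpw: float) -> float:
    """Approximate weight storage in gigabytes (10**9 bytes)."""
    return n_params * bpw / 8 / 1e9


if __name__ == "__main__":
    for name, bpw in BITS_PER_WEIGHT.items():
        print(f"{name:5s} {bpw:4.1f} bpw -> ~{weight_gb(N_PARAMS, bpw):.1f} GB")
```

With these assumptions, Q8_0 weights come to roughly 8.5 GB and Q4_0 to roughly 4.5 GB, versus about 16 GB at F16.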