
Llama 3.1 70B Instruct

70B

Meta Llama 3.1

Meta's frontier 70B model. Requires 40 GB+ of VRAM, for example dual RTX 3090s or an M2 Ultra.

15.6K HF downloads · 71 likes · bartowski/Meta-Llama-3.1-70B-Instruct-GGUF · stats from 6/24/2026
Pro GPU · Mac / Apple Silicon

Max Context: 131K tokens
Quant Variants: 4
Best Quality: GGUF Q5_K_M
Accuracy Retained: 98.8%

Quantization Variants

Per-quant VRAM, quality loss, and inference speed on RTX 4090

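| Format | Level  | BPW  | VRAM    | PPL Loss | Speed    |
|--------|--------|------|---------|----------|----------|
| GGUF   | Q4_K_M | 4.85 | 43.5 GB | 2.8%     | 38 tok/s |
| GGUF   | Q5_K_M | 5.68 | 49.8 GB | 1.2%     | 32 tok/s |
| AWQ    | INT4   | 4    | 38.2 GB | 3.9%     | 55 tok/s |
| EXL2   | 3.5bpw | 3.5  | 33.4 GB | 5.2%     | 62 tok/s |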
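
The BPW column can be related to the VRAM column with a quick weight-only estimate: parameters × bits per weight ÷ 8 gives bytes. The sketch below is a sanity check and not the method the page used. It assumes a nominal 70e9 parameters and 1 GB = 1e9 bytes, and the names `PARAMS`, `VARIANTS`, `weight_gb`, and `estimates` are illustrative only. The gap between the estimate and the listed VRAM is presumably KV cache and runtime overhead, though the page does not say how VRAM was measured.

```python
# Weight-only memory estimate from bits per weight (BPW).
# Assumptions: nominal 70e9 parameters, 1 GB = 1e9 bytes.
PARAMS = 70e9

# (format, level, BPW, listed VRAM in GB), copied from the table above.
VARIANTS = [
    ("GGUF", "Q4_K_M", 4.85, 43.5),
    ("GGUF", "Q5_K_M", 5.68, 49.8),
    ("AWQ", "INT4", 4.0, 38.2),
    ("EXL2", "3.5bpw", 3.5, 33.4),
]


def weight_gb(params: float, bpw: float) -> float:
    """Return the size of the quantized weights alone, in GB."""
    return params * bpw / 8 / 1e9


def estimates():
    """Return (format, level, estimated weight GB, listed minus estimate) rows."""
    rows = []
    for fmt, level, bpw, listed_gb in VARIANTS:
        est = weight_gb(PARAMS, bpw)
        rows.append((fmt, level, round(est, 1), round(listed_gb - est, 1)))
    return rows


if __name__ == "__main__":
    for fmt, level, est, gap in estimates():
        print(f"{fmt:5} {level:7} weights ~{est:5.1f} GB, listed VRAM is {gap:+.1f} GB higher")
```

Under these assumptions the weights alone come to roughly 30 to 50 GB for every variant listed, which is the reason the page points to multi-GPU or high-memory Apple Silicon setups.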