Nous Hermes 3 Llama 3.1 8B

8B

NousResearch

Fine-tuned Llama 3.1 8B with improved roleplay and instruction following.

Consumer GPU, Mac / Apple Silicon, CPU / VPS

Max Context: 131K
Quant Variants: 2
Best Quality: EXL2 4.65bpw
Accuracy Retained: 97.7%

Quantization Variants

Per-quant VRAM, perplexity (PPL) loss, and inference speed on an RTX 4090

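| Format | Level | BPW | VRAM | PPL Loss | Speed |
|--------|-------|-----|------|----------|-------|
| GGUF | Q4_K_M | 4.85 | 5.7 GB | 3.0% | 148 tok/s |
| EXL2 | 4.65bpw | 4.65 | 5.4 GB | 2.3% | 232 tok/s |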
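
As a rough sanity check on the table above, the BPW column can be turned into a weight-only size estimate (parameters × bits per weight ÷ 8) and compared with the reported VRAM. The sketch below is illustrative only. It assumes a round 8.0 billion parameters (taken from the "8B" label, since the page gives no exact count), treats GB as decimal gigabytes, and assumes that "Accuracy Retained" is simply 100% minus the PPL loss, which matches the 97.7% shown for EXL2 but is not stated on the page. The names `weight_gb` and `summarize` are hypothetical helpers.

```python
# Rough weight-only memory estimate from bits per weight (BPW).
# ASSUMPTION: 8.0e9 parameters, inferred from the "8B" label (exact count not given).
PARAMS = 8.0e9

# (format, level, bpw, reported_vram_gb, ppl_loss_pct, tok_per_s), copied from the table.
ROWS = [
    ("GGUF", "Q4_K_M", 4.85, 5.7, 3.0, 148),
    ("EXL2", "4.65bpw", 4.65, 5.4, 2.3, 232),
]


def weight_gb(params: float, bpw: float) -> float:
    """Weights only: params * bits / 8 gives bytes; divide by 1e9 for decimal GB."""
    return params * bpw / 8 / 1e9


def summarize(rows=ROWS, params=PARAMS):
    """Return one dict per quant with estimated weight size and the leftover VRAM."""
    out = []
    for fmt, level, bpw, vram, ppl_loss, speed in rows:
        w = weight_gb(params, bpw)
        out.append({
            "format": fmt,
            "level": level,
            "weights_gb": round(w, 2),
            # Reported VRAM minus estimated weights (KV cache, buffers, runtime overhead).
            "other_gb": round(vram - w, 2),
            # ASSUMPTION: retained accuracy = 100% - PPL loss.
            "retained_pct": round(100 - ppl_loss, 1),
            "tok_per_s": speed,
        })
    return out


if __name__ == "__main__":
    for row in summarize():
        print(row)
```

Under these assumptions the estimated weights account for most of the reported VRAM in both rows, with the remainder plausibly going to the KV cache and runtime buffers. The exact split depends on context length and backend, neither of which the page specifies.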