Nous Hermes 3 Llama 3.1 8B

8B

NousResearch

Fine-tuned Llama 3.1 8B with improved roleplay and instruction following.

Consumer GPU, Mac / Apple Silicon, CPU / VPS

Max Context: 131K

Quant Variants: 2

Best Quality: EXL2 4.65bpw

Accuracy Retained: 97.7%

Quantization Variants

Per-quant VRAM, quality loss, and inference speed on RTX 4090

Format	Level	BPW	VRAM	PPL Loss	Speed
GGUF	Q4_K_M	4.85	5.7 GB	3.0%	148 tok/s
EXL2	4.65bpw	4.65	5.4 GB	2.3%	232 tok/s
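
As a rough sanity check on the table, the BPW column can be turned into an estimated weight footprint. The sketch below is illustrative only and rests on several assumptions: it takes the nominal "8B" as exactly 8e9 parameters, treats 1 GB as 1e9 bytes, and reads "Accuracy Retained" as 100% minus the PPL loss (which happens to match the 97.7% headline for EXL2 4.65bpw). The names `NOMINAL_PARAMS`, `QUANTS`, and `weight_gb` are hypothetical, and the page does not state how its VRAM figures were obtained.

```python
# Nominal parameter count: the page only says "8B", so 8e9 is an assumption.
NOMINAL_PARAMS = 8e9

# (format, level, bits per weight, reported VRAM in GB, reported PPL loss in %)
# Values copied from the table above.
QUANTS = [
    ("GGUF", "Q4_K_M", 4.85, 5.7, 3.0),
    ("EXL2", "4.65bpw", 4.65, 5.4, 2.3),
]


def weight_gb(params: float, bpw: float) -> float:
    """Estimated size of the quantized weights in GB (assuming 1 GB = 1e9 bytes)."""
    return params * bpw / 8 / 1e9


for fmt, level, bpw, vram_gb, ppl_loss in QUANTS:
    weights = weight_gb(NOMINAL_PARAMS, bpw)
    # Whatever the reported VRAM figure includes beyond the weights themselves
    # (for example KV cache or runtime buffers); the page does not break this down.
    other = vram_gb - weights
    # Assumed reading of "Accuracy Retained": 100% minus the PPL loss.
    retained = 100.0 - ppl_loss
    print(f"{fmt} {level}: weights ~{weights:.2f} GB, "
          f"other ~{other:.2f} GB, retained ~{retained:.1f}%")
```

Under these assumptions the weights alone account for most of each reported VRAM figure, and the remainder is under 1 GB for both variants.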