Command R 35B

35B

Cohere

Cohere's model optimised for retrieval-augmented generation (RAG).

Pro GPU

Max Context: 131K tokens

Quant Variants: 2

Best Quality: GGUF Q4_K_M

Accuracy Retained: 97.0%

Quantization Variants

Per-quant VRAM, perplexity (PPL) loss, and inference speed on an RTX 4090

Format	Level	BPW	VRAM	PPL Loss	Speed
GGUF	Q4_K_M	4.85	22.8 GB	3.0%	42 tok/s
GPTQ	INT4	4	20.5 GB	4.5%	55 tok/s
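
The BPW (bits per weight) column can be related to the VRAM column with some quick arithmetic. The sketch below is a rough estimate, not how the page computes its figures: it assumes a nominal 35e9 parameters (taken from the "35B" label) and 1 GB = 1e9 bytes, and the helper name `weight_gb` is hypothetical. It estimates the weights-only footprint and shows how much of the listed VRAM is left over, which presumably covers the KV cache, activations, and runtime overhead.

```python
# Rough weights-only memory estimate from bits per weight (BPW).
# Assumptions (not stated on the page): 35e9 parameters, 1 GB = 1e9 bytes.

PARAMS = 35e9  # nominal parameter count implied by "35B"


def weight_gb(bpw: float, params: float = PARAMS) -> float:
    """Approximate weight storage in GB: params * bits / 8 bits-per-byte."""
    return params * bpw / 8 / 1e9


# (format, level, BPW, listed VRAM in GB) copied from the table above
variants = [
    ("GGUF", "Q4_K_M", 4.85, 22.8),
    ("GPTQ", "INT4", 4.0, 20.5),
]

for fmt, level, bpw, listed in variants:
    weights = weight_gb(bpw)
    remainder = listed - weights  # listed VRAM not explained by weights alone
    print(f"{fmt} {level}: weights ~{weights:.1f} GB, "
          f"listed {listed} GB, remainder ~{remainder:.1f} GB")
```

Under these assumptions the weights alone come to roughly 21.2 GB for Q4_K_M and 17.5 GB for INT4, so both listed VRAM figures sit a few GB above the weights-only estimate. The actual parameter count and the context length used for the measurements would shift these numbers.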