
Command R 35B

35B

Cohere

Cohere's RAG-optimised model, designed for retrieval-augmented generation workloads.

Pro GPU

Max Context: 131K

Quant Variants: 2

Best Quality: GGUF Q4_K_M

Accuracy Retained: 97.0%

Quantization Variants

Per-quant VRAM, quality loss, and inference speed on RTX 4090

Format  Level   BPW   VRAM     PPL Loss  Speed
GGUF    Q4_K_M  4.85  22.8 GB  3.0%      42 tok/s
GPTQ    INT4    4     20.5 GB  4.5%      55 tok/s
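
As a rough sanity check on the VRAM column, the size of the quantized weights can be approximated as parameters x BPW / 8 bytes. The sketch below applies that arithmetic to the two rows of the table. It assumes a nominal 35e9 parameters (taken from the "35B" label), 1 GB = 1e9 bytes, and that the listed VRAM figure covers weights plus runtime overhead such as KV cache and buffers. None of these assumptions are stated on the page, and the helper names are hypothetical.

```python
# Rough weight-memory estimate from bits per weight (BPW).
# Assumptions (not from the page): 35e9 parameters, 1 GB = 1e9 bytes,
# and listed VRAM = weights + runtime overhead (KV cache, buffers).

PARAMS = 35e9  # nominal parameter count, read off the "35B" label

# (format, level, bpw, listed_vram_gb) copied from the table above
VARIANTS = [
    ("GGUF", "Q4_K_M", 4.85, 22.8),
    ("GPTQ", "INT4", 4.0, 20.5),
]


def weight_gb(params: float, bpw: float) -> float:
    """Approximate size of the quantized weights in GB (1e9 bytes)."""
    return params * bpw / 8 / 1e9


def estimate(variants=VARIANTS, params=PARAMS):
    """Return (format, level, weights_gb, remainder_gb) for each variant."""
    rows = []
    for fmt, level, bpw, listed in variants:
        w = weight_gb(params, bpw)
        rows.append((fmt, level, round(w, 2), round(listed - w, 2)))
    return rows


if __name__ == "__main__":
    for fmt, level, w, rest in estimate():
        print(f"{fmt} {level}: weights ~{w} GB, remainder ~{rest} GB")
```

Under these assumptions the weights alone come to roughly 21.2 GB for Q4_K_M and 17.5 GB for INT4, so the remainder of each listed VRAM figure would be overhead. Real usage depends on context length and runtime, so treat this as an estimate rather than a measurement.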