IntermediateServer / VPS 11 min read

Nginx Reverse Proxy for Local LLM APIs

Put Ollama or llama.cpp behind Nginx with TLS, rate limiting, and a stable /v1 endpoint for your apps.

NginxAPITLSOllamallama.cpp

Basic proxy to Ollama

Ollama exposes an OpenAI-compatible /v1/chat/completions endpoint. Proxy it with long timeouts — LLM responses are slow.

nginx

server {
    listen 443 ssl;
    server_name llm.example.com;

    ssl_certificate     /etc/letsencrypt/live/llm.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/llm.example.com/privkey.pem;

    location /v1/ {
        proxy_pass http://127.0.0.1:11434/v1/;
        proxy_read_timeout 300s;
        proxy_send_timeout 300s;
        client_max_body_size 10m;
    }
}

Rate limiting

Add a limit_req zone to prevent abuse on a public-facing VPS. Adjust rate for your expected users.

nginx

limit_req_zone $binary_remote_addr zone=llm:10m rate=10r/m;

location /v1/ {
    limit_req zone=llm burst=5 nodelay;
    proxy_pass http://127.0.0.1:11434/v1/;
    proxy_read_timeout 300s;
}

Deployment guides are educational. Each model is subject to its own license — read the official Hugging Face model card before downloading or deploying.