Point at any endpoint that implements OpenAI's `/v1/chat/completions` with `tool_calls`. Covers Groq's sub-second inference, Together's open-weight catalog, self-hosted Ollama on your LAN, and vLLM clusters. Our streaming adapter handles the incremental `tool_calls` chunk format.
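To illustrate what that streaming adapter has to cope with: OpenAI-style streams deliver each tool call as incremental deltas (an `index`, then fragments of `function.name` and `function.arguments`) that must be merged before the call can be dispatched. The merging logic below is an illustrative sketch of that chunk format, not the plugin's actual adapter code:

```python
def merge_tool_call_deltas(chunks):
    """Accumulate OpenAI-style streaming `tool_calls` deltas into full calls.

    Each chunk is the `delta.tool_calls` list from one stream event; entries
    carry an `index` plus partial `function.name` / `function.arguments`.
    (Illustrative sketch, not the plugin's actual adapter code.)
    """
    calls = {}
    for deltas in chunks:
        for d in deltas:
            call = calls.setdefault(d["index"], {"id": "", "name": "", "arguments": ""})
            call["id"] = d.get("id") or call["id"]
            fn = d.get("function", {})
            call["name"] += fn.get("name", "")
            call["arguments"] += fn.get("arguments", "")
    return [calls[i] for i in sorted(calls)]

# Example: three stream events building one tool call
chunks = [
    [{"index": 0, "id": "call_1", "function": {"name": "get_weather", "arguments": ""}}],
    [{"index": 0, "function": {"arguments": '{"city":'}}],
    [{"index": 0, "function": {"arguments": '"Oslo"}'}}],
]
merged = merge_tool_call_deltas(chunks)
```

The key design point is that `arguments` arrives as raw string fragments of JSON, so the adapter concatenates first and parses only once the stream finishes.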
Install
1 · Config
Append to `ENABLED_PLUGINS` in `wrangler.toml`:

```
openai-compatible
```
2 · Secrets
Run each from your Worker project:
- `wrangler secret put OPENAI_COMPATIBLE_URL` (base URL, e.g. `https://api.groq.com/openai/v1`)
- `wrangler secret put OPENAI_COMPATIBLE_KEY` (Bearer token for the endpoint; optional for local Ollama)
Copy-paste `.dev.vars` template:

```
OPENAI_COMPATIBLE_URL=https://api.groq.com/openai/v1
OPENAI_COMPATIBLE_KEY=# optional: Bearer token for the endpoint; not needed for local Ollama
```
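Because the key is optional (local Ollama typically runs unauthenticated), a client should attach the `Authorization` header only when the secret is actually set. A hypothetical sketch of that behavior:

```python
def build_headers(api_key=None):
    """Build request headers for an OpenAI-compatible endpoint.

    The Bearer token is attached only when a key is configured; local
    Ollama endpoints typically need none. (Hypothetical helper, not the
    plugin's actual code.)
    """
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return headers
```

Sending an empty `Authorization: Bearer` header would make some gateways reject the request, which is why the header is omitted entirely when no key is present.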
3 · Steps
- Add the endpoint's hostname to `ALLOWED_HOSTS` before your first call.
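The allow-list check above amounts to comparing the configured base URL's hostname against `ALLOWED_HOSTS`. A hypothetical sketch of that comparison, using Python's `urllib.parse`:

```python
from urllib.parse import urlparse

def is_allowed(base_url, allowed_hosts):
    """Return True when base_url's hostname is in the allow-list.

    Hypothetical sketch of the ALLOWED_HOSTS gate described above;
    not the plugin's actual implementation.
    """
    host = urlparse(base_url).hostname
    return host is not None and host in allowed_hosts
```

Note the check is on the hostname alone, so `https://api.groq.com/openai/v1` passes an entry of `api.groq.com` regardless of path or port.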