This plan prioritizes contributor experience and quality controls while expanding Cloudflare-native integrations and Project Think primitives.
§Phase 1 — Governance and contribution quality (Week 1) ✅
✅ Published contribution standards and anti-slop rules.
✅ Added issue and PR templates requiring scope, tests, and security impact.
✅ Defined maintainer review checklist and ownership model.
§Phase 2 — Runtime hardening (Weeks 1-2) ✅
✅ Enforced outbound host allow-list in plugin HTTP helpers.
✅ Added startup validation that each enabled plugin has required secrets.
✅ Added request schema validation for `/invoke/{pluginId}` payloads.
✅ Added structured error IDs for easier debugging/support.
§Phase 3 — Plugin SDK and examples (Weeks 2-3) ✅
✅ Added a plugin authoring guide with a minimal template plugin.
✅ Added test utilities/mocks for plugin integration tests.
✅ Published compatibility contract for capabilities and metadata.
✅ Fixed scaffold template bug (`create_plugin.mjs`).
§Phase 4 — Integrations and MCP expansion (Weeks 3-4) ✅
✅ Expanded Cloudflare API MCP plugin to cover common ops workflows.
✅ Deepened `mpp.dev` plugin (model listing, health checks, error mapping).
✅ Added Artifacts plugin (Git-for-agents).
✅ Added connector architecture for additional CF-native services.
✅ Added `workers-ai`, `mcp-client`, `browser`, and `sandbox` first-party plugins.
§Phase 5 — Release and operations readiness (Week 4) ✅
✅ CI for typecheck/tests and required PR checks.
✅ Versioning and changelog policy.
✅ Deployment runbook for Cloudflare Workers.
§Phase 6 — Observability and reliability (Week 5) ✅
✅ Structured audit logs for plugin/skill invocations and request failures.
✅ Error-rate counters + alert-check endpoint with optional webhook fanout.
✅ Incident response playbook templates.
§Phase 7 — Project Think primitives (Weeks 6-7) ✅
Shipped
✅ `AgentSessionDO` extends `DurableObject<Env>` with real SQLite storage (`ctx.storage.sql`).
✅ `messages` + `fibers` tables with indexes on `parent_id`, `created_at`, `status`.
✅ `/sessions/{name}/*` HTTP surface: `init`, `messages`, `tree`, `fork`, `compact`, `search`.
✅ Durable execution (`POST /sessions/{name}/fibers`) with idempotency-key upsert — first writer wins.
✅ Workers AI plugin with AI Gateway routing (`AI_GATEWAY_ID`).
✅ MCP client plugin (outbound JSON-RPC, SSE fallback).
✅ Browser Rendering plugin (tier-3) and Sandbox plugin (tier-4).
✅ Runtime instance cached per-env via WeakMap.
✅ `/playground` HTML UI and `/openapi.json` document.
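The first-writer-wins idempotency rule can be sketched as follows, with a Map standing in for the DO's SQLite `fibers` table — the `FiberStore` shape is illustrative, not the shipped code:

```typescript
// Sketch of the first-writer-wins idempotency upsert. The real DO uses
// ctx.storage.sql; here a Map models the table keyed by idempotency key.

type Fiber = { id: string; idempotencyKey: string; payload: string };

class FiberStore {
  private byKey = new Map<string, Fiber>();

  // Insert-or-return-existing: the first writer for a key wins;
  // later writers get the original row back unchanged.
  upsert(fiber: Fiber): Fiber {
    const existing = this.byKey.get(fiber.idempotencyKey);
    if (existing) return existing;
    this.byKey.set(fiber.idempotencyKey, fiber);
    return fiber;
  }
}
```

Retried `POST /sessions/{name}/fibers` calls with the same key therefore converge on one durable row instead of duplicating work.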
Roadmap (Phase 7.x)
✅ FTS5 virtual table with trigger-synced `messages_fts` (replaces `instr(lower(), lower())`).
⏳ `POST /sessions/{name}/fibers/{id}/execute` — the DO invokes a registered handler (service binding) and stores the result, with retry via alarms.
⏳ WebSocket streaming for `chat` — hibernation-friendly DO WebSocket.
§Phase 8 — One-click deploy and DX polish (Week 8) ✅
Shipped
✅ Deploy to Cloudflare button + `package.json#cloudflare.bindings` describing each binding (the official Cloudflare mechanism — no fictional manifest).
✅ `nodejs_compat_v2` compatibility flag (current best practice).
✅ `.dev.vars.example` drives secret prompting during deploy.
✅ Cross-platform Node-based `cf:bootstrap` wizard.
✅ Interactive `/playground` and `/openapi.json`.
✅ Typed SDK client at `src/sdk/client.ts` (`OpenThinkClient`, `SessionHandle`).
✅ Skill catalog covers all first-party plugins.
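A minimal sketch of the client shape — `OpenThinkClient` and `SessionHandle` are the repo's names, but the method signatures and the injectable fetch used here are assumptions for illustration:

```typescript
// Illustrative client sketch. A FetchLike is injected so the URL-building
// and { ok, data } unwrapping can be exercised without a network.

type FetchLike = (url: string, init?: { method?: string; body?: string }) => Promise<{ json(): Promise<unknown> }>;

class SessionHandle {
  constructor(private client: OpenThinkClient, readonly name: string) {}
  messages(): Promise<unknown> {
    return this.client.request(`/sessions/${this.name}/messages`);
  }
}

class OpenThinkClient {
  constructor(private baseUrl: string, private fetchImpl: FetchLike) {}

  session(name: string): SessionHandle {
    return new SessionHandle(this, name);
  }

  async request(path: string): Promise<unknown> {
    const res = await this.fetchImpl(this.baseUrl + path);
    // Responses are normalized to { ok, data }, so the client unwraps once.
    const body = (await res.json()) as { ok: boolean; data?: unknown; error?: unknown };
    if (!body.ok) throw new Error(`request failed: ${JSON.stringify(body.error)}`);
    return body.data;
  }
}
```

Centralizing the `{ ok, data }` unwrap in one `request()` method is what keeps every typed call site small.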
Roadmap
⏳ Publish `@open-think/sdk` as a standalone npm package.
⏳ `npx create-open-think@latest` scaffolding CLI.
⏳ VS Code extension reading `/openapi.json` for typed skill calls from the editor.
§Phase 9 — Ecosystem and migration paths (Week 9+)
Shipped
✅ `anthropic` plugin (Claude Messages API).
✅ `openai-compatible` plugin (Groq / Together / Ollama / any OpenAI-compatible endpoint).
Roadmap
⏳ Drop-in Claude Code / OpenClaw / Hermes connector plugins (bring your existing agent flows to Open Think without rewriting).
⏳ `agents-sdk-compat` plugin bridging to Cloudflare's official `agents` package.
⏳ Community plugin registry with verified publisher badges and `schema.json` typed contracts.
⏳ Self-authored extensions — the LLM writes a TypeScript tool, we compile and deploy it to a per-session Dynamic Worker.
§Phase 10 — Meta-agent + UI (Week 10) ✅
Shipped
✅ `admin` plugin with `introspect`, `health-check`, `suggest-plugins`, `env-template` actions.
✅ `RuntimeIntrospection` handle threaded through `PluginContext`.
✅ Conductor meta-agent (`POST /conductor/message`) — reads the live skill catalog, proposes skill invocations via fenced `open-think-action` blocks, persists to `AgentSessionDO`.
✅ New `/app` UI — editorial broadsheet aesthetic (Fraunces + Newsreader + JetBrains Mono, ivory paper, ink black, Cloudflare-orange accent), keyboard-nav (`g p`, `g c`, …), approve-to-run exhibit cards, debug dashboard, session inspector, skill runner.
✅ API responses normalized to `{ ok, data }` (UI + SDK depend on a consistent shape).
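The fenced-block proposal flow can be sketched like this; the `open-think-action` tag comes from the notes above, while the parsing details and `ProposedAction` shape are illustrative:

```typescript
// Sketch of extracting fenced open-think-action blocks from a Conductor
// reply. The triple-backtick fence is built at runtime (FENCE) purely to
// keep this document's own formatting intact.

const FENCE = "`".repeat(3);

type ProposedAction = { skill: string; input: Record<string, unknown> };

function extractActions(text: string): ProposedAction[] {
  const actions: ProposedAction[] = [];
  const pattern = new RegExp(FENCE + "open-think-action\\n([\\s\\S]*?)" + FENCE, "g");
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    try {
      actions.push(JSON.parse(match[1]) as ProposedAction);
    } catch {
      // A fence with malformed JSON is skipped rather than failing the turn.
    }
  }
  return actions;
}
```

Each extracted action can then be rendered as an approve-to-run exhibit card before anything executes.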
§Phase 17 — Cross-provider streaming + update-style rollback (Week 17) ✅
Shipped
✅ `src/tool-stream-types.ts` — provider-agnostic `LoopEvent` union + `ToolLoopConfig` surface. Every streaming adapter emits the same shape, so downstream consumers (HTTP/SSE handler, UI, durable hubs) are provider-free.
✅ `src/openai-stream.ts` — `runOpenAICompatibleToolStream` parses OpenAI SSE, accumulates `tool_calls` across delta chunks (id only on the first, arguments appended), drives the full multi-turn tool-use loop, honors the `finish_reason: "tool_calls"` signal, and supports per-turn extra headers (e.g. `cf-aig-authorization` for gateway BYOK).
✅ `src/anthropic-stream.ts` refactored to import shared types; no behavior change, unlocks provider-polymorphic consumers.
✅ `src/conductor-tool-stream.ts` gained a `selectProvider()` + `createGenerator()` dispatch. Default priority: `anthropic` → `cf-ai-gateway` → `openai-compatible`. Response headers carry `x-stream-id` + `x-stream-provider`. The cf-ai-gateway path reuses the OpenAI adapter against the compat endpoint (`/v1/{acct}/{gw}/compat`).
✅ `/app` UI — stream mode auto-picks the first capable provider from the enabled plugin set; a label in the assistant bubble shows which one drove the turn.
✅ `src/rollback.ts` — `RegistryEntry` promoted to a discriminated union of `CreateDeleteEntry` + `UpdateEntry`. Three update entries shipped: `dns_record_update`, `kv_namespace_update`, `hyperdrive_config_edit`.
✅ New `getUpdateCaptureStrategy()` API returns `{ captureCall, buildHint }`. The `mcp-client` plugin runs the capture tool before any registered update-style mutation and attaches a rollback hint that re-applies the captured fields. The capture-failure path leaves the mutation in place but skips the hint (non-fatal).
✅ `listMcpRollbackSupport()` now tags each entry with its `kind` so the UI can render create-delete vs. restore Undo cards differently if desired.
✅ 13 new tests (5 OpenAI streaming + 7 update-style rollback + 1 `listMcpRollbackSupport` shape) covering: text-only turn, tool_call chunk accumulation, dangerous-skill hold, extra-headers injection, `[DONE]` sentinel handling, update capture/restore for all three shipped entries, and null-safety on missing ids + empty captures. Total tests 102 → 115.
✅ `docs/HELM.md` updated with cross-provider support; `docs/ROLLBACK.md` adds an update-style registry table + pre-mutation capture flow.
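The delta-accumulation rule is the subtle part of the OpenAI adapter. A sketch assuming the standard OpenAI streaming chunk shape — the helper name and types here are illustrative, not the repo's `runOpenAICompatibleToolStream`:

```typescript
// Sketch of accumulating streamed tool_calls deltas: the id and function
// name arrive only on the first chunk for each index, and the JSON
// `arguments` string is appended across subsequent chunks.

type ToolCallDelta = {
  index: number;
  id?: string;
  function?: { name?: string; arguments?: string };
};
type ToolCall = { id: string; name: string; arguments: string };

function accumulateToolCalls(deltas: ToolCallDelta[]): ToolCall[] {
  const calls = new Map<number, ToolCall>();
  for (const d of deltas) {
    let call = calls.get(d.index);
    if (!call) {
      call = { id: "", name: "", arguments: "" };
      calls.set(d.index, call);
    }
    if (d.id) call.id = d.id;                                   // id only on the first chunk
    if (d.function?.name) call.name = d.function.name;          // name only on the first chunk
    if (d.function?.arguments) call.arguments += d.function.arguments; // arguments appended
  }
  return [...calls.values()];
}
```

Only once `finish_reason: "tool_calls"` arrives are the accumulated `arguments` strings parsed as JSON and dispatched.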
Roadmap
⏳ Google AI Studio native streaming adapter (Gemini SSE has its own shape).
⏳ Multi-subscriber tool-streaming — promote the streaming generators into the `StreamHubDO` so multi-tab works for stream-tools, not just `/conductor/stream`.
⏳ More update-style entries (`worker_settings_put`, `zone_setting_update`, `queue_settings_update`) once we verify CF MCP tool names.
⏳ Before/after diff renderer in the UI — show captured state side-by-side with current so "undo" is visually concrete.
⏳ Multi-step batch rollback.
§Phase 16 — Native tool-use streaming + MCP rollback (Week 16) ✅
Shipped
✅ `src/anthropic-stream.ts` — async generator that talks to Anthropic's Messages API with `stream: true`, parses SSE frames, drives the tool-use loop end-to-end, and yields typed `LoopEvent` unions (text-delta, tool-use-start/input/stop, tool-result, tool-held, turn-start/stop, loop-done, error).
✅ `src/conductor-tool-stream.ts` — HTTP handler that loads session history, primes the generator, and pipes every event into a browser SSE response. Persists the final assistant text back to the session DO on `loop-done`.
✅ New route `POST /conductor/stream-tools`; the `/app` stream mode auto-picks it when `anthropic` is in the enabled plugins and falls back to `/conductor/stream` (codex app-server) otherwise.
✅ Selective-mode dangerous-skill hold: the loop emits `tool-held` and continues without executing, letting the UI render an approve-to-run exhibit card mid-stream.
✅ `src/rollback.ts` — registry of 8 Cloudflare MCP inverse operations (KV, D1, R2, Hyperdrive, AI Gateway, DNS, Workers, Queues) with `planMcpRollback(name, input, response)`, which synthesises the inverse call or returns null.
✅ `mcp-client` plugin now calls `planMcpRollback` on every successful `call-tool` and attaches `_rollback` to the result payload.
✅ `SessionMessage` schema extended with `rollback?` + `rollbackStatus?`; `AgentSessionDO` columns added with `ALTER TABLE` guards so existing sessions migrate cleanly.
✅ New session routes: `GET /sessions/:name/rollbacks`, `POST /sessions/:name/apply-rollback`, plus `GET /rollback/support` for registry introspection.
✅ `/app` UI — stream mode distinguishes Anthropic (stream-tools) vs codex (stream) visually; text deltas land in the bubble as they arrive; dangerous tool-use blocks appear as exhibit cards mid-stream; post-turn rollback cards render with warn-colored borders and an ↻ Undo button.
✅ 11 new tests (4 Anthropic streaming + 7 rollback registry) covering SSE parse, text-only turn, tool-loop-with-feedback, selective-mode hold, missing-key error, and 6 registry mappings. Total test count 89 → 100.
✅ `docs/ROLLBACK.md` (new) and a streaming-tool-use section added to `docs/HELM.md`.
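The inverse-registry idea behind `planMcpRollback` can be sketched as below; the tool names and planner shape are illustrative, not the verified CF MCP catalog or the repo's exact `src/rollback.ts`:

```typescript
// Sketch: a registry maps a mutating MCP tool to a function that
// synthesises the inverse call, or returns null when no inverse exists.

type McpCall = { name: string; input: Record<string, unknown> };
type InversePlanner = (
  input: Record<string, unknown>,
  response: Record<string, unknown>,
) => McpCall | null;

const inverseRegistry: Record<string, InversePlanner> = {
  // create → delete, using the id the create call returned
  kv_namespace_create: (_input, response) =>
    typeof response.id === "string"
      ? { name: "kv_namespace_delete", input: { namespace_id: response.id } }
      : null,
};

function planMcpRollback(
  name: string,
  input: Record<string, unknown>,
  response: Record<string, unknown>,
): McpCall | null {
  const planner = inverseRegistry[name];
  return planner ? planner(input, response) : null;
}
```

Returning null for unknown tools (or missing ids) is what lets the plugin attach `_rollback` opportunistically without ever blocking a call.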
Roadmap
⏳ OpenAI-compatible streaming adapter emitting the same `LoopEvent` shape — unlocks Groq/Together/Ollama guided setup.
⏳ cf-ai-gateway adapter that routes Anthropic-native streaming through the gateway for observability + caching.
⏳ Multi-subscriber tool-streaming (the current handler is single-sub) — requires moving the generator into `StreamHubDO` and promoting the event union to SSE.
⏳ Before/after diffing for update-style MCP mutations.
⏳ Multi-step rollback (undo last N as a batch).
⏳ Rollback preview / dry-run mode.
§Phase 15 — Multi-subscriber streaming + Settings panel (Week 15) ✅
Shipped
✅ `src/durable/eventBus.ts` — bounded replay buffer + fan-out, fully unit-testable in isolation. 6 tests covering fan-out, late-joiner replay, buffer bounds, close semantics, post-close rejection, explicit disconnect.
✅ `src/durable/streamHub.ts` — `StreamHubDO` owns one upstream WebSocket per turn, fans frames out via `EventBus`, persists the final assistant text back to the session DO, retires cleanly on terminal frame or interrupt.
✅ Wrangler migration `v2` adds `StreamHubDO` as a new (non-SQLite) DO class; types + binding (`STREAM_HUBS`) wired.
✅ New stream routes: `POST /conductor/stream` creates a hub and opens the first subscription (unchanged UX), `GET /conductor/stream/:id` reconnects, `POST /conductor/stream/:id/interrupt` cancels via `turn/interrupt`, `GET /conductor/stream/:id/state` peeks without subscribing.
✅ UI — the stream mode bubble gains a ◈ Cancel turn button, a stream id label for debugging, and graceful cancellation event handling.
✅ `src/setup.ts` — `collectStatus()` aggregates 10 capability checks with configured/missing detection, `generateSnippet()` emits merge-friendly `wrangler.toml` + `.dev.vars` fragments, `guidedStart()` creates a setup session primed with full runtime context + MCP availability.
✅ New routes `GET /setup/status`, `POST /setup/snippet`, `POST /setup/guided-start`.
✅ New `/app` Settings tab (`g t` keyboard shortcut): readiness score bar, recommended actions callout, guided-setup CTA that primes the Conductor and jumps to `#/conductor`, capability checklist with per-card status + hints, live snippet generator with a checkbox picker and two-column wrangler/.dev.vars panes.
✅ 12 new tests covering EventBus (6) and setup (6). Total test count up to 93.
✅ Docs: new `docs/SETUP.md`; the streaming section in `docs/HELM.md` rewritten to reflect the DO-backed hub + cancellation.
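The replay-buffer semantics can be sketched in a few lines; this is an illustrative reimplementation, not the shipped `src/durable/eventBus.ts`:

```typescript
// Sketch of a bounded replay buffer + fan-out: late joiners replay the
// buffered frames, the buffer is capped (oldest dropped), and a closed
// bus rejects further publishes.

class EventBus<T> {
  private buffer: T[] = [];
  private subscribers = new Set<(event: T) => void>();
  private closed = false;

  constructor(private maxBuffer = 64) {}

  publish(event: T): void {
    if (this.closed) throw new Error("bus closed");
    this.buffer.push(event);
    if (this.buffer.length > this.maxBuffer) this.buffer.shift(); // drop oldest
    for (const fn of this.subscribers) fn(event);
  }

  // Replays the buffer to the new subscriber, then streams live events.
  // Returns an unsubscribe function for explicit disconnect.
  subscribe(fn: (event: T) => void): () => void {
    for (const event of this.buffer) fn(event);
    this.subscribers.add(fn);
    return () => this.subscribers.delete(fn);
  }

  close(): void {
    this.closed = true;
    this.subscribers.clear();
  }
}
```

The bounded buffer is what makes `GET /conductor/stream/:id` reconnects cheap: a new tab catches up from the buffer instead of re-running the turn.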
Roadmap
⏳ Stream the guided-setup session itself via `/conductor/stream` so MCP tool calls animate live.
⏳ Bulk-apply snippets — a button that calls the CF API directly to create all listed resources in one shot (needs an audit trail).
⏳ Rollback — capture pre-turn state so unintended MCP mutations can be reversed.
⏳ Hub idle eviction — currently DOs auto-retire on close; consider a hard TTL for safety.
§Phase 14 — Streaming + companion bridge (Week 14) ✅
Shipped
✅ `rpcStream()` in `src/oauth/codexRpc.ts` — async generator that opens one outbound WebSocket and yields every incoming JSON-RPC frame until the terminal result/error for the request id.
✅ `src/conductor-stream.ts` — new `POST /conductor/stream` handler that:
- Validates `CODEX_APP_SERVER_URL` is `ws(s)://`.
- Persists the user message to the session DO.
- Starts a thread via `thread/start`, then pipes `turn/start` frames into a Server-Sent Events response.
- Emits six distinct SSE event types: `ready`, `thread`, `notification`, `result`, `error`, `done`.
- Persists the final assistant text back to the session DO when the stream closes cleanly.
✅ UI — new stream mode on the Conductor composer. Uses `fetch().body.getReader()` + SSE parsing, renders `turn/delta` text live into the assistant bubble while the full event ledger appears in the trace panel.
✅ `companion/codex-bridge/` — portable Node.js bridge (~250 lines) that wraps `codex app-server` stdio:
- `POST /rpc` — single JSON-RPC request/response.
- `POST /stream` — SSE stream of every response frame.
- `GET /healthz` — liveness check.
- Bearer auth via `BRIDGE_TOKEN`.
- Graceful signal handling + 120s per-RPC timeout.
✅ Dockerfile + README — three deployment recipes (Cloudflare Containers, Fly.io / Render, local + cloudflared). Healthcheck baked into the image.
✅ Docs: streaming section added to `docs/HELM.md`; companion deployment replaces the old stub in `docs/CODEX_APPSERVER.md`.
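The SSE parsing the stream-mode UI does on top of `fetch().body.getReader()` can be sketched like this — a simplified parser over decoded text, not the repo's exact code:

```typescript
// Sketch: split the accumulated text on blank lines (frame boundaries),
// pull out `event:` and `data:` fields, and keep any trailing partial
// frame buffered for the next chunk from the reader.

type SseEvent = { event: string; data: string };

function parseSseChunk(buffer: string): { events: SseEvent[]; rest: string } {
  const events: SseEvent[] = [];
  const frames = buffer.split("\n\n");
  const rest = frames.pop() ?? ""; // trailing partial frame stays buffered
  for (const frame of frames) {
    let event = "message"; // SSE default when no event: field is present
    const data: string[] = [];
    for (const line of frame.split("\n")) {
      if (line.startsWith("event:")) event = line.slice(6).trim();
      else if (line.startsWith("data:")) data.push(line.slice(5).trim());
    }
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return { events, rest };
}
```

The caller concatenates `rest` with the next decoded chunk, so frames split across network reads still parse cleanly.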
Roadmap
⏳ DO-held persistent WebSocket to the app-server — lets multiple browser tabs watch one turn and avoids re-upgrading per request.
⏳ Native Cloudflare Sandbox deployment (API is still in beta; the container recipe covers 95% of users in the meantime).
⏳ Turn-cancellation from the UI (forwards `turn/interrupt` over the live stream).
⏳ Client-side replay — resume an interrupted stream by session + threadId without losing progress.
§Phase 13 — Fallback chains + Codex app-server bridge (Week 13) ✅
Shipped
✅ `cf-ai-gateway:chat-with-fallbacks` — pass an ordered `models: string[]`; each entry is `provider/model-name`; returns on first success with a full `attempts[]` ledger. Per-model timeout supported.
✅ New skill `cf-gateway-chat-fallbacks` with a typed `inputSchema`.
✅ Conductor `fallbackModels?: string[]` (+ `perModelTimeoutMs`) that bypasses the primary provider and routes through the fallback chain, with the winning model surfaced via `providerUsed: "cf-ai-gateway/<model>"`.
✅ `codex` plugin gains an `app-server` auth mode (takes precedence over api-key / chatgpt-tokens).
✅ `src/oauth/codexRpc.ts` — JSON-RPC 2.0 client that auto-detects transport by URL scheme (`http(s)://` → POST, `ws(s)://` → single-shot WebSocket upgrade via `fetch(url, { headers: { Upgrade: "websocket" } })`).
✅ Five new Codex skills: `codex-thread-start`, `codex-thread-list`, `codex-models`, `codex-rpc` (raw passthrough, `dangerous: true`); plus `codex-chat` transparently upgrades to `thread/start` + `turn/start` in app-server mode.
✅ Env + binding metadata for `CODEX_APP_SERVER_URL`, `CODEX_APP_SERVER_TOKEN`, `CODEX_APP_SERVER_TIMEOUT_MS`.
✅ `docs/CODEX_APPSERVER.md` — three deployment recipes: cloudflared-tunneled local dev, companion Sandbox Worker (skeleton), remote dev box.
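The fallback-chain control flow can be sketched as below; the `CallModel` signature is an assumption standing in for the real AI Gateway call, and per-model timeouts are omitted for brevity:

```typescript
// Sketch of chat-with-fallbacks: try each provider/model in order, record
// an attempts[] ledger, and return on the first success.

type Attempt = { model: string; ok: boolean; error?: string };
type CallModel = (model: string) => Promise<string>;

async function chatWithFallbacks(models: string[], call: CallModel) {
  const attempts: Attempt[] = [];
  for (const model of models) {
    try {
      const text = await call(model);
      attempts.push({ model, ok: true });
      return { ok: true as const, model, text, attempts }; // winning model surfaced
    } catch (err) {
      attempts.push({ model, ok: false, error: String(err) });
    }
  }
  return { ok: false as const, attempts };
}
```

The `attempts[]` ledger is returned even on success, so callers can see which providers failed before the winner answered.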
Roadmap
⏳ Durable-Object-held WebSocket so `turn/start` streaming progress notifications survive request boundaries.
⏳ Native Universal-endpoint dispatch (array-of-provider-requests body) for AI Gateway fallbacks — reduces Worker CPU by pushing retry into the CF edge.
⏳ Companion Sandbox Worker reference implementation that actually wires `codex app-server` stdio into an HTTP `/rpc` handler.
⏳ Automatic `CODEX_APP_SERVER_URL` health-probe on Worker cold start, with fallback to paste-in tokens if the bridge is unreachable.
§Phase 12 — Cloudflare AI Gateway + Codex subscription auth (Week 12) ✅
Shipped
✅ `cf-ai-gateway` plugin — one plugin, 23+ providers via Cloudflare AI Gateway. Two routing paths:
- Binding path (`env.AI.run('provider/model', ..., { gateway: { id } })`) — preferred.
- Compat REST path (`gateway.ai.cloudflare.com/v1/{acct}/{gw}/compat/chat/completions`) — fallback, or forced via `forceCompat: true`.
✅ BYOK support via `cf-aig-authorization: KEY_NAME` (Secrets Store reference) or `Authorization: Bearer ...` (direct).
✅ `provider/model-name` validation with a catalog of 23 known providers.
✅ Three new skills: `cf-gateway-chat`, `cf-gateway-list-providers`, `cf-gateway-status`.
✅ `codex` plugin — two working auth modes plus scaffolded OAuth:
- `api-key` — classic `OPENAI_API_KEY` against `api.openai.com/v1/chat/completions`.
- `chatgpt-tokens` — paste `CODEX_ACCESS_TOKEN` (+ `CODEX_ID_TOKEN`) from `~/.codex/auth.json` after `codex login`; routes to `chatgpt.com/backend-api/codex/responses` so calls bill against your ChatGPT Plus/Pro subscription.
- `oauth-device` — roadmap; routes + handler scaffolded at `/oauth/codex/device/{start,poll}`.
✅ Three new Codex skills: `codex-chat`, `codex-status`, `codex-setup-instructions`.
✅ Conductor `ConductorProvider` union widened to include `cf-ai-gateway` and `codex`; the auto-mode tool loop works against both (both speak OpenAI function calling via the compat endpoint).
✅ New `/app` Providers tab — one card per path, live enabled/not-enabled state, one-tap "Probe" against the plugin's `status` action, gateway provider catalog listing.
✅ `docs/PROVIDERS.md` — full five-path matrix with setup snippets for each.
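The `provider/model-name` check can be sketched as follows; the catalog below is a small illustrative excerpt, not the full 23-provider list:

```typescript
// Sketch: split on the first slash, check the provider against a known
// catalog, and keep any later slashes as part of the model name
// (Workers AI model ids like @cf/meta/... contain slashes).

const KNOWN_PROVIDERS = new Set(["openai", "anthropic", "workers-ai", "groq", "mistral"]);

function parseGatewayModel(ref: string): { provider: string; model: string } | null {
  const slash = ref.indexOf("/");
  if (slash <= 0 || slash === ref.length - 1) return null; // need "provider/model"
  const provider = ref.slice(0, slash);
  const model = ref.slice(slash + 1);
  return KNOWN_PROVIDERS.has(provider) ? { provider, model } : null;
}
```

Validating before dispatch turns a typo'd provider into a structured error instead of an opaque upstream 404.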
Roadmap
⏳ Live OAuth device-code dispatch (needs registered OpenAI OAuth client id).
⏳ `cf-ai-gateway` universal endpoint with fallback chains across providers.
⏳ Cache-control exposure via `cf-aig-*` headers (`cacheTtl`, `skipCache`).
⏳ Streaming SSE passthrough for auto-mode (currently buffers the entire response).
⏳ `codex app-server` bridge via Sandbox service binding — JSON-RPC passthrough for the full Codex thread/turn API.
§Phase 11 — Native tool-use + three setup paths (Week 11) ✅
Shipped
✅ `SkillDefinition` extended with optional `inputSchema` (JSON Schema) and `dangerous: boolean`. Schemas declared for every first-party skill.
✅ `AnthropicPlugin` accepts `tools[]`, `tool_choice`, and rich content blocks (`tool_use`, `tool_result`).
✅ `OpenAICompatiblePlugin` accepts `tools[]`, `tool_calls` on assistant messages, and `role: "tool"` result messages.
✅ `buildAnthropicTools()` / `buildOpenAITools()` descriptor builders map the live catalog to each provider's tool shape.
✅ Conductor gains three modes:
- `propose` — plans only; returns `suggestedActions[]` as exhibit cards (default, works with every provider).
- `selective` — auto-runs safe skills, halts on `dangerous: true` with a pending proposal.
- `auto` — full end-to-end tool loop with a `maxIterations` cap (default 6, ceiling 12).
✅ Every tool call persisted as a `role: "tool"` message in the session DO (idempotent replay).
✅ `ConductorReply` returns a `trace[]` of `{ skill, input, ok, result|error, durationMs }` — the same shape for the UI and external agents.
✅ UI composer has a three-state mode toggle; trace steps render as a timeline under the assistant reply; halted banners for iteration-cap and dangerous-skill.
✅ External-agent pattern documented in `HELM.md` — Claude Code / OpenClaw can drive Helm as a planner without embedding the runtime.
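Mapping the catalog to the OpenAI tool shape can be sketched like this; the `SkillDefinition` fields match the notes above, while defaulting a missing schema to an empty object schema is an assumption of this sketch:

```typescript
// Sketch of buildOpenAITools(): each skill becomes a function-calling tool
// descriptor. Skills without an inputSchema get an empty object schema so
// the provider still accepts them.

type SkillDefinition = {
  name: string;
  description: string;
  inputSchema?: Record<string, unknown>; // JSON Schema
  dangerous?: boolean;
};

function buildOpenAITools(skills: SkillDefinition[]) {
  return skills.map((skill) => ({
    type: "function" as const,
    function: {
      name: skill.name,
      description: skill.description,
      parameters: skill.inputSchema ?? { type: "object", properties: {} },
    },
  }));
}
```

A parallel builder would emit Anthropic's `{ name, description, input_schema }` shape from the same catalog, which is what keeps the two providers interchangeable.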
Roadmap
⏳ Workers AI native tool-use (llama-3.3 via `env.AI.run`) — currently falls back to propose mode.
⏳ Streaming replies via DO WebSockets (`GET /sessions/{name}/chat` upgrade).
⏳ Multi-agent fork — Conductor spawns narrow task agents per plan step into their own session DOs.
⏳ Per-skill rate limits to cap token spend on an external provider during runaway loops.
⏳ Session browser UI with tree visualisation (D3 dendrogram) instead of raw JSON.
⏳ Settings panel with `wrangler secret put` orchestration via Cloudflare API MCP.
§Exit criteria summary
Any developer with a Cloudflare account can deploy a fully working agent runtime in <5 minutes.
New first-party plugins can be added in <30 minutes with tests.
Sessions survive Worker restarts and can be forked/compacted without code changes.
Every incoming request is traceable by `x-request-id` with structured audit logs.
Projects migrating from other agent frameworks can point existing tool contracts at Open Think via MCP with no rewrites.