How I Built a Hot-Swappable Backend Proxy for Claude Code

Claude Code is my daily driver for coding. It's an agentic CLI tool from Anthropic that reads your codebase, edits files, runs commands — and it's genuinely good at it. But it has one limitation that kept bothering me: it only talks to one API backend at a time.

That's fine until you hit Anthropic's rate limits. Or the API is down. Or you want to try a cheaper provider for routine tasks. Or you want teammate agents on a different backend than your main agent. Every time you need to switch, it's the same ritual: edit environment variables, change config files, restart the session, lose your flow.

I built AnyClaude to fix this. It's a TUI wrapper that sits between Claude Code and the API, letting you hot-swap backends with a single hotkey — no restarts, no config edits, no interruptions. Press Ctrl+B, pick a provider, keep working.

Sounds simple. It wasn't.

The Architecture

The core idea is straightforward: run a local HTTP proxy, point Claude Code at it via ANTHROPIC_BASE_URL, and route requests to whichever backend is currently active.

┌─────────────────────────────┐
│        AnyClaude TUI        │
└──────────────┬──────────────┘
               │
        ┌──────▼──────┐
        │ Claude Code │ (main agent + teammate agents)
        └──────┬──────┘
               │ ANTHROPIC_BASE_URL
        ┌──────▼──────┐
        │ Local Proxy │
        └──┬───────┬──┘
           │       │
      /v1/*│       │/teammate/v1/*
           │       │
     ┌─────▼──┐  ┌─▼───────────┐
     │ Active │  │  Teammate   │
     │Backend │  │  Backend    │
     └────────┘  └─────────────┘

AnyClaude starts a local proxy (port auto-assigned), sets ANTHROPIC_BASE_URL, and spawns Claude Code in an embedded terminal powered by alacritty_terminal. All API requests flow through the proxy, which applies transformations and forwards them to the active backend.

The proxy is built with axum and reqwest. Backends are defined in a TOML config:

[[backends]]
name = "anthropic"
display_name = "Anthropic"
base_url = "https://api.anthropic.com"
auth_type = "passthrough"

[[backends]]
name = "alternative"
display_name = "Alternative Provider"
base_url = "https://your-provider.com/api"
auth_type = "bearer"
api_key = "your-api-key"

Switching backends is just updating an atomic pointer. The next request goes to the new backend. No connection draining, no session restart — the Anthropic API is stateless, so Claude Code sends the full conversation history with every request and context carries over automatically.
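To make the "atomic pointer" idea concrete, here is a minimal std-only sketch, not AnyClaude's actual code. It swaps an atomic index into an immutable backend list instead of a pointer, and the `Backend` and `BackendRouter` names are my own assumptions for illustration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical backend record; fields mirror two of the TOML keys above.
#[derive(Debug)]
pub struct Backend {
    pub name: String,
    pub base_url: String,
}

/// Shared state: a fixed backend list plus an atomic index naming the
/// active entry. The index plays the role of the "atomic pointer".
pub struct BackendRouter {
    backends: Vec<Backend>,
    active: AtomicUsize,
}

impl BackendRouter {
    pub fn new(backends: Vec<Backend>) -> Arc<Self> {
        assert!(!backends.is_empty(), "at least one backend is required");
        Arc::new(Self { backends, active: AtomicUsize::new(0) })
    }

    /// Read per request: whichever backend is active at this instant.
    pub fn active(&self) -> &Backend {
        &self.backends[self.active.load(Ordering::Acquire)]
    }

    /// Called on a switch. Returns false if the name is unknown.
    pub fn switch_to(&self, name: &str) -> bool {
        match self.backends.iter().position(|b| b.name == name) {
            Some(idx) => {
                self.active.store(idx, Ordering::Release);
                true
            }
            None => false,
        }
    }
}
```

A request handler would call `router.active()` once at the start of each request, so an in-flight request finishes on the backend it started with while the next one picks up the new target.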

Request lifecycle

Every request from Claude Code goes through a pipeline of middleware before reaching the backend:

Routing. The proxy inspects the request path. /v1/* goes to the main pipeline, /teammate/v1/* to the teammate pipeline. The routing decision is attached to the request as an extension so downstream middleware knows which backend to target.

Authentication. The proxy rewrites auth headers based on the target backend's config. Three modes: passthrough forwards Claude Code's original headers (useful for Anthropic's OAuth), bearer replaces them with Authorization: Bearer <key>, and api_key sets x-api-key.

Thinking pipeline. This is where thinking block filtering and adaptive thinking conversion happen. The middleware deserializes the request body, strips foreign thinking blocks from conversation history, and optionally converts the thinking format. It also initializes a ThinkingSession for tracking new thinking blocks in the response.

Before (what Claude Code sends):

{
  "model": "claude-opus-4-6",
  "thinking": {"type": "adaptive"},
  "messages": [
    {"role": "assistant", "content": [
      {"type": "thinking", "thinking": "Let me analyze...", "signature": "backend-A-sig"},
      {"type": "text", "text": "Here's my analysis..."}
    ]},
    {"role": "user", "content": "Continue"}
  ]
}

After (what the backend receives — switched to Backend B with thinking_compat):

{
  "model": "claude-opus-4-6",
  "thinking": {"type": "enabled", "budget_tokens": 10000},
  "messages": [
    {"role": "assistant", "content": [
      {"type": "text", "text": "Here's my analysis..."}
    ]},
    {"role": "user", "content": "Continue"}
  ]
}

The thinking block from Backend A is stripped entirely, and adaptive is converted to enabled with an explicit budget.

Model mapping. If the target backend has model family mappings configured, the middleware rewrites the model name in the request body and stashes the original name for reverse mapping in the response.

Before:
```
{"model": "claude-opus-4-6", ...}
```
After (backend has model_opus = "provider-large"):
```
{"model": "provider-large", ...}
```
The original name claude-opus-4-6 is saved in request extensions so the reverse mapper can rewrite it back in the response stream.

Upstream forwarding. The proxy builds a new request to the target backend, copies relevant headers, sets timeouts, and sends it via a shared reqwest client with connection pooling. For streaming responses, it wraps the response body in an ObservedStream that monitors thinking blocks and applies reverse model mapping as chunks flow through.

Each step is a separate axum middleware or extractor, so pipelines can be composed differently. The teammate pipeline skips thinking block filtering entirely — teammates are on a fixed backend, so there's nothing to filter.
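The three authentication modes from the pipeline above can be sketched as a plain function over a header list. This is a dependency-free illustration rather than the real axum middleware. `AuthType`, `Headers`, and `rewrite_auth` are hypothetical names, and dropping both incoming credential headers before setting the new one is my assumption.

```rust
/// The three auth modes named in the text.
#[derive(Debug, Clone, PartialEq)]
pub enum AuthType {
    Passthrough,
    Bearer(String),
    ApiKey(String),
}

/// Headers as (name, value) pairs; names are assumed to be lowercase.
pub type Headers = Vec<(String, String)>;

/// Removes any credentials the client sent (assumed behaviour).
fn strip_credentials(headers: &mut Headers) {
    headers.retain(|(name, _)| {
        name.as_str() != "authorization" && name.as_str() != "x-api-key"
    });
}

/// Rewrites auth headers for the target backend.
pub fn rewrite_auth(mut headers: Headers, auth: &AuthType) -> Headers {
    match auth {
        // Forward whatever Claude Code sent, untouched.
        AuthType::Passthrough => headers,
        AuthType::Bearer(key) => {
            strip_credentials(&mut headers);
            headers.push(("authorization".to_string(), format!("Bearer {key}")));
            headers
        }
        AuthType::ApiKey(key) => {
            strip_credentials(&mut headers);
            headers.push(("x-api-key".to_string(), key.clone()));
            headers
        }
    }
}
```

Keeping each step a pure function like this is also what makes it easy to reuse in both the main and teammate pipelines.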

The TUI layer

The proxy is only half of AnyClaude. The other half is a terminal multiplexer — the TUI that hosts Claude Code and provides the interactive controls.

Claude Code runs inside a pseudo-terminal (PTY) managed by the portable-pty crate. The PTY output feeds into an alacritty_terminal emulator, which maintains the terminal grid state — cells, colors, cursor position, scrollback buffer. The TUI renders this grid using ratatui, overlaying popup dialogs for backend switching, status metrics, and settings.

Terminal input was one of the harder parts to get right (Challenge 6 below). The input system needs to simultaneously handle raw PTY input, hotkey detection for the TUI, and mouse events for text selection — without any of these interfering with each other. Mouse tracking mode from Claude Code complicates things further: when Claude Code enables mouse tracking, the TUI must forward mouse events to the PTY instead of handling them as selection.

That's the easy part. Here's where it gets interesting.

Challenge 1: Thinking Block Signatures

Claude models produce "thinking blocks" — internal reasoning visible in the API response. Each provider signs these blocks with cryptographic signatures tied to their infrastructure. The signatures are opaque to the client, but the API validates them on the next request when they appear in conversation history.

Here's the problem: you start a session on Backend A. Claude produces several responses with thinking blocks, each signed by Backend A. You switch to Backend B mid-conversation. Claude Code sends the next request with the full conversation history — including all of Backend A's signed thinking blocks. Backend B sees foreign signatures, can't validate them, returns 400. Your session is broken.

The proxy solves this with session-aware tracking. Each backend switch starts a new "thinking session". The proxy observes response streams as they flow through, hashing thinking block content in real-time and associating each block with the current session. When a request comes in, the proxy checks each thinking block in the conversation history against its registry. Only blocks from the current session pass through. Everything else — blocks from previous sessions, regardless of which backend produced them — is stripped entirely from the request, as if that turn had no thinking.

This means switching from A to B and back to A doesn't restore old blocks. The signatures are tied not just to the provider but to the session context, so previously seen blocks aren't guaranteed to be valid even on the same backend. A clean session on each switch is the only safe approach.

This works automatically for all backends with no configuration. The proxy never modifies thinking blocks in responses — it only filters them in requests, and only after a backend switch has occurred.
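A rough sketch of that bookkeeping, under stated assumptions: blocks are identified by a hash of their text, a session is just a counter that increments on every switch, and nothing is filtered until the first switch. `ThinkingRegistry` and its methods are illustrative names, and std's `DefaultHasher` stands in for whatever hash the proxy really uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Remembers which session produced each observed thinking block.
#[derive(Default)]
pub struct ThinkingRegistry {
    current_session: u64,
    seen: HashMap<u64, u64>, // content hash -> session id
}

fn content_hash(thinking: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    thinking.hash(&mut hasher);
    hasher.finish()
}

impl ThinkingRegistry {
    /// Every backend switch opens a new session.
    pub fn on_backend_switch(&mut self) {
        self.current_session += 1;
    }

    /// Called while a response stream is being observed.
    pub fn record(&mut self, thinking: &str) {
        self.seen.insert(content_hash(thinking), self.current_session);
    }

    fn is_current(&self, thinking: &str) -> bool {
        self.seen.get(&content_hash(thinking)) == Some(&self.current_session)
    }

    /// Request-side filter over the thinking blocks found in history.
    pub fn filter<'a>(&self, history: &[&'a str]) -> Vec<&'a str> {
        if self.current_session == 0 {
            // No switch has happened yet: pass everything through.
            return history.to_vec();
        }
        history.iter().copied().filter(|t| self.is_current(t)).collect()
    }
}
```

Because the session id only ever increases, returning to an earlier backend does not revive its old blocks, which matches the clean-session-per-switch rule.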

Challenge 2: Adaptive Thinking Compatibility

Anthropic recently introduced adaptive thinking for Opus 4.6 — instead of a fixed token budget, the model decides when and how much to think on its own. Claude Code uses this by default, sending "thinking": {"type": "adaptive"} in requests.

The problem: not all third-party backends support adaptive thinking yet. Some still require the explicit "thinking": {"type": "enabled", "budget_tokens": N} with a fixed budget.

For non-Anthropic backends, AnyClaude converts on the fly:

Request body: adaptive → enabled with a configurable token budget
Header: anthropic-beta: adaptive-thinking-* → interleaved-thinking-2025-05-14

[[backends]]
name = "alternative"
base_url = "https://your-provider.com/api"
auth_type = "bearer"
api_key = "your-api-key"
thinking_compat = true
thinking_budget_tokens = 10000

This is a per-backend flag. Anthropic's own API handles adaptive thinking natively — you only enable thinking_compat for third-party backends that don't support it yet.
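The conversion itself is small. The sketch below models the thinking field as a typed enum instead of raw JSON so that it stays dependency-free. `Thinking`, `CompatConfig`, and both function names are hypothetical, and treating anthropic-beta as a comma-separated list of flags is an assumption on my part.

```rust
/// The two thinking shapes discussed above.
#[derive(Debug, Clone, PartialEq)]
pub enum Thinking {
    Adaptive,
    Enabled { budget_tokens: u32 },
}

/// Hypothetical subset of a backend's config.
pub struct CompatConfig {
    pub thinking_compat: bool,
    pub thinking_budget_tokens: u32,
}

/// adaptive -> enabled with the configured budget, only in compat mode.
pub fn convert_thinking(thinking: Thinking, cfg: &CompatConfig) -> Thinking {
    match thinking {
        Thinking::Adaptive if cfg.thinking_compat => Thinking::Enabled {
            budget_tokens: cfg.thinking_budget_tokens,
        },
        other => other,
    }
}

/// Swaps any adaptive-thinking-* flag in the header value for the
/// interleaved-thinking flag and leaves other flags alone.
pub fn convert_beta_header(value: &str, cfg: &CompatConfig) -> String {
    if !cfg.thinking_compat {
        return value.to_string();
    }
    value
        .split(',')
        .map(|flag| flag.trim())
        .map(|flag| {
            if flag.starts_with("adaptive-thinking-") {
                "interleaved-thinking-2025-05-14"
            } else {
                flag
            }
        })
        .collect::<Vec<_>>()
        .join(",")
}
```

With thinking_compat left off, both functions are identity operations, so requests to Anthropic's own API go out exactly as Claude Code wrote them.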

Challenge 3: Routing Agent Teams

Claude Code has an experimental Agent Teams feature where the main agent spawns teammate agents — independent Claude instances that work in parallel on subtasks, coordinating through a shared task list and direct messaging. One session acts as the team lead, others work on assigned tasks and communicate with each other.

I wanted to route teammates to a different backend than the main agent. The use case: main agent on a premium provider for complex reasoning, teammates on something cheaper for parallel subtasks. Agent Teams can use significant tokens — each teammate has its own context window — so cost control matters.

The challenge: Claude Code spawns teammates as child processes. There's no hook, no callback, no plugin system to intercept their API target. They inherit the parent's environment, including ANTHROPIC_BASE_URL, which points at AnyClaude's proxy — but they all hit the same route. From the proxy's perspective, a request from the main agent looks identical to a request from a teammate.

I explored several approaches. Trying to distinguish agents by request content (model name, headers) was fragile — Claude Code doesn't mark teammate requests differently. Modifying Claude Code itself was out of scope. The environment variable is the only control point, and it's set once at process spawn.

The solution came in two parts.

PATH shim. AnyClaude generates a tmux wrapper script at startup and places it in a temporary directory ahead of the real tmux binary in PATH. In split-pane mode, Claude Code spawns teammates via tmux — it exec's what it thinks is tmux, but it's actually a shim. The shim rewrites ANTHROPIC_BASE_URL to point at a different proxy route (/teammate/v1/* instead of /v1/*) and then exec's the real tmux binary. The teammate process has no idea it's been redirected. This required studying how Claude Code actually spawns tmux sessions — the exact flags and environment propagation varied between display modes.

Nested pipelines. The proxy uses axum's nested routing to separate traffic. Requests to /v1/* go through the main pipeline (active backend, switchable via Ctrl+B). Requests to /teammate/v1/* go through a fixed teammate pipeline that always routes to the configured teammate backend. Each pipeline has its own middleware stack — the teammate pipeline skips thinking block filtering entirely since teammates are on a fixed backend and never experience a switch.

[agent_teams]
teammate_backend = "alternative"

You can enable Claude Code's Agent Teams feature directly from AnyClaude's settings menu (Ctrl+E) — no need to manually edit Claude Code's config files.
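For a feel of what the PATH shim involves, here is a hedged sketch of generating such a wrapper. The script body is my guess at the simplest thing that could work, not the shim AnyClaude actually writes; as noted above, real environment propagation through tmux differs between display modes. Both function names and the example paths are hypothetical.

```rust
/// Renders a POSIX shell wrapper that stands in for tmux.
/// `proxy_origin` is the local proxy address without a trailing slash.
pub fn render_tmux_shim(real_tmux: &str, proxy_origin: &str) -> String {
    format!(
        "#!/bin/sh\n\
         # Point anything launched through this wrapper at the teammate route.\n\
         export ANTHROPIC_BASE_URL=\"{proxy_origin}/teammate\"\n\
         exec \"{real_tmux}\" \"$@\"\n"
    )
}

/// Builds a PATH value with the shim directory ahead of everything else,
/// so a lookup of `tmux` finds the wrapper first.
pub fn prepend_to_path(shim_dir: &str, current_path: &str) -> String {
    if current_path.is_empty() {
        shim_dir.to_string()
    } else {
        format!("{shim_dir}:{current_path}")
    }
}
```

The caller would write the rendered script to a file named tmux inside a temporary directory, mark it executable, and pass the new PATH to the Claude Code process it spawns.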

Challenge 4: Thinking Pipeline Isolation

The thinking block filter from Challenge 1 uses a registry of seen thinking blocks to decide what to strip. But who owns that registry?

In the initial implementation, it was a single shared structure behind a mutex. One registry for the entire proxy. This worked fine with a single agent — but Agent Teams broke it immediately.

The scenario: main agent is on Backend A, teammate is on Backend B. Both are making requests concurrently. The shared registry accumulates thinking blocks from both backends, tagged by backend name. Now the main agent switches to Backend C. The filter sees Backend B's thinking blocks in the registry and flags them as foreign — but those blocks belong to the teammate's conversation, not the main agent's. The teammate's next request gets its own valid thinking blocks stripped, and the session breaks.

The fundamental problem: thinking state is per-session, but the registry was global. The main agent and each teammate have independent conversation histories with independent thinking blocks, and they switch backends independently (or in the case of teammates, not at all).

The solution: ThinkingSession as a per-session tracker that each request carries as a handle. Instead of one global registry, each logical agent session gets its own isolated thinking block tracker. The proxy creates a ThinkingSession for the main agent and separate ones for each teammate, and attaches the right one to each request via axum's request extensions. The main agent's backend switch only affects the main agent's ThinkingSession. Teammates' sessions are completely isolated: they never see filtering triggered by the main agent's actions.
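The isolation property is easy to show in miniature. In this sketch each agent gets its own block set keyed by a hypothetical `AgentKey`; how the real proxy identifies a teammate session is not something I am asserting here.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical identity for a logical agent session.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum AgentKey {
    Main,
    Teammate(String),
}

/// One independent set of currently valid thinking blocks per agent.
#[derive(Default)]
pub struct ThinkingSessions {
    sessions: HashMap<AgentKey, HashSet<String>>,
}

impl ThinkingSessions {
    /// Record a block observed in this agent's response stream.
    pub fn record(&mut self, agent: &AgentKey, block: &str) {
        self.sessions
            .entry(agent.clone())
            .or_default()
            .insert(block.to_string());
    }

    /// A switch invalidates only the switching agent's blocks.
    pub fn on_switch(&mut self, agent: &AgentKey) {
        if let Some(blocks) = self.sessions.get_mut(agent) {
            blocks.clear();
        }
    }

    /// Whether this block may stay in this agent's next request.
    pub fn is_valid(&self, agent: &AgentKey, block: &str) -> bool {
        self.sessions
            .get(agent)
            .map_or(false, |blocks| blocks.contains(block))
    }
}
```

Compared with the global registry, the only structural change is the outer map, but it removes the cross-talk: a switch by the main agent can no longer reach a teammate's blocks.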

Challenge 5: Model Mapping in Both Directions

Different providers use different model names. Anthropic has claude-opus-4-6, your provider might call it provider-large. AnyClaude remaps model names per backend:

[[backends]]
name = "my-provider"
model_opus = "provider-large"
model_sonnet = "provider-medium"
model_haiku = "provider-small"

Request rewriting is straightforward — match the model name against family keywords (opus, sonnet, haiku), substitute the configured name, done. Only configured families are remapped; omitted ones pass through unchanged.
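As a small illustration of the family matching, assuming a simple substring check on the lowercased model name (the real matcher may be stricter). `ModelMap` and `map_request_model` are names I made up for the sketch; the fields mirror the config keys above.

```rust
/// Per-backend model overrides, one optional entry per family.
#[derive(Default)]
pub struct ModelMap {
    pub model_opus: Option<String>,
    pub model_sonnet: Option<String>,
    pub model_haiku: Option<String>,
}

impl ModelMap {
    /// Returns the backend's name for the requested model, or the
    /// requested name unchanged when its family is not configured.
    pub fn map_request_model(&self, requested: &str) -> String {
        let lower = requested.to_ascii_lowercase();
        let mapped = if lower.contains("opus") {
            self.model_opus.as_deref()
        } else if lower.contains("sonnet") {
            self.model_sonnet.as_deref()
        } else if lower.contains("haiku") {
            self.model_haiku.as_deref()
        } else {
            None
        };
        mapped.unwrap_or(requested).to_string()
    }
}
```

The caller keeps the original `requested` string around, since that is what the reverse mapper needs to restore in the response.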

The interesting part is reverse mapping in responses. The backend returns its own model name (e.g. provider-large) in the response body. If Claude Code sees a different model name than what it sent, it gets confused about which model it's talking to. So the proxy needs to rewrite it back to the original Anthropic name.

For non-streaming JSON responses, this is simple: parse the entire body, replace the model field, serialize back.

But most Claude Code interactions use streaming SSE, where the response arrives as a series of data: {...} events over a chunked HTTP connection. The model name appears in the message_start event — the very first SSE event in the stream.

AnyClaude handles this with a ChunkRewriter — a stateful closure plugged into the ObservedStream that wraps the response body. Each chunk passes through the rewriter as it arrives. The rewriter first does a fast byte-level check for the string "message_start" — if not present, the chunk passes through untouched (zero-copy). When the target event is found, the rewriter converts the chunk to text, splits into lines, parses only the data: lines as JSON, rewrites the message.model field, and re-serializes. After the first successful rewrite, the rewriter flips a done flag and becomes a no-op for all remaining chunks — the model name only needs to be rewritten once.
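Here is a simplified, dependency-free sketch of that rewriter's shape. Two deliberate shortcuts, both my own: it does a string substitution on the data: line instead of parsing JSON, so it assumes the compact form "model":"name" with no whitespace, and it does not handle an event split across two chunks. The struct layout is illustrative, not the real ChunkRewriter.

```rust
/// Stateful rewriter: replaces the backend's model name once, then idles.
pub struct ChunkRewriter {
    backend_model: String,
    original_model: String,
    done: bool,
}

impl ChunkRewriter {
    pub fn new(backend_model: &str, original_model: &str) -> Self {
        Self {
            backend_model: backend_model.to_string(),
            original_model: original_model.to_string(),
            done: false,
        }
    }

    pub fn rewrite(&mut self, chunk: Vec<u8>) -> Vec<u8> {
        const NEEDLE: &[u8] = b"message_start";
        // Fast path: a byte-level scan, then hand the same buffer back.
        if self.done || !chunk.windows(NEEDLE.len()).any(|w| w == NEEDLE) {
            return chunk;
        }
        let text = match String::from_utf8(chunk) {
            Ok(text) => text,
            Err(err) => return err.into_bytes(), // not UTF-8: leave as is
        };
        let from = format!("\"model\":\"{}\"", self.backend_model);
        let to = format!("\"model\":\"{}\"", self.original_model);
        let mut out = String::with_capacity(text.len());
        for line in text.split_inclusive('\n') {
            if line.starts_with("data:") && line.contains(&from) {
                out.push_str(&line.replacen(&from, &to, 1));
                self.done = true; // later chunks take the fast path
            } else {
                out.push_str(line);
            }
        }
        out.into_bytes()
    }
}
```

Taking and returning an owned `Vec<u8>` is what lets the fast path avoid a copy: the untouched buffer is moved straight through.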

Challenge 6: Terminal Input

This one surprised me. I was using crossterm to capture terminal input events and forward them to the PTY running Claude Code. Certain key combinations — Option+Backspace, Ctrl+Arrow, Alt+Arrow — simply didn't work.

The root cause: crossterm parses raw terminal input into structured events, and those events then have to be re-encoded into escape sequences for the PTY. That round trip is lossy. Some escape sequences that terminals emit have no crossterm event representation, so they're silently dropped.

The fix was writing a new crate (term_input) that forwards raw bytes directly to the PTY. No parsing, no re-encoding, no information loss. For special key detection (Option+Backspace, Shift+Enter), it uses macOS CGEvent APIs to check modifier state without interfering with the byte stream.
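To illustrate the raw-byte approach, the sketch below classifies one read from the terminal as either a TUI hotkey or bytes to forward verbatim. It assumes hotkeys arrive as single control bytes (Ctrl plus a letter is the letter's code masked with 0x1f), which is standard terminal behaviour but not necessarily how term_input detects them. All type and function names are hypothetical.

```rust
#[derive(Debug, PartialEq)]
pub enum Hotkey {
    BackendSwitcher,
    Status,
    History,
    Settings,
    Quit,
}

#[derive(Debug, PartialEq)]
pub enum InputAction {
    /// Handled by the TUI; never reaches Claude Code.
    Hotkey(Hotkey),
    /// Written to the PTY byte for byte.
    Forward(Vec<u8>),
}

/// Classifies one read from the terminal.
pub fn classify(input: &[u8]) -> InputAction {
    let hotkey = match input {
        [0x02] => Some(Hotkey::BackendSwitcher), // Ctrl+B
        [0x13] => Some(Hotkey::Status),          // Ctrl+S
        [0x08] => Some(Hotkey::History),         // Ctrl+H (Backspace on some terminals)
        [0x05] => Some(Hotkey::Settings),        // Ctrl+E
        [0x11] => Some(Hotkey::Quit),            // Ctrl+Q
        _ => None,
    };
    match hotkey {
        Some(hotkey) => InputAction::Hotkey(hotkey),
        // Anything else, including multi-byte escape sequences, is
        // forwarded without being parsed or re-encoded.
        None => InputAction::Forward(input.to_vec()),
    }
}
```

Because nothing is decoded, a sequence such as ESC followed by DEL (what many terminals send for Option+Backspace) reaches the PTY exactly as the terminal emitted it.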

Getting Started

From a checkout of the repository:

cargo install --path .

Create ~/.config/anyclaude/config.toml:

[defaults]
active = "anthropic"

[[backends]]
name = "anthropic"
display_name = "Anthropic"
base_url = "https://api.anthropic.com"
auth_type = "passthrough"

[[backends]]
name = "alternative"
display_name = "Alternative Provider"
base_url = "https://your-provider.com/api"
auth_type = "bearer"
api_key = "your-api-key"
thinking_compat = true

Run anyclaude. Press Ctrl+B to switch backends. That's it.

Key	Action
Ctrl+B	Backend switcher
Ctrl+S	Status/metrics
Ctrl+H	Switch history
Ctrl+E	Settings
Ctrl+Q	Quit

What's Next

AnyClaude is open source. It only supports Anthropic API-compatible backends — if a provider speaks the same protocol, it works. The project is actively developed and I use it daily.

If you're using Claude Code with multiple providers — or want to start — give it a try and let me know what breaks.

GitHub: github.com/arttttt/AnyClaude