Wanted to share how we stream LLM tokens over WebRTC data channels in AntSeed.
Most LLM APIs use SSE - unidirectional HTTP, “data:” lines, content-type: text/event-stream.
Fine for a single API server. Doesn’t fit a peer-to-peer network where every seller runs real models behind real home connections. So the wire between peers is a WebRTC data channel.
The local app (Claude Code, Codex, Pi, anything pointed at an OpenAI- or Anthropic-compatible base URL) talks to a local proxy on 127.0.0.1 over plain HTTP. That proxy serializes the request and ships it across a WebRTC data channel to the seller.
The seller speaks SSE upstream to wherever the tokens come from.
We don’t replace SSE; we wrap it.
Each chunk from the upstream reader gets re-framed and pushed back onto the data channel.
The buyer’s proxy unwraps it and re-emits SSE to the local consumer.
HTTP/SSE → WebRTC → HTTP/SSE. Local app and upstream model each see exactly what they expect.
WebSocket-based APIs like OpenAI Realtime aren’t wrapped yet - same idea applies, the upstream reader just looks different.
The bridge is straight-line synchronous code. No buffer, no batch window, no backpressure check, no async hop. Per-chunk overhead is one binary copy and ~20 bytes of framing.
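To make the "one copy plus ~20 bytes of framing" idea concrete, here is a minimal sketch of what such a frame could look like. The layout (1-byte type, 16-byte request ID, 4-byte length), the `TYPE_CHUNK` code, and the function names are all assumptions for illustration, not AntSeed's actual wire format.

```typescript
// Hypothetical frame layout (illustrative only, not the real wire format):
//   byte 0        message type
//   bytes 1..16   request ID (16 raw bytes, e.g. a UUID)
//   bytes 17..20  payload length, big-endian uint32
//   bytes 21..    payload (the raw SSE chunk from upstream)
const HEADER_SIZE = 21;
const TYPE_CHUNK = 0x01; // made-up type code for a stream chunk

interface Frame {
  type: number;
  requestId: Uint8Array; // exactly 16 bytes
  payload: Uint8Array;
}

function encodeFrame(frame: Frame): Uint8Array {
  if (frame.requestId.length !== 16) {
    throw new Error("requestId must be 16 bytes");
  }
  const out = new Uint8Array(HEADER_SIZE + frame.payload.length);
  const view = new DataView(out.buffer);
  view.setUint8(0, frame.type);
  out.set(frame.requestId, 1);
  view.setUint32(17, frame.payload.length, false); // big-endian
  out.set(frame.payload, HEADER_SIZE); // the single binary copy of the chunk
  return out;
}

function decodeFrame(bytes: Uint8Array): Frame {
  if (bytes.length < HEADER_SIZE) {
    throw new Error("frame too short");
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const length = view.getUint32(17, false);
  if (bytes.length !== HEADER_SIZE + length) {
    throw new Error("payload length mismatch");
  }
  return {
    type: view.getUint8(0),
    requestId: bytes.slice(1, 17),          // small copy of the ID
    payload: bytes.subarray(HEADER_SIZE),   // view, no copy
  };
}
```

Data channel messages already preserve boundaries, so the length field is mostly a sanity check there; it would matter more if the same framing runs over a byte stream such as TCP.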
We use one data channel per buyer/seller pair and multiplex many concurrent streams on it.
SCTP underneath gives in-order reliable delivery (the data channel default) - frames arrive in send order, retransmission on loss, no reorder buffer in user code.
The receiver just keeps a Map<requestId, chunkHandler> and dispatches frames as they come in.
Ten parallel chats between the same buyer and seller share one connection, one ICE negotiation, one DTLS handshake. Since the transport already keeps frames in order, multiplexing needs no extra correctness work.
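The receiver side described above can be sketched in a few lines. This is a minimal illustration assuming string request IDs; the `StreamDemux` class and its method names are hypothetical, not taken from the AntSeed codebase.

```typescript
// Handler invoked for every chunk that belongs to one request.
type ChunkHandler = (payload: Uint8Array) => void;

// Hypothetical demultiplexer: one instance per data channel.
class StreamDemux {
  private handlers = new Map<string, ChunkHandler>();

  // Register a handler when a new request starts streaming.
  open(requestId: string, handler: ChunkHandler): void {
    this.handlers.set(requestId, handler);
  }

  // Drop the handler when the stream ends or is cancelled.
  close(requestId: string): void {
    this.handlers.delete(requestId);
  }

  // Call from the data channel's message callback, once per frame.
  // Returns false if no stream is registered under this ID.
  dispatch(requestId: string, payload: Uint8Array): boolean {
    const handler = this.handlers.get(requestId);
    if (!handler) {
      return false; // unknown or already-closed stream
    }
    handler(payload); // synchronous: arrival order is preserved per stream
    return true;
  }
}
```

Because dispatch is synchronous and the channel delivers frames in send order, chunks for each request reach their handler in the order the seller sent them, even when several streams are interleaved.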
Encryption is also free. WebRTC data channels run SCTP over DTLS. node-datachannel wraps libdatachannel, which handles the handshake, DTLS keys, and SCTP ordering. Bytes are encrypted from sender’s libdatachannel to receiver’s libdatachannel.
NAT traversal is free too: ICE candidate gathering, STUN, and TURN fallback are all in libdatachannel. Peers behind home routers connect directly most of the time. If the direct connection fails, there’s a TCP socket fallback in the connection manager that uses the same framing on top.
Payments ride the same data channel.
Opening a channel takes a ReserveAuth - an EIP-712 signature over channelId, maxAmount, deadline - that the buyer signs and ships inside the first SpendingAuth (type 0x50). The seller acks with AuthAck (type 0x51) and starts serving.
After each response the seller sends NeedAuth (type 0x58) with the cumulative cost; the buyer responds with a fresh SpendingAuth signing the new cumulative spend.
When spending hits ~85% of the reserve, the buyer tops up by re-signing SpendingAuth with a higher reserveMaxAmount.
If the buyer hasn’t authorized enough to cover a request, the seller sends PaymentRequired (type 0x56) and the buyer signs more.
No separate auth socket, no RPC. Streaming hot path stays clean. Payment math runs once per request, and settlement happens on-chain in batches later.
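Here is a rough sketch of the buyer-side bookkeeping for the NeedAuth / top-up step. The type codes are the ones listed above; everything else (the `BuyerChannelState` shape, `onNeedAuth`, the `topUpIncrement` parameter, plain numbers for amounts) is an assumption for illustration. EIP-712 signing is left out entirely, and real on-chain amounts would likely want bigint rather than number.

```typescript
// Message type codes as given in the post.
const MsgType = {
  SpendingAuth: 0x50,
  AuthAck: 0x51,
  PaymentRequired: 0x56,
  NeedAuth: 0x58,
} as const;

// Hypothetical buyer-side state for one payment channel.
interface BuyerChannelState {
  reserveMaxAmount: number; // current reserve ceiling, in smallest units
  authorizedSpend: number;  // cumulative spend last signed by the buyer
}

// Unsigned SpendingAuth fields; signing would happen after this step.
interface SpendingAuthDraft {
  type: number;
  cumulativeAmount: number;
  reserveMaxAmount: number;
}

const TOP_UP_PERCENT = 85; // top up once spend reaches ~85% of the reserve

function onNeedAuth(
  state: BuyerChannelState,
  cumulativeCost: number,
  topUpIncrement: number,
): SpendingAuthDraft {
  if (cumulativeCost < state.authorizedSpend) {
    throw new Error("cumulative cost must never decrease");
  }
  if (topUpIncrement <= 0) {
    throw new Error("topUpIncrement must be positive");
  }
  let reserve = state.reserveMaxAmount;
  // Integer comparison avoids floating-point error around the threshold.
  while (cumulativeCost * 100 >= reserve * TOP_UP_PERCENT) {
    reserve += topUpIncrement;
  }
  state.authorizedSpend = cumulativeCost;
  state.reserveMaxAmount = reserve;
  return {
    type: MsgType.SpendingAuth,
    cumulativeAmount: cumulativeCost,
    reserveMaxAmount: reserve,
  };
}
```

The draft always carries the cumulative total rather than a per-request delta, which matches the description above of each SpendingAuth signing the new cumulative spend.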
HTTP for talking to the model. WebRTC for talking between peers. Each protocol where it’s actually good.