February 2026 · 12 min read

Building the SuperIntelligence orchestrator: a GroupChat architecture

The technical design behind our mode-agnostic orchestrator — shared transcripts, asyncio queues for user injection, backpressure via read-ready events, and graceful stop without cutting mid-sentence.

Why we replaced the pipeline with a GroupChat

The original debate engine used a strict pipeline: classify the question, get independent responses, cross-analyze disagreements, run targeted debate rounds, verify facts, synthesize. This produced good outputs on well-defined analytical questions. It produced poor results on casual questions, follow-up messages, and anything that required conversational fluency rather than analytical rigor. Forcing every question through six phases was both slow and unnecessary. The SuperIntelligence orchestrator replaces this with a GroupChat model: models read the shared transcript, respond in round-robin order, and the system continues until a stop condition is met or the user intervenes.
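In sketch form, the core loop looks something like the following; `Transcript`, `Model.respond`, and `should_stop` are illustrative stand-ins rather than the actual implementation:

```python
from dataclasses import dataclass, field


@dataclass
class Transcript:
    # Append-only record shared by every model and the user.
    messages: list[dict] = field(default_factory=list)

    def append(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})


async def run_round_robin(models, transcript: Transcript, should_stop) -> None:
    """Round-robin over models, each reading the full shared transcript,
    until the stop condition reports True."""
    while not should_stop(transcript):
        for model in models:
            # Every model sees the entire transcript before it responds.
            reply = await model.respond(transcript.messages)
            transcript.append(model.name, reply)
            if should_stop(transcript):
                break
```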

The shared transcript

Every message from every model — and every user injection — is appended to a single shared transcript. Each model receives the full transcript before generating its response. This creates a genuine multi-party conversation rather than a series of isolated exchanges. The transcript grows with every turn, which creates a cost consideration: each model call requires processing the entire transcript as context. We mitigate this with KV caching (explicit cache_control breakpoints for Anthropic models; automatic prefix caching for OpenAI and Gemini via OpenRouter's sticky routing), which reduces the effective token cost on longer debates.
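A rough sketch of the Anthropic side of that caching: the helper below (`with_cache_breakpoint`, an illustrative name) marks the last transcript message with a cache_control block so the stable prefix can be served from cache on the next call. The breakpoint placement shown here is a simplification, not our exact policy.

```python
def with_cache_breakpoint(messages: list[dict]) -> list[dict]:
    """Rebuild Anthropic-style messages with a cache_control marker on the
    last content block, so the stable transcript prefix is reused from the
    KV cache on the next turn instead of being re-processed."""
    out = []
    for i, msg in enumerate(messages):
        block = {"type": "text", "text": msg["content"]}
        if i == len(messages) - 1:
            # Everything up to and including this block becomes cacheable.
            block["cache_control"] = {"type": "ephemeral"}
        out.append({"role": msg["role"], "content": [block]})
    return out
```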

User injection via asyncio.Queue

Users can inject messages into an ongoing debate — asking a follow-up question, adding context, or redirecting the conversation — without stopping and restarting. Technically, this is implemented via an asyncio.Queue. The main debate loop checks the queue before each model turn; if a user message is waiting, it is appended to the transcript and acknowledged before the next model responds. The queue is bounded (maxsize=50) to prevent memory issues if messages accumulate faster than models can respond. In practice, injections are rare and the queue depth stays at 0–1.
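A simplified sketch of that path; `injection_queue`, `on_user_injection`, and `drain_injections` are illustrative names, not the exact ones in the codebase:

```python
import asyncio

# Bounded so a burst of user messages cannot grow memory without limit.
injection_queue: asyncio.Queue[str] = asyncio.Queue(maxsize=50)


def on_user_injection(text: str) -> None:
    """Called from the WebSocket handler when the user sends a message mid-debate."""
    try:
        injection_queue.put_nowait(text)
    except asyncio.QueueFull:
        pass  # in practice the queue depth stays at 0-1


def drain_injections(transcript) -> None:
    """Run before each model turn: append any pending user messages
    to the transcript so the next model sees and acknowledges them."""
    while True:
        try:
            user_msg = injection_queue.get_nowait()
        except asyncio.QueueEmpty:
            return
        transcript.append("user", user_msg)
```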

Backpressure: slowing the backend for the reader

When models generate faster than users read, the conversation becomes overwhelming. We implemented a backpressure mechanism: the frontend counts messages that have arrived but are not yet in the viewport (below the scroll position), and sends this count to the backend via a WebSocket message. When the unread count reaches three, the orchestrator pauses before starting the next model's turn — it waits for a read-ready signal or a user injection before continuing. This keeps the debate at a pace the user can actually follow.
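Sketched below with an asyncio.Event for the read-ready signal and a simple polling check for pending injections; names like `Backpressure` and `wait_until_ready` are illustrative, and the real wiring may differ:

```python
import asyncio

UNREAD_THRESHOLD = 3


class Backpressure:
    """Pauses the debate when the reader falls behind."""

    def __init__(self) -> None:
        self._read_ready = asyncio.Event()
        self._read_ready.set()  # start unpaused

    def on_unread_count(self, count: int) -> None:
        # The frontend reports how many messages sit below the viewport.
        if count >= UNREAD_THRESHOLD:
            self._read_ready.clear()
        else:
            self._read_ready.set()

    async def wait_until_ready(self, injection_queue: asyncio.Queue) -> None:
        """Block before the next model turn until the reader catches up
        or a user injection arrives (which counts as re-engagement)."""
        while not self._read_ready.is_set() and injection_queue.empty():
            await asyncio.sleep(0.1)
```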

Graceful stop: finishing the sentence

When a user clicks Stop, we do not want to cut a model off mid-sentence. Truncated AI responses look broken and are harder to read. The orchestrator implements graceful stop via a _stop_after_current flag. When Stop is received, this flag is set to True; the current model finishes streaming its complete response; then the loop exits cleanly and a synthesis is generated from the conversation state. The extra latency is typically two to ten seconds — long enough to notice, short enough to be acceptable.
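In outline, the stop path looks roughly like this, with `synthesize` passed in as an async callable and the other names illustrative:

```python
class Orchestrator:
    def __init__(self, models, transcript, synthesize) -> None:
        self.models = models
        self.transcript = transcript
        self.synthesize = synthesize        # async callable producing the final summary
        self._stop_after_current = False

    def request_stop(self) -> None:
        # Invoked by the WebSocket handler when the user clicks Stop.
        self._stop_after_current = True

    async def run(self) -> None:
        while not self._stop_after_current:
            for model in self.models:
                # The in-flight response always completes; the flag is only
                # checked between turns, never mid-stream.
                reply = await model.respond(self.transcript.messages)
                self.transcript.append(model.name, reply)
                if self._stop_after_current:
                    break
        # Exit cleanly and synthesize from whatever conversation state exists.
        await self.synthesize(self.transcript)
```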