Research
January 2026 · 6 min read

Backpressure in streaming AI: slowing the backend when the user can't read

When AI models generate faster than users can read, buffering messages is not enough. We built a real-time unread-count signal that pauses the orchestrator when messages accumulate below the viewport.

The problem with unlimited streaming speed

Multi-model debates can generate content faster than humans read. In a three-model debate where each model produces a 300-word response, a full round completes in under a minute. If the user is still reading the first model's response when the second and third finish, they now have two unread messages below the viewport. By round two, they may have four or five. The natural reaction is to scroll quickly past everything to catch up — which means reading nothing carefully. The debate becomes a stream of content the user skips, not a reasoned exchange they follow.

Why buffering alone does not help

One approach is to buffer messages on the client and release them at a controlled rate. This hides the problem but does not solve it — the user still ends up with the same cognitive load; it just arrives more smoothly. The fundamental issue is that the backend is generating faster than the user is consuming, and we need the backend to slow down rather than the frontend to pretend it has not received the content yet.
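For concreteness, a minimal sketch of that rejected approach in TypeScript. The class and names here are our illustration, not production code:

```typescript
// Sketch of client-side buffering: queue incoming messages and release
// them at a fixed rate. This smooths delivery but does not reduce the
// total amount the user must read.
class MessageBuffer {
  private queue: string[] = [];

  constructor(releaseMs: number, private render: (msg: string) => void) {
    // Release at most one buffered message per interval.
    setInterval(() => {
      const next = this.queue.shift();
      if (next !== undefined) this.render(next);
    }, releaseMs);
  }

  push(msg: string): void {
    this.queue.push(msg);
  }
}
```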

The unread-count signal

We implemented a simple real-time signal: the frontend counts how many AI messages have arrived but are currently below the visible viewport (the user has not scrolled to see them yet). This count is sent to the backend via the existing WebSocket connection as a small JSON message. When the count reaches 3 or more, the orchestrator sets a pause flag before starting the next model's turn. It waits for the count to drop below 3 — either because the user scrolled down, or because they injected a message — before continuing.
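A sketch of the orchestrator side of this mechanism, in TypeScript. The threshold constant, class, and JSON message shape are our illustration of the behavior described above, not the production protocol:

```typescript
// Assumed message shape: {"type": "unread_count", "count": <number>}.
const UNREAD_PAUSE_THRESHOLD = 3;

class DebateSession {
  private unreadCount = 0;
  private waiters: Array<() => void> = [];

  // Called for each JSON message arriving on the session's WebSocket.
  onClientMessage(raw: string): void {
    const msg = JSON.parse(raw);
    if (msg.type === "unread_count") {
      this.unreadCount = msg.count;
      if (this.unreadCount < UNREAD_PAUSE_THRESHOLD) {
        // Wake the orchestrator if it was paused waiting on the reader.
        this.waiters.splice(0).forEach((resolve) => resolve());
      }
    }
  }

  // Awaited by the orchestrator before starting each model's turn.
  waitForReader(): Promise<void> {
    if (this.unreadCount < UNREAD_PAUSE_THRESHOLD) return Promise.resolve();
    return new Promise((resolve) => this.waiters.push(resolve));
  }
}
```

In the debate loop, each turn would then be gated on the reader: await session.waitForReader() before starting the next model.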

Implementation details

On the frontend, we attach a data attribute to each AI message div as it is streamed. A scroll handler, debounced to 200ms, counts how many of these elements are currently below the visible viewport boundary. We only count live messages (not historical ones loaded from the database), because historical messages were already read in a previous session. The debounce prevents the handler from firing on every pixel of scroll movement, which would cause excessive WebSocket messages.
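A sketch of the frontend half under the same assumed JSON shape; the data attribute name and helper functions are hypothetical:

```typescript
// Assumes live (non-historical) AI messages are tagged with
// data-live-ai-message as they stream in.
let debounceTimer: number | undefined;

function countUnreadBelowViewport(): number {
  const els = document.querySelectorAll<HTMLElement>("[data-live-ai-message]");
  let unread = 0;
  els.forEach((el) => {
    // A message is unread if its top edge is below the bottom of the viewport.
    if (el.getBoundingClientRect().top > window.innerHeight) unread++;
  });
  return unread;
}

function reportUnread(socket: WebSocket): void {
  socket.send(
    JSON.stringify({ type: "unread_count", count: countUnreadBelowViewport() })
  );
}

function attachScrollReporter(socket: WebSocket): void {
  window.addEventListener("scroll", () => {
    // Debounce: report only once scrolling settles for 200ms, rather than
    // sending a WebSocket message for every pixel of movement.
    window.clearTimeout(debounceTimer);
    debounceTimer = window.setTimeout(() => reportUnread(socket), 200);
  });
}
```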

Results and tradeoffs

In user testing, debates with backpressure enabled received significantly higher satisfaction scores than identical debates without it. Users reported feeling in control of the pace and said they actually read the models' reasoning rather than skimming. The tradeoff is latency: a debate that would have completed in four minutes might now take six or seven if the user is a slow reader. We considered making this configurable, but found that the users who most needed the pacing were also the ones who would never have adjusted a setting to get it. The default is now always-on.