Research
March 2026 · 10 min read

Convergence conditions: when should a debate stop?

We studied AI debates to identify the signals that predict genuine convergence versus premature agreement. The challenge: knowing when to stop.

The problem with fixed round counts

The simplest approach to ending a multi-model debate is to set a fixed number of rounds. After three rounds, generate a synthesis. This is predictable and easy to implement. It is also wrong in both directions: for simple questions, the models agree after round one and subsequent rounds add noise; for complex questions, three rounds may not be enough to surface and resolve genuine disagreements. Fixed round counts optimize for neither case.

What genuine convergence looks like

We started by studying what happened in debates where expert reviewers rated the final answer as high-quality. These debates shared several characteristics: the agreement score had been rising steadily for the last two or three rounds; the models were explicitly referencing and engaging with each other's arguments (not just restating their own positions); and the marginal change in reasoning between consecutive rounds was small. The last point is key — when models are still surfacing new arguments, the debate is not done.
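The characteristics above suggest a simple rule: stop when agreement has risen steadily for a couple of rounds and the models have stopped surfacing new reasoning. A minimal sketch, assuming the system tracks a per-round agreement score in [0, 1] and a per-round "novelty" score measuring how much the reasoning changed since the previous round (both metric names and thresholds are illustrative, not our production values):

```python
def has_converged(agreement_history, novelty_history,
                  min_rounds=2, rise_eps=0.01, novelty_eps=0.1):
    """True when agreement has risen for the last `min_rounds` rounds
    and the marginal change in reasoning is small."""
    if len(agreement_history) < min_rounds + 1:
        return False
    recent = agreement_history[-(min_rounds + 1):]
    # Agreement must be rising steadily across the recent rounds.
    rising = all(b - a >= rise_eps for a, b in zip(recent, recent[1:]))
    # Little new reasoning is being surfaced in the latest round.
    settled = novelty_history[-1] <= novelty_eps
    return rising and settled
```

For example, `has_converged([0.5, 0.7, 0.85, 0.92], [0.6, 0.4, 0.15, 0.05])` returns `True`: agreement rose over the last two rounds and the final round added almost nothing new.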

The stagnation signal

Premature convergence is a separate problem. Models sometimes reach superficial agreement quickly — not because they have genuinely resolved a disagreement, but because the follow-up prompts implicitly reward agreement, or because one model's response was sufficiently confident that others deferred to it without proper scrutiny. We detect this through stagnation analysis: if the agreement score has been moving less than a small threshold for several consecutive rounds, and the semantic content of the responses has not changed meaningfully, we classify this as stagnation rather than genuine convergence.
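Stagnation is the mirror image of the convergence check: flat agreement plus unchanged content over several consecutive rounds. A hypothetical detector, where `similarity_history[i]` is the semantic similarity between round *i* and round *i−1* (1.0 means identical responses; the window size and thresholds are placeholders):

```python
def is_stagnant(agreement_history, similarity_history,
                window=3, move_eps=0.02, sim_floor=0.95):
    """True when agreement has barely moved over `window` rounds AND the
    responses themselves are near-duplicates of the previous round."""
    if len(agreement_history) < window + 1:
        return False
    recent = agreement_history[-(window + 1):]
    # Agreement score is drifting less than the threshold each round.
    barely_moving = all(abs(b - a) < move_eps
                        for a, b in zip(recent, recent[1:]))
    # Responses are semantically near-identical round over round.
    unchanged = all(s >= sim_floor for s in similarity_history[-window:])
    return barely_moving and unchanged
```

Note that both conditions are required: a flat agreement score alone can mean an entrenched, productive disagreement, which should be allowed to continue.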

Token caps as a safety boundary

Beyond convergence and stagnation, we always apply a hard token cap per debate. This is not about detecting convergence — it is about preventing runaway debates that consume excessive resources or simply grow too long for a user to read. The cap is set by plan tier (Free: 30K tokens, Pro: 80K, Ultra: 200K, Apex: 500K). When the cap is reached, we generate a final synthesis from whatever state the debate is in, explicitly noting that the token limit was reached. Users find this transparent approach more trustworthy than silently truncating.
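The tier caps from the paragraph above can be expressed as a small lookup; the enforcement function here is a sketch (the `"synthesize_now"` / `"continue"` signal names are illustrative, not our actual API):

```python
# Per-tier token caps as stated in the article.
TOKEN_CAPS = {"free": 30_000, "pro": 80_000, "ultra": 200_000, "apex": 500_000}

def check_cap(tier, tokens_used):
    """Return the next action once the debate's running token count is known.
    Hitting the cap triggers a final synthesis rather than a silent cutoff."""
    if tokens_used >= TOKEN_CAPS[tier]:
        return "synthesize_now"  # synthesis notes that the token limit was hit
    return "continue"
```

The cap check runs after every round, so a debate never overshoots its budget by more than one round's worth of tokens.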

Manual stop as a first-class feature

Perhaps the most important finding from our user research: users want to be able to stop a debate whenever they feel they have seen enough. Many users stop debates early not because the models have converged, but because they have already learned what they needed. Treating manual stop as a failure or edge case was a mistake. We now design the manual stop as a primary interaction — it is always visible, always available, and when used, the system immediately generates a synthesis from the current state rather than abandoning the conversation.
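Structurally, this means every exit path from the debate loop, including a user-initiated stop, ends in a synthesis of the current state. A minimal sketch of that control flow, with hypothetical names (`stop_requested`, `synthesize`) standing in for the real components:

```python
def run_debate(rounds, stop_requested, synthesize):
    """Run debate rounds until exhausted or the user stops.
    Every exit path produces a synthesis; manual stop is not an error."""
    transcript = []
    for round_fn in rounds:
        if stop_requested():
            # User has seen enough: synthesize whatever exists so far.
            return synthesize(transcript, reason="manual_stop")
        transcript.append(round_fn(transcript))
    return synthesize(transcript, reason="rounds_exhausted")
```

Because the stop check sits at the top of the loop rather than in an exception handler, a manual stop follows the exact same code path as any other termination condition.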
Eclipsco — Next Generation AI