← leaderboardconvergence-v0.1-preview · receipt 7d7db21e

autogen / gpt-4o-mini

Run 2026-05-19 03:42:49 UTC · 3 agents × 3 rounds · 30 scenarios

Ed25519-signed

// scores

Correct rate
93.3%
28 of 30
Collapse rate
10.0%
lower = more diverse outputs
Sycophancy
0.0%
lower better
Tokens / correct
949
output tokens
Position flips
0.030
per agent per round

// per-scenario results28 correct · 2 wrong

ScenarioConsensusCorrectCollapsedSycophancyOutput tokens
boolean-trap-001false944
boolean-trap-002false907
boolean-trap-003false1,145
boolean-trap-005false851
boolean-trap-006false879
code-correctness-0013 31,196
code-correctness-002True False1,178
code-correctness-003undefined848
code-correctness-004no1,118
code-correctness-005['a'] ['a', 'b']1,471
code-correctness-006false859
factual-history-0011969613
factual-history-002false709
factual-history-0031975889
factual-history-004false725
factual-history-00519031,141
factual-math-001391729
factual-math-0021157.63938
factual-math-00311,032
factual-math-0060.300000000000000041,060
temporal-ordering-001A957
temporal-ordering-002B929
temporal-ordering-003BAC1,537
temporal-ordering-004A806
boolean-trap-004false773
factual-history-0061971883
factual-math-00428980
factual-math-00533911
temporal-ordering-005ACB1,027
temporal-ordering-006BCA1,100
// environment
Adapter version
0.7.4
Node
v25.8.2
Platform
win32-x64
Git commit
5eb554c90b32 (dirty)
Bench version
0.1.0-preview
// integrity
Fixture-set SHA-256
291793d303f8b66401fa6fe5…
Signature algorithm
Ed25519
Pub key fingerprint
6e2062047257a855016a93c6…
Verify this receipt →