// receipts
Signed receipt ledger
Every benchmark run produces an Ed25519-signed JSON receipt that pins the full per-scenario or per-query record. Click any receipt to see its scores, environment, and signature. Any receipt can be verified in your browser at /verify.
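For readers who want to check a signature outside the browser, here is a minimal sketch of Ed25519 verification using the third-party `cryptography` package. The receipt layout (`payload` and `signature` fields), the base64 encoding, and the canonical JSON form are all assumptions made for illustration. The actual receipt schema and signing input may differ, so check a real receipt before relying on this.

```python
import base64
import json

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)


def canonical_bytes(payload: dict) -> bytes:
    # ASSUMPTION: the signed bytes are the payload serialized with sorted keys,
    # compact separators, and UTF-8. The real ledger may canonicalize differently.
    text = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def verify_receipt(receipt: dict, public_key: Ed25519PublicKey) -> bool:
    # ASSUMPTION: hypothetical layout {"payload": {...}, "signature": "<base64>"}.
    signature = base64.b64decode(receipt["signature"])
    try:
        # verify() returns None on success and raises InvalidSignature on failure.
        public_key.verify(signature, canonical_bytes(receipt["payload"]))
    except InvalidSignature:
        return False
    return True


# Demo with a throwaway key pair; a real check would use the publisher's public key.
private_key = Ed25519PrivateKey.generate()
payload = {"adapter": "autogen", "model": "gpt-4o-mini", "subset": "holdout", "fixtures": 6}
signature = private_key.sign(canonical_bytes(payload))
receipt = {"payload": payload, "signature": base64.b64encode(signature).decode("ascii")}

print(verify_receipt(receipt, private_key.public_key()))   # True

# Changing any pinned field invalidates the signature.
tampered = {**receipt, "payload": {**payload, "fixtures": 7}}
print(verify_receipt(tampered, private_key.public_key()))  # False
```

The point of signing a canonical serialization is that any change to a pinned field, however small, makes verification fail.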
// multi-agent convergence
// run 2026-05-19 04:48Z (6 receipts)
| Adapter / Model | Subset | Fixtures | Correct | Collapse | Sycophancy | Signed |
|---|---|---|---|---|---|---|
| autogen / gpt-4o-mini | holdout | 6 | 83.3% | 0.0% | 0.0% | ✓ |
| baseline-anthropic-sequential / claude-haiku-4-5 | holdout | 6 | 83.3% | 66.7% | 0.0% | ✓ |
| baseline-anthropic / claude-haiku-4-5 | holdout | 6 | 100.0% | 66.7% | 0.0% | ✓ |
| baseline-azure-openai-sequential / gpt-4o-mini | holdout | 6 | 66.7% | 83.3% | 0.0% | ✓ |
| baseline-azure-openai / gpt-4o-mini | holdout | 6 | 66.7% | 83.3% | 0.0% | ✓ |
| baseline-azure-openai / gpt-5-mini | holdout | 6 | 100.0% | 100.0% | 0.0% | ✓ |
// run 2026-05-19 03:42Z (6 receipts)
| Adapter / Model | Subset | Fixtures | Correct | Collapse | Sycophancy | Signed |
|---|---|---|---|---|---|---|
| autogen / gpt-4o-mini | all | 30 | 93.3% | 10.0% | 0.0% | ✓ |
| baseline-anthropic-sequential / claude-haiku-4-5 | all | 30 | 93.3% | 53.3% | 0.0% | ✓ |
| baseline-anthropic / claude-haiku-4-5 | all | 30 | 96.7% | 56.7% | 0.0% | ✓ |
| baseline-azure-openai-sequential / gpt-4o-mini | all | 30 | 86.7% | 86.7% | 0.0% | ✓ |
| baseline-azure-openai / gpt-4o-mini | all | 30 | 76.7% | 73.3% | 10.0% | ✓ |
| baseline-azure-openai / gpt-5-mini | all | 30 | 96.7% | 96.7% | 0.0% | ✓ |
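
The percentages in the tables above can be read back as whole-fixture counts. The sketch below assumes each percentage is a fixture count divided by the Fixtures column and rounded to one decimal place. That reading is an assumption, and the signed receipts remain the authoritative record. The helper name `implied_count` is hypothetical.

```python
def implied_count(percent: float, fixtures: int) -> int:
    """Return the whole-fixture count implied by a one-decimal percentage."""
    count = round(percent / 100 * fixtures)
    # Sanity check: the count must round back to the published percentage.
    if round(count / fixtures * 100, 1) != percent:
        raise ValueError(f"{percent}% is not a whole number of {fixtures} fixtures")
    return count


# A few (percent, fixtures) pairs copied from the tables above.
samples = [(83.3, 6), (66.7, 6), (93.3, 30), (96.7, 30), (76.7, 30), (10.0, 30)]
for percent, fixtures in samples:
    print(f"{percent:.1f}% of {fixtures} fixtures -> {implied_count(percent, fixtures)}")
```

On the 6-fixture holdout subset a single fixture moves a rate by about 16.7 percentage points, so differences between adapters there are coarser than on the 30-fixture run.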
Receipts are also available raw at /receipts/<benchmark>/<filename>.json for direct download. The implementation lives in przm-bench/results/published.
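As a small illustration of the raw path pattern, the sketch below builds a download URL from its parts. The base URL, benchmark name, and filename are placeholders invented for the example, not real values from the ledger, and `receipt_url` is a hypothetical helper.

```python
from urllib.parse import quote, urljoin


def receipt_url(base_url: str, benchmark: str, filename: str) -> str:
    """Build /receipts/<benchmark>/<filename>.json under the given site root."""
    # Percent-encode each segment so unusual characters cannot alter the path.
    path = f"/receipts/{quote(benchmark, safe='')}/{quote(filename, safe='')}.json"
    return urljoin(base_url, path)


# Placeholder values only; substitute the real host, benchmark, and filename.
print(receipt_url("https://example.invalid", "example-benchmark", "example-receipt"))
```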