Vendor certification

Get your framework independently benchmarked.

"Our agents are reliable" is something every framework vendor says. A przm receipt is the thing that makes it verifiable.

We run the benchmark independently against your release, sign the receipt with our Ed25519 key, and publish it on the public leaderboard. You get a third-party performance attestation you can link from your README, your sales deck, your landing page, or anywhere "this is what the test said" beats "trust us."
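Because the receipt is signed with Ed25519, anyone holding przm's public key can check it without trusting the page that links to it. Here is a minimal Python sketch of that check. It assumes the third-party `cryptography` package, a receipt stored as JSON, a detached 64-byte signature, and a raw 32-byte public key. The receipt fields and the canonical JSON encoding are hypothetical, not przm's published format. A throwaway key stands in for the real signing key.

```python
import json

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)
from cryptography.hazmat.primitives.serialization import Encoding, PublicFormat


def canonical_bytes(receipt: dict) -> bytes:
    # Hypothetical canonical form: sorted keys, no whitespace, UTF-8.
    # Signer and verifier must agree on these bytes exactly.
    return json.dumps(receipt, sort_keys=True, separators=(",", ":")).encode("utf-8")


def verify_receipt(receipt: dict, signature: bytes, public_key_raw: bytes) -> bool:
    """Return True if `signature` is a valid Ed25519 signature over `receipt`."""
    public_key = Ed25519PublicKey.from_public_bytes(public_key_raw)
    try:
        # verify() returns None on success and raises InvalidSignature on failure.
        public_key.verify(signature, canonical_bytes(receipt))
        return True
    except InvalidSignature:
        return False


# Throwaway key pair standing in for the signer's real key (assumption).
signing_key = Ed25519PrivateKey.generate()
public_raw = signing_key.public_key().public_bytes(Encoding.Raw, PublicFormat.Raw)

receipt = {  # hypothetical receipt fields
    "framework": "example-framework",
    "release": "1.2.0",
    "benchmark": "convergence-v0.1",
    "scores": {"axis_a": 0.81},
}
sig = signing_key.sign(canonical_bytes(receipt))

print(verify_receipt(receipt, sig, public_raw))   # True
tampered = dict(receipt, scores={"axis_a": 0.99})
print(verify_receipt(tampered, sig, public_raw))  # False
```

Changing any field, including a single score, invalidates the signature. That is what makes "this is what the test said" checkable by a third party.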

Pricing

5 charter slots open

Charter

$0 (first 3–5 vendors only)

Free signed receipt for the launch leaderboard, in exchange for case-study rights and a public quote.

One full benchmark run on your release
Ed25519-signed receipt with per-scenario transcripts
Logo placement on the v0.1 launch leaderboard
Charter-customer badge for your website / README
Private findings brief before publication
You provide: company name, 1–2 sentence quote, sample API key

Standard

$999 per release

Production cert for an individual framework or model release. 5-business-day turnaround.

One full benchmark run on your release
Ed25519-signed receipt with per-scenario transcripts
Public leaderboard entry
Private findings brief with the scenarios where you lost points
5-business-day turnaround

Extended

$2,499 per release

Standard plus the holdout subset (your strongest defense against benchmark-gaming claims).

Everything in Standard
Runs against both the seen fixture set and the 20% holdout subset
Seen-vs-holdout delta published with your leaderboard entry
72-hour priority turnaround

Enterprise

$9,999 per release

Custom fixture set authored against your stated use case. Private receipt unless you choose to publish.

Everything in Extended
Custom fixture set for your domain and failure modes
Private receipt by default (your choice to publish)
Re-run option if you ship a patch within 30 days
72-hour priority turnaround

Volume pricing available for framework vendors who certify each major release. Get in touch.
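The Extended tier publishes a seen-vs-holdout delta. The page does not define how that delta is computed. The following Python sketch assumes per-axis scores in [0, 1] and reports seen minus holdout, so a positive number means the release scored lower on sealed fixtures it could not have been tuned against. The axis names and scores are hypothetical.

```python
def seen_vs_holdout_delta(
    seen: dict[str, float], holdout: dict[str, float]
) -> dict[str, float]:
    """Per-axis delta, seen minus holdout (assumed sign convention).

    A large positive value means the score dropped on the sealed holdout
    fixtures, which is the pattern benchmark gaming would produce.
    """
    if seen.keys() != holdout.keys():
        raise ValueError("seen and holdout results must cover the same axes")
    # Round to 4 places to hide floating-point noise in the published number.
    return {axis: round(seen[axis] - holdout[axis], 4) for axis in seen}


# Hypothetical axis names and scores.
seen = {"task_success": 0.82, "sycophancy_resistance": 0.70}
holdout = {"task_success": 0.79, "sycophancy_resistance": 0.55}
print(seen_vs_holdout_delta(seen, holdout))
# {'task_success': 0.03, 'sycophancy_resistance': 0.15}
```

A delta near zero on every axis suggests the seen-set score carries over to fixtures the vendor never saw.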

How it works

01
Fill out the form below
Your framework name, the release version you want certified, and the LLM your framework will run on. You get an automated acknowledgment within minutes; Matt follows up within one business day.
02
Provide a sample API key
The key needs enough credit to run the full fixture set (~$5–15 in API costs). We use your key so the run is billed to you and you can independently audit what was called.
03
Adapter implementation
If you already have a PR against the przm-bench repo, we use that. If not, we implement a baseline adapter and you review it before we run.
04
Run the bench
Temperature 0, seeded where the framework supports it. For non-deterministic frameworks, we do 3 runs and report the median score on each axis.
05
Private findings brief
You see the numbers privately first. 48 hours to flag any adapter-implementation errors. We will fix genuine bugs; we will not re-run because the score was low.
06
Sign + publish
Ed25519-signed receipt committed to the public ledger. You get the signed file to link from your own marketing.
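Step 04 has a rule for non-deterministic frameworks: three runs, median score per axis. It can be sketched in Python as follows. The run format (a dict mapping axis name to score) and the axis names are assumptions for illustration; the harness's actual data model may differ. The median suits this job because one anomalous run cannot move the reported score on its own.

```python
from statistics import median


def median_per_axis(runs: list[dict[str, float]]) -> dict[str, float]:
    """Collapse repeated runs of one release into a single score per axis."""
    if not runs:
        raise ValueError("need at least one run")
    axes = runs[0].keys()
    if any(run.keys() != axes for run in runs):
        raise ValueError("every run must report the same axes")
    # With an odd number of runs, statistics.median returns the middle value.
    return {axis: median(run[axis] for run in runs) for axis in axes}


# Three hypothetical runs; the third run's task_success is an outlier.
runs = [
    {"task_success": 0.78, "sycophancy_resistance": 0.60},
    {"task_success": 0.80, "sycophancy_resistance": 0.52},
    {"task_success": 0.10, "sycophancy_resistance": 0.71},
]
print(median_per_axis(runs))
# {'task_success': 0.78, 'sycophancy_resistance': 0.6}
```

A mean would have pulled task_success down to 0.56 because of the single bad run. The median reports 0.78.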

What it doesn't include

You do not see the fixtures in advance.

That's the point. If you saw the test questions, the score would be meaningless. The holdout set is sealed from everyone vendor-side, including us.

We do not tune the benchmark to your strengths.

The scenarios are designed to break things, not to flatter them. If your framework has a sycophancy problem, the receipt will say so.

No retakes before publishing.

A certification run is a certification run. If you want to improve your score, ship a better version and certify that release. Historical receipts stay in the ledger with their original scores.

Certification is not an endorsement.

"przm recommends this product" is not what a receipt says. A receipt says "this is what we found when we ran the test." Vendors with high and low scores alike can publish; what a score means depends on context and on what it is compared against.

Questions

Can I certify a model instead of a framework?

Yes. LLM providers can certify a specific model version against the baseline adapter. Contact us for model-specific pricing.

What if I disagree with the methodology?

The methodology is open source. Submit a PR. We take adversarial feedback seriously. The spec explicitly allows competitors to submit replacement confederate prompts, and we publish both runs. If your objection is substantive, it makes the benchmark better.

What if the benchmark changes between my v1 cert and my v2 cert?

Receipts pin the benchmark version. A convergence-v0.1 receipt is always comparable to other convergence-v0.1 receipts. When the benchmark version bumps, prior receipts stay in the ledger with their version label.
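Version pinning means comparisons only ever happen within a single benchmark version. Here is a minimal Python sketch. It assumes a hypothetical receipt record with one headline score, whereas real receipts carry per-scenario results. The version label `convergence-v0.2` is likewise hypothetical and stands in for a future bump.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Receipt:
    framework: str
    release: str
    benchmark_version: str  # e.g. "convergence-v0.1"
    score: float            # hypothetical single headline score


def comparable(a: Receipt, b: Receipt) -> bool:
    """Two receipts can be compared only if they pin the same benchmark version."""
    return a.benchmark_version == b.benchmark_version


def leaderboard(receipts: list[Receipt], version: str) -> list[Receipt]:
    """Rank receipts for one benchmark version; other versions are left as-is."""
    pinned = [r for r in receipts if r.benchmark_version == version]
    return sorted(pinned, key=lambda r: r.score, reverse=True)


ledger = [
    Receipt("framework-a", "1.0", "convergence-v0.1", 0.74),
    Receipt("framework-b", "2.3", "convergence-v0.1", 0.81),
    Receipt("framework-a", "2.0", "convergence-v0.2", 0.69),  # hypothetical later version
]
print([r.framework for r in leaderboard(ledger, "convergence-v0.1")])
# ['framework-b', 'framework-a']
print(comparable(ledger[0], ledger[2]))  # False
```

In this example, framework-a 2.0 scoring 0.69 on v0.2 is not a regression from 0.74 on v0.1. The two numbers come from different tests.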

How do you make money if the harness is open source?

We don't sell the harness; we sell being the third party that ran the test. Same way a security auditor sells their name, not their checklist. The OSS is free. The signed-by-us receipt is the product.