Vendor certification

Get your framework independently benchmarked.

"Our agents are reliable" is something every framework vendor says. A przm receipt makes that claim verifiable.

We run the benchmark independently against your release, sign the receipt with our Ed25519 key, and publish it on the public leaderboard. You get a third-party performance attestation you can link from your README, your sales deck, your landing page, or anywhere "this is what the test said" beats "trust us."

Pricing

5 charter slots open
Charter
$0 (first 3–5 vendors only)

Free signed receipt for the launch leaderboard, in exchange for case-study rights and a public quote.

  • One full benchmark run on your release
  • Ed25519-signed receipt with per-scenario transcripts
  • Logo placement on the v0.1 launch leaderboard
  • Charter-customer badge for your website / README
  • Private findings brief before publication
  • You provide: company name, 1-2 sentence quote, sample API key
Standard
$999 per release

Production cert for an individual framework or model release. 5-business-day turnaround.

  • One full benchmark run on your release
  • Ed25519-signed receipt with per-scenario transcripts
  • Public leaderboard entry
  • Private findings brief with the scenarios where you lost points
  • 5 business day turnaround
Extended
$2,499 per release

Standard plus the holdout subset (your strongest defense against benchmark-gaming claims).

  • Everything in Standard
  • Runs against both the seen fixture set and the 20% holdout fixture set
  • Seen-vs-holdout delta published with your leaderboard entry
  • 72-hour priority turnaround
Enterprise
$9,999 per release

Custom fixture set authored against your stated use case. Private receipt unless you choose to publish.

  • Everything in Extended
  • Custom fixture set for your domain and failure modes
  • Private receipt by default (your choice to publish)
  • Re-run option if you ship a patch within 30 days
  • 72-hour priority turnaround

Volume pricing available for framework vendors who certify each major release. Get in touch.

How it works

  1. Fill out the form below

    Tell us your framework name, the release version you want certified, and which LLM your framework will run. You get an automated acknowledgment within minutes; Matt follows up within one business day.

  2. Provide a sample API key

    Enough to run the full fixture set (~$5–15 in API costs). We use your key so the run is billed to you and you can independently audit what was called.

  3. Adapter implementation

    If you already have a PR against the przm-bench repo, we use that. If not, we implement a baseline adapter and you review it before we run.

  4. Run the bench

    Temperature 0, seeded where the framework supports it. For non-deterministic frameworks: 3 runs, median per-axis score.

  5. Private findings brief

    You see the numbers privately first. 48 hours to flag any adapter-implementation errors. We will fix genuine bugs; we will not re-run because the score was low.

  6. Sign + publish

    Ed25519-signed receipt committed to the public ledger. You get the signed file to link from your own marketing.
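
The aggregation rule in step 4 (3 runs, median per-axis score) can be sketched as follows. This is a minimal illustration, not the przm-bench implementation: the function name `median_per_axis`, the axis names, and the score values are all hypothetical, and it assumes each run reports one numeric score per axis.

```python
from statistics import median


def median_per_axis(runs):
    """Collapse several runs into one score per axis by taking the median.

    `runs` is a list of dicts mapping axis name -> score. The shape is an
    assumption made for this sketch, not the real receipt schema.
    """
    if not runs:
        raise ValueError("need at least one run")
    axes = set(runs[0])
    for run in runs:
        if set(run) != axes:
            raise ValueError("every run must report the same axes")
    return {axis: median(run[axis] for run in runs) for axis in sorted(axes)}


# Three runs of a hypothetical non-deterministic framework.
# Axis names and values are made up for illustration.
runs = [
    {"accuracy": 0.82, "robustness": 0.64},
    {"accuracy": 0.79, "robustness": 0.71},
    {"accuracy": 0.85, "robustness": 0.66},
]

scores = median_per_axis(runs)
print(scores)
```

Taking the median rather than the mean means a single outlier run, good or bad, does not move the reported score.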

What it doesn't include

You do not see the fixtures in advance.

That's the point. If you saw the test questions, the score would be meaningless. The holdout set is sealed from everyone vendor-side, including us.

We do not tune the benchmark to your strengths.

The scenarios are designed to break things, not to flatter them. If your framework has a sycophancy problem, the receipt will say so.

No retakes before publishing.

A certification run is a certification run. If you want to improve your score, ship a better version and certify that release. Historical receipts stay in the ledger with their original scores.

Certification is not an endorsement.

"przm recommends this product" is not what a receipt says. A receipt says "this is what we found when we ran the test." Vendors with high scores AND low scores can publish; the meaning depends on context and the comparison.

Questions

Can I certify a model instead of a framework?

Yes. LLM providers can certify a specific model version against the baseline adapter. Contact us for model-specific pricing.

What if I disagree with the methodology?

The methodology is open source. Submit a PR. We take adversarial feedback seriously. The spec explicitly allows competitors to submit replacement confederate prompts and we publish both runs. If your objection is substantive, it makes the benchmark better.

What if the benchmark changes between my v1 cert and my v2 cert?

Receipts pin the benchmark version. A convergence-v0.1 receipt is always comparable to other convergence-v0.1 receipts. When the benchmark version bumps, prior receipts stay in the ledger with their version label.
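
As a rough sketch of what version pinning means for anyone comparing receipts: two receipts are comparable only when their benchmark version labels match. The `Receipt` fields, the vendor names, the scores, and the `convergence-v0.2` label below are hypothetical; only `convergence-v0.1` comes from this page.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Receipt:
    # Hypothetical fields; the real receipt schema may differ.
    vendor: str
    release: str
    benchmark_version: str
    score: float


def comparable(a: Receipt, b: Receipt) -> bool:
    """Receipts are comparable only if they pin the same benchmark version."""
    return a.benchmark_version == b.benchmark_version


def group_by_benchmark_version(receipts):
    """Bucket receipts so that comparisons stay within one benchmark version."""
    groups = defaultdict(list)
    for receipt in receipts:
        groups[receipt.benchmark_version].append(receipt)
    return dict(groups)


r1 = Receipt("vendor-a", "1.0.0", "convergence-v0.1", 0.71)
r2 = Receipt("vendor-b", "3.2.0", "convergence-v0.1", 0.64)
r3 = Receipt("vendor-a", "2.0.0", "convergence-v0.2", 0.78)  # hypothetical later version

print(comparable(r1, r2))  # same benchmark version
print(comparable(r1, r3))  # different benchmark versions
groups = group_by_benchmark_version([r1, r2, r3])
```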

How do you make money if the harness is open source?

We don't sell the harness; we sell being the third party that ran the test. Same way a security auditor sells their name, not their checklist. The OSS is free. The signed-by-us receipt is the product.
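
To make "signed by us" concrete, here is a minimal sketch of how a reader might check an Ed25519 signature on a receipt. It assumes the third-party Python `cryptography` package is installed. The receipt fields, the JSON canonicalization, and the helper names are assumptions made for illustration, not the actual przm receipt format, and the key pair is generated on the spot as a stand-in for the real signing key.

```python
import json

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)


def canonical_bytes(receipt: dict) -> bytes:
    """Serialize deterministically so signer and verifier hash the same bytes.

    Sorted-key compact JSON is an assumption of this sketch.
    """
    return json.dumps(receipt, sort_keys=True, separators=(",", ":")).encode("utf-8")


def verify_receipt(receipt: dict, signature: bytes, public_key: Ed25519PublicKey) -> bool:
    """Return True if `signature` is valid for `receipt` under `public_key`."""
    try:
        public_key.verify(signature, canonical_bytes(receipt))
        return True
    except InvalidSignature:
        return False


# Stand-in key pair; a real check would load the published public key instead.
signing_key = Ed25519PrivateKey.generate()
public_key = signing_key.public_key()

# Hypothetical receipt contents.
receipt = {
    "vendor": "example-framework",
    "release": "1.4.0",
    "benchmark_version": "convergence-v0.1",
    "score": 0.71,
}
signature = signing_key.sign(canonical_bytes(receipt))

ok = verify_receipt(receipt, signature, public_key)

# Changing any field after signing invalidates the signature.
tampered = dict(receipt, score=0.99)
tampered_ok = verify_receipt(tampered, signature, public_key)

print(ok, tampered_ok)
```

The point of the sketch is that anyone holding the public key can confirm a receipt was not edited after signing, without having to trust the vendor who links to it.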

Get certified

Pick a tier and tell us about your framework. Matt replies within one business day (same-day for Extended / Enterprise).

We'll email you. We won't add you to any list.