The proof page was missing the proof
A builder note on turning an agent-evaluation claim into something a stranger can actually run, click, and audit.
The funny bug today was not in the evaluator.
It was in the story around the evaluator.
Yesterday I added an Agent Scorecard to the public proof chain. Nice idea: if I say agents are worth trusting, spending tokens on, or giving more permissions to, there should be a scoring layer that looks at traces instead of vibes.
Then today I found the awkward part. The proof-chain audit page did not include the Agent Scorecard verifier.
That is a very portfolio-shaped bug. The public page says "look, I have a proof chain," while one of the better pieces of proof is sitting outside the audit path. Not catastrophic. Definitely embarrassing enough to fix.
So the first chunk of the day was cleanup on personalWebsite PR #11: fix: include Agent Scorecard in proof audit.
The first pass added Agent Scorecard to /proof-chain/audit/. The second pass fixed the quieter discovery problem: the new audit page was not in the sitemap and did not have proper canonical or Open Graph metadata. If a recruiter or builder lands through search, a shared link, or some future LLM crawler, the audit page should be findable without me hand-waving at it.
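Discovery bugs like this are easy to catch mechanically. Here is a minimal sketch of a post-build sanity check that parses a sitemap and confirms the audit route is listed; the sitemap contents and domain are placeholder assumptions, not the site's real build output:

```python
# Hypothetical post-build check: parse a sitemap and confirm the audit
# route appears in a <loc> entry. The XML below is a stand-in, not the
# actual generated sitemap.
import xml.etree.ElementTree as ET

def sitemap_locs(xml_text: str) -> list[str]:
    """Return every <loc> URL in a sitemap, ignoring the namespace prefix."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter() if el.tag.endswith("loc")]

sitemap = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/proof-chain/audit/</loc></url>
</urlset>"""

# The check that would have caught today's bug before deploy.
audit_listed = any(loc.endswith("/proof-chain/audit/") for loc in sitemap_locs(sitemap))
```

Wired into CI after `npm run build`, a check like this turns "I remembered to add the route" into something the pipeline enforces.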
Validation was boring, which is the best kind of validation:
```
npm run lint
npm run build
git diff --check
```
The PR was squash-merged into main at 9cab049. The public proof-chain audit now includes Agent Scorecard, and the route is exposed through the sitemap.
That was the packaging fix. The more interesting work happened after.
I turned agent-scorecard from a nice public repo into something closer to a usable receipt generator. Commit 1f51187 added batch report generation:
```
agent-scorecard --batch-dir examples/traces --reports-dir examples/reports --output report.md
```
The point is small but important. A visitor should not have to read my intentions. They should be able to run a command and see example reports appear. The repo now supports batch input/output, has a README quick-start, and has tests around the CLI path.
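The shape of that batch path can be sketched in a few lines. This is a toy illustration, not agent-scorecard's real internals: the trace schema, the scoring rule, and the function names are all my assumptions.

```python
# Hypothetical sketch of a batch report generator: read JSON traces from
# a directory, score each one, and write a markdown report per trace.
# The trace schema and scoring rule are assumptions, not the real tool.
import json
from pathlib import Path

def score_trace(trace: dict) -> float:
    """Toy scorer: fraction of steps that completed without an error."""
    steps = trace.get("steps", [])
    if not steps:
        return 0.0
    ok = sum(1 for s in steps if not s.get("error"))
    return ok / len(steps)

def batch_reports(batch_dir: Path, reports_dir: Path) -> None:
    """Write one markdown report per *.json trace in batch_dir."""
    reports_dir.mkdir(parents=True, exist_ok=True)
    for trace_path in sorted(batch_dir.glob("*.json")):
        trace = json.loads(trace_path.read_text())
        score = score_trace(trace)
        report = f"# Report: {trace_path.stem}\n\nScore: {score:.2f}\n"
        (reports_dir / f"{trace_path.stem}.md").write_text(report)
```

The design point is the same one the CLI makes: a stranger hands the tool a directory, and readable artifacts come out the other side.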
I reran the test suite to make sure the public claim still had legs:
```
python -m pytest
```
Result: 5 passed.
There was also GitHub profile work around conversion and follower loops (commit 5ba3ddd). That is more narrative than code, but it matters for the same reason: if the profile is the front door, it should not just say "I build AI stuff." It should tell a stranger why to stay, what to click, and which public artifacts back up the claim.
The lesson for me today is that proof-chain work has two failure modes.
One is technical: the test fails, the CLI breaks, the route 404s.
The other is social: the artifact exists, but nobody can discover it, run it, or understand why it matters. That second failure is sneakier because it still looks productive from the inside. The repo exists. The page exists. The commit exists. But the stranger gets friction instead of proof.
I do not want the site to become a gallery of shiny agent nouns. I want it to behave more like a ledger:
- claim;
- artifact;
- verifier;
- route;
- command someone else can run.
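The ledger shape above is concrete enough to type. A minimal sketch, with field names that are my assumptions rather than the site's actual data model:

```python
# Hypothetical data model for one row of the proof ledger described above.
# Field names are assumptions, not the site's real schema.
from dataclasses import dataclass

@dataclass(frozen=True)
class LedgerEntry:
    claim: str     # what I assert
    artifact: str  # the repo or file backing it
    verifier: str  # the thing that checks it
    route: str     # the public page where it is exposed
    command: str   # what a stranger can run themselves

entry = LedgerEntry(
    claim="agent runs are scored, not vibes-checked",
    artifact="agent-scorecard",
    verifier="Agent Scorecard verifier",
    route="/proof-chain/audit/",
    command="python -m pytest",
)
```

The useful property is that every field is checkable: an entry with an empty `route` or `command` is visibly incomplete, which is exactly the failure mode today's bug was.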
Tiny next step: make Agent Scorecard's examples harder to dismiss. The batch path works now. Next it needs better sample traces and a clearer explanation of what a bad agent run looks like before it becomes expensive theater.
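To make that concrete, here is a hypothetical sample of what a "bad" trace might look like, with one cheap red flag computed from it. The schema is invented for illustration; it is not a real agent-scorecard trace:

```python
# Hypothetical "bad" trace: the agent repeats itself, hits an error, and
# never completes the task. Schema invented for illustration.
bad_trace = {
    "task": "summarize repo README",
    "steps": [
        {"tool": "read_file", "error": None, "tokens": 1200},
        {"tool": "read_file", "error": None, "tokens": 1200},  # same call again
        {"tool": "read_file", "error": "rate_limited", "tokens": 900},
    ],
    "completed": False,
}

# One cheap red flag: repeated identical tool calls suggest a loop
# burning tokens without progress.
calls = [s["tool"] for s in bad_trace["steps"]]
has_loop = len(calls) != len(set(calls))
```

Sample traces like this, sitting in `examples/traces`, are what would let a visitor see the difference between a scored run and expensive theater.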