I spent the day making the proof less hand-wavy
A builder note on turning public agent work into an inspectable proof chain instead of a pile of impressive-looking links.
Today was not a model breakthrough day. It was a receipts day.
The uncomfortable thing about building personal agents in public is that strangers cannot inspect your private workflow. They cannot see the late-night Feishu failures, the half-fixed gateway path, the local Obsidian notes, or the small judgment calls that make the system better. If the public surface is just a polished homepage, it starts to smell like theater.
So the work today was to make the public proof chain harder to fake.
The biggest site change was a new guided demo path on my personal website: /demo/. It walks through the public artifacts in a way a recruiter, builder, or AI engineer can actually follow. Not "trust me, I build agent systems." More like:
- here is the architecture surface;
- here is the runtime-fix surface;
- here is the evaluation layer;
- here is the knowledge runtime;
- here is the public proof-chain audit.
I also added /proof-chain/audit/, which is more blunt than a normal portfolio page. It maps claims to artifacts. If I say I work on agent evaluation, the page should point somewhere concrete. If I say I care about knowledge systems, there should be a repo, demo, visual, or writeup that survives a skeptical click.
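To make "maps claims to artifacts" concrete, here is a minimal sketch of the idea. It is not the actual page code. The `Claim` type, the `unbacked_claims` helper, and the sample entries are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    """A public claim plus the artifacts that are supposed to back it."""
    text: str
    # Links or labels for a repo, demo, visual, or writeup.
    artifacts: list[str] = field(default_factory=list)


def unbacked_claims(claims: list[Claim]) -> list[str]:
    """Return the text of every claim that has no artifact behind it."""
    return [c.text for c in claims if not c.artifacts]


# Hypothetical entries: placeholders, not the real audit data.
claims = [
    Claim("I work on agent evaluation", ["repo: agent-scorecard", "/demo/"]),
    Claim("I care about knowledge systems", []),
]

# Any claim listed here would not survive a skeptical click.
print(unbacked_claims(claims))
```

A check like this could run before a deploy, so a claim with an empty artifact list fails loudly instead of shipping.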
The validation was boring in the good way:
npm run lint
npm run build:static
Both passed locally. The GitHub Pages deploy also succeeded, and I verified the new live routes with cache-busted URLs: /demo/?v=6cccb0a and /proof-chain/audit/?v=6cccb0a.
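For anyone who wants to repeat that kind of check, a rough sketch of cache-busting follows. The host `https://example.com` is a placeholder, and `cache_busted` and `fetch_status` are hypothetical helper names, not part of the site's tooling. The only detail taken from the run above is the `v=<commit>` query parameter.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit
from urllib.request import urlopen


def cache_busted(url: str, version: str) -> str:
    """Append v=<version> to the query so caches treat the URL as new."""
    parts = urlsplit(url)
    extra = urlencode({"v": version})
    # Keep any existing query string and add the version after it.
    query = "&".join(q for q in (parts.query, extra) if q)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Fetch the URL and return its HTTP status; urlopen raises on 4xx/5xx."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.status


BASE = "https://example.com"  # placeholder host, not the real site
ROUTES = ["/demo/", "/proof-chain/audit/"]
urls = [cache_busted(BASE + route, "6cccb0a") for route in ROUTES]
```

Calling `fetch_status(url)` for each entry in `urls` would then confirm the routes respond. Using the deployed commit hash as the version ties the check to a specific build.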
Then I did the part that sounds cosmetic until it is not: I rebuilt the public visual narrative across the profile and repos.
The GitHub profile repo now frames the stack as "StevenOS proof chain" instead of a random basket of projects. The sequence is closer to how I actually think about the work:
- agent architecture;
- runtime fixes;
- agent evaluation;
- knowledge runtime;
- public proof chain.
That framing matters because otherwise the public surface looks like five separate side projects. The real story is that they are different layers of the same operating system for my work. Some parts are public and inspectable. Some parts are still local hardening. That distinction is important. I do not want to pretend an unpublished local repo is a shipped product just because it makes the map look cleaner.
A few concrete pieces moved:
- personalWebsite shipped the demo walkthrough, proof-chain audit page, proof visuals, and lightweight homepage motion. Commits included c0cf98e, 6cccb0a, 335db16, and fe1a37b.
- The GitHub profile repo shipped new proof-chain banners and then the larger StevenOS README pass, ending at acbea7c.
- agent-scorecard added the evaluation standard and visual layer, ending at cf0532e.
- digital-twin added the identity-layer visual on branch work at 1bdf661.
I also ran privacy checks before pushing public-facing surfaces. That sounds paranoid, but it should be default behavior for this kind of workflow. When agents are stitching together local notes, repos, visuals, and public pages, the risk is not just a bad sentence. The risk is accidentally turning private context into public polish.
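A privacy check of this kind can be as simple as a pattern scan over anything about to go public. The sketch below is an illustration under assumptions, not the check I ran. The pattern list and the `find_leaks` name are hypothetical, and a real list would be tuned to whatever must stay private.

```python
import re

# Hypothetical patterns: examples of private context that should not ship.
PRIVATE_PATTERNS = {
    "email address": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "home directory path": re.compile(r"/(?:Users|home)/[^/\s]+"),
    "secret-looking assignment": re.compile(
        r"\b(?:api[_-]?key|token|secret)\b\s*[:=]\s*\S+", re.IGNORECASE
    ),
}


def find_leaks(text: str) -> list[tuple[int, str]]:
    """Return (line number, pattern label) for every suspicious match."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PRIVATE_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, label))
    return hits


# Made-up sample text, not real notes.
sample = (
    "Public demo notes\n"
    "source: /Users/someone/vault/agents.md\n"
    "API_KEY = abc123\n"
)
leaks = find_leaks(sample)
for lineno, label in leaks:
    print(f"line {lineno}: {label}")
```

A scan like this only catches what the patterns describe, so it complements a manual read of the diff rather than replacing it.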
The more interesting judgment from today was in the notes: agent work should not be measured by "more token burn" or vague activity. It needs something closer to an investment committee, a cost blackbox, and an external proof chain. I like that framing because it forces a harder question: which agent runs deserved to happen, which ones produced reusable assets, and which ones were just expensive motion?
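One way to turn that question into something checkable is a small run ledger. This is my own rough reading of the framing, sketched under assumptions: the `AgentRun` fields, the `verdict` rules, and the sample runs are all hypothetical, and the thresholds are deliberately crude.

```python
from dataclasses import dataclass


@dataclass
class AgentRun:
    name: str
    cost_usd: float        # recorded spend (the "cost blackbox")
    approved: bool         # passed a pre-run review (the "investment committee")
    public_artifacts: int  # inspectable outputs (the "external proof chain")


def verdict(run: AgentRun) -> str:
    """Classify a run by what it left behind, not by how much it burned."""
    if run.public_artifacts > 0:
        return "produced reusable assets"
    if run.approved:
        return "deserved to happen, nothing reusable yet"
    return "expensive motion"


# Made-up runs for illustration only.
runs = [
    AgentRun("audit-page-draft", cost_usd=1.20, approved=True, public_artifacts=2),
    AgentRun("banner-experiments", cost_usd=0.80, approved=True, public_artifacts=0),
    AgentRun("aimless-refactor", cost_usd=3.50, approved=False, public_artifacts=0),
]

for run in runs:
    print(f"{run.name}: {verdict(run)} (${run.cost_usd:.2f})")
```

The point of the sketch is the ordering of the checks: shipped, inspectable output counts first, prior approval second, and token spend never counts as evidence on its own.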
That is the line I want the public site to hold.
Not every layer is finished. The knowledge runtime is still partly local. Some visuals are ahead of the deeper docs. The audit page will need pruning as the artifact graph changes. But today made the boundary clearer: public claims should point to public proof, and private systems should stay private until they can survive being inspected.
Tiny next step: make the proof-chain audit less like a showcase and more like a ledger. Fewer shiny labels. More links that answer the annoying but useful question: "what exactly shipped, and why should I believe you?"