Weekly shipped receipts page
A public page that lists every shipped PR and commit from the past week, with links a visitor can verify.
Verify: Open the route /proof-chain/shipped and click any link.
Proof receipts
Sixty-four shipped artifacts across five repos in seven days. Each link below is a public PR or commit a visitor can inspect.
Public blog post documenting the Product Owner / Builder / Reviewer loop for May 13.
Verify: Read the MDX frontmatter and compare with the day's PRs.
A /pulse page with real engineering metrics: shipped count, repos touched, streak days.
Verify: Open /pulse and check that the numbers match the PR list.
A /proof-chain/control-room page showing live build status and shipped artifacts.
Verify: Open /proof-chain/control-room and verify that the links resolve to real PRs.
A checklist for deciding whether an agent trace has enough evidence to score.
Verify: Run the examples and check that the checklist gates scoring.
A trust contract page: what the scorecard checks, what it does not, and where the boundary is.
Verify: Read the page and verify the contract matches the score logic.
A failure replay that shows what the agent did wrong and what the scorecard caught.
Verify: Run the replay example and compare output with the trace.
A gate that classifies agent changes by risk level before they land.
Verify: Read the doc and verify the classification criteria are explicit.
A capability contract: what the twin can do, what it cannot, and what needs human review.
Verify: Open the demo and verify the contract boundaries match reality.
A control plane demo for auto-run / review / stop decisions on agent work.
Verify: Read the decision criteria and verify they are testable.
A doctor command with JSON output for programmatic health checks.
Verify: Run knowledge-harness doctor --json and verify the schema.
Redacts local vault paths from run metadata before public output.
Verify: Run a demo and verify no local paths appear in the output.
A doc convention for proposals: context, scope, acceptance criteria, risk.
Verify: Read the doc and verify the convention is self-consistent.
Keeps the prompt command read-only: it shows what would run without executing.
Verify: Run knowledge-harness prompt and verify no vault writes occur.
Redacts local file paths from command metadata in public run receipts.
Verify: Run a command and verify the receipt has no local paths.
A /demo/agent-scorecard route on the public site showing a live walkthrough.
Verify: Open /demo/agent-scorecard and follow the demo steps.
A blog note on when to treat a bug as a feature gate vs a defect.
Verify: Read the post and verify the framing is from real experience.
A page showing open-source contributions as proof of real-world agent work.
Verify: Open the page and verify the linked PRs are real.
A decision receipt: what the agent decided, why, and what evidence it used.
Verify: Read the receipt and verify the evidence links are public.
An agent work receipt template showing inputs, actions, outputs, and verification.
Verify: Read the template and verify it matches the scoring rubric.
A self-contained HTML walkthrough explaining how a trace becomes checks, a score, and a trust decision.
Verify: Open the HTML file in a browser and follow the steps.
A demo command that bundles evidence from a vault run into a single reviewable artifact.
Verify: Run knowledge-harness demo and inspect the output bundle.
Supports custom questions in the demo command for flexible proof generation.
Verify: Run knowledge-harness demo --question '...' and verify the output.
Markdown output format for demo receipts, suitable for public docs.
Verify: Run knowledge-harness demo --markdown and verify the format.
JSON output format for demo receipts, suitable for programmatic use.
Verify: Run knowledge-harness demo --json and verify the schema.
A static simulator that maps agent scores to autonomy decisions.
Verify: Run the simulator with different scores and verify the output.
A GitHub profile README with proof links, a 60-second proof route, and the StevenOS stack.
Verify: Visit github.com/stevenchouai and verify the links resolve.
A blog post arguing that demos without measurable proof are just marketing.
Verify: Read the post and verify the argument is concrete.
Homepage restructured to lead with proof metrics instead of biography.
Verify: Visit the homepage and verify the first screen shows proof.
A CI workflow that checks site quality on every PR: lint, build, link validation.
Verify: Check the PR's CI status and verify the workflow runs.
A /now page with a proof ledger: what's active, what shipped, what's next.
Verify: Open /now and verify the entries link to real artifacts.
A script that validates all proof-chain links resolve to real artifacts.
Verify: Run npm run validate:proof-links and verify the output.
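For readers who want to see roughly what a proof-link check involves, here is a minimal sketch. It is not the actual validate:proof-links script. The URL pattern, the function names, and the injected resolves callback are all assumptions made for illustration, and the network check is left to the caller so the example runs offline.

```python
import re
from typing import Callable, Iterable

# Assumed shape of a public GitHub PR or commit URL.
# The real script may accept other link shapes.
ARTIFACT_URL = re.compile(
    r"https://github\.com/[\w.-]+/[\w.-]+/(?:pull/\d+|commit/[0-9a-f]{7,40})"
)


def extract_artifact_links(text: str) -> list[str]:
    """Return unique PR/commit URLs found in text, in first-seen order."""
    seen: dict[str, None] = {}
    for match in ARTIFACT_URL.finditer(text):
        seen.setdefault(match.group(0), None)
    return list(seen)


def find_broken_links(
    links: Iterable[str], resolves: Callable[[str], bool]
) -> list[str]:
    """Return the links for which the caller-supplied check fails.

    `resolves` is a hypothetical hook: in practice it would issue an
    HTTP request and report whether the artifact exists.
    """
    return [link for link in links if not resolves(link)]
```

A caller would pass the rendered page text to extract_artifact_links, then hand the result to find_broken_links together with whatever HTTP check it trusts. An empty result means every link passed that check.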
Every link above is a public GitHub PR or commit. Click it, read the diff, check the CI status, and decide for yourself. The autonomous loop that produced these artifacts runs every 30 minutes while Steven is idle. Each artifact was independently reviewed before shipping.
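Two of the receipts above describe redacting local paths from run metadata and command receipts before public output. The sketch below shows one way that kind of redaction could work. It is an illustration under stated assumptions, not the knowledge-harness implementation: the path patterns, the placeholder string, and the redact function are all hypothetical.

```python
import re

# Assumed definition of a "local path": Unix home directories and
# Windows user profiles. The real tool may use a different rule.
LOCAL_PATH = re.compile(
    r"(?:/Users|/home)/[^\s\"']+|[A-Za-z]:\\Users\\[^\s\"']+"
)
PLACEHOLDER = "<redacted-path>"  # hypothetical placeholder text


def redact(value):
    """Recursively replace local paths in strings nested in dicts and lists."""
    if isinstance(value, str):
        return LOCAL_PATH.sub(PLACEHOLDER, value)
    if isinstance(value, dict):
        return {key: redact(item) for key, item in value.items()}
    if isinstance(value, list):
        return [redact(item) for item in value]
    # Numbers, booleans, and None carry no paths and pass through unchanged.
    return value
```

A pattern this simple also matches URLs that happen to contain /home/ or /Users/, so a production version would need a stricter rule. The verify step stays the same either way: run a command and confirm no local path survives in the receipt.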