The agent did not need more magic; it needed better receipts
A short builder note on making personal agents easier to debug, review, and trust when the real failure mode is messy handoff.
Today’s most honest builder lesson: the agent was not blocked by a lack of intelligence. It was blocked by boring things like vague handoffs, unreadable config failures, noisy PR updates, and a model call that sat there too long before Feishu gave up on it.
This is both annoying and comforting.
Annoying because nobody wants to spend a day teaching tools to say “this JSON file is malformed” in a useful voice. Comforting because the path forward is not mystical. It is mostly receipts, boundaries, and shorter feedback loops.
A few concrete things changed today.
First, I shipped a small local improvement to input-copilot-ios: commit `6ac02fe`, `feat: add explicit Obsidian export handoff` (verified). The change adds a more explicit Library → Obsidian new-note handoff using `obsidian://new`, and I verified it with a generic iOS Simulator build. That sounds tiny, but it fixes a real gap in the personal knowledge loop: capture is not done when text exists; capture is done when the next system knows where to put it.
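For the curious, the handoff boils down to URL construction. Here is a minimal Python sketch (the app itself is Swift) of building an `obsidian://new` link from the documented Obsidian URI parameters `vault`, `name`, and `content`; the function name is mine, not the app's.

```python
from urllib.parse import urlencode, quote

def obsidian_new_note_url(vault: str, name: str, content: str) -> str:
    """Build an obsidian://new URL that tells Obsidian where captured text goes."""
    # quote_via=quote encodes spaces as %20 rather than +, the safer
    # form for custom URL schemes like obsidian://
    query = urlencode({"vault": vault, "name": name, "content": content}, quote_via=quote)
    return f"obsidian://new?{query}"

print(obsidian_new_note_url("Main", "Inbox/capture", "text from the Library"))
```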
Then I spent several passes on knowledge-harness, my local experiment for routing Codex runs through an Obsidian knowledge base. This repo is still local/unpublished, so I am treating the evidence carefully: no PR, no remote, no public claim yet. But the work itself was very real:
- added a read-only `knowledge-harness config` command so setup/debug does not require a full vault/Codex path to succeed;
- wrote run metadata with `exit_code` and a UTC `ended_at`, so later review can tell whether a Codex subprocess actually succeeded (sketch below);
- added `doctor --json`, because agents and scripts should not scrape human prose to learn whether a system is healthy (sketch below);
- improved config-load errors for invalid JSON, non-object JSON, missing fields, wrong field types, and read failures (sketch below);
- cleaned up missing vault-contract errors so they show the vault root plus one relative path per line instead of a Python list dumped into the user’s face.
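A minimal sketch of the run-metadata idea, with hypothetical names (`run_and_record` is not the real knowledge-harness API): run the subprocess, keep its exit code, and stamp `ended_at` in UTC so the record is unambiguous across machines.

```python
import json
import subprocess
from datetime import datetime, timezone
from pathlib import Path

def run_and_record(cmd: list[str], meta_path: Path) -> int:
    """Run a subprocess and persist enough metadata to audit it later."""
    started_at = datetime.now(timezone.utc).isoformat()
    result = subprocess.run(cmd)  # the exit code is the receipt we care about
    meta = {
        "cmd": cmd,
        "started_at": started_at,
        "ended_at": datetime.now(timezone.utc).isoformat(),
        "exit_code": result.returncode,
    }
    meta_path.write_text(json.dumps(meta, indent=2))
    return result.returncode
```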
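The `doctor --json` point is almost embarrassingly simple, which is why it matters: the same checks, emitted once for humans and once as JSON. A sketch with made-up check names:

```python
import json
import sys

def doctor(as_json: bool) -> int:
    """Health check with a machine-readable mode (hypothetical checks)."""
    checks = {"config_readable": True, "vault_present": False}
    healthy = all(checks.values())
    if as_json:
        print(json.dumps({"healthy": healthy, "checks": checks}))
    else:
        for name, ok in checks.items():
            print(f"{'ok ' if ok else 'FAIL'} {name}")
    return 0 if healthy else 1

sys.exit(doctor(as_json="--json" in sys.argv))
```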
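And the config-error work is mostly about catching each failure class separately instead of letting one traceback speak for all of them. A sketch, where the field names are placeholders rather than the real schema:

```python
import json
from pathlib import Path

REQUIRED = {"vault_root": str, "codex_bin": str}  # hypothetical field names

def load_config(path: Path) -> dict:
    """Load a JSON config and fail with messages a human can act on."""
    try:
        raw = path.read_text()
    except OSError as exc:
        raise SystemExit(f"config: cannot read {path}: {exc}")
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise SystemExit(f"config: {path} is not valid JSON (line {exc.lineno}: {exc.msg})")
    if not isinstance(data, dict):
        raise SystemExit(f"config: {path} must contain a JSON object, got {type(data).__name__}")
    for field, expected in REQUIRED.items():
        if field not in data:
            raise SystemExit(f"config: {path} is missing required field '{field}'")
        if not isinstance(data[field], expected):
            raise SystemExit(
                f"config: field '{field}' must be {expected.__name__}, "
                f"got {type(data[field]).__name__}"
            )
    return data
```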
The validation was intentionally unglamorous: repeated `python3 -m unittest discover`, `python3 -m compileall src`, and a direct `PYTHONPATH=src python3 -m knowledge_harness.cli doctor --json` check. The reason I did not commit those changes is also part of the proof chain: the repo has no initial commit/remote yet, so mixing this into a fake “shipped” story would be dishonest.
On the public-agent side, I kept pushing hermes-agent PR #19221, which makes non-interactive update also run the safe config migration path. The practical bug here is familiar to anyone running agents from cron or messaging apps: the fix that works interactively does not help if the bot is updating itself in a non-interactive lane. I resolved upstream conflicts with a merge commit on `fix/noninteractive-update-config-migrate` and verified the focused path with `scripts/run_tests.sh tests/hermes_cli/test_cmd_update.py`: 7 passed.
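I do not want to misrepresent hermes-agent's internals, so here is only the shape of the bug as a hypothetical sketch: when the migration call lives inside the interactive branch, the cron/bot lane never runs it.

```python
def confirm_with_user() -> None:
    print("proceed with update? (stub)")

def download_and_install_update() -> None:
    print("installing update (stub)")

def migrate_config(backup: bool = True) -> None:
    print(f"running safe config migration (backup={backup}) (stub)")

def cmd_update(interactive: bool) -> None:
    """Hypothetical shape of the fix: prompting may be interactive-only,
    but the config migration must run in both lanes."""
    if interactive:
        confirm_with_user()
    download_and_install_update()
    migrate_config(backup=True)  # before the fix, only the interactive branch reached this

cmd_update(interactive=False)  # the cron/messaging lane now migrates too
```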
I also opened stevenchouai/hermes-agent PR #1, “docs: harden PR workflow guidance.” It is a Markdown-only change, but it came from a real operator problem: agent status reports were too long and too mushy. The new guidance explicitly describes the isolated fork/upstream worktree workflow and asks for short operator-style summaries. CI is mixed in a useful way: supply-chain scan and e2e passed, Docker build skipped, full tests failed in existing environment/dependency/credential areas unrelated to the docs diff.
The weirdest production-feeling incident was Feishu timing out. The gateway and the Feishu channel were fine; the actual failure was a long openai-codex/gpt-5.5 call that produced no output before the 120-second idle watchdog fired. The fix was not “panic and rewrite the bot.” I lowered the default thinking level from max to high, added a Gemini fallback (sketched below), validated the OpenClaw config, and later reduced several cron jobs from heavy “reconstruct the world” prompts to lighter read-only jobs.
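The fallback logic is generic enough to sketch. This is not OpenClaw's actual API, just the pattern: give the primary model a bounded window, and hand the prompt to the next model when nothing comes back.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

def call_with_fallback(prompt, models, timeout=120.0):
    """Try each (name, call) pair in order; skip any call that stalls past `timeout`."""
    for name, call in models:
        pool = ThreadPoolExecutor(max_workers=1)
        future = pool.submit(call, prompt)
        try:
            return future.result(timeout=timeout)
        except FutureTimeout:
            print(f"{name}: no output within {timeout}s, falling back")
        finally:
            # do not block on the stalled thread; a truly stuck one can
            # still delay interpreter exit, so real code needs cancellation
            pool.shutdown(wait=False)
    raise RuntimeError("all fallbacks exhausted")

def slow_primary(prompt):   # stands in for the stalled model call
    time.sleep(2)
    return "too late"

def fast_fallback(prompt):  # stands in for the Gemini fallback
    return f"answer to: {prompt}"

print(call_with_fallback("status?", [("primary", slow_primary), ("fallback", fast_fallback)], timeout=0.5))
```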
That is the thread through the day: personal AI systems need less drama and more inspectability.
The model can be brilliant and still fail the product if:
- it cannot tell you which config file is bad;
- it cannot preserve the exit code of a subprocess;
- it replies with a novel-length PR status update;
- it silently skips a time-log because it could not read context;
- it has no fallback when one model call stalls.
What I learned: “agent reliability” is not one feature. It is a pile of small agreements between tools. Say what changed. Save the exit code. Keep the reply short. Do not touch dirty worktrees. Validate the exact path you changed. Admit when something is local-only.
Tiny next step: turn the local knowledge-harness repo from a useful pile of files into a publishable proof artifact, with an initial commit, a clean README, and one demo run that a stranger could inspect without needing my private vault.