Back
Đọc bằng Tiếng Việt

Cheap Enough to Own

The market for agent harnesses keeps shipping bigger, glossier wrappers around the same loop. I started sketching my own — and the answer wasn't features. It was budget, flexibility, and the quiet weight of owning the loop that runs my workflow.

I just finished building my own tiny CLI to replace Claude Code — three days, over a long holiday weekend. Just a simple Python file: load skills, run hooks, call MCP, route through OpenRouter. But the closing line of that plan was the part that stuck: "Cheap enough to own — to the point that the next time the SDK changes, we don't even bother to notice."

That's the whole point. Not faster. Not smarter. Cheaper to own.

The market for agent harnesses has gotten too loud. Claude Code, Codex, Cursor, Aider, OpenHarness, Cline — every month another name wraps the same old loop. Each imposes its own opinions about how skills get written, how hooks behave, MCP servers, context windows, billing. And every quarter, they ship a release that breaks what you were already running.

I still use them. They're convenient and the default ergonomics are good.

But here's the thing: when your daily workflow runs through someone else's harness, you're effectively renting your own habits. The day they change the skill format, your habits break. The day your favorite model is deprecated, you have to migrate. The day the price list shifts, your budget moves with it. You never feel the chains tying you down — until you give them a hard tug.

What "owning" actually buys you

When I started writing my own harness, I expected to get pulled into features. I didn't. It came down to three quieter things:

  • First, budget. Most tools today bill per token at the vendor's price. Point your own loop at OpenRouter and free models like Gemma-4 can handle 90% of daily needs at zero. Even paying, the floor sits around $0.06 per million input tokens — roughly 13× cheaper than Haiku-4.5. You hold every dial of the cost knob: $1/day global cap, two and a half cents per call, max five turns. The decision lives in your config file, not the vendor's.
  • Second, flexibility. A homemade harness does the things no commercial tool would bother to: mask wallet addresses in every outbound message, refuse any tool call outside an allowlist, dump full traces to disk for later forensic replay. You can do them because you wrote them — no need to wait for a vendor to approve.
  • Third, personal. Your own codebase is something you can read end-to-end in an afternoon. You know exactly what happens between prompt in and result out. No bloated plugin system, no hidden telemetry, no anxiety about what the next upstream patch is going to change.

What the sketch grew into

The plan was six files. Later it had grown into a proper package — nine MCP servers in a separate toolshed, the core split across a dozen modules. The house grew — that's fine. Growth you understand is better than a black box you trust.

Here's what's in it now:

  1. Front Doors (cli.py + bot.py): Two entry points — terminal and Telegram. The same loop runs behind both. A second front door costs almost nothing when you own the hallway.
  2. Memory Layer (memory.py): Assembles the system prompt from a base instruction, per-user preferences, and a live skill catalog. Long-term memories live in SQLite with full-text search — the agent can recall past context without re-reading the whole log.
  3. The Agent Loop (loop.py): The while True heart. Sends messages to the model, receives tool calls, dispatches them, feeds results back. Stuck-loop trap: three identical calls in a row and it bails with an explanation.
  4. Budget Guard (budget_guard.py): The only module that ever speaks to OpenRouter. Enforces a fallback chain across models, per-call cost cap, daily cap, and a five-minute cooldown after a retryable failure. All four dials in one place.
  5. Security Filter (security.py): Runs on every tool output before the model sees it, and on every reply before it reaches the user. Wallet addresses masked. Private keys redacted. BIP-39 seed phrases caught. Prompt-injection attempts wrapped and flagged.
  6. MCP Registry (mcp_client.py): Connects to external servers over the Model Context Protocol. Current toolshed: calc, EVM tools, on-chain technical analysis, web search, URL fetch, strategy engine, memory read/write, Telegram outbound, skill loader.
  7. Persistence (persistence.py): SQLite-backed state. Conversation history, cost ledger, model cooldowns, long-term memories with full-text search — all in one file you can sqlite3 open and inspect in ten seconds.
  8. Trace (trace.py): The airplane black box. Every byte in and out, written to disk. When something goes wrong, you open the log — no replay fee.

longai architecture diagram

Eight rooms and a toolshed. You can still stand in the kitchen and see the front door. No plugin system pretending to be a foundation, no event bus you can't trace, no marketplace you can't audit.

Why this minimalism is powerful

The philosophy of "you can stand in the kitchen and see the front door" gives you two enormous advantages that thousand-line frameworks don't:

  • Absolute auditability. In large systems, when a bug happens, it gets swallowed by a complex Event Bus or a mess of overlapping plugins. Here, if something breaks, you just walk the hallway between the six rooms. You know exactly which room the bug is in.
  • No black magic. No telemetry phoning home to the vendor's servers, no "smart" features quietly editing your prompt behind your back.

Once you own your own loop, you stop being an AI user and become an AI operator. That's the difference between driving a factory automatic and assembling your own go-kart — the kart may be rough, but you know every bolt, and it runs exactly the way you want.

The trade-off, plainly

You write the loop. You own the failure modes. Before I started, my biggest budget worry was prompt caching — the system prompt is long and it repeats on every turn. I assumed I'd need to implement cache management myself. Turned out I didn't: OpenRouter's providers handle cache hits on repeated prefixes transparently. One concern crossed off the list before I'd even written the code.

The things that did bite: OpenRouter returns prompt_tokens while your accounting code expects input_tokens. Forget the rename once and the cost ledger innocently reports $0 spent. You patch it. You move on.

You become responsible for everything those polished tools used to absorb on your behalf: the stuck-loop trap, the fallback chain, the cooldown logic, the prompt-injection scan over tool output. None of it is hard. All of it is yours.

The first week, things will be slower than just installing Claude Code and getting on with it. By the second month, you stop noticing the harness exists. That's the destination.

A few things to sit with

I'm not telling you to throw out the tools you already use. Most days, the most convenient thing is the right thing.

But somewhere in your stack, there's almost certainly a piece of infrastructure you've stopped questioning — because it's free, or default, or because "everyone uses it." The model behind your agent. The skill format. The hook system. The way it bills you. Each of those is a quiet decision someone else made on your behalf.

Not every layer needs to be yours. But the core loop, sometimes, does.

The AI industry will keep shipping bigger, glossier harnesses — with dashboards, marketplaces, fancy streaming. Some of it is genuinely useful. Most of it is lock-in dressed up as convenience.

Your ugly, cheap, home-built harness won't compete on any of that. It does only one thing: it stays yours, forever.

And honestly? That might be the only feature that actually matters.

The repo is public if you want to read the loop yourself.


What's the smallest piece of your daily workflow you could rewrite over a weekend — and would doing it actually change anything?