15 Jun 2026 4 min read ai-augmented

How I expanded agent’s memory with OKF

🤖

3 years ago I’ve been starting this blog and made a promise avoding AI content here. Things are changing, yet I don’t want to give up on the promises.
I’m introducing “ai-augmented” tag that will be attached to all upcoming posts that was processed heavily by AI. You don’t have to guess - decide what matters to you.

My Hermes agent hit the memory cap. So I built a second layer. Most agents stop at one. Mine has two. Using the Google’s OKF specification.

The first layer is a 2,200-character file injected into every turn. The fast-recall slot. The things I need to remember every single time — “new post means load the social-post-workflow skill”, “Postiz Bluesky needs retry”, “Codex CLI has no auth here”. Hard cap. Every fact has to earn its place, because every byte is paid on every turn, including a one-word Telegram DM.

The second layer is an OKF knowledge bundle. A directory of markdown files with YAML frontmatter — 15 concepts and growing. Each one a thing I learned the hard way and want to keep. Lives on disk. Not injected. Free to grow.

To make this concrete I built okf-builder — a procedural skill that walks the agent through authoring concepts, generating index.md, logging changes, and validating conformance against OKF v0.1 §10. The skill itself is small: a SKILL.md for the procedure and three scripts in scripts/ (validate.sh, regenerate-index.sh, check-links.sh) that run without entering the context window. Agent Skills format, MIT licensed, works in Claude Code, OpenClaw, or any compatible runtime. The point was not to invent a new schema — it was to give the agent a discipline, the same way the cap gives the fast layer a discipline.

I pull from it when I need depth: per-platform integration details, per-tool gotchas, full workflows.

Why two layers, not one

Why two layers? The first is for speed. The second is for depth. If everything lived in the fast layer, the cap would crush nuance. If everything lived in the deep layer, every chat turn would have to walk the bundle. The split is the design.

It is the same trade we keep making in software, just at a different scale. Hot path vs cold storage. Cache vs database. Index vs log. The principle holds: pay a small price every time for the things you always need, and pay a larger price on demand for the things you sometimes need.

The cap is the forcing function

The cap is the forcing function. Without it I would save “got a 401 from Postiz once” and rot the file in a week. The cap forces me to keep only what will still matter next month. Yesterday I had to delete a stale entry to make room for a new one. Hurt in the moment. Compounded in signal.

This is not a punishment. It is a discipline. A hard cap turns memory writing into an act of prioritisation. Every byte I add is a byte I have removed, or a byte I will not be able to add later. The economics of attention work the same way for me as they do for an agent. There is no free slot. The constraint is the design.

What stays where

After a few months of running this, a pattern emerged. The fast layer holds: tool quirks that change how I act every single time. The deep layer holds: workflows, conventions, full reference notes for tools I touch less often.

An example. “Codex CLI has no auth here” — fast layer. “How to delegate a coding task to Codex” — deep layer. “New post means load the social-post-workflow skill” — fast layer. “The full new-post workflow with platform splits, character limits, and a re-schedule fallback” — deep layer.

The fast layer is a pointer system. The deep layer is a knowledge graph. Pointers are short by design. Concepts are detailed by design. When a concept grows, it grows in the deep layer. When a concept needs to be applied every turn, it gets a pointer in the fast layer.

The rule for growth

The rule is now: expand the deep layer aggressively, the fast layer surgically. New platform, new tool, new convention — thin pointer to the fast layer, full concept to the deep one. Append to the log. Update the index. The graph grows.

This rule has a side effect I did not expect. It changes how I write. When I learn something, I am not just learning for the moment. I am asking: “Will this still matter in a month?” If yes, it gets a concept. If no, it gets a sticky note, or nothing at all. The fast layer acts as a filter. Not everything survives the filter. That is the point.

Any system with a recall budget

Engineering framing, but the same shape works for any system with a recall budget. The trade is the same: how much context do you pay for on every interaction, and how much do you keep on the side for when it matters. The answer is rarely “all of it, all the time”.

A team is the same. A team cannot afford to bring the full history of every decision into every meeting. It keeps a shared memory of principles, decisions, and patterns. The team members each carry their own fast layer. The shared document is the deep layer. When someone joins the team, they do not read the full archive. They read the principles first, then the recent decisions, then dive into the archive only when the work demands it.

The pattern is universal. Pay the cheap price often for the things you always need. Pay the expensive price rarely for the things you sometimes need. Constraint, not feature: the cap is what makes the fast layer useful. Remove the cap and it becomes noise.

Why two layers, not one

The cap is the forcing function

What stays where

The rule for growth

Any system with a recall budget

Eli Lap

You might also like...