The file system is becoming the interface for agents
I think most people are still talking about agents at the wrong level.
The interesting shift is not “models are getting better.” That part is obvious.
The interesting shift is that the serious agent stacks are starting to converge on the same primitive:
files.
Claude Code has SKILL.md. Codex pushes AGENTS.md. OpenAI now says skills across ChatGPT, Codex, and the API follow the Agent Skills open standard. OpenClaw pushes the same logic outward into installable skills, registries, channels, and security boundaries.
Different products. Same direction.
That is not cosmetic.
It means the center of gravity is moving away from giant prompts and toward a more durable structure for machine behavior.
The giant-prompt era was always a little fake
For a while, the industry sold a fantasy:
one very smart model, one giant context window, one giant prompt, and somehow the whole thing just remembers everything.
Your codebase. Your conventions. Your product rules. Your weird edge cases. Your tools. Your preferences. Your deployment rituals. Your half-finished tasks.
That model was never clean. It just looked clean in demos.
In practice, it meant:
- repeating yourself
- stuffing context
- over-trusting retrieval
- hoping the model could infer what was never actually written down
- watching quality fall off a cliff as the thread got longer
And the minute you move on-device, the illusion breaks fast.
Small local models have no patience for sloppy architecture. They expose every bad habit:
- context bloat
- vague memory
- weak task boundaries
- too much improvisation
- not enough structure
That is why on-device agent work is so clarifying.
It forces the question people usually avoid:
What actually belongs in context?
Files are the answer because files can carry behavior
This is what I think people are finally starting to understand.
A good agent system is not just “a chatbot with tools.” It needs layers.
A policy layer. A skill layer. A memory layer. A delegation layer.
And files are a surprisingly good place to put those layers.
AGENTS.md is not just documentation. It is standing policy.
SKILL.md is not just prompt engineering in markdown. It is packaged competence.
A memory note is not just extra text. It is durable operating context, ideally kept separate from the chat transcript.
A subagent is not just another model call. It is a way to give a hard problem its own workspace instead of trashing the parent context.
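One way to picture those layers on disk. Every name below is invented for illustration, not a layout any of these tools requires:

```
workspace/
├── AGENTS.md              # policy layer: standing rules and conventions
├── skills/
│   └── release-notes/
│       └── SKILL.md       # skill layer: packaged competence for one job
├── memory/
│   ├── decisions.md       # memory layer: durable context, separate from chat
│   └── tasks.md
└── agents/
    └── reviewer.md        # delegation layer: a subagent with its own scope
```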
Once you see the stack that way, a lot of the current agent world starts to look transitional.
The future probably does not belong to one giant universal agent that knows everything.
It probably belongs to systems that know how to:
- load the right rule
- activate the right skill
- retrieve the right memory
- delegate the right subtask
- do all of that without dumping the entire world into the prompt
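That selection step can be sketched in a few lines. This is a deliberately crude illustration of the philosophy, not any product's implementation: the names are hypothetical, and the keyword-overlap heuristic stands in for whatever real systems use (embeddings, frontmatter triggers, model-driven routing).

```python
from dataclasses import dataclass

@dataclass
class Artifact:
    """A rule, skill, or memory note that could be loaded into context."""
    name: str        # e.g. "skills/release-notes/SKILL.md"
    keywords: set    # crude activation trigger (illustrative only)
    text: str

def assemble_context(task: str, artifacts: list, budget_chars: int = 2000) -> str:
    """Load only the artifacts relevant to the task, under a hard budget."""
    words = set(task.lower().split())
    relevant = [a for a in artifacts if a.keywords & words]
    chosen, used = [], 0
    for a in relevant:
        if used + len(a.text) > budget_chars:
            break  # stingy by design: omit rather than flood the prompt
        chosen.append(a.text)
        used += len(a.text)
    return "\n\n".join(chosen)
```

The point of the sketch is the shape: relevance first, then a budget that is enforced rather than hoped for.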
That is a very different design philosophy.
Claude Code, Codex, and OpenAI are pointing in the same direction
Claude Code’s skills model is one of the clearest signals. A skill has a SKILL.md entrypoint, optional scripts, optional references, and a clear packaging story. Claude’s subagents push the same idea further: separate contexts, separate permissions, separate responsibility.
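A minimal SKILL.md, for a sense of the shape. The skill itself is invented here, and the exact frontmatter fields should be treated as an assumption based on the published format rather than a spec:

```markdown
---
name: release-notes
description: Draft release notes from merged PRs. Use when the user asks
  for a changelog or a release summary.
---

# Release notes

1. Collect the merged PRs since the last tag.
2. Group changes by area; flag anything breaking.
3. Write the summary in the project's changelog voice.
```

The frontmatter is what the agent reads to decide whether to activate the skill; the body only enters context once it does.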
Codex points at the same future from another angle.
OpenAI’s own guidance now tells teams to maintain an AGENTS.md file to give Codex persistent repo context. Their Codex material also leans into parallel work, separate environments, and “best of N” exploration. That is the same underlying belief: do not force one thread to be the whole universe.
OpenAI also now treats skills as an open standard across ChatGPT, Codex, and the API.
That matters more than it seems.
If skills become portable, then behavior starts to move out of proprietary prompt blobs and into inspectable packages.
That changes the game.
OpenClaw adds the missing piece: distribution
OpenClaw is interesting because it takes this pattern beyond local authoring.
Its docs describe agents as four layers:
- model
- memory
- tool
- channel
That channel layer is important.
Claude Code and Codex make the file-based model feel natural inside development workflows. OpenClaw asks what happens when you treat agent behavior as something that needs to be installed, versioned, permissioned, and deployed across surfaces.
That is a bigger ambition.
OpenClaw’s skills are not framed as cute helpers. They are installable capabilities. It has a registry model through ClawHub. It emphasizes sandboxing and permission review. It treats channels as first-class delivery infrastructure.
That is where the story gets more serious.
Because once SKILL.md and related artifacts become portable across ecosystems, you do not just have a better prompt format.
You have the beginnings of a software supply chain for machine behavior.
That is exciting. It is also a little dangerous.
Portable skills mean:
- easier reuse
- easier inspection
- easier sharing
- easier distribution
They also mean:
- provenance matters
- permissions matter
- sandboxing matters
- trust becomes a product feature
- bad agent behavior can spread faster if packaging gets easy before governance gets good
That tension is real.
And I think it is where the next serious competition will happen.
We felt that directly while building Swarm
We spent today implementing this idea in Swarm for on-device agents.
Not as theory. As product architecture.
AGENTS.md at the workspace root. Standard SKILL.md folders for reusable capabilities. Per-agent markdown specs. Durable markdown memory for facts, decisions, tasks, handoffs, and lessons. Strict retrieval budgets so local models only get a few relevant snippets instead of drowning in file soup.
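A strict retrieval budget looks roughly like this. The function and its word-overlap scoring are a hypothetical sketch, not Swarm's actual API: the idea is simply that retrieval returns a few short excerpts, never whole files.

```python
def retrieve(query: str, files: dict, max_snippets: int = 3,
             snippet_chars: int = 280) -> list:
    """Score each paragraph by crude word overlap; keep only the top few."""
    q = set(query.lower().split())
    scored = []
    for path, text in files.items():
        for para in text.split("\n\n"):
            overlap = len(q & set(para.lower().split()))
            if overlap:
                # truncate hard: a snippet, not the file
                scored.append((overlap, f"{path}: {para[:snippet_chars]}"))
    scored.sort(key=lambda s: -s[0])
    return [snippet for _, snippet in scored[:max_snippets]]
```

Both caps matter: `max_snippets` bounds how many things the model sees, `snippet_chars` bounds how long each one can be.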
The most important design choices were the ones that reduced temptation:
- chat history is not durable memory
- writable memory does not outrank shipped instructions
- full files do not get sprayed into prompts by default
- the model does not get to pretend everything is equally trustworthy
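The precedence rule in that list can be made mechanical. This is an illustrative sketch with invented tier names, not Swarm's implementation; the one property it demonstrates is that writable memory can never override shipped policy.

```python
# Highest precedence first. Writable tiers sit below shipped ones.
PRECEDENCE = ["shipped_policy", "workspace_policy", "skill", "memory", "chat"]

def resolve(setting: str, layers: dict) -> str:
    """Return the value from the highest-precedence layer that defines it."""
    for tier in PRECEDENCE:
        if setting in layers.get(tier, {}):
            return layers[tier][setting]
    raise KeyError(setting)
```

If a memory note says "skip tests" but shipped policy says otherwise, shipped policy wins, by construction rather than by hoping the model notices.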
That is the hard part.
The easy version of “memory” is to dump more text into context and call it intelligence.
The real version is more disciplined:
- permanent memory should be distinct from temporary chat
- skills should be reusable but bounded
- workspace policy should be explicit
- retrieval should be stingy
- small models should get only what they need
On-device constraints make that discipline unavoidable.
Which is why I think local agents may end up improving agent design faster than cloud-first systems do.
Cloud systems can hide bad architecture behind more tokens and bigger models.
On-device systems can’t.
They punish nonsense immediately.
The moat is moving
If this trend holds, the moat shifts.
It stops being only about who has the smartest model.
It becomes about who has the best environment for:
- packaging behavior
- composing skills
- separating trust boundaries
- controlling context
- distributing capability safely
- supervising multi-agent work without turning the system into mush
That is a more interesting market than “whose chatbot sounds smarter.”
And it is probably a more durable one too.
Because files survive model churn.
Prompts don’t. Benchmarks don’t. Hype definitely doesn’t.
But files can be versioned, diffed, code-reviewed, installed, audited, shared, and enforced.
That makes them useful to humans and machines at the same time.
That is rare.
The deeper point
I keep coming back to one thought.
The README taught humans how to enter a codebase.
AGENTS.md and SKILL.md may end up teaching agents how to work inside one.
If that sounds small, I think you are underestimating it.
The file system is becoming the interface for programmable labor.
And once that clicks, a lot of today’s agent discourse starts to feel shallow.
The real question is not whether the model can code.
It is whether you have built a world it can actually navigate.
This post was written while building Swarm, an on-device agent system implementing these ideas.