Over the past year it has become clear that a simple agent loop with tools is enough for programming tasks.
Mario Zechner's pi / pi-mono made the point: a coding agent doesn't need much. One bash tool is enough, and his ~1000-line harness already holds its own against Claude Code and Codex at a tenth of the size of the production agents, and the gap isn't where you'd expect it to be.
The reason it works is that agents are already good at two things that stack nicely: composing shell commands into pipelines, and writing code. So why ship them with a pre-baked toolbox when they could write the tools themselves, right when they need them, shaped to the user, the project, the task at hand?
Sound familiar? It's Emacs all over again. Emacs was never really an editor with plugins: it's a Lisp runtime that happens to edit text and rewrite itself on the fly, which is why forty years later people are still building whole workflows inside it. The same trick fits a coding agent: give it a runtime and a REPL, one ability (write and run code), and stop there. The rest (read, write, edit, grep, subagent, whatever comes up next) the agent will write on its own. Code composes much better than MCP or bash!
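To see what "code composes better" means concretely, here is a toy sketch. The helper names (`parseCommit`, `groupByType`) are invented for illustration; the point is that with MCP or bash each step is a separate tool round-trip through the model, while in a REPL intermediate results are just values flowing into the next expression:

```typescript
// Hypothetical helpers an agent might have written earlier in the session.
const parseCommit = (line: string) => {
  const [sha, subject] = line.split("|");
  const m = subject.match(/^(\w+):/);       // "feat: add cache" -> "feat"
  return { sha, type: m?.[1] ?? "other", subject };
};

const groupByType = (commits: { type: string }[]) =>
  commits.reduce<Record<string, number>>((acc, c) => {
    acc[c.type] = (acc[c.type] ?? 0) + 1;
    return acc;
  }, {});

// One evaluation, three composed steps: split, parse, group.
const log = "a1|feat: add cache\nb2|fix: typo\nc3|feat: web ui";
const summary = groupByType(log.split("\n").map(parseCommit));
console.log(summary); // { feat: 2, fix: 1 }
```

No schemas to declare, no serialization at each boundary: the output of one function is the input of the next.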
## Hyper Code
In hyper-code I'm building exactly that: a runtime where the agent can live and extend itself. It's small enough that the agent holds the whole architecture in its context, and simple enough that it can extend that architecture without breaking anything.
A quick sketch of the runtime shape:
- Bun as the main runtime: fast, lots of built-in libraries, zero dependencies.
- Procedural/functional style: state separated from functions. One function per file, hot reload without a restart.
- SQLite for sessions, in the spirit of Out of the Tar Pit.
- A web UI with HTTP and HTML: no reinventing the wheel with a TUI, none of the complexity of the VS Code API. Easy to extend and customize.
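The "one function per file, hot reload" point can be sketched in a few lines. The names here (`fns`, `load`) are illustrative, not hyper-code's actual API: the registry is just a map from namespace and name to a file's single exported function, so reloading one file swaps one entry.

```typescript
// Minimal sketch of a one-function-per-file registry (hypothetical names).
type Fn = (ctx: Ctx, ...args: unknown[]) => unknown;
type Ctx = { fns: Record<string, Record<string, Fn>> };

const ctx: Ctx = { fns: {} };

// "Loading" a file registers its single export under ns.name. A real runtime
// would re-import the file with a cache-busting query so that editing it and
// calling load() again picks up the new version without a restart.
function load(ctx: Ctx, ns: string, name: string, fn: Fn) {
  (ctx.fns[ns] ??= {})[name] = fn;
}

load(ctx, "db", "select", (_ctx, sql) => `rows for: ${sql}`);
console.log(ctx.fns.db.select(ctx, "SELECT 1")); // "rows for: SELECT 1"
```

Because state lives outside the functions, swapping a function in place is safe: nothing else needs to restart or re-initialize.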
## Example
To make this concrete, I asked Claude to come up with a realistic, non-trivial task an agent would handle in a single evalCode call. Here's what it came back with. Imagine a user saying: "pull release notes since the last tag (commits, authors, linked PRs) and cache the result for next time."
In the bash + MCP world that's five or six separate tools you'd have to describe and wire up ahead of time: git describe, git log, a conventional-commits parser, the GitHub API for PRs, SQLite for the cache, markdown formatting. Each one its own surface area, each one a round-trip to define and call.
In a REPL-first agent it's just one evalCode:
```ts
const repo = "niquola/hyper-code2";
const headers = { Authorization: `Bearer ${process.env.GITHUB_TOKEN}` };

// 1. Last tag and commits since
const tag = (await Bun.$`git describe --tags --abbrev=0`.text()).trim();
const raw = await Bun.$`git log ${tag}..HEAD --pretty=format:%H|%s|%an`.text();

// 2. Local SQLite cache: if we've been here already, return what we have
await ctx.fns.db.exec(ctx, `
  CREATE TABLE IF NOT EXISTS release_notes (
    tag TEXT PRIMARY KEY, head TEXT, body TEXT, at INTEGER
  )
`);
const head = (await Bun.$`git rev-parse HEAD`.text()).trim();
const cached = await ctx.fns.db.select(ctx,
  "SELECT body FROM release_notes WHERE tag = ? AND head = ?", [tag, head]);
if (cached.length) {
  agent.scratchpad.release = { tag, body: cached[0].body, cached: true };
  return { since: tag, cached: true, preview: cached[0].body.slice(0, 300) + "…" };
}

// 3. Parse conventional commits: "feat(scope): message (#42)"
const commits = raw.split("\n").filter(Boolean).map(line => {
  const [sha, subject, author] = line.split("|");
  const m = subject.match(/^(feat|fix|chore|refactor|docs|test)(?:\([^)]+\))?:\s*(.+?)(?:\s*\(#(\d+)\))?$/);
  return { sha, author, type: m?.[1] ?? "other", title: m?.[2] ?? subject, pr: m?.[3] };
});

// 4. Fetch linked PRs in parallel: labels and linked issues come with them
const prNums = [...new Set(commits.map(c => c.pr).filter(Boolean))];
const prs = await Promise.all(prNums.map(n =>
  fetch(`https://api.github.com/repos/${repo}/pulls/${n}`, { headers }).then(r => r.json())
));

// 5. Group by type, build markdown
const byType = Map.groupBy(commits, c => c.type);
const body = [...byType].map(([type, cs]) =>
  `## ${type}\n${cs.map(c =>
    `- ${c.title}${c.pr ? ` (#${c.pr})` : ""} — @${c.author}`
  ).join("\n")}`
).join("\n\n");

// 6. Save the cache: good until HEAD moves
await ctx.fns.db.exec(ctx,
  "INSERT OR REPLACE INTO release_notes VALUES (?, ?, ?, ?)",
  [tag, head, body, Date.now()]);

// 7. Heavy stuff stays in scratchpad. The model gets a short summary.
agent.scratchpad.release = { tag, commits, prs, body };
return {
  since: tag,
  commits: commits.length,
  by_type: Object.fromEntries([...byType].map(([t, cs]) => [t, cs.length])),
  contributors: [...new Set(commits.map(c => c.author))],
  prs_linked: prs.length,
  labels: [...new Set(prs.flatMap(p => p.labels?.map(l => l.name) ?? []))],
  preview: body.slice(0, 300) + "…",
};
```
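On the harness side, "one tool" is meant literally: a single entry in the tools array sent to the LLM API. A hedged sketch of what that definition could look like (the field names follow the common tool-use schema shape; this is my guess, not hyper-code's actual definition):

```typescript
// Illustrative single-tool definition for an evalCode-only harness.
// The schema shape mirrors typical LLM tool-use APIs; names are assumptions.
const tools = [{
  name: "evalCode",
  description:
    "Run TypeScript in the live Bun runtime. `ctx` and `agent` are in scope; " +
    "the returned value is sent back to you as the tool result.",
  input_schema: {
    type: "object",
    properties: {
      code: { type: "string", description: "TypeScript source to evaluate" },
    },
    required: ["code"],
  },
}];

console.log(tools.length, tools[0].input_schema.required); // 1 [ "code" ]
```

Everything the forty-line example above does flows through that one parameter, which is why the harness itself can stay around a thousand lines.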
## From one-off script to your own tool
The first time, the agent writes this from memory. The second time, it notices it did almost the same thing last week. And the third time, instead of writing it again, it saves the code to a file and turns it into a proper project procedure:
```ts
// Turn N: pin the pattern
await Bun.write(".hyper/release/notes.ts", `
export default async function (ctx: Context, repo: string) {
  // ... same logic as above, now a reusable function
  return { tag, commits, prs, body };
}
`);

// Hot-load into the live runtime and regenerate types: no restart
await ctx.fns.repl.load(ctx, "release");
await ctx.genTypes(ctx);

// Turn N+1: call it like a built-in
await ctx.fns.release.notes(ctx, "niquola/hyper-code2");
```
Next time someone asks for release notes, it's one line instead of forty. The function lives under .hyper/, which means it sticks around in the project, and over a few weeks the codebase quietly grows a little library of procedures, each one shaped to this specific repo and this specific author. You can't get that out of the box from any fixed harness, because no fixed harness knows your habits.
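The path-to-function mapping that makes this work can be sketched simply. This is a hypothetical convention (one default export per file, `.hyper/<namespace>/<name>.ts`), not necessarily hyper-code's exact layout; a real loader would `await import()` each matched file, while here we only compute the mapping:

```typescript
// Hypothetical sketch: how a .hyper/ tree could map onto the ctx.fns registry,
// e.g. ".hyper/release/notes.ts" -> ctx.fns.release.notes.
const files = [".hyper/release/notes.ts", ".hyper/gh/prs.ts", "src/main.ts"];

const registry: Record<string, string[]> = {};
for (const f of files) {
  const m = f.match(/^\.hyper\/([^/]+)\/([^/]+)\.ts$/);
  if (!m) continue;                   // only project procedures live in .hyper/
  (registry[m[1]] ??= []).push(m[2]); // namespace -> function names
}

console.log(registry); // registry is { release: ["notes"], gh: ["prs"] }
```

Because the registry is derived from files in the repo, the procedure library travels with the project: clone it and the agent's accumulated tools come along.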
## Give it a try
The idea in one sentence: give the model the full runtime and a single tool, then let it write the rest, shaped to you and your project.
The prototype is at github.com/niquola/hyper-code2. About a thousand lines on Bun: one evalCode, hot-reloaded files, SQLite for sessions, a web chat at /. Works with OpenAI, Anthropic, Groq, OpenRouter, and a local LM Studio.
If you have five minutes, try this:
- `bun install && cp .env.test .env`, drop in your provider's key.
- `bun src/$main.ts`, open http://localhost:3000.
- Ask: "read your own codebase and explain how it works."
- Ask: "what tools are you missing to work on ...?"
- Ask: "write it and call it!"
The project is still very young. If you like the idea, join the fun!