a flock of claudes
1. linear issues
I’ve been tinkering with coding agents for a few years now, and so far, they’ve felt burdened by the conceptual weight of their tools.
however, that’s been changing, primarily due to three things:
(1) they’ve been trained on tool traces, so they actually know when and how to call functions
(2) post‑training moved from pure ‘be helpful’ rlhf to reasoning‑centric feedback, verifier/unit‑test rewards, and scaled rl that favors planning and self‑correction
(3) the runtime is sturdier: function calling, strict structured outputs, and agentic loops make multi‑step tool use feel more natural.
when I tried sonnet‑3.7 via the claude code cli, it was the first model that seemed genuinely eager to use tools (and could do so coherently) for significant stretches of time, iteratively editing, testing, and retrying without being asked.
opus 4 feels smart but leans deferential in free‑form chat, unless you keep it engaged. however, when it’s in a console with well‑shaped tools? it’s a beast: comfortably wielding the toolbox, quick to plan, surgical with edits, and noticeably better the more tools you give it.
lately, I’ve been giving it a whole lot of tools.
issue tracking
the first limitation I ran into was constantly needing to explain everything.
initially, every conversation started from scratch: “this is a next.js project, use bun to build, etc.” eventually I added some CLAUDE.md files with general project info, which helped.
but I was still constantly providing context like
“so we were working on this bug in this file and it looked like we fixed it, but now it’s happening again”
my role was mainly that of narrator, and I wasn’t very good at keeping track of things.
so, I added the linear mcp server.
linear is a project management platform designed for software teams. it provides a structured model where each entity (issues, projects, cycles, teams, and users) has well-defined relationships and properties accessible through a graphql api. the platform’s github integration automatically syncs pull requests with issues and enables linking between code and tasks.
this interconnected data structure makes it well-suited for llm orchestration. an agent can query the api to understand the context of a team’s work, from high-level project roadmaps down to individual task dependencies. the server gives claude a comprehensive set of tools to view and edit issues in a workspace, and it laid the foundation for the rest of my “higher-order system.”
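to make this concrete, here’s roughly what pulling one issue’s context out of the graphql api looks like. this is an illustrative sketch, not the mcp server itself: the query shape is my approximation of linear’s schema, and whether `issue(id:)` accepts the human identifier or needs a uuid is an assumption.

```python
# illustrative sketch: fetch one issue plus its related objects from linear's graphql api
import os
import requests

QUERY = """
query($id: String!) {
  issue(id: $id) {
    identifier
    title
    state { name }
    project { name }
    comments { nodes { body } }
  }
}
"""

resp = requests.post(
    "https://api.linear.app/graphql",
    headers={"Authorization": os.environ["LINEAR_API_KEY"]},
    json={"query": QUERY, "variables": {"id": "GUI-10"}},  # "GUI-10" is a placeholder identifier
)
resp.raise_for_status()
print(resp.json()["data"]["issue"])
```

everything an agent might need (state, project, comment history) hangs off the issue, which is what makes the workspace traversable.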
I created a new linear team for my project and updated the user-level CLAUDE.md to explain the workflow:
- Every task starts with a linear issue (create one if not provided)
- Track all progress via comments on the linear issue
now, whenever I had a task for claude, instead of typing it into the cli, I’d document it in a new issue with logs, screenshots, and context. then I’d note the issue id (e.g., GUI-10) and open the cli with “Let’s get started on GUI-10.”
this approach seemed to provide tremendous clarity of purpose. output code became more focused, and claude was less likely to veer off into major refactors of unrelated sections.
as each issue progresses, claude leaves a persistent trail of comments documenting insights, realizations, and decisions. once complete, the code comes with a condensed report on the entire decision-making process.
beyond the utility of external memory, I suspect that including linear tools, and operating through them, subtly contextualizes that claude is working on a “real team” contributing to a professional codebase.
as I settled into this workflow, most of my explanations shifted from ephemeral cli queries to durable issues and comments with easy-to-reference ids. for example:
hey that bug from GUI-12 is happening again
this concisely gives the model detailed records of the issue and the attempted solutions, without my needing to retype everything from memory.
legibility issue
to an llm, the world is nothing more than the tokens you provide.
context-providing tools should deliver relevant information and nothing else. yet we keep offering clunky, unintuitive tools that output tons of useless structure and fluff, and require robotic interactions to use.
it might seem strange that we need to make machine structures human-readable before feeding them back to a machine, but I think this is an incredibly important and underappreciated aspect of creating llm-facing software.
consider the official linear mcp server: it provides a standard implementation of their graphql api, which, while perfectly functional, refers to all objects (issues, comments, labels, states) via 32-character hex uuids.
this means that to move TEST-123 into “Planning,” claude must make the following tool calls:
find_issue('TEST-123')
> uuid 'fcb357b6-7719-4550-b6e0-8fa5d8554d69'
> team_uuid 'a210c784-b5d2-4dad-9dab-2ddd404b831e'
find_status('Planning', team='a210c784-b5d2-4dad-9dab-2ddd404b831e')
> 'de936f4d-5d04-41bd-8063-c5d85a319db6'
update_issue('fcb357b6-7719-4550-b6e0-8fa5d8554d69', 'de936f4d-5d04-41bd-8063-c5d85a319db6')
try reading this out loud and imagine keeping track of all these 32-character uuids and what each one refers to. it’s an immense waste of these models' cognitive effort to spend so many tokens on illegible tool calls.
further, the model messes up these complicated strings fairly often. it figures things out eventually, but each failed tool call introduces a history of failure into the chat record that it may be inclined to continue.
I want my agents to cluster around the “efficient teams building robust solutions” area of vector space rather than “stupid robot can’t even call basic tool.”
the easiest way to achieve this is to make things as intuitive and familiar as possible.
if a tool isn’t intuitive, why not?
I try to talk to opus like a coworker: friendly, knowledgeable, casual, and using as many human words as possible:
"hey, what are you working on?"
"oh just `CLA-110: Fix Backend Test Structure`"
"ah is it `Ready to Merge` or still `In Review`?"
instead of:
"hey, what are you working on?"
"oh just `fcb357b6-7719-4550-b6e0-8fa5d8554d69`"
"ah is it `de936f4d-5d04-41bd-8063-c5d85a319db6`
or `a210c784-b5d2-4dad-9dab-2ddd404b831e`?"
but the uuid problem was only half of it. the linear mcp also has a 65kb default limit on its api responses, which means that once an issue gets long enough (and with claude leaving detailed progress comments, they get long fast), anything beyond that threshold becomes invisible to the model.
overall, it was complex to get information, and even once you got it, you might not actually have all of it.
so I built my own, or rather, claude did.
a better bridge
this linear mcp server gives claude direct, legible access to linear, designed to keep both the information and the interactions simple while ensuring no data is truncated or lost.
everything is readable:
- team keys (SOFT)
- issue identifiers (SOFT-123)
- state names (“In Progress”)
- label names (“Sonnet”)
the server handles all uuid resolution internally, so claude never has to juggle those 32-character hex strings again.
to move TEST-123 to “Planning”, just call
update_issue('TEST-123', 'Planning')
when claude requests an issue, it gets back clean markdown formatting with just the essential information, no JSON clutter. the server strips out the noise and presents issues as readable documents with threaded comments. in addition, any images attached to linear issues are automatically downloaded and cached locally.
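the core trick is small. here’s a sketch of its shape, built on the official mcp python sdk; the graphql queries are my approximation of linear’s schema, not the actual server code.

```python
# sketch: an mcp tool that accepts "TEST-123" + "Planning" and does the uuid juggling itself
import os
import requests
from mcp.server.fastmcp import FastMCP

LINEAR_URL = "https://api.linear.app/graphql"
HEADERS = {"Authorization": os.environ["LINEAR_API_KEY"]}

mcp = FastMCP("linear-bridge")

def gql(query: str, variables: dict) -> dict:
    """Send one GraphQL request to Linear and return its data payload."""
    r = requests.post(LINEAR_URL, json={"query": query, "variables": variables}, headers=HEADERS)
    r.raise_for_status()
    return r.json()["data"]

@mcp.tool()
def update_issue(issue_id: str, state: str) -> str:
    """Move an issue (e.g. 'TEST-123') into a workflow state (e.g. 'Planning')."""
    # resolve the readable identifier and state name to uuids internally,
    # so the model never has to see or repeat them
    issue = gql(
        """query($id: String!) {
             issue(id: $id) { id team { states { nodes { id name } } } }
           }""",
        {"id": issue_id},
    )["issue"]
    state_id = next(s["id"] for s in issue["team"]["states"]["nodes"] if s["name"] == state)
    gql(
        """mutation($id: String!, $state: String!) {
             issueUpdate(id: $id, input: { stateId: $state }) { success }
           }""",
        {"id": issue["id"], "state": state_id},
    )
    return f"{issue_id} moved to {state}"

if __name__ == "__main__":
    mcp.run()
```

the same pattern (readable names in, uuid resolution inside, clean markdown out) covers the read side too.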
the difference in practice is dramatic. instead of struggling with each update, claude treats linear as a natural extension of the TodoWrite tool it’s already comfortable using.
2. scope & concurrency
after a few days with the linear mcp, I settled into a routine.
user: Let's work on GUI-32
claude: [gets issue via mcp]
[reads files]
Oh! I get it now!
[edits files]
[build/test/commit/pr]
this loop worked great for isolated tasks. but as the changes got more complex, two problems kept surfacing:
- for cross-file work, claude would max out its context and “compact” midway through, losing track of details. I’d find half-finished refactors with dangling imports.
- I was finding issues faster than a single agent could fix them. only one claude could safely edit the codebase at a time.
two-step workflow
the fix was to split each task into planning and implementation.
planning: claude operates read-only on the main branch. a slash command tells it to identify critical files, document existing patterns, and build a focused roadmap. everything gets compressed into linear comments.
implementation: a bash script spins up a fresh claude with:
- dedicated branch and worktree
- the condensed plan from linear
- full edit privileges
- clear instructions to build, test, and pr
this killed both problems at once. multiple agents could work in parallel on different branches. and each implementation agent started with pre-digested context instead of burning tokens on exploration.
the planning agent became a “context compiler”: reading widely and summarizing tightly, while the implementation agent got to spend its entire context budget on actually building things.
$ claude "/plan TEAM-123"
...
[plan complete]
$ ./cimplement.sh TEAM-123
...
[pr ready]
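under the hood, the implementation dispatch does roughly the following. it’s sketched here in python rather than the actual bash, and the details (the `claude -p` invocation, the plan-fetching helper) are stand-ins, not the real script.

```python
# sketch of the implementation dispatch: fresh branch + worktree, condensed plan from linear,
# one headless claude run with instructions to build, test, and open a pr
import subprocess
import sys

def fetch_plan(issue_id: str) -> str:
    # placeholder: in practice this pulls the planning comments for the issue out of linear
    return f"(condensed plan for {issue_id} goes here)"

def dispatch(issue_id: str) -> None:
    branch = f"agent/{issue_id.lower()}"
    worktree = f"../worktrees/{issue_id.lower()}"

    # dedicated branch and worktree so parallel agents never share a checkout
    subprocess.run(["git", "worktree", "add", worktree, "-b", branch], check=True)

    prompt = (
        f"You are implementing {issue_id}.\n\n"
        f"Plan from the planning agent:\n{fetch_plan(issue_id)}\n\n"
        "Build, run the tests, commit, and open a PR when everything passes."
    )
    # headless run: `claude -p` prints the result instead of opening an interactive session
    subprocess.run(["claude", "-p", prompt], cwd=worktree, check=True)

if __name__ == "__main__":
    dispatch(sys.argv[1])
```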
automation
typing commands got repetitive, so I automated it.
I spun up n8n in docker and wrote a fastapi server with a /dispatch/[task]/[issue_id] endpoint. when a linear issue moves to the “Plan” or “Build” state, n8n catches the webhook and dispatches the right agent.
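the endpoint itself is tiny, something in this shape. the exact commands each task kicks off are illustrative, including the assumption that /plan can be run headlessly via `claude -p`.

```python
# sketch of the dispatch server sitting behind n8n: a linear webhook moves an issue
# to "Plan" or "Build", n8n calls this endpoint, and the matching agent gets spawned
import subprocess
from fastapi import FastAPI, HTTPException

app = FastAPI()

def command_for(task: str, issue_id: str) -> list[str]:
    # illustrative commands; whether /plan runs headlessly like this is an assumption
    if task == "plan":
        return ["claude", "-p", f"/plan {issue_id}"]   # planning agent
    if task == "build":
        return ["./cimplement.sh", issue_id]           # implementation agent
    raise HTTPException(status_code=404, detail=f"unknown task: {task}")

@app.post("/dispatch/{task}/{issue_id}")
def dispatch(task: str, issue_id: str):
    # fire and forget: progress comes back as comments on the linear issue
    subprocess.Popen(command_for(task, issue_id))
    return {"dispatched": task, "issue": issue_id}
```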
the downside: my visibility dropped to just linear comments. usually these were end-of-work summaries, making it hard to catch misunderstandings early.
idea-having
traditionally, having ideas is expensive. not the idea itself, but everything after: implementation, testing, documentation, convincing people. the overhead becomes a subconscious tax on innovation, and we self-censor.
but when implementation is nearly free, the math changes. ideas can turn into code in the background. parallel agents mean you don’t need to finish one idea before starting the next.
the constraint shifts from “can I build this” to “what should exist.”
part 3 - swarm soon