Meta-planning: from baseline to phased agent workspaces

Why a single agent thread does not scale to a baseline
Layer 1: Generate and refine the meta-plan—and put humans on the record
Layer 2: Break the meta-plan into testable chunks and phase prompts
Layer 3: One workspace per phase—prompt plus operations
Layer 4: A staff- or principal-level readiness pass (still an agent—still not infallible)
Execution: press “build” one phase at a time
- Plans and prompts are not set in stone
- Version history belongs in Git
Further reading

Why a single agent thread does not scale to a baseline

Production readiness is a system problem. It includes the repo, the pipeline, secrets and config, data migrations, metrics and alerts, access controls, the rollback story, and the manual checks your team trusts. A single prompt thread tends to interleave those concerns, lose track of global constraints, and optimize for whatever looks like progress in the moment. You get merges that are hard to review and a story that is even harder to audit later.

Meta-planning means naming the baseline first—what “done” means for this initiative—and then decomposing everything that must become true to reach it. Examples of baselines:

The service runs in the production environment with the right routing and auth.
Observability exists at a level you would defend in an incident.
Data paths are safe to run twice (idempotency) or safe to roll back.
Operational runbooks and ownership are clear enough that on-call is not guessing.

AI agents (like Cursor or Claude Code) are good at drafting inventories, suggesting sequencing, and surfacing forgotten steps. They are not good at owning the truth about your environment. That split is the backbone of the rest of this workflow.

Layer 1: Generate and refine the meta-plan—and put humans on the record

Start by using an agent (or several short sessions) to build a meta-plan anchored to your baseline. Good meta-plans usually include:

A crisp definition of done for the baseline, in language your team would actually use in a review.
A dependency map: what must exist before what, including external systems and human gates (security review, data classification, etc.).
Workstreams that can progress in parallel versus steps that are strictly serial.
Risk and rollback notes: what breaks if you ship this increment, and how you would know quickly.
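
The dependency map and the parallel-versus-serial split can be sketched in a few lines. What follows is a minimal illustration in Python, not a prescribed tool: the workstream names (`ci_pipeline`, `security_review`, and so on) are invented for this example and do not come from a real plan. It uses the standard library's `graphlib.TopologicalSorter` to group work into batches, where items in one batch do not depend on each other and the batches themselves run in sequence.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each key depends on the items in its set.
# The names are illustrative only; substitute the workstreams from your own meta-plan.
DEPENDS_ON = {
    "ci_pipeline": set(),
    "secrets_and_config": set(),
    "security_review": set(),  # a human gate, modeled like any other node
    "staging_deploy": {"ci_pipeline", "secrets_and_config"},
    "schema_migration": {"staging_deploy"},
    "dashboards_and_alerts": {"staging_deploy"},
    "production_deploy": {
        "schema_migration",
        "dashboards_and_alerts",
        "security_review",
    },
}


def parallel_batches(depends_on):
    """Return batches of work: items within a batch can run in parallel,
    and the batches themselves must run one after another."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the map contains a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to keep output stable
        batches.append(ready)
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    for number, batch in enumerate(parallel_batches(DEPENDS_ON), start=1):
        print(f"batch {number}: {', '.join(batch)}")
```

A cycle in the map is itself a useful finding: it usually means two workstreams each assume the other ships first, which is exactly the kind of sequencing question to put in front of a human owner.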

Refine the meta-plan iteratively. Ask for gaps, ask for “what would a skeptic ask,” and ask for failure modes—not because the model is always right, but because those questions produce a better draft.

Here is the part to internalize: errors at this layer compound. A wrong assumption in week-one sequencing does not stay in week one. It becomes the premise for phase prompts, branch strategy, and test plans. Every downstream workspace inherits it. So before you treat the meta-plan as real, stop and put humans on it: owners who know the system confirm that the scope, sequencing, and definition of done match reality. This is not a courtesy review. It is where you buy down multiplicative risk.

Layer 2: Break the meta-plan into testable chunks and phase prompts

Once the meta-plan is human-approved at the level of “this is the work,” translate it into chunks that are testable on their own. A chunk should have a clear falsifiable outcome: you can demonstrate completion without debating the entire program. Examples:

“CI builds and deploys to staging with the new config shape.”
“Read path hits the database with the new schema; write path is still behind a flag.”
“Dashboards and alerts exist for the golden signals we agreed on.”

Map those chunks to phases. For each phase, write a phase prompt: instructions that load codebase context and spell out what this slice must do—but do not paste the entire meta-plan every time. The phase prompt should include:

Objective for this phase only.
Constraints (compatibility, flags, “do not touch X yet”).
Verification: commands, checks, or manual steps that prove the phase is done.
Interfaces with other phases: contracts, file paths, or APIs that must stay stable.

If the prompt repeats the whole master narrative, agents will re-plan instead of execute, and you will fight drift. If the prompt is too narrow without the right repo context, they will miss implicit invariants. The balance is the craft.
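
One way to keep phase prompts consistent is to treat the four parts listed above as fields and render the prompt from them. The sketch below is an assumption about how you might do that in Python; the `PhasePrompt` class, its field names, and the example values are hypothetical, not part of any tool. It also reports which sections are empty, so an underspecified prompt is caught before an agent sees it.

```python
from dataclasses import dataclass, field


@dataclass
class PhasePrompt:
    """Hypothetical container for the four parts of a phase prompt."""

    name: str
    objective: str
    constraints: list[str] = field(default_factory=list)
    verification: list[str] = field(default_factory=list)
    interfaces: list[str] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Names of sections left empty, so a reviewer can reject the prompt."""
        sections = {
            "objective": self.objective.strip(),
            "constraints": self.constraints,
            "verification": self.verification,
            "interfaces": self.interfaces,
        }
        return [label for label, value in sections.items() if not value]

    def render(self) -> str:
        """Plain-text prompt for this phase only, never the whole meta-plan."""
        lines = [f"Phase name: {self.name}", "", f"Objective: {self.objective}"]
        for title, items in (
            ("Constraints", self.constraints),
            ("Verification", self.verification),
            ("Interfaces with other phases", self.interfaces),
        ):
            lines.append("")
            lines.append(f"{title}:")
            lines.extend(f"- {item}" for item in items)
        return "\n".join(lines)


# Illustrative values only; copy the real ones from your approved meta-plan.
example = PhasePrompt(
    name="Phase A (hypothetical)",
    objective="CI builds and deploys to staging with the new config shape.",
    constraints=["Do not touch the write path yet; it stays behind a flag."],
    verification=["Staging deploy succeeds from a clean checkout."],
)

if __name__ == "__main__":
    print(example.render())
    print("Missing sections:", example.missing_sections())
```

The structure does not replace judgment about how much repo context to load. It only makes it harder to forget a section.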

The block below is a real shape we have used: it names the phase, ties it to a larger plan without dumping the whole document, and asks for exploration plus an implementation plan that includes git mechanics and prerequisites.

This is one slice of our production-readiness work for Golf Round Finder. A fuller breakdown lives in a separate planning doc; your job here is to go deeper only for this phase.

Phase name: “Phase J — Provision production resources (real accounts and infrastructure)”

Do a detailed exploration of this codebase and produce a concrete implementation plan for this phase: what to build, what to touch, and in what order.

The plan must include how we will use git: branching, commits, and PRs that match the steps (small, reviewable changes where possible).

It must also list everything we need before we start: environment variables, keys, cloud accounts, services, quotas, DNS, or access—anything that could block provisioning or deployment.

Attach the excerpt from our meta-plan that defines scope for this phase (goals, constraints, dependencies on other phases, “done” criteria). Treat that text as authoritative for boundaries; push back in the plan if the repo contradicts it.

When you run this prompt, paste the real excerpt after the quoted instructions (or inline in the same message). A placeholder might look like:

Phase J — goals: …
Out of scope: …
Depends on: …
Done means: …

You would replace that stub with text copied from your approved meta-plan. The phase name in the prompt doubles as the workspace label when you split work across directories or Cursor roots.

Layer 3: One workspace per phase—prompt plus operations

Give each phase its own workspace (or an equivalent isolation pattern). The goal is bounded context: the agent should not “helpfully” refactor adjacent areas or blur two phases into one diff.

Each workspace should carry not only the phase prompt but the operational glue your team already relies on:

Git conventions: branch naming, commit granularity, when to rebase versus merge, PR size expectations.
Migrations and flags: how you roll forward safely, and what “off” looks like.
Tophatting: exploratory manual validation—clicking the path, sending the request, reading the trace—whatever your culture calls proving it in a real environment.
Org-specific gates: security scans, change windows, approvals.

Then another human gate, at detail level: someone reads the phase plan the way they would read a design doc. What files move? What could break in production? How would we notice? Agents are fast; review is still O(human attention), and that is appropriate for the layer where execution starts to touch users and data.

```mermaid
%%{init: {"flowchart": {"curve": "basis", "padding": 8}, "themeVariables": {"fontFamily": "inherit"}}}%%
flowchart TB
  MP[Meta_plan human approved]
  CH[Testable chunks]
  PP[Phase prompts codebase scoped]
  WS[Workspace per phase plus ops context]
  HR[Human review per phase]
  SR[Staff_level readiness pass]
  EX[Execute phases in order verify]

  MP --> CH --> PP --> WS --> HR --> SR --> EX
```

Layer 4: A staff- or principal-level readiness pass (still an agent—still not infallible)

Even with good phase prompts, blind spots cluster around integration seams: partial deploys, two services’ assumptions about timeouts, the difference between “works in staging” and “safe in prod,” and the boring steps nobody wants to put in a ticket. That is where a staff-engineer persona prompt helps.

Use an agent session structured to challenge the combined plan: missing failure modes, unclear ownership, places where observability or rollback is weak, ordering risks, and anything that assumes perfect humans on the other side of a handoff. Ask it to improve the meta-plan and phase prompts until the critique stabilizes—fewer new gaps each pass.
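
"Until the critique stabilizes" can be made concrete with a simple stopping rule. The following is a rough sketch under stated assumptions: each review pass has already been reduced, by a human or a script, to a set of short gap descriptions, and the same gap is worded identically across passes. Real model output will rephrase things, so treat the exact-match comparison as a simplification. The function name `passes_until_stable` and the `max_new` parameter are invented for this example.

```python
def passes_until_stable(gaps_per_pass, max_new=0):
    """Return the 1-based pass at which the critique stabilized, or None.

    gaps_per_pass: one collection of gap descriptions per review pass.
    A pass counts as stable when it raises at most `max_new` gaps that no
    earlier pass raised. The first pass has nothing to compare against,
    so it never counts as stable.
    """
    seen = set()
    for number, gaps in enumerate(gaps_per_pass, start=1):
        new = set(gaps) - seen
        seen |= new
        if number > 1 and len(new) <= max_new:
            return number
    return None


if __name__ == "__main__":
    # Hypothetical gap lists from three successive review passes.
    passes = [
        {"no rollback for migration", "alert owner unclear", "timeout mismatch"},
        {"alert owner unclear", "flag default undefined"},
        {"flag default undefined"},
    ]
    print("stable at pass:", passes_until_stable(passes))
```

If the function returns `None` after several passes, the plan is probably still changing shape, and that is a signal to go back to the human owners rather than to keep prompting.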

This is not a rubber stamp. It is compression: you are hiring pattern-matching against a large slice of “how big systems fail,” with the understanding that the model can still hallucinate confidence. Treat the output as a review artifact to fold back into human-owned plans, not as authority.

A staff-style readiness prompt we have used looks like this (edited for clarity). You attach the current meta-plan or phase bundle as context; the model’s job is to assess readiness and revise the plan text, not merely to compliment it.

Review the attached plan as if you were a principal or staff engineer who knows this system and cares about shipping the best code and architecture we can—maintainability, operability, and correct sequencing included.

First, give an explicit readiness assessment for implementation: what is solid, what is ambiguous, what is missing or risky, and whether you would let the team execute as written.

Then update the plan itself: add, remove, reorder, or sharpen sections so the document is fit to execute. Prefer concrete edits over generic advice.

Run this after humans have done a first pass, not instead of one. The readiness assessment is a forcing function; it still comes from a model and is not a staff engineer’s sign-off.

Execution: press “build” one phase at a time

When the layers above are in place, execution should feel almost boring. You have context-bound workspaces, scoped prompts, and explicit verification. Run phases in order. After each phase, run the checks you wrote when you were thinking clearly—not the checks the agent suggested at 11pm.

If something fails, fix forward in the smallest responsible step, and update the plan when reality diverges.
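
The "one phase at a time" rule is easy to express as a loop. Here is a small sketch, assuming each phase carries a list of human-written checks that return `True` on success; the `Phase` type and the `run_in_order` function are hypothetical names for this illustration. A phase with no checks is treated as unverified, because an empty list would otherwise pass silently.

```python
from typing import Callable, NamedTuple


class Phase(NamedTuple):
    """Hypothetical record: a phase name plus the checks written for it."""

    name: str
    checks: list[Callable[[], bool]]  # each check returns True on success


def run_in_order(phases):
    """Verify phases in sequence and stop at the first one that fails.

    Returns (completed_phase_names, failed_phase_name_or_None).
    """
    completed = []
    for phase in phases:
        # No checks means nothing was proven, so the phase does not pass.
        if not phase.checks or not all(check() for check in phase.checks):
            return completed, phase.name
        completed.append(phase.name)
    return completed, None


if __name__ == "__main__":
    # Stand-in checks; real ones would run commands or query a system.
    phases = [
        Phase("staging deploy", [lambda: True]),
        Phase("read path on new schema", [lambda: True, lambda: False]),
        Phase("dashboards and alerts", [lambda: True]),
    ]
    completed, failed = run_in_order(phases)
    print("completed:", completed)
    print("stopped at:", failed)
```

In practice a check would wrap whatever you wrote in the phase prompt's verification section, for instance a command whose exit code you inspect. Later phases never run once one fails, which keeps the fix-forward step small.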

Plans and prompts are not set in stone

Meta-plans and phase prompts are hypotheses, not tablets. They should change as you put work into the world: each phase teaches you something about the codebase, the platform, and the seams between services. When what you ship diverges from what you wrote down, fix the document—otherwise the next agent session starts from a map everyone quietly knows is wrong, and compound error shows up again.

Treat updates as normal. A phase finishing is a good moment to reconcile the meta-plan and any phase prompts: what did we learn, what is no longer in scope, what new risk appeared, what order should shift? The staff-style pass (Layer 4) is not only for day one; you can run a lighter version after major milestones if the program is still large.

Version history belongs in Git

The easiest way to keep that history honest and inspectable is to keep plans in a Git repository. That might be the same repo as the product—many teams use a docs/, planning/, or runbooks/ tree—or a separate internal repo if that matches how you manage design docs. What matters is that changes are committed, not pasted into chat and lost.

Git gives you diffs and blame: you can see how the plan evolved, recover earlier wording when someone asks “why did we decide that?”, and open pull requests for substantive edits so plan changes get a second pair of eyes—similar to code. Linking plan commits to phase completions or merge commits is cheap traceability: future you (or a new teammate) can walk from “what we thought” to “what we shipped” without archaeology in Slack.
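
For completeness, here is a minimal sketch of how the diff and blame views could be pulled from a script. The helper names and the example path `planning/meta-plan.md` are assumptions for illustration; the git invocations themselves (`git log --follow --patch` and `git blame`) are standard commands. The helpers only build argument lists, and `run` executes one inside whatever repository is the current working directory.

```python
import subprocess


def plan_history_command(plan_path):
    """argv for the commit history of one plan file, with diffs, across renames."""
    return ["git", "log", "--follow", "--patch", "--", plan_path]


def plan_blame_command(plan_path):
    """argv showing which commit last touched each line of the plan."""
    return ["git", "blame", "--", plan_path]


def run(argv):
    """Run a command in the current working directory and return its stdout.

    Raises subprocess.CalledProcessError if the command exits non-zero,
    for example when the directory is not a Git repository.
    """
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # Hypothetical location; use wherever your team keeps its plans.
    plan = "planning/meta-plan.md"
    print(" ".join(plan_history_command(plan)))
    print(" ".join(plan_blame_command(plan)))
```

Keeping the commands as plain argument lists makes them easy to log alongside a phase completion, so the link between plan history and shipped work stays explicit.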

That is the practical meaning of meta-planning here: humans anchor truth at the top, agents accelerate drafting and iteration in the middle, and discipline at the bottom—one phase, one verification, one merge at a time, with documents that stay tied to reality—is what keeps the system honest.
