
Delegation & Parallelization

The Three Pillars

Every effective prompt — whether it's a one-liner or a full user story — needs three elements:

| Pillar | What It Does | Why It Matters |
| --- | --- | --- |
| Scope | Defines boundaries — what's included, what's not | Without scope, AI guesses what you're asking for |
| Intent | States the "why" — what outcome you're optimizing for | Without intent, AI can't make informed tradeoffs |
| Structure | Specifies the format of the output | Without structure, you get whatever AI defaults to |

These three pillars are the foundation for clear communication with AI. They apply to everything from a quick question to a complex feature spec.

User Stories: Empathy as Engineering

The format that reliably delivers all three pillars is the user story:

As a [type of user], I want [capability], so that [outcome].

Notice that every story starts with a person — not a feature, not a technology, not a screen. That's intentional. The Explore step in the Explore → Plan → Implement → Verify workflow begins with understanding who you're building for: what problems they face, what would make their work easier, what behavior you're trying to change. A user story is a small act of empathy — you step into someone else's shoes and describe the world from their perspective.

That empathy isn't just a nice-to-have. It's what makes the "so that" clause work. "So that" is the steering wheel — when AI makes implementation decisions (and it will), the outcome you're optimizing for guides those decisions. Without it, AI guesses. With it, AI makes informed tradeoffs aligned to real user needs.

Acceptance criteria define when the story is done:

Given [starting condition], when [action happens], then [expected result].

Example:

As a forecast operations manager, I want to see a danger summary across all UAC zones on a single dashboard, so that I can quickly identify which zones need attention and where to deploy field resources.

Given forecast data has been ingested for all zones, when I open the dashboard, then each zone displays its current danger rating and the number of active avalanche problems.

Given a zone's danger rating is "considerable" or higher, when the dashboard loads, then that zone is visually highlighted and sorted to the top.

The user story format guarantees Scope ("danger summary across all UAC zones"), Intent ("quickly identify which zones need attention"), and Structure (implicit — a dashboard with specific display requirements). The acceptance criteria turn that into a verifiable contract. Together, they're a delegation contract — the agreement between you and AI on what "done" looks like before work starts.
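To make the "verifiable contract" idea concrete, here is a hypothetical sketch of the two acceptance criteria above expressed as executable checks. All names (`Zone`, `build_dashboard`, the rating scale) are invented for illustration, not part of any real system:

```python
# Hypothetical sketch: the dashboard story's acceptance criteria as tests.
from dataclasses import dataclass

@dataclass
class Zone:
    name: str
    danger_rating: str   # "low" .. "extreme"
    active_problems: int

RATING_ORDER = ["low", "moderate", "considerable", "high", "extreme"]

def build_dashboard(zones):
    """Given forecast data for all zones, return display rows where each
    zone shows its rating and problem count, and 'considerable'-or-higher
    zones are highlighted and sorted to the top."""
    rows = [
        {
            "zone": z.name,
            "rating": z.danger_rating,
            "problems": z.active_problems,
            "highlight": RATING_ORDER.index(z.danger_rating)
                         >= RATING_ORDER.index("considerable"),
        }
        for z in zones
    ]
    # Highlighted zones first; most dangerous first within each group.
    rows.sort(key=lambda r: (-int(r["highlight"]),
                             -RATING_ORDER.index(r["rating"])))
    return rows

def test_every_zone_displays_rating_and_problem_count():
    rows = build_dashboard([Zone("Salt Lake", "considerable", 3),
                            Zone("Uintas", "moderate", 1)])
    assert all("rating" in r and "problems" in r for r in rows)

def test_considerable_or_higher_is_highlighted_and_sorted_first():
    rows = build_dashboard([Zone("Uintas", "moderate", 1),
                            Zone("Salt Lake", "high", 3)])
    assert rows[0]["zone"] == "Salt Lake" and rows[0]["highlight"]
```

Each Given/When/Then maps onto a test: the Given becomes setup, the When becomes the call, the Then becomes the assertion. Once the criteria exist in this form, "done" is a test run, not a judgment call.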

At this level, delegation contracts aren't just communication tools — they're what enable autonomy. When a spec is precise enough, you don't need to watch the work happen. You verify the output against the criteria. The spec is the contract; the tests are the enforcement.

The delegation-ready test from Blue Square still applies:

| Question | If Yes | If No |
| --- | --- | --- |
| Can you write 2-4 specific acceptance criteria? | Ready to delegate | Needs more Explore/Plan work |
| Is the scope bounded? | Ready to delegate | Decompose further |
| Can it be built and tested independently? | Safe to parallelize | Sequence it |
| Would you know a good result if you saw one? | Delegate with confidence | Understand the problem first |
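The checklist above is mechanical enough to encode. Here is a minimal sketch that runs the four questions as a gate over a candidate spec; the field names are invented for illustration:

```python
# Hypothetical sketch: the delegation-ready checklist as a gate.
def delegation_status(spec: dict) -> str:
    # 2-4 specific acceptance criteria?
    criteria = spec.get("acceptance_criteria", [])
    if not 2 <= len(criteria) <= 4:
        return "needs more Explore/Plan work"
    # Scope bounded?
    if not spec.get("scope_bounded"):
        return "decompose further"
    # Would you know a good result if you saw one?
    if not spec.get("good_result_recognizable"):
        return "understand the problem first"
    # Buildable and testable independently?
    if spec.get("independently_testable"):
        return "safe to parallelize"
    return "sequence it"
```

The point of the sketch is the ordering: criteria and scope gate delegation at all, and only then does independence decide whether a task can run in parallel or must be sequenced.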

What changes at scale: you're not running one delegation at a time. You're running five.

Parallel Execution

The Blue Square track introduces background execution and sub-agents. Here's the elevation for Black Diamond:

Background execution lets AI work while you focus elsewhere. You can ask for work to be done in the background through your prompt, or push a running task to the background on the fly.

- Include "in the background" in your prompt, or press Ctrl+B while a task is running. Check status with /tasks. Start a fresh conversation with /clear — your project context and skills load automatically.
- Run separate Codex instances in parallel terminals. Each instance works independently with its own context.
- Launch separate pi instances for parallel work. Each instance maintains its own conversation and context.

Sub-agents are focused AI instances that handle specific parts of a complex task. Your AI coding assistant recruits them behind the scenes — one for research, one for tests, one for implementation. Each gets a full context window dedicated to its slice of the work.
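The shape of parallel delegation can be sketched in a few lines: each workstream pairs an independent task with a verification predicate, the tasks run concurrently, and each result is checked against its own contract. This is a conceptual sketch with stand-in functions, not a real orchestration API:

```python
# Sketch: independent workstreams run in parallel, each verified
# against its own delegation contract. All functions are stand-ins.
from concurrent.futures import ThreadPoolExecutor

def ingest_forecasts():
    return {"zones_loaded": 4}

def build_alerts():
    return {"alerts": ["wind slab"]}

def build_dashboard():
    return {"rows": 4}

# Each workstream = (task, check). The spec is the contract;
# the check is the enforcement.
workstreams = {
    "ingestion": (ingest_forecasts, lambda r: r["zones_loaded"] > 0),
    "alerts":    (build_alerts,     lambda r: len(r["alerts"]) > 0),
    "dashboard": (build_dashboard,  lambda r: r["rows"] > 0),
}

with ThreadPoolExecutor() as pool:
    futures = {name: pool.submit(task)
               for name, (task, _) in workstreams.items()}
    results = {name: f.result() for name, f in futures.items()}

passed = {name: check(results[name])
          for name, (_, check) in workstreams.items()}
```

Note that nothing in the loop inspects *how* a workstream did its work; each one is judged only against its predicate. That is the autonomy the delegation contract buys you.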

The Visibility Problem

Here's the wall you're about to hit.

You can decompose, spec, and parallelize effectively. Your context engineering keeps every workstream focused. Your skills enforce conventions across contributors. Your deployment pipeline gates on tests. You're producing more in a single sprint than most teams produce in a week.

But: is the output correct?

Not "does the code run" — your tests check that. Not "does it match the spec" — your acceptance criteria check that. The question is: when your AI analysis layer generates a briefing about Salt Lake, does it accurately reflect the forecaster's published assessment? When an AI-generated contextual alert summarizes conditions, did it capture the right risk factors? When five workstreams deliver results simultaneously, do they integrate correctly?

Manual review doesn't scale to this velocity. You can't read every line of code across five parallel workstreams. You can't eyeball every analysis result against the actual forecast. You need a way to measure correctness systematically — not just "does it work" but "does it work right."
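What systematic measurement could look like, in miniature: grade each AI-generated briefing against the published forecast it claims to summarize, checking for the facts that must survive. This is a deliberately naive keyword sketch with invented sample data; real graders would be more sophisticated, but the shape is the same:

```python
# Hypothetical sketch: grading an AI-generated briefing against the
# published forecast's key facts. Sample data is invented.
def grade_briefing(briefing: str, forecast: dict) -> dict:
    text = briefing.lower()
    return {
        "states_danger_rating": forecast["danger_rating"] in text,
        "mentions_each_problem": all(p.lower() in text
                                     for p in forecast["problems"]),
    }

published = {
    "zone": "Salt Lake",
    "danger_rating": "considerable",
    "problems": ["wind slab", "persistent weak layer"],
}

briefing = (
    "Salt Lake is at considerable danger today. Watch for wind slab "
    "on upper-elevation leeward slopes and a persistent weak layer "
    "buried mid-snowpack."
)

scores = grade_briefing(briefing, published)
```

A check like this runs on every briefing, every time, at the same speed you generate them. That is the difference between review and measurement.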

That's what Lift 2 is about.

Team Discussion: The Visibility Problem

Format: Team Discussion
Time: ~2 minutes

Think about the project you're about to build: a forecast intelligence platform that ingests published forecasts alongside weather and snowpack data, then uses AI to generate enriched analysis, contextual alerts, and cross-zone synthesis.

Discuss: If you parallelized the build — one workstream for data ingestion, one for the AI analysis layer, one for alerts, one for the dashboard — how would you know if the AI-generated analysis accurately reflects the published forecasts? What would "correct" even mean for an AI-generated briefing? How would you test it?

Key Insight

Delegation at scale requires precision at the contract level. The same frameworks — Three Pillars, delegation contracts, Explore → Plan → Implement → Verify — that work for single tasks also work for parallel workstreams. What changes is that manual review stops scaling. You can parallelize the building. You can parallelize the testing. But verifying that the system produces correct results — not just code that runs — requires measurement. That's the gap this track closes.