Your Criteria Are Already Tests¶
The Honest Tradeoff¶
In Lift 2, you learned criteria-based review — walking through each acceptance criterion, checking the actual output, making a clear pass/fail call. That discipline works. But you also heard the honest tradeoff: manual review is slow, and it doesn't catch regressions.
Now feel it concretely. Every feature reviewed by hand. Every acceptance criterion walked through individually. And the question that keeps growing: when you add a new feature, did it break something you already verified? You won't know unless you re-check everything. And re-checking everything doesn't scale.
The Two-Week Cliff¶
Here's what happens when manual review is your only safety net:
Week 1: You build an observation submission form. It works. You verify it manually — all acceptance criteria pass.
Week 2: You add zone-based filtering to the feed. It works too. You verify the new feature — all criteria pass.
The cliff: A user tries to submit an observation and the form is broken. The filtering change touched shared code, and you only verified the new feature. No automated checks warned you that the submission form broke.
This is the two-week cliff — rapid progress followed by a collapse when changes silently break things that used to work. As one practitioner put it: "You can make insane progress in about a week. I have yet to see that code function beyond the two-week mark." The pattern is consistent: AI can build features fast, but without automated checks, one change can undo a week of verified work.
Manual verification is a point-in-time check. It tells you "this worked when I looked at it." It doesn't tell you "this still works after the last three changes." That gap between what you checked and what's still true is the validation gap — and it grows with every change you make.

The Insight You Already Have¶
Here's the good news: you already know how to write test specifications. You've been writing them since Lift 1.
Look at an acceptance criterion from Lift 2:
Given I'm on the observation form, when I select "avalanche" as the observation type, then additional fields appear for aspect, elevation band, and avalanche size.
Now look at how an automated test works:
- Given (setup): Navigate to the observation form
- When (action): Select "avalanche" as the observation type
- Then (check): Verify that fields for aspect, elevation band, and avalanche size appear
Same structure. Same logic. Same words. Your acceptance criteria in Given/When/Then format are test specifications — they just need to be translated into code. And that's exactly what your AI coding assistant can do.
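To make the translation concrete, here is a minimal sketch of that criterion as an automated test. Everything here is a hypothetical stand-in — the `visible_fields` function and the field names represent whatever form logic your AI assistant actually generates — but the Given/When/Then shape is the point:

```python
# Hypothetical sketch: the avalanche criterion above, translated into a test.
# `visible_fields` stands in for the real form logic your assistant generates.

def visible_fields(observation_type: str) -> set[str]:
    """Return the set of fields shown for a given observation type."""
    base = {"location", "date", "notes"}
    if observation_type == "avalanche":
        return base | {"aspect", "elevation_band", "avalanche_size"}
    return base

def test_avalanche_type_reveals_extra_fields():
    # Given: I'm on the observation form (here, its field logic)
    # When: I select "avalanche" as the observation type
    fields = visible_fields("avalanche")
    # Then: fields for aspect, elevation band, and avalanche size appear
    assert {"aspect", "elevation_band", "avalanche_size"} <= fields

test_avalanche_type_reveals_extra_fields()
```

Each clause of the criterion becomes one labeled step in the test body, so the test reads back as the criterion it came from.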
This is the "double duty" concept from Lift 2 taken one step further. In Lift 2, your criteria served as spec (telling AI what to build) and manual test (telling you what to check). Now they serve as spec and automated test — a test that checks itself every time you make a change.
The Shift¶
| | Manual Review (Lift 2) | Automated Tests (Lift 3) |
|---|---|---|
| Who checks | You, walking through each AC | Code that runs through them automatically |
| When it checks | When you remember to | Every time anything changes |
| What it catches | What you look at right now | Regressions across the entire project |
| How it scales | It doesn't — more features = more manual work | It does — more tests = more coverage, same effort to run |
You're not replacing your judgment — you're extending it. You still write the acceptance criteria. You still decide what "done" means. But instead of being the only one who can verify, you teach the machine to verify for you.
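Here is a sketch of what that extension of judgment looks like in practice: two features, two tests, one run. The function and test names below are hypothetical placeholders for the avalanche app's real code — the point is that the week-1 submission test reruns automatically whenever week-2 code touches shared logic:

```python
# Hypothetical sketch of a two-test suite. Rerunning it after the week-2
# filtering change also exercises the week-1 submission criterion, so a
# regression in shared code fails loudly instead of silently.

def submit_observation(obs: dict) -> bool:
    """Week 1 feature: accept an observation with the required fields."""
    return {"zone", "type"} <= obs.keys()

def filter_by_zone(feed: list[dict], zone: str) -> list[dict]:
    """Week 2 feature: zone-based feed filtering."""
    return [obs for obs in feed if obs["zone"] == zone]

def test_submission_still_works():  # week-1 criterion, checked on every run
    assert submit_observation({"zone": "alpine", "type": "avalanche"})

def test_zone_filter():  # week-2 criterion
    feed = [{"zone": "alpine", "type": "avalanche"},
            {"zone": "treeline", "type": "weather"}]
    assert filter_by_zone(feed, "alpine") == [feed[0]]

# A test runner like pytest would discover and run these automatically;
# the loop below just makes the "run everything, every time" idea explicit.
for test in (test_submission_still_works, test_zone_filter):
    test()
```

Adding a feature adds a test, but running the whole suite stays one command — that is the scaling difference the table describes.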
Team Discussion: Your Biggest Fear¶
Format: Team Discussion · Time: ~2 minutes
Think back to Run 2. You built features, verified them manually, and moved on.
Discuss: What's the feature you're most worried about breaking when you add something new in Run 3? Why? What would it take to feel confident that it still works after every change — without re-checking it by hand?
Key Insight¶
Your acceptance criteria are already test specifications. The Given/When/Then format you learned in Lift 1 and practiced in Lift 2 maps directly to automated test structure — setup, action, check. The only difference is who runs the check: you (manual review) or code (automated tests). In the next section, you'll hand your criteria to AI and get that code back.