# Reflection 2
A facilitation guide for the team debrief following Run 2. These questions are designed to spark discussion — not every question needs to be covered. Pick the ones that resonate with what you observed during the run.
## What You Built
- What skills did your team create? Walk through one — what problem did it solve, and how did it change the consistency of AI's output?
- Did you do the process with AI first — refining until the output was right — and then capture it as a skill? Or did you skip straight to "create a skill for X"? What was the difference in quality?
- Compare something you built in Run 1 (without skills) to something you built in Run 2 (with skills). What's different about the result? What's different about how you got there?
- Did you find that adding a new feature broke something that was already working? What happened, and how did you handle it?
- Did you ask AI to check its own work — whether the codebase was well-organized, whether anything should be cleaned up before adding more? If so, what did it find? If not, what might have been building up underneath features that "worked"?
## What You Practiced
- How did decomposition change the way you worked compared to Run 1? Did having a managed backlog of independently shippable pieces change how your team made decisions about what to build next?
- When you reviewed against acceptance criteria, did you catch failures that you would have missed with a "looks good" check? What did a specific pass/fail call teach you about the quality of what AI produced?
- Lift 2 described the spinning loop — re-prompting in circles without clear criteria. Did you catch yourself in the loop during this run? What pulled you out?
## How You Worked
- How did your team divide the work this time? Did the decomposed backlog change how you organized — could different people take different stories, or did you still need to mob?
- Did your skills help the team stay aligned, or did different team members still produce different conventions?
## Looking Ahead
- Manual review works, but it doesn't scale. You walked through acceptance criteria by hand for every feature. How many features do you have now — and did you re-check the earlier ones after adding new ones? The faster you ship without automated checks, the more likely a new change quietly breaks something you already verified. What would it take to make that verification automatic?
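As a discussion aid, here is one minimal sketch of what "automatic" could look like: each acceptance criterion becomes a small test that re-runs on every change, so earlier features are re-verified for free. The `slugify` function and its criteria below are hypothetical stand-ins for one of your team's features, not an example from the run.

```python
# Hypothetical feature: slugify, with its acceptance criteria
# captured as assertions that re-run on every change.

def slugify(title: str) -> str:
    """Turn a post title into a URL slug (illustrative implementation)."""
    # Keep letters, digits, and spaces; drop all other characters.
    cleaned = "".join(c if c.isalnum() or c == " " else "" for c in title)
    # Lowercase, collapse whitespace, and join words with hyphens.
    return "-".join(cleaned.lower().split())

# Acceptance criteria, written once, then checked automatically
# (e.g., by pytest) instead of being walked through by hand.
def test_lowercases_and_hyphenates():
    assert slugify("Hello World") == "hello-world"

def test_strips_punctuation():
    assert slugify("Run 2: Skills!") == "run-2-skills"

def test_collapses_repeated_spaces():
    assert slugify("a    b") == "a-b"
```

A suite like this turns the "did adding a new feature break something that already worked?" question from a manual re-check into a command you run after every change.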