Q-04: Work Shape / Lifecycle
Status: ANSWERED (revised with wild examples)
Agent: opencode/ext-agent
Timestamp UTC: 2026-05-11T02:40:00Z
Short Answer
Use a lightweight claim→research→plan→dispatch→implement→validate→close cycle rather than a rigid 7-stage clock. Keep only two hard gates: plan approval before implementation, and validation before close. Everything else is a lead-owned checklist. None of the projects surveyed below uses formal stage gates, so the d3-tui-triad is already more structured than any of the open-source agent coding systems reviewed.
Evidence from the Wild
| Project | Stars | Agents | Stages | Gates |
|---|---|---|---|---|
| Claude Code Teams (Anthropic) | 122k | Multi | 0-1 (plan approval optional) | Optional |
| pi-teams (port of above) | 91 | Multi | 0-1 | Optional |
| SWE-agent / Mini-SWE (Princeton, NeurIPS 2024) | 19k | Single | 0 (agent loop) | None |
| OpenHands | 73k | Single | 0 (conversation) | None |
Key findings from the wild:
- No project uses mandatory stage artifacts or gated pipelines
- Mini-SWE-agent achieves 65% on SWE-bench in 100 lines of Python; minimalism wins
- Claude Code teams use only 3 states: pending, in_progress, completed
- Plan approval is optional even in Claude Code teams (the ancestor of pi-teams)
- The d3-tui-triad T0-T7 lifecycle is more rigorous than anything in production
Quotes that matter:
- SWE-agent authors: "Most of our development effort is on mini-swe-agent, which has superseded SWE-agent. It matches the performance while being much simpler."
- Claude Code teams MCP: tasks have only pending/in_progress/completed/deleted states
- OpenHands: no stage gates. Agent writes code, human reviews, they iterate.
Revised Recommendation: Two Hard Gates + Checklist
Hard Gate 1: Plan Before Implement (mechanical)
Mechanism: pi-teams plan approval mode
When: Builder-reviewer submits plan to lead inbox before editing files
Lead action: Approve or reject with feedback
Wild precedent: Claude Code team plan approval (identical pattern)
Hard Gate 2: Validate Before Close (mechanical)
Mechanism: Builder-reviewer runs make, reports result
When: After implementation, before closeout
Lead action: Review validation, decide close vs rework
Wild precedent: SWE-agent runs tests before submitting PR
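A minimal sketch of how the validation gate could be mechanized, assuming validation is a single command whose exit code decides pass or fail. `ValidationReport`, `run_validation`, and `lead_decision` are hypothetical names for illustration and are not part of pi-teams; only the default command `make` comes from the mechanism above.

```python
import subprocess
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ValidationReport:
    """Result the builder-reviewer reports to the lead (hypothetical shape)."""
    command: tuple
    exit_code: int
    output_tail: str

    @property
    def passed(self) -> bool:
        return self.exit_code == 0


def run_validation(command: Sequence[str] = ("make",), tail_lines: int = 20) -> ValidationReport:
    """Run the validation command and keep the last few lines of its output.

    Raises FileNotFoundError if the command is not installed.
    """
    proc = subprocess.run(
        list(command),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so the tail shows errors too
        text=True,
    )
    tail = "\n".join(proc.stdout.splitlines()[-tail_lines:])
    return ValidationReport(tuple(command), proc.returncode, tail)


def lead_decision(report: ValidationReport) -> str:
    """Default suggestion for the lead: close on success, rework otherwise."""
    return "close" if report.passed else "rework"
```

The lead still reviews the report; `lead_decision` only encodes the default outcome, and a lead may send a passing task back for rework.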
Everything Else: Lead-Owned Checklist
☐ T0: Lead identifies issue, writes scope
☐ T1: Researcher reviews code/docs, writes findings
☐ T2: Lead drafts approach
☐ T3: Lead dispatches chunks
☐ → HARD GATE: Plan approval
☐ T4: Builder-reviewer implements
☐ T5: Builder-reviewer validates
☐ → HARD GATE: Validation review
☐ T6: Researcher reviews diff (optional)
☐ T7: Lead writes closeout
Stages can be concurrent or skipped. Trivial issues skip T1 (research). Simple fixes skip T6 (diff review).
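The checklist can be sketched as a small tracker that enforces only the two hard gates and leaves every other item advisory. This is an illustrative model, not pi-teams code: `TaskChecklist` and `GateError` are hypothetical names, and mapping "close" to item T7 is an assumption based on the checklist order above.

```python
from dataclasses import dataclass, field

# Checklist item IDs, following the T0-T7 labels above.
STAGES = ("T0", "T1", "T2", "T3", "T4", "T5", "T6", "T7")


class GateError(RuntimeError):
    """Raised when a step would bypass one of the two hard gates."""


@dataclass
class TaskChecklist:
    done: set = field(default_factory=set)
    plan_approved: bool = False        # set by the lead (hard gate 1)
    validation_accepted: bool = False  # set by the lead (hard gate 2)

    def complete(self, stage: str) -> None:
        """Tick off an item; order is free except at the two gates."""
        if stage not in STAGES:
            raise ValueError(f"unknown stage {stage!r}")
        if stage == "T4" and not self.plan_approved:
            raise GateError("plan must be approved before implementation (T4)")
        if stage == "T7" and not self.validation_accepted:
            raise GateError("validation must be accepted before closeout (T7)")
        self.done.add(stage)
```

Skipping an item means never calling `complete` for it, so a trivial issue can go from T0 straight to T2 without any extra bookkeeping.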
Duration Caps
| Phase | Target | Hard Cap |
|---|---|---|
| T0-T3 (pre-implementation) | 20 min | 45 min |
| T4 (implement) | 30 min | 60 min |
| T5-T7 (post-implementation) | 15 min | 30 min |
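As a rough illustration, the caps in the table can be encoded as data and checked against elapsed time. The function name `classify` and the status labels are hypothetical, and treating a duration exactly equal to a limit as still within that limit is an assumption.

```python
from datetime import timedelta

# (target, hard cap) in minutes, copied from the table above.
CAPS = {
    "T0-T3": (20, 45),
    "T4": (30, 60),
    "T5-T7": (15, 30),
}


def classify(phase: str, elapsed: timedelta) -> str:
    """Return 'on_target', 'over_target', or 'over_cap' for a phase."""
    target, hard_cap = (timedelta(minutes=m) for m in CAPS[phase])
    if elapsed <= target:
        return "on_target"
    if elapsed <= hard_cap:
        return "over_target"
    return "over_cap"
```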
Coordination: Inbox Messages, Not Formal Acknowledges
Following the Claude Code teams pattern:
1. Lead creates task on pi-teams task board
2. Researcher checks board, researches, reports via inbox
3. Lead reviews, sends plan to builder-reviewer
4. Builder-reviewer submits plan (plan approval mode)
5. Lead approves → builder-reviewer implements
6. Builder-reviewer reports completion + validation
7. Lead reviews, closes task, writes closeout note
No formal "acknowledge" actions needed — inbox messages are sufficient for a 3-agent team.
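To illustrate why a separate acknowledge step adds little here, the sketch below replays steps 2 through 6 with plain per-agent inboxes: each reply doubles as the acknowledgement of the message before it. `Inboxes`, `Message`, and the message kinds are hypothetical stand-ins and do not reflect the actual pi-teams inbox API.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: str
    recipient: str
    kind: str       # hypothetical kinds: findings, plan_request, plan, approve, done
    body: str = ""


class Inboxes:
    """In-memory stand-in for per-agent inboxes (not the pi-teams API)."""

    def __init__(self):
        self._queues = defaultdict(deque)

    def send(self, msg: Message) -> None:
        self._queues[msg.recipient].append(msg)

    def receive(self, agent: str):
        """Return the oldest unread message for the agent, or None."""
        queue = self._queues[agent]
        return queue.popleft() if queue else None


def happy_path() -> list:
    """Replay steps 2-6 and return the message kinds in the order received."""
    box = Inboxes()
    script = [
        Message("researcher", "lead", "findings"),
        Message("lead", "builder-reviewer", "plan_request"),
        Message("builder-reviewer", "lead", "plan"),
        Message("lead", "builder-reviewer", "approve"),
        Message("builder-reviewer", "lead", "done", "validation passed"),
    ]
    seen = []
    for msg in script:
        box.send(msg)
        seen.append(box.receive(msg.recipient).kind)
    return seen
```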
Role Mapping
| Role | Checklist Items | Gates |
|---|---|---|
| lead | T0, T2, T3, T7 | Both hard gates |
| researcher | T1, T6 (advisory) | None |
| builder-reviewer | T4, T5 | Must pass validation gate |
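The role table can also be kept as data so that ownership gaps or overlaps are caught mechanically. A small sketch, with `OWNERS`, `owner_of`, and `every_item_has_one_owner` as hypothetical names:

```python
# Ownership copied from the role table above.
OWNERS = {
    "lead": {"T0", "T2", "T3", "T7"},
    "researcher": {"T1", "T6"},
    "builder-reviewer": {"T4", "T5"},
}


def owner_of(item: str) -> str:
    """Return the role that owns a checklist item."""
    for role, items in OWNERS.items():
        if item in items:
            return role
    raise KeyError(item)


def every_item_has_one_owner() -> bool:
    """True when each of T0-T7 appears under exactly one role."""
    all_items = [item for items in OWNERS.values() for item in items]
    return sorted(all_items) == [f"T{n}" for n in range(8)]
```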
Risks / Failure Modes
- Over-engineering trap: The original 7-gate design with mandatory artifacts at each stage is more complex than anything in production use. Risk of agents spending more time on lifecycle paperwork than coding.
- Plan approval bottleneck: If lead is slow to approve, builder-reviewer sits idle. Mitigation: 10-minute approval timeout, then builder-reviewer escalates to human.
- Checklist fatigue: Even with only 2 hard gates, the eight checklist items (T0-T7) may feel heavy. Monitor after 3 tasks.
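The plan-approval mitigation above can be expressed as a small decision function. This is a sketch under assumptions: `next_action` and its return labels are hypothetical, and escalating exactly at the 10-minute mark (rather than strictly after it) is a choice made here, not something the mitigation specifies.

```python
from datetime import datetime, timedelta

APPROVAL_TIMEOUT = timedelta(minutes=10)  # from the bottleneck mitigation


def next_action(submitted_at: datetime, now: datetime, decision=None) -> str:
    """Decide what the builder-reviewer does while a plan awaits approval.

    decision is None while the lead has not answered, else "approve" or "reject".
    """
    if decision == "approve":
        return "implement"
    if decision == "reject":
        return "revise_plan"
    if now - submitted_at >= APPROVAL_TIMEOUT:
        return "escalate_to_human"
    return "wait"
```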
Schedule Adherence Test
After the first 3 real tasks, check: are agents hitting the 20/30/15 minute targets? If consistently over, reassess caps or simplify further.
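One way to make this check concrete is sketched below. It assumes "consistently over" means a phase exceeded its target in every recorded task; that reading, along with the names `TARGET_MINUTES` and `phases_consistently_over`, is an assumption rather than something the test above defines.

```python
# Targets in minutes, matching the 20/30/15 figures above.
TARGET_MINUTES = {"T0-T3": 20, "T4": 30, "T5-T7": 15}


def phases_consistently_over(tasks: list, min_tasks: int = 3) -> list:
    """Return phases that exceeded their target in every recorded task.

    Each task is a dict mapping phase name to minutes spent.
    """
    if len(tasks) < min_tasks:
        raise ValueError(f"need at least {min_tasks} tasks, got {len(tasks)}")
    return [
        phase
        for phase, target in TARGET_MINUTES.items()
        if all(task[phase] > target for task in tasks)
    ]
```

For example, if implementation ran 35, 40, and 50 minutes across three tasks while the other phases stayed within target at least once, only `"T4"` is returned, which points at the implementation cap as the one to reassess.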
Decisions Needed From Mehdi
- Adopt "two hard gates + checklist" model? (Recommended: yes. Matches wild precedent, still provides structure.)
- Enable plan approval mode? (Recommended: yes — consistent with Q-01 and Q-02 recommendations.)