ANSWERED

Q-04: Work Shape / Lifecycle

Status: ANSWERED (revised with wild examples)
Agent: opencode/ext-agent
Timestamp UTC: 2026-05-11T02:40:00Z

Short Answer

Use a lightweight claim→research→plan→dispatch→implement→validate→close cycle, not a rigid 7-stage clock. Only two hard gates apply: plan approval before implementation, and validation before close. Everything else is a lead-owned checklist. None of the projects surveyed below uses formal stage gates; the d3-tui-triad is already more structured than any of them.

Evidence from the Wild

| Project | Stars | Agents | Stages | Gates |
| --- | --- | --- | --- | --- |
| Claude Code Teams (Anthropic) | 122k | Multi | 0-1 (plan approval optional) | Optional |
| pi-teams (port of above) | 91 | Multi | 0-1 | Optional |
| SWE-agent / Mini-SWE (Princeton, NeurIPS 2024) | 19k | Single | 0 (agent loop) | None |
| OpenHands | 73k | Single | 0 (conversation) | None |

Key findings from the wild:
- No project uses mandatory stage artifacts or gated pipelines
- Mini-SWE-agent achieves 65% on SWE-bench in 100 lines of Python: minimalism wins
- Claude Code teams use only 3 states: pending, in_progress, completed
- Plan approval is optional even in Claude Code teams (the ancestor of pi-teams)
- The d3-tui-triad T0-T7 lifecycle is more rigorous than anything in production

Quotes that matter:
- SWE-agent authors: "Most of our development effort is on mini-swe-agent, which has superseded SWE-agent. It matches the performance while being much simpler."
- Claude Code teams MCP: tasks have only pending/in_progress/completed/deleted states
- OpenHands: no stage gates. Agent writes code, human reviews, they iterate.

Revised Recommendation: Two Hard Gates + Checklist

Hard Gate 1: Plan Before Implement (mechanical)

- Mechanism: pi-teams plan approval mode
- When: Builder-reviewer submits plan to lead inbox before editing files
- Lead action: Approve or reject with feedback
- Wild precedent: Claude Code team plan approval (identical pattern)

Hard Gate 2: Validate Before Close (mechanical)

- Mechanism: Builder-reviewer runs make, reports result
- When: After implementation, before closeout
- Lead action: Review validation, decide close vs rework
- Wild precedent: SWE-agent runs tests before submitting PR

Everything Else: Lead-Owned Checklist

 T0: Lead identifies issue, writes scope
 T1: Researcher reviews code/docs, writes findings
 T2: Lead drafts approach
 T3: Lead dispatches chunks
  HARD GATE: Plan approval
 T4: Builder-reviewer implements
 T5: Builder-reviewer validates
  HARD GATE: Validation review
 T6: Researcher reviews diff (optional)
 T7: Lead writes closeout

Stages can be concurrent or skipped. Trivial issues skip T1 (research). Simple fixes skip T6 (diff review).
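
The checklist above can be sketched as a small state tracker in which only the two hard gates raise errors and every other stage is free to run in any order. This is a minimal illustration, not part of pi-teams: the `Task` class, its field names, and the choice to block both T6 and T7 until validation is reviewed are assumptions made for the example.

```python
from dataclasses import dataclass, field

# Stage labels taken from the checklist above.
STAGES = ["T0", "T1", "T2", "T3", "T4", "T5", "T6", "T7"]
# Trivial issues may skip research (T1); simple fixes may skip diff review (T6).
SKIPPABLE = {"T1", "T6"}


@dataclass
class Task:
    """Hypothetical task record; not a pi-teams data structure."""
    plan_approved: bool = False       # hard gate 1, set by the lead
    validation_passed: bool = False   # hard gate 2, set by the lead
    done: set = field(default_factory=set)

    def complete(self, stage: str) -> None:
        """Mark a stage done, enforcing only the two hard gates."""
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        if stage == "T4" and not self.plan_approved:
            raise PermissionError("plan approval required before implementation")
        if stage in {"T6", "T7"} and not self.validation_passed:
            raise PermissionError("validation review required before closeout")
        self.done.add(stage)

    def can_close(self) -> bool:
        """True once every non-skippable stage has been completed."""
        return (set(STAGES) - SKIPPABLE) <= self.done
```

Because stage order is not enforced outside the gates, concurrent or skipped stages need no special handling in this sketch.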

Duration Caps

| Phase | Target | Hard Cap |
| --- | --- | --- |
| T0-T3 (pre-implementation) | 20 min | 45 min |
| T4 (implement) | 30 min | 60 min |
| T5-T7 (post-implementation) | 15 min | 30 min |
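
As a rough illustration of how the caps might be checked, the sketch below classifies an elapsed time against the target and hard cap for each phase. The phase keys (`pre`, `implement`, `post`) and the status labels are hypothetical names chosen for the example; only the minute values come from the table.

```python
# (target, hard cap) in minutes, copied from the table above.
CAPS_MIN = {
    "pre": (20, 45),        # T0-T3
    "implement": (30, 60),  # T4
    "post": (15, 30),       # T5-T7
}


def classify(phase: str, elapsed_min: float) -> str:
    """Return 'on_target', 'over_target', or 'over_cap' for a phase."""
    target, hard_cap = CAPS_MIN[phase]
    if elapsed_min > hard_cap:
        return "over_cap"
    if elapsed_min > target:
        return "over_target"
    return "on_target"
```

Here a phase that lands exactly on its target or cap is treated as within it; that boundary choice is an assumption, since the table does not say.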

Coordination: Inbox Messages, Not Formal Acknowledgements

Following the Claude Code teams pattern:
1. Lead creates task on pi-teams task board
2. Researcher checks board, researches, reports via inbox
3. Lead reviews, sends plan to builder-reviewer
4. Builder-reviewer submits plan (plan approval mode)
5. Lead approves → builder-reviewer implements
6. Builder-reviewer reports completion + validation
7. Lead reviews, closes task, writes closeout note

No formal "acknowledge" actions needed — inbox messages are sufficient for a 3-agent team.
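
To make the message flow concrete, here is a toy in-memory model of the inbox exchange in steps 2 through 6 (steps 1 and 7 are task-board actions rather than messages). It does not use or imitate the pi-teams API: the `Inboxes` class, the message kinds, and `run_happy_path` are all invented for illustration.

```python
from collections import defaultdict, deque


class Inboxes:
    """Hypothetical per-agent FIFO inboxes; not the pi-teams implementation."""

    def __init__(self):
        self._queues = defaultdict(deque)

    def send(self, sender, recipient, kind, body=""):
        self._queues[recipient].append({"from": sender, "kind": kind, "body": body})

    def receive(self, recipient):
        """Pop the oldest message for recipient, or return None if empty."""
        queue = self._queues[recipient]
        return queue.popleft() if queue else None


def run_happy_path(inboxes):
    """Replay the five inbox messages of one task with no rework."""
    log = []

    def step(sender, recipient, kind):
        inboxes.send(sender, recipient, kind)
        msg = inboxes.receive(recipient)
        log.append(f"{msg['from']}->{recipient}:{msg['kind']}")

    step("researcher", "lead", "findings")                    # step 2
    step("lead", "builder-reviewer", "plan_request")          # step 3
    step("builder-reviewer", "lead", "plan")                  # step 4
    step("lead", "builder-reviewer", "plan_approved")         # step 5
    step("builder-reviewer", "lead", "validation_report")     # step 6
    return log
```

Each message is consumed by its recipient as the next step begins, so no separate acknowledgement message appears anywhere in the log.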

Role Mapping

| Role | Checklist Items | Gates |
| --- | --- | --- |
| lead | T0, T2, T3, T7 | Both hard gates |
| researcher | T1, T6 (advisory) | None |
| builder-reviewer | T4, T5 | Must pass validation gate |
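
One way to keep the mapping honest is to encode it as data and check that every checklist item has exactly one owner. The sketch below does that; the dictionary layout and the `owner_of` helper are illustrative assumptions, with only the role names and stage assignments taken from the table.

```python
# Checklist ownership copied from the role table above.
ROLE_ITEMS = {
    "lead": ["T0", "T2", "T3", "T7"],
    "researcher": ["T1", "T6"],
    "builder-reviewer": ["T4", "T5"],
}


def owner_of(stage: str) -> str:
    """Return the single role that owns a stage, or raise if ambiguous."""
    owners = [role for role, items in ROLE_ITEMS.items() if stage in items]
    if len(owners) != 1:
        raise LookupError(f"{stage} has {len(owners)} owners, expected 1")
    return owners[0]
```

Looping `owner_of` over T0 through T7 would flag a stage that was dropped or assigned twice if the table is later edited.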

Risks / Failure Modes

  1. Over-engineering trap: The original 7-gate design with mandatory artifacts at each stage is more complex than anything in production use. Risk of agents spending more time on lifecycle paperwork than coding.
  2. Plan approval bottleneck: If lead is slow to approve, builder-reviewer sits idle. Mitigation: 10-minute approval timeout, then builder-reviewer escalates to human.
  3. Checklist fatigue: Even with only 2 hard gates, the eight checklist items (T0-T7) may feel heavy. Monitor after 3 tasks.
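
The approval-timeout mitigation in item 2 could look roughly like the check below, run by the builder-reviewer while it waits. The function name and the returned action labels are hypothetical; only the 10-minute figure comes from the list above.

```python
from datetime import datetime, timedelta, timezone

APPROVAL_TIMEOUT = timedelta(minutes=10)  # from risk item 2


def next_action(submitted_at: datetime, now: datetime, approved: bool) -> str:
    """Decide what the builder-reviewer does while a plan awaits approval."""
    if approved:
        return "implement"
    if now - submitted_at >= APPROVAL_TIMEOUT:
        return "escalate_to_human"
    return "wait"


# Example call with explicit UTC timestamps.
submitted = datetime(2026, 5, 11, 2, 40, tzinfo=timezone.utc)
action = next_action(submitted, submitted + timedelta(minutes=12), approved=False)
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the check easy to test.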

Schedule Adherence Test

After the first 3 real tasks, check: are agents hitting the 20/30/15 minute targets? If consistently over, reassess caps or simplify further.
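
A minimal sketch of this check follows. It assumes "consistently over" means a phase exceeded its target in at least two of the sampled tasks; that threshold, the phase keys, and the input layout are all assumptions, not something the text fixes.

```python
# Per-phase targets in minutes (the 20/30/15 figures above).
TARGETS_MIN = {"pre": 20, "implement": 30, "post": 15}


def phases_to_reassess(tasks, min_overruns=2):
    """Return phases that ran over target in at least min_overruns tasks.

    tasks is a list of dicts mapping phase name to elapsed minutes.
    """
    overruns = {phase: 0 for phase in TARGETS_MIN}
    for task in tasks:
        for phase, target in TARGETS_MIN.items():
            if task[phase] > target:
                overruns[phase] += 1
    return [phase for phase, count in overruns.items() if count >= min_overruns]
```

An empty result would suggest the current caps are holding; a non-empty one names the phases whose caps or checklist items are worth revisiting.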

Decisions Needed From Mehdi

  1. Adopt "two hard gates + checklist" model? (Recommended: yes. Matches wild precedent, still provides structure.)
  2. Enable plan approval mode? (Recommended: yes — consistent with Q-01 and Q-02 recommendations.)