Q-03: Framework Comparison
Status: ANSWERED
Agent: opencode/ext-agent
Timestamp UTC: 2026-05-11T04:00:00Z
Claim: CLAIM | opencode/ext-agent | 2026-05-11T03:30:00Z
Prior Answers Checked
- Q-00 (Architecture Synthesis): Recommends pi-teams only, avoid LangGraph/CrewAI/AutoGen
- Q-01 (Pi Teams Fit): Confirms pi-teams is sufficient for MVP; no alternative needed
- Q-02 (LangGraph Fit): No — LangGraph adds Python runtime, orchestration collision, and 2-3 sessions of pre-staging
- Q-05 (Knowledge Depot): LLM-wiki file-based, no RAG needed
- Q-06 (Container Shape): Single Docker container, ext4 bind mounts
Short Answer
None of CrewAI, AutoGen, or PydanticAI justifies replacing pi-teams for MVP orchestration. All three require Python (a second runtime on the Pi), lose the tmux GUI surface, and duplicate the team/task/messaging infrastructure that pi-teams already provides.
However, PydanticAI is worth considering as a selective enhancement — not for orchestration, but for typed output validation (T5) and structured evals (T6). Its Pydantic-model-based output enforcement is genuinely complementary to pi-teams and could reduce false confidence without replacing the coordination layer.
Primary Recommendation
Orchestration: pi-teams (keep) — already works, Bun-based, tmux GUI
Validation: PydanticAI (defer) — typed outputs, evals, T5/T6 enhancement
Reject for MVP: CrewAI, AutoGen — Python runtime, replace pi-teams, too heavy
Evidence
Framework Profiles
pi-teams (Baseline)
| Attribute | Value |
|---|---|
| Runtime | Bun (JavaScript/TypeScript) |
| Install size | ~5 MB |
| Memory footprint | ~50 MB |
| Python dependency | None |
| Agent coordination | Built-in (task board, messaging, polling) |
| UI surface | tmux panes (native) |
| Task lifecycle | Claim board + plan approval mode |
| Current status | Running in container on relik-pi4 |
| Multi-agent model | Predefined teams (teams.yaml), agent configs (.pi/agents/) |
| External agent support | Custom (EXTERNAL-AGENT-PROMPT.md + LLM-wiki) |
CrewAI
| Attribute | Assessment |
|---|---|
| Runtime | Python 3.10-3.13 (pip install crewai crewai-tools) |
| Install size | ~200+ MB (dependencies: pydantic, openai, langchain-core, chromadb, etc.) |
| Memory footprint | ~300+ MB runtime |
| Agent coordination | Crews (autonomous) + Flows (deterministic event-driven) |
| UI surface | None built-in — CLI only, no tmux integration |
| Forgejo integration | None built-in |
| LLM-wiki integration | Would need custom tools — no file-based wiki concept |
| Strengths | Best-in-class balance of improvisation (Crews) + deterministic order (Flows). Role-based agents with tools. Enterprise security focus. |
| Weaknesses | Python dependency on Pi. Replaces pi-teams entirely. CrewAI Flows use decorators and Python classes; no YAML-based team definition. Would need 2+ sessions to port. |
CrewAI Architecture Mapping to d3-tui-triad:
Flow: D3TUIWorkcell (event-driven process)
├── @start() -> pick_next_task()
├── @listen(pick_next_task) -> research_crew.kickoff() # T1
├── @listen(research_crew) -> plan_crew.kickoff() # T2
├── @listen(plan_crew) -> implement_crew.kickoff() # T4
└── @listen(implement_crew) -> validate_crew.kickoff() # T5
This maps cleanly to T0-T7 conceptually, but:
- The Flow is Python code, not a YAML file. No equivalent of teams.yaml.
- Each Crew spawn is an LLM call chain — expensive on Pi resources.
- No tmux pane visibility. Debugging requires Python logs.
- All existing pi-teams config (6793 bytes of working setup) becomes dead weight.
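The mapping above can be made concrete with a dependency-free sketch of the @start/@listen chaining that CrewAI Flows use. This mini-implementation illustrates the event-chaining pattern only; it is not the crewai package's real API, and the step names are hypothetical:

```python
# Minimal event-driven flow in the style of CrewAI's @start/@listen
# decorators. Pattern illustration only; not the crewai package's real API.

class MiniFlow:
    def __init__(self):
        self._start = None
        self._listeners = {}  # upstream step name -> downstream handler

    def start(self):
        def register(fn):
            self._start = fn
            return fn
        return register

    def listen(self, upstream):
        def register(fn):
            self._listeners[upstream.__name__] = fn
            return fn
        return register

    def kickoff(self, state):
        # Run the start step, then follow the listener chain until it ends.
        step = self._start
        while step is not None:
            state = step(state)
            step = self._listeners.get(step.__name__)
        return state


flow = MiniFlow()

@flow.start()
def pick_next_task(state):   # T0: pull the next task from the board
    return state + ["task-picked"]

@flow.listen(pick_next_task)
def research(state):         # T1: research crew
    return state + ["researched"]

@flow.listen(research)
def implement(state):        # T4: implement crew
    return state + ["implemented"]

trace = flow.kickoff([])     # ["task-picked", "researched", "implemented"]
```

Note that even this toy version is imperative Python: the chain lives in decorators and class state, which is exactly the teams.yaml gap called out above.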
AutoGen (Microsoft Agent Framework)
| Attribute | Assessment |
|---|---|
| Runtime | Python 3.10+ (pip install autogen-agentchat autogen-core autogen-ext[openai]) |
| Install size | ~250+ MB (multiple packages) |
| Memory footprint | ~400+ MB runtime |
| Agent coordination | AgentChat (conversational) + Core (event-driven, distributed) |
| UI surface | AutoGen Studio (web-based, separate process on port 8080) |
| Forgejo integration | None built-in |
| LLM-wiki integration | Could register as custom tools, no native file wiki concept |
| Strengths | Distributed agents (gRPC), event-driven core for deterministic workflows, AutoGen Studio for prototyping, MCP support, Docker code executor, Microsoft-backed. |
| Weaknesses | Enterprise-focused — designed for distributed multi-service architectures, not a single Pi container. Framework in flux (0.2 to 0.4 broke backward compat). Studio requires separate HTTP server process. |
Key problems for this workcell:
- AgentChat is conversational agents passing messages — similar to pi-teams but without tmux
- Core is an event-driven distributed system — overkill for 3 agents in 1 container
- AutoGen Studio adds HTTP server on resource-constrained Pi
- gRPC distributed runtime is for multi-machine setups — irrelevant here
PydanticAI
| Attribute | Assessment |
|---|---|
| Runtime | Python 3.10+ (pip install pydantic-ai) |
| Install size | ~80 MB (leanest of the Python options) |
| Memory footprint | ~100 MB runtime |
| Agent coordination | Single-agent focused. Multi-agent via composition + Pydantic Graph (beta) |
| UI surface | CLI + Pydantic Logfire (observability). No tmux. |
| Forgejo integration | None built-in |
| LLM-wiki integration | None built-in |
| Strengths | Typed structured output — Pydantic models guarantee output shape with automatic retry. Evals framework — systematic testing of agent performance. Capabilities — composable bundles (web search, thinking, MCP). Model-agnostic — 20+ providers. Durable execution — Temporal/DBOS integrations. Not an orchestrator — designed to build individual agents with strong type guarantees, not team coordination. |
| Weaknesses | Not a multi-agent orchestrator — no task board, agent messaging, or team spawn. Multi-agent patterns are documented but lack team-level infrastructure. Python dependency. No tmux. Would need to run alongside pi-teams, not instead of it. |
PydanticAI as Complement:
pi-teams (orchestration layer — keep)
├── Lead agent (pi-teams) — dispatches, approves, coordinates
├── Researcher agent (pi-teams) — reads repos, searches wiki
└── Builder-reviewer agent (pi-teams) — edits code, runs validation
└── PydanticAI eval (optional enhancement)
├── typed output validation (Pydantic models)
├── structured test runs (evals framework)
└── Logfire observability
PydanticAI would sit under pi-teams, not instead of it. It's a T5/T6 enhancement, not an orchestration replacement.
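The division of labor above can be sketched as a validation gate that pi-teams invokes after the builder-reviewer finishes. This is a dependency-free illustration of the typed-output idea (a real integration would use a Pydantic model via PydanticAI); the schema fields are assumptions:

```python
import json

# Hypothetical T5 schema: field name -> required Python type.
BUILD_RESULT_SCHEMA = {
    "success": bool,
    "errors": list,
    "warnings": list,
    "artifacts": list,
}

def validate_build_result(raw: str) -> dict:
    """Parse an agent's JSON reply and enforce the schema.

    Raises ValueError so the calling hook can re-prompt the agent,
    mirroring PydanticAI's retry-until-valid behavior.
    """
    data = json.loads(raw)
    for field, expected in BUILD_RESULT_SCHEMA.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], expected):
            raise ValueError(f"{field} must be {expected.__name__}")
    return data

# A well-formed reply passes; a malformed one raises and triggers a retry.
ok = validate_build_result(
    '{"success": true, "errors": [], "warnings": [], "artifacts": ["dist/app"]}'
)
```

The point of the sketch: the gate consumes agent output and reports pass/fail upward; it never dispatches tasks, so orchestration stays with pi-teams.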
Detailed Comparison Matrix
| Concern | pi-teams | CrewAI | AutoGen | PydanticAI |
|---|---|---|---|---|
| Orchestration model | Teams + task board | Flows + Crews | AgentChat / Core | Single agent (no team orchestrator) |
| Improvisation | Autonomous polling + agent autonomy | Crews: free agent collaboration | AgentChat: free-form conversation | Agent-level autonomy + tools |
| Deterministic order | Plan approval + claim board (soft) | Flows: event-driven + state (hard) | Core: event-driven + routing (hard) | Pydantic Graph: type-safe (hard, beta) |
| Setup friction | Already running. Zero. | 2-3 sessions to port | 2-3 sessions to port | 1 session as complement |
| Runtime on Pi | Bun (~50 MB). Single. | Python (~300 MB). Second. | Python (~400 MB). Second. | Python (~100 MB). Second. |
| tmux GUI | Native panes = agent views | None | Studio web UI (port 8080) | None (Logfire web) |
| Forgejo | Prompted, not built-in | Custom tools needed | Custom tools needed | Custom tools needed |
| LLM-wiki | Direct file I/O | Custom tools needed | Custom tools needed | Custom tools needed |
| External agents | EXTERNAL-AGENT-PROMPT.md protocol | No concept | No concept | No concept |
| Structured output | Prompt-level only | Pydantic in task defs | Pydantic in response types | Best in class |
| Evals/testing | None built-in | None | None | Built-in evals framework |
| Durable execution | No (LLM-wiki is durable) | No | Partial | Yes (Temporal/DBOS/etc) |
Fit For This Pi Workcell
CrewAI: REJECT for MVP
CrewAI is the strongest conceptual match (Flows = T0-T7, Crews = triad roles), but the practical cost is too high:
1. Replaces pi-teams entirely. All existing config, container setup, prompts — dead weight.
2. Python on Pi. cgroup/memory-limit issues (per Q-06) compounded by ~300 MB runtime.
3. Loses tmux GUI. Debugging becomes Python log diving.
4. Two new concepts to learn. Flows AND Crews before productive work resumes.
5. No Forgejo/LLM-wiki. Must be built as custom tools.
Migration target if pi-teams is outgrown. CrewAI's Flows+Crews model is the best conceptual fit for T0-T7 + triad roles among all alternatives.
AutoGen: REJECT for MVP (and foreseeable future)
Wrong shape entirely:
1. Enterprise distributed systems focus. gRPC, multi-language, web UI — for multi-service architectures, not 3 agents in 1 container.
2. Framework in flux. 0.2 to 0.4 migration broke backward compat.
3. AgentChat vs Core dilemma. Choosing is itself a research task.
4. No unique advantage. Everything AutoGen does that pi-teams doesn't is irrelevant to this workcell.
PydanticAI: DEFER (selective T5/T6 use if hooks insufficient)
Unique: not an orchestrator — complementary rather than competitive.
Worth using for:
1. Typed output validation (T5). Pydantic models guarantee output shape:
```python
from pydantic import BaseModel
from pydantic_ai import Agent

class BuildResult(BaseModel):
    success: bool
    errors: list[str]
    warnings: list[str]
    artifacts: list[str]

validator = Agent("openai:gpt-4o", output_type=BuildResult)
# Guaranteed valid BuildResult, or the agent retries until valid
```
2. Structured evals (T6). Systematic testing with datasets, LLM judges, performance tracking.
3. Capabilities. Composable tool bundles for the builder-reviewer.
Not worth using for: orchestration, team coordination, task lifecycle, agent messaging.
Decision: Defer. Quality gate hooks (shell scripts running make/lint) provide simpler first-line validation. Add PydanticAI only if hooks prove insufficient.
Risks / Failure Modes
- Second-runtime cascade. Any Python framework adds: runtime management, pip dependencies, virtual environments, dual model configs. pi-teams single-runtime simplicity is an operational advantage.
- Split-brain orchestration. If PydanticAI creeps into coordination, the team has: pi-teams task board + LLM-wiki claim board + Pydantic Graph. Three sources of truth.
- Framework abandonment risk. Python framework artifacts (Flow defs, agent configs, Pydantic models) become dead code. pi-teams + LLM-wiki use only Markdown and Bash — nothing rots.
- Dependency rot on Pi. Python AI libraries often have C extensions. ARM64 compatibility on Debian is not guaranteed. Bun is precompiled for ARM64.
Decision Needed From Mehdi
- CrewAI and AutoGen: rejected for MVP? (Recommended: yes. Replace pi-teams, add Python, lose tmux, 2-3 sessions of framework plumbing.)
- PydanticAI for validation: defer or add? (Recommended: defer. Quality gate hooks first. Add PydanticAI if hooks insufficient.)
- If PydanticAI is added later: separate service or same container? (Separate service with API — keeps runtimes clean.)
- Long-term migration path: CrewAI? (Yes, flag as migration target if pi-teams outgrown. Best conceptual fit but not needed now.)
Next Probe
Enable pi-teams quality gate hooks in the running container. Write a minimal make hook that fires on task completion. Verify it catches build failures. If hooks suffice, Python frameworks are unnecessary. If not, PydanticAI for typed validation becomes urgent.
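As a baseline for that probe, the hook itself is small. A minimal sketch, assuming the hook is a script pi-teams runs on task completion; the make targets are placeholders:

```python
import subprocess
import sys

# Placeholder gate commands; substitute the project's real make targets.
GATE_COMMANDS = [
    ["make", "build"],
    ["make", "lint"],
]

def run_gate(commands):
    """Run each command and collect failures instead of stopping at the first."""
    failures = []
    for cmd in commands:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failures.append((cmd, result.stderr.strip()))
    return failures  # empty list means the gate passed

# Demonstrate with a harmless stand-in command instead of the real targets.
failures = run_gate([[sys.executable, "-c", "pass"]])
```

If a script of this size catches build failures reliably, the Python frameworks stay unnecessary; if richer validation semantics are needed, that is the PydanticAI trigger.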
Summary
| Framework | MVP Verdict | Reasoning |
|---|---|---|
| pi-teams | KEEP | Already working. Bun. tmux. Sufficient for 3 agents. |
| CrewAI | REJECT | Python, replaces pi-teams, loses tmux. Best migration target if pi-teams outgrown. |
| AutoGen | REJECT | Enterprise, distributed, framework in flux. Solves problems we don't have. |
| PydanticAI | DEFER | Complementary, not competitive. Consider for T5/T6 if quality gate hooks insufficient. |