Q-03: Framework Comparison
Status: ANSWERED
Agent: opencode/ext-agent
Timestamp UTC: 2026-05-11T04:00:00Z
Claim: CLAIM | opencode/ext-agent | 2026-05-11T03:30:00Z
Prior Answers Checked
- Q-00 (Architecture Synthesis): Recommends pi-teams only, avoid LangGraph/CrewAI/AutoGen
- Q-01 (Pi Teams Fit): Confirms pi-teams is sufficient for MVP; no alternative needed
- Q-02 (LangGraph Fit): No — LangGraph adds Python runtime, orchestration collision, and 2-3 sessions of pre-staging
- Q-05 (Knowledge Depot): LLM-wiki file-based, no RAG needed
- Q-06 (Container Shape): Single Docker container, ext4 bind mounts
Short Answer
None of CrewAI, AutoGen, or PydanticAI justifies replacing pi-teams for MVP orchestration. All three require Python (a second runtime on the Pi), lose the tmux GUI surface, and duplicate the team/task/messaging infrastructure that pi-teams already provides.
However, PydanticAI is worth considering as a selective enhancement — not for orchestration, but for typed output validation (T5) and structured evals (T6). Its Pydantic-model-based output enforcement is genuinely complementary to pi-teams and could reduce false confidence without replacing the coordination layer.
Primary Recommendation
Orchestration: pi-teams (keep) — already works, Bun-based, tmux GUI
Validation: PydanticAI (defer) — typed outputs, evals, T5/T6 enhancement
Reject for MVP: CrewAI, AutoGen — Python runtime, replace pi-teams, too heavy
Evidence
Framework Profiles
pi-teams (Baseline)
| Attribute | Value |
|---|---|
| Runtime | Bun (JavaScript/TypeScript) |
| Install size | ~5 MB |
| Memory footprint | ~50 MB |
| Python dependency | None |
| Agent coordination | Built-in (task board, messaging, polling) |
| UI surface | tmux panes (native) |
| Task lifecycle | Claim board + plan approval mode |
| Current status | Running in container on relik-pi4 |
| Multi-agent model | Predefined teams (teams.yaml), agent configs (.pi/agents/) |
| External agent support | Custom (EXTERNAL-AGENT-PROMPT.md + LLM-wiki) |
CrewAI
| Attribute | Assessment |
|---|---|
| Runtime | Python 3.10-3.13 (pip install crewai crewai-tools) |
| Install size | ~200+ MB (dependencies: pydantic, openai, langchain-core, chromadb, etc.) |
| Memory footprint | ~300+ MB runtime |
| Agent coordination | Crews (autonomous) + Flows (deterministic event-driven) |
| UI surface | None built-in — CLI only, no tmux integration |
| Forgejo integration | None built-in |
| LLM-wiki integration | Would need custom tools — no file-based wiki concept |
| Strengths | Best-in-class balance of improvisation (Crews) + deterministic order (Flows). Role-based agents with tools. Enterprise security focus. |
| Weaknesses | Python dependency on Pi. Replaces pi-teams entirely. CrewAI Flows use decorators and Python classes; no YAML-based team definition. Would need 2+ sessions to port. |
CrewAI Architecture Mapping to d3-tui-triad:
Flow: D3TUIWorkcell (event-driven process)
├── @start() -> pick_next_task()
├── @listen(pick_next_task) -> research_crew.kickoff() # T1
├── @listen(research_crew) -> plan_crew.kickoff() # T2
├── @listen(plan_crew) -> implement_crew.kickoff() # T4
└── @listen(implement_crew) -> validate_crew.kickoff() # T5
This maps cleanly to T0-T7 conceptually, but:
- The Flow is Python code, not a YAML file. No equivalent of teams.yaml.
- Each Crew spawn is an LLM call chain — expensive on Pi resources.
- No tmux pane visibility. Debugging requires Python logs.
- All existing pi-teams config (6793 bytes of working setup) becomes dead weight.
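The mapping above can be made concrete with a dependency-free sketch of the @start/@listen chaining that CrewAI Flows use. This mini-implementation illustrates the event-chaining pattern only; it is not the crewai package's real API, and the step names are hypothetical:

```python
# Minimal event-driven flow in the style of CrewAI's @start/@listen
# decorators. Pattern illustration only; not the crewai package's real API.

class MiniFlow:
    def __init__(self):
        self._start = None
        self._listeners = {}  # upstream step name -> downstream handler

    def start(self):
        def register(fn):
            self._start = fn
            return fn
        return register

    def listen(self, upstream):
        def register(fn):
            self._listeners[upstream.__name__] = fn
            return fn
        return register

    def kickoff(self, state):
        # Run the start step, then follow the listener chain until it ends.
        step = self._start
        while step is not None:
            state = step(state)
            step = self._listeners.get(step.__name__)
        return state


flow = MiniFlow()

@flow.start()
def pick_next_task(state):   # T0: pull the next task from the board
    return state + ["task-picked"]

@flow.listen(pick_next_task)
def research(state):         # T1: research crew
    return state + ["researched"]

@flow.listen(research)
def implement(state):        # T4: implement crew
    return state + ["implemented"]

trace = flow.kickoff([])     # ["task-picked", "researched", "implemented"]
```

Note that even this toy version is imperative Python: the chain lives in decorators and class state, which is exactly the teams.yaml gap called out above.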
AutoGen (Microsoft Agent Framework)
| Attribute | Assessment |
|---|---|
| Runtime | Python 3.10+ (pip install autogen-agentchat autogen-core autogen-ext[openai]) |
| Install size | ~250+ MB (multiple packages) |
| Memory footprint | ~400+ MB runtime |
| Agent coordination | AgentChat (conversational) + Core (event-driven, distributed) |
| UI surface | AutoGen Studio (web-based, separate process on port 8080) |
| Forgejo integration | None built-in |
| LLM-wiki integration | Could register as custom tools, no native file wiki concept |
| Strengths | Distributed agents (gRPC), event-driven core for deterministic workflows, AutoGen Studio for prototyping, MCP support, Docker code executor, Microsoft-backed. |
| Weaknesses | Enterprise-focused — designed for distributed multi-service architectures, not a single Pi container. Framework in flux (0.2 to 0.4 broke backward compat). Studio requires separate HTTP server process. |
Key problems for this workcell:
- AgentChat is conversational agents passing messages — similar to pi-teams but without tmux
- Core is an event-driven distributed system — overkill for 3 agents in 1 container
- AutoGen Studio adds HTTP server on resource-constrained Pi
- gRPC distributed runtime is for multi-machine setups — irrelevant here
PydanticAI
| Attribute | Assessment |
|---|---|
| Runtime | Python 3.10+ (pip install pydantic-ai) |
| Install size | ~80 MB (leanest of the Python options) |
| Memory footprint | ~100 MB runtime |
| Agent coordination | Single-agent focused. Multi-agent via composition + Pydantic Graph (beta) |
| UI surface | CLI + Pydantic Logfire (observability). No tmux. |
| Forgejo integration | None built-in |
| LLM-wiki integration | None built-in |
| Strengths | Typed structured output — Pydantic models guarantee output shape with automatic retry. Evals framework — systematic testing of agent performance. Capabilities — composable bundles (web search, thinking, MCP). Model-agnostic — 20+ providers. Durable execution — Temporal/DBOS integrations. Not an orchestrator — designed to build individual agents with strong type guarantees, not team coordination. |
| Weaknesses | Not a multi-agent orchestrator — no task board, agent messaging, or team spawn. Multi-agent patterns are documented but lack team-level infrastructure. Python dependency. No tmux. Would need to run alongside pi-teams, not instead of it. |
PydanticAI as Complement:
pi-teams (orchestration layer — keep)
├── Lead agent (pi-teams) — dispatches, approves, coordinates
├── Researcher agent (pi-teams) — reads repos, searches wiki
└── Builder-reviewer agent (pi-teams) — edits code, runs validation
└── PydanticAI eval (optional enhancement)
├── typed output validation (Pydantic models)
├── structured test runs (evals framework)
└── Logfire observability
PydanticAI would sit under pi-teams, not instead of it. It's a T5/T6 enhancement, not an orchestration replacement.
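The division of labor above can be sketched as a validation gate that pi-teams invokes after the builder-reviewer finishes. This is a dependency-free illustration of the typed-output idea (a real integration would use a Pydantic model via PydanticAI); the schema fields are assumptions:

```python
import json

# Hypothetical T5 schema: field name -> required Python type.
BUILD_RESULT_SCHEMA = {
    "success": bool,
    "errors": list,
    "warnings": list,
    "artifacts": list,
}

def validate_build_result(raw: str) -> dict:
    """Parse an agent's JSON reply and enforce the schema.

    Raises ValueError so the calling hook can re-prompt the agent,
    mirroring PydanticAI's retry-until-valid behavior.
    """
    data = json.loads(raw)
    for field, expected in BUILD_RESULT_SCHEMA.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], expected):
            raise ValueError(f"{field} must be {expected.__name__}")
    return data

# A well-formed reply passes; a malformed one raises and triggers a retry.
ok = validate_build_result(
    '{"success": true, "errors": [], "warnings": [], "artifacts": ["dist/app"]}'
)
```

The point of the sketch: the gate consumes agent output and reports pass/fail upward; it never dispatches tasks, so orchestration stays with pi-teams.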
Detailed Comparison Matrix
| Concern | pi-teams | CrewAI | AutoGen | PydanticAI |
|---|---|---|---|---|
| Orchestration model | Teams + task board | Flows + Crews | AgentChat / Core | Single agent (no team orchestrator) |
| Improvisation | Autonomous polling + agent autonomy | Crews: free agent collaboration | AgentChat: free-form conversation | Agent-level autonomy + tools |
| Deterministic order | Plan approval + claim board (soft) | Flows: event-driven + state (hard) | Core: event-driven + routing (hard) | Pydantic Graph: type-safe (hard, beta) |
| Setup friction | Already running. Zero. | 2-3 sessions to port | 2-3 sessions to port | 1 session as complement |
| Runtime on Pi | Bun (~50 MB). Single. | Python (~300 MB). Second. | Python (~400 MB). Second. | Python (~100 MB). Second. |
| tmux GUI | Native panes = agent views | None | Studio web UI (port 8080) | None (Logfire web) |
| Forgejo | Prompted, not built-in | Custom tools needed | Custom tools needed | Custom tools needed |
| LLM-wiki | Direct file I/O | Custom tools needed | Custom tools needed | Custom tools needed |
| External agents | EXTERNAL-AGENT-PROMPT.md protocol | No concept | No concept | No concept |
| Structured output | Prompt-level only | Pydantic in task defs | Pydantic in response types | Best in class |
| Evals/testing | None built-in | None | None | Built-in evals framework |
| Durable execution | No (LLM-wiki is durable) | No | Partial | Yes (Temporal/DBOS/etc) |
Fit For This Pi Workcell
CrewAI: REJECT for MVP
CrewAI is the strongest conceptual match (Flows = T0-T7, Crews = triad roles), but the practical cost is too high:
1. Replaces pi-teams entirely. All existing config, container setup, prompts — dead weight.
2. Python on Pi. cgroup/memory-limit issues (per Q-06) compounded by ~300 MB runtime.
3. Loses tmux GUI. Debugging becomes Python log diving.
4. Two new concepts to learn. Flows AND Crews before productive work resumes.
5. No Forgejo/LLM-wiki. Must be built as custom tools.
Migration target if pi-teams is outgrown. CrewAI's Flows+Crews model is the best conceptual fit for T0-T7 + triad roles among all alternatives.
AutoGen: REJECT for MVP (and foreseeable future)
Wrong shape entirely:
1. Enterprise distributed systems focus. gRPC, multi-language, web UI — for multi-service architectures, not 3 agents in 1 container.
2. Framework in flux. 0.2 to 0.4 migration broke backward compat.
3. AgentChat vs Core dilemma. Choosing is itself a research task.
4. No unique advantage. Everything AutoGen does that pi-teams doesn't is irrelevant to this workcell.
PydanticAI: DEFER (selective T5/T6 use if hooks insufficient)
Unique: not an orchestrator — complementary rather than competitive.
Worth using for:
1. Typed output validation (T5). Pydantic models guarantee output shape:
```python
from pydantic import BaseModel
from pydantic_ai import Agent

class BuildResult(BaseModel):
    success: bool
    errors: list[str]
    warnings: list[str]
    artifacts: list[str]

validator = Agent("openai:gpt-4o", output_type=BuildResult)
# Guaranteed valid BuildResult, or the agent retries until valid
```
2. Structured evals (T6). Systematic testing with datasets, LLM judges, performance tracking.
3. Capabilities. Composable tool bundles for the builder-reviewer.
Not worth using for: orchestration, team coordination, task lifecycle, agent messaging.
Decision: Defer. Quality gate hooks (shell scripts running make/lint) provide simpler first-line validation. Add PydanticAI only if hooks prove insufficient.
Risks / Failure Modes
- Second-runtime cascade. Any Python framework adds: runtime management, pip dependencies, virtual environments, dual model configs. pi-teams single-runtime simplicity is an operational advantage.
- Split-brain orchestration. If PydanticAI creeps into coordination, the team has: pi-teams task board + LLM-wiki claim board + Pydantic Graph. Three sources of truth.
- Framework abandonment risk. Python framework artifacts (Flow defs, agent configs, Pydantic models) become dead code. pi-teams + LLM-wiki use only Markdown and Bash — nothing rots.
- Dependency rot on Pi. Python AI libraries often have C extensions. ARM64 compatibility on Debian is not guaranteed. Bun is precompiled for ARM64.
Decision Needed From Mehdi
- CrewAI and AutoGen: rejected for MVP? (Recommended: yes. Replace pi-teams, add Python, lose tmux, 2-3 sessions of framework plumbing.)
- PydanticAI for validation: defer or add? (Recommended: defer. Quality gate hooks first. Add PydanticAI if hooks insufficient.)
- If PydanticAI is added later: separate service or same container? (Separate service with API — keeps runtimes clean.)
- Long-term migration path: CrewAI? (Yes, flag as migration target if pi-teams outgrown. Best conceptual fit but not needed now.)
Next Probe
Enable pi-teams quality gate hooks in the running container. Write a minimal make hook that fires on task completion. Verify it catches build failures. If hooks suffice, Python frameworks are unnecessary. If not, PydanticAI for typed validation becomes urgent.
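As a baseline for that probe, the hook itself is small. A minimal sketch, assuming the hook is a script pi-teams runs on task completion; the make targets are placeholders:

```python
import subprocess
import sys

# Placeholder gate commands; substitute the project's real make targets.
GATE_COMMANDS = [
    ["make", "build"],
    ["make", "lint"],
]

def run_gate(commands):
    """Run each command and collect failures instead of stopping at the first."""
    failures = []
    for cmd in commands:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failures.append((cmd, result.stderr.strip()))
    return failures  # empty list means the gate passed

# Demonstrate with a harmless stand-in command instead of the real targets.
failures = run_gate([[sys.executable, "-c", "pass"]])
```

If a script of this size catches build failures reliably, the Python frameworks stay unnecessary; if richer validation semantics are needed, that is the PydanticAI trigger.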
Summary
| Framework | MVP Verdict | Reasoning |
|---|---|---|
| pi-teams | KEEP | Already working. Bun. tmux. Sufficient for 3 agents. |
| CrewAI | REJECT | Python, replaces pi-teams, loses tmux. Best migration target if pi-teams outgrown. |
| AutoGen | REJECT | Enterprise, distributed, framework in flux. Solves problems we don't have. |
| PydanticAI | DEFER | Complementary, not competitive. Consider for T5/T6 if quality gate hooks insufficient. |