ANSWERED

Q-03: Framework Comparison

Status: ANSWERED
Agent: opencode/ext-agent
Timestamp UTC: 2026-05-11T04:00:00Z
Claim: CLAIM | opencode/ext-agent | 2026-05-11T03:30:00Z

Prior Answers Checked

Short Answer

None of CrewAI, AutoGen, or PydanticAI justify replacing pi-teams for MVP orchestration. All three require Python (a second runtime on the Pi), lose the tmux GUI surface, and duplicate the team/task/messaging infrastructure that pi-teams already provides.

However, PydanticAI is worth considering as a selective enhancement — not for orchestration, but for typed output validation (T5) and structured evals (T6). Its Pydantic-model-based output enforcement is genuinely complementary to pi-teams and could reduce false confidence without replacing the coordination layer.

Primary Recommendation

Orchestration:  pi-teams (keep)        — already works, Bun-based, tmux GUI
Validation:     PydanticAI (defer)     — typed outputs, evals, T5/T6 enhancement
Reject for MVP: CrewAI, AutoGen        — Python runtime, replace pi-teams, too heavy

Evidence

Framework Profiles

pi-teams (Baseline)

| Attribute | Value |
|---|---|
| Runtime | Bun (JavaScript/TypeScript) |
| Install size | ~5 MB |
| Memory footprint | ~50 MB |
| Python dependency | None |
| Agent coordination | Built-in (task board, messaging, polling) |
| UI surface | tmux panes (native) |
| Task lifecycle | Claim board + plan approval mode |
| Current status | Running in container on relik-pi4 |
| Multi-agent model | Predefined teams (teams.yaml), agent configs (.pi/agents/) |
| External agent support | Custom (EXTERNAL-AGENT-PROMPT.md + LLM-wiki) |

CrewAI

| Attribute | Assessment |
|---|---|
| Runtime | Python 3.10-3.13 (pip install crewai crewai-tools) |
| Install size | ~200+ MB (dependencies: pydantic, openai, langchain-core, chromadb, etc.) |
| Memory footprint | ~300+ MB runtime |
| Agent coordination | Crews (autonomous) + Flows (deterministic event-driven) |
| UI surface | None built-in — CLI only, no tmux integration |
| Forgejo integration | None built-in |
| LLM-wiki integration | Would need custom tools — no file-based wiki concept |
| Strengths | Best-in-class balance of improvisation (Crews) + deterministic order (Flows). Role-based agents with tools. Enterprise security focus. |
| Weaknesses | Python dependency on Pi. Replaces pi-teams entirely. CrewAI Flows use decorators and Python classes; no YAML-based team definition. Would need 2+ sessions to port. |

CrewAI Architecture Mapping to d3-tui-triad:

Flow: D3TUIWorkcell (event-driven process)
  ├── @start() -> pick_next_task()
  ├── @listen(pick_next_task) -> research_crew.kickoff()  # T1
  ├── @listen(research_crew) -> plan_crew.kickoff()       # T2
  ├── @listen(plan_crew) -> implement_crew.kickoff()      # T4
  └── @listen(implement_crew) -> validate_crew.kickoff()  # T5

This maps cleanly to T0-T7 conceptually, but:

- The Flow is Python code, not a YAML file. No equivalent of teams.yaml.
- Each Crew spawn is an LLM call chain — expensive on Pi resources.
- No tmux pane visibility. Debugging requires Python logs.
- All existing pi-teams config (6793 bytes of working setup) becomes dead weight.
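For concreteness, the deterministic chain the Flow would encode can be sketched without any framework dependency. This is a hypothetical sketch, not CrewAI code: the stage and method names (`pick_next_task`, `research`, etc.) are illustrative stand-ins for the `@start`/`@listen` ordering in the tree above.

```python
# Dependency-free sketch of the event chain a CrewAI Flow would encode.
# Each stage fires only when the previous one completes, mirroring the
# @start/@listen ordering; stages map to T1/T2/T4/T5.

class FlowSketch:
    def __init__(self) -> None:
        self.log: list[str] = []

    def pick_next_task(self) -> str:          # @start()
        self.log.append("pick_next_task")
        return "task-001"

    def research(self, task: str) -> str:     # @listen(pick_next_task) -> T1
        self.log.append(f"research:{task}")
        return "findings"

    def plan(self, findings: str) -> str:     # @listen(research) -> T2
        self.log.append(f"plan:{findings}")
        return "plan"

    def implement(self, plan: str) -> str:    # @listen(plan) -> T4
        self.log.append(f"implement:{plan}")
        return "diff"

    def validate(self, diff: str) -> bool:    # @listen(implement) -> T5
        self.log.append(f"validate:{diff}")
        return True

    def kickoff(self) -> bool:
        # Deterministic chain: each stage's result feeds the next.
        task = self.pick_next_task()
        return self.validate(self.implement(self.plan(self.research(task))))
```

The point of the sketch is the trade-off: the ordering lives in Python control flow, not in a declarative file like teams.yaml.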

AutoGen (Microsoft Agent Framework)

| Attribute | Assessment |
|---|---|
| Runtime | Python 3.10+ (pip install autogen-agentchat autogen-core autogen-ext[openai]) |
| Install size | ~250+ MB (multiple packages) |
| Memory footprint | ~400+ MB runtime |
| Agent coordination | AgentChat (conversational) + Core (event-driven, distributed) |
| UI surface | AutoGen Studio (web-based, separate process on port 8080) |
| Forgejo integration | None built-in |
| LLM-wiki integration | Could register as custom tools; no native file wiki concept |
| Strengths | Distributed agents (gRPC), event-driven core for deterministic workflows, AutoGen Studio for prototyping, MCP support, Docker code executor, Microsoft-backed. |
| Weaknesses | Enterprise-focused — designed for distributed multi-service architectures, not a single Pi container. Framework in flux (0.2 to 0.4 broke backward compat). Studio requires a separate HTTP server process. |

Key problems for this workcell:

- AgentChat is conversational agents passing messages — similar to pi-teams, but without tmux.
- Core is an event-driven distributed system — overkill for 3 agents in 1 container.
- AutoGen Studio adds an HTTP server on a resource-constrained Pi.
- The gRPC distributed runtime is for multi-machine setups — irrelevant here.

PydanticAI

| Attribute | Assessment |
|---|---|
| Runtime | Python 3.10+ (pip install pydantic-ai) |
| Install size | ~80 MB (leanest of the Python options) |
| Memory footprint | ~100 MB runtime |
| Agent coordination | Single-agent focused. Multi-agent via composition + Pydantic Graph (beta) |
| UI surface | CLI + Pydantic Logfire (observability). No tmux. |
| Forgejo integration | None built-in |
| LLM-wiki integration | None built-in |
| Strengths | Typed structured output — Pydantic models guarantee output shape with automatic retry. Evals framework — systematic testing of agent performance. Capabilities — composable bundles (web search, thinking, MCP). Model-agnostic — 20+ providers. Durable execution — Temporal/DBOS integrations. Not an orchestrator — designed to build individual agents with strong type guarantees, not team coordination. |
| Weaknesses | Not a multi-agent orchestrator — no task board, agent messaging, or team spawn. Multi-agent patterns are documented but lack team-level infrastructure. Python dependency. No tmux. Would need to run alongside pi-teams, not instead of it. |

PydanticAI as Complement:

pi-teams (orchestration layer — keep)
  ├── Lead agent (pi-teams) — dispatches, approves, coordinates
  ├── Researcher agent (pi-teams) — reads repos, searches wiki
  └── Builder-reviewer agent (pi-teams) — edits code, runs validation
        └── PydanticAI eval (optional enhancement)
              ├── typed output validation (Pydantic models)
              ├── structured test runs (evals framework)
              └── Logfire observability

PydanticAI would sit under pi-teams, not instead of it. It's a T5/T6 enhancement, not an orchestration replacement.

Detailed Comparison Matrix

| Concern | pi-teams | CrewAI | AutoGen | PydanticAI |
|---|---|---|---|---|
| Orchestration model | Teams + task board | Flows + Crews | AgentChat / Core | Single agent (no team orchestrator) |
| Improvisation | Autonomous polling + agent autonomy | Crews: free agent collaboration | AgentChat: free-form conversation | Agent-level autonomy + tools |
| Deterministic order | Plan approval + claim board (soft) | Flows: event-driven + state (hard) | Core: event-driven + routing (hard) | Pydantic Graph: type-safe (hard, beta) |
| Setup friction | Already running. Zero. | 2-3 sessions to port | 2-3 sessions to port | 1 session as complement |
| Runtime on Pi | Bun (~50 MB). Single. | Python (~300 MB). Second. | Python (~400 MB). Second. | Python (~100 MB). Second. |
| tmux GUI | Native panes = agent views | None | Studio web UI (port 8080) | None (Logfire web) |
| Forgejo | Prompted, not built-in | Custom tools needed | Custom tools needed | Custom tools needed |
| LLM-wiki | Direct file I/O | Custom tools needed | Custom tools needed | Custom tools needed |
| External agents | EXTERNAL-AGENT-PROMPT.md protocol | No concept | No concept | No concept |
| Structured output | Prompt-level only | Pydantic in task defs | Pydantic in response types | Best in class |
| Evals/testing | None built-in | None | None | Built-in evals framework |
| Durable execution | No (LLM-wiki is durable) | No | Partial | Yes (Temporal/DBOS/etc) |

Fit For This Pi Workcell

CrewAI: REJECT for MVP

CrewAI is the strongest conceptual match (Flows = T0-T7, Crews = triad roles), but the practical cost is too high:

1. Replaces pi-teams entirely. All existing config, container setup, prompts — dead weight.
2. Python on Pi. cgroup/memory-limit issues (per Q-06) compounded by ~300 MB runtime.
3. Loses tmux GUI. Debugging becomes Python log diving.
4. Two new concepts to learn. Flows AND Crews before productive work resumes.
5. No Forgejo/LLM-wiki. Must be built as custom tools.

Migration target if pi-teams is outgrown. CrewAI's Flows+Crews model is the best conceptual fit for T0-T7 + triad roles among all alternatives.

AutoGen: REJECT for MVP (and foreseeable future)

Wrong shape entirely:

1. Enterprise distributed systems focus. gRPC, multi-language, web UI — for multi-service architectures, not 3 agents in 1 container.
2. Framework in flux. The 0.2 to 0.4 migration broke backward compat.
3. AgentChat vs Core dilemma. Choosing is itself a research task.
4. No unique advantage. Everything AutoGen does that pi-teams doesn't is irrelevant to this workcell.

PydanticAI: DEFER (selective T5/T6 use if hooks insufficient)

Unique: not an orchestrator — complementary rather than competitive.

Worth using for:

1. Typed output validation (T5). Pydantic models guarantee output shape:

```python
from pydantic import BaseModel
from pydantic_ai import Agent

class BuildResult(BaseModel):
    success: bool
    errors: list[str]
    warnings: list[str]
    artifacts: list[str]

validator = Agent("openai:gpt-4o", output_type=BuildResult)
# Guaranteed valid BuildResult, or the agent retries until valid
```

2. Structured evals (T6). Systematic testing with datasets, LLM judges, performance tracking.
3. Capabilities. Composable tool bundles for the builder-reviewer.
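To make the evals point concrete, here is a dependency-free sketch of the shape a T6 check takes. The case names and the stubbed `run_validator` are hypothetical; PydanticAI's evals framework supplies the real dataset, judge, and reporting machinery around this pattern.

```python
from dataclasses import dataclass

@dataclass
class Case:
    """One eval case: an input plus the outcome we expect."""
    name: str
    inputs: str
    expected_success: bool

# Hypothetical eval cases for the builder-reviewer's validation step.
CASES = [
    Case("clean build", "make all", True),
    Case("broken build", "make all  # missing header", False),
]

def run_validator(inputs: str) -> bool:
    # Stub standing in for the BuildResult-typed agent call above.
    return "missing header" not in inputs

def evaluate(cases: list[Case]) -> dict[str, bool]:
    """Per-case pass/fail: did the validator match the expected outcome?"""
    return {c.name: run_validator(c.inputs) == c.expected_success for c in cases}
```

The value is the systematic loop: every change to the validator reruns the same cases, so regressions surface as failed cases rather than anecdotes.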

Not worth using for: orchestration, team coordination, task lifecycle, agent messaging.

Decision: Defer. Quality gate hooks (shell scripts running make/lint) provide simpler first-line validation. Add PydanticAI only if hooks prove insufficient.

Risks / Failure Modes

  1. Second-runtime cascade. Any Python framework adds: runtime management, pip dependencies, virtual environments, dual model configs. pi-teams single-runtime simplicity is an operational advantage.
  2. Split-brain orchestration. If PydanticAI creeps into coordination, the team has three sources of truth: the pi-teams task board, the LLM-wiki claim board, and Pydantic Graph.
  3. Framework abandonment risk. Python framework artifacts (Flow defs, agent configs, Pydantic models) become dead code. pi-teams + LLM-wiki use only Markdown and Bash — nothing rots.
  4. Dependency rot on Pi. Python AI libraries often have C extensions. ARM64 compatibility on Debian is not guaranteed. Bun is precompiled for ARM64.

Decision Needed From Mehdi

  1. CrewAI and AutoGen: rejected for MVP? (Recommended: yes. Replace pi-teams, add Python, lose tmux, 2-3 sessions of framework plumbing.)
  2. PydanticAI for validation: defer or add? (Recommended: defer. Quality gate hooks first. Add PydanticAI if hooks insufficient.)
  3. If PydanticAI is added later: separate service or same container? (Separate service with API — keeps runtimes clean.)
  4. Long-term migration path: CrewAI? (Yes, flag as migration target if pi-teams outgrown. Best conceptual fit but not needed now.)

Next Probe

Enable pi-teams quality gate hooks in the running container. Write a minimal make hook that fires on task completion. Verify it catches build failures. If hooks suffice, Python frameworks are unnecessary. If not, PydanticAI for typed validation becomes urgent.
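The real hook would likely be a shell script, but the logic is small enough to sketch here in Python for consistency with the rest of the page. The `make` target names are assumptions; the sketch only shows the gate-runner shape: run each command, collect failures, and let the hook exit nonzero if any gate failed.

```python
import subprocess

# Hypothetical quality-gate hook for T5: after a task completes, run the
# project's build and lint commands and report which gates failed.
# Target names below are assumptions — the real hook calls whatever the
# repo's Makefile actually defines.
GATES: list[list[str]] = [
    ["make", "build"],
    ["make", "lint"],
]

def run_gates(gates: list[list[str]]) -> list[str]:
    """Run each gate command; return the command lines that failed."""
    failed: list[str] = []
    for cmd in gates:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failed.append(" ".join(cmd))
    return failed

# A pi-teams hook wrapping this would exit nonzero when run_gates(GATES)
# is non-empty, so the task gets flagged instead of marked complete.
```

If this catches build failures reliably, it answers the probe: typed validation via PydanticAI stays deferred.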

Summary

| Framework | MVP Verdict | Reasoning |
|---|---|---|
| pi-teams | KEEP | Already working. Bun. tmux. Sufficient for 3 agents. |
| CrewAI | REJECT | Python, replaces pi-teams, loses tmux. Best migration target if pi-teams outgrown. |
| AutoGen | REJECT | Enterprise, distributed, framework in flux. Solves problems we don't have. |
| PydanticAI | DEFER | Complementary, not competitive. Consider for T5/T6 if quality gate hooks insufficient. |