Q-01 Concurrency Assessment — Real Caps for Parallel Execution

Status: ACTIVE (analysis)
Agent: opencode/ext-agent (sandshrew)
Timestamp UTC: 2026-05-12T03:00:00Z
Session: What concurrent load can the Pi 4 actually handle? Where's the real cap?

Target Concurrency Profile

| Scenario | What's Running | Count | Hermes Instances |
|---|---|---|---|
| Light | 1 unit working, nothing else | 1 concurrent | 1 |
| Typical | 3 units at 3 different nodes; poller active on one unit's output | 4 concurrent | 4 |
| Peak | 3 units; poller on unit A; curation agent routing context from unit B → unit C | 5 concurrent | 5 |

The cap question: can the Pi 4 run 5 concurrent Hermes calls?

Memory Budget (Worst Case: 5 Concurrent)

| Component | Est. RAM | Notes |
|---|---|---|
| OS + system services | ~500MB | Current Pi baseline (678MB used, includes Docker) |
| Bun runtime | ~50MB | Lightweight, single process |
| LangGraph graph (compiled) | ~10MB | 36 nodes, ~180 edges; minimal memory |
| HTTP server (Hono/Bun.serve) | ~10MB | Trivial |
| Hermes base (shared runtime) | ~50MB | Shared across all instances |
| Hermes instances × 5 | 5 × 150MB = 750MB | Qwen 3.6+ context, 4K-8K tokens per call |
| Total estimated | ~1.4GB | Includes the OS baseline |
| Pi 4 available | 3.0GB (3.7GB - 678MB used) | |
| Headroom | ~1.6GB | Conservative: the total re-subtracts an OS baseline the 3.0GB figure already excludes; effective headroom is closer to 2.1GB |

Memory is not the bottleneck. 5 concurrent Hermes calls consume ~750MB beyond baseline, and even under the conservative accounting above the Pi keeps ~1.6GB of headroom.
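
A quick check of the arithmetic, as a minimal sketch (all values copied from the table above; nothing here is a new measurement):

```ts
// Memory-budget sanity check (all values in MB).
const stack = 50 + 10 + 10 + 50;       // Bun + LangGraph + HTTP server + Hermes base
const hermes = 5 * 150;                // 5 concurrent Hermes instances
const available = 3700 - 678;          // Pi 4 total minus measured baseline ≈ 3.0GB
const newLoad = stack + hermes;        // 870MB of new load beyond baseline
const headroom = available - newLoad;  // ≈ 2150MB; the table's ~1.6GB re-subtracts the OS estimate
```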

What Actually Caps Concurrency

Not the Pi 4

The Real Bottleneck: Hermes Portal / Qwen 3.6+ Rate Limits

| What | Likely Cap | Impact |
|---|---|---|
| Hermes Portal OAuth | Unknown; depends on plan tier | Needs verification. If rate-limited per OAuth token, concurrent calls may queue or fail. |
| Qwen 3.6+ API | May have a per-account rate limit | Self-hosted or portal-proxied? The architecture needs confirming. |
| Network latency | 10-50ms per call to the Qwen endpoint | Negligible for turn-based gameplay |
| Token throughput | 1-5 seconds per Hermes call | The player waits for agent responses, not for concurrency |

The cap is almost certainly Hermes Portal's rate limit, not the Pi 4's hardware. Until we know the OAuth plan's concurrency limit, we can't say definitively that 5 concurrent calls work. But the Pi can handle it.

LangGraph Concurrency: How It Works

Each unit gets its own thread_id. Concurrent invocations are independent:

```ts
// Rif moves to hex 17, Echo moves to hex 23, Sherpa moves to hex 09.
// All three can run in parallel:

const configRif = { configurable: { thread_id: "unit-rif" } };
const configEcho = { configurable: { thread_id: "unit-echo" } };
const configSherpa = { configurable: { thread_id: "unit-sherpa" } };

await Promise.all([
  graph.invoke({ unit: "rif", action: "move", target: "17" }, configRif),
  graph.invoke({ unit: "echo", action: "move", target: "23" }, configEcho),
  graph.invoke({ unit: "sherpa", action: "move", target: "09" }, configSherpa),
]);
```

Poller and curation agents use their own thread_ids or run as subgraphs from the parent unit's thread. Either way, they're independent LangGraph invocations.
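
A minimal sketch of the same pattern for the poller, assuming it runs as its own top-level invocation (the thread_id value and input shape are illustrative, not settled names):

```ts
// Hypothetical: the poller gets an independent thread_id, so its checkpoints
// never collide with any unit's state.
const configPoller = { configurable: { thread_id: "poller-unit-rif" } };

// Fires alongside the unit invocations above; to LangGraph it is just
// another concurrent thread.
await graph.invoke({ role: "poller", watch: "unit-rif" }, configPoller);
```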

SqliteSaver write contention: the only real serialization point. Multiple concurrent writes to the same checkpointer may queue at the SQLite level (single-writer design). For turn-based gameplay with 5 concurrent agents making a few writes per second, this is imperceptible.
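
If contention ever does show up, WAL mode is the standard mitigation: readers proceed while a single writer commits. A minimal sketch, assuming the @langchain/langgraph-checkpoint-sqlite package over better-sqlite3 (whether SqliteSaver accepts a pre-configured database this way should be verified against the installed version):

```ts
import Database from "better-sqlite3";
import { SqliteSaver } from "@langchain/langgraph-checkpoint-sqlite";

// WAL journal mode: concurrent readers, one writer, which matches the
// 5-concurrent-agents access pattern.
const db = new Database("checkpoints.db");
db.pragma("journal_mode = WAL");

const checkpointer = new SqliteSaver(db);
// const compiled = builder.compile({ checkpointer });
```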

Recommendation

Allow up to 5 concurrent Hermes calls. The Pi 4 has the headroom. The real cap is Hermes Portal's rate limit — which can't be determined until we test with the OAuth token.

If Hermes Portal limits concurrency to fewer than 5, implement a queue: excess calls wait until a slot opens. The player sees "Agent queued — waiting for available slot" on the RG status bar. This is a graceful degradation, not a failure.
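
A minimal sketch of that queue, as a plain counting semaphore (the limit of 5 and the callHermes name are placeholders, not settled API):

```ts
// At most `limit` Hermes calls in flight; excess callers wait in FIFO order.
class HermesSlots {
  private waiters: Array<() => void> = [];
  private inFlight = 0;
  constructor(private limit: number) {}

  private acquire(): Promise<void> {
    if (this.inFlight < this.limit) {
      this.inFlight++;
      return Promise.resolve();
    }
    return new Promise((resolve) => this.waiters.push(resolve));
  }

  private release(): void {
    const next = this.waiters.shift();
    if (next) next(); // hand the slot straight to the next waiter: cap never exceeded
    else this.inFlight--;
  }

  async run<T>(task: () => Promise<T>): Promise<T> {
    await this.acquire();
    try {
      return await task();
    } finally {
      this.release();
    }
  }
}

const slots = new HermesSlots(5); // drop to whatever Hermes Portal actually allows
// await slots.run(() => callHermes(prompt)); // callHermes is hypothetical
```

The acquire path also knows when a caller is waiting, which is where the "Agent queued" status for the RG would hook in.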

Decision Needed

Two decisions are pending: (1) verify Hermes Portal's OAuth concurrency limit before committing to 5 parallel calls, and (2) choose between piping through the existing Python Hermes Gateway and installing Hermes on Bun (see Hermes Already Running below).

Live RAM Audit (2026-05-12)

| Process | RAM | What |
|---|---|---|
| Forgejo | 170MB | Forgejo binary; runs in Docker but visible to the host process list |
| Docker daemon | 95MB | dockerd |
| Hermes Gateway | 80MB | ⚠️ Already running on the Pi: Python Hermes at /mnt/kitchen/private/hermes/venv/ |
| Probe server (ours) | 67MB | ⚠️ Still running from an earlier test; must be killed |
| Tailscale | 57MB | tailscaled |
| containerd | 42MB | Docker container runtime |
| SMB | 26MB | File sharing |
| Python HTTP (port 8080) | 20MB | ⚠️ Unknown process; investigate |
| NetworkManager | 20MB | Networking |
| Other system | ~67MB | Misc services |
| Total | ~644MB | Not the 678MB reported earlier; the free -h reading ran high |

Adjusted Baseline (After Cleanup)

| Action | RAM Freed | New Baseline |
|---|---|---|
| Kill probe server | -67MB | 577MB |
| Kill unknown HTTP (8080) | -20MB | 557MB |
| Prune Docker images | 0MB (disk only) | 557MB |
| Game-ready baseline | | ~550MB, with Docker + Forgejo + Hermes Gateway + Tailscale |

Hermes Already Running

Hermes Gateway is already running on the Pi (/mnt/kitchen/private/hermes/venv/bin/python -m hermes_cli.main gateway run --replace). This is a Python-based Hermes. Two paths:

A) Use the existing Hermes Gateway. LangGraph on Bun calls the gateway via HTTP (port?). No new install needed, but it reintroduces the HTTP boundary between Bun and Python.

B) Install Hermes on Bun. Stop the Python gateway and run one Bun process for everything: zero boundary. But it needs a Hermes install on Bun plus OAuth config.

The decision comes down to what's simpler: piping through the existing gateway (no install, but a boundary) vs. installing Hermes on Bun (install work, but zero boundary). Given the zero-boundary philosophy we've settled on, Option B is the consistent choice.
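
For scale, the Option A boundary is one HTTP hop. A minimal sketch, with the port and endpoint path entirely hypothetical (the real gateway API hasn't been inspected):

```ts
// Hypothetical gateway call: GATEWAY_PORT and /v1/complete are placeholders,
// NOT the verified Hermes Gateway API.
const GATEWAY_PORT = 0; // unknown until the gateway config is checked

async function callGateway(prompt: string): Promise<string> {
  const res = await fetch(`http://127.0.0.1:${GATEWAY_PORT}/v1/complete`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt }),
  });
  if (!res.ok) throw new Error(`gateway error: ${res.status}`);
  return res.text();
}
```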

Hermes Fallback Rules

Qwen 3.6+ is primary (free, unlimited via OAuth). If rate-limited or failing:

```ts
const HERMES_MODELS = [
  { model: "qwen-3.6-plus", provider: "hermes-portal", auth: "oauth" },   // primary — free
  { model: "kimi-k2.6", provider: "kimi", auth: "api_key" },               // fallback 1
  { model: "minimax", provider: "minimax", auth: "api_key" },              // fallback 2
];

// Hermes configured to try primary first, fall back on rate limit or failure
```

Kimi and MiniMax keys are already available (visible in the d3-tui container env). Fallback is automatic: Hermes tries Qwen; on rate limit or error it tries Kimi, then MiniMax. The RG shows which model is active in the unit status view.
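
A minimal sketch of that fallback walk, assuming a hypothetical callModel transport (how Hermes surfaces rate-limit errors is unverified; the catch-everything below is illustrative):

```ts
type ModelEntry = { model: string; provider: string; auth: string };

// Hypothetical transport; swap in the real Hermes call once its API is known.
declare function callModel(entry: ModelEntry, prompt: string): Promise<string>;

// Walk HERMES_MODELS in order; the first model that answers wins.
async function completeWithFallback(prompt: string): Promise<string> {
  let lastError: unknown;
  for (const entry of HERMES_MODELS) {
    try {
      return await callModel(entry, prompt);
    } catch (err) {
      lastError = err; // rate limit or hard failure: try the next model
    }
  }
  throw lastError; // every model in the chain failed
}
```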

Updated Concurrency Budget

| Component | Est. RAM (after cleanup) |
|---|---|
| OS + system services (after cleanup) | ~550MB |
| Bun runtime | ~50MB |
| LangGraph + HTTP server | ~20MB |
| Hermes base (shared) | ~50MB |
| Hermes instances × 5 | 5 × 150MB = 750MB |
| Total at 5 concurrent | ~1.4GB |
| Pi 4 available | ~3.2GB (3.7GB - 550MB baseline) |
| Headroom | ~1.8GB (conservative, as before; effective headroom ≈ 2.3GB) |

More headroom than the original estimate. The Pi 4 is not the bottleneck at any reasonable concurrent load.