Q-01 Concurrency Assessment — Real Caps for Parallel Execution
Status: ACTIVE (analysis) Agent: opencode/ext-agent (sandshrew) Timestamp UTC: 2026-05-12T03:00:00Z Session: What concurrent load can the Pi 4 actually handle? Where's the real cap?
Target Concurrency Profile
| Scenario | What's Running | Count | Hermes Instances |
|---|---|---|---|
| Typical | 3 units at 3 different nodes. Poller active on one unit's output. | 4 concurrent | 4 |
| Peak | 3 units. Poller on unit A. Curation agent routing context from unit B → unit C. | 5 concurrent | 5 |
| Light | 1 unit working, nothing else. | 1 | 1 |
The cap question: can the Pi 4 run 5 concurrent Hermes calls?
Memory Budget (Worst Case: 5 Concurrent)
| Component | Est. RAM | Notes |
|---|---|---|
| OS + system services | ~500MB | Current Pi baseline (678MB used, includes Docker) |
| Bun runtime | ~50MB | Lightweight, single process |
| LangGraph graph (compiled) | ~10MB | 36 nodes, ~180 edges. Minimal memory. |
| HTTP server (Hono/Bun.serve) | ~10MB | Trivial |
| Hermes base (shared runtime) | ~50MB | Shared across all instances |
| Hermes instance × 5 | 5 × 150MB = 750MB | Qwen 3.6+ context. 4K-8K tokens per call. |
| Total estimated | ~1.4GB (~870MB above the OS baseline) | |
| Pi 4 available | 3.0GB (3.7GB - 678MB used) | |
| Headroom | ~2.1GB | Plenty |
Memory is not the bottleneck. The stack above the OS baseline is ~870MB, 750MB of it the 5 Hermes instances, leaving ~2.1GB of headroom on the 3.0GB available.
What Actually Caps Concurrency
Not the Pi 4
- Bun handles 5 concurrent invocations trivially (uWebSockets-based HTTP server, designed for high concurrency)
- LangGraph uses `thread_id` to isolate state per invocation — 5 concurrent `invoke()` calls with different thread_ids is standard usage
- SqliteSaver handles concurrent reads cleanly. Concurrent writes may serialize at the DB level, but for turn-based gameplay (a few writes per second) this is negligible.
The Real Bottleneck: Hermes Portal Qwen 3.6+ Rate Limits
| What | Likely Cap | Impact |
|---|---|---|
| Hermes Portal OAuth | Unknown — depends on plan tier | Need to verify. If rate-limited per OAuth token, concurrent calls may queue or fail. |
| Qwen 3.6+ API | May have per-account rate limit | Self-hosted or portal-proxied? Need to confirm architecture. |
| Network latency | 10-50ms per call to Qwen endpoint | Negligible for turn-based gameplay |
| Token throughput | Each Hermes call may be 1-5 seconds | The player waits for agent responses, not for concurrency |
The cap is almost certainly Hermes Portal's rate limit, not the Pi 4's hardware. Until we know the OAuth plan's concurrency limit, we can't say definitively that 5 concurrent calls work. But the Pi can handle it.
LangGraph Concurrency: How It Works
Each unit gets its own thread_id. Concurrent invocations are independent:
```ts
// Rif moves to hex 17, Echo moves to hex 23, Sherpa moves to hex 09.
// `graph` is the compiled StateGraph (see the checkpointer sketch below).
// All three can run in parallel:
const configRif = { configurable: { thread_id: "unit-rif" } };
const configEcho = { configurable: { thread_id: "unit-echo" } };
const configSherpa = { configurable: { thread_id: "unit-sherpa" } };

await Promise.all([
  graph.invoke({ unit: "rif", action: "move", target: "17" }, configRif),
  graph.invoke({ unit: "echo", action: "move", target: "23" }, configEcho),
  graph.invoke({ unit: "sherpa", action: "move", target: "09" }, configSherpa),
]);
```
Poller and curation agents use their own thread_ids or run as subgraphs from the parent unit's thread. Either way, they're independent LangGraph invocations.
SqliteSaver write contention: the only real serialization point. Multiple concurrent writes to the same checkpointer may queue at the SQLite level (single-writer design). For turn-based gameplay with 5 concurrent agents making a few writes per second, this is imperceptible.
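A minimal wiring sketch, assuming the JS checkpointer package `@langchain/langgraph-checkpoint-sqlite` (which wraps better-sqlite3; Bun compatibility should be verified), with a toy one-node graph standing in for the real 36-node StateGraph:
```ts
import { StateGraph, Annotation, START, END } from "@langchain/langgraph";
import { SqliteSaver } from "@langchain/langgraph-checkpoint-sqlite";

// Toy state: just the unit's last action. The real graph has 36 nodes.
const State = Annotation.Root({
  action: Annotation<string>(),
});

// One shared checkpointer; SQLite's single-writer design serializes writes.
const checkpointer = SqliteSaver.fromConnString("checkpoints.db");

const graph = new StateGraph(State)
  .addNode("act", async (s) => ({ action: `did:${s.action}` }))
  .addEdge(START, "act")
  .addEdge("act", END)
  .compile({ checkpointer });
```
Each invoke() with a distinct thread_id checkpoints independently, which is what makes the Promise.all pattern above safe.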
Recommendation
Allow up to 5 concurrent Hermes calls. The Pi 4 has the headroom. The real cap is Hermes Portal's rate limit — which can't be determined until we test with the OAuth token.
If Hermes Portal limits concurrency to fewer than 5, implement a queue: excess calls wait until a slot opens. The player sees "Agent queued — waiting for available slot" on the RG status bar. This is a graceful degradation, not a failure.
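A minimal sketch of that queue, assuming a hypothetical `callHermes()` wrapper around the actual Hermes call and a stubbed `setStatus()` for the RG status bar:
```ts
const MAX_CONCURRENT = 5; // drop to Hermes Portal's real limit once measured

let inFlight = 0;
const waiters: Array<() => void> = [];

// Stub for the RG status bar; replace with the real hook.
const setStatus = (msg: string) => console.log(msg);

async function withHermesSlot<T>(fn: () => Promise<T>): Promise<T> {
  if (inFlight >= MAX_CONCURRENT) {
    setStatus("Agent queued — waiting for available slot");
    // Wait for a finishing call to hand us its slot directly.
    await new Promise<void>((resolve) => waiters.push(resolve));
  } else {
    inFlight++;
  }
  try {
    return await fn();
  } finally {
    const next = waiters.shift();
    if (next) next(); // hand our slot to the next queued caller
    else inFlight--;
  }
}
```
Every agent call then goes through `withHermesSlot(() => callHermes(prompt))`: excess callers queue in FIFO order and the player sees the status message, never an error.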
Decision Needed
- Test Hermes Portal OAuth concurrency limits once the token is available. Fire 5 simultaneous calls and observe: do all 5 succeed? Do some queue? Do some fail? A probe sketch follows this list.
- If rate-limited: implement a call queue with visible status on the RG.
- If unlimited: confirm by testing, not assuming. Document the actual limit.
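A probe sketch for that test (`callHermes()` is the same hypothetical wrapper as above; the interpretation comments are assumptions to confirm against the portal's real behavior):
```ts
// Replace with the real Hermes wrapper once the OAuth token is available.
declare function callHermes(prompt: string): Promise<string>;

async function probeConcurrency(n = 5): Promise<void> {
  const started = Date.now();
  const results = await Promise.allSettled(
    Array.from({ length: n }, (_, i) => callHermes(`probe-${i}`)),
  );
  for (const [i, r] of results.entries()) {
    if (r.status === "fulfilled") console.log(`call ${i}: ok`);
    else console.log(`call ${i}: failed:`, r.reason); // a 429 points at a portal rate limit
  }
  console.log(`wall time: ${Date.now() - started}ms`);
  // all ok, wall time ≈ one call's latency  → true concurrency
  // all ok, wall time ≈ n × one call's time → server-side queueing
}
```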
Live RAM Audit (2026-05-12)
| Process | RAM | What |
|---|---|---|
| Forgejo | 170MB | Forgejo binary; runs in Docker, but the process is visible to the host |
| Docker daemon | 95MB | dockerd |
| Hermes Gateway | 80MB | ⚠️ Already running on Pi! Python Hermes at /mnt/kitchen/private/hermes/venv/ |
| Probe server (ours) | 67MB | ⚠️ Still running from earlier test — must kill |
| Tailscale | 57MB | tailscaled |
| containerd | 42MB | Docker container runtime |
| SMB | 26MB | File sharing |
| Python HTTP (port 8080) | 20MB | ⚠️ Unknown process — investigate |
| NetworkManager | 20MB | Networking |
| Other system | ~67MB | Misc services |
| Total | ~644MB | (not 678MB — earlier free -h rounded up) |
Adjusted Baseline (After Cleanup)
| Action | RAM Freed | New Baseline |
|---|---|---|
| Kill probe server | -67MB | 577MB |
| Kill unknown HTTP (8080) | -20MB | 557MB |
| Prune Docker images | (disk only) | 557MB |
| Game-ready baseline | (net -87MB) | ~550MB, with Docker + Forgejo + Hermes gateway + Tailscale |
Hermes Already Running
Hermes Gateway is already active on the Pi, running as /mnt/kitchen/private/hermes/venv/bin/python -m hermes_cli.main gateway run --replace. This is a Python-based Hermes. Two paths:
A) Use existing Hermes Gateway. LangGraph on Bun calls Hermes Gateway via HTTP (port?). No new install needed. But reintroduces the HTTP boundary between Bun and Python.
B) Install Hermes on Bun. Stops the Python gateway. One Bun process for everything. Zero boundary. But needs Hermes Bun install + OAuth config.
The decision depends on what's simpler: piping through the existing gateway (no install, but a boundary) vs. installing Hermes on Bun (install work, but zero boundary). Given the zero-boundary philosophy we've settled on, Option B is consistent.
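For scale, the Option A boundary is a single HTTP hop. In this sketch, `HERMES_GATEWAY_PORT` and the `/v1/chat` route are placeholders; the gateway's real port and route still need to be read from its config:
```ts
// Option A: LangGraph (Bun) → existing Python Hermes Gateway over HTTP.
// Port and route are placeholders — verify against the gateway's config.
const GATEWAY_PORT = Number(process.env.HERMES_GATEWAY_PORT ?? 0);

async function callGateway(prompt: string): Promise<string> {
  const res = await fetch(`http://127.0.0.1:${GATEWAY_PORT}/v1/chat`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ prompt }),
  });
  if (!res.ok) throw new Error(`gateway responded ${res.status}`);
  return res.text();
}
```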
Hermes Fallback Rules
Qwen 3.6+ is primary (free, unlimited via OAuth). If rate-limited or failing:
```ts
const HERMES_MODELS = [
  { model: "qwen-3.6-plus", provider: "hermes-portal", auth: "oauth" },   // primary — free
  { model: "kimi-k2.6",     provider: "kimi",          auth: "api_key" }, // fallback 1
  { model: "minimax",       provider: "minimax",       auth: "api_key" }, // fallback 2
];
// Hermes configured to try primary first, fall back on rate limit or failure
```
Kimi and MiniMax keys are already available (visible in the d3-tui container env). Fallback is automatic: Hermes tries Qwen; on rate limit or error it falls back to Kimi, then MiniMax. The RG shows which model is active in the unit status view.
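If Hermes doesn't provide that chain natively, a minimal fallback loop over `HERMES_MODELS` might look like this (`callModel()` is a hypothetical per-provider wrapper; "rate limit or failure" is approximated as any thrown error):
```ts
// Hypothetical per-provider wrapper; the real one wraps each provider's API.
declare function callModel(
  entry: { model: string; provider: string; auth: string },
  prompt: string,
): Promise<string>;

// Walk HERMES_MODELS in order; return the first success plus the model used,
// so the RG can show which model is active in the unit status view.
async function invokeWithFallback(prompt: string): Promise<{ model: string; output: string }> {
  let lastError: unknown;
  for (const entry of HERMES_MODELS) {
    try {
      return { model: entry.model, output: await callModel(entry, prompt) };
    } catch (err) {
      lastError = err; // rate limit or failure: fall through to the next model
    }
  }
  throw new Error(`All models failed: ${String(lastError)}`);
}
```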
Updated Concurrency Budget
| Component | Est. RAM (after cleanup) |
|---|---|
| OS + system services (after pruning) | ~550MB |
| Bun runtime | ~50MB |
| LangGraph + HTTP server | ~20MB |
| Hermes base (shared) | ~50MB |
| Hermes instance × 5 | 5 × 150MB = 750MB |
| Total at 5 concurrent | ~1.4GB (~870MB above baseline) |
| Pi 4 available | ~3.2GB (3.7GB - 550MB baseline) |
| Headroom | ~2.3GB — even more than estimated |
More headroom than the original estimate. The Pi 4 is not the bottleneck at any reasonable concurrent load.