Agents Orchestrator — TODO

Architecture Decisions

  • Tauri vs WGPU alternative — Evaluate staying with Tauri 2.x (WebKit2GTK) vs migrating to a Bun-based stack with WGPU rendering. Key factors: WebGL limitations in WebKit2GTK, xterm.js Canvas addon constraint (max 4 instances), native GPU acceleration, Bun's single-binary advantage. Research: Dioxus, Slint, Zed's GPUI. Decision needed before v4.
  • Frontend-backend tight binding — Reduce IPC overhead between Svelte frontend and Rust backend. Options: shared memory via WebAssembly, direct Rust→DOM rendering for perf-critical paths, compile Svelte components to WASM, or move more logic to Rust (terminal rendering, syntax highlighting). Profile current IPC bottlenecks first.

Features (v3.2)

  • Profile export/import — Define a portable profile format (JSON/TOML/YAML) for groups, projects, agents, themes, keybindings, secrets (encrypted). Must handle: version migration, partial import (merge vs overwrite), sensitive data encryption (age/libsodium), cross-machine portability. Evaluate TOML (human-readable) vs JSON (tooling) vs custom binary (compact + signed).
  • Keyboard shortcuts settings — Configurable keybindings UI in SettingsTab. Scopes: global (app-wide) and context (terminal, agent pane, palette), plus multi-key chord sequences (Ctrl+K → Ctrl+S). Conflict detection. Import/export. Default keymap file at ~/.config/agor/keybindings.json. Reference: VSCode keybindings model.
  • Per-project settings — Deeper per-project configuration beyond the current fields: theme override, keybindings, plugin enable/disable, environment variables, shell, and model preferences. Cascade: global → group → project (most specific wins).
  • Custom editors (AI-augmented) — Specialized editor panes for non-code content: image editor (crop, annotate, AI inpaint/upscale via a Stable Diffusion API), video editor (trim, subtitle, AI transcription), audio editor (waveform, AI transcription/TTS), 3D viewer/editor (glTF/OBJ, AI mesh generation). Each opens as a ProjectBox tab, selected by file extension. Evaluate: WebGL for 3D (blocked by WebKit2GTK; ties into the Tauri vs WGPU decision), Canvas for 2D, Web Audio API for audio.
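The version-migration and merge-vs-overwrite requirements from the profile export/import item could look roughly like the sketch below. It is a minimal illustration. It assumes a JSON-style document with a numeric `version` field and treats unversioned files as v1. The schema (`themes`, `keybindings`), the v1 layout, and the `migrate`/`applyImport` names are all hypothetical. Secret encryption is left out.

```typescript
// Hypothetical current (v2) profile schema; field names are illustrative only.
interface Profile {
  version: number;
  themes: Record<string, string>; // theme name -> serialized theme
  keybindings: Record<string, string>; // command -> key sequence
}

type RawDoc = Record<string, unknown>;
type Migration = (doc: RawDoc) => RawDoc;

const CURRENT_VERSION = 2;

// MIGRATIONS[n] upgrades a version-n document to version n + 1.
// Hypothetical v1 layout: a single `theme` string instead of a `themes` map.
const MIGRATIONS: Record<number, Migration> = {
  1: ({ theme, ...rest }) => ({
    ...rest,
    version: 2,
    themes: typeof theme === "string" ? { default: theme } : {},
  }),
};

// Applies migrations one step at a time until the document is current.
// Documents without a numeric version are assumed to be v1.
function migrate(doc: RawDoc): Profile {
  const v = doc.version;
  let version = typeof v === "number" ? v : 1;
  if (version > CURRENT_VERSION) {
    throw new Error(`profile version ${version} is newer than this app supports`);
  }
  let current = doc;
  while (version < CURRENT_VERSION) {
    const step = MIGRATIONS[version];
    if (!step) throw new Error(`no migration from profile version ${version}`);
    current = step(current);
    version += 1;
  }
  return {
    version,
    themes: (current.themes ?? {}) as Record<string, string>,
    keybindings: (current.keybindings ?? {}) as Record<string, string>,
  };
}

// "overwrite" replaces local sections wholesale; "merge" keeps local entries
// and lets imported entries win only where the same key exists in both.
function applyImport(local: Profile, incoming: Profile, mode: "merge" | "overwrite"): Profile {
  if (mode === "overwrite") return { ...incoming, version: CURRENT_VERSION };
  return {
    version: CURRENT_VERSION,
    themes: { ...local.themes, ...incoming.themes },
    keybindings: { ...local.keybindings, ...incoming.keybindings },
  };
}
```

Chaining single-step migrations means each new schema version only needs one function that knows about its immediate predecessor.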
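With chord sequences, conflict detection has to catch more than identical sequences. A binding also conflicts with any sequence it is a prefix of, because Ctrl+K alone would fire before Ctrl+K → Ctrl+S can complete. The sketch below makes two assumptions: bindings are stored as arrays of normalized chord strings, and global bindings are active in every context. `Binding`, `findConflicts`, and the scope names are hypothetical. A real settings UI might treat a context binding that overrides a global one as intentional rather than as an error.

```typescript
// Hypothetical binding shape: a command bound to a chord sequence in a scope.
// Chords are normalized strings such as "ctrl+k"; a compose sequence is
// ["ctrl+k", "ctrl+s"].
interface Binding {
  command: string;
  scope: string; // e.g. "global", "terminal", "agent-pane", "palette"
  keys: string[];
}

interface Conflict {
  a: Binding;
  b: Binding;
  reason: "duplicate" | "prefix";
}

// True when `short` is a leading subsequence of (or equal to) `long`.
function isPrefix(short: string[], long: string[]): boolean {
  return short.length <= long.length && short.every((k, i) => k === long[i]);
}

// Assumption: global bindings are active everywhere, so they can collide
// with bindings in any scope; other scopes only collide with themselves.
function scopesOverlap(a: Binding, b: Binding): boolean {
  return a.scope === b.scope || a.scope === "global" || b.scope === "global";
}

function findConflicts(bindings: Binding[]): Conflict[] {
  const conflicts: Conflict[] = [];
  for (let i = 0; i < bindings.length; i++) {
    for (let j = i + 1; j < bindings.length; j++) {
      const a = bindings[i];
      const b = bindings[j];
      if (!scopesOverlap(a, b)) continue;
      if (a.keys.length === b.keys.length && isPrefix(a.keys, b.keys)) {
        conflicts.push({ a, b, reason: "duplicate" });
      } else if (isPrefix(a.keys, b.keys) || isPrefix(b.keys, a.keys)) {
        // The shorter binding would fire before the longer sequence completes.
        conflicts.push({ a, b, reason: "prefix" });
      }
    }
  }
  return conflicts;
}
```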
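The global → group → project cascade for per-project settings can be sketched as a layered merge. It assumes each lower level stores only the fields it overrides, with an undefined field falling through to the next less specific level. The `Settings` fields and `resolveSettings` are hypothetical names. The choice to merge environment variables key by key while replacing lists is also an assumption, not a settled design.

```typescript
// Hypothetical settings shape; the real per-project fields may differ.
interface Settings {
  theme: string;
  shell: string;
  model: string;
  env: Record<string, string>;
  disabledPlugins: string[];
}

// Group and project levels only store the fields they override.
type SettingsLayer = Partial<Settings>;

// Resolves global → group → project; the most specific defined value wins.
// Scalars and lists are replaced; `env` is merged key by key so a project can
// override one variable without discarding the inherited ones.
function resolveSettings(globalSettings: Settings, group: SettingsLayer, project: SettingsLayer): Settings {
  const resolved: Settings = {
    ...globalSettings,
    env: { ...globalSettings.env },
    disabledPlugins: [...globalSettings.disabledPlugins],
  };
  for (const layer of [group, project]) {
    if (layer.theme !== undefined) resolved.theme = layer.theme;
    if (layer.shell !== undefined) resolved.shell = layer.shell;
    if (layer.model !== undefined) resolved.model = layer.model;
    if (layer.disabledPlugins !== undefined) resolved.disabledPlugins = [...layer.disabledPlugins];
    if (layer.env !== undefined) Object.assign(resolved.env, layer.env);
  }
  return resolved;
}
```

Copying `env` and the plugin list up front keeps the stored global settings from being mutated by a resolve call.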

Electrobun Hardening (from Codex Audit #3)

  • Durable event sequencing — Monotonic message indexes per session, idempotent replay on reconnect, conflict-safe persistence. Prevents message loss during concurrent agent output. Useful for session replay/debugging.
  • File-save conflict detection — Record mtime + content hash when the file is read and re-check both before writing. Save via atomic temp-file rename. Show a conflict dialog if the file changed externally between read and write. Prevents silent overwrites.
  • Remote credential vault — Secure storage for relay tokens (encrypted at rest). Auto-reconnect uses stored token without re-prompting. Integrates with system keyring when available, falls back to encrypted SQLite blob.
  • Push-based task/relay updates — Replace 5-second polling in TaskBoardTab and CommsTab with WebSocket push from the btmsg/bttask backends. Tag responses with request tokens or revision numbers so stale responses can be detected and discarded. Reduces CPU + network overhead.
  • Sidecar backpressure guard — Max NDJSON line size (10 MB), max pending stdout buffer, max terminal paste chunk (64 KB). Prevents memory exhaustion from buggy/malicious sidecar runners.
  • Per-project retention controls — Configurable session history retention (last N sessions, or N days). untrackProject() cleans up health store, agent store, search index. Prevents unbounded memory/disk growth.
  • Channel membership/ACL enforcement — btmsg group_id validation (sender + recipient same group), channel membership checks before send, auto-add creator on channel create. Prevents cross-tenant message leakage.
  • Transport diagnostics panel — Real-time view of PTY/relay/session persistence health. Dropped event counters, reconnection history, RPC latency histogram, buffer fill levels. Useful for debugging multi-machine setups.
  • Plugin sandbox policy layer — Per-plugin network egress control (allow/deny), CPU time quotas (terminate after N seconds), memory limits, filesystem access scope. Prevents malicious plugins from exfiltrating data or causing denial of service.
  • Multi-tool health tracking — Replace toolInFlight: boolean with toolsInFlight: number counter. Accurate state machine for concurrent tool execution. Prevents false idle/stalled transitions during parallel tool use.
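Durable event sequencing can be sketched in two parts. The sender stamps each message with the next per-session sequence number. The receiver applies each number exactly once, ignores duplicates from a replay, and buffers out-of-order messages until the gap is filled. This sketch makes several simplifications. The envelope shape and the `Sequencer`/`Receiver` names are hypothetical. Persistence is omitted. The reconnect handshake is reduced to `resumeFrom()`, which returns the last applied number.

```typescript
// Hypothetical envelope: one strictly increasing `seq` per session, from 1.
interface Envelope<T> {
  sessionId: string;
  seq: number;
  payload: T;
}

// Sender side: one monotonic counter per session.
class Sequencer {
  private readonly next = new Map<string, number>();

  stamp<T>(sessionId: string, payload: T): Envelope<T> {
    const seq = (this.next.get(sessionId) ?? 0) + 1;
    this.next.set(sessionId, seq);
    return { sessionId, seq, payload };
  }
}

// Receiver side: applies each seq exactly once and in order.
class Receiver<T> {
  private readonly lastApplied = new Map<string, number>();
  private readonly pending = new Map<string, Map<number, T>>();
  private readonly onApply: (sessionId: string, payload: T) => void;

  constructor(onApply: (sessionId: string, payload: T) => void) {
    this.onApply = onApply;
  }

  // On reconnect, ask the backend to replay everything after this number.
  resumeFrom(sessionId: string): number {
    return this.lastApplied.get(sessionId) ?? 0;
  }

  receive(msg: Envelope<T>): void {
    const last = this.lastApplied.get(msg.sessionId) ?? 0;
    if (msg.seq <= last) return; // already applied: replays are idempotent
    let buffer = this.pending.get(msg.sessionId);
    if (!buffer) {
      buffer = new Map<number, T>();
      this.pending.set(msg.sessionId, buffer);
    }
    buffer.set(msg.seq, msg.payload);
    // Apply every buffered message that is now contiguous with the last one.
    let next = last + 1;
    while (buffer.has(next)) {
      this.onApply(msg.sessionId, buffer.get(next) as T);
      buffer.delete(next);
      next += 1;
    }
    this.lastApplied.set(msg.sessionId, next - 1);
  }
}
```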
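The file-save conflict check can be sketched with Node's `fs` and `crypto` modules, which Bun also provides. This is an illustration under assumptions, not the editor's actual save path. `snapshot` and `saveIfUnchanged` are hypothetical names. An unchanged mtime is trusted as a fast path. A small window also remains between the check and the rename. A production version would additionally `fsync` the temp file before renaming.

```typescript
import { createHash } from "node:crypto";
import { readFileSync, renameSync, statSync, writeFileSync } from "node:fs";
import { basename, dirname, join } from "node:path";

// What the editor saw when it loaded the file.
interface FileSnapshot {
  mtimeMs: number;
  sha256: string;
}

function sha256(data: Buffer | string): string {
  return createHash("sha256").update(data).digest("hex");
}

function snapshot(path: string): FileSnapshot {
  return { mtimeMs: statSync(path).mtimeMs, sha256: sha256(readFileSync(path)) };
}

// Writes `contents` only if the file on disk still matches `seen`.
// Returns false without writing on conflict, so the caller can show a dialog.
function saveIfUnchanged(path: string, seen: FileSnapshot, contents: string): boolean {
  const mtimeChanged = statSync(path).mtimeMs !== seen.mtimeMs;
  // Fast path: an unchanged mtime is trusted (caveat: coarse timestamps can
  // hide a same-tick edit). A changed mtime with identical content, e.g.
  // after `touch`, is not treated as a conflict.
  if (mtimeChanged && sha256(readFileSync(path)) !== seen.sha256) {
    return false;
  }
  // Write a temp file in the same directory, then rename it over the target.
  // On POSIX filesystems the rename atomically replaces the old file, so
  // readers never observe a half-written file.
  const tmp = join(dirname(path), `.${basename(path)}.${process.pid}.tmp`);
  writeFileSync(tmp, contents);
  renameSync(tmp, path);
  return true;
}
```

Keeping the temp file in the target's directory matters because a rename across filesystems is not atomic and usually fails.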
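For the push-based updates, stale-response detection with revision numbers reduces to one rule: keep the newest revision per key and drop anything older or equal. That rule protects against a slow fetch arriving after a newer push. Request tokens would work similarly, by remembering the latest request ID and ignoring responses to older ones. The sketch assumes the backend bumps a `revision` on every change and includes it in both pushed events and fetch responses. `Versioned` and `RevisionedStore` are hypothetical names, and the WebSocket transport itself is not shown.

```typescript
// Hypothetical update shape: `revision` increases on every backend change.
interface Versioned<T> {
  revision: number;
  data: T;
}

// Keeps the newest snapshot per key (e.g. a task board or a channel).
class RevisionedStore<T> {
  private readonly entries = new Map<string, Versioned<T>>();

  // Returns true if applied, false if the update was stale or a duplicate.
  accept(key: string, update: Versioned<T>): boolean {
    const current = this.entries.get(key);
    if (current && update.revision <= current.revision) return false;
    this.entries.set(key, update);
    return true;
  }

  get(key: string): T | undefined {
    return this.entries.get(key)?.data;
  }
}
```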
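The NDJSON part of the backpressure guard can be sketched as an incremental splitter over stdout chunks. It caps both complete lines and buffered bytes that have no newline yet. When an unterminated line grows past the cap, the splitter reports an error and skips ahead to the next newline, so one runaway line cannot exhaust memory. `NdjsonSplitter` and its callbacks are hypothetical names. The 10 MB default comes from the item above. The stdout and paste-chunk limits are not covered.

```typescript
const DEFAULT_MAX_LINE_BYTES = 10 * 1024 * 1024; // 10 MB

// Incrementally splits a byte stream (e.g. sidecar stdout) into JSON values.
class NdjsonSplitter {
  private buffered: Buffer = Buffer.alloc(0);
  // True while skipping the remainder of an oversized, unterminated line.
  private discarding = false;
  private readonly onValue: (value: unknown) => void;
  private readonly onError: (reason: string) => void;
  private readonly maxLineBytes: number;

  constructor(
    onValue: (value: unknown) => void,
    onError: (reason: string) => void,
    maxLineBytes: number = DEFAULT_MAX_LINE_BYTES,
  ) {
    this.onValue = onValue;
    this.onError = onError;
    this.maxLineBytes = maxLineBytes;
  }

  push(chunk: Buffer): void {
    let data: Buffer = this.buffered.length > 0 ? Buffer.concat([this.buffered, chunk]) : chunk;
    let nl = data.indexOf(0x0a); // '\n'
    while (nl !== -1) {
      const line = data.subarray(0, nl);
      data = data.subarray(nl + 1);
      nl = data.indexOf(0x0a);
      if (this.discarding) {
        this.discarding = false; // tail of an oversized line: drop it
        continue;
      }
      if (line.length > this.maxLineBytes) {
        this.onError(`line of ${line.length} bytes exceeds ${this.maxLineBytes}`);
        continue;
      }
      if (line.length === 0) continue;
      let value: unknown;
      try {
        value = JSON.parse(line.toString("utf8"));
      } catch {
        this.onError("invalid JSON line");
        continue;
      }
      this.onValue(value);
    }
    if (this.discarding) {
      data = Buffer.alloc(0); // still inside an oversized line
    } else if (data.length > this.maxLineBytes) {
      this.onError(`unterminated line exceeds ${this.maxLineBytes} bytes`);
      this.discarding = true;
      data = Buffer.alloc(0);
    }
    // Copy so the remainder does not keep the whole source chunk alive.
    this.buffered = Buffer.from(data);
  }
}
```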
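Both retention policies from the per-project retention item (last N sessions, or last N days) can be expressed as one pruning function that returns the IDs to delete. It assumes each session record carries an end timestamp in epoch milliseconds. `SessionMeta`, `RetentionPolicy`, and `sessionsToPrune` are hypothetical names. The actual cleanup of stores and indexes, as in `untrackProject()`, is left to the caller.

```typescript
// Hypothetical session record; only the fields retention needs.
interface SessionMeta {
  id: string;
  endedAt: number; // epoch milliseconds
}

type RetentionPolicy = { kind: "count"; keep: number } | { kind: "days"; days: number };

const DAY_MS = 24 * 60 * 60 * 1000;

// Returns the IDs to delete under `policy`, newest-first; newest are kept.
function sessionsToPrune(sessions: SessionMeta[], policy: RetentionPolicy, now: number = Date.now()): string[] {
  const newestFirst = [...sessions].sort((a, b) => b.endedAt - a.endedAt);
  if (policy.kind === "count") {
    return newestFirst.slice(Math.max(0, policy.keep)).map((s) => s.id);
  }
  const cutoff = now - policy.days * DAY_MS;
  return newestFirst.filter((s) => s.endedAt < cutoff).map((s) => s.id);
}
```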
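The three checks in the channel membership/ACL item can be illustrated against an in-memory model:

- same group_id for direct messages,
- membership before a channel send,
- auto-adding the creator when a channel is created.

The `Agent`/`Channel` shapes and `ChannelRegistry` are hypothetical. The btmsg backend's real schema and storage are not shown. A channel is assumed to inherit its creator's group.

```typescript
// Hypothetical in-memory model of btmsg agents and channels.
interface Agent {
  id: string;
  groupId: string;
}

interface Channel {
  id: string;
  groupId: string;
  members: Set<string>;
}

class AclError extends Error {}

class ChannelRegistry {
  private readonly agents = new Map<string, Agent>();
  private readonly channels = new Map<string, Channel>();

  addAgent(agent: Agent): void {
    this.agents.set(agent.id, agent);
  }

  private agent(id: string): Agent {
    const a = this.agents.get(id);
    if (!a) throw new AclError(`unknown agent ${id}`);
    return a;
  }

  private channel(id: string): Channel {
    const c = this.channels.get(id);
    if (!c) throw new AclError(`unknown channel ${id}`);
    return c;
  }

  // The channel inherits the creator's group; the creator is auto-added.
  createChannel(channelId: string, creatorId: string): Channel {
    if (this.channels.has(channelId)) throw new AclError(`channel ${channelId} exists`);
    const creator = this.agent(creatorId);
    const channel: Channel = { id: channelId, groupId: creator.groupId, members: new Set([creatorId]) };
    this.channels.set(channelId, channel);
    return channel;
  }

  // Only agents from the channel's own group may join.
  join(channelId: string, agentId: string): void {
    const ch = this.channel(channelId);
    if (this.agent(agentId).groupId !== ch.groupId) throw new AclError("cross-group join rejected");
    ch.members.add(agentId);
  }

  // Direct messages: sender and recipient must share a group_id.
  checkDirectSend(senderId: string, recipientId: string): void {
    if (this.agent(senderId).groupId !== this.agent(recipientId).groupId) {
      throw new AclError("cross-group message rejected");
    }
  }

  // Channel messages: the sender must already be a member.
  checkChannelSend(channelId: string, senderId: string): void {
    if (!this.channel(channelId).members.has(senderId)) {
      throw new AclError("sender is not a channel member");
    }
  }
}
```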
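The multi-tool health change replaces the boolean with a counter. With a boolean, the first of two parallel tools to finish clears the flag, and the agent can be marked idle or stalled while the second tool is still running. A counter that is incremented on start and decremented on end avoids this. The state names, the stall rule, and `ToolHealth` are hypothetical. The example only shows how state derives from `toolsInFlight`.

```typescript
// Possible derived states; names are illustrative.
type ToolHealthState = "tool-running" | "stalled" | "active";

class ToolHealth {
  private toolsInFlight = 0;

  onToolStart(): void {
    this.toolsInFlight += 1;
  }

  onToolEnd(): void {
    // Clamp at zero so a duplicated or unmatched end event cannot go negative.
    this.toolsInFlight = Math.max(0, this.toolsInFlight - 1);
  }

  get count(): number {
    return this.toolsInFlight;
  }

  // While any tool is running, silence is expected and is not a stall.
  // Otherwise the agent counts as stalled after `stallMs` without output.
  state(msSinceOutput: number, stallMs: number): ToolHealthState {
    if (this.toolsInFlight > 0) return "tool-running";
    return msSinceOutput >= stallMs ? "stalled" : "active";
  }
}
```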

Dual-Repo & Commercial

  • CLA setup — Configure CLA-assistant.io on community repo (DexterFromLab/agent-orchestrator) before accepting external PRs.
  • Community export workflow — Define and document the process for stripping commercial content and pushing to DexterFromLab origin.
  • Dual CI validation — Verify both leak-check.yml and commercial-build.yml workflows work in GitHub Actions.

Multi-Machine (v3.1)

  • Real-world relay testing — TLS added, code complete in bridges/stores. Needs 2-machine test to verify relay + RemoteManager end-to-end.

Multi-Agent (v3.1)

  • Agent Teams real-world testing — Subagent delegation prompt + env injection done. Needs real multi-agent session to verify Manager spawns child agents.

Reliability

  • Soak test — Run 4-hour soak with 6+ agents across 3+ projects. Monitor: memory, WAL size, xterm count, supervisor restarts.
  • WebKit2GTK Worker verification — Verify Web Worker Blob URL approach in Tauri's WebKit2GTK webview.

E2E Testing

  • More realistic fixtures — Add 3-5 dummy projects to test fixtures with varied configurations: different providers (claude, codex, ollama), agent roles (manager, architect, tester), worktree isolation enabled/disabled, multiple groups, SSH configs. Makes tests more reliable and covers multi-project interactions.
  • Test daemon CI integration — Wire daemon CLI (tests/e2e/daemon/) into CI workflow. Verify --agent flag works with Agent SDK.

Completed

  • E2E full suite passing — 19/19 specs, 306s, daemon with smart caching | Done: 2026-03-18
  • E2E test daemon CLI — ANSI dashboard, smart caching (3-pass skip), error toast catching, Agent SDK bridge | Done: 2026-03-18
  • SPKI pin persistence — pins saved to groups.json, survive app restarts | Done: 2026-03-18
  • E2E spec expansion — 19 files, ~200 tests, Phase D/E/F added, all specs split to under 300 lines | Done: 2026-03-18
  • E2E port isolation — dedicated port 9750, app identity verification, devUrl conflict detection | Done: 2026-03-18
  • Pro Svelte components wired — AnalyticsDashboard, SessionExporter, AccountSwitcher in ProjectBox Pro tab | Done: 2026-03-18
  • ThemeEditor — 26 color pickers, live preview, import/export, custom theme persistence | Done: 2026-03-18
  • Comprehensive error handling — AppError enum (Rust), handleError/handleInfraError (frontend), global handler | Done: 2026-03-18
  • Plugin marketplace — 13 plugins (8 free, 5 paid), catalog, security (SHA-256 checksums, HTTPS, path-traversal protection) | Done: 2026-03-17
  • Security audit fixes — 5 critical + 14 high issues found and fixed across agor-pro + Svelte | Done: 2026-03-17
  • Settings redesign — 6 modular components replacing 2959-line monolith | Done: 2026-03-18