# Agents Orchestrator — TODO
## Architecture Decisions
- [ ] **Tauri vs WGPU alternative** — Evaluate staying with Tauri 2.x (WebKit2GTK) vs migrating to a Bun-based stack with WGPU rendering. Key factors: WebGL limitations in WebKit2GTK, xterm.js Canvas addon constraint (max 4 instances), native GPU acceleration, Bun's single-binary advantage. Research: Dioxus, Slint, Zed's GPUI. Decision needed before v4.
- [ ] **Frontend-backend tight binding** — Reduce IPC overhead between Svelte frontend and Rust backend. Options: shared memory via WebAssembly, direct Rust→DOM rendering for perf-critical paths, compile Svelte components to WASM, or move more logic to Rust (terminal rendering, syntax highlighting). Profile current IPC bottlenecks first.
## Features (v3.2)
- [ ] **Profile export/import** — Define a portable profile format (JSON/TOML/YAML) for groups, projects, agents, themes, keybindings, secrets (encrypted). Must handle: version migration, partial import (merge vs overwrite), sensitive data encryption (age/libsodium), cross-machine portability. Evaluate TOML (human-readable) vs JSON (tooling) vs custom binary (compact + signed).
- [ ] **Keyboard shortcuts settings** — Configurable keybindings UI in SettingsTab. Levels: global (app-wide), context (terminal, agent pane, palette), compose sequences (Ctrl+K → Ctrl+S). Conflict detection. Import/export. Default keymap file at ~/.config/agor/keybindings.json. Reference: VSCode keybindings model.
- [ ] **Per-project settings** — Deeper per-project configuration beyond current fields. Per-project theme override, per-project keybindings, per-project plugin enable/disable, per-project environment variables, per-project shell, per-project model preferences. Cascade: global → group → project (most specific wins).
- [ ] **Custom editors (AI-augmented)** — Specialized editor panes for non-code content: image editor (crop, annotate, AI inpaint/upscale via stable diffusion API), video editor (trim, subtitle, AI transcription), audio editor (waveform, AI transcription/TTS), 3D viewer/editor (glTF/OBJ, AI mesh generation). Each as a ProjectBox tab, triggered by file extension. Evaluate: WebGL for 3D (blocked by WebKit2GTK — ties into Tauri vs WGPU decision), Canvas for 2D, Web Audio API for audio.
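
The global → group → project cascade from the per-project settings item could be sketched roughly as follows. This is a minimal illustration, not the project's actual schema: the `Settings` type, its field names, and `resolveSettings` are hypothetical, and merging environment variables key by key (rather than replacing the whole map) is an assumption.

```typescript
// Hypothetical settings shape; the field names are illustrative only.
type Settings = {
  theme?: string;
  shell?: string;
  model?: string;
  env?: Record<string, string>;
};

// Resolve the cascade global -> group -> project.
// Later (more specific) layers override earlier ones, and an unset field
// never masks a value inherited from a broader layer.
function resolveSettings(global: Settings, group: Settings, project: Settings): Settings {
  const resolved: Settings = {};
  for (const layer of [global, group, project]) {
    if (layer.theme !== undefined) resolved.theme = layer.theme;
    if (layer.shell !== undefined) resolved.shell = layer.shell;
    if (layer.model !== undefined) resolved.model = layer.model;
    // Assumption: environment variables merge per key instead of being replaced wholesale.
    if (layer.env !== undefined) resolved.env = { ...resolved.env, ...layer.env };
  }
  return resolved;
}

// Usage: the project overrides the theme, the group overrides the shell.
const effective = resolveSettings(
  { theme: "dark", shell: "/bin/bash", env: { A: "1", B: "2" } },
  { shell: "/bin/zsh" },
  { theme: "light", env: { B: "3" } },
);
console.log(effective);
```

Keeping resolution in one pure function would make the "most specific wins" rule easy to unit-test before any per-project UI exists.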
## Electrobun Hardening (from Codex Audit #3)
- [ ] **Durable event sequencing** — Monotonic message indexes per session, idempotent replay on reconnect, conflict-safe persistence. Prevents message loss during concurrent agent output. Useful for session replay/debugging.
- [ ] **File-save conflict detection** — Track `mtime` + content hash before write. Atomic temp-file rename on save. Show conflict dialog if file changed externally between read and write. Prevents silent overwrites.
- [ ] **Remote credential vault** — Secure storage for relay tokens (encrypted at rest). Auto-reconnect uses stored token without re-prompting. Integrates with system keyring when available, falls back to encrypted SQLite blob.
- [ ] **Push-based task/relay updates** — Replace 5-second polling in TaskBoardTab and CommsTab with WebSocket push from btmsg/bttask backends. Request tokens or revision numbers for stale-response detection. Reduces CPU + network overhead.
- [ ] **Sidecar backpressure guard** — Max NDJSON line size (10MB), max pending stdout buffer, max terminal paste chunk (64KB). Prevents memory exhaustion from buggy/malicious sidecar runners.
- [ ] **Per-project retention controls** — Configurable session history retention (last N sessions, or N days). `untrackProject()` cleans up health store, agent store, search index. Prevents unbounded memory/disk growth.
- [ ] **Channel membership/ACL enforcement** — btmsg group_id validation (sender + recipient same group), channel membership checks before send, auto-add creator on channel create. Prevents cross-tenant message leakage.
- [ ] **Transport diagnostics panel** — Real-time view of PTY/relay/session persistence health. Dropped event counters, reconnection history, RPC latency histogram, buffer fill levels. Useful for debugging multi-machine setups.
- [ ] **Plugin sandbox policy layer** — Per-plugin network egress control (allow/deny), CPU time quotas (terminate after N seconds), memory limits, filesystem access scope. Prevents malicious plugins from exfiltrating data or causing denial of service.
- [ ] **Multi-tool health tracking** — Replace `toolInFlight: boolean` with `toolsInFlight: number` counter. Accurate state machine for concurrent tool execution. Prevents false idle/stalled transitions during parallel tool use.
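
The multi-tool health tracking item could look roughly like the sketch below. Only the `toolsInFlight` counter name comes from the item itself; the `ToolTracker` class, its method names, and the two-state `ToolActivity` type are hypothetical, and clamping the counter at zero is an assumption about how unmatched finish events should be handled.

```typescript
// Hypothetical two-state view of tool activity; the real state machine likely has more states.
type ToolActivity = "idle" | "running";

class ToolTracker {
  // Replaces a single boolean so that parallel tool calls are counted individually.
  private toolsInFlight = 0;

  toolStarted(): void {
    this.toolsInFlight += 1;
  }

  toolFinished(): void {
    // Assumption: clamp at zero so a duplicate or unmatched finish event
    // cannot drive the counter negative.
    this.toolsInFlight = Math.max(0, this.toolsInFlight - 1);
  }

  get inFlight(): number {
    return this.toolsInFlight;
  }

  // The agent only counts as idle once every concurrent tool has finished.
  get activity(): ToolActivity {
    return this.toolsInFlight > 0 ? "running" : "idle";
  }
}

// Usage: two tools start in parallel and one finishes.
const tracker = new ToolTracker();
tracker.toolStarted();
tracker.toolStarted();
tracker.toolFinished();
console.log(tracker.activity, tracker.inFlight);
```

With a boolean flag, the first finish event above would already report idle while the second tool is still running; the counter avoids that false transition.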
## Dual-Repo & Commercial
- [ ] **CLA setup** — Configure CLA-assistant.io on community repo (DexterFromLab/agent-orchestrator) before accepting external PRs.
- [ ] **Community export workflow** — Define and document the process for stripping commercial content and pushing to DexterFromLab origin.
- [ ] **Dual CI validation** — Verify both leak-check.yml and commercial-build.yml workflows work in GitHub Actions.
## Multi-Machine (v3.1)
- [ ] **Real-world relay testing** — TLS added, code complete in bridges/stores. Needs 2-machine test to verify relay + RemoteManager end-to-end.
## Multi-Agent (v3.1)
- [ ] **Agent Teams real-world testing** — Subagent delegation prompt + env injection done. Needs real multi-agent session to verify Manager spawns child agents.
## Reliability
- [ ] **Soak test** — Run 4-hour soak with 6+ agents across 3+ projects. Monitor: memory, WAL size, xterm count, supervisor restarts.
- [ ] **WebKit2GTK Worker verification** — Verify Web Worker Blob URL approach in Tauri's WebKit2GTK webview.
## E2E Testing
- [ ] **More realistic fixtures** — Add 3-5 dummy projects to test fixtures with varied configurations: different providers (claude, codex, ollama), agent roles (manager, architect, tester), worktree isolation enabled/disabled, multiple groups, SSH configs. Makes tests more reliable and covers multi-project interactions.
- [ ] **Test daemon CI integration** — Wire daemon CLI (tests/e2e/daemon/) into CI workflow. Verify --agent flag works with Agent SDK.
## Completed
- [x] E2E full suite passing — 19/19 specs, 306s, daemon with smart caching | Done: 2026-03-18
- [x] E2E test daemon CLI — ANSI dashboard, smart caching (3-pass skip), error toast catching, Agent SDK bridge | Done: 2026-03-18
- [x] SPKI pin persistence — pins saved to groups.json, survive app restarts | Done: 2026-03-18
- [x] E2E spec expansion — 19 files, ~200 tests, Phase D/E/F added, all specs split <300 lines | Done: 2026-03-18
- [x] E2E port isolation — dedicated port 9750, app identity verification, devUrl conflict detection | Done: 2026-03-18
- [x] Pro Svelte components wired — AnalyticsDashboard, SessionExporter, AccountSwitcher in ProjectBox Pro tab | Done: 2026-03-18
- [x] ThemeEditor — 26 color pickers, live preview, import/export, custom theme persistence | Done: 2026-03-18
- [x] Comprehensive error handling — AppError enum (Rust), handleError/handleInfraError (frontend), global handler | Done: 2026-03-18
- [x] Plugin marketplace — 13 plugins (8 free, 5 paid), catalog, security (SHA-256, HTTPS, path traversal) | Done: 2026-03-17
- [x] Security audit fixes — 5 critical + 14 high issues found and fixed across agor-pro + Svelte | Done: 2026-03-17
- [x] Settings redesign — 6 modular components replacing 2959-line monolith | Done: 2026-03-18