agent-orchestrator/docs/orchestration.md
Hibryda de8dd04f4b docs: add architecture, sidecar, orchestration, and production guides
New documentation covering end-to-end system architecture, multi-provider
sidecar lifecycle, btmsg/bttask multi-agent orchestration, and production
hardening features (supervisor, sandbox, search, plugins, secrets, audit).
2026-03-14 02:33:59 +01:00

Multi-Agent Orchestration

Agent Orchestrator supports running multiple AI agents that communicate with each other, coordinate work through a shared task board, and are managed by a hierarchy of specialized roles. This document covers the inter-agent messaging system (btmsg), the task board (bttask), agent roles and system prompts, and the auto-wake scheduler.


Agent Roles (Tier 1 and Tier 2)

Agents are organized into two tiers:

Tier 1 — Management Agents

Defined in groups.json under a group's agents[] array. Each management agent gets a full ProjectBox in the UI (converted via agentToProject() in the workspace store). They have role-specific capabilities, tabs, and system prompts.

| Role | Tabs | btmsg Permissions | bttask Permissions | Purpose |
|------|------|-------------------|--------------------|---------|
| Manager | Model, Tasks | Full (send, receive, create channels) | Full CRUD | Coordinates work, creates/assigns tasks, delegates to subagents |
| Architect | Model, Architecture | Send, receive | Read-only + comments | Designs solutions, creates PlantUML diagrams, reviews architecture |
| Tester | Model, Selenium, Tests | Send, receive | Read-only + comments | Runs tests, monitors screenshots, discovers test files |
| Reviewer | Model, Tasks | Send, receive | Read + status + comments | Reviews code, manages review queue, approves/rejects tasks |

Tier 2 — Project Agents

Regular ProjectConfig entries in groups.json. Each project gets its own Claude session with optional custom context via project.systemPrompt. They have standard tabs (Model, Docs, Context, Files, SSH, Memory) but no role-specific tabs.

System Prompt Generation

Tier 1 agents receive auto-generated system prompts built by generateAgentPrompt() in utils/agent-prompts.ts. The prompt has 7 sections:

  1. Identity — Role name, project context, team membership
  2. Environment — Working directory, available tools, shell info
  3. Team — List of other agents in the group with their roles
  4. btmsg documentation — CLI usage, channel commands, message format
  5. bttask documentation — CLI usage, task lifecycle, role-specific permissions
  6. Custom context — Optional project.systemPrompt (Tier 2) or role-specific instructions
  7. Workflow — Role-specific workflow guidelines (e.g., Manager delegates, Reviewer checks review queue)

Tier 2 agents receive only the custom context section (if project.systemPrompt is set), injected as the system_prompt field in AgentQueryOptions.
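
The section ordering above can be illustrated with a small sketch. The real generateAgentPrompt() lives in utils/agent-prompts.ts (TypeScript); the Python below only mirrors the seven-section order, and every name in it (Agent, WORKFLOWS, build_agent_prompt, the section text) is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    role: str


# Hypothetical one-line workflow hints per role; the real text is longer.
WORKFLOWS = {
    "manager": "Break work into tasks and delegate them.",
    "reviewer": "Check the review queue before starting new work.",
}


def build_agent_prompt(agent, project, cwd, team, custom_context=""):
    """Assemble the seven sections in order, skipping any that are empty."""
    sections = [
        ("Identity", f"You are {agent.name}, the {agent.role} agent for {project}."),
        ("Environment", f"Working directory: {cwd}"),
        ("Team", "\n".join(f"- {a.name} ({a.role})" for a in team)),
        ("btmsg", 'Send: btmsg send <recipient> "message". Read: btmsg read.'),
        ("bttask", 'List: bttask list. Comment: bttask comment <id> "text".'),
        ("Custom context", custom_context),
        ("Workflow", WORKFLOWS.get(agent.role, "")),
    ]
    return "\n\n".join(f"{title}\n{body}" for title, body in sections if body)


prompt = build_agent_prompt(
    Agent("mgr", "manager"), "demo", "/work/demo", [Agent("rev", "reviewer")]
)
```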

BTMSG_AGENT_ID

Tier 1 agents receive the BTMSG_AGENT_ID environment variable, injected via extra_env in AgentQueryOptions. This flows through 5 layers: TypeScript → Rust AgentQueryOptions → NDJSON → JS runner → SDK env. The CLI tools (btmsg, bttask) read this variable to identify which agent is sending messages or creating tasks.
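
On the CLI side, resolving the sender might look roughly like the sketch below. Only the variable name BTMSG_AGENT_ID comes from this document; the function name and the error message are assumptions.

```python
import os


def current_agent_id(env=None):
    """Return the calling agent's ID, or fail loudly if it was not injected."""
    env = os.environ if env is None else env
    agent_id = env.get("BTMSG_AGENT_ID", "").strip()
    if not agent_id:
        raise RuntimeError("BTMSG_AGENT_ID is not set; cannot identify the sender")
    return agent_id
```

Failing early keeps a misconfigured session from sending messages or creating tasks under an empty identity.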

Periodic Re-injection

LLM context degrades over long sessions as important instructions scroll out of the context window. To counter this, AgentSession runs a 1-hour timer that re-sends the system prompt when the agent is idle. The mechanism:

  1. AgentSession timer fires after 60 minutes of agent inactivity
  2. Sets autoPrompt flag, which AgentPane reads via onautopromptconsumed callback
  3. AgentPane calls startQuery() with resume=true and the refresh prompt
  4. The agent receives the role/tools reminder as a follow-up message
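
The timer logic in the steps above can be sketched as a small state machine. The real code is the TypeScript AgentSession; this Python version is a simplified model, and ReinjectTimer, tick, and consume are hypothetical names.

```python
REINJECT_AFTER_S = 60 * 60  # one hour of inactivity


class ReinjectTimer:
    def __init__(self, now_s):
        self.last_activity_s = now_s
        self.auto_prompt = None  # pending refresh prompt, if any

    def record_activity(self, now_s):
        self.last_activity_s = now_s

    def tick(self, now_s, agent_idle, refresh_prompt):
        """Queue the refresh prompt once an idle agent has been quiet long enough."""
        quiet_for = now_s - self.last_activity_s
        if agent_idle and quiet_for >= REINJECT_AFTER_S and self.auto_prompt is None:
            self.auto_prompt = refresh_prompt

    def consume(self, now_s):
        """Hand the pending prompt to the caller and restart the idle clock."""
        prompt, self.auto_prompt = self.auto_prompt, None
        if prompt is not None:
            self.last_activity_s = now_s
        return prompt
```

The caller (the pane, in this document's terms) would pass the consumed prompt to a resumed query.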

btmsg — Inter-Agent Messaging

btmsg is a messaging system that lets agents communicate with each other. It consists of a Rust backend (SQLite), a Python CLI tool (for agents to use in their shell), and a Svelte frontend (CommsTab).

Architecture

Agent (via btmsg CLI)
    │
    ├── btmsg send <recipient> "message"     → writes to btmsg.db
    ├── btmsg read                           → reads from btmsg.db
    ├── btmsg channel create #review-queue   → creates channel
    ├── btmsg channel post #review-queue "msg" → posts to channel
    └── btmsg heartbeat                      → updates agent heartbeat
         │
         ▼
btmsg.db (SQLite, WAL mode, ~/.local/share/bterminal/btmsg.db)
    │
    ├── agents table        — registered agents with roles
    ├── messages table      — DMs and channel messages
    ├── channels table      — named channels (#review-queue, #review-log)
    ├── contacts table      — ACL (who can message whom)
    ├── heartbeats table    — agent liveness tracking
    ├── dead_letter_queue   — undeliverable messages
    └── audit_log           — all operations for debugging
         │
         ▼
Rust Backend (btmsg.rs, ~600 lines)
    │
    ├── btmsg_list_messages, btmsg_send_message, ...
    ├── 15+ Tauri commands for full CRUD
    └── Shared database connection (WAL + 5s busy_timeout)
         │
         ▼
Frontend (btmsg-bridge.ts → CommsTab.svelte)
    ├── Activity feed — all messages across all agents
    ├── DM view — direct messages between specific agents
    └── Channel view — channel messages (#review-queue, etc.)

Database Schema

The btmsg database (btmsg.db) stores all messaging data:

| Table | Purpose | Key Columns |
|-------|---------|-------------|
| agents | Agent registry | id, name, role, project_id, status, created_at |
| messages | All messages | id, sender_id, recipient_id, channel_id, content, read, created_at |
| channels | Named channels | id, name, created_by, created_at |
| contacts | ACL | agent_id, contact_id (bidirectional) |
| heartbeats | Liveness | agent_id, last_heartbeat, status |
| dead_letter_queue | Failed delivery | message_id, reason, created_at |
| audit_log | All operations | id, event_type, agent_id, details, created_at |
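
For orientation, the tables above could be created along the following lines. Column names come from the table; the column types, keys, and defaults are assumptions, and the actual schema is defined by the Rust backend.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS agents (
    id TEXT PRIMARY KEY, name TEXT NOT NULL, role TEXT NOT NULL,
    project_id TEXT, status TEXT, created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS channels (
    id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL,
    created_by TEXT, created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS messages (
    id INTEGER PRIMARY KEY, sender_id TEXT NOT NULL,
    recipient_id TEXT, channel_id INTEGER,  -- DM or channel post
    content TEXT NOT NULL, "read" INTEGER DEFAULT 0,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS contacts (
    agent_id TEXT NOT NULL, contact_id TEXT NOT NULL,
    PRIMARY KEY (agent_id, contact_id)
);
CREATE TABLE IF NOT EXISTS heartbeats (
    agent_id TEXT PRIMARY KEY, last_heartbeat TEXT, status TEXT
);
CREATE TABLE IF NOT EXISTS dead_letter_queue (
    message_id INTEGER NOT NULL, reason TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS audit_log (
    id INTEGER PRIMARY KEY, event_type TEXT NOT NULL, agent_id TEXT,
    details TEXT, created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
"""


def open_btmsg_db(path):
    """Open the database with WAL and a 5 s busy timeout, then ensure the tables exist."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")  # no effect for ':memory:' databases
    conn.execute("PRAGMA busy_timeout=5000")
    conn.executescript(SCHEMA)
    return conn
```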

CLI Usage (for agents)

Agents use the btmsg Python CLI tool in their shell. The tool reads BTMSG_AGENT_ID to identify the sender:

# Send a direct message
btmsg send architect "Please review the auth module design"

# Read unread messages
btmsg read

# Create a channel (quote the name: an unquoted leading # starts a shell comment)
btmsg channel create "#architecture-decisions"

# Post to a channel
btmsg channel post "#review-queue" "PR #42 ready for review"

# Send heartbeat (agents do this periodically)
btmsg heartbeat

# List all agents
btmsg agents

Frontend (CommsTab)

The CommsTab component (rendered in ProjectBox for all agents) shows:

  • Activity Feed — chronological view of all messages across all agents
  • DMs — direct message threads between agents
  • Channels — named channel message streams
  • Polling-based updates (5s interval)

Dead Letter Queue

Messages sent to non-existent or offline agents are moved to the dead letter queue instead of being silently dropped. The Rust backend checks agent status before delivery and queues failures. The Manager agent's health dashboard shows dead letter count.
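
A minimal sketch of that delivery check follows. The real check is in the Rust backend; the Python here uses a reduced schema, and the status value "offline" and the reason strings are assumptions chosen for illustration.

```python
import sqlite3


def open_demo_db():
    """Reduced in-memory schema with only the columns this sketch needs."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE agents (id TEXT PRIMARY KEY, status TEXT NOT NULL);
        CREATE TABLE messages (id INTEGER PRIMARY KEY, sender_id TEXT,
                               recipient_id TEXT, content TEXT);
        CREATE TABLE dead_letter_queue (message_id INTEGER, reason TEXT);
    """)
    return conn


def send_message(conn, sender_id, recipient_id, content):
    """Store the message; dead-letter it if the recipient is missing or offline.

    Returns True when the message was deliverable.
    """
    row = conn.execute(
        "SELECT status FROM agents WHERE id = ?", (recipient_id,)
    ).fetchone()
    if row is None:
        reason = "unknown_recipient"
    elif row[0] == "offline":
        reason = "recipient_offline"
    else:
        reason = None
    with conn:  # one transaction: the message and its dead-letter entry commit together
        cur = conn.execute(
            "INSERT INTO messages (sender_id, recipient_id, content) VALUES (?, ?, ?)",
            (sender_id, recipient_id, content),
        )
        if reason is not None:
            conn.execute(
                "INSERT INTO dead_letter_queue (message_id, reason) VALUES (?, ?)",
                (cur.lastrowid, reason),
            )
    return reason is None
```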

Audit Logging

Every btmsg operation is logged to the audit_log table with event type, agent ID, and JSON details. Event types include: message_sent, message_read, channel_created, agent_registered, heartbeat, and prompt_injection_detected.


bttask — Task Board

bttask is a kanban-style task board that agents use to coordinate work. It shares the same SQLite database as btmsg (btmsg.db) for deployment simplicity.

Architecture

Agent (via bttask CLI)
    │
    ├── bttask list                    → list all tasks
    ├── bttask create "Fix auth bug"   → create task (Manager only)
    ├── bttask status <id> in_progress → update status
    ├── bttask comment <id> "Done"     → add comment
    └── bttask review-count            → count review queue tasks
         │
         ▼
btmsg.db → tasks table + task_comments table
    │
    ▼
Rust Backend (bttask.rs, ~300 lines)
    │
    ├── 7 Tauri commands: list, create, update_status, delete, add_comment, comments, review_queue_count
    └── Optimistic locking via version column
         │
         ▼
Frontend (bttask-bridge.ts → TaskBoardTab.svelte)
    └── Kanban board: 5 columns, 5s poll, drag-and-drop

Task Lifecycle

┌──────────┐   assign   ┌─────────────┐   complete   ┌──────────┐
│  Backlog  │──────────►│ In Progress  │────────────►│  Review   │
└──────────┘            └─────────────┘              └──────────┘
                                                          │
                                              ┌───────────┼───────────┐
                                              ▼                       ▼
                                         ┌────────┐             ┌──────────┐
                                         │  Done   │             │ Rejected │
                                         └────────┘             └──────────┘

When a task moves to the "Review" column, the system automatically posts a notification to the #review-queue btmsg channel. The ensure_review_channels() function creates #review-queue and #review-log channels idempotently on first use.

Optimistic Locking

To prevent concurrent updates from corrupting task state, bttask uses optimistic locking via a version column:

  1. Client reads task with current version (e.g., version=3)
  2. Client sends update with expected version=3
  3. Server's UPDATE query includes WHERE version = 3
  4. If another client updated first (version=4), the WHERE clause matches 0 rows
  5. Server returns a conflict error, client must re-read and retry

This is critical because multiple agents may try to update the same task simultaneously.
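
The five steps above can be sketched with a versioned UPDATE. The real implementation is in bttask.rs; this Python version assumes a reduced tasks table, and VersionConflict, make_task_db, and update_status are hypothetical names.

```python
import sqlite3


class VersionConflict(Exception):
    """Raised when the task changed since the caller last read it."""


def make_task_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tasks (id INTEGER PRIMARY KEY, title TEXT,"
        " status TEXT NOT NULL, version INTEGER NOT NULL DEFAULT 1)"
    )
    return conn


def update_status(conn, task_id, new_status, expected_version):
    """Apply the update only if the stored version still matches."""
    with conn:
        cur = conn.execute(
            "UPDATE tasks SET status = ?, version = version + 1"
            " WHERE id = ? AND version = ?",
            (new_status, task_id, expected_version),
        )
    if cur.rowcount == 0:
        # Zero rows matched: someone else updated first (or the task is gone).
        raise VersionConflict(
            f"task {task_id} is no longer at version {expected_version}"
        )
    return expected_version + 1
```

A caller that catches VersionConflict would re-read the task and retry with the fresh version.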

Role-Based Permissions

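| Role | List | Create | Update Status | Delete | Comments |
|------|------|--------|---------------|--------|----------|
| Manager | Yes | Yes | Yes | Yes | Yes |
| Reviewer | Yes | No | Yes (review decisions) | No | Yes |
| Architect | Yes | No | No | No | Yes |
| Tester | Yes | No | No | No | Yes |
| Project (Tier 2) | Yes | No | No | No | Yes |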

Permissions are enforced in the CLI tool based on the agent's role (read from BTMSG_AGENT_ID → agents table lookup).
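
The matrix above translates naturally into a lookup table. This is a sketch of how such a check might look, not the CLI's actual code; the role and action strings are assumptions.

```python
# Allowed bttask actions per role, transcribed from the permissions table.
PERMISSIONS = {
    "manager": {"list", "create", "update_status", "delete", "comment"},
    "reviewer": {"list", "update_status", "comment"},
    "architect": {"list", "comment"},
    "tester": {"list", "comment"},
    "project": {"list", "comment"},
}


def is_allowed(role, action):
    """Unknown roles get no permissions at all."""
    return action in PERMISSIONS.get(role.lower(), set())
```

Denying by default means an agent whose role lookup fails cannot modify the board.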

Review Queue Integration

The Reviewer agent gets special treatment in the attention scoring system:

  • reviewQueueDepth is an input to attention scoring: 10 points per review task, capped at 50
  • Priority: between file_conflict (70) and context_high (40)
  • ProjectBox polls review_queue_count every 10 seconds for reviewer agents
  • Results feed into setReviewQueueDepth() in the health store

Frontend (TaskBoardTab.svelte)

The kanban board renders 5 columns (Backlog, In Progress, Review, Done, Rejected) with task cards. Features:

  • 5-second polling for updates
  • Click to expand task details + comments
  • Manager-only create/delete buttons
  • Color-coded status badges

Wake Scheduler

The wake scheduler automatically re-activates idle Manager agents when attention-worthy events occur. It runs in wake-scheduler.svelte.ts and supports three user-selectable strategies.

Strategies

| Strategy | Behavior | Use Case |
|----------|----------|----------|
| Persistent | Sends a resume prompt to the existing session | Long-running managers that should maintain context |
| On-demand | Starts a fresh session | Managers that work in bursts |
| Smart | On-demand, but only when wake score exceeds threshold | Avoids waking for minor events |

Strategy and threshold are configurable per group agent via GroupAgentConfig.wakeStrategy and GroupAgentConfig.wakeThreshold fields, persisted in groups.json.
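
The difference between the three strategies can be summarized as a decision function. The real scheduler is in wake-scheduler.svelte.ts; this Python sketch assumes the strategy strings shown and assumes that Persistent and On-demand wake on any non-zero score.

```python
def plan_wake(strategy, score, threshold):
    """Return 'resume', 'fresh', or None (stay asleep)."""
    if score <= 0:
        return None
    if strategy == "persistent":
        return "resume"  # resume prompt to the existing session
    if strategy == "on-demand":
        return "fresh"   # start a new session
    if strategy == "smart":
        return "fresh" if score > threshold else None
    raise ValueError(f"unknown wake strategy: {strategy!r}")
```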

Wake Signals

The wake scorer evaluates 6 signals (defined in types/wake.ts, scored by utils/wake-scorer.ts):

| Signal | Weight | Trigger |
|--------|--------|---------|
| AttentionSpike | 1.0 | Any project's attention score exceeds threshold |
| ContextPressureCluster | 0.9 | Multiple projects have >75% context usage |
| BurnRateAnomaly | 0.8 | Cost rate deviates significantly from baseline |
| TaskQueuePressure | 0.7 | Task backlog grows beyond threshold |
| ReviewBacklog | 0.6 | Review queue has pending items |
| PeriodicFloor | 0.1 | Minimum periodic check (floor signal) |

The pure scoring function in wake-scorer.ts is tested with 24 unit tests. The types are in types/wake.ts (WakeStrategy, WakeSignal, WakeEvaluation, WakeContext).
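
This document does not state how the weights are combined. The sketch below assumes the simplest rule, that the strongest active signal sets the score; the actual function in wake-scorer.ts may combine signals differently.

```python
# Weights transcribed from the signal table.
SIGNAL_WEIGHTS = {
    "AttentionSpike": 1.0,
    "ContextPressureCluster": 0.9,
    "BurnRateAnomaly": 0.8,
    "TaskQueuePressure": 0.7,
    "ReviewBacklog": 0.6,
    "PeriodicFloor": 0.1,
}


def wake_score(active_signals):
    """Assumed rule: the score is the weight of the strongest active signal."""
    return max((SIGNAL_WEIGHTS[s] for s in active_signals), default=0.0)
```

Under this assumption, a manager on the smart strategy with a threshold of 0.5 would wake for a ReviewBacklog signal (0.6) but not for PeriodicFloor alone (0.1).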

Lifecycle

  1. ProjectBox registers manager agents via $effect on mount
  2. Wake scheduler creates per-manager timers
  3. Every 5 seconds, AgentSession polls wake events
  4. If score exceeds threshold (for smart strategy), triggers wake
  5. On group switch, clearWakeScheduler() cancels all timers
  6. In test mode (BTERMINAL_TEST=1), wake scheduler is disabled via disableWakeScheduler()

Health Monitoring & Attention Scoring

The health store (health.svelte.ts) tracks per-project health with a 5-second tick timer. It provides the data that feeds the StatusBar, wake scheduler, and attention queue.

Activity States

| State | Meaning | Visual |
|-------|---------|--------|
| Inactive | No agent running, no recent activity | Dim dot |
| Running | Agent actively processing | Green pulse |
| Idle | Agent finished, waiting for input | Gray dot |
| Stalled | Agent hasn't produced output for >N minutes | Orange pulse |

The stall threshold is configurable per-project via stallThresholdMin in ProjectConfig (default 15 min, range 5-60, step 5).
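
A simplified model of the state classification and the threshold bounds is sketched below. The real logic is in health.svelte.ts; the function names and inputs are hypothetical, and the "recent activity" part of the Inactive state is reduced to whether a session exists.

```python
def normalize_stall_threshold(minutes):
    """Snap to the 5-minute step and clamp to the 5-60 range."""
    stepped = 5 * round(minutes / 5)
    return max(5, min(60, stepped))


def activity_state(session_exists, is_processing, secs_since_output,
                   stall_threshold_min=15):
    if not session_exists:
        return "inactive"
    if not is_processing:
        return "idle"
    if secs_since_output > stall_threshold_min * 60:
        return "stalled"
    return "running"
```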

Attention Scoring

Each project gets an attention score (0-100) based on its current state. The attention queue in the StatusBar shows the top 5 projects sorted by urgency:

| Condition | Score | Priority |
|-----------|-------|----------|
| Stalled agent | 100 | Highest; agent may be stuck |
| Error state | 90 | Agent crashed or API error |
| Context >90% | 80 | Context window nearly full |
| File conflict | 70 | Two agents wrote same file |
| Review queue depth | 10 per task, capped at 50 | Reviewer has pending reviews |
| Context >75% | 40 | Context pressure building |

The pure scoring function is in utils/attention-scorer.ts (14 tests). It takes AttentionInput and returns a numeric score.
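
The table can be read as a function from project state to a single number. The sketch below is a Python rendering under two assumptions: the field names on AttentionInput are invented for illustration, and the highest applicable condition determines the score. The real function is the TypeScript one in utils/attention-scorer.ts.

```python
from dataclasses import dataclass


@dataclass
class AttentionInput:
    stalled: bool = False
    error: bool = False
    context_pct: float = 0.0
    file_conflict: bool = False
    review_queue_depth: int = 0


def score_attention(inp):
    """Return the highest score among all conditions that currently apply."""
    candidates = [0]
    if inp.stalled:
        candidates.append(100)
    if inp.error:
        candidates.append(90)
    if inp.context_pct > 90:
        candidates.append(80)
    if inp.file_conflict:
        candidates.append(70)
    candidates.append(min(10 * inp.review_queue_depth, 50))  # 10 per task, cap 50
    if inp.context_pct > 75:
        candidates.append(40)
    return max(candidates)
```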

Burn Rate

Cost tracking uses a 5-minute exponential moving average (EMA) of cost snapshots. The StatusBar displays aggregate $/hr across all running agents.
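
One common way to compute such an average from irregularly spaced snapshots is a time-weighted EMA, sketched below. Whether the health store uses exactly this formula is not stated here; the class name, the 300-second time constant interpretation, and the cumulative-cost input are assumptions.

```python
import math


class BurnRate:
    """Time-weighted EMA of spend rate in $/hr over cumulative cost snapshots."""

    def __init__(self, window_s=300.0):
        self.window_s = window_s
        self.rate_per_hr = 0.0
        self._last = None  # (timestamp_s, cumulative_cost_usd)

    def add_snapshot(self, t_s, total_cost_usd):
        if self._last is not None:
            dt = t_s - self._last[0]
            if dt > 0:
                sample = (total_cost_usd - self._last[1]) / dt * 3600.0
                alpha = 1.0 - math.exp(-dt / self.window_s)
                self.rate_per_hr += alpha * (sample - self.rate_per_hr)
        self._last = (t_s, total_cost_usd)
        return self.rate_per_hr
```

With a steady spend, the reported rate converges on the true $/hr within a few window lengths, while short bursts are smoothed out.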

File Conflict Detection

The conflicts store (conflicts.svelte.ts) detects two types of conflicts:

  1. Agent overlap — Two agents in the same worktree write the same file (tracked via tool_call analysis in the dispatcher)
  2. External writes — A file watched by an agent is modified externally (detected via inotify in fs_watcher.rs, uses 2s timing heuristic AGENT_WRITE_GRACE_MS to distinguish agent writes from external)

Both types show badges in ProjectHeader (orange for external, red ⚠ for agent overlap).
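
The two detection paths can be modeled as follows. Only AGENT_WRITE_GRACE_MS and its 2 s value come from this document; the ConflictTracker class and its methods are a hypothetical Python simplification of logic that actually spans conflicts.svelte.ts and fs_watcher.rs.

```python
AGENT_WRITE_GRACE_MS = 2000


class ConflictTracker:
    def __init__(self):
        self.writers = {}              # path -> set of agent IDs that wrote it
        self.last_agent_write_ms = {}  # path -> time of the latest agent write

    def record_agent_write(self, agent_id, path, now_ms):
        """Return True when a different agent has already written this path."""
        writers = self.writers.setdefault(path, set())
        overlap = any(w != agent_id for w in writers)
        writers.add(agent_id)
        self.last_agent_write_ms[path] = now_ms
        return overlap

    def is_external_write(self, path, event_ms):
        """A change with no agent write within the grace window counts as external."""
        last = self.last_agent_write_ms.get(path)
        return last is None or abs(event_ms - last) > AGENT_WRITE_GRACE_MS
```

Being a timing heuristic, the second check can misclassify an external edit that happens to land within the grace window of an agent write.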


Session Anchors

Session anchors preserve important conversation turns through Claude's context compaction process. Without anchors, valuable early context (architecture decisions, debugging breakthroughs) can be lost when the context window fills up.

Anchor Types

| Type | Created By | Behavior |
|------|------------|----------|
| Auto | System (on first compaction) | Captures first 3 turns, observation-masked (reasoning preserved, tool outputs compacted) |
| Pinned | User (pin button in AgentPane) | Marks specific turns as important |
| Promoted | User (from pinned) | Re-injectable into future sessions via system prompt |

Anchor Budget

The budget controls how many tokens are spent on anchor re-injection:

| Scale | Token Budget | Use Case |
|-------|--------------|----------|
| Small | 2,000 | Quick sessions, minimal context needed |
| Medium | 6,000 | Default, covers most scenarios |
| Large | 12,000 | Complex debugging sessions |
| Full | 20,000 | Maximum context preservation |

Configurable per-project via slider in SettingsTab, stored as ProjectConfig.anchorBudgetScale in groups.json.

Re-injection Flow

When a session resumes with promoted anchors:

  1. anchors.svelte.ts loads promoted anchors for the project
  2. anchor-serializer.ts serializes them (turn grouping, observation masking, token estimation)
  3. AgentPane.startQuery() includes serialized anchors in the system_prompt field
  4. The sidecar passes the system prompt to the SDK
  5. Claude receives the anchors as context alongside the new prompt
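
The interaction between the budget scale and re-injection can be sketched as below. The budget values come from the Anchor Budget table; the token estimate (about 4 characters per token), the stop-at-first-overflow rule, and the function names are assumptions, and the real serializer in anchor-serializer.ts also handles turn grouping and observation masking.

```python
ANCHOR_BUDGETS = {"small": 2_000, "medium": 6_000, "large": 12_000, "full": 20_000}


def estimate_tokens(text):
    return max(1, len(text) // 4)  # rough assumption: ~4 characters per token


def build_system_prompt(base_prompt, promoted_anchors, scale="medium"):
    """Append promoted anchors, in order, until the token budget is used up."""
    budget = ANCHOR_BUDGETS[scale]
    kept, used = [], 0
    for anchor in promoted_anchors:
        cost = estimate_tokens(anchor)
        if used + cost > budget:
            break
        kept.append(anchor)
        used += cost
    if not kept:
        return base_prompt, 0
    block = "\n\n".join(kept)
    return f"{base_prompt}\n\nSession anchors:\n{block}", used
```

The returned token count is the kind of figure a budget meter could display against the configured scale.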

Storage

Anchors are persisted in the session_anchors table in sessions.db. The ContextTab shows an anchor section with a budget meter (derived from the configured scale) and promote/demote buttons.