agent-orchestrator/docs/pro/features/knowledge-base.md
Hibryda b6c1d4b6af docs: add 11 new documentation files across all categories
2026-03-17 04:18:05 +01:00

Knowledge Base

This documentation covers Pro edition features available in the agents-orchestrator/agents-orchestrator private repository.

The Knowledge Base provides two complementary systems: Persistent Agent Memory (structured knowledge fragments with search and TTL) and the Codebase Symbol Graph (regex-based symbol extraction for code navigation context).


Persistent Agent Memory

Purpose

Persistent Agent Memory stores knowledge fragments that agents produce during sessions and makes them available in future sessions. Unlike session anchors (community feature, per-session), memory fragments persist across sessions and are searchable via FTS5.

Memory Fragments

A memory fragment is a discrete piece of knowledge with metadata:

interface MemoryFragment {
  id: number;
  projectId: string;
  content: string;          // The knowledge itself (plain text or markdown)
  source: string;           // Where it came from (session ID, file path, user)
  trustTier: TrustTier;     // agent | human | auto
  tags: string[];           // Categorization tags
  createdAt: string;        // ISO 8601
  updatedAt: string;        // ISO 8601
  expiresAt: string | null; // ISO 8601, null = never expires
  accessCount: number;      // Times retrieved for injection
}

type TrustTier = 'agent' | 'human' | 'auto';

Trust Tiers

| Tier | Source | Injection Priority | Editable |
|------|--------|--------------------|----------|
| human | Created or approved by user | Highest | Yes |
| agent | Extracted by agent during session | Medium | Yes |
| auto | Auto-extracted from patterns | Lowest | Yes |

When injecting memories into agent prompts, higher-trust memories are prioritized. Within the same tier, more recently accessed memories rank higher.
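The ordering rule above can be sketched as a two-key sort. This is a minimal illustration, not the shipped implementation: `rankForInjection` and `TIER_ORDER` are made-up names, and `lastAccessedAt` is a hypothetical field, since the documented fragment shape only carries `accessCount`.

```typescript
type TrustTier = 'agent' | 'human' | 'auto';

interface RankableMemory {
  id: number;
  trustTier: TrustTier;
  // Hypothetical field: the documented schema has no last-access timestamp.
  lastAccessedAt: string; // ISO 8601
}

// Lower number = injected first (human > agent > auto).
const TIER_ORDER: Record<TrustTier, number> = { human: 0, agent: 1, auto: 2 };

function rankForInjection(memories: RankableMemory[]): RankableMemory[] {
  // Copy before sorting so the caller's array is left untouched.
  return [...memories].sort((a, b) => {
    const byTier = TIER_ORDER[a.trustTier] - TIER_ORDER[b.trustTier];
    if (byTier !== 0) return byTier;
    // Same tier: most recently accessed first.
    return Date.parse(b.lastAccessedAt) - Date.parse(a.lastAccessedAt);
  });
}

const ranked = rankForInjection([
  { id: 1, trustTier: 'auto', lastAccessedAt: '2026-03-10T00:00:00Z' },
  { id: 2, trustTier: 'human', lastAccessedAt: '2026-01-01T00:00:00Z' },
  { id: 3, trustTier: 'agent', lastAccessedAt: '2026-03-01T00:00:00Z' },
  { id: 4, trustTier: 'agent', lastAccessedAt: '2026-03-15T00:00:00Z' },
]);
console.log(ranked.map((m) => m.id)); // [2, 4, 3, 1]
```

The human memory wins despite being the least recently accessed; recency only breaks ties inside a tier.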

TTL (Time-To-Live)

Memories can have an optional expiration date. Expired memories are excluded from search results and injection. A background cleanup runs on plugin init, deleting memories expired more than 30 days ago.

Default TTL by trust tier:

| Tier | Default TTL |
|------|-------------|
| human | None (permanent) |
| agent | 90 days |
| auto | 30 days |

Users can override TTL on any individual memory.
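As a rough sketch of the TTL rules, the helpers below resolve an expiration date from the tier defaults, check whether a memory is expired, and check whether it has passed the 30-day cleanup window. All function and constant names are hypothetical; only the day counts come from this page.

```typescript
type TrustTier = 'agent' | 'human' | 'auto';

// Default TTL in days per tier, from the table above; null = permanent.
const DEFAULT_TTL_DAYS: Record<TrustTier, number | null> = {
  human: null,
  agent: 90,
  auto: 30,
};

const DAY_MS = 24 * 60 * 60 * 1000;
const PURGE_GRACE_DAYS = 30; // cleanup deletes memories expired more than 30 days ago

// An explicit ttlDays overrides the tier default.
function resolveExpiresAt(tier: TrustTier, createdAt: Date, ttlDays?: number): string | null {
  const days = ttlDays !== undefined ? ttlDays : DEFAULT_TTL_DAYS[tier];
  return days === null ? null : new Date(createdAt.getTime() + days * DAY_MS).toISOString();
}

// Expired memories are hidden from search and injection.
function isExpired(expiresAt: string | null, now: Date): boolean {
  return expiresAt !== null && Date.parse(expiresAt) <= now.getTime();
}

// Only memories expired for longer than the grace period are deleted.
function isPurgeable(expiresAt: string | null, now: Date): boolean {
  return expiresAt !== null && now.getTime() - Date.parse(expiresAt) > PURGE_GRACE_DAYS * DAY_MS;
}

const created = new Date('2026-01-01T00:00:00Z');
console.log(resolveExpiresAt('agent', created)); // 2026-04-01T00:00:00.000Z
console.log(resolveExpiresAt('human', created)); // null
```

Keeping "expired" and "purgeable" as separate checks mirrors the text: an expired memory disappears from results immediately but stays in the database for another 30 days.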

Auto-Extraction

When an agent session completes, the dispatcher can trigger auto-extraction. The extractor scans the session transcript for:

  • Explicit knowledge statements ("I learned that...", "Note:", "Important:")
  • Error resolutions (error message followed by successful fix)
  • Configuration discoveries (env vars, file paths, API endpoints)

Extracted fragments are created with trustTier: 'auto' and default TTL. The user can promote them to agent or human tier.
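To make the first extraction rule concrete, here is a minimal sketch that pulls explicit knowledge statements out of a transcript. The marker regex and the function name are assumptions for illustration; the actual extractor's patterns are not documented here, and error-resolution and configuration detection are left out.

```typescript
interface ExtractedFragment {
  content: string;
  trustTier: 'auto'; // extracted fragments always start in the auto tier
}

// Hypothetical marker pattern covering the explicit-statement examples listed above.
const KNOWLEDGE_MARKER = /^\s*(?:I learned that|Note:|Important:)\s*(.+)$/i;

function extractExplicitStatements(transcript: string): ExtractedFragment[] {
  const fragments: ExtractedFragment[] = [];
  for (const line of transcript.split(/\r?\n/)) {
    const match = KNOWLEDGE_MARKER.exec(line);
    const text = match ? match[1] : undefined;
    if (text) {
      fragments.push({ content: text.trim(), trustTier: 'auto' });
    }
  }
  return fragments;
}

// Made-up transcript, for illustration only.
const sampleTranscript = [
  'Running test suite',
  'Note: integration tests need the database container running',
  'I learned that migrations must run before seeding',
  'Important: never commit the .env file',
  'Done',
].join('\n');

console.log(extractExplicitStatements(sampleTranscript).length); // 3
```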

Memory Injection

Before an agent session starts, the top-K most relevant memories are retrieved and formatted into a ## Project Knowledge section in the system prompt. Relevance is determined by FTS5 rank score against the project context (project name, CWD, recent file paths).

Default K = 5. Configurable per project via pro_memory_set_config.
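The formatting step might look like the sketch below. It assumes the memories arrive already sorted by relevance; the bullet layout with a tier label is a guess, since this page only specifies the `## Project Knowledge` heading and the default K of 5.

```typescript
interface InjectableMemory {
  content: string;
  trustTier: 'agent' | 'human' | 'auto';
}

const DEFAULT_TOP_K = 5;

// Assumes `memories` is already sorted by relevance (best first).
function formatProjectKnowledge(memories: InjectableMemory[], topK: number = DEFAULT_TOP_K): string {
  const selected = memories.slice(0, topK);
  if (selected.length === 0) return ''; // nothing to inject, so emit no section at all
  // Hypothetical bullet layout: one line per memory, prefixed with its tier.
  const bullets = selected.map((m) => `- [${m.trustTier}] ${m.content}`);
  return ['## Project Knowledge', '', ...bullets].join('\n');
}

const section = formatProjectKnowledge(
  [
    { content: 'Use pnpm, not npm', trustTier: 'human' },
    { content: 'Tests require a running database', trustTier: 'agent' },
  ],
  1,
);
console.log(section);
```

With `topK` set to 1, only the first memory is rendered under the heading.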

SQLite Schema

In agor_pro.db:

CREATE TABLE pro_memories (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    project_id TEXT NOT NULL,
    content TEXT NOT NULL,
    source TEXT NOT NULL,
    trust_tier TEXT NOT NULL DEFAULT 'auto',
    tags TEXT NOT NULL DEFAULT '[]',       -- JSON array
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    updated_at TEXT NOT NULL DEFAULT (datetime('now')),
    expires_at TEXT,
    access_count INTEGER NOT NULL DEFAULT 0
);

CREATE VIRTUAL TABLE pro_memories_fts USING fts5(
    content,
    tags,
    content=pro_memories,
    content_rowid=id
);

CREATE INDEX idx_pro_memories_project ON pro_memories(project_id);
CREATE INDEX idx_pro_memories_expires ON pro_memories(expires_at);

Commands

pro_memory_create

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `projectId` | `String` | Yes | Project identifier |
| `content` | `String` | Yes | Memory content |
| `source` | `String` | Yes | Origin (session ID, "user", file path) |
| `trustTier` | `String` | No | agent, human, or auto (default: auto) |
| `tags` | `Vec<String>` | No | Categorization tags |
| `ttlDays` | `u32` | No | Days until expiration (null = tier default) |

FTS5 search across memory fragments for a project.

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `projectId` | `String` | Yes | Project identifier |
| `query` | `String` | Yes | FTS5 search query |
| `limit` | `u32` | No | Max results (default: 10) |
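For orientation, a search over the schema above could be assembled roughly as follows. This is a sketch under assumptions, not the command's actual query: the helper names are invented, and quoting every term is one conservative way to pass free-form text to FTS5, at the cost of disabling operators such as `OR` and `NEAR`.

```typescript
interface SqlStatement {
  sql: string;
  params: (string | number)[];
}

// Quote each whitespace-separated term so FTS5 reads it as a string,
// not as query syntax. Embedded double quotes are escaped by doubling.
function toFtsQuery(input: string): string {
  return input
    .split(/\s+/)
    .filter((term) => term.length > 0)
    .map((term) => `"${term.replace(/"/g, '""')}"`)
    .join(' ');
}

function buildMemorySearch(projectId: string, query: string, limit: number = 10): SqlStatement {
  const sql = [
    'SELECT m.*',
    'FROM pro_memories_fts',
    'JOIN pro_memories AS m ON m.id = pro_memories_fts.rowid',
    'WHERE pro_memories_fts MATCH ?',
    '  AND m.project_id = ?',
    // datetime() normalizes both sides, so ISO 8601 values with "T" and "Z" compare correctly.
    "  AND (m.expires_at IS NULL OR datetime(m.expires_at) > datetime('now'))",
    'ORDER BY pro_memories_fts.rank',
    'LIMIT ?',
  ].join('\n');
  return { sql, params: [toFtsQuery(query), projectId, limit] };
}

console.log(toFtsQuery('sqlite "wal" mode')); // "sqlite" """wal""" "mode"
console.log(buildMemorySearch('demo-project', 'wal mode').params);
```

Because `pro_memories_fts` is declared as an external-content table, SQLite does not keep it in sync automatically; inserts, updates, and deletes on `pro_memories` have to be mirrored into the index (typically with triggers) for a query like this to return current rows.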

pro_memory_get, pro_memory_update, pro_memory_delete

Standard CRUD by memory ID.

pro_memory_list

List memories for a project with optional filters.

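| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `projectId` | `String` | Yes | Project identifier |
| `trustTier` | `String` | No | Filter by tier |
| `tag` | `String` | No | Filter by tag |
| `limit` | `u32` | No | Max results (default: 50) |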
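Since `tags` is stored as a JSON array in a TEXT column, a tag filter cannot be a plain equality check. The sketch below shows one possible way to build the list query with SQLite's `json_each`; the builder name and the `updated_at DESC` ordering are assumptions, not documented behavior.

```typescript
interface ListFilters {
  projectId: string;
  trustTier?: string;
  tag?: string;
  limit?: number;
}

function buildMemoryList(filters: ListFilters): { sql: string; params: (string | number)[] } {
  const where: string[] = ['project_id = ?'];
  const params: (string | number)[] = [filters.projectId];

  if (filters.trustTier !== undefined) {
    where.push('trust_tier = ?');
    params.push(filters.trustTier);
  }
  if (filters.tag !== undefined) {
    // json_each expands the JSON array so a single tag can be matched exactly.
    where.push('EXISTS (SELECT 1 FROM json_each(pro_memories.tags) WHERE json_each.value = ?)');
    params.push(filters.tag);
  }
  params.push(filters.limit ?? 50); // documented default limit

  // Ordering by updated_at is an assumption; the page does not specify a sort order.
  const sql = `SELECT * FROM pro_memories WHERE ${where.join(' AND ')} ORDER BY updated_at DESC LIMIT ?`;
  return { sql, params };
}

console.log(buildMemoryList({ projectId: 'demo-project', tag: 'sqlite' }).params);
// [ 'demo-project', 'sqlite', 50 ]
```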

pro_memory_inject

Returns formatted memory text for system prompt injection.

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `projectId` | `String` | Yes | Project identifier |
| `contextHints` | `Vec<String>` | No | Additional FTS5 terms for relevance |
| `topK` | `u32` | No | Number of memories to include (default: 5) |

pro_memory_set_config

Sets memory injection configuration for a project.

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `projectId` | `String` | Yes | Project identifier |
| `topK` | `u32` | No | Default injection count |
| `autoExtract` | `bool` | No | Enable auto-extraction on session end |

Codebase Symbol Graph

Purpose

The Symbol Graph provides structural code awareness by scanning source files with regex patterns to extract function/method/class/struct definitions and build a lightweight call graph. This gives agents contextual knowledge about code structure without requiring a full language server.

Scanning

The scanner processes files matching configurable glob patterns. Default extensions:

| Language | Extensions | Patterns Extracted |
|----------|------------|--------------------|
| TypeScript | .ts, .tsx | functions, classes, interfaces, type aliases, exports |
| Rust | .rs | functions, structs, enums, traits, impls |
| Python | .py | functions, classes, decorators |

Scanning is triggered manually or on project open. Results are stored in agor_pro.db. A full re-scan replaces all symbols for the project.
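A line-by-line regex scanner in this spirit might look like the sketch below. The pattern sets are deliberately small and hypothetical; they cover only a few of the constructs in the table above (no exports, impls, or decorators), and the real scanner's patterns are not reproduced here.

```typescript
type SymbolKind = 'function' | 'class' | 'struct' | 'enum' | 'trait' | 'interface' | 'type';

interface ScannedSymbol {
  name: string;
  kind: SymbolKind;
  line: number;      // 1-based
  signature: string; // the matching source line, trimmed
}

// Hypothetical patterns, keyed by file extension. The first match per line wins.
const PATTERNS: Record<string, Array<[SymbolKind, RegExp]>> = {
  ts: [
    ['function', /^\s*(?:export\s+)?(?:async\s+)?function\s+(\w+)/],
    ['class', /^\s*(?:export\s+)?class\s+(\w+)/],
    ['interface', /^\s*(?:export\s+)?interface\s+(\w+)/],
    ['type', /^\s*(?:export\s+)?type\s+(\w+)\s*=/],
  ],
  rs: [
    ['function', /^\s*(?:pub(?:\([^)]*\))?\s+)?(?:async\s+)?fn\s+(\w+)/],
    ['struct', /^\s*(?:pub(?:\([^)]*\))?\s+)?struct\s+(\w+)/],
    ['enum', /^\s*(?:pub(?:\([^)]*\))?\s+)?enum\s+(\w+)/],
    ['trait', /^\s*(?:pub(?:\([^)]*\))?\s+)?trait\s+(\w+)/],
  ],
  py: [
    ['function', /^\s*(?:async\s+)?def\s+(\w+)/],
    ['class', /^\s*class\s+(\w+)/],
  ],
};

function scanSource(source: string, extension: string): ScannedSymbol[] {
  const known = Object.prototype.hasOwnProperty.call(PATTERNS, extension);
  const patterns = known ? PATTERNS[extension] : [];
  const symbols: ScannedSymbol[] = [];
  source.split(/\r?\n/).forEach((text, index) => {
    for (const [kind, regex] of patterns) {
      const match = regex.exec(text);
      if (match) {
        symbols.push({ name: match[1], kind, line: index + 1, signature: text.trim() });
        break;
      }
    }
  });
  return symbols;
}

const rustSymbols = scanSource('pub fn run() {}\nstruct Config;\n', 'rs');
console.log(rustSymbols.map((s) => `${s.kind} ${s.name}@${s.line}`));
// [ 'function run@1', 'struct Config@2' ]
```

Because each line is matched in isolation, a definition split across lines or a keyword inside a comment can be missed or misreported.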

Symbol Types

interface CodeSymbol {
  id: number;
  projectId: string;
  filePath: string;         // Relative to project root
  name: string;             // Symbol name
  kind: SymbolKind;         // function | class | struct | enum | trait | interface | type
  line: number;             // Line number (1-based)
  signature: string;        // Full signature line
  parentName: string | null; // Enclosing class/struct/impl
}

type SymbolKind = 'function' | 'class' | 'struct' | 'enum' | 'trait' | 'interface' | 'type';

Commands

pro_symbols_scan

Triggers a full scan of the project's source files.

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `projectId` | `String` | Yes | Project identifier |
| `rootPath` | `String` | Yes | Absolute path to project root |
| `extensions` | `Vec<String>` | No | File extensions to scan (default: ts,rs,py) |

Search symbols by name (prefix match).

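| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `projectId` | `String` | Yes | Project identifier |
| `query` | `String` | Yes | Symbol name prefix |
| `kind` | `String` | No | Filter by symbol kind |
| `limit` | `u32` | No | Max results (default: 20) |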
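The prefix-match semantics can be illustrated with an in-memory filter. This is only a sketch: the function name is invented, the real command queries the stored symbols rather than an array, and case-sensitive matching is an assumption.

```typescript
type SymbolKind = 'function' | 'class' | 'struct' | 'enum' | 'trait' | 'interface' | 'type';

interface SymbolEntry {
  name: string;
  kind: SymbolKind;
  filePath: string;
  line: number;
}

// Case-sensitive prefix match (an assumption), optional kind filter, then limit.
function searchSymbols(
  symbols: SymbolEntry[],
  query: string,
  kind?: SymbolKind,
  limit: number = 20,
): SymbolEntry[] {
  return symbols
    .filter((s) => s.name.slice(0, query.length) === query)
    .filter((s) => kind === undefined || s.kind === kind)
    .slice(0, limit);
}

// Made-up index, for illustration only.
const symbolIndex: SymbolEntry[] = [
  { name: 'loadConfig', kind: 'function', filePath: 'src/config.ts', line: 12 },
  { name: 'Loader', kind: 'class', filePath: 'src/loader.ts', line: 3 },
  { name: 'load', kind: 'function', filePath: 'src/loader.ts', line: 40 },
  { name: 'reload', kind: 'function', filePath: 'src/loader.ts', line: 55 },
];

console.log(searchSymbols(symbolIndex, 'load').map((s) => s.name)); // [ 'loadConfig', 'load' ]
console.log(searchSymbols(symbolIndex, 'Load', 'class').map((s) => s.name)); // [ 'Loader' ]
```

Note that `reload` is not returned for `load`: a prefix match anchors at the start of the name, unlike a substring search.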

pro_symbols_find_callers

Searches for references to a symbol name across the project's scanned files using text matching.

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `projectId` | `String` | Yes | Project identifier |
| `symbolName` | `String` | Yes | Symbol to find references to |

Returns file paths and line numbers where the symbol name appears (excluding its definition).
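A text-matching reference search of this kind could be sketched as below. The whole-word boundary check and all names are assumptions; the page only states that matching is textual and that the definition itself is excluded.

```typescript
interface Reference {
  filePath: string;
  line: number; // 1-based
}

// Escape regex metacharacters so the symbol name is matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// `files` maps relative paths to file contents; `definition` is the line to skip.
function findCallers(
  files: Record<string, string>,
  symbolName: string,
  definition: Reference,
): Reference[] {
  // Whole-word match is an assumption; plain substring matching would also hit "reload".
  const pattern = new RegExp(`\\b${escapeRegExp(symbolName)}\\b`);
  const refs: Reference[] = [];
  for (const filePath of Object.keys(files).sort()) {
    files[filePath].split(/\r?\n/).forEach((text, index) => {
      const line = index + 1;
      const isDefinition = filePath === definition.filePath && line === definition.line;
      if (!isDefinition && pattern.test(text)) refs.push({ filePath, line });
    });
  }
  return refs;
}

const callers = findCallers(
  {
    'src/a.ts': 'function load() {}\nload();\n// load is called lazily',
    'src/b.ts': 'reload();',
  },
  'load',
  { filePath: 'src/a.ts', line: 1 },
);
console.log(callers.map((r) => `${r.filePath}:${r.line}`)); // [ 'src/a.ts:2', 'src/a.ts:3' ]
```

The hit on line 3 is a comment, not a call, which is the kind of false positive a purely textual search produces.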

pro_symbols_file

Returns all symbols in a specific file.

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `projectId` | `String` | Yes | Project identifier |
| `filePath` | `String` | Yes | Relative file path |

Limitations

  • Regex-based extraction is approximate. It does not parse ASTs and may miss symbols in unusual formatting or produce false positives in comments/strings.
  • find_callers uses text matching, not semantic analysis. It will find string matches in comments and string literals.
  • Large codebases (10,000+ files) may take several seconds to scan. Scanning runs on a background thread and emits a completion event.