# Row-Bot — Architecture & Detailed Design

> Full technical reference for every feature, module, and subsystem in Row-Bot.
> For a concise overview, see the [README](../README.md).

---

## Table of Contents

- [ReAct Agent Architecture](#react-agent-architecture)
- [Long-Term Memory & Knowledge Graph](#long-term-memory--knowledge-graph)
- [Wiki Vault](#wiki-vault)
- [Dream Cycle](#dream-cycle)
- [Document Knowledge Extraction](#document-knowledge-extraction)
- [Brain Model & Providers](#brain-model--providers)
- [Embeddings & Vector Indexing](#embeddings--vector-indexing)
- [Voice Input & Text-to-Speech](#voice-input--text-to-speech)
- [Shell Access](#shell-access)
- [Browser Automation](#browser-automation)
- [Vision](#vision)
- [Workflows & Scheduling](#workflows--scheduling)
- [Designer Studio](#designer-studio)
- [Developer Studio](#developer-studio)
- [Custom Tools](#custom-tools)
- [Row-Bot Status & Identity](#row-bot-status--identity)
- [Self-Knowledge & Insights](#self-knowledge--insights)
- [Messaging Channels](#messaging-channels)
- [Tunnel Manager](#tunnel-manager)
- [X (Twitter) Tool](#x-twitter-tool)
- [Tool Guides](#tool-guides)
- [Image Generation](#image-generation)
- [Video Generation](#video-generation)
- [MCP Client & External Tools](#mcp-client--external-tools)
- [Migration Wizard](#migration-wizard)
- [Plugin System & Marketplace](#plugin-system--marketplace)
- [Auto-Updates](#auto-updates)
- [Habit & Health Tracker](#habit--health-tracker)
- [Desktop App](#desktop-app)
- [Chat & Conversations](#chat--conversations)
- [Notifications](#notifications)
- [Stability & Diagnostics](#stability--diagnostics)
- [Bundled Skills](#bundled-skills)
- [Core Modules](#core-modules)
- [Data Storage](#data-storage)
- [Comparison with Other Tools](#comparison-with-other-tools)

---

## ReAct Agent Architecture

- **Autonomous tool use** — the agent decides which tools to call, when, and how many times, based on your question
- **30+ core tools plus Developer, Custom Tool, and auto-generated channel tools** — web search, email, calendar, file management, shell access, browser automation, vision, image generation, video generation, X (Twitter), a personal knowledge graph, Designer Studio, Developer Studio, Custom Tool Builder, scheduled workflows, habit tracking, Row-Bot Status self-inspection, external MCP tools, and more
- **Streaming responses** — tokens stream in real-time with a typing indicator
- **Thinking indicators** — shows when the model is reasoning before responding
- **Smart context management** — automatic conversation summarization compresses older turns when token usage exceeds 80% of the context window, preserving the 5 most recent turns and a running summary; a hard trim at 85% drops oldest messages as a safety net; oversized tool outputs are proportionally shrunk so multi-tool chains fit within context; accurate token counting via tiktoken (cl100k_base)
- **Dynamic tool budgets** — the agent automatically adjusts how many tools are exposed to the model based on available context headroom; when context usage is high, lower-priority tools are temporarily hidden to prevent the system prompt from crowding out conversation history
- **Runtime readiness routing** — before building the graph, selected models are evaluated for context headroom, provider capability metadata, tool support, and surface requirements; full agent mode, chat-only mode, and blocked states are explicit outcomes rather than accidental provider failures
- **Chat-only runtime** — models that are useful for normal conversation but cannot reliably accept tool schemas use a compact tool-free prompt, a shaped transcript without full tool bodies, and the normal streaming/persistence path
- **Provider transcript normalization** — model-facing histories are checked for duplicate tool-call IDs, orphan tool results, invalid tool calls, empty assistant turns, and unsafe reasoning/tool artifacts before replay to custom or hosted providers
- **Centralized prompts plus self-knowledge injection** — base prompt templates live in `prompts.py`, while `self_knowledge.py` injects a dynamic identity line, capability manifest, and live runtime state so Row-Bot can describe itself accurately without stale hard-coded copy
- **Live token counter** — progress bar in the sidebar shows real-time context window usage based on trimmed (model-visible) history
- **Graceful stop & error recovery** — stop button cleanly halts generation with drain timeout; agent tool loops are caught automatically with mode-aware budgets (normal chat, workflows, and long Developer turns have separate limits) and wind-down prompts; orphaned tool calls are repaired; provider/API errors are surfaced as persistent red toasts and saved to the conversation checkpoint so they survive thread refresh
- **Workflow cancellation** — running background workflows can be stopped from the chat header, activity panel, or workflow card; cancellation is checked between every LangGraph node for clean shutdown
- **Displaced tool-call auto-repair** — if context trimming displaces tool-call/response pairs, the agent automatically detects and repairs the ordering before the next LLM call; orphaned tool calls trigger an automatic retry
- **Grouped tool traces** — repeated tool calls of the same type are grouped into expandable transcript entries, keeping long research, browser, and Developer runs readable while preserving individual results
- **Thinking retention** — non-empty reasoning/thinking text is preserved across streaming, detached reattach, checkpoint loading, and final transcript rendering without treating reasoning-only chunks as user-visible final answers
- **Date/time awareness** — current date and time is injected into every LLM call so the model always knows "today"
- **Destructive action confirmation** — dangerous operations (file deletion, sending emails, deleting calendar events, deleting memories, deleting workflows, selected settings changes) require explicit user approval via an interrupt mechanism
- **Workflow-scoped background permissions** — background workflows use a tiered system: safe operations always run, low-risk operations (move file, move calendar, send email) are allowed with optional runtime guards, and irreversible operations (delete file, delete memory) are always blocked; shell commands and email recipients can be allowlisted per-workflow via the editor UI

---

## Long-Term Memory & Knowledge Graph

Row-Bot doesn't just store isolated facts — it builds a **personal knowledge graph**: a connected web of people, places, preferences, events, and their relationships. Every memory is an entity linked to others through typed relations, so the agent can reason about how things in your life connect.

- **Entity-relation model** — memories are stored as entities with a type, subject, description, aliases, and tags; entities are connected by typed directional relations (e.g. `Dad --[father_of]--> User`, `User --[lives_in]--> London`)
- **10 entity types** — `person`, `preference`, `fact`, `event`, `place`, `project`, `organisation`, `concept`, `skill`, `media`
- **Memory tool** — 7 sub-tools let the agent save, search, list, update, delete, **link**, and **explore** memories through natural conversation
- **Link memories** — the agent can create relationships between any two entities, building a richer graph over time
- **Explore connections** — the agent can traverse the graph outward from any entity, discovering chains of relationships for broad questions like family, work, and projects
- **Interactive memory visualization** — a dedicated **Knowledge** surface renders the entire knowledge graph as an interactive network diagram with search, filters, full-graph / ego-graph toggle, and detail cards
- **Bounded auto-recall policy** — before every response, `memory_policy.py` builds a deterministic recall query, retrieves candidates, scores them against tier/status/confidence/evidence/recency/query fit, applies a context-aware token budget, and records a compact recall trace for diagnostics
- **Hybrid recall candidates** — recall combines FAISS semantic search, FTS5 lexical search, keyword fallback, and graph-neighbor expansion; strong seed memories can pull in related entities with relation confidence and hop metadata
- **Recall-safe retrieval** — candidate inspection does not mutate memory state; only memories actually injected into the turn are reinforced with `recalled_at` and recall-count metadata
- **Automatic memory extraction** — a background process scans past conversations on startup and every 6 hours, extracting entities and relations the agent missed during live conversation; active threads and workflow threads are excluded; assistant messages are truncated to 200 chars to prevent extracting from AI-generated content; low-confidence relations are skipped and conflicting facts can be marked for review instead of overwriting high-authority user edits
- **Deterministic deduplication** — both live saves and background extraction check for existing entities by normalized subject before creating new entries; cross-category matching prevents fragmentation; alias resolution ensures related names merge; richer content is always kept
- **Memory evolution metadata** — `memory_evolution.py` normalizes status (`active`, `needs_review`, `superseded`, `archived`), tier (`core`, `semantic`, `episodic`, `resource`), confidence, evidence, source context, manual edits, superseding, archival, and journal entries
- **Vague-type banning** — `related_to`, `associated_with`, `connected_to`, `linked_to`, `has_relation`, `involves`, and `correlates_with` are rejected before saving, preventing noisy low-value edges
- **Relation pre-normalization** — alias forms are canonicalized before ban, confidence, and dedup checks
- **67 valid relation types** — curated vocabulary with 60+ alias mappings plus document-specific relations like `extracted_from`, `uploaded`, `builds_on`, `cites`, `extends`, and `contradicts`
- **Source and audit tracking** — each entity is tagged with its origin (`live`, `extraction`, `dream_*`, document-derived, wiki-synced, or manual) plus audit metadata such as status, tier, confidence, evidence, source context, and user-modified timestamps
- **Semantic and lexical recall indexes** — FAISS vectors are backed by the configured embedding provider, while an optional FTS5 entity index supports exact/keyword recall and fallback search
- **Memory IDs in context** — auto-recalled memories include their IDs so the agent can update or delete specific entries when the user corrects previously saved information
- **Consolidation utilities** — built-in duplicate consolidation merges near-duplicate memories that may accumulate over time
- **Local SQLite + NetworkX + FAISS storage** — entities and relations live in `~/.row-bot/memory.db`, mirrored in a NetworkX graph for traversal, with FAISS vectors in `~/.row-bot/memory_vectors/`
- **Knowledge audit UI** — browse, search, visualize, review, restore, supersede, archive, and bulk-delete memories from the Knowledge tab and entity editor, including graph statistics, audit badges, recall traces, and the memory evolution journal

---

## Wiki Vault

The knowledge graph can be exported as a structured **Obsidian-compatible markdown vault** — one `.md` file per entity with YAML frontmatter, `[[wiki-links]]`, and auto-generated indexes.

- **Vault structure** — entities grouped by type (`wiki/person/`, `wiki/project/`, `wiki/event/`, etc.) with one `.md` file per entity; sparse entities (<20 chars) roll up into `_index.md` per type; per-type indexes and a master `index.md` are auto-generated on rebuild
- **YAML frontmatter** — each article includes `id`, `type`, `subject`, `aliases`, `tags`, `source`, `created`, and `updated` metadata
- **Wiki-links** — related entities linked via `[[Entity Name]]` syntax, enabling Obsidian backlinks and graph view
- **Connections section** — outgoing and incoming relations listed with arrow notation
- **Live export** — entities are exported on save, deleted on entity removal, and rebuilt on batch operations
- **Search** — full-text search across all `.md` files with title, snippet, and entity ID results
- **Conversation export** — any thread can be exported as a vault-compatible markdown file
- **Agent tools** — `wiki_read`, `wiki_rebuild`, `wiki_stats`, and `wiki_export_conversation` let the agent interact with the vault directly
- **Settings UI** — enable/disable toggle, vault path configuration, stats display, rebuild, and open-folder actions in the Knowledge tab

---

## Dream Cycle

A 5-phase background daemon refines the knowledge graph during idle hours and ends with an insight-generation pass over recent system activity.

- **Phase 1: Duplicate merge** — entities with ≥0.93 semantic similarity and same type are merged; the LLM synthesizes the best description, aliases are unioned, and relations are re-pointed to the survivor
- **Subject-name guard** — entities with different normalized subjects require ≥0.98 similarity to merge, preventing false merges of distinct people or concepts
- **Phase 2: Description enrichment** — thin entities (<80 chars) appearing in multiple conversations get richer descriptions from conversation context and graph neighborhood
- **Phase 3: Confidence decay** — stale `dream_infer` relations older than 90 days lose 10% confidence per cycle; very low-confidence edges are pruned automatically
- **Phase 4: Relationship inference** — co-occurring entity pairs with no meaningful edge are evaluated for a specific typed relation; hub diversity caps, batch rotation, half-overlap reuse, multi-excerpt evidence, and a 7-day rejection cache improve quality and reduce repetition
- **Phase 5: Insights analysis** — the system captures a snapshot of recent logs, provider/model/media configuration, channels, task state, memory stats, skills, and existing insights, feeds it to `DREAM_INSIGHTS_PROMPT`, and stores actionable results in `insights.py`
- **Three-layer anti-contamination** — sentence-level excerpt filtering, deterministic post-enrichment validation, and strengthened prompting prevent cross-entity fact bleed
- **Ollama busy check** — queries `/api/ps` before starting; defers if Ollama is actively serving a user request to avoid competing for GPU or CPU
- **Configurable window** — default 1–5 AM local time; checks every 30 minutes if enabled, idle, in window, and not yet run that day
- **Dream journal** — all operations logged to `~/.row-bot/dream_journal.json` with cycle ID, summary, duration, merges, enrichments, inferences, insights, and errors
- **Post-cycle rebuilds** — FAISS is rebuilt after the cycle, and the wiki vault is regenerated when enabled so downstream views stay in sync
- **Manual trigger** — a dedicated Dream button in the Knowledge surface can start the cycle immediately
- **Settings UI** — enable/disable toggle, quiet window controls, and last-run summary in the Preferences tab

---

## Document Knowledge Extraction

Uploaded documents are processed through a three-phase **map-reduce LLM pipeline** that extracts structured knowledge into the graph with full source provenance.

- **Map phase** — each document is split into ~6K-char windows; each window is summarized to 3–5 sentences
- **Reduce phase** — window summaries are combined into a coherent 300–600 word article
- **Extract phase** — core entities and relations are pulled from the final article; extraction is capped at 12 entities per document to prevent over-extraction
- **Curated relation vocabulary** — 67 valid relation types with 60+ alias mappings remove unknown-type churn and keep document-derived edges consistent with live memory extraction
- **Hub entity** — the document itself is saved as a `media` entity; extracted entities link back via `extracted_from` for provenance tracking
- **Quality gates** — minimum description length, self-loop rejection, and vague-relation bans keep output usable
- **Resource memory metadata** — document hub entities and extracted facts are written with resource-tier provenance, confidence, evidence, and audit fields so document knowledge can be reviewed without overwriting personal memories blindly
- **Cross-window dedup** — repeated entities across windows are merged before saving
- **Cross-source merge protection** — document-derived content uses a stricter semantic merge threshold when it resembles a personal entity, reducing the chance of impersonal text overwriting personal knowledge
- **Supported formats** — PDF, DOCX, TXT, Markdown, HTML, and EPUB
- **Live progress** — the status bar shows an extraction pill with phase, progress, queue depth, and a stop action
- **Background queue** — worker thread processes one document at a time
- **Per-document cleanup** — removing a document deletes vector-store entries and extracted graph content associated with its source tag

---

## Brain Model & Providers

The brain model is Row-Bot's default LLM — the model used for conversations, memory extraction, dream analysis, and any thread or workflow without a specific override. It is selected during setup or later from Settings, and can come from the supported local runtime, a hosted provider, ChatGPT / Codex, Ollama Cloud, or a custom OpenAI-compatible endpoint.

Row-Bot is local-first in its data model, but model routing is provider-neutral. Local models remain a first-class path for offline and private use, while hosted and self-hosted models can be selected per thread, workflow, Developer workspace, or media surface. The setup wizard determines the initial default; on the local path, Row-Bot uses one of the models already exposed by the local runtime, with 14B-class models recommended for stronger agent/tool behavior.

Provider models are supported for users without a dedicated GPU, for frontier reasoning on demand, or for trying many providers without downloading large local weights. Row-Bot supports opt-in provider models through **OpenAI** (direct API), **Anthropic** (Claude), **Google AI** (Gemini), **xAI** (Grok), **MiniMax** (M2 models through the Anthropic-compatible API), **OpenRouter** (many third-party models), **Ollama Cloud** (direct API and local daemon cloud-tagged models), **ChatGPT / Codex** (subscription-backed Codex models), and Custom/Self-hosted OpenAI-compatible endpoints such as oMLX, LM Studio, vLLM, llama.cpp, LocalAI, LiteLLM, SGLang, or private gateways. Provider connections, health, and credential sources are configured from Settings -> Providers; model catalog browsing, pinning, and defaults live in Settings -> Models.

The `providers/` subsystem now owns provider config, auth metadata, model catalog normalization, runtime construction, display-safe status, runtime readiness, and Quick Choices. Model selections are preserved as provider-qualified refs (`model:<provider>:<model>`) at UI and settings boundaries so a local/custom model does not silently fall back to OpenRouter when another provider has the same or unknown bare model id. Existing public functions in `models.py` remain as compatibility facades while provider-backed selection is rolled through the app. Settings -> Models pickers are intentionally Quick Choice surfaces: catalog rows must be pinned before they become everyday Brain, Vision, Image, or Video choices, while the current default can still appear as a fallback value. `providers/model_catalog_cache.py` refreshes hosted-provider and local-runtime catalog rows in the background so Settings can render from cache without blocking on large remote catalogs.

Runtime readiness is evaluated before agent execution. `providers/readiness.py` and `providers/resolution.py` resolve the selected model/provider, inspect cached capability snapshots, probe uncertain local/custom models when needed, compare the effective context window against tool-schema requirements, and return one of three outcomes: full agent mode, chat-only mode, or blocked with user-facing guidance. Forced-agent surfaces such as workflow execution, approval resumes, and Designer text generation request agent mode explicitly; normal chat can fall back to chat-only mode when a model is conversationally useful but not tool-compatible.

ChatGPT / Codex is deliberately modeled as a subscription provider, not as another OpenAI API-key route. Direct Codex runtime requires Row-Bot's in-app ChatGPT device-flow sign-in so Row-Bot stores its own runnable OAuth tokens in the local OS credential store. Existing Codex CLI auth files can be referenced only as display-safe metadata: Row-Bot records that the external login exists, path/fingerprint metadata, and broad auth-file shape, but it does not copy runnable tokens from `~/.codex/auth.json`.

Codex runtime uses ChatGPT's subscription/internal Codex backend rather than the public OpenAI API. That means endpoint behavior, catalog shape, auth requirements, rate limits, and model availability may change upstream. When a ChatGPT / Codex model is selected, the current conversation plus model-visible tool context and tool results are sent to ChatGPT / Codex for that turn. Durable Row-Bot data such as memories, documents, files, and other conversations remain local unless explicitly included in the active conversation or surfaced by a tool result.

- **Dynamic model switching** — change the brain model from Settings or approved `row_bot_update_setting` calls; choices are validated against pinned local/provider Quick Choices, installed local models, and provider catalogs before saving
- **Per-thread & per-workflow model override** — conversations and workflows can each run on a different model, with overrides persisted locally
- **Quick Choices** — models pinned from the consolidated Models catalog appear in chat, workflow, channel, Designer, status-tool, and Vision pickers when their capability snapshot supports that surface
- **Cost-efficient context management** — smart context trimming compresses older conversation turns and shrinks oversized tool outputs, reducing token usage and API costs for provider models
- **Local catalog accuracy** — installed Ollama chat models remain visible even when their family is newer than Row-Bot's curated tool/vision heuristics, while embedding-like local models are kept out of chat choices and Vision support is only inferred from known metadata/families
- **Ollama Cloud paths** — direct Ollama Cloud API keys and local daemon `:cloud` models are represented separately while sharing catalog normalization and display metadata; direct API errors are normalized into user-facing provider messages
- **Tool-support validation** — unsupported or uncertain local/custom models are warned about, can be probed with a real tool round-trip, and route to agent, chat-only, or blocked mode based on the result
- **Custom endpoint compatibility profiles** — OpenAI-compatible endpoints can use oMLX, LM Studio, vLLM, llama.cpp, LocalAI, LiteLLM, SGLang, or generic profiles to normalize message content, tool history, unsupported parameters, streaming behavior, and request-time context settings
- **Custom endpoint probes** — self-hosted/proxy endpoints can be probed for model catalog access, streaming deltas, tool-call round trips, native context metadata, and no-auth behavior; results are persisted with provider metadata for later routing decisions
- **Configurable context window** — local and provider context caps can be set independently; actual model limits are still respected, override caches are invalidated when caps change, and custom endpoint profiles can pass capped context request parameters when the backend supports them
- **Provider transcript hygiene** — provider-facing message histories are normalized to strip invalid tool calls, rewrite duplicate tool-call IDs, drop orphan tool results, flatten tool history for non-tool profiles, and preserve or suppress reasoning fields depending on endpoint support
- **Local & provider indicators** — the UI clearly distinguishes downloaded local models, missing local models, and connected provider models
- **Provider vision detection** — provider models with image capability are detected and reused by the Vision feature when available

---

## Embeddings & Vector Indexing

Embeddings are configured separately from chat models so users can choose the privacy/performance tradeoff that fits document search, memory recall, and knowledge graph rebuilds.

- **`embedding_config.py`** — persists the selected embedding provider, model, dimension metadata, and privacy-related settings
- **`embedding_providers.py`** — normalizes local and cloud embedding backends behind one interface used by document search, memory recall, and graph/vector rebuilds
- **Local choices** — local embeddings keep vectorization on-device, with Qwen as the high-quality fallback and additional runtime-downloaded options such as Nomic and Mixedbread/MXBAI-style models
- **Cloud option** — cloud embeddings can be enabled explicitly in Settings and show privacy copy because document or memory text is sent to the chosen embedding provider
- **Stale-index detection** — vector stores record embedding provider and dimension metadata so Row-Bot can detect when a document or memory index was built with a different embedding configuration
- **Memory release** — heavy document and extraction jobs release cached embedding resources after use to reduce long-session RSS growth
- **Settings integration** — embedding provider controls live in the model/settings surfaces without overloading the chat model picker

---

## Voice Input & Text-to-Speech

- **Toggle-based voice** — simple manual toggle to start and stop listening, no wake word required
- **4-state pipeline** — stopped → listening → transcribing → muted, with explicit state transitions and mic gating during playback
- **Local speech-to-text** — transcription via faster-whisper (tiny/base/small/medium), CPU-only int8 quantization, no cloud APIs required
- **Voice-aware responses** — voice input is tagged so the agent knows you are speaking and can respond more conversationally
- **Neural TTS** — high-quality text-to-speech via Kokoro, fully offline
- **10 voice options** — US and British English, male and female variants
- **Streaming TTS** — responses are spoken sentence-by-sentence as they stream in
- **Mic gating** — microphone is automatically muted during TTS playback to prevent echo and feedback loops
- **Hands-free mode** — voice input plus streaming TTS creates a fully conversational loop

---

## Shell Access

- **Full shell access** — the agent can run shell commands on your machine through natural conversation
- **Persistent sessions** — `cd`, environment variables, and other shell state persist across commands within a conversation; each thread gets its own isolated shell session
- **3-tier safety classification** — commands are classified as safe, moderate, or blocked before execution
- **Safe commands run instantly** — read-only operations like `ls`, `pwd`, `cat`, `git status`, or `pip list` execute without interruption
- **Dangerous commands require approval** — destructive or system-modifying commands trigger an interrupt so you can accept or reject them
- **Blocked by default** — high-risk commands like `shutdown`, `reboot`, or `mkfs` are rejected outright
- **Background safety integration** — safe commands always execute; moderate commands are blocked by default in workflows but can be allowlisted per workflow; dangerous commands remain blocked
- **Inline terminal panel** — command output appears in a collapsible terminal panel in the chat UI with clear and history controls
- **History persistence** — shell history is saved per thread in `~/.row-bot/shell_history.json`

---

## Browser Automation

- **Full browser automation** — the agent can navigate websites, click elements, fill forms, scroll pages, and manage tabs in a real, visible Chromium window
- **Shared visible browser** — runs with `headless=False` so you can see what the agent is doing and intervene when needed
- **Persistent profile** — cookies, logins, and local storage survive across restarts in `~/.row-bot/browser_profile/`
- **Accessibility-tree snapshots** — after every action the tool captures the page's accessibility tree with numbered references so the model can click and type by number
- **Smart snapshot filtering** — deduplicates links, drops hidden elements, and caps interactive elements to keep context under control
- **Snapshot compression** — older browser snapshots are compressed to short stubs while the latest state remains detailed
- **7 browser operations** — navigate, click, type, scroll, snapshot, back, and tab management
- **Per-thread tab isolation** — each chat thread or background workflow gets its own browser tab; tabs are cleaned up on thread deletion or workflow completion
- **Automatic browser detection** — prefers installed Chrome, then Edge on Windows, then Playwright's bundled Chromium
- **Crash recovery** — if the browser closes externally, the next action relaunches it cleanly

---

## Vision

- **Camera analysis** — capture and analyze images from your webcam in real-time
- **Screen capture** — take screenshots and ask questions about what is on your screen
- **Image file analysis** — analyze workspace image files by path without needing a camera or live capture
- **Configurable vision model** — choose from pinned local or provider-capable Vision Quick Choices, including ChatGPT / Codex rows whose catalog metadata reports image input support
- **Camera selection** — pick which camera to use when multiple devices are present
- **Inline image display** — captured and workspace images are shown inline in chat
- **Provider vision support** — provider models with image capability are auto-detected and work alongside local vision models; Quick Choice refresh preserves provider-specific Vision metadata instead of downgrading rows through generic text-only heuristics

---

## Workflows & Scheduling

Tasks have been renamed to **Workflows** throughout the application. The workflow engine adds a step-based pipeline runner, delivery routing, approvals, triggers, and safety gating on top of APScheduler.

### Core Engine

- **Unified workflow engine** — named multi-step workflows run sequentially in a fresh or persistent thread and are scheduled through APScheduler
- **SQLite schema recovery** — `tasks.py` validates the workflow database schema before use, repairs partial schemas in place, backs up and recreates corrupt DBs, and retries schema-related operations once after repair
- **7 schedule types** — `daily`, `weekly`, `weekdays`, `weekends`, `interval`, `cron`, and one-shot `delay_minutes`
- **Template variables** — prompts can use `{{date}}`, `{{day}}`, `{{time}}`, `{{month}}`, `{{year}}`, `{{task_id}}`, and `{{step.X.output}}`
- **Per-workflow model override** — each workflow can force a different model, then restore the default after completion
- **Skills and tools overrides** — workflows can narrow the skill set globally and the tool set per step
- **Channel delivery** — workflow output can inherit the workflow-level default delivery channels or use a per-workflow override via `delivery_channel` and `delivery_target`; web-app run status is always preserved
- **Persistent threads** — workflows can reuse the same thread across runs to preserve context
- **Notify-only mode** — workflows can skip agent execution and just send notifications
- **Webhook triggers** — workflows can be launched by HTTP webhook with per-workflow secrets
- **Completion triggers** — one workflow can trigger another after finishing
- **Concurrency groups** — related workflows can be serialized so only one runs at a time
- **Safety mode** — `block_destructive`, `require_approval`, and `allow_all` modes control shell, workflow, and channel behavior inside background execution

### Step-Based Pipelines

- **5 step types** — Prompt, Condition, Approval, Subtask, and Notify
- **Conditional branching** — condition steps support `contains`, `not_contains`, `regex`, `json_path`, and `llm_evaluate`, each with `if_true` / `if_false` targets
- **Approval gates** — approval steps pause execution, route requests through channels and desktop notifications, and resume on explicit user approval or denial
- **Prompt chaining** — each step can see previous step output, enabling research → summarize → act patterns
- **Agent-callable workflows** — the task tool can create, update, and run full step graphs programmatically

### Workflow Builder UI

- **Simple/Advanced toggle** — simple mode preserves a single-prompt workflow editor; advanced mode exposes the full step builder
- **Step builder** — reorder, delete, retarget, and retype steps visually
- **Variable insertion menu** — context variables and prior-step outputs can be inserted without hand typing placeholders
- **Flow preview** — Mermaid diagram generated from the step graph with manual refresh
- **Validation** — required-field checks, reference validation, and operator-specific rules run before save
- **Delivery defaults UI** — the Workflows panel exposes a compact default-delivery selector; workflows tied to default update when the global default changes, while explicit overrides remain untouched

### Approval System

- **Pending approvals panel** — approval cards show task name, request text, and Approve / Deny controls
- **Sidebar badge** — pending approvals surface as a badge and quick actions above the thread list
- **Multi-channel routing** — approvals can be routed through Telegram and desktop notifications with inline action controls
- **Resume integration** — the agent and workflow runtime resume correctly on approve or deny and follow the appropriate branch

### Workflow Console

- **Right-side console** — `ui/command_center.py` exposes running work, approvals, upcoming runs, quick launch actions, recent history, and insights in one drawer
- **Collapsible layout** — console expansion state persists in browser and pywebview; collapsed state shows compact running/approval/insight badges and attention styling when an approval is waiting
- **Live operational view** — running workflows, background states, and recent outcomes stay visible while you continue chatting elsewhere in the app
- **Insight actions** — insight cards support pin, dismiss, and apply actions directly from the console
- **Journal access** — extraction and dream journals are accessible from the same workflow-centric monitoring surfaces

### Existing Features

- **Always-background execution** — workflows run without blocking the main chat UI
- **Pre-built templates** — seeds five disabled starter workflows across simple and advanced examples; nothing is scheduled or run until the user enables it
- **Home screen dashboard** — Workflows and Activity tabs show tiles, upcoming runs, run history, channel status, pending approvals, extraction journal, and dream journal
- **Persistent run history** — execution history survives workflow deletion for auditability
- **Monitoring / polling** — interval schedules plus condition steps support ongoing monitors like price checks or release watchers
- **Stop / cancel support** — running workflows can be stopped from the chat header, activity panel, or workflow card

---

## Designer Studio

Designer Studio is Row-Bot's dedicated visual-authoring subsystem. It spans five distinct **project modes**, a sandboxed interactive runtime, an authoring guardrail stack, and a mutation-reviewable tool surface for editing projects turn over turn.

### Project Modes

Every project is created in one of five modes. Each mode carries its own canvas presets, template gallery, prompt budgets, critique rules, runtime behavior, and export targets.

- **`deck`** — traditional slide decks; 16:9 canvas; ≤5 bullets per slide; PPTX export via `python-pptx` preserves editable text runs, images, and charts
- **`document`** — long-form report / one-pager pages; A4 or letter canvas; 130–160 words per block; PDF export is the primary delivery format
- **`landing`** — interactive marketing landing pages; vertical scroll canvas; CTAs and multi-section hero / feature / pricing layouts; published as interactive HTML
- **`app_mockup`** — multi-screen app prototypes; route-aware navigator so the agent can define screens and declarative navigation between them; runtime bridge turns link / button clicks into in-preview route changes
- **`storyboard`** — motion / ad storyboards; limited to 3–4 blocks per frame to avoid cropping; pairs naturally with the video generation tool for per-frame motion references

### Interactive Runtime

Interactive modes (`landing`, `app_mockup`, `storyboard`) do **not** allow free-form `<script>` from the agent. Behavior is expressed declaratively via `data-row-bot-action` attributes and interpreted at runtime by a sandboxed bridge.

- **`designer/runtime/` package** — loads per-project runtime state, resolves route / screen navigation, handles state toggles, controls media playback, and dispatches declarative actions to real DOM operations inside the preview iframe
- **Declarative action grammar** — `data-row-bot-action="navigate:screen-id"`, `data-row-bot-action="toggle:state-key"`, `data-row-bot-action="play:asset-id"`, etc. — the agent authors intent, the runtime executes it safely
- **Shared preview + publish runtime** — the same runtime powers editor preview, presenter mode, and published share links so interactive projects behave identically in all three surfaces

### Project Model & Storage

- **Multi-page / multi-screen projects** — each project stores a page list, canvas dimensions, aspect ratio, mode, title metadata, notes, brand settings, and (for app mockups) a route map
- **Home gallery** — the Home screen includes a dedicated **Designer** tab with recent projects, new-project flows, and quick reopen actions
- **Canvas presets and resizing** — projects can be resized after creation; mode-appropriate presets are offered up front
- **Reference storage** — uploaded briefs, screenshots, and source material are stored as reusable references so future designer sessions can reopen them without reuploading
- **Asset-backed media** — project HTML stores media as `asset://<asset-id>` references rather than brittle placeholder tokens; `designer/render_assets.py` normalizes legacy refs, preserves `data-asset-id`, and hydrates assets for preview, presentation, export, and published output
- **Persistent asset storage** — designer assets live on disk under `~/.row-bot/designer/assets/`; projects and references are stored separately
- **Windows-safe writes** — `designer/storage.py` uses temp-file + replace semantics with retry logic to avoid broken saves on Windows file locks

### Editor & Authoring

- **Full-width editor** — `designer/editor.py` switches the app into a full-width editing mode with page / screen navigator, preview, controls, and assistant side chat
- **Shared chat primitives** — the designer editor reuses `ui/chat_components.py` so uploads, input behavior, and chat rendering match the main conversation UI
- **Surgical tool surface** — the designer tool can set, update, add, move, duplicate, and delete pages / screens; move, replace, restyle, and remove individual elements; refine-text-in-place (shorten / expand / simplify / rewrite); insert reusable components; update brand settings; and resize projects
- **Setup flow** — the creation flow captures mode, format, audience, tone, and source brief before generating an initial draft
- **Typed image slots** — templates declare expected image slots by semantic role (hero, thumbnail, icon, background, etc.) so generated imagery lands in intentional places with appropriate aspect ratios
- **Reusable components** — curated insertable blocks like heroes, stat bands, timelines, testimonials, pricing sections, and app shells accelerate common layouts
- **Authoring guardrails** — mode-specific content budgets, no-decorative-overlap rules, horizontal button-row rules, and slot-typed imagery are encoded in `designer/prompt.py` so the agent produces layout-clean output on first draft

### Critique & Repair Loop

- **`designer/critique.py`** — runs deterministic checks for overflow, card-heavy sections, contrast, hierarchy, readability, and spacing on any page
- **Mandatory post-edit critique** — the designer tool automatically critiques after each structural change and applies safe repairs before returning control to the agent
- **Repair operations** — deterministic, side-effect-scoped fixes (e.g. trim overflowing blocks, drop redundant bullets, fix contrast, respace buttons)
- **Review dialog** — a mutation diff view shows exactly what the agent changed on each turn, per page, so the user can accept, revert, or spot-check without hunting through the project

### AI Content

- **AI image generation** — generate slide / page imagery directly inside the designer workflow, routed into typed image slots
- **AI video generation** — storyboard frames and landing hero videos can be generated via the `video_gen_tool` and referenced as `asset://` media
- **Chart insertion** — create charts from inline CSV and place them in a page layout
- **Speaker notes** — generate and persist notes for presenter use

### Presentation, Sharing & Export

- **Presenter mode** — `designer/presentation.py` serves Reveal.js-based presenter mode with notes support (deck mode)
- **Export pipeline** — Playwright drives raster + HTML export; `python-pptx` drives editable PPTX; `weasyprint` / Playwright drive PDF; PNG exports for any page
- **Published share links** — self-contained interactive HTML (with runtime bridge) is mounted under `/published` for direct sharing

---

## Developer Studio

Developer Studio is Row-Bot's code-workspace subsystem. It is not a full IDE; it is a Codex-style agent workbench for connecting local Git repositories, reviewing code, making scoped edits, running tests, preparing branches/commits/PRs, and keeping the user in control through approval modes and an inspector.

### Workspace Model

- **`developer/` package** — owns workspace storage, Git helpers, runtime profiles, approval policy, sandbox state, tool context, todos, change ledger, inspector snapshots, GitHub helpers, and UI
- **Explicit repo linking** — users open an existing local repo or clone into a folder they choose. Row-Bot stores a workspace link and metadata, not a copy of the repo in app data
- **Code threads** — Developer conversations are tagged as code threads and reopen directly into Developer Studio with the associated workspace context
- **Workspace context injection** — Developer turns receive compact hidden context with repo path, branch, dirty state, remote URL, top-level files, approval mode, execution mode, shell guidance, and sandbox state
- **No user-message leakage** — Developer context is injected as model context and is not rendered as part of the visible user message

### Approval Modes & Tooling

- **Mode-specific policy** — read-only, ask-before-changes, auto-edit, and agent-run modes control file writes, shell commands, Git operations, commits, pushes, and PR preparation
- **Native Developer tools** — `tools/developer_tool.py` exposes workspace-scoped operations for repo info, file listing, reads, search, git status, diffs, todos, detected tests, shell commands, patch preview/apply, file writes, branch create/switch, commit, push, fast-forward merge, sandbox imports, and safe revert of agent-owned changes
- **Shell remains available** — Developer-native tools are preferred for repo work, but shell is still available for legitimate project commands and follows Developer approval policy
- **Long coding budget** — Developer turns have a larger recursion/step budget than normal chat, with wind-down prompts that checkpoint progress before a generic recursion failure
- **Tool guide and skills** — the Developer tool guide plus Developer coding/review/PR/custom-tool skills are injected for Developer context without bloating normal chat by default

### Inspector & Live State

- **Developer Inspector** — the right-side inspector shows Overview, Safety Policy, Sandbox, Todos, Changes, Files, Agent Changes, Tests, and GitHub/PR sections
- **Debounced snapshots** — `developer/inspector_snapshot.py` builds lightweight snapshots that the UI can apply without fully rebuilding the chat transcript
- **Resizable panel** — the inspector can be widened for diffs, files, and test output
- **File tree** — Files render as a repo tree instead of a flat list, with generated/build/cache paths filtered out
- **Change ledger** — `developer/change_ledger.py` tracks agent-owned edits, line counts, diffs, and revert eligibility
- **Todo persistence** — `developer/todos.py` stores visible coding plans so long-running work can show current, pending, and completed steps

### Docker Sandbox

- **Execution modes** — Local runs commands in the selected repo folder; Docker Sandbox runs commands in an isolated shadow copy
- **Persistent sandbox container** — `developer/sandbox_runtime.py` manages a per-workspace container and shadow workspace so repeated commands share sandbox state until rebuilt or cleaned
- **Import-gated edits** — sandbox changes become pending patches and only modify the real repo after explicit import
- **Network policy** — Docker Sandbox supports network off, ask, or on. Network/package-install attempts are blocked early when network is off and approval-gated when policy requires it
- **Image selection** — workspaces can choose a Docker image; changing the image cleans the sandbox copy before the next Docker command
- **Local fallback** — users who do not want Docker keep using local execution under the same Developer approval model
- **Clear startup errors** — stopped Docker, missing local images, and Docker credential-helper problems are reported as actionable sandbox errors

### GitHub & PR Flow

- **GitHub CLI detection** — `developer/github.py` and `developer/executables.py` locate `gh` from common install paths, especially on Windows where PATH can differ between the app and a shell
- **PR helpers** — branch, commit, push, and PR-prep tools are approval-gated and operate inside the selected workspace
- **No hidden remotes** — cloning asks for an explicit destination, and push/PR operations are visible through the active approval mode

---

## Custom Tools

Custom Tools let users convert a GitHub repo, local folder, or current Developer workspace into reusable Row-Bot tools without editing manifests by hand.

### Product Surface

- **Developer home surface** — Custom Tools live under Developer as a global area separate from workspaces, with cards for created tools, source, install path, command count, enablement, test output, and removal
- **Wizard flow** — the guided flow is Source -> Inspect -> Test -> Enable. Users can review proposed commands, run a smoke test, then choose whether the tool is only available in Developer or promoted to normal chat
- **Conversational builder** — `tools/custom_tool_builder_tool.py` exposes one compact agent-facing tool for clone/source setup, draft creation, command refinement, testing, creation, promotion, disable, and removal
- **Settings integration** — the Custom Tool Builder appears as a Utilities toggle. Disabling it removes the builder from normal chat while keeping the Developer UI available for manual management

### Command Generation & Validation

- **`developer/tool_capsules.py`** — retained internal module name for compatibility; user-facing copy says Custom Tool
- **LLM-assisted proposals** — a lightweight model pass inspects repo files and README content to propose useful, preferably read-only commands. Deterministic fallback remains available if AI analysis fails
- **Draft management** — the builder stores draft IDs so users can review and refine proposed commands across turns before creating the tool
- **Command classification** — commands are tagged by locality/risk and validated for dangerous shell patterns, unreviewed network behavior, write operations, missing placeholders, and malformed command templates
- **One-time tests** — local/read-only tests can run directly; network or riskier tests route through the normal approval mechanism
- **Promotion** — promoted Custom Tools register as synthetic plugin-style tools, inherit normal tool enablement, and can be disabled or removed without deleting the source repo

### Trust Boundaries

- **Source transparency** — cards show source URL, local install path, version, command count, and availability
- **No automatic broad enablement** — generated tools are opt-in and are not silently made available to normal chat
- **Repo code is not trusted by default** — proposed commands are reviewed and tested before promotion; users should only promote tools whose behavior they understand

---

## Row-Bot Status & Identity

Row-Bot now has a formal self-inspection and self-management surface: a tool for querying its own state, a controlled settings mutation API, and a Preferences UI for identity and self-improvement.

### Status Queries

- **`row_bot_status` tool** — read-only introspection across `overview`, `version`, `model`, `channels`, `memory`, `skills`, `tools`, `api_keys`, `identity`, `tasks`, `vision`, `image_gen`, `voice`, `config`, `logs`, `errors`, and `designer`
- **Live runtime visibility** — the tool can report current model/provider, active channels, knowledge graph counts, enabled skills, configured APIs, task state, voice and image settings, and designer project counts
- **Diagnostics access** — recent warnings, errors, and tracebacks can be summarized without opening log files manually
- **Home health bar parity** — `ui/status_checks.py` and `ui/status_bar.py` expose compact health checks for Ollama, active model, cloud API, tunnel, OAuth accounts, workflows, knowledge, wiki vault, documents, search, skills, tracker, Buddy, MCP, plugins, network, tools, disk, threads DB, FAISS, Dream Cycle, TTS, and logging

### Controlled Self-Management

- **`row_bot_update_setting`** — approved mutations for Brain and Vision model switching, assistant name, personality, context caps, dream-cycle controls, skill toggles, tool toggles, image-generation model, video-generation model, manual dream-cycle trigger, and self-improvement toggle
- **Interrupt-gated writes** — all state-changing operations route through explicit user confirmation before they are applied
- **Optional self-improvement toolchain** — when self-improvement is enabled, `row_bot_create_skill` and `row_bot_patch_skill` become available
- **Skill patch safety** — bundled skills are patched via user-space overrides, not in-place mutation; old versions are backed up under `~/.row-bot/skill_versions/`

### Identity & Preferences

- **`identity.py`** — stores assistant name, personality text, and self-improvement flag; sanitizes personality input before save
- **Preferences tab** — Settings exposes name, personality, preview, and self-improvement controls in one place
- **Prompt integration** — the same identity settings are consumed by `self_knowledge.py` so the opening line seen by the model matches what the user configured
- **Parallel UI surface** — the Home health/status bar provides a visual health view for the user, while `row_bot_status` exposes the same class of state to the agent

---

## Self-Knowledge & Insights

Row-Bot now carries an explicit self-description into prompts and uses Dream Cycle to turn recent activity into structured insight objects.

### Prompt-Time Self-Knowledge

- **Feature manifest** — `FEATURE_MANIFEST` in `self_knowledge.py` is the canonical inventory of major capabilities used when Row-Bot explains what it can do
- **Dynamic identity line** — `build_identity_line()` combines the configured assistant name and personality into the opening identity sentence
- **Dynamic state block** — `build_self_knowledge_block()` appends live state like current model, configured providers, entity count, last dream summary, active channels, designer project count, and enabled skills
- **Prompt injection** — the self-knowledge block is added alongside tool, memory, and citation guidance so the model can talk about itself accurately without outdated copy in `prompts.py`

### Insight Generation & Triage

- **Dream snapshot analysis** — Dream Cycle phase 5 captures logs, provider/model/media configuration, usage signals, and active insights, then runs `DREAM_INSIGHTS_PROMPT`
- **Structured insight store** — `insights.py` persists categorized insights like `error_pattern`, `skill_proposal`, `tool_config`, `knowledge_quality`, `usage_pattern`, and `system_health`
- **Dedup and pruning** — similar titles and semantically overlapping insights are merged; stale insights are auto-pruned; last-analysis time is tracked
- **Pin / dismiss / apply actions** — the Workflow Console exposes user actions for curating the insight list without leaving the app
- **Skill proposals** — insight objects can carry draft skill metadata, which pairs naturally with the self-improvement toolchain when enabled

---

## Messaging Channels

Row-Bot uses a generic **Channel** abstraction. Any messaging platform can plug in by subclassing the base adapter, declaring capabilities, and registering itself. The system then auto-generates tools, settings UI, monitoring, and approval routing around that channel.

### Channel Architecture

- **`Channel` ABC** — adapters implement lifecycle methods (`start`, `stop`, `is_configured`, `is_running`) plus outbound send methods for text, photos, documents, and approval requests
- **`ChannelCapabilities`** — declarative feature flags describe what each channel supports: photos, documents, voice, buttons, streaming, typing, reactions, and commands
- **Config schema** — each channel declares config fields so Settings can render the right form dynamically
- **Channel registry** — adapters self-register; runtime helpers expose all channels, running channels, configured channels, and delivery routing
- **Channel credential store** — `channels/auth_store.py` stores channel secrets through a channel-specific OS keyring path with legacy fallback so running channels survive migrations even if UI fields are intentionally blank
- **Shared media pipeline** — inbound audio transcription, image analysis, document extraction, inbox persistence, and workspace copy helpers are centralized in `channels/media.py`
- **Shared utilities** — auth, command handling, approval routing, media capture, and corrupt-thread repair live in reusable channel modules
- **Tool factory** — running channels contribute auto-generated send/photo/document tools through `channels/tool_factory.py`
- **Activity tracking** — per-channel last-activity timestamps drive the sidebar monitor and status surfaces

### Bundled Channels

- **Telegram** — full agent access, streaming edits, voice transcription, photo analysis, document extraction, emoji reactions, inline approval buttons, `/model` support, and HTML-safe formatting
- **WhatsApp** — Baileys bridge with QR pairing, inbound/outbound media, rich YouTube previews, Markdown-to-WhatsApp formatting, and streaming-style updates
- **Discord** — DM-based adapter with reactions, typing indicators, slash-command integration, and media support
- **Slack** — Socket Mode adapter with DM threading, message updates, reactions, typing indicators, and file uploads
- **SMS** — Twilio adapter with inbound webhook support, outbound SMS/MMS, and tunnel-manager integration for public callbacks

### Delivery & Monitoring

- **Auto-generated channel tools** — when a channel is running, the agent gains send/photo/document tools for that channel automatically
- **Approval routing** — approvals can be sent through supported channels with inline action controls
- **Sidebar channel monitor** — the conversation sidebar shows live status dots, icons, display names, and relative last-activity timestamps
- **Auto-start and config persistence** — channel enablement and settings persist to `~/.row-bot/channels_config.json`

---

## Tunnel Manager

A provider-agnostic tunnel layer exposes local webhook ports to the internet when a channel needs inbound delivery.

- **Provider abstraction** — `TunnelProvider` defines the backend contract; `NgrokProvider` is the current implementation
- **`TunnelManager` singleton** — manages tunnel lifecycle, per-port allocation, cleanup, and status reporting
- **Automatic use by channels** — channels that need a public callback request a tunnel on start and release it on shutdown
- **Optional app tunneling** — the main Row-Bot UI can also be exposed intentionally for remote access
- **Settings UI** — tunnel provider, auth token, and active-tunnel status live in the System settings surface
- **Health checks** — tunnel status participates in the status monitor and diagnostics flows

---

## X (Twitter) Tool

Row-Bot integrates with X API v2 through a native httpx-based client, grouped into three high-level tool entry points.

- **3 grouped LangChain tools** — `x_read`, `x_post`, and `x_engage`
- **Read operations** — search, read tweet, timeline, mentions, and user info
- **Post operations** — post tweet, reply, quote, and delete tweet
- **Engagement operations** — like, unlike, repost, unrepost, bookmark, and unbookmark
- **OAuth 2.0 PKCE** — browser-based auth flow with a local callback server and refresh-token support
- **Rate-limit tracking** — per-endpoint rate information is recorded and surfaced in structured error responses
- **Tier discovery** — X tier information is persisted and reused for rate-limit expectations
- **Local token storage** — auth state lives in `~/.row-bot/x/`
- **Settings UI** — connect, disconnect, and inspect X auth from Accounts settings

---

## Tool Guides

Tool guides are lightweight `SKILL.md` packages that attach contextual instructions to tools without hard-coding those instructions into the main system prompt.

- **Skill-like format** — each guide is a directory with a `SKILL.md` file and YAML frontmatter, just like a manual skill
- **`tools:` activation field** — guides declare the tools they apply to; when any linked tool is in the active tool belt, the guide is injected automatically
- **Prompt injection** — `prompts.py` discovers active guides and appends them to the system prompt at runtime
- **Invisible to the manual skill toggles** — tool guides are auto-managed and do not clutter the user-facing skill list
- **20 bundled guides** — Browser, Calendar, Chart, Custom Tool Builder, Designer, Developer, Email, Filesystem, Math, MCP, Shell, Telegram, Row-Bot Status, Tracker, Updater, Video, Vision, Weather, Wiki, and X
- **Consistency benefits** — guide content can evolve independently of the main prompt, reducing drift and duplicated instructions

---

## Image Generation

Row-Bot can generate and edit images through multiple external providers, render them inline, persist them to disk, and reuse them in designer workflows or channel delivery.

- **Provider support** — OpenAI image models, xAI Grok Imagine, Google Imagen 4, and Gemini image-capable models
- **Generate and edit flows** — prompts can generate a new image or edit the most recent image, an attached image, or an on-disk file
- **Inline rendering** — generated images are surfaced directly in the chat stream without requiring a separate viewer
- **Per-thread persistence** — generated images are saved into Row-Bot's media storage so they survive refreshes and can be referenced later
- **Channel delivery** — running messaging channels can pick up generated images and send them as photos
- **Designer reuse** — Designer Studio can invoke the same provider layer for slide assets and visual content generation
- **Settings selector** — the active image-generation model is configurable from Settings and queryable through `row_bot_status`

---

## Video Generation

Row-Bot can generate short video clips from text prompts or reference images through Google Veo and xAI Grok Imagine Video for chat use and Designer storyboard workflows.

- **`video_gen_tool`** — top-level agent tool for text-to-video and image-to-video generation
- **Provider support** — Google Veo handles text-to-video and image-to-video with provider-side person-generation policy handling; xAI Grok Imagine Video supports text-to-video and image-to-video with provider-specific aspect ratio, duration, and resolution constraints
- **Inline rendering** — generated clips are surfaced directly in the chat stream with safe media-element hydration
- **Designer integration** — Designer Studio storyboards and landing hero slots can reference generated videos as `asset://` media; motion clips are rendered in preview, presenter mode, and published share links
- **Persistent asset storage** — generated clips are saved to Row-Bot's media storage so they survive thread refreshes and can be reused across designer projects
- **Channel delivery** — running messaging channels can pick up generated videos and deliver them where supported

---

## MCP Client & External Tools

Row-Bot includes a guarded Model Context Protocol client that can connect external MCP servers and expose their tools to the ReAct agent without making external servers part of Row-Bot's trusted core.

### Runtime Model

- **Dedicated package** — `mcp_client/` owns persistent config, marketplace search, dependency checks, safety classification, runtime sessions, logging, result normalization, and curated starter metadata
- **Separate config file** — MCP state is stored in `~/.row-bot/mcp_servers.json`, separate from native tool toggles, so malformed or broken MCP config falls back to an empty disabled config instead of damaging normal tool settings
- **Global enable switch** — `enabled` in MCP config is the top-level kill switch. Turning it off stops active sessions, clears the discovered catalog, removes dynamic MCP tools from the agent, and keeps saved server definitions for later
- **Per-server runtime** — each enabled server gets its own `McpServerRuntime` session tracked by status (`connecting`, `connected`, `failed`, `dependency_missing`, `stopped`, `global_disabled`), tool counts, timestamps, transport, and last error
- **Transport support** — stdio, Streamable HTTP, and SSE are supported through the Python MCP SDK. Each server can set command, args, cwd, env, URL, headers, connect timeout, tool timeout, and output limit
- **Non-blocking startup** — `app.py` discovers enabled servers during startup in a guarded path. Exceptions are logged as warnings and do not stop Row-Bot from launching
- **Shutdown cleanup** — app shutdown calls MCP runtime shutdown to close child sessions and stop external stdio processes

### Dynamic Tool Injection

- **Parent registry tool** — `tools/mcp_tool.py` registers `mcp` / **External MCP Tools** as the native parent tool. It is the stable toggle users see in Settings and Row-Bot Status
- **Dynamic wrappers** — discovered enabled MCP tools are converted into LangChain `StructuredTool` instances at agent build time, with names generated as `mcp_<server>_<tool>`
- **Schema conversion** — JSON input schemas are converted into Pydantic argument models where possible, with a permissive fallback for complex or invalid schemas
- **Resources and prompts** — servers can optionally expose `list_resources`, `read_resource`, `list_prompts`, and `get_prompt` utility tools through per-server advanced toggles
- **Readable display names** — `agent.py` resolves tool-call UI labels back to the original MCP tool and server name, for example `MCP: microsoft_docs_search (microsoft-learn-mcp)`
- **Result normalization** — MCP text, structured content, resource links, embedded resources, binary/image blocks, empty results, errors, and oversized outputs are normalized before being sent back into the model

### Safety & Trust Boundaries

- **External output is untrusted** — the MCP tool guide tells the agent not to follow instructions found inside MCP results unless they are clearly part of the user's request
- **Native tools stay preferred** — Row-Bot Memory, Browser, filesystem, document, search, channel, and Designer capabilities remain canonical for Row-Bot-owned behavior; overlapping MCP servers are treated as external alternatives
- **Destructive classification** — tool names, descriptions, and MCP annotations are inspected for write/send/delete/run/deploy/payment-style behavior. Destructive tools require approval and are not enabled by default after discovery
- **Approval synchronization** — destructive MCP wrapper names are included in the parent tool's `destructive_tool_names`, so they flow through the existing interrupt approval mechanism
- **Background workflow rules** — MCP destructive tools follow workflow safety mode: approval-required modes interrupt, while explicit allow-all mode can run enabled destructive MCP tools
- **Capability overlap detection** — `mcp_client/conflicts.py` labels MCP servers that overlap native memory, browser, documents, web search, URL reading, channels, or Designer capabilities and forces manual tool selection for overlap/high-risk imports
- **Secret masking** — diagnostics use masked config output so headers, tokens, and environment values are not displayed raw

### Settings UI & Marketplace

- **Settings → MCP** — `ui/mcp_settings.py` provides the user-facing MCP control surface: global enable switch, add server, import config, browse MCP servers, diagnostics, test, refresh, edit, delete, and per-tool controls
- **Disabled-until-tested imports** — manual JSON imports and marketplace entries are saved disabled. Users test the server before enabling it
- **Tool review rows** — after a successful probe, each tool shows name, description, input schema summary, enabled state, destructive badge, approval state, and whether it comes only from saved config or live catalog
- **Marketplace adapters** — `mcp_client/marketplace.py` can search curated starters plus official-style directories, PulseMCP, Smithery, and Glama, with cache and curated fallback when live results fail or ignore the query
- **Starter metadata** — curated entries preserve trust tier, risk level, auth requirement, native overlap, requirements, notes, and install recipe metadata
- **Diagnostics dialog** — Settings can display masked MCP config and live status summary for support/debugging without requiring file edits

### Runtime Requirements

- **Requirement inference** — stdio launch commands infer runtime requirements for `npx`/Node.js, `uvx`/uv, Docker, and Playwright MCP browser dependencies
- **Managed user-space installs** — Row-Bot can install private Node.js LTS, uv, and Playwright Chromium runtimes under `~/.row-bot/runtimes/` and inject those paths only into MCP child process environments
- **Manual complex dependencies** — Docker and other heavyweight system dependencies are surfaced as manual setup requirements with setup links instead of being bundled into Row-Bot
- **No bundled MCP runtimes** — Row-Bot depends on the Python MCP SDK, but external server runtimes are resolved at runtime so the app package does not ship Node, uv, Docker, or browser payloads unnecessarily

### Testing & Release Checks

- **Offline regression suite** — `tests/test_mcp_client.py` covers config fallback, secret masking, safety classification, marketplace fallback/filtering, conflict policy, runtime requirement handling, managed environment injection, settings rows, stdio discovery/call, global disable, bad server failure, display names, background safety, and browser-loop handling
- **Opt-in live E2E** — `scripts/mcp_real_world_e2e.py` and `tests/test_mcp_real_world_e2e.py` connect to public MCP servers outside normal CI to validate import, probe, manual tool enablement, dynamic wrapper invocation, and read-only approval classification
- **Maintainer workflow** — MCP-heavy releases run the offline suite first, then the live public E2E check from the repo root

---

## Migration Wizard

Row-Bot includes a one-time migration wizard for moving selected data from Hermes Agent or OpenClaw into a Row-Bot data directory without treating legacy state as trusted runtime configuration.

### Flow & UI

- **Preferences launcher** — `ui/settings.py` exposes **Open Migration Wizard** at the bottom of Settings → Preferences. The wizard opens in a maximized dialog so it stays available without occupying a permanent settings tab
- **Three-step flow** — `ui/migration_wizard.py` guides users through source/target selection, read-only scan/review, and explicit apply
- **Provider support** — users choose Hermes Agent or OpenClaw. Defaults point at `~/.hermes` or `~/.openclaw`, but any source and target folder can be selected for disposable test runs
- **Preview controls** — categories and rows show status, selection state, conflict notes, manual-review notes, archive-only behavior, and report paths after apply

### Detection & Planning

- **Pure migration package** — `migration/` owns the feature: `core.py` models plans/items/summaries, `redaction.py` masks secrets, `detection.py` scans sources read-only, `planner.py` builds dry-run plans, `apply.py` writes backups/reports, and `fixtures.py` creates realistic test homes
- **Read-only scan** — detection and planning do not write to either source or target. Existing target files are only inspected to mark conflicts
- **Provider mismatch guard** — Hermes scans reject OpenClaw-looking folders, and OpenClaw scans reject Hermes-looking folders, returning an empty actionable plan instead of partial generic matches
- **Mapped data** — planners can map model/provider config, identity/persona files, long-term memories, OpenClaw daily memory, skills, MCP server definitions, and explicit API key/token entries
- **Risk boundaries** — channel config, approvals, browser/cron/hooks/tools settings, legacy runtime state, logs, sessions, OAuth/auth stores, plugin state, and broad command allowlists are skipped for manual review or copied only into the migration report archive

### Apply, Backups & Reports

- **Explicit apply** — only selected planned/sensitive items are applied. Archive-only items are copied to the report archive, not activated in the live Row-Bot target
- **Backups first** — existing target files are backed up before overwrite/append/update. Multiple writes to the same target preserve the pre-migration original once per run, and newly created files are not backed up later in the same run
- **Redacted report** — each run writes redacted `plan.json`, `result.json`, `backup_manifest.json`, and `summary.md` under `migration-reports/<timestamp>/`
- **Archive redaction** — JSON and key/value archive snapshots are redacted before being copied into reports; binary or unsupported files are represented by a placeholder instead of raw content
- **MCP import safety** — migrated MCP servers are written disabled. They must be reviewed and enabled from Settings → MCP before any external tools become available to the agent
- **Credential import** — API keys and tokens are off by default and require explicit selection. Reports hide their values; selected keys route through target-profile secure storage via `api_keys.set_key_for_data_dir`, so normal imports use the OS credential store with metadata-only local files when keyring is available

### Testing

- **Focused suites** — `tests/test_migration_core.py`, `tests/test_migration_detection.py`, `tests/test_migration_planner.py`, `tests/test_migration_apply.py`, and `tests/test_migration_wizard_ui.py` cover model invariants, source detection, dry-run planning, wrong-provider rejection, conflict behavior, backups, reports, redaction, daily memory import, and UI helper logic
- **Realistic fixtures** — `migration/fixtures.py` builds multi-month Hermes and OpenClaw homes with fake secrets, memories, skills, channels, MCP servers, approvals, cron/hooks, plugins, sessions, logs, and archive-only state
- **Manual E2E path** — disposable targets under `.tmp/migration-fixtures/` are used for click-through validation before release; the fixture root is ignored by git

---

## Plugin System & Marketplace

A sandboxed, hot-reloadable extension system lets plugins add new tools and skills without modifying the core codebase.

### Plugin Architecture

- **Plugin API** — `PluginAPI` and `PluginTool` are the core abstractions available to plugins
- **Manifest system** — each plugin declares metadata, tools, skills, settings, and dependencies in `plugin.json`
- **Security sandbox** — static scans block dangerous constructs like `eval`, `exec`, and shell escape paths; imports from sensitive core modules are restricted
- **Dependency safety** — plugin dependency installs cannot silently downgrade core packages required by Row-Bot
- **State persistence** — enablement and non-secret config are stored under `~/.row-bot/plugin_state.json`; plugin API-key secrets use the OS credential store with metadata-only `plugin_secrets.json` state and session-only fallback for new saves when keyring is unavailable
- **Hot reload** — Settings can reload plugins without restarting the app; agent caches are cleared automatically
- **Skill auto-discovery** — plugin `skills/` directories are scanned for `SKILL.md` definitions and injected like built-in skills

### Marketplace

- **Marketplace index** — remote plugin catalog fetched from GitHub-hosted JSON with caching and update checks
- **Browse dialog** — search, inspect, and install plugins from within the app
- **Install / update / uninstall** — plugin archives are validated before install and reloaded immediately afterward
- **Per-plugin settings UI** — each installed plugin gets config controls, secret inputs, and enable/disable toggles in Settings
- **Custom Tool bridge** — promoted Custom Tools are registered through the plugin/tool surface as synthetic local tools so normal chat can use them without adding a separate extension mechanism

---

## Auto-Updates

`updater.py` polls the GitHub Releases API for the official Row-Bot repo on a background thread (30-second startup delay, then every 6 hours, with a 24-hour debounce on actual network calls). Checking is on by default; if there is no internet the call fails silently and the next tick retries.

- **Channel** — `stable` (default) hits `/releases/latest`; `beta` walks the top 10 releases and includes pre-releases. Persisted in `~/.row-bot/update_config.json`.
- **Manifest verification** — every release body must contain a fenced `<!-- row-bot-update-manifest -->` block with SHA256 hashes for each platform asset. Without a manifest entry, `download_update` refuses to install. The CI workflow `.github/workflows/update-manifest.yml` calls `scripts/append_sha_manifest.py` to PATCH the release body once artifacts are uploaded. The Linux one-line installer uses the same manifest before running the bundled tarball installer.
- **OS code signature** — Windows installs invoke `signtool.exe verify /pa`; macOS installs invoke `codesign --verify --deep --strict`. Linux tarball installs do not have a universal OS-level signing verifier, so Row-Bot relies on GitHub HTTPS plus the required SHA256 release manifest.
- **Hand-off** — Windows: `Row-Bot-x.y.z-Windows-x64.exe /SILENT /CLOSEAPPLICATIONS /RESTARTAPPLICATIONS`. The `.iss` uses `CloseApplications=yes` / `RestartApplications=yes` so Inno Setup can swap files while Row-Bot is running, and repair/upgrade deletes `{app}\python` before recopying the bundled runtime so stray packages installed into embedded Python cannot survive. macOS: `open <dmg>` and exit; the user drags the new app to `/Applications`. Linux: install the verified `Row-Bot-X.Y.Z-Linux-ARCH.tar.gz` into the user XDG release tree, atomically flip `~/.local/share/row-bot/current`, refresh the desktop entry/icon, and restart through `~/.local/bin/row-bot`.
- **UI** — a green "⬆ vX.Y.Z" status-bar pill appears when a newer release is detected; clicking opens the What's-New dialog (rendered release notes plus Install / Skip / Later buttons). Settings → Preferences → Updates exposes channel selection, "Check for updates", and a list of skipped versions.
- **Agent surface** — `tools/updater_tool.py` registers `row_bot_check_for_updates` (read-only) and `row_bot_install_update` (interrupt-gated). The dynamic self-knowledge block surfaces "Update available: …" when applicable, and `row_bot_status` adds an `updates` category.
- **Dev installs** — when a `.git/` directory sits next to the app source (i.e. running from a checkout), the scheduler is disabled and `row_bot_install_update` refuses, so working copies are never overwritten. On Linux, packaged installs are recognized by `install_info.json` with `platform: linux` and `install_kind: xdg-user-tarball`.

---

## Habit & Health Tracker

- **Conversational tracking** — log medications, symptoms, exercise, mood, sleep, periods, and other recurring data in natural language
- **Auto-detect & confirm** — the agent recognizes likely trackable events and asks before writing anything
- **3 tracker operations** — structured logging, flexible querying, and destructive delete with confirmation
- **Built-in analysis** — adherence, streaks, numeric summaries, frequency, day-of-week patterns, cycle estimation, and co-occurrence analysis
- **Trend visualization** — tracker queries can export CSV and chain directly into the chart tool for Plotly output
- **Fully local** — tracker data lives in `~/.row-bot/tracker/tracker.db`
- **Memory isolation** — tracker entries are intentionally excluded from the personal knowledge graph

---

## Desktop App

- **Native window** — runs in a desktop window via pywebview on Windows and macOS; Linux defaults to browser mode and can opt into `--native` when the desktop has the required GTK/Qt backend
- **System tray** — `launcher.py` exposes open and quit controls plus running-state feedback on Windows and macOS; Linux defaults to no tray and can opt into `--tray` when AppIndicator/desktop support is available
- **Splash screen** — Tk-based splash with console fallback during startup
- **Startup diagnostics** — `startup_diagnostics.py` runs early in `app.py` and probes fragile optional native packages. Missing optional packages are ignored; installed-but-broken packages such as TorchCodec are logged with recovery steps and patched out of optional Transformers availability checks where safe.
- **First-launch setup wizard** — starts with model/provider choice, then migration and setup-center steps for Local, Providers, Custom/Self-hosted, memory/docs, workflows, Designer, Developer, channels, voice, and related setup without touching config files by hand
- **Self-contained installers** — Windows and macOS releases bundle dependencies for one-click setup; Linux uses a one-line bootstrapper that verifies and installs the self-contained XDG tarball into user-owned paths
- **Launcher identity and ports** — the launcher probes `/api/launcher-ping` before reusing port 8080, passes the chosen port through `ROW_BOT_PORT`, and supports explicit `--browser`, `--native`, `--tray`, `--no-tray`, `--server`, `--no-open`, `--port`, and `--host` modes
- **Launcher recovery hints** — when the managed server exits during startup, `launcher.py` tails `~/.row-bot/row_bot_app.log` and emits targeted recovery hints for recognized startup signatures, including broken optional TorchCodec DLL loads in the embedded Windows runtime.
- **Launcher data recovery commands** — `launcher.py --reset-tasks-db`, `--reset-db`, and `--restore-data` back up SQLite DB families before recreating or restoring known task, memory, and thread databases
- **Auto-restart flow** — closing the native window does not kill the tray-managed app process; reopen is fast
- **Release pipeline** — build, sign, notarize, and publish automation lives in CI

---

## Chat & Conversations

- **Multi-turn threads** — conversation history is stored in SQLite via LangGraph checkpointing and local thread metadata
- **Auto-naming and switching** — threads are named from the conversation and can be reopened, exported, or deleted individually
- **Per-thread model override** — conversations can pin a different local or provider model than the global default
- **Input-level model picker** — the main chat model selector lives in the chat input area, loads cached options immediately, refreshes asynchronously, and keeps the top bar focused on thread state
- **File attachments** — drag-and-drop, clipboard paste, and standard upload flows handle images, PDFs, spreadsheets, JSON, and text
- **Media persistence** — chat media is stored per thread on disk with sidecar metadata; generated content persists more aggressively than transient capture artifacts
- **Inline rich rendering** — Plotly charts, Mermaid diagrams, YouTube embeds, syntax-highlighted code, and images render directly in the transcript
- **Shared chat components** — `ui/chat_components.py` provides the input bar, upload flow, and message container for main chat, Designer Studio, and Developer Studio
- **Bounded transcript rendering** — `ui/transcript.py` chooses a visible window for large threads, exposes load-earlier behavior, and avoids rendering every historic row on initial open
- **Checkpoint loading without graph import** — transcript loaders can read checkpoint messages and token usage without constructing the agent graph, reducing blank-thread and large-thread latency
- **Status monitor panel** — Home health-check pills, diagnosis actions, and quick settings links surface runtime health at a glance
- **Workflow Console integration** — approvals, recent runs, and insight actions are visible without leaving the conversation experience
- **Streaming hardening** — detached streams persist final content and media, grouped tool-call counts update during streaming, thinking text survives reattach/final render, and safe timer helpers avoid UI writes after clients disconnect
- **Output truncation warnings** — the UI warns when a response was cut short by model token limits

---

## Notifications

- **Desktop notifications** — workflow completions, reminders, and other events can raise OS-level notifications
- **Sound effects** — distinct audio chimes are used for workflow completion and timer-like alerts
- **In-app toasts** — lightweight status notifications appear in the UI; errors can remain persistent until dismissed
- **Unified API** — all notification surfaces flow through a single `notify()` entry point
- **Approval awareness** — approval requests can surface as both UI notices and channel-delivered prompts

---

## Stability & Diagnostics

Row-Bot includes a stability layer for the kinds of failures that are hard to catch from normal request logs: UI callback crashes, client-side JavaScript errors, event-loop stalls, memory spikes, and startup/shutdown issues.

- **`stability.py`** — centralizes crash reporting, UI callback error reports, client-side error capture, asyncio exception handling, thread/unraisable hooks, memory snapshots, and event-loop lag logging
- **Safe timers** — `ui/timer_utils.py` wraps deferred UI callbacks and polling timers so disconnected clients or deleted NiceGUI slots do not crash the app silently
- **Settings diagnostics** — model settings collection/render phases log timings and memory snapshots, while cached model catalogs keep large provider refreshes off the critical UI path
- **UI performance helpers** — `ui/performance.py` provides render generation tokens, timed UI sections, slow-section logging, and safe UI callback/task wrappers used by Settings, Knowledge, chat, and graph surfaces
- **Startup sequencing** — startup status covers cached model catalog load, workflow scheduler, MCP, plugins, channel migration/autostart, tunnel startup, and knowledge graph load
- **Clean shutdown** — app shutdown attempts ordered channel, tunnel, MCP, scheduler, and process cleanup to reduce locked logs and lingering child processes
- **Task database diagnostics** — Home, Command Center, and Row-Bot Status can report task-schema state, repair results, and launcher recovery guidance when workflow storage is missing or corrupt
- **Frontend error reporting** — browser-side exceptions are reported back into the structured log with enough context to correlate with UI actions
- **Performance probes** — memory RSS/VMS/thread counts, event-loop lag, token-counter refresh, model settings load, Settings tab render generations, transcript rendering, FAISS rebuild, and catalog refresh timings are logged for support investigations

---

## Bundled Skills

Skills are reusable instruction packs that shape how the agent thinks and responds. Each skill is a `SKILL.md` file with YAML frontmatter (display name, icon, description, required tools, tags) and freeform instructions injected into the system prompt when enabled.

Row-Bot ships with **17 manual bundled skills** and **20 tool guides**. Manual skills are toggled from Settings; tool guides auto-activate when their linked tools are available.

| Skill | Description |
|-------|-------------|
| **🧠 Brain Dump** | Capture unstructured thoughts and organize them into structured notes saved to memory |
| **💻 Claude Code Delegation** | Coordinate Claude Code CLI as an external coding agent through Row-Bot's approval-gated shell workflow |
| **☀️ Daily Briefing** | Compile a morning briefing with weather, calendar, and news headlines |
| **📊 Data Analyst** | Analyze datasets, produce statistical summaries, and create insightful charts |
| **💻 Developer Coding** | Plan and implement scoped code changes in Developer Studio using repo-aware tools and approval policy |
| **🔎 Developer Review** | Review code for bugs, regressions, missing tests, and risky behavior before summarizing |
| **🚀 Developer PR Prep** | Prepare branch, test, commit, push, and PR-ready summaries for Developer workspaces |
| **🧩 Developer Custom Tools** | Design, test, and promote Custom Tools from repos or folders without over-broad command surfaces |
| **🔬 Deep Research** | Perform multi-source research on a topic and produce a structured report |
| **🎨 Design Creator** | Structured workflow for presentations, one-pagers, reports, and visual layouts in Designer Studio |
| **🗣️ Humanizer** | Write in a natural, human tone without AI-speak or corporate filler |
| **📚 Knowledge Base** | Manage the personal knowledge base across graph memories, document intelligence, and the wiki vault |
| **📋 Meeting Notes** | Turn raw notes into actionable minutes with follow-ups and clear structure |
| **🎯 Proactive Agent** | Anticipate user needs, ask clarifying questions, and self-check work at milestones |
| **🪞 Self-Reflection** | Review memory for contradictions, gaps, and stale information |
| **⚙️ Task Automation** | Design effective workflows with steps, conditions, approvals, triggers, and delivery routing |
| **🌐 Web Navigator** | Strategic patterns for browser automation, research, forms, and data extraction |

- **Claude Code Delegation** — disabled by default, this skill treats Claude Code CLI as an external coding worker while Row-Bot remains coordinator. It favors bounded `claude -p` print-mode tasks, explicit working-directory checks, `--allowedTools`, turn/budget limits, diff inspection, Row-Bot-side verification, and user approval before write-capable or destructive delegation.
- **User skills** — custom skills live in `~/.row-bot/skills/<name>/SKILL.md`; user skills with the same name as a bundled skill override it
- **In-app skill editor** — skills can be created and edited directly from Settings
- **Per-skill enablement** — only enabled manual skills are injected into the system prompt
- **Per-thread and per-workflow overrides** — skill selection can be narrowed for individual threads and workflows
- **Tool guides remain automatic** — Browser, Calendar, Chart, Custom Tool Builder, Designer, Developer, Email, Filesystem, Math, MCP, Shell, Telegram, Row-Bot Status, Tracker, Updater, Video, Vision, Weather, Wiki, and X guides are in the built-in set

---

## Core Modules

| File | Purpose |
|------|---------|
| **`app.py`** + **`ui/`** | NiceGUI application shell, chat surfaces, lazy home tabs, health/status bar, workflow console, settings dialog, UI performance helpers, and native-webview integration points |
| **`buddy/`** + **`ui/buddy.py`** | Buddy companion event bus, behavior brain, config, asset validation, Hatch generation, in-app docked/undocked presence, and optional desktop overlay helpers |
| **`designer/`** | Designer Studio subsystem: gallery, editor, tooling, storage, exports, presentation mode, publishing, and asset hydration |
| **`developer/`** | Developer Studio subsystem: workspace links, Git helpers, approval policy, Docker/local runtime, sandbox state, inspector snapshots, todos, file tree, diffs, GitHub helpers, Custom Tool internals, and UI |
| **`ui/chat_components.py`** | Shared chat input, upload, and message-area components reused by main chat, Designer Studio, and Developer Studio |
| **`agent.py`** | LangGraph ReAct agent, prompt assembly, runtime readiness routing, chat-only execution, provider transcript normalization, streaming event generation, tool routing, interrupt handling, cache clearing, and background execution integration |
| **`threads.py`** | SQLite-backed thread metadata, LangGraph checkpoint wiring, checkpoint transcript helpers, per-thread media storage, and thread-level overrides |
| **`memory.py`** | Backward-compatible memory wrapper that maps legacy memory calls onto the knowledge graph implementation |
| **`memory_policy.py`** + **`memory_evolution.py`** | Bounded auto-recall scoring/filtering/tracing plus memory status, tier, confidence, evidence, review, superseding, archival, and evolution journal helpers |
| **`knowledge_graph.py`** | Entity/relation store, FAISS and FTS5 recall, NetworkX traversal, deduplication, relation normalization, recall reinforcement, and graph stats |
| **`wiki_vault.py`** | Obsidian-compatible markdown vault export, indexing, search, and conversation export |
| **`dream_cycle.py`** | Nightly graph refinement engine: merges, enrichment, decay, relation inference, insights analysis, and journal logging |
| **`document_extraction.py`** | Background document map-reduce extraction pipeline with provenance-aware graph writes |
| **`models.py`** | Local model compatibility facades, context policy, context caps, Quick Choices, provider detection, model factories, and legacy model APIs |
| **`providers/`** | Provider auth, normalized model catalogs, background catalog cache, provider-qualified resolution, readiness evaluation, custom endpoint profiles/probes, runtime construction, transports, and display-safe provider status |
| **`embedding_config.py`** + **`embedding_providers.py`** | Embedding provider selection, local/cloud embedding backends, vector metadata, and stale-index detection |
| **`documents.py`** | Document ingestion, chunking, embedding, vector-store persistence, and per-document cleanup |
| **`voice.py`** | Faster-whisper-based speech input pipeline and voice-state management |
| **`tts.py`** | Kokoro text-to-speech integration, voice catalog, and streaming playback |
| **`vision.py`** | Camera capture, screen capture, and workspace image analysis via local or provider vision models |
| **`data_reader.py`** | Shared structured-data loader for CSV, TSV, Excel, JSON, and JSONL |
| **`data_paths.py`** | Shared Row-Bot data-directory and SQLite path resolution for tasks, memory, threads, diagnostics, and recovery commands |
| **`launcher.py`** | Desktop launcher, system tray, splash screen, app lifecycle, logging bootstrap, local runtime startup decisions, and DB recovery commands |
| **`stability.py`** | UI callback/error capture, asyncio/thread exception hooks, memory snapshots, event-loop lag logging, and crash diagnostics |
| **`startup_diagnostics.py`** | Early startup probes for optional native packages that can break app import/startup when partially installed |
| **`api_keys.py`** + **`secret_store.py`** | API key storage and retrieval for tools and API-key providers, backed by OS keyring with metadata-only local files and legacy plaintext migration |
| **`identity.py`** | Assistant name, personality, and self-improvement preference storage with sanitization |
| **`self_knowledge.py`** | Capability manifest, identity-line builder, live runtime state builder, and prompt-time self-knowledge assembly |
| **`insights.py`** | Structured insight store with dedup, pruning, pin/dismiss/apply state, and last-analysis tracking |
| **`prompts.py`** | Centralized prompt templates including summarization, extraction, and dream-insights analysis |
| **`memory_extraction.py`** | Background conversation scan that extracts entities and relations the live agent did not save |
| **`skills.py`** | Discovery, loading, activation, override, and prompt-building for manual skills and tool guides |
| **`bundled_skills/`** | 17 built-in manual skills as `SKILL.md` packages |
| **`tool_guides/`** | 20 built-in tool-specific auto-activation guides |
| **`tasks.py`** | Workflow engine, SQLite persistence, schema validation/repair, APScheduler scheduling, pipeline execution, run history, safety mode, and delivery routing |
| **`notifications.py`** | Unified desktop, sound, and toast notification system |
| **`channels/`** | Channel ABC, registry, media helpers, auth helpers, approval routing, command handling, tool generation, and bundled channel adapters |
| **`tunnel.py`** | Tunnel provider abstraction, ngrok integration, and lifecycle manager |
| **`tools/row_bot_status_tool.py`** | Self-introspection and controlled self-management tool, including optional self-improvement skill operations |
| **`tools/developer_tool.py`** + **`tools/custom_tool_builder_tool.py`** | Developer workspace operations and conversational Custom Tool creation/testing/promotion surface |
| **`tools/`** + **`designer/tool.py`** | Self-registering core tool modules, registry, base classes, Wikipedia recovery behavior, and LangChain tool conversion |
| **`plugins/`** | Plugin runtime, marketplace client, manifest validation, security scanner, and settings integration |
| **`mcp_client/`** | External Model Context Protocol client: config, runtime sessions, marketplace search, requirements, safety classification, diagnostics, and result normalization |
| **`migration/`** | Hermes/OpenClaw migration models, redaction, source detection, dry-run planning, realistic fixtures, and guarded apply/report generation |
| **`static/`** | Bundled frontend assets such as Mermaid, graph/visualization helpers, and Buddy runtime/motion assets |
| **`version.py`** | Single source of truth for the current Row-Bot version, located at `src/row_bot/version.py` |

---

## Data Storage

All user data is stored under `~/.row-bot/` (or `%USERPROFILE%\\.row-bot\\` on Windows) unless `ROW_BOT_DATA_DIR` is set.

```text
~/.row-bot/
├── threads.db                     # Conversation history and LangGraph checkpoints
├── media/                         # Per-thread media files and sidecar metadata
├── tasks.db                       # Workflows, schedules, pipeline definitions, run history, and approval state
├── memory.db                      # Knowledge graph entities and relations
├── memory_vectors/                # FAISS vectors for semantic memory recall
├── memory_recall_trace.json       # Recent auto-recall decisions and include/reject diagnostics
├── memory_evolution_journal.json  # Memory status/tier/review/superseding/audit changes
├── memory_extraction_state.json   # Last extraction metadata
├── extraction_journal.json        # Memory extraction journal
├── dream_config.json              # Dream Cycle settings
├── dream_journal.json             # Dream Cycle run log
├── dream_rejections.json          # Rejected inference-pair cache
├── insights.json                  # Structured insight store
├── api_keys.json                  # API key metadata only; raw key values live in the OS credential store when available
├── cloud_config.json              # Legacy provider-model pinning compatibility data
├── providers.json                 # Provider metadata, Quick Choices, compatibility profiles, probe results, and credential fingerprints only
├── model_settings.json            # Current model, context caps, and model setting compatibility state
├── model_catalog_cache.json        # Background-refreshed provider/local-runtime model catalog rows and refresh diagnostics
├── context_catalog_cache.json      # Cached context-window metadata used before live provider refresh
├── embedding_config.json           # Active embedding provider/model settings
├── app_config.json                # Onboarding and first-run flags
├── user_config.json               # Avatar preferences, identity, and self-improvement settings
├── channels_config.json           # Channel enablement and per-channel config
├── developer/
│   ├── workspaces.json             # Developer workspace links, approval mode, execution mode, sandbox image/network settings
│   ├── tool_capsules.json          # Registered Custom Tools and promotion/enablement metadata
│   ├── custom_tool_drafts.json     # Conversational Custom Tool Builder draft state
│   └── sandboxes/                  # Docker shadow workspaces and per-workspace sandbox state
├── shell_history.json             # Per-thread shell history
├── skills_config.json             # Manual skill enable/disable state
├── mcp_servers.json               # External MCP server config, global switch, tool enablement, approvals
├── mcp_marketplace_cache.json     # Cached MCP directory search results
├── migration-reports/             # Redacted migration plans, results, summaries, and archive snapshots
├── migration-backups/             # Pre-migration backups of overwritten target files
├── runtimes/                      # Optional user-space runtimes installed by MCP requirement helper
├── skill_versions/                # Skill patch backups for self-improvement flows
├── row_bot_app.log                  # Structured application log
├── splash.log                     # Splash-screen diagnostics
├── inbox/                         # Files received via messaging channels
├── browser_profile/               # Persistent Chromium profile
├── browser_history.json           # Browser history and snapshots
├── designer/
│   ├── projects/                  # Designer project JSON files
│   ├── references/                # Designer source/reference uploads
│   ├── assets/                    # Persistent project assets
│   └── published/                 # Published HTML bundles and shareable output
├── tracker/
│   ├── tracker.db                 # Habit and health tracker database
│   └── exports/                   # CSV exports for tracker charts
├── vector_store/                  # Uploaded document vector index
│   └── embedding_metadata.json     # Embedding provider/dimension metadata for stale-index detection
├── gmail/                         # Gmail OAuth tokens
├── calendar/                      # Calendar OAuth tokens
├── wiki/                          # Obsidian-compatible markdown vault export
├── x/                             # X OAuth tokens and tier metadata
├── installed_plugins/             # Marketplace-installed plugins
├── plugin_state.json              # Plugin config and enablement state
├── plugin_secrets.json            # Plugin API-key metadata only; raw key values live in the OS credential store when available
├── recovery/                      # Backups created by task/local DB reset and restore helpers
└── kokoro/                        # Kokoro TTS model and voice files
```

> Override the data directory by setting the `ROW_BOT_DATA_DIR` environment variable.

---

## Comparison with Other Tools

### Why not just use another open-source assistant?

Most open-source AI assistants are still **developer tools disguised as products** — CLI-first, config-file-driven, and built around Docker, YAML, and environment variables. Getting started often means cloning repos, editing configs, wiring databases, and debugging dependencies before you can ask a single useful question.

**Row-Bot is different.** It is packaged as a native desktop experience with one-click installers for Windows and macOS, a one-line Linux installer backed by a verified XDG tarball, local-first defaults, and a GUI that exposes models, tools, workflows, channels, Designer Studio, Developer Studio, Custom Tools, and memory without requiring terminal fluency.

### Why not just use ChatGPT?

| | ChatGPT / Claude / Gemini | Row-Bot |
|---|---|---|
| **Your data** | Stored on provider servers, subject to their privacy policies | Stays on your machine. With opt-in provider/custom models, only the current conversation and model-visible tool context go to the selected endpoint; memories, files, designer projects, and history remain local unless explicitly included |
| **Conversations** | Provider-owned chat history | Local SQLite-backed threads, exportable anytime |
| **Cost** | Subscription or provider billing | Free with local models; provider/custom usage is upstream API billing, self-hosted infrastructure, or ChatGPT subscription access only when you opt in |
| **Memory** | Limited, opaque, provider-controlled | Personal knowledge graph with entities, relations, bounded recall, audit/review states, visualization, wiki export, and background refinement |
| **Tools** | Limited app integrations and provider-defined plug-ins | 30+ core tools plus Developer-native tools, Custom Tool Builder, promoted Custom Tools, and auto-generated channel tools: shell, browser, filesystem, Gmail, Calendar, memory graph, Designer Studio, Row-Bot Status, MCP external tools, image generation, video generation, research tools, and more |
| **Customization** | Pick a model and maybe a custom instruction | Swap provider-qualified models per thread, workflow, or Developer workspace, configure name and personality, build workflows, toggle tools and skills, create Custom Tools from repos/folders, and enable self-improvement features |
| **Voice** | Usually cloud-processed | Local faster-whisper STT plus Kokoro TTS |
| **Availability** | Internet required | Local models work offline; hosted providers and custom endpoints are optional |

> **Bottom line:** cloud assistants rent you access to someone else's system. Row-Bot gives you **personal AI sovereignty** — local durable state, provider choice when you want it, and all of your long-lived data under your own control.

### How is Row-Bot different from OpenClaw?

[OpenClaw](https://github.com/openclaw/openclaw) is a strong open-source personal assistant aimed at multi-channel delivery and developer-centric workflows. The two projects overlap in ambition but optimize for different users.

| | Row-Bot | OpenClaw |
|---|---|---|
| **Getting started** | One-click installers and GUI-first setup on Windows and macOS, plus one-line Linux install with browser-first launch | CLI-oriented install flow and heavier terminal expectations |
| **Model routing** | Local-first data with local, hosted, ChatGPT / Codex, Ollama Cloud, and custom OpenAI-compatible model paths in one GUI | More cloud-first in typical setups |
| **Memory** | Typed personal knowledge graph with bounded recall, audit/review states, visualization, wiki export, and structured relations | Simpler text-centric memory patterns |
| **Knowledge refinement** | 5-phase Dream Cycle with merge, enrich, decay, infer, and insight passes | Experimental dreaming-style memory promotion flows |
| **Document intelligence** | Structured graph extraction with provenance, dedup, and relation typing | Strong workspace tools but less graph-centric document knowledge modeling |
| **Designer / Canvas** | Designer Studio for decks, one-pagers, reports, published links, plus inline Mermaid and Plotly rendering | A2UI-style interactive workspace focus |
| **Developer / Code** | Developer Studio for Git workspaces with code threads, approval modes, file tree, todos, diffs, tests, GitHub/PR prep, and optional Docker shadow sandbox | Developer-heavy CLI and terminal-first workflows |
| **Tools** | 30+ core tools plus Developer-native tools, Custom Tool Builder, promoted Custom Tools, and auto-generated channel send tools, including Designer Studio, Row-Bot Status, and MCP external tools | Broad built-in toolset with different emphasis |
| **Messaging channels** | 5 bundled channels with streaming, media handling, approvals, and a sidebar monitor | Wider channel catalog and gateway focus |
| **Autonomous workflows** | Step-based workflows with approvals, conditions, triggers, concurrency groups, and safety modes | Strong channel routing and automation, different orchestration model |
| **Desktop experience** | Native Windows and macOS desktop app with tray, splash, and setup wizard; Linux browser-first package with optional native/tray modes | More developer-first and channel-first in practice |
| **Privacy posture** | All durable state local; no Row-Bot servers | Self-hostable and privacy-conscious, but with a different operational model |

> **In short:** OpenClaw is an excellent multi-channel gateway for developer-heavy setups. Row-Bot is optimized for **personal AI sovereignty** — local-first memory, structured knowledge, integrated design and code workspaces, user-created tools, configurable self-knowledge, and a native desktop experience that does not require living in a terminal.
