Introduction

VexCoder is a local coding assistant you run as a binary. It connects to the model endpoint you configure, supports interactive CLI sessions and non-interactive batch runs, and keeps its setup lightweight enough to build from source on macOS, Linux, and Windows.

This book focuses on the public user surface:

building the binary
understanding the current runtime, application, and transport layout
creating a workspace with vex init
configuring the model endpoint and token
using the current CLI flags and interactive commands

The shortest path to a running session is in Quick Start. For the current code layout, see Architecture Overview.

Architecture Overview

VexCoder currently has two operator-facing surfaces in the source tree:

the interactive CLI UI started by src/bin/vex.rs
the non-interactive batch runner in src/batch_mode.rs

Most interactive application coordination is rooted at src/app.rs and its split submodules under src/app/ (for example commands/, slash_commands.rs, and layout.rs). The runtime core is found under src/runtime/, including context assembly, the edit loop, command execution, validation, and task state.

Current code layout

src/bin/vex.rs parses CLI arguments, loads config, and routes startup into the interactive UI, batch mode, export, compatibility helpers, and other CLI paths.
src/app.rs is the interactive application module root. The full-screen TUI command surface is found across src/app.rs, src/app/commands/, src/app/slash_commands.rs, and related helper modules under src/app/.
src/ui/render/ owns the ratatui-native task-surface renderer. render_task_layout() draws the task surface with ratatui Frame widgets: one compact status row above the transcript body and the composer. Rendering dependencies: unicode-width and unicode-segmentation for grapheme-aware display-width calculations; textwrap for paragraph wrapping; ansi-to-tui for converting raw ANSI escape sequences into ratatui Span/Line structures; pulldown-cmark and syntect for the shared markdown rendering helpers (inline markdown styling is active today, and fenced-code highlighting is handled inside the shared conversion module); ratatui-macros for line and span construction helpers; and similar, a generic diff algorithm, for the unified and inline structured diffs shown in edit previews. Supporting infrastructure: arboard for programmatic clipboard access via the /copy slash command; color-eyre for structured panic hooks and pretty backtraces; dirs for cross-platform XDG/config directory resolution; tracing-appender for daily-rotated file logging when RUST_LOG is set; indicatif and console for progress spinners in headless batch mode; ignore and globset for gitignore-compliant workspace traversal and glob matching; pathdiff and dunce for cross-platform relative-path computation; chrono for ISO 8601 timestamps in SSE event streams and for all internal timestamp generation; base64 for binary content encoding in exports; indexmap for the insertion-ordered maps used in streaming tool-call accumulation (DerivedTurnState.pending_tool_calls), so tool calls are serialized in the order they were opened; and tower-http for the TraceLayer::new_for_http() middleware wired into build_http_router(), which provides structured request/response tracing for both authorized and unauthorized HTTP requests. crossterm is built with the bracketed-paste feature (prevents input corruption on multi-line pastes; active in src/terminal.rs) and the event-stream feature (async terminal event integration).
ratatui is built with the unstable-rendered-line-info, unstable-widget-ref, and unstable-backend-writer features for scroll-offset tracking, efficient widget updates, and backend-writer parity. Tool calls, waiting-state telemetry, and assistant responses stream into transcript paragraphs on the shared body instead of a dedicated visible timeline strip. Short transcript bodies now start directly below the status row and grow downward until the body fills; only then does the live bottom-follow window scroll older rows upward. The fullscreen composer auto-fits to the current display row and column budget, keeps wrapped /command, @path, and pasted prompt text editable in place, and turns @path suggestions into a repo-wide interactive picker: Up / Down traverse ranked matches across the full workspace tree, Enter inserts the selected workspace-relative path, and Esc dismisses the picker so the raw mention token can still be submitted unchanged. The picker keeps a bounded ranked candidate set per keystroke so large workspaces do not pay a full-tree sort cost on every input edit. Free-form slash commands such as /edit, /plan, and /review consume those selected @path mentions as inline context before the model turn starts, while /explain treats @path as the requested file target. /edit and /fix also seed task-scoped edit grants (write-file, apply-patch, run-command) so the mutation workflow remains active after the slash command starts, without downgrading broader session grants. Outside picker mode, the composer still supports visual-row Up / Down / Home / End navigation instead of forcing the operator out of task mode, while text selection and copy gestures stay with the terminal emulator because the UI does not enable mouse capture. While timeline follow mode is active, the output pane stays on the accumulated transcript, so each new server response appends to the existing scrollback instead of replacing it.
Manual timeline navigation can still switch that pane into per-step detail, and Alt+End returns the surface to live follow mode without restoring a dedicated activity strip. The expand_rows_for_display helper in src/ui/render/transcript.rs splits embedded newlines before word-wrapping each sub-line, so server responses containing literal \n sequences render as separate visual rows. is_structural_transcript_row recognises bullet list items (lines starting with "- " or "* ") and numbered list items (lines starting with markers such as "1. " or "2) ") as structural and passes them through the render path without word-wrap reflow. Scroll-offset clamping in apply_output_scroll_action and preserve_transcript_scroll_on_growth uses the expanded (word-wrapped) row count instead of the raw output row count, so the viewport range matches the render path and every row is reachable.
src/app/model_update.rs pushes a verb-first one-liner into the transcript as each tool result arrives (e.g. "Searched …", "Read …", "Edited …") so the operator sees immediate progress instead of a blank screen while the model produces its response text. Consecutive completed read-only tools (codebase_search, read_file, search, search_files, search_content, find_files, list_files, list_dir, list_directory, glob_files, git_status, git_diff, git_log, git_show) now fold into a single [tool] paragraph regardless of tool name, keeping the transcript compact during multi-tool exploration sequences. Pending and completed edit_file rows also keep the structured multiline diff preview instead of collapsing the change into one JSON line, which preserves per-hunk evidence and add/remove color feedback in both renderers.
src/batch_mode.rs runs the same runtime headlessly for vex exec and writes JSONL or text output.
src/runtime/ contains the reusable runtime machinery: context assembly, the edit loop, command and sandbox plumbing, project instructions, task state, and validation. The Phase 1 ADR-038 split adds src/runtime/context_cache.rs for bounded in-memory file-rollup reuse and src/runtime/git_rollup.rs for opt-in git status/diff capture, so automatic turn assembly no longer has to pay synchronous git overhead by default.
src/state/conversation/ owns the conversation-loop safeguards that sit above raw tool execution. Alongside the existing read-only and mutating-tool guards, it now short-circuits malformed read_file calls that are missing a path: instead of replaying the same raw tool error, it asks for a concrete file target or a repo-overview flow (list_files / codebase_search). This also covers mixed parallel read-only rounds in which a valid list_files call and a malformed read_file call arrive together. Write guards enforce the VEX_DIFF_PREFERRED_ABOVE_LINES (warning) and VEX_WRITE_FILE_MAX_LINES (rejection) thresholds, steering the model toward apply_patch or edit_file for large files. Conversation history older than VEX_HISTORY_KEEP_TURNS turns (default 10) is condensed: tool results keep their first 5 lines plus a line-count indicator to stay within the context budget.
src/server/ owns the ADR-026 transport plumbing: HTTP routing and auth middleware (http.rs), SSE response framing (sse.rs), Unix socket binding (socket.rs), request handlers (handlers/mod.rs, handlers/session.rs), TLS helpers and config resolution (util.rs). Transport code reaches the runtime only through facade entrypoints in src/app/.
src/local_api.rs contains the LocalApiMode (RuntimeMode) and LocalApiFrontend (FrontendAdapter) that bridge the local API surface to the runtime engine. The local API surface is transcript-first: live assistant text is normalized into final_text transcript blocks so downstream consumers can render one enriched stream instead of stitching together separate assistant delta/message events.
src/tools/search.rs implements the codebase_search tool using a Tree-sitter-based structural index for Rust source files. The index extracts functions, structs, enums, impls, traits, modules, constants, and type aliases, and ranks results by exact name match, substring match, parent-scope match, and content keyword match.
src/tools/semantic.rs manages the optional semantic vector index persisted at .vex/index/. When VEX_EMBEDDING_PROVIDER is configured, chunks are embedded at logical boundaries and results are reranked by cosine similarity merged with structural scores.
src/tools/embed.rs provides the embedding client for the /v1/embeddings-compatible endpoint used by semantic search.
src/tools/workspace_explore.rs provides the list_dir and glob_files tools for workspace exploration. Both are workspace-confined, .gitignore-aware, and bounded to prevent unbounded output.
src/tools/workspace_ignore.rs implements WorkspaceIgnore on top of the ignore crate's gitignore matcher so that search_files, list_dir, glob_files, and find_files all skip ignored paths with gitignore-compatible directory semantics.
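
The ranking tiers described for codebase_search, and the cosine rerank that src/tools/semantic.rs merges with structural scores, can be illustrated with a short std-only sketch. It is not the code in src/tools/search.rs or src/tools/semantic.rs: the Symbol struct, the 4/3/2/1 tier weights, case-insensitive matching, and the alpha blend are all hypothetical choices made for illustration.

```rust
/// Hypothetical symbol record; the real index stores richer Tree-sitter data.
struct Symbol {
    name: String,
    parent: Option<String>,
    content: String,
}

/// Structural score: exact name > name substring > parent scope > content keyword.
/// The weights and case-insensitive comparison are assumptions.
fn structural_score(sym: &Symbol, query: &str) -> f32 {
    let q = query.to_lowercase();
    let name = sym.name.to_lowercase();
    if name == q {
        4.0
    } else if name.contains(&q) {
        3.0
    } else if sym.parent.as_deref().map_or(false, |p| p.to_lowercase().contains(&q)) {
        2.0
    } else if sym.content.to_lowercase().contains(&q) {
        1.0
    } else {
        0.0
    }
}

/// Cosine similarity of two embedding vectors; 0.0 if either has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}

/// Blend the normalized structural tier (0..=1) with cosine similarity.
fn merged_score(structural: f32, similarity: f32, alpha: f32) -> f32 {
    alpha * (structural / 4.0) + (1.0 - alpha) * similarity
}
```

Keeping the structural tiers as coarse steps means an exact name hit still outranks a merely similar chunk unless alpha is set low.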

Streaming protocol coverage

The shared SSE parser in src/api/stream.rs and the normalized type surface in src/types/api_types.rs preserve documented streaming values from both messages-v1 and chat-compat backends:

heartbeats and structured stream errors
text, input-json, thinking, and signature deltas
citations, server-tool blocks, and web-search tool results
normalized usage totals plus cache, geography, and token-detail metadata
chat-compat chunk metadata such as service tier, system fingerprint, refusal text, logprobs, choice indexes, and tool-call type

Not every metadata field is rendered in the interactive transcript today, but the parser keeps those values in the normalized event surface instead of dropping them during protocol conversion.
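
Before any backend-specific decoding, the parser has to split the byte stream into SSE events. The sketch below is a minimal, std-only reading of the WHATWG event-stream format (blank-line dispatch, multiple data lines joined with \n, lines starting with ':' skipped as comments). It is not the code in src/api/stream.rs; the SseEvent type is hypothetical, and treating heartbeats as comment lines is an assumption, since some backends send them as named events instead.

```rust
/// One dispatched server-sent event (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
struct SseEvent {
    event: Option<String>,
    data: String,
}

/// Split decoded SSE text into events. A blank line dispatches the buffered
/// event; lines starting with ':' are comments, often used as keep-alives.
fn parse_sse(input: &str) -> Vec<SseEvent> {
    let mut events = Vec::new();
    let mut event: Option<String> = None;
    let mut data: Vec<String> = Vec::new();
    for line in input.lines() {
        if line.is_empty() {
            // Dispatch only if at least one data line was seen.
            if !data.is_empty() {
                events.push(SseEvent { event: event.take(), data: data.join("\n") });
                data.clear();
            }
            event = None;
            continue;
        }
        if line.starts_with(':') {
            continue;
        }
        // "field: value": strip one optional space after the colon.
        let (field, value) = match line.find(':') {
            Some(i) => {
                let v = &line[i + 1..];
                (&line[..i], v.strip_prefix(' ').unwrap_or(v))
            }
            None => (line, ""),
        };
        match field {
            "event" => event = Some(value.to_string()),
            "data" => data.push(value.to_string()),
            _ => {} // id, retry, and unknown fields are ignored in this sketch
        }
    }
    events
}
```

Each dispatched data payload would then go to the JSON decoder for the active backend (messages-v1 or chat-compat).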

A StreamTextNormaliser layer at the forward_conversation_update boundary intercepts embedded tool-call markup (XML-like tags emitted by local inference servers) and converts it into structured [tool]/[detail] transcript lines before it reaches the TUI. This prevents raw SSE event data from leaking to the display and ensures all tool invocations render as paragraph blocks in the scrolling transcript pane. The local API handoff in src/runtime/json_handoff.rs and src/local_api.rs preserves those transcript rows, plus transcript block start/delta/complete updates, as canonical RuntimeEnvelope JSON events, so downstream clients can stay transcript-first over SSE without reparsing a flattened assistant text stream. The normaliser buffers chunk-split <tool_call>, <function=...>, and <parameter=...> fragments until they are complete enough to classify, so transcript-first consumers follow the backend's JSON delta stream without seeing raw wrapper or partial tag text when the server breaks markup across arbitrary chunk boundaries.
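
The chunk-boundary buffering can be illustrated with a small std-only sketch that holds back any trailing fragment that could still grow into a known wrapper tag and releases everything else immediately. TagBuffer, TAG_STARTS, and is_partial_tag are hypothetical names, the tag list is an assumption, and the real normaliser also classifies complete tags into [tool]/[detail] rows, which this sketch omits.

```rust
/// Wrapper tags this sketch treats as tool-call markup (assumed list).
const TAG_STARTS: [&str; 6] = [
    "<tool_call>", "</tool_call>", "<function=", "</function>", "<parameter=", "</parameter>",
];

/// True if `fragment` (which starts at a '<') could still become a known tag.
fn is_partial_tag(fragment: &str) -> bool {
    !fragment.contains('>')
        && TAG_STARTS.iter().any(|t| t.starts_with(fragment) || fragment.starts_with(*t))
}

#[derive(Default)]
struct TagBuffer {
    pending: String,
}

impl TagBuffer {
    /// Append a streamed chunk and return the text that is safe to classify now.
    fn push(&mut self, chunk: &str) -> String {
        self.pending.push_str(chunk);
        let split = match self.pending.rfind('<') {
            Some(i) if is_partial_tag(&self.pending[i..]) => i,
            _ => self.pending.len(),
        };
        // Keep the possible partial tag; hand back everything before it.
        let rest = self.pending.split_off(split);
        std::mem::replace(&mut self.pending, rest)
    }

    /// Flush whatever is still held when the stream ends.
    fn finish(&mut self) -> String {
        std::mem::take(&mut self.pending)
    }
}
```

Because a bare '<' followed by ordinary text (as in "a < b") cannot become a known tag, it is released at once instead of stalling the stream.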

The current ratatui surface keeps the composer pinned at the bottom edge and scrolls transcript paragraphs upward from that anchor, but the live turn state is still assembled from three sources: history_state.lines, current_turn_stream_segments, and active_stream_blocks. That split is the remaining complexity boundary for the tool-call cutover. The current repair work keeps scroll ownership on the ratatui transcript, fixes net-growth preservation when pending tool paragraphs are replaced by completed results, and defaults local text-protocol parsing to the hybrid tagged-plus-XML chain. The larger single-document cutover plan is recorded in docs/src/tool-call-cutover.md.

The live parser path for interactive turns remains the shared stream parser, the tool-call parser selected by the conversation loop, and the StreamTextNormaliser boundary that converts malformed inline tool markup into transcript-safe rows. The structured_parser module is present in tree as an optional framework and does not replace the live runtime parser path unless the ADR-043 adoption gates are satisfied.

A transcript buffering foundation (src/state/transcript_delta.rs) provides StreamingBlockBuffer plus TranscriptBlockKind for active structured-stream blocks. The buffer map is keyed by block index in TuiMode and runs in parallel with the transcript-first line path: transcript_display_rows() reads the block kind to gate the live streaming cursor, while task_output_view_with() reads buffered byte counts to expose a compact live throughput indicator in the output title during structured streaming. Bounded suffix deduplication still routes through bounded_incremental_suffix() in the shared streaming path, but the render surface no longer carries the earlier staged delta-consumer helpers that never landed in production.

The runtime envelope schema (schemas/runtime_envelope_v1.json) accepts tool names matching [a-z][a-z0-9_-]* and MCP-namespaced tools (mcp.<provider>.<tool>), covering all built-in and external tool registrations.
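
The name rule can be mirrored in Rust for a quick pre-check before an envelope is serialized. This std-only sketch is hypothetical (the schema itself is JSON Schema), and it assumes the provider and tool segments of an MCP name follow the same [a-z][a-z0-9_-]* pattern as plain names, which the schema description above does not state.

```rust
/// True if `s` matches [a-z][a-z0-9_-]* (ASCII only).
fn is_plain_tool_name(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(c) if c.is_ascii_lowercase() => {}
        _ => return false,
    }
    chars.all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_' || c == '-')
}

/// Accept a plain name or an MCP-namespaced `mcp.<provider>.<tool>` name.
/// Assumption: both MCP segments use the plain-name pattern.
fn is_valid_tool_name(s: &str) -> bool {
    match s.strip_prefix("mcp.") {
        Some(rest) => match rest.split_once('.') {
            Some((provider, tool)) => is_plain_tool_name(provider) && is_plain_tool_name(tool),
            None => false,
        },
        None => is_plain_tool_name(s),
    }
}
```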

Crate design boundaries -- text processing

VexCoder uses several crates that touch text at different abstraction layers. Each crate occupies a distinct role with no overlap. The boundary rule is: never use a search/indexing crate for internal text processing, and never use a text-processing crate for file-content search or structural parsing.

Non-overlapping crate roles

Crate	Role	Scope	NOT used for
`aho-corasick`	Multi-pattern literal matching	File content search, keyword extraction from source text	Git output parsing, secret redaction
`regex-lite`	Lightweight internal text processing	Git output parsing, secret redaction, rate-limit extraction, format validation	Code search, RAG, semantic indexing, codebase search
`tree-sitter`	Structural AST indexing	Language-aware parsing of source files into syntax trees	Text processing, log parsing, redaction
`globset` / `ignore`	Filesystem traversal	`.gitignore`-aware path matching and directory walking	File content search, string processing
`quick-xml`	XML tool-call parsing	Structured extraction of `<function=...>` / `<parameter=...>` tags from model output	Git parsing, log analysis
`indexmap`	Ordered insertion-preserving maps	Streaming tool-call accumulation preserving insertion order	Search indexing, text processing
`tower-http`	HTTP middleware	Request/response tracing for the local API server	Application logic, text processing

regex-lite -- ASCII-only internal text processing

regex-lite is the only regex crate in the dependency tree. Its character classes are ASCII-only (\d = [0-9], \w = [0-9A-Za-z_]), and it has no Unicode-aware classes such as \p{...}. That limitation is intentional: vexcoder's regex-lite usage exclusively targets machine-readable ASCII output from git, HTTP headers, and API responses.

Conventional use cases DISTINCT from RAG/semantic search/codebase_search:

Parsing structured output from external tools (git status, git diff, git apply, git log)
Extracting known fields from semi-structured strings (retry delays, durations)
Sanitizing/redacting sensitive data from logs, transcripts, and telemetry
Format validation (API key formats, token patterns, connection strings)

None of these overlap with codebase search, RAG, or semantic indexing.

The regex-lite modules live under src/runtime/ as three focused files:

git_parse.rs -- Structured parsing of git status --porcelain, git diff --stat, git diff --name-status, git log --oneline, and git apply output into typed enums and structs. Patterns compile once via OnceLock<regex_lite::Regex> and are reused across calls.
secrets.rs -- Output redaction for vendor API keys (sk-...), AWS access keys (AKIA...), GitHub PATs (ghp_/gho_/ghu_/ghs_/ghr_), PEM private key headers, bearer tokens, connection strings with embedded credentials, and generic secret assignments. Wired into sanitize_assistant_text so secrets matching these patterns are redacted before they reach the transcript or logs.
rate_limit.rs -- Extracts retry delay hints from Retry-After header values and error response body text ("try again in N seconds"). The header path is wired into map_api_status_error in the API client with fallback to body text for 429 detection.
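
The two retry-hint sources can be sketched with std only. The shipped rate_limit.rs uses regex-lite; the function names below are hypothetical, and the sketch handles only the delta-seconds form of Retry-After, so an HTTP-date value falls through to the body-text fallback.

```rust
use std::time::Duration;

/// Parse a Retry-After value in delta-seconds (HTTP-date form not handled).
fn retry_after_header(value: &str) -> Option<Duration> {
    value.trim().parse::<u64>().ok().map(Duration::from_secs)
}

/// Find "try again in N seconds" (ASCII case-insensitive) in an error body.
fn retry_from_body(body: &str) -> Option<Duration> {
    let lower = body.to_ascii_lowercase();
    let marker = "try again in ";
    let start = lower.find(marker)? + marker.len();
    let digits: String = lower[start..].chars().take_while(|c| c.is_ascii_digit()).collect();
    let rest = &lower[start + digits.len()..];
    if digits.is_empty() || !rest.trim_start().starts_with("second") {
        return None;
    }
    digits.parse::<u64>().ok().map(Duration::from_secs)
}

/// Prefer the header; fall back to body text (e.g. for a 429 response).
fn retry_delay(header: Option<&str>, body: &str) -> Option<Duration> {
    header.and_then(retry_after_header).or_else(|| retry_from_body(body))
}
```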

Design rationale: regex-lite was chosen over the full regex crate because (a) these internal patterns only ever target ASCII input, so Unicode-aware matching is unnecessary; (b) its binary size overhead (~94 KB vs ~373 KB for full regex) is meaningful for a CLI binary; and (c) both crates give the same O(m*n) worst-case search-time guarantee.

Stream parser -- no regex

The stream parser (src/api/stream.rs) and text normaliser (src/api/stream/text_normaliser.rs) handle SSE framing, JSON delta parsing, and embedded XML-like tool call markup using zero-regex string scanning (starts_with, contains, manual index arithmetic). quick-xml handles structured XML extraction. regex-lite is not used in the streaming path.
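
The zero-regex scanning style can be sketched with find() and index arithmetic over the <function=...> and <parameter=...> boundaries. This is a hypothetical, std-only illustration: the text above says the real normaliser delegates structured extraction to quick-xml, whereas this sketch also extracts names and values by hand.

```rust
/// A tool call recovered from inline markup (hypothetical shape).
#[derive(Debug, PartialEq)]
struct ToolCall {
    name: String,
    params: Vec<(String, String)>,
}

/// Scan `<function=NAME>...<parameter=KEY>VALUE</parameter>...</function>`
/// using only find() and slicing; returns None if any boundary is missing.
fn scan_tool_call(text: &str) -> Option<ToolCall> {
    let start = text.find("<function=")? + "<function=".len();
    let name_end = start + text[start..].find('>')?;
    let name = text[start..name_end].trim().to_string();
    let body_end = name_end + text[name_end..].find("</function>")?;
    let mut body = &text[name_end + 1..body_end];
    let mut params = Vec::new();
    while let Some(p) = body.find("<parameter=") {
        let key_start = p + "<parameter=".len();
        let key_end = key_start + body[key_start..].find('>')?;
        let value_end = key_end + body[key_end..].find("</parameter>")?;
        let key = body[key_start..key_end].trim().to_string();
        let value = body[key_end + 1..value_end].trim().to_string();
        params.push((key, value));
        // Continue scanning after the closing tag.
        body = &body[value_end + "</parameter>".len()..];
    }
    Some(ToolCall { name, params })
}
```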

Full git parsing stack

The git parsing stack is the foundation of vexcoder's value as a CLI tool working with git repos. The following git output formats are parsed:

Command	Parser	Output type
`git status --porcelain`	`parse_git_status`	`ParsedGitStatus` with per-file status entries
`git diff --stat`	`parse_diff_stat`	`ParsedDiffStat` with per-file changes and summary
`git diff --name-status`	`parse_name_status`	`ParsedNameStatus` with status chars and rename detection
`git log --oneline`	`parse_git_log_oneline`	`ParsedGitLog` with hash + subject entries
`git apply` (stdout+stderr)	`parse_git_apply`	`ParsedGitApply` with outcome classification per line

All parsers live in src/runtime/git_parse.rs and are re-exported from src/runtime.rs. git_rollup.rs orchestrates git command execution with timeout and cancellation support, using parse_git_status to produce structured rollups for context assembly.
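
The porcelain v1 line layout that parse_git_status consumes can be illustrated with a std-only sketch: two status columns (index and worktree), a space, then the path, with "ORIG -> PATH" for renames and copies. StatusEntry and parse_porcelain_v1 are hypothetical stand-ins for ParsedGitStatus, and the sketch ignores quoted paths (core.quotePath) and the -z output form.

```rust
/// One `git status --porcelain` (v1) entry (hypothetical shape).
#[derive(Debug, PartialEq)]
struct StatusEntry {
    index: char,
    worktree: char,
    path: String,
    orig_path: Option<String>,
}

fn parse_porcelain_v1(output: &str) -> Vec<StatusEntry> {
    output
        .lines()
        .filter_map(|line| {
            let mut chars = line.chars();
            let index = chars.next()?;
            let worktree = chars.next()?;
            let rest = line.get(3..)?; // skip "XY "
            // Only renames and copies carry an "ORIG -> PATH" pair.
            let renamed = matches!(index, 'R' | 'C') || matches!(worktree, 'R' | 'C');
            let (orig_path, path) = match rest.split_once(" -> ") {
                Some((from, to)) if renamed => (Some(from.to_string()), to.to_string()),
                _ => (None, rest.to_string()),
            };
            Some(StatusEntry { index, worktree, path, orig_path })
        })
        .collect()
}
```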

Secret redaction -- always on

Secret redaction runs on every assistant text output through sanitize_assistant_text in src/runtime/policy.rs. The following patterns are detected and replaced with [REDACTED]:

Vendor API keys (sk- prefix, 20+ chars)
AWS access key IDs (AKIA prefix, 16 uppercase alphanumeric)
GitHub personal access tokens (ghp_, gho_, ghu_, ghs_, ghr_ prefixes, 36+ chars)
PEM private key headers (-----BEGIN ... PRIVATE KEY-----)
Bearer tokens (preserving the Bearer prefix)
Connection strings with embedded passwords (protocol://user:password@host)
Generic secret assignments (API_KEY=..., token: "...", etc.)
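
Two of the listed patterns (sk- keys and AKIA key IDs) are enough to show the shape of prefix-plus-length redaction. The shipped secrets.rs uses regex-lite and covers the full list; the helper names here are hypothetical, and counting the 20+ and 16 character thresholds after the prefix is an assumption.

```rust
/// Replace runs that start with `prefix` (at a token boundary) and continue
/// with at least `min_len` bytes accepted by `allowed`. ASCII-only matching.
fn redact_prefixed(text: &str, prefix: &str, min_len: usize, allowed: fn(u8) -> bool) -> String {
    let bytes = text.as_bytes();
    let mut out = String::with_capacity(text.len());
    let (mut i, mut copied) = (0, 0);
    while let Some(off) = text[i..].find(prefix) {
        let start = i + off;
        let body_start = start + prefix.len();
        let mut end = body_start;
        while end < bytes.len() && allowed(bytes[end]) {
            end += 1;
        }
        // Avoid matching inside words, e.g. the "sk-" in "task-list".
        let boundary = start == 0 || !bytes[start - 1].is_ascii_alphanumeric();
        if boundary && end - body_start >= min_len {
            out.push_str(&text[copied..start]);
            out.push_str("[REDACTED]");
            copied = end;
            i = end;
        } else {
            i = body_start;
        }
    }
    out.push_str(&text[copied..]);
    out
}

fn is_key_char(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_' || b == b'-'
}

fn is_upper_alnum(b: u8) -> bool {
    b.is_ascii_uppercase() || b.is_ascii_digit()
}

fn redact_secrets(text: &str) -> String {
    let step = redact_prefixed(text, "sk-", 20, is_key_char);
    redact_prefixed(&step, "AKIA", 16, is_upper_alnum)
}
```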

Structured tool call design

The stream parser handles three tool-call markup formats from model output:

XML tags (<function=name>, <parameter=key>value</parameter>) -- extracted by quick-xml in the text normaliser. The normaliser uses zero-regex string scanning (starts_with, contains, manual index arithmetic) to detect tag boundaries, then delegates structured extraction to quick-xml.
JSON tool calls -- parsed via serde_json from tool_calls arrays in chat-completion deltas. Streamed deltas accumulate into indexmap::IndexMap entries preserving insertion order.
Structured content blocks -- tool_use blocks with id, name, and input fields parsed from content-block deltas.
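
Streamed JSON tool-call deltas arrive keyed by a per-call index, typically with the id and name in the first fragment and the arguments split across later ones. The accumulation can be sketched with std only: a Vec records the order in which indexes were first seen and a HashMap holds the entries, standing in for indexmap::IndexMap. PendingCall and ToolCallAccumulator are hypothetical names, not the fields of DerivedTurnState.

```rust
use std::collections::HashMap;

/// One tool call being assembled from streamed deltas (hypothetical shape).
#[derive(Debug, Default, Clone, PartialEq)]
struct PendingCall {
    id: String,
    name: String,
    arguments: String,
}

/// `order` plus `calls` stands in for an insertion-ordered map.
#[derive(Default)]
struct ToolCallAccumulator {
    order: Vec<u32>,
    calls: HashMap<u32, PendingCall>,
}

impl ToolCallAccumulator {
    /// Merge one delta: id and name overwrite when present; argument text is appended.
    fn apply(&mut self, index: u32, id: Option<&str>, name: Option<&str>, args: &str) {
        if !self.calls.contains_key(&index) {
            self.order.push(index); // first sighting fixes the output position
        }
        let call = self.calls.entry(index).or_default();
        if let Some(id) = id {
            call.id = id.to_string();
        }
        if let Some(name) = name {
            call.name = name.to_string();
        }
        call.arguments.push_str(args);
    }

    /// Return calls in the order they were opened, not sorted by index.
    fn finish(self) -> Vec<PendingCall> {
        let mut calls = self.calls;
        self.order.into_iter().filter_map(|i| calls.remove(&i)).collect()
    }
}
```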

No regex is used in the streaming tool-call path. regex-lite is reserved for post-hoc processing of git output and secret redaction, never for real-time stream parsing.

Crate expansion decisions

The following crates appear in comparable open-source Rust CLI toolchains. Most are not yet in vexcoder's dependency tree (similar is already active); each row records whether the crate is active, planned for the next batch, or rejected, with rationale.

A planned decision means the design choice is settled in the repo. It does not mean the crate is added immediately without a live integration seam: vexcoder keeps dependency additions coupled to real code paths and tests so the tree does not accumulate unused crates.

Crate	Comparable CLI usage	vexcoder decision	Rationale
`bm25`	Text ranking for code search results	Next batch planned (ADR-033 Phase 5)	Ranked retrieval improves `codebase_search` relevance. Will sit behind the `aho-corasick` literal-match layer, not in the regex-lite text-processing layer.
`similar`	Diff algorithm for computing inline text diffs	Active (replaces `diffy`)	Generic diff algorithm now wired into `src/edit_diff.rs`. No branding dependency.
`which`	Locating executables on `$PATH`	Next batch planned	`git_rollup.rs` currently assumes `git` is on PATH. `which::which("git")` provides a clear error when git is missing.
`walkdir`	Recursive directory traversal	Design rejects	vexcoder uses `ignore` (from the ripgrep ecosystem) which already provides recursive traversal with `.gitignore` support. Adding `walkdir` would duplicate traversal logic. `ignore` is the conventional choice for git-aware CLI tools.
`notify`	Filesystem event watching	Next batch planned	Enables watch-mode for `git_rollup` to detect working-tree changes without polling. Will integrate with the existing `git_rollup.rs` orchestration layer.

Vexcoder-specific crates

The following crates are in vexcoder's tree but not in comparable CLI toolchains. Each serves a design need specific to vexcoder's architecture.

Crate	vexcoder usage	Why comparable CLIs omit it	Design rationale
`axum`	HTTP routing and handler composition for the local API server surface	Comparable CLIs may use a thinner direct HTTP surface or a different server seam.	`axum` is already the active server foundation in vexcoder; `tower-http` sits on top of it for request tracing, not in place of it.
`tower-http`	`TraceLayer` HTTP middleware for the local API server (`src/server/http.rs`)	Comparable CLIs use axum directly without tower middleware. vexcoder's `LocalApiServer` (ADR-026) requires request/response tracing for debugging multi-agent sessions.	Conventional for axum-based servers needing observability.
`fs2`	File-locking for `.vex/state/` durable writes	Comparable CLIs use a different persistence model.	Prevents concurrent vexcoder sessions from corrupting task-state files. `write_json_safe` uses temp+fsync+rename; `fs2` adds advisory locking as a second safety layer.
`portable-pty`	Pseudo-terminal allocation for sandboxed command execution	Comparable CLIs use platform-specific PTY code directly.	vexcoder's command runner needs PTY for interactive tool output (e.g., `git commit` with editor). `portable-pty` provides cross-platform PTY without platform-specific FFI.
`rmcp` (`1.2.x`)	MCP (Model Context Protocol) client for external tool providers	Comparable CLIs implement MCP transport directly using earlier transport library versions (e.g., pre-1.0).	vexcoder supports `[[mcp_servers]]` config for connecting to external tool providers (ADR-024 PM-01). vexcoder pins `rmcp` `1.2.x` to track the current stable MCP transport spec; the version boundary matters because the MCP wire protocol stabilized across the 1.x release series.
`quick-xml`	XML tool-call tag parsing from model output	Comparable CLIs use string-based parsing for tool calls.	vexcoder's stream parser delegates structured XML extraction to `quick-xml` rather than hand-rolling an XML parser. Conventional for XML processing in Rust.
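
The temp + fsync + rename pattern attributed to write_json_safe can be sketched with std::fs alone. write_atomic is a hypothetical name, and the fs2 advisory lock that vexcoder layers on top is not shown. A stricter version would also fsync the parent directory on Unix so the rename itself survives a crash.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Crash-safe replace: write a temp file in the same directory, fsync it,
/// then rename it over the target so readers see old or new, never partial.
fn write_atomic(path: &Path, contents: &[u8]) -> io::Result<()> {
    let dir = path.parent().unwrap_or_else(|| Path::new("."));
    let file_name = path
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no file name"))?;
    let tmp = dir.join(format!(".{}.tmp", file_name.to_string_lossy()));
    {
        let mut f = File::create(&tmp)?;
        f.write_all(contents)?;
        f.sync_all()?; // data is on disk before the rename makes it visible
    }
    fs::rename(&tmp, path)?;
    Ok(())
}
```

Keeping the temp file in the target's own directory matters: a rename is only atomic within one filesystem.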

Ongoing boundary work

The long-term architecture work is tracked in the ADR set under adr/.

ADR-025 defines the canonical machine-readable runtime request and event contract.
ADR-026 defines the proposed LocalApiServer transport binding over that contract.
ADR-028 is now active in the current tree: the facade helpers are stored under src/app/, transport code has been extracted from src/local_api.rs into src/server/ submodules (http.rs, sse.rs, socket.rs, handlers/mod.rs, handlers/session.rs, util.rs), and dependency-direction enforcement tests verify inward-only import rules across all layers, including grouped, multiline, and super::-relative crate::{server::...} / crate::{bin::...} imports.
ADR-029 is now accepted: the stream parser covers all documented SSE event types (error envelopes, heartbeats, thinking/signature deltas, citations, server-tool blocks, web-search results, cache/geo/detail metadata) and TaskState persists plan, session notes, context compaction records, and cache usage stats for multi-agent handoff. ADR-029 is a declared dependency of ADR-030 and a prerequisite for full invariant compliance — StreamEvent::Error lets orchestrating agents detect sub-agent stream failures, and the TaskState extensions are the handoff payload that lets an orchestrator reconstruct a sub-agent's context on resume.
ADR-030 is now accepted with an explicit six-point verification suite. Its guarantees include: provider events normalize into canonical runtime events, task state owns execution truth, the orchestrator decides whether the task continues or stops, and task handoff or resume consumers depend on that same runtime-owned control flow. ADR-030 is also load-bearing for multi-agent orchestration: Invariants 1, 4, and 5 are the semantic correctness guarantees that make agent handoffs coherent. Without these invariants proven end-to-end, multi-agent orchestration has undefined behaviour at handoff points.
ADR-031 extends the active operator surface with timeline selection, stable step identity, explicit approved/running/completed lifecycle rendering, prompt-anchored transcript scrolling, a larger multiline composer, direct ANSI task rendering during orchestration, and keyboard navigation for timeline selection and inspector detail. Each pending tool call carries a stable step_id and compact input preview. The task-state timeline still derives pending rows as AwaitingApproval, Approved, or Running from canonical state, and the Approved state is tracked for manual approvals, session auto-approvals, and capability-grant auto-approvals. Batches A through E are merged into main. Batch C/D implemented viewport alignment (output-pane scroll ownership and six-line inspector cap) across both the direct ANSI and ratatui renderers. The fullscreen composer now also auto-fits to current display row and column changes, including narrower half-screen or quarter-screen display snaps. Batch E removed the legacy activity_rows derivation, draw_timeline_fallback(), draw_legacy_activity_row(), and the legacy_row field from TaskStepView, and the current ANSI path renders those task-state updates as transcript paragraphs instead of reserving a dedicated top strip.
ADR-032 adds prompt-area interactivity: an interactive / (slash-command) picker and an @path file picker with Up/Down/Enter/Esc navigation and hierarchical directory drill-down, !command shell execution, pasted-block handling, a responsive auto-fit composer surface that keeps those controls visible when the display is resized, and a context guard that limits the token budgets for project instructions and notes.
ADR-033 introduces the hybrid retrieval context architecture: a codebase_search tool (Phase 1) backed by structural keyword indexing, optional semantic vector search via an external embedding endpoint (Phase 2), write guards that steer write_file toward apply_patch/edit_file for large files (Phase 3), and history condensing that compresses older tool results to stay within the context budget (Phase 4).
ADR-034 defines the proposed post-milestone multi-agent lane: worktree-isolated agent definitions, orchestrator-owned session-task lifecycle, /agents, /watch, and explicit session-task release surfaces, plus delegation-time concurrency and prompt-size enforcement built on the canonical ADR-025/ADR-030 contracts. The current hardening pass makes the delegation cap serialized, adds release-route and concurrency-stress coverage, and normalizes parent-task watch rollups onto the same lowercase status surface used by session tasks.
ADR-038 is now Accepted for memory-first TTFC work. Phase 1 is merged in-tree: context assembly reuses a bounded process-local cache for small file rollups, and automatic git status/diff capture is opt-in rather than mandatory. Phase 1a added search lane tightening (search config during index warmup, incremental refresh independence from auto_index). Phase 2 adds src/disk_policy.rs (DiskPermission enum, check_path classifier, VEX_DISK_POLICY env) and src/config/cache.rs (OnceLock-based Config::load_cached). Batch C extracted src/config/load.rs (1361 lines) into a directory module: src/config/load/paths.rs (path discovery), src/config/load/merge.rs (layer merge helpers), and src/config/load/parse.rs (enum + header parsing), with orchestration and tests retained in src/config/load/mod.rs. Batch D splits src/tools/operator.rs (865 lines) into src/tools/operator/mod.rs, core.rs, file_ops.rs, git_ops.rs, and search.rs, preserving behavior while isolating the later disk-policy enforcement seam. Batch E on PR #281 splits src/runtime/context_assembler.rs into src/runtime/context_assembler/mod.rs (orchestration + tests) and src/runtime/context_assembler/reads.rs (candidate-path extraction, rollup conversion, related-path inference). Batch F on the same PR adds enforce() / enforce_runtime() to src/disk_policy.rs, tests/disk_policy_tests.rs, make check-disk-policy, and the arch-contracts.yml CI step. Batch G (PR #282) adds src/tools/operator/policy.rs for operator-boundary disk-policy assertions, wires assert_durable_access() into TaskState::save() and TaskState::load(), and fixes cross-platform check_path() for Windows backslash separators. Batch H (PR #283) extracts src/runtime/task_state.rs (807 lines) into src/runtime/task_state/{mod.rs, persist.rs}, isolating all persistence logic (save/load, directory discovery, file listing, active summary reads) into a dedicated module. 
WAL evaluation concluded: not warranted because task-state saves are per-session and write_json_safe already performs crash-safe writes (temp + fsync + rename). ADR-038 is now Accepted with 0 remaining items.
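
The inward-only import rule from ADR-028 can be approximated at the text level. The sketch below is hypothetical and std-only: it flags direct crate::<module> imports and entries inside grouped or multiline crate::{...} imports, but it does not strip comments or string literals and does not model super::-relative paths, so it is weaker than the enforcement tests described above.

```rust
/// True if one entry of a `crate::{...}` group refers to `module`.
fn names_module(item: &str, module: &str) -> bool {
    // First token drops any `as` alias; split_whitespace also absorbs newlines.
    let path = item.split_whitespace().next().unwrap_or("");
    path == module || path.strip_prefix(module).map_or(false, |rest| rest.starts_with("::"))
}

/// Text-level check for imports of `crate::<module>` in Rust source.
fn imports_module(src: &str, module: &str) -> bool {
    // Direct form: `crate::server::x`, `crate::server;`, `crate::server as y`.
    let direct = format!("crate::{module}");
    for (pos, _) in src.match_indices(direct.as_str()) {
        let after = &src[pos + direct.len()..];
        if after.starts_with("::") || after.starts_with(';') || after.starts_with(char::is_whitespace) {
            return true;
        }
    }
    // Grouped form: walk each `crate::{ ... }` and test its top-level entries.
    let mut search_from = 0;
    while let Some(off) = src[search_from..].find("crate::{") {
        let group_start = search_from + off + "crate::{".len();
        let mut depth = 0usize;
        let mut item_start = group_start;
        for (i, ch) in src[group_start..].char_indices() {
            let abs = group_start + i;
            match ch {
                '{' => depth += 1,
                '}' if depth > 0 => depth -= 1,
                ',' | '}' if depth == 0 => {
                    if names_module(&src[item_start..abs], module) {
                        return true;
                    }
                    if ch == '}' {
                        break; // end of this group
                    }
                    item_start = abs + 1;
                }
                _ => {}
            }
        }
        search_from = group_start;
    }
    false
}
```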

The transport layer (src/server/) now reaches the runtime exclusively through the application facade (src/app/), and src/local_api.rs retains only the LocalApiMode / LocalApiFrontend runtime-mode bridge types.

Tool-Call Cutover

This note records the current tool-call and transcript rendering findings for the ratatui task surface, the deliberate cutover choices applied in PR 348, and the remaining architecture work after that cutover.

Current constraints

The ratatui task surface already keeps the composer pinned at the bottom edge. The remaining complexity is no longer the pane split; it is the live transcript state.

Today the transcript is assembled from three mutable sources:

history_state.lines for committed transcript paragraphs and tool rows.
current_turn_stream_segments for in-progress assistant text.
active_stream_blocks for typed block metadata and live cursor state.

That split means paragraph replacement has to keep multiple structures in sync whenever a pending tool preview turns into a completed tool-result paragraph. It also means the renderer has to infer one live transcript from several buffers instead of reading one canonical document.

Research summary

The attached tool-call research compared three approaches.

1. Keep the current split model and patch individual bugs

This is the lowest-disruption option, but it keeps the same root problem: scroll math, parser normalization, and paragraph replacement all remain spread across unrelated buffers.

2. Normalize streamed events into an intermediate adapter layer

This improves protocol coverage, but it still leaves paragraph assembly split between the adapter and the ratatui transcript state. It reduces duplication without removing it.

3. Move to a unified document model with a block-aware virtual viewport

This is the recommended direction. A single paragraph/block store becomes the source of truth for:

pending tool previews
completed tool results
final assistant text
waiting-state telemetry
wrapped-row viewport math

The viewport then consumes one ordered document instead of reconstructing rows from multiple mutable sources.

PR 348 cutover choices

PR 348 keeps the ratatui-native transcript surface and makes four explicit choices so the UI, parser, and API route all move in the same direction.

1. Viewport contract

The composer stays pinned to the bottom edge.
Short transcript bodies now start directly below the status row instead of being bottom-filled with blank space.
As new rows arrive, the transcript grows downward until it fills the body. Once the body is full, the live window follows the bottom and older rows scroll upward out of view.
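A minimal sketch of the viewport contract, assuming the transcript has already been wrapped into a flat row count. The function name live_window is hypothetical; it returns the range of wrapped rows to draw between the status row and the composer.

```rust
use std::ops::Range;

/// Hypothetical helper: which wrapped transcript rows are visible
/// while the view follows the live bottom.
pub fn live_window(total_rows: usize, body_height: usize) -> Range<usize> {
    if total_rows <= body_height {
        // Short transcript: draw from the top of the body, no blank bottom-fill.
        0..total_rows
    } else {
        // Full body: follow the bottom; older rows scroll up out of view.
        (total_rows - body_height)..total_rows
    }
}
```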

2. Transcript rendering contract

Pending tool paragraphs still render directly into the transcript body instead of a separate timeline strip.
Completed tool-result replacement preserves scroll position by using the net transcript growth across the full replacement, not the height of the inserted paragraph alone.
Normalized StreamDelta text remains the single visible assistant-text path for downstream consumers. Textual StreamBlockDelta updates keep block identity and cursor metadata, but they do not form a second display-text stream.
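The scroll rule for tool-result replacement can be sketched as follows. This assumes (hypothetically) that the scroll position is stored as a row offset up from the live bottom and that the replaced paragraph sits below the visible window, which is the usual case for a pending tool preview near the end of the transcript. Shifting by the inserted paragraph's height alone would over-scroll by the height of the preview that was removed.

```rust
/// Hypothetical helper. `offset_from_bottom` counts rows scrolled up from the
/// live bottom; `old_total` / `new_total` are wrapped row counts before and
/// after the preview is replaced by the completed result.
pub fn preserve_scroll(offset_from_bottom: usize, old_total: usize, new_total: usize) -> usize {
    if offset_from_bottom == 0 {
        // Following the live bottom: stay pinned there.
        return 0;
    }
    // Net growth across the whole replacement (removed preview + inserted
    // result); negative when the result wraps to fewer rows than the preview.
    let net = new_total as isize - old_total as isize;
    let adjusted = offset_from_bottom as isize + net;
    // Never scroll past the first row or below the live bottom.
    adjusted.clamp(0, new_total as isize) as usize
}
```

With a 3-row preview replaced by an 8-row result, the net growth is 5 rows, so an offset of 10 becomes 15 rather than 18.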

3. API-route contract

The local API/runtime envelope is now transcript-first.
Plain StreamDelta text is normalized into synthetic final_text transcript blocks (transcript_block_start, transcript_block_delta, transcript_block_complete) instead of emitting a separate live assistant_delta / terminal assistant_message pair.
The assistant_delta and assistant_message events are removed. All downstream consumers must read transcript block events only.

4. Parser contract

Local text-protocol turns default to the hybrid parser chain.
Tagged <function=...> parsing stays the fast path.
Generic <tool_call>, <invoke>, and <tool_use> wrappers are accepted as fallback input, then normalized into the tagged text protocol for assistant history and the next tool round.
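A name-only sketch of the hybrid chain: try the tagged fast path, fall back to a generic wrapper, and re-encode whatever was found as the tagged opening tag. The runtime's fallback uses quick-xml and handles full arguments; this sketch uses plain string scanning, covers only an <invoke name="..."> wrapper, and that attribute form is an assumption.

```rust
/// Fast path: `<function=NAME>`, reading NAME up to the closing '>'.
fn tagged_fast_path(text: &str) -> Option<String> {
    let start = text.find("<function=")? + "<function=".len();
    let end = text[start..].find('>')? + start;
    let name = text[start..end].trim();
    (!name.is_empty()).then(|| name.to_string())
}

/// Fallback for a generic wrapper; the `<invoke name="NAME">` shape is assumed.
fn generic_invoke_fallback(text: &str) -> Option<String> {
    let start = text.find("<invoke name=\"")? + "<invoke name=\"".len();
    let end = text[start..].find('"')? + start;
    let name = text[start..end].trim();
    (!name.is_empty()).then(|| name.to_string())
}

/// Hypothetical hybrid chain: the fallback runs only when the fast path finds
/// nothing, and any hit is normalized into the tagged text protocol.
pub fn parse_hybrid(text: &str) -> Option<String> {
    tagged_fast_path(text)
        .or_else(|| generic_invoke_fallback(text))
        .map(|name| format!("<function={name}>"))
}
```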

Next cutover

The next architecture step is to replace the split transcript state with one canonical task document. The API route has already cut over to the transcript-first shape; the remaining work is to make the in-process task state match that same model.

That cutover should:

Store pending tool previews, completed tool results, waiting rows, and assistant text as one ordered paragraph list.
Keep block identity stable so scroll math can reason about net insert, replace, and remove operations directly.
Let the ratatui viewport render wrapped display rows directly from that paragraph list.
Remove the remaining split between history_state.lines, current_turn_stream_segments, and active_stream_blocks so the renderer and the runtime both consume one ordered document.
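A minimal sketch of what one canonical task document could look like, assuming hypothetical names TaskDocument, Block, and BlockKind. Blocks keep a stable id, and replace() reports the net wrapped-row change that scroll math needs. Row counting here is a naive character count; the real renderer measures grapheme-aware display width.

```rust
/// Hypothetical block kinds held in one ordered document.
#[derive(Debug, Clone, PartialEq)]
pub enum BlockKind {
    PendingTool,
    ToolResult,
    Waiting,
    AssistantText,
}

/// One paragraph; `id` never changes once assigned.
#[derive(Debug, Clone)]
pub struct Block {
    pub id: u64,
    pub kind: BlockKind,
    pub text: String,
}

#[derive(Default)]
pub struct TaskDocument {
    blocks: Vec<Block>,
    next_id: u64,
}

impl TaskDocument {
    /// Append a block and return its stable id.
    pub fn push(&mut self, kind: BlockKind, text: &str) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.blocks.push(Block { id, kind, text: text.to_string() });
        id
    }

    /// Replace a block in place (same id, same position) and return the
    /// net change in wrapped rows at `width`.
    pub fn replace(&mut self, id: u64, kind: BlockKind, text: &str, width: usize) -> Option<isize> {
        let block = self.blocks.iter_mut().find(|b| b.id == id)?;
        let before = wrapped_rows(&block.text, width) as isize;
        block.kind = kind;
        block.text = text.to_string();
        Some(wrapped_rows(&block.text, width) as isize - before)
    }

    /// Total wrapped rows the viewport would see at `width`.
    pub fn total_rows(&self, width: usize) -> usize {
        self.blocks.iter().map(|b| wrapped_rows(&b.text, width)).sum()
    }
}

/// Naive wrap: chars per line, at least one row per line and per block.
fn wrapped_rows(text: &str, width: usize) -> usize {
    let width = width.max(1);
    let rows: usize = text
        .lines()
        .map(|line| (line.chars().count().max(1) + width - 1) / width)
        .sum();
    rows.max(1)
}
```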

Until that larger cutover lands, the ratatui transcript path should continue to prefer paragraph-preserving repairs over additional side buffers.

Quick Start

This page gets you from clone to a running session in the fewest steps.

1. Build the binary

git clone https://github.com/aistar-au/vexcoder.git
cd vexcoder
cargo build --release

The binary will be at target/release/vex.

2. Create a workspace

./target/release/vex init

This scaffolds:

.vex/config.toml
.vex/validate.toml
AGENTS.md

3. Configure your model endpoint

Local example:

# .vex/config.toml
model_url = "http://localhost:8080/v1"
model_name = "local/default"
model_profile = "models/local-balanced.toml"

For a local Messages-v1 server, use plain HTTP unless you have explicitly configured TLS:

# .vex/config.toml
model_url = "http://localhost:8000/v1/messages"
model_name = "your-model-name"
model_profile = "models/local-balanced.toml"

Remote example:

# .vex/config.toml
model_url = "https://your-endpoint.example/v1/messages"
model_name = "your-model-name"
model_profile = "models/api-structured.toml"

Export a token only when the endpoint requires one:

export VEX_MODEL_TOKEN="your-token"

4. Start the interactive UI

./target/release/vex

5. Run one-shot or batch commands

One-shot plain text:

./target/release/vex -p "summarise this repository"

Batch mode:

./target/release/vex exec --task "review src/app.rs" --format jsonl

6. Verify the local gate

make gate-fast

The local pre-push hook also runs cargo nextest run, which uses nextest's default cross-platform concurrency. The CI workflow runs 8 parallel jobs with cargo registry and build-artifact caching.

Once inside an interactive session, the model can explore the codebase using codebase_search (for functions, types, and code patterns), list_files (for directory structure), list_dir (non-recursive directory listing), and glob_files (workspace-wide glob matching) before making targeted reads.

Configuration

VexCoder reads configuration from layered TOML files plus environment variables. The normal starting point is:

vex init

Resolution order

Highest priority wins:

Environment variables
Repo-local .vex/config.toml
User config: ~/.config/vex/config.toml or ~/.vex/config.toml
System config: /etc/vex/config.toml
Built-in defaults

VEX_MODEL_TOKEN is environment-only. It is never read from config files.
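The resolution order can be sketched as a lookup that checks the environment first and then walks file layers from highest to lowest priority. This is an illustration under assumptions: layers are modeled as plain string maps keyed by config names (the real environment variables use VEX_* names), and model_token is a hypothetical key standing in for VEX_MODEL_TOKEN, which is never read from files.

```rust
use std::collections::HashMap;

/// Hypothetical resolver. `file_layers` is ordered lowest to highest priority:
/// built-in defaults, system, user, repo-local.
pub fn resolve(
    file_layers: &[HashMap<&str, &str>],
    env: &HashMap<&str, &str>,
    key: &str,
) -> Option<String> {
    // Environment variables have the highest priority.
    if let Some(value) = env.get(key) {
        return Some(value.to_string());
    }
    // The token is environment-only: file layers are never consulted for it.
    if key == "model_token" {
        return None;
    }
    // Walk from the highest-priority file layer down; the first hit wins.
    file_layers
        .iter()
        .rev()
        .find_map(|layer| layer.get(key).map(|value| value.to_string()))
}
```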

Automatic context assembly now keeps small file rollups in a process-local memory cache. Search indexes under .vex/index/ and task-state JSON under .vex/state/ remain the intended disk-backed layers.

Active config keys

These keys are read by the current runtime from config files:

Key	Purpose	Default
`model_url`	Model endpoint URL	`http://localhost:8080/v1`
`model_url_skip_tls_check`	Skip HTTPS certificate validation for the model endpoint	`false`
`model_name`	Model identifier	`local/default`
`working_dir`	Workspace root for tool execution	current directory
`model_backend`	`local-runtime` or `api-server`	inferred
`model_protocol`	`messages-v1` or `chat-compat`	inferred
`tool_call_mode`	`structured` or `tagged-fallback`	inferred
`model_profile`	Path to a repo-tracked profile under `models/`	backend default profile
`max_project_instructions_tokens`	Project instructions token budget	`4096`
`max_memory_tokens`	Notes token budget	`2048`
`sandbox`	Command sandbox driver: `passthrough`, `macos-exec`, or `container`	`passthrough`
`sandbox_profile`	Sandbox profile path or container image name	unset
`sandbox_require`	Abort startup instead of falling back to passthrough when the sandbox probe fails	`false`
`notes_path`	Notes file used by `/memory`	unset

notes_path is user-config only.

When model_profile is set, the runtime loads the profile at startup and uses its request parameters (temperature, top_p, max_tokens, stop sequences, reasoning budget, and structured-tool fallback). Relative paths are resolved from the workspace repo root when one is available, otherwise from the current working directory.
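The relative-path rule for model_profile can be sketched as below. The function name is hypothetical, and passing absolute paths through unchanged is an assumption this page does not state.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical resolver for `model_profile`.
pub fn resolve_profile(profile: &Path, repo_root: Option<&Path>, cwd: &Path) -> PathBuf {
    if profile.is_absolute() {
        // Assumed: absolute paths are used as-is.
        return profile.to_path_buf();
    }
    // Prefer the workspace repo root; fall back to the current working directory.
    repo_root.unwrap_or(cwd).join(profile)
}
```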

Tool-call formats

tool_call_mode controls how the runtime expects tool invocations to arrive from the model layer.

Mode	Meaning	Current parser boundary
`structured`	Prefer native structured tool calls from the backend	JSON tool-call arrays and content-block tool-use payloads are parsed via `serde_json`; streamed fragments keep insertion order with `indexmap`
`tagged-fallback`	Accept XML-like fallback tags from local runtimes that do not emit native structured deltas	Tagged `<function=...>` scanning remains the fast path, and the local-runtime fallback now defaults to a tagged-plus-XML parser chain that also accepts generic `<tool_call>`, `<invoke>`, and `<tool_use>` wrappers before normalizing them into the tagged text protocol

The runtime currently handles three tool-call shapes:

JSON tool_calls arrays from chat-completion style APIs.
Content-block tool_use records from block-oriented APIs.
XML-like fallback tags such as <function=name> and <parameter=key>.

These paths are distinct from regex-lite processing. regex-lite is used for git output parsing, secret redaction, and rate-limit extraction; it is not used for live tool-call parsing.
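The insertion-order guarantee for streamed tool calls can be sketched without indexmap: a Vec records the order in which calls open, and a HashMap maps each stream index to its slot. The runtime uses IndexMap for this (DerivedTurnState.pending_tool_calls); the PendingToolCalls type and its methods below are hypothetical.

```rust
use std::collections::HashMap;

/// Std-only stand-in for an insertion-ordered map of in-flight tool calls.
#[derive(Default)]
pub struct PendingToolCalls {
    // (stream index, tool name, accumulated JSON argument text), in open order.
    order: Vec<(u32, String, String)>,
    // Stream index -> position in `order`.
    slots: HashMap<u32, usize>,
}

impl PendingToolCalls {
    /// Append one streamed fragment; the first fragment for an index opens the call.
    pub fn push_fragment(&mut self, index: u32, name: Option<&str>, args_chunk: &str) {
        let next = self.order.len();
        let slot = *self.slots.entry(index).or_insert(next);
        if slot == next {
            self.order.push((index, String::new(), String::new()));
        }
        let entry = &mut self.order[slot];
        if let Some(name) = name {
            entry.1.push_str(name);
        }
        entry.2.push_str(args_chunk);
    }

    /// Completed calls as (name, args), in the order they were opened.
    pub fn finish(self) -> Vec<(String, String)> {
        self.order.into_iter().map(|(_, name, args)| (name, args)).collect()
    }
}
```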

Feature config sections

`[compaction]`

Controls proactive conversation compaction. When enabled, the runtime compacts the conversation history when the estimated token count approaches the context budget, keeping recent turns verbatim and folding older context into a summary.

Key	Purpose	Default
`enabled`	Enable proactive compaction	`false`
`threshold_percent`	Compact when token usage exceeds this percentage of the context window (10–99)	`80`
`keep_recent_turns`	Number of most-recent turns kept verbatim after compaction (1–32)	`4`
`summary_max_tokens`	Maximum tokens for the compaction summary (64–4096)	`1024`

[compaction]
enabled = true
threshold_percent = 75
keep_recent_turns = 6
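A sketch of the compaction trigger and the verbatim split, assuming the token estimate and context window are already known. Function names are hypothetical, and "exceeds" is read as strictly greater than the threshold.

```rust
/// Hypothetical trigger mirroring `[compaction]`.
pub fn should_compact(
    enabled: bool,
    estimated_tokens: usize,
    context_window: usize,
    threshold_percent: u8,
) -> bool {
    // Integer form of: estimated_tokens / context_window * 100 > threshold_percent.
    enabled && estimated_tokens * 100 > context_window * threshold_percent as usize
}

/// Split history into (turns folded into the summary, turns kept verbatim).
pub fn split_for_compaction<T>(turns: &[T], keep_recent_turns: usize) -> (&[T], &[T]) {
    let keep = keep_recent_turns.min(turns.len());
    turns.split_at(turns.len() - keep)
}
```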

`[undo]`

Controls the in-memory checkpoint stack used by /undo.

Key	Purpose	Default
`enabled`	Whether `/undo` is available	`true`
`max_checkpoints`	Maximum checkpoints kept per session	`20`

[undo]
enabled = true
max_checkpoints = 30
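A bounded checkpoint stack in the spirit of [undo] can be sketched with a VecDeque. The names are hypothetical, and evicting the oldest checkpoint when max_checkpoints is reached is an assumption. Storing raw bytes (or None for a file the edit created) is what makes the restore step binary-safe.

```rust
use std::collections::VecDeque;
use std::path::PathBuf;

/// Hypothetical checkpoint: file bytes before the edit, or None if the edit created the file.
pub struct Checkpoint {
    pub path: PathBuf,
    pub before: Option<Vec<u8>>,
}

pub struct UndoStack {
    max_checkpoints: usize,
    stack: VecDeque<Checkpoint>,
}

impl UndoStack {
    pub fn new(max_checkpoints: usize) -> Self {
        Self { max_checkpoints, stack: VecDeque::new() }
    }

    /// Record state before a file-modifying tool call; evict the oldest at capacity.
    pub fn record(&mut self, checkpoint: Checkpoint) {
        if self.max_checkpoints == 0 {
            return;
        }
        if self.stack.len() == self.max_checkpoints {
            self.stack.pop_front();
        }
        self.stack.push_back(checkpoint);
    }

    /// Most recent checkpoint, or None when there is nothing to undo.
    pub fn pop(&mut self) -> Option<Checkpoint> {
        self.stack.pop_back()
    }

    pub fn len(&self) -> usize {
        self.stack.len()
    }
}

/// Restore raw bytes, or remove a file that did not exist before the edit.
pub fn restore(checkpoint: &Checkpoint) -> std::io::Result<()> {
    match &checkpoint.before {
        Some(bytes) => std::fs::write(&checkpoint.path, bytes),
        None => std::fs::remove_file(&checkpoint.path),
    }
}
```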

`[search]`

Controls structural index builds and codebase_search behavior. When enabled = false, both codebase_search and /reindex are unavailable.

Key	Purpose	Default
`enabled`	Enable codebase search indexing	`true`
`auto_index`	Warm the structural index at interactive and batch session start	`true`
`exclude`	Workspace-relative path prefixes to exclude from indexing	`["target/", "node_modules/", ".git/"]`
`max_file_size`	Skip files larger than this byte count	`1048576` (1 MiB)

Incremental index updates triggered by file writes during a session always apply exclude and max_file_size filters regardless of the auto_index setting. auto_index only controls whether the index is pre-warmed at session startup.

exclude entries are literal workspace-relative prefixes, not glob patterns. Use trailing slashes for directory trees such as target/ or src/vendor/. Entries missing a trailing slash are automatically normalized at config load time (e.g. "src" becomes "src/").

[search]
enabled = true
auto_index = true
exclude = ["target/", "node_modules/", ".git/", "src/vendor/"]
max_file_size = 524288
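The exclude semantics can be sketched as load-time normalization plus a literal prefix test. Function names are hypothetical, and the sketch assumes paths are already workspace-relative with / separators. The trailing slash matters: src/vendor/ excludes src/vendor/lib.rs but not src/vendor_tools.rs.

```rust
/// Hypothetical load-time normalization: "src" becomes "src/".
pub fn normalize_exclude(entries: &[&str]) -> Vec<String> {
    entries
        .iter()
        .map(|e| if e.ends_with('/') { e.to_string() } else { format!("{e}/") })
        .collect()
}

/// Literal prefix match (no globbing) plus the size cap.
pub fn should_index(rel_path: &str, size_bytes: u64, exclude: &[String], max_file_size: u64) -> bool {
    size_bytes <= max_file_size
        && !exclude.iter().any(|prefix| rel_path.starts_with(prefix.as_str()))
}
```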

`[auto_memory]`

Controls automatic memory extraction from assistant turns. When enabled, short factual notes are extracted after each turn and appended to the notes file with timestamped [auto] tags.

Key	Purpose	Default
`enabled`	Enable automatic extraction	`false`
`max_notes_per_turn`	Maximum notes extracted per turn (1–10)	`3`

[auto_memory]
enabled = true
max_notes_per_turn = 5

Environment variables

`VEX_MODEL_URL`

The full model endpoint URL.

URLs containing /chat/completions or ending in /v1 default to chat-compat.
Other URLs default to messages-v1.
For plain local inference servers, prefer explicit HTTP localhost URLs such as http://localhost:8000/v1/messages. If you enter an HTTPS localhost URL in the interactive startup prompt, vex now suggests the equivalent plain-HTTP localhost endpoint before the fullscreen session starts.
Same-machine local inference runtimes commonly expose only plain HTTP. That remains supported when you connect via localhost, 127.x.x.x, ::1, or 0.0.0.0. LAN-reachable model servers on RFC 1918 private addresses (192.168.x.x, 10.x.x.x, 172.16–31.x.x) and link-local addresses (169.254.x.x) are also allowed over plain HTTP. Only truly remote (public-internet) endpoints require HTTPS.
If a local endpoint returns HTTP 400 due to context overflow, the error now shows the server's message verbatim and suggests increasing --ctx-size on the server or using /compact to reset the conversation.
For non-context-overflow 400s, the error includes the detected protocol (MessagesV1 vs ChatCompat) and suggests checking the model name, protocol format, and whether the server supports streaming.
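The default protocol inference can be sketched as a string check. The function and enum names are hypothetical, and tolerating a trailing slash after /v1 is an assumption. VEX_MODEL_PROTOCOL overrides whatever this returns.

```rust
#[derive(Debug, PartialEq)]
pub enum Protocol {
    ChatCompat,
    MessagesV1,
}

/// Hypothetical default inference for VEX_MODEL_URL.
pub fn infer_protocol(url: &str) -> Protocol {
    let trimmed = url.trim_end_matches('/');
    if url.contains("/chat/completions") || trimmed.ends_with("/v1") {
        Protocol::ChatCompat
    } else {
        Protocol::MessagesV1
    }
}
```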

`VEX_MODEL_TOKEN`

Bearer token for authenticated endpoints.

`VEX_MODEL_URL_SKIP_TLS_CHECK`

Development-only escape hatch for HTTPS model endpoints with self-signed or otherwise non-system-trusted certificates.

Accepts true, false, 1, or 0.
Emits a startup warning on every launch when enabled.
Must not be committed in repo-local .vex/config.toml.

For any model endpoint outside local and private networks, HTTPS is mandatory. Plain http:// model URLs are rejected at startup for public-internet hosts so prompts, repository context, and model responses are not sent over unencrypted network paths. This rule does not block local inference servers reached via localhost, 127.x.x.x, ::1, 0.0.0.0, or RFC 1918 / link-local LAN addresses (192.168.x.x, 10.x.x.x, 172.16–31.x.x, 169.254.x.x). VEX_MODEL_URL_SKIP_TLS_CHECK only relaxes certificate verification for HTTPS endpoints; it does not permit plain HTTP for public-internet hosts.
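The plain-HTTP host rule can be sketched with std::net. The function name is hypothetical. The sketch classifies only the literal host string: any hostname other than localhost is treated as public, even if it would resolve to a private address, which is an assumption about how the runtime handles names.

```rust
use std::net::IpAddr;

/// Hypothetical check: may this URL host be reached over plain http://?
pub fn plain_http_allowed(host: &str) -> bool {
    // URL hosts wrap IPv6 literals in brackets, e.g. "[::1]".
    let host = host.trim_start_matches('[').trim_end_matches(']');
    if host.eq_ignore_ascii_case("localhost") {
        return true;
    }
    match host.parse::<IpAddr>() {
        Ok(IpAddr::V4(v4)) => {
            v4.is_loopback()            // 127.x.x.x
                || v4.is_unspecified()  // 0.0.0.0
                || v4.is_private()      // 10/8, 172.16/12, 192.168/16 (RFC 1918)
                || v4.is_link_local()   // 169.254/16
        }
        Ok(IpAddr::V6(v6)) => v6.is_loopback(), // ::1
        // Any other hostname is treated as public and requires HTTPS.
        Err(_) => false,
    }
}
```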

`VEX_MODEL_NAME`

Model identifier sent to the API.

`VEX_MODEL_PROTOCOL`

Overrides protocol inference. Accepted values: messages-v1, chat-compat.

`VEX_MODEL_BACKEND`

Overrides backend inference. Accepted values: local-runtime, api-server.

`VEX_TOOL_CALL_MODE`

Overrides tool-call encoding. Accepted values: structured, tagged-fallback.

`VEX_TOOL_PARSER`

Overrides the local text-protocol parser chain. Accepted values: tagged, hybrid.

tagged keeps the zero-regex <function=...> and <parameter=...> fast path only.
hybrid keeps that fast path and falls back to quick-xml extraction for generic <tool_call>, <invoke>, and <tool_use> wrappers.

Local endpoints default to hybrid so XML-style tool wrappers still execute when the backend does not emit native structured tool deltas.

Example:

export VEX_TOOL_PARSER=tagged

`VEX_MODEL_PROFILE`

Selects a repo-tracked model profile such as models/api-structured.toml. An invalid or missing path is a startup failure.

`VEX_WORKDIR`

Overrides the working directory used for tool execution.

`VEX_MODEL_HEADERS_JSON`

Adds extra request headers as a JSON object.

Example:

export VEX_MODEL_HEADERS_JSON='{"X-Client-Id":"vexcoder"}'

`VEX_MAX_PROJECT_INSTRUCTIONS_TOKENS`

Overrides the project instructions token budget.

`VEX_MAX_MEMORY_TOKENS`

Overrides the notes token budget.

`VEX_CONTEXT_INCLUDE_GIT`

Opt in to automatic git status and diff injection during context assembly.

Accepts true, false, 1, 0, yes, no, on, or off.
Default: false.
Explicit git tools and review flows still call git directly; this flag only controls the automatic context path used before a normal model turn.
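A sketch of the accepted boolean spellings for VEX_CONTEXT_INCLUDE_GIT, with unset mapping to the documented default of false. Function names are hypothetical; case-insensitive matching and treating unrecognized values as false are assumptions. VEX_MODEL_URL_SKIP_TLS_CHECK and VEX_SANDBOX_REQUIRE document a narrower set (true, false, 1, 0).

```rust
/// Hypothetical parser; None means the value was not recognized.
pub fn parse_env_bool(raw: &str) -> Option<bool> {
    match raw.trim().to_ascii_lowercase().as_str() {
        "true" | "1" | "yes" | "on" => Some(true),
        "false" | "0" | "no" | "off" => Some(false),
        _ => None,
    }
}

/// Unset (or unrecognized, by assumption) falls back to the default: false.
pub fn include_git(value: Option<&str>) -> bool {
    value.and_then(parse_env_bool).unwrap_or(false)
}
```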

`VEX_CONTEXT_GIT_TIMEOUT_MS`

Controls the timeout used by context-related git commands.

Default: 2000.
Applies to automatic git context when VEX_CONTEXT_INCLUDE_GIT=1 and to the existing review helpers that call git through the shared runtime wrapper.

`VEX_DISK_POLICY`

Controls the disk-policy enforcement mode (ADR-038).

Accepted values: off, warn, strict.
Default: off.
When set to strict, forbidden disk access (anything outside .vex/index/ and .vex/state/) causes a panic. warn logs a warning instead.
Intended for CI gates; not typically set in interactive use.
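The three modes can be sketched as a path classifier plus an enforcement switch. These names (PolicyMode, classify_path, apply_policy, parse_mode) are hypothetical and do not mirror the DiskPermission, check_path, or enforce() signatures in src/disk_policy.rs. Normalizing backslashes before the prefix test reflects the Windows separator fix described under ADR-038; treating unknown values as off is an assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PolicyMode {
    Off,
    Warn,
    Strict,
}

#[derive(Debug, PartialEq)]
pub enum PathClass {
    Durable,
    Forbidden,
}

#[derive(Debug, PartialEq)]
pub enum Outcome {
    Allowed,
    Warned,
}

/// Map a VEX_DISK_POLICY value to a mode; unset or unknown means off (assumed).
pub fn parse_mode(raw: Option<&str>) -> PolicyMode {
    match raw {
        Some("warn") => PolicyMode::Warn,
        Some("strict") => PolicyMode::Strict,
        _ => PolicyMode::Off,
    }
}

/// Only .vex/index/ and .vex/state/ are durable targets.
pub fn classify_path(rel_path: &str) -> PathClass {
    let normalized = rel_path.replace('\\', "/");
    if normalized.starts_with(".vex/index/") || normalized.starts_with(".vex/state/") {
        PathClass::Durable
    } else {
        PathClass::Forbidden
    }
}

/// off: no enforcement; warn: log and continue; strict: panic.
pub fn apply_policy(mode: PolicyMode, rel_path: &str) -> Outcome {
    if mode == PolicyMode::Off || classify_path(rel_path) == PathClass::Durable {
        return Outcome::Allowed;
    }
    match mode {
        PolicyMode::Strict => panic!("disk policy: forbidden access to {rel_path}"),
        _ => {
            eprintln!("warning: disk policy: forbidden access to {rel_path}");
            Outcome::Warned
        }
    }
}
```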

`VEX_SANDBOX`

Selects the command sandbox driver. Accepted values: passthrough, macos-exec, container.

passthrough preserves the current process-spawn behavior.
macos-exec wraps commands with sandbox-exec on macOS.
container wraps commands with the installed container runtime and requires VEX_SANDBOX_PROFILE to name the container image.
The built-in macos-exec default is intentionally compatibility-first: it allows broad file access, network access, process spawning, IPC lookups, and signals so common development tools continue to work. Use a custom profile if you need stricter containment than process wrapping plus policy hooks.

`VEX_SANDBOX_PROFILE`

Optional sandbox driver parameter.

For macos-exec, this is a profile path. When unset, the runtime uses a built-in compatibility-focused policy string.
For container, this is the image name passed to the container runtime. Startup runs a short run --rm <image> true probe through that runtime so the selected image is validated before the first wrapped command.

`VEX_SANDBOX_REQUIRE`

Controls startup fallback when the selected sandbox probe fails.

Accepts true, false, 1, or 0.
When false, startup emits a warning and falls back to passthrough.
When true, startup aborts instead of running without containment.

`VEX_MAX_TOKENS`

Upper bound override for the per-turn generation budget. When set, the value is treated as the maximum max_tokens for a single turn. The runtime also polls the local inference server's context size at startup and derives an effective ceiling of 75% of n_ctx; the actual max_tokens sent is min(VEX_MAX_TOKENS, n_ctx × 0.75). When not set, the model profile's max_tokens value serves as the default, still bounded by the server cap. The runtime also derives per-file read limits and search result budgets from the effective token budget when explicit overrides are not set.
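The max_tokens rule can be sketched in a few lines. The function name is hypothetical; n_ctx is None when the server's context size could not be read, and computing 75% with integer math (rounding down) is an assumption.

```rust
/// Hypothetical per-turn max_tokens computation.
pub fn effective_max_tokens(env_override: Option<u32>, profile_max: u32, n_ctx: Option<u32>) -> u32 {
    // VEX_MAX_TOKENS, when set, replaces the profile's max_tokens.
    let requested = env_override.unwrap_or(profile_max);
    match n_ctx {
        // Server cap: 75% of n_ctx.
        Some(ctx) => requested.min((ctx as u64 * 3 / 4) as u32),
        None => requested,
    }
}
```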

`VEX_MAX_COMMAND_OUTPUT_BYTES`

Maximum bytes kept in the accumulated stdout/stderr buffer returned to the model after a run_command tool call. The full output is always streamed to the TUI transcript. Default: 51200 (50 KiB).

`VEX_READ_FILE_MAX_LINES`

Maximum lines returned by the read_file tool when no explicit limit parameter is provided. When not set, the cap is derived from VEX_MAX_TOKENS: roughly 10% of the context budget at ~20 tokens per line, with the floor and ceiling shown in the table below.

Context budget	Auto-cap
4 K tokens	~50 lines
32 K tokens	~160 lines
128 K tokens	~640 lines
1 M+ tokens	up to 10,000 lines

The read_file tool also accepts offset (1-based line number) and limit parameters for targeted partial reads.
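The auto-cap can be sketched as budget / 200 (10% of the budget at ~20 tokens per line). The 50-line floor and 10,000-line ceiling are inferred from the table rows, not confirmed by the source, and the function name is hypothetical.

```rust
/// Hypothetical read_file line cap derived from a token budget.
pub fn read_file_auto_cap(context_budget_tokens: u64) -> u64 {
    // budget * 10% / 20 tokens per line == budget / 200, then clamp.
    (context_budget_tokens / 200).clamp(50, 10_000)
}
```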

`VEX_DIFF_PREFERRED_ABOVE_LINES`

Line threshold above which write_file emits a warning suggesting apply_patch or edit_file instead. The model sees the warning in the tool result and is expected to switch strategy on the next attempt. Default: 200.

`VEX_WRITE_FILE_MAX_LINES`

Hard line limit for write_file. Calls exceeding this are rejected outright with an error directing the model to use apply_patch or edit_file. Default: 500.
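The two write_file thresholds combine into a three-way check. This sketch is hypothetical; it assumes a call between the two limits still writes the file and only attaches the warning to the tool result.

```rust
#[derive(Debug, PartialEq)]
pub enum WriteCheck {
    Ok,
    SuggestDiff,
    Reject,
}

/// Hypothetical gate for VEX_DIFF_PREFERRED_ABOVE_LINES (default 200)
/// and VEX_WRITE_FILE_MAX_LINES (default 500).
pub fn check_write(line_count: usize, prefer_diff_above: usize, hard_max: usize) -> WriteCheck {
    if line_count > hard_max {
        // Rejected; the error points the model at apply_patch or edit_file.
        WriteCheck::Reject
    } else if line_count > prefer_diff_above {
        // Allowed, but the tool result carries a warning.
        WriteCheck::SuggestDiff
    } else {
        WriteCheck::Ok
    }
}
```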

`VEX_SEARCH_MAX_RESULTS`

Maximum number of results returned by the codebase_search tool. Default: 10.

`VEX_INDEX_MAX_FILES`

Maximum number of files indexed for semantic search. Default: 5000.

`VEX_EMBEDDING_PROVIDER`

Embedding provider for semantic search. Accepted values: compat (standard /v1/embeddings compatible endpoint) or native (single-text embedding endpoint). Semantic search is disabled when this variable is unset.

`VEX_EMBEDDING_MODEL`

Model identifier sent to the embedding endpoint. Required when VEX_EMBEDDING_PROVIDER is set.

`VEX_EMBEDDING_URL`

Base URL for the embedding endpoint. Required when VEX_EMBEDDING_PROVIDER is set.

`VEX_EMBEDDING_API_KEY`

Bearer token for authenticated embedding endpoints. Set this explicitly for the embedding endpoint when required; the runtime does not fall back to VEX_MODEL_TOKEN.

`VEX_EMBEDDING_BATCH_SIZE`

Number of texts sent per embedding API call. Default: 32.

`VEX_HISTORY_KEEP_TURNS`

Number of recent conversation turns kept at full fidelity. Older turns are condensed: tool results keep their first 5 lines plus a (N more lines) indicator, keeping the conversation within the context budget without losing the thread of earlier work. Default: 10.
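A sketch of the condensation rule, assuming each older turn is represented by its tool-result text. The function names and the exact wording of the (N more lines) indicator are assumptions.

```rust
/// Keep the first 5 lines of an older tool result and summarize the rest.
pub fn condense_tool_result(text: &str) -> String {
    const KEEP: usize = 5;
    let lines: Vec<&str> = text.lines().collect();
    if lines.len() <= KEEP {
        return text.to_string();
    }
    let mut out = lines[..KEEP].join("\n");
    out.push_str(&format!("\n({} more lines)", lines.len() - KEEP));
    out
}

/// Condense only turns older than the last `keep_turns`.
pub fn condense_history(results: &[String], keep_turns: usize) -> Vec<String> {
    let cutoff = results.len().saturating_sub(keep_turns);
    results
        .iter()
        .enumerate()
        .map(|(i, r)| if i < cutoff { condense_tool_result(r) } else { r.clone() })
        .collect()
}
```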

`VEX_MCP_TIMEOUT`

MCP server connection timeout in seconds applied to every configured server at session start. Each server entry may also set timeout_secs in the config file; the per-server value takes priority over this environment variable. Range: 1–300. Default: 30.

`vex init` scaffold

vex init writes a commented config skeleton. It includes some reserved sections for future expansion.

The active runtime keys are the top-level keys listed above.
[[hooks]] is active today.
sandbox, sandbox_profile, and sandbox_require are active runtime features and apply to TUI, batch mode, inline !command, hooks, and validation subprocesses.
[[mcp_servers]] is active today. MCP servers are connected at session start, loaded from the user config layer, and merged into the runtime tool registry as mcp.<server>.<tool> names. Servers are explicitly shut down when the session ends (TUI exit, batch completion, or API server stop).
Commented [api] remains a scaffold placeholder in config files. VEX_API_* environment variables (transport, host, port, socket, key, protocol, TLS paths) are active and functional for API server configuration.
[[mcp_servers]] is rejected in repo-local and system config layers to avoid committed or machine-global auto-launch of arbitrary MCP processes.

MCP servers

Use [[mcp_servers]] only in the user config file. Each server is connected at session start; load failures abort startup instead of leaving a partial MCP registry in memory. Connected servers are explicitly cancelled at session end via McpRegistry::shutdown().

HTTP headers may be written literally, as bare ${NAME} references, or as templates that mix literal text with ${NAME} segments resolved from the current process environment.

[[mcp_servers]]
name = "docs"
transport = "stdio"
command = "npx"
args = ["-y", "@modelcontextprotocol/server-filesystem", "."]

[[mcp_servers]]
name = "remote"
transport = "http"
url = "https://mcp.example.internal/mcp"
timeout_secs = 60

[mcp_servers.headers]
Authorization = "Bearer ${VEX_MCP_AUTH}"
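The header template rules (literal text, bare ${NAME}, or a mix) can be sketched as a single left-to-right scan. The function name is hypothetical, lookup stands in for reading the process environment (for example std::env::var), and failing on an unset variable is an assumption since this page does not say how unresolved names are handled.

```rust
/// Hypothetical expansion of `${NAME}` segments in an MCP header value.
pub fn expand_header<F>(template: &str, lookup: F) -> Result<String, String>
where
    F: Fn(&str) -> Option<String>,
{
    let mut out = String::new();
    let mut rest = template;
    while let Some(start) = rest.find("${") {
        // Copy literal text before the reference.
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        let end = after
            .find('}')
            .ok_or_else(|| format!("unterminated ${{ in {template:?}"))?;
        let name = &after[..end];
        let value = lookup(name).ok_or_else(|| format!("environment variable {name} is not set"))?;
        out.push_str(&value);
        rest = &after[end + 1..];
    }
    // Trailing literal text (or the whole value when there are no references).
    out.push_str(rest);
    Ok(out)
}
```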

When MCP servers are loaded successfully:

/mcp list shows the current server inventory.
/mcp show <server> shows the tool names exported by one server.
/tools includes both built-in tools and MCP tools.

Minimal examples

Local endpoint:

model_url = "http://localhost:8080/v1"
model_name = "local/default"
model_profile = "models/local-balanced.toml"

Local Messages-v1 endpoint example:

model_url = "http://localhost:8000/v1/messages"
model_name = "your-model-name"
model_profile = "models/local-balanced.toml"

Remote endpoint:

model_url = "https://api.example.internal/v1/messages"
model_name = "repo-assistant"
model_profile = "models/api-structured.toml"

Token for authenticated endpoints:

export VEX_MODEL_TOKEN="your-token"

CLI and TUI Commands

This page documents the commands and flags implemented in the current binary.

CLI

`vex`

Starts the interactive full-screen CLI UI. While a task is running, the task surface uses the ratatui-native renderer: a human-readable header, an optional changed-file row, a full-height transcript body, and a larger multiline composer below it. Tool calls, waiting-state telemetry, and assistant responses stream into transcript paragraphs in that shared body rather than a dedicated timeline strip. When completed turns record usage metadata, the header appends a compact ~N.Nk ctx indicator showing cumulative session usage.

The prompt surface provides live slash-command hints, @path file suggestions, a character count and focus marker in the composer header, submit-time @path expansion, pasted blocks, and multiline editing, all within the same fullscreen layout. The composer auto-fits to the available display rows and columns as the window grows, shrinks, or snaps to smaller layouts, so the prompt surface reflows instead of holding a stale fixed-height block.

For repo-overview prompts, the runtime steers the model toward list_files at the workspace root or codebase_search before any targeted read_file; read_file itself requires an explicit non-empty path.

`vex --resume [task-id]`

Resumes a saved task. With no task id, VexCoder offers recent tasks for selection.

`vex -p "PROMPT"` or `vex --print "PROMPT"`

Runs one prompt turn and prints the result to stdout. If stdin is piped, the stdin content is prepended to the prompt.

`vex exec --task "TEXT"`

Runs a non-interactive batch task.

Useful flags:

--task-file PATH
--max-turns N
--auto-approve once|task
--format jsonl|text
--output PATH

Each JSONL turn record includes a tokens object with input, output, and estimated fields.

`vex doctor [--json]`

Runs a read-only environment health check. It validates config loading, checks model endpoint reachability, reports sandbox fallback status, probes configured MCP servers without starting them, inspects state-directory writability, and verifies that any present policy file parses cleanly.

Exit code is non-zero only when one or more checks fail. --json emits a JSON array of {check,status,message} objects.

`vex export <task-id> [--format jsonl|markdown] [--output PATH] [--force]`

Exports a saved task from .vex/state (or VEX_STATE_DIR).

jsonl matches the batch-turn schema used by vex exec
markdown omits full assistant response text and only includes tool outcomes
--output PATH writes to a file instead of stdout
--force allows overwriting an existing output file

`vex init [--dir PATH]`

Creates .vex/config.toml, .vex/validate.toml, and AGENTS.md without overwriting existing files.

`vex branch <name>`

Creates and switches to a new git branch from HEAD.

If a saved task state exists, VexCoder records the branch name on the most recent task file in .vex/state (or VEX_STATE_DIR).

`vex pr-summary`

Builds a diff from the current branch against the merge-base of the default remote branch (origin/HEAD) and runs one model turn to draft a PR title and body.

The result prints to stdout. The current template starts with a Title: line followed by a Markdown body, so you can review it locally or pipe it into your own git-hosting CLI workflow.

`vex migrate config [--output PATH]`

Writes a TOML fragment based on legacy environment variables.

`vex completions <bash|zsh|fish|powershell>`

Writes shell completion scripts to stdout.

`vex install-hooks` and `vex uninstall-hooks`

Installs or removes the repository prepare-commit-msg hook.

`vex skills list`

Lists installed skills.

`vex skills install SOURCE [--subdir PATH]`

Installs a skill from a git URL or tarball URL.

`vex skills remove NAME`

Removes an installed skill by name.

TUI slash commands

Commands entered inside the interactive UI start with /.

Session and task state

/new — save the current task and start a fresh session with a new task ID.
/resume [task-id] — restore a previously saved task. Lists recent tasks when no ID is given.
/compact — reset conversation history, turn evidence, and token counters while keeping the current task ID and permission grants. Use this to recover from context-window overflow or to free up context budget.
/fork [label] — save the current task and start a new task seeded with the same grants.
/undo — revert the last file-modifying tool call from the in-memory checkpoint stack. Binary-safe: restores raw bytes for text and binary files and removes rename destinations when applicable. Returns a diagnostic when the stack is empty or when undo is disabled via [undo] enabled = false.
/quit / /exit — end the session.
/about — show version and build info.

Memory

/memory
/memory add <note>
/memory clear
/memory auto on — enable automatic memory extraction for the current session. After each assistant turn, short factual notes are extracted and appended to the notes file with [auto] tags.
/memory auto off — disable automatic memory extraction for the current session.
/memory auto clear — remove all [auto]-tagged notes from the notes file.

Permissions

/permissions
/allow <capability> [once|session]
/deny <capability>

Model and diff helpers

/model
/model <name>
/diff
/diff --staged

Edit loop

/edit <instruction>
- Expands @path mentions inside the instruction before the edit loop starts so picked files can be inlined as context.
- Grants task-scoped write-file, apply-patch, and run-command permissions for the active edit workflow unless that capability is already session-scoped.
/fix
- Restores the edit loop from the last validation failure and re-seeds the same task-scoped edit permissions without narrowing existing session grants.

Read-only semantic turns

/explain [path]
- Accepts either a plain workspace-relative path or @path; @path is normalized to the requested file target before context assembly runs.
/review [--base <git-ref>] [--files <glob>] [<instruction>]
- Starts a single review turn without entering the edit loop.
- With no flags, reviews git diff HEAD.
- --base <git-ref> reviews git diff <git-ref> after validating the ref.
- --files <glob> assembles matching workspace files instead of a diff and cannot be combined with --base.
- Expands @path mentions inside the free-form review instruction before the review turn starts. When --files receives @glob, the leading @ is stripped before file matching.
- Patch requests are silently denied during the turn.
/plan <instruction>
- Generates a concise implementation plan for the given instruction.
- Assembles workspace context via ContextAssembler; renders plan_template.txt.
- Expands @path mentions inside the instruction before the plan turn starts.
- Never enters the edit loop; patch requests are silently denied during the turn.
/init [environment]
- Scaffolds .vex/config.toml, .vex/validate.toml, and AGENTS.md in the current workspace.
- Reports the selected environment label in the transcript when one is supplied.
/context
/mcp [list|show <server>]
- Zero-turn MCP inspection surface.
- /mcp and /mcp list show loaded servers, transports, and tool counts.
- /mcp show <server> lists the server's fully qualified mcp.<server>.<tool> names.
- If no servers are loaded, the transcript shows [mcp] no MCP servers loaded.
/tools [desc]
- Zero-turn tool inventory.
- Always shows built-in tools and retrieval/mutation guidance.
- Includes loaded MCP tools under a dedicated [tools:mcp] section.
- /tools desc adds one-line descriptions from the tool schemas.
/usage
/commands
/help

When a read-only turn asks for a repo summary instead of a specific file, the runtime prefers list_files and codebase_search first. If the model emits a read_file call without a concrete path, VexCoder returns a clarification instead of looping the raw tool error, even when the malformed read_file arrives in the same parallel tool round as other read-only calls.

/usage prints the most recent turn's token counts and the cumulative session totals. If the runtime does not return usage metadata, the values are estimated from character counts and marked (estimated). /new and /compact reset the session totals.

Test generation

/generate-tests [path] [--framework <name>]
- Starts a single semantic turn using the test-generation prompt template.
- Assembles context for the requested path, or the most recently assembled file when no path is provided.
- Only test-file mutations are allowed; source-file edits must use /edit.

Custom commands

.vex/commands/*.toml
~/.config/vex/commands/*.toml
- Custom slash commands load at session start from project and user command directories.
- Project-scoped commands override user-scoped commands with the same name.
- Templates support {{context}} and {{input}} substitution.
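A sketch of the override and substitution rules, assuming command templates have already been read into name-to-template maps. The function names are hypothetical. The renderer does a single pass so that an {{input}} value containing {{context}} is not expanded a second time; whether the runtime guards against that case is not documented here.

```rust
use std::collections::HashMap;

/// User commands load first; project commands with the same name replace them.
pub fn merge_commands(
    user: HashMap<String, String>,
    project: HashMap<String, String>,
) -> HashMap<String, String> {
    let mut merged = user;
    merged.extend(project);
    merged
}

/// Single-pass substitution of the two documented placeholders.
pub fn render_template(template: &str, context: &str, input: &str) -> String {
    let mut out = String::with_capacity(template.len());
    let mut rest = template;
    while let Some(pos) = rest.find("{{") {
        out.push_str(&rest[..pos]);
        let tail = &rest[pos..];
        if let Some(after) = tail.strip_prefix("{{context}}") {
            out.push_str(context);
            rest = after;
        } else if let Some(after) = tail.strip_prefix("{{input}}") {
            out.push_str(input);
            rest = after;
        } else {
            // Unknown placeholder: keep the braces literally and keep scanning.
            out.push_str("{{");
            rest = &tail[2..];
        }
    }
    out.push_str(rest);
    out
}
```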

Validation helpers

/run [command]
/test
- Run without starting a model turn.
- Command output is captured for the transcript, with per-command stdout, stderr, and exit status summarized after each command completes.
/reindex
- Rebuilds the codebase structural index in the background without blocking the TUI. Reports completion back to the transcript when finished.
- Refuses to run when [search].enabled = false.

Free-form input transforms

@path
- Expands a workspace-relative file or directory into the prompt when the turn is submitted.
- While composing, the prompt footer searches the entire repo tree, including nested subdirectories, ranks matches by basename and path relevance, and keeps a bounded top-ranked candidate set per keystroke instead of sorting the full workspace on every keypress.
- When a file mention is active, Up and Down move the suggestion picker through the full match list, Enter inserts the selected workspace-relative path into the composer, and Esc dismisses the picker so the raw mention can still be submitted unchanged.
- Files are inlined as fenced text blocks. Missing paths are annotated inline instead of aborting the turn.
- Directories render a compact workspace-relative listing.
- Slash commands with free-form instructions (/edit, /plan, /review) expand selected @path mentions before the model turn starts. /explain treats @path as the requested file target.
- Repo summaries still need tool evidence: use a plain prompt when you want the model to start with list_files or codebase_search, and use @path only when you already know the file or directory you want to inline.
!command
- Runs a shell command immediately from the workspace without starting a model turn when the composer is submitted.
- Uses the same run_command approval gate as tool calls.
- Starts a captured command session inside the managed TUI instead of yielding control back to the parent CLI session.
- The transcript records the command, PID, streamed output, and final [command session exit: N] status.
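The bounded candidate set used by the @path picker can be illustrated with a size-limited min-heap. Each keystroke then costs O(n log k) instead of a full sort. The scoring tiers in `score` are hypothetical; the documentation only says that basename matches and path matches both count. Ties are broken by path so the output is deterministic.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical relevance tiers: exact basename > basename prefix > basename substring > path substring.
fn score(query: &str, path: &str) -> Option<u32> {
    let q = query.to_lowercase();
    let p = path.to_lowercase();
    let base = p.rsplit('/').next().unwrap_or(p.as_str());
    if base == q {
        Some(100)
    } else if base.starts_with(&q) {
        Some(80)
    } else if base.contains(&q) {
        Some(60)
    } else if p.contains(&q) {
        Some(40)
    } else {
        None
    }
}

/// Keep only the best `k` matches while scanning.
fn top_k<'a>(query: &str, paths: &[&'a str], k: usize) -> Vec<&'a str> {
    // Reverse turns the max-heap into a min-heap, so the worst kept candidate is on top.
    let mut heap: BinaryHeap<Reverse<(u32, Reverse<&'a str>)>> = BinaryHeap::with_capacity(k + 1);
    for &path in paths {
        if let Some(s) = score(query, path) {
            heap.push(Reverse((s, Reverse(path))));
            if heap.len() > k {
                heap.pop(); // evict the current worst candidate
            }
        }
    }
    let mut best: Vec<(u32, &'a str)> =
        heap.into_iter().map(|Reverse((s, Reverse(p)))| (s, p)).collect();
    // Highest score first; ties in ascending path order.
    best.sort_by(|a, b| b.0.cmp(&a.0).then(a.1.cmp(b.1)));
    best.into_iter().map(|(_, p)| p).collect()
}

fn main() {
    let paths = ["src/app.rs", "src/app/layout.rs", "docs/app.md", "src/runtime/apply.rs"];
    println!("{:?}", top_k("app", &paths, 3));
}
```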

Tool inventory

The model can invoke the following tools during a turn. Read-only tools run without confirmation; mutating tools require operator approval (or a session/capability auto-approval grant).
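The approval rule in the paragraph above reduces to a small decision function. This sketch lists the read-only tools from the inventory tables and fails closed, so any tool name it does not recognize needs operator approval. Session and capability grants are simplified here to a hypothetical set of approved tool names; VexCoder's real grant model is not documented at this level.

```rust
use std::collections::HashSet;

/// Read-only tools from the inventory; these run without confirmation.
const READ_ONLY_TOOLS: &[&str] = &[
    "read_file", "list_files", "list_directory", "search_files", "search",
    "find_files", "list_dir", "glob_files", "codebase_search", "git_status", "git_diff",
];

#[derive(Debug, PartialEq)]
enum Decision {
    Allow,
    AskOperator,
}

/// Hypothetical grant store: tool names auto-approved for this session.
#[derive(Default)]
struct Grants {
    approved: HashSet<String>,
}

fn decide(tool: &str, grants: &Grants) -> Decision {
    // Unknown tools fall through to AskOperator, so the gate fails closed.
    if READ_ONLY_TOOLS.contains(&tool) || grants.approved.contains(tool) {
        Decision::Allow
    } else {
        Decision::AskOperator
    }
}

fn main() {
    let mut grants = Grants::default();
    println!("git_diff  -> {:?}", decide("git_diff", &grants));
    println!("edit_file -> {:?}", decide("edit_file", &grants));
    grants.approved.insert("edit_file".to_string());
    println!("edit_file -> {:?} (after session grant)", decide("edit_file", &grants));
}
```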

Read-only tools

Tool	Purpose
`read_file`	Read file content from an explicit non-empty path. Accepts `offset` (1-based line) and `limit` for partial reads. For repo overviews, use `list_files` or `codebase_search` first.
`list_files`	List files and directories under a path, or the workspace root when omitted. Prefer this for initial repo exploration.
`list_directory`	Alias for `list_files`.
`search_files`	Search text across files and return matching lines.
`search`	Alias for `search_files`.
`find_files`	Find files by name pattern (glob) within the workspace.
`list_dir`	Non-recursive directory listing. Workspace-confined and `.gitignore`-aware. Optional `path` (defaults to workspace root); optional `max_entries` (default 200, hard cap 500).
`glob_files`	Workspace-wide glob matching. `.gitignore`-aware with bounded results. Required `pattern` (supports `**`, `*`, `?`, `[abc]`, `[a-z]`, `[^x]`); optional `max_results` (default 50, hard cap 200).
`codebase_search`	Search the structural index for functions, types, and code patterns by name or keyword. Returns ranked code snippets with file paths and line numbers. When embeddings are configured, also performs semantic reranking. Prefer this over `read_file` for exploring unfamiliar code.
`git_status`	Show git repository status.
`git_diff`	Show git diff output.
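To show the pattern syntax that `glob_files` accepts, here is a small standard-library matcher together with the documented result limits. It assumes common glob semantics: `**` may cross directory separators and also matches zero directories, while `*`, `?`, and character classes stay within a single path segment. It is not VexCoder's implementation (the architecture notes mention `globset`), and it ignores `.gitignore` handling.

```rust
/// Match a glob pattern against a `/`-separated workspace-relative path.
fn glob_match(pattern: &str, path: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let s: Vec<char> = path.chars().collect();
    matches_at(&p, &s)
}

fn matches_at(p: &[char], s: &[char]) -> bool {
    match p.first() {
        None => s.is_empty(),
        Some('*') if p.get(1) == Some(&'*') => {
            let rest = &p[2..];
            // `**/x` may also match `x` with zero leading directories.
            if rest.first() == Some(&'/') && matches_at(&rest[1..], s) {
                return true;
            }
            (0..=s.len()).any(|i| matches_at(rest, &s[i..]))
        }
        Some('*') => {
            // Single `*`: consume zero or more characters, but never `/`.
            let rest = &p[1..];
            let mut i = 0;
            loop {
                if matches_at(rest, &s[i..]) {
                    return true;
                }
                if i == s.len() || s[i] == '/' {
                    return false;
                }
                i += 1;
            }
        }
        Some('?') => !s.is_empty() && s[0] != '/' && matches_at(&p[1..], &s[1..]),
        Some('[') => match (class_end(p), s.first()) {
            (Some(end), Some(&c)) => {
                c != '/' && class_matches(&p[1..end], c) && matches_at(&p[end + 1..], &s[1..])
            }
            (Some(_), None) => false,
            // No closing `]`: treat `[` as a literal character.
            (None, _) => s.first() == Some(&'[') && matches_at(&p[1..], &s[1..]),
        },
        Some(&c) => s.first() == Some(&c) && matches_at(&p[1..], &s[1..]),
    }
}

/// Index of the `]` that closes a class starting at p[0]; the class body is non-empty.
fn class_end(p: &[char]) -> Option<usize> {
    p.iter().skip(2).position(|&c| c == ']').map(|i| i + 2)
}

/// Check `c` against a class body such as `abc`, `a-z`, or `^x`.
fn class_matches(body: &[char], c: char) -> bool {
    let (negate, items) = match body.first() {
        Some('^') => (true, &body[1..]),
        _ => (false, body),
    };
    let mut found = false;
    let mut i = 0;
    while i < items.len() {
        if i + 2 < items.len() && items[i + 1] == '-' {
            found |= items[i] <= c && c <= items[i + 2];
            i += 3;
        } else {
            found |= items[i] == c;
            i += 1;
        }
    }
    found != negate
}

/// Documented limits: default 50 results, hard cap 200.
fn effective_max_results(requested: Option<usize>) -> usize {
    requested.unwrap_or(50).min(200)
}

fn glob_files<'a>(pattern: &str, paths: &[&'a str], max_results: Option<usize>) -> Vec<&'a str> {
    let cap = effective_max_results(max_results);
    paths.iter().copied().filter(|p| glob_match(pattern, p)).take(cap).collect()
}

fn main() {
    let paths = ["build.rs", "src/app.rs", "src/app/layout.rs", "README.md"];
    println!("{:?}", glob_files("**/*.rs", &paths, None));
}
```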

Mutating tools

Tool	Purpose
`write_file`	Write full file content. Files above `VEX_DIFF_PREFERRED_ABOVE_LINES` (default 200) trigger a warning suggesting `apply_patch` or `edit_file`. Files above `VEX_WRITE_FILE_MAX_LINES` (default 500) are rejected.
`edit_file`	Replace one exact unique snippet (`old_str` → `new_str`). Preferred for targeted edits. Transcript previews keep multiline diff hunks so added and removed rows stay visible during review.
`apply_patch`	Apply full-file content as a patch. Preferred for large-scale changes where `edit_file` is impractical.
`rename_file`	Rename or move a file within the workspace.
`run_command`	Execute a shell command in the workspace.
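The `write_file` size policy and the unique-snippet rule of `edit_file` can be sketched as two checks. Assumptions: size is measured with `str::lines`, "above" means strictly greater than, and an `old_str` must occur exactly once, with overlapping occurrences counted. `check_write`, `edit_unique`, and `env_limit` are hypothetical helpers. Only the two environment variable names and their defaults come from the table.

```rust
use std::env;

#[derive(Debug, PartialEq)]
enum WriteCheck {
    Ok,
    Warn(String),
    Reject(String),
}

/// Read a numeric limit from the environment, falling back to the documented default.
fn env_limit(name: &str, default: usize) -> usize {
    env::var(name).ok().and_then(|v| v.trim().parse().ok()).unwrap_or(default)
}

fn check_write(content: &str, prefer_diff_above: usize, max_lines: usize) -> WriteCheck {
    let lines = content.lines().count();
    if lines > max_lines {
        WriteCheck::Reject(format!("{lines} lines exceeds the {max_lines}-line write_file limit"))
    } else if lines > prefer_diff_above {
        WriteCheck::Warn(format!("{lines} lines; consider apply_patch or edit_file"))
    } else {
        WriteCheck::Ok
    }
}

/// Count occurrences, including overlapping ones, so "aa" in "aaa" counts twice.
fn count_occurrences(haystack: &str, needle: &str) -> usize {
    let mut count = 0;
    let mut start = 0;
    while let Some(pos) = haystack[start..].find(needle) {
        count += 1;
        let at = start + pos;
        // Advance one character (not one byte) to stay on a UTF-8 boundary.
        start = at + haystack[at..].chars().next().map_or(1, char::len_utf8);
    }
    count
}

/// Replace `old_str` only when it appears exactly once.
fn edit_unique(source: &str, old_str: &str, new_str: &str) -> Result<String, String> {
    if old_str.is_empty() {
        return Err("old_str must not be empty".into());
    }
    match count_occurrences(source, old_str) {
        1 => Ok(source.replacen(old_str, new_str, 1)),
        0 => Err("old_str not found".into()),
        n => Err(format!("old_str matches {n} times; include more surrounding context")),
    }
}

fn main() {
    let warn_above = env_limit("VEX_DIFF_PREFERRED_ABOVE_LINES", 200);
    let max_lines = env_limit("VEX_WRITE_FILE_MAX_LINES", 500);
    let draft = "line\n".repeat(250);
    println!("{:?}", check_write(&draft, warn_above, max_lines));
    println!("{:?}", edit_unique("let x = 1;\n", "x = 1", "x = 2"));
}
```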

Search ranking

codebase_search uses a Tree-sitter-based structural index that extracts functions, structs, enums, impls, traits, modules, constants, and type aliases from Rust source files. The index is built at session start and updated incrementally on file writes.

Results are scored by:

Exact name match: highest priority
Substring / fuzzy name match
Parent scope match
Content keyword match (per word)

Results are capped at VEX_SEARCH_MAX_RESULTS (default 10). When an embedding provider is configured (VEX_EMBEDDING_PROVIDER), results are additionally reranked by semantic similarity using the persisted vector index at .vex/index/.
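The scoring order above can be sketched as additive weights applied to index entries, followed by truncation to `VEX_SEARCH_MAX_RESULTS`. The numeric weights, the subsequence-based "fuzzy" match, and the `Symbol` shape are all hypothetical. The sketch only preserves the documented priority order, and it omits the Tree-sitter extraction and the embedding rerank.

```rust
use std::env;

/// Hypothetical index entry; the real structural index schema is not documented here.
struct Symbol {
    name: String,
    parent: Option<String>,
    content: String,
    path: String,
    line: usize,
}

/// True when every char of `needle` appears in `hay` in order.
fn is_subsequence(needle: &str, hay: &str) -> bool {
    let mut it = hay.chars();
    needle.chars().all(|c| it.any(|h| h == c))
}

fn score(symbol: &Symbol, query: &str) -> u32 {
    let q = query.to_lowercase();
    let name = symbol.name.to_lowercase();
    let mut s = 0;
    if name == q {
        s += 1000; // exact name match: highest priority
    } else if name.contains(&q) || is_subsequence(&q, &name) {
        s += 300; // substring / fuzzy name match
    }
    if symbol.parent.as_deref().map_or(false, |p| p.to_lowercase().contains(&q)) {
        s += 100; // parent scope match
    }
    let content = symbol.content.to_lowercase();
    s += 10 * q.split_whitespace().filter(|w| content.contains(*w)).count() as u32; // per word
    s
}

fn search<'a>(index: &'a [Symbol], query: &str, max_results: usize) -> Vec<(&'a Symbol, u32)> {
    if query.trim().is_empty() {
        return Vec::new();
    }
    let mut hits: Vec<(&Symbol, u32)> = index
        .iter()
        .map(|s| (s, score(s, query)))
        .filter(|(_, sc)| *sc > 0)
        .collect();
    hits.sort_by(|a, b| b.1.cmp(&a.1)); // stable: ties keep index order
    hits.truncate(max_results);
    hits
}

fn main() {
    let max = env::var("VEX_SEARCH_MAX_RESULTS").ok().and_then(|v| v.parse().ok()).unwrap_or(10);
    let index = vec![Symbol {
        name: "render_task_layout".into(),
        parent: Some("render".into()),
        content: "fn render_task_layout(frame: &mut Frame)".into(),
        path: "src/ui/render/mod.rs".into(),
        line: 10,
    }];
    for (sym, sc) in search(&index, "layout", max) {
        println!("{}:{} {} (score {sc})", sym.path, sym.line, sym.name);
    }
}
```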

Error handling

Context-overflow recovery

When the conversation exceeds the server's context window, VexCoder detects the overflow from the HTTP 400 response body and provides actionable guidance:

Local endpoints: suggests restarting the server with a larger context size (e.g. --ctx-size 8192) or using /compact to reset the conversation.
Remote endpoints: suggests using /compact to reset the conversation.

The server's error message is shown verbatim, capped at 300 characters.

For non-context-overflow HTTP 400 errors from local endpoints, the error includes the detected protocol (MessagesV1 vs ChatCompat) and suggests checking the model name, protocol format, and whether the server supports streaming.
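The guidance rules above can be sketched as a function of the 400 response body, the endpoint kind, and the detected protocol. The overflow markers in `is_context_overflow` are hypothetical substrings, because the actual detection rules are not documented. The 300-character cap is applied on character boundaries so that multi-byte UTF-8 text is never split.

```rust
#[derive(Debug, PartialEq)]
enum Endpoint {
    Local,
    Remote,
}

/// Hypothetical overflow markers; real servers word this error differently.
fn is_context_overflow(body: &str) -> bool {
    let lower = body.to_lowercase();
    ["context length", "context window", "context size", "too many tokens"]
        .iter()
        .any(|m| lower.contains(*m))
}

/// Keep at most `max` characters (not bytes).
fn cap_chars(s: &str, max: usize) -> &str {
    match s.char_indices().nth(max) {
        Some((idx, _)) => &s[..idx],
        None => s,
    }
}

fn explain_400(body: &str, endpoint: Endpoint, protocol: &str) -> String {
    let server = cap_chars(body, 300); // server text shown verbatim, capped
    if is_context_overflow(body) {
        match endpoint {
            Endpoint::Local => format!(
                "Context overflow: {server}\nRestart the server with a larger context size (e.g. --ctx-size 8192) or run /compact."
            ),
            Endpoint::Remote => format!("Context overflow: {server}\nRun /compact to reset the conversation."),
        }
    } else if endpoint == Endpoint::Local {
        format!("HTTP 400 ({protocol}): {server}\nCheck the model name, the protocol format, and whether the server supports streaming.")
    } else {
        format!("HTTP 400: {server}")
    }
}

fn main() {
    println!("{}", explain_400("request exceeds the context window", Endpoint::Local, "ChatCompat"));
}
```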

Keyboard notes

Ctrl+C requests cancellation for the active turn.
Alt+Up and Alt+Down move the selected entry in the adaptive task timeline.
Tab and Shift+Tab also move timeline selection forward and backward while the task surface is active.
The visible timeline window scales with display height instead of staying fixed at six rows.
The composer auto-fits to the terminal's current row and column budget, so snapping the terminal window to half-screen or quarter-screen sizes reflows the prompt surface instead of overflowing or leaving empty space.
PageUp and Ctrl+Up scroll the transcript/output pane up, away from the prompt edge, and PageDown and Ctrl+Down scroll it back toward the prompt edge. None of these keys move the cursor.
Ctrl+Home jumps to the oldest visible transcript content, and Ctrl+End returns to the current bottom edge.
The transcript pane keeps the full session scrollback available while follow mode is on; new model responses append at the bottom instead of replacing the prior response view.
Transcript scrolling follows wrapped display rows, so long paragraphs, embedded newlines, and multiline diff previews remain reachable in both fullscreen and fallback transcript views.
Manually selecting an older timeline entry switches the output pane to inspector detail for that step until follow mode resumes.
Shift+Enter inserts a newline without submitting the turn.
Pasted text is inserted into the larger multiline prompt surface during normal editing.
The composer header shows a current focus indicator (focused / unfocused) and a character count that updates as you type.
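The scrolling notes above (wrapped display rows, an offset measured from the bottom edge, and follow mode at offset zero) can be modelled in a few functions. Simplifications: one `char` counts as one column, although real code would use grapheme widths, and PageUp/PageDown move by one viewport. The step size for Ctrl+Up and Ctrl+Down is not documented, so those keys are left out. All names are hypothetical.

```rust
/// Display rows one transcript entry occupies at `width` columns.
fn wrapped_rows(text: &str, width: usize) -> usize {
    let width = width.max(1);
    text.split('\n')
        .map(|line| {
            let cols = line.chars().count();
            if cols == 0 { 1 } else { (cols + width - 1) / width }
        })
        .sum()
}

/// First visible row for an offset measured upward from the bottom edge.
/// Offset 0 is follow mode: the newest rows stay pinned to the bottom.
fn first_visible_row(total_rows: usize, viewport: usize, offset_from_bottom: usize) -> usize {
    let max_offset = total_rows.saturating_sub(viewport);
    max_offset - offset_from_bottom.min(max_offset)
}

#[derive(Debug, Clone, Copy)]
enum Key {
    PageUp,
    PageDown,
    CtrlHome,
    CtrlEnd,
}

/// New offset after a scroll key, clamped to the available scrollback.
fn scroll(offset: usize, key: Key, total_rows: usize, viewport: usize) -> usize {
    let max_offset = total_rows.saturating_sub(viewport);
    match key {
        Key::PageUp => (offset + viewport).min(max_offset),
        Key::PageDown => offset.saturating_sub(viewport),
        Key::CtrlHome => max_offset, // oldest visible content
        Key::CtrlEnd => 0,           // back to the bottom edge
    }
}

fn main() {
    let total = wrapped_rows("a long paragraph that wraps\n\nand a second line", 10);
    let offset = scroll(0, Key::PageUp, total, 3);
    println!("rows={total} offset={offset} first={}", first_visible_row(total, 3, offset));
}
```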

Legacy Config Note

VexCoder keeps vex migrate config as a small compatibility helper for older local setups that still export legacy VEX_* values.

There is no separate migration workflow documented for normal installs. For current setup guidance, use the main docs instead, starting with Quick Start.

If you do need the compatibility helper, run vex migrate config --help to see its current CLI surface.