For the past 12 months, I have been executing a solo build of JARVIS — a three-repository, multi-process autonomous AI operating system spanning Python, C++, Rust, Go, Swift, Objective-C, and TypeScript. The system orchestrates 60+ asynchronous agents across a neural mesh, routes inference dynamically between local Apple Silicon and GCP, performs real-time voice biometric authentication, controls macOS at the native API level, and continuously trains its own models through a self-improving feedback loop.
Full Stack Inventory
| Category | Technologies |
|---|---|
| Languages | Python, C, C++, Rust, Go, Swift, Objective-C, Objective-C++, TypeScript, JavaScript, SQL, Shell/Bash, ARM64 Assembly (NEON SIMD), Metal Shading Language, AppleScript, Protobuf, HCL/Terraform, CUDA, HTML/CSS |
| ML / Inference | PyTorch, Transformers, llama.cpp, llama-cpp-python, GGUF quantization, ONNX Runtime, CoreML Tools, SpeechBrain, scikit-learn, SentenceTransformers, HuggingFace Hub, safetensors, tiktoken, Numba (JIT), sympy, LangChain, YOLO |
| Training | LoRA, DPO, RLHF, FSDP, MAML (meta-learning), curriculum learning, federated learning, causal reasoning, world model training, online learning, active learning, EWC |
| Models / Vision | LLaVA (multimodal), ECAPA-TDNN (speaker verification), Whisper (faster-whisper, openai-whisper), Porcupine/Picovoice (wake word), Piper TTS, OmniParser (OCR) |
| LLM APIs | Anthropic Claude API (chat, vision, computer use), OpenAI API (chat completions, embeddings), Google Gemini API, Ollama (local inference) |
| Rust | PyO3, ndarray, rayon, parking_lot, DashMap, crossbeam, serde, mimalloc, image crate, Metal (GPU compute), tokio, zstd, lz4, candle (on-device ML) |
| Swift / macOS | Swift Package Manager, CoreLocation, WeatherKit, AppKit, Foundation, Quartz/CoreGraphics, Accessibility API, AVFoundation, pyobjc, launchd, osascript, yabai |
| Vector / Data | ChromaDB, FAISS, Redis, PostgreSQL (asyncpg, psycopg2), SQLite (aiosqlite), NetworkX, bloom filters |
| Infrastructure | GCP (Compute Engine, Cloud SQL, Cloud Run, Secret Manager, Monitoring), Docker, docker-compose, Terraform, Kubernetes, systemd, CMake, pybind11, cpp-httplib |
| CI/CD | GitHub Actions (30+ workflows), CodeQL, Super-Linter, Dependabot, Gitleaks, Postman/Newman, git worktrees |
| Backend | FastAPI, uvicorn, uvloop, gRPC, Protobuf, asyncio, aiohttp, httpx, WebSocket, Cloud SQL Proxy, circuit breakers, exponential backoff, distributed locks, epoch fencing |
| Observability | OpenTelemetry (tracing + metrics + OTLP/gRPC export), Prometheus, structlog, psutil, Pydantic, JSONL telemetry pipeline, LangFuse, Helicone, PostHog |
| Frontend | React 19, Next.js, Framer Motion, Axios, WebSocket real-time streaming |
| Audio / Vision | OpenCV, sounddevice, PyAudio, webrtcvad (VAD), Silero VAD, speexdsp (AEC), librosa, pyautogui, CoreML VAD, Tesseract OCR |
| Voice / TTS | ElevenLabs, GCP TTS, Piper TTS, Edge-TTS, gTTS, pyttsx3, macOS Say, Wav2Vec2 |
| C++ (ReactorCore) | Custom mlforge ML library: KD-trees, graph structures, trie, matrix ops, linear/logistic regression, decision trees, neural nets, model serialization, deployment API |
| AI Orchestration | LangChain, LangGraph, CrewAI, OpenHands, Open Interpreter, OmniParser |
| Experiment Tracking | Weights & Biases (wandb), TensorBoard |
| Browser Automation | Playwright, DuckDuckGo Search, Beautiful Soup |
| Quality / Linting | pytest, Ruff, Black, isort, Flake8, mypy, Pyright, Bandit, ESLint, pre-commit |
| Notifications | Discord, Slack, Telegram, SMTP/Email |
| External APIs | OpenWeather, Alpha Vantage, News API, Wikipedia API, Google Safe Browsing |
Full AI & Dev Tools Inventory
| Category | Tools |
|---|---|
| LLM Platforms | Anthropic Claude (chat, vision, computer use), OpenAI (Whisper, embeddings), Google Gemini, Ollama, HuggingFace Transformers, llama.cpp (GGUF), Apple MLX, Candle (Rust ML), ONNX Runtime, CoreML |
| AI Development | Cursor IDE, Claude Code CLI, Claude GitHub Actions (5 workflows: PR analyzer, docs generator, test generator, security analyzer, auto-fix) |
| AI Orchestration | LangChain, LangGraph, CrewAI (multi-agent), OpenHands (coding assistant), Open Interpreter, OmniParser (vision parsing) |
| Experiment Tracking | Weights & Biases (wandb), TensorBoard, LangFuse (LLM observability), Helicone (LLM cost tracking), PostHog (product analytics) |
| Voice & Audio | OpenAI Whisper, Faster-Whisper, SpeechBrain, Wav2Vec2, ElevenLabs TTS, GCP TTS, Piper TTS, Edge-TTS, gTTS, pyttsx3, Picovoice/Porcupine (wake word), WebRTC VAD, Silero VAD, CoreML VAD |
| Browser Automation | Playwright, DuckDuckGo Search, Beautiful Soup, Google Safe Browsing API |
| Testing & Quality | pytest, Ruff, Black, isort, Flake8, mypy, Pyright, Bandit, ESLint, Super-Linter, CodeQL, Dependabot, Gitleaks, Postman/Newman, pre-commit hooks |
| Notifications | Discord, Slack, Telegram, SMTP/Email (Gmail) |
| External Data APIs | OpenWeather, Alpha Vantage (stocks), News API, Wikipedia API, Google NotebookLM |
Every component below is production code running in the JARVIS ecosystem — not academic exercises.
Data Structures (50+ types)
| Category | Structures | Implementation |
|---|---|---|
| Trees | Quadtree (spatial indexing), KD-Tree (nearest neighbor + radius search), Trie (prefix search), DAG (startup dependency graph), Scene Graph, Knowledge Graph, Process Tree | Python + Rust + C++ |
| Graphs | Reasoning Graph, Dependency Graph, Multi-Space Context Graph, Window Relationship Graph, Service Mesh Discovery Graph, LangGraph state machines, Causal Graphs (do-calculus) | Python |
| Hash-Based | Bloom Filters (3 languages), LSH Semantic Cache, LRU Cache, TTL Cache, Consistent Hashing, DashMap (lock-free concurrent), Bitmaps/Bitsets | Python + Rust + Swift |
| Heaps & Queues | Binary Heap (heapq), Priority Queue, Bounded Queue, Ring Buffer, Circular Buffer, Work-Stealing Queue, Zero-Copy IPC (mmap), Lock-Free SPSC Queue | Python + Rust + JS |
| Concurrent | Arc<Mutex<>>, RwLock, DashMap, mpsc channels, Vector Clock, CRDT, Distributed Lock, asyncio.Queue | Rust + Python |
| Matrices & Tensors | Matrix2D, Matrix3D (row-major), Sparse Matrices (nalgebra-sparse), PyTorch Tensors, Quantized Tensors (INT8/INT4), Embedding Vectors | Rust + C++ + Python |
| Memory | Memory Pool, Slab Allocator, Zero-Copy Buffers, Object Recycler, mmap Ring Buffers | Rust + Python |
| State | Finite State Machine, Event Bus, Event Store, Sliding Window, Bounded Collections | Python |
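To give one row of the table concrete shape: the Bloom Filter entry reduces to a bit array plus k hash probes. This is a minimal illustrative sketch, not the production implementation; the `BloomFilter` name, sizes, and hash count here are arbitrary choices:

```python
import hashlib

class BloomFilter:
    """Probabilistic set membership: no false negatives, tunable false-positive rate."""

    def __init__(self, size_bits: int = 8192, num_hashes: int = 4) -> None:
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item: str):
        # Derive k independent positions from salted SHA-256 digests
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # All k bits set -> "probably present"; any bit clear -> definitely absent
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

bf = BloomFilter()
bf.add("model:llava")
assert "model:llava" in bf   # always true once added
```

The same structure works as a negative cache in front of a slower store: a miss in the filter means the key was never written, so the lookup can be skipped entirely.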
Algorithms (80+ implementations)
| Category | Algorithms | Where |
|---|---|---|
| Resilience | Circuit Breaker (5 variants), Exponential Backoff w/ Jitter, Graceful Degradation, Self-Healing, Leader Election, Distributed Locking, Distributed Transactions, Distributed Dedup | JARVIS + Prime |
| Scheduling | Round Robin, Token Bucket, Leaky Bucket, Sliding Window Rate Limiter, Work Stealing, Backpressure Control, Adaptive ML-Based Rate Limiting | All three repos |
| Graph / Search | Topological Sort (DAG), BFS/DFS, A* Search, Dijkstra's Shortest Path, K-Nearest Neighbor, PageRank (file importance ranking) | All three repos |
| Statistical / Bayesian | Bayesian Inference (Beta-Bernoulli, Normal-Normal posteriors), Bayesian Confidence Fusion, Multi-Armed Bandit (Thompson Sampling, epsilon-greedy), Monte Carlo Validation, Kalman Filter (RSSI smoothing), Markov Chain Prediction | JARVIS + Prime |
| ML Training | LoRA/QLoRA, DPO (preference optimization), RLHF (PPO pipeline), FSDP (parameter sharding), MAML/Reptile (meta-learning), Federated Learning (FedAvg, FedProx, Byzantine-robust), Curriculum Learning, Causal Reasoning (do-calculus), Online Learning w/ EWC, World Model Training (Dreamer/MuZero-inspired), Knowledge Distillation (Hinton, FitNets, attention transfer, multi-teacher), Gradient Accumulation, Mixed Precision (BF16/FP16) | ReactorCore + Prime |
| ML Inference | Quantized INT8/INT4, Cosine Similarity, LSH, Vector Search, Anomaly Detection, Pattern Recognition, Goal Inference, Activity Recognition, Tiered Complexity Routing, Flash Attention | JARVIS + Prime |
| Neural Networks | Multi-Head Self-Attention, Dropout, BatchNorm, LayerNorm, LSTM + Attention, Feedforward w/ Backpropagation, Cognitive Layers (cross-attention + residual) | All three repos |
| Clustering & Reduction | K-Means, DBSCAN, PCA, Truncated SVD, TF-IDF Vectorization | JARVIS + Reactor |
| Ensemble Methods | Random Forest, Gradient Boosting, Isolation Forest, Ensemble STT (multi-model voting), Weighted Model Ensemble (majority/cascade) | JARVIS + Reactor |
| Signal Processing | VAD (WebRTC + Silero + CoreML), MFCC/Mel Filterbanks, Spectrogram, Anti-Spoofing, Barge-In Detection, ECAPA-TDNN Speaker Verification | JARVIS |
| Compression | Zstd, LZ4, Gzip/Zlib, Custom Vision Compression | Rust + Python |
| Cryptography | HMAC, SHA-256, MD5, JWT, Secure Password Hashing, File Integrity Checksums, Checkpoint Verification | All three repos |
| Caching | LRU Eviction, TTL Eviction, Predictive Cache Warming (EWMA + time-series), LSH Semantic Cache, Bloom Filter Negative Cache, Memoization (lru_cache) | All three repos |
| Evolutionary | Genetic Algorithm (Ouroboros self-programming loop — B+ branch-isolated sagas, v262.0 fully activated) | JARVIS |
| Concurrency | Deadlock Prevention, CPU Affinity Pinning, Parallel DAG Initialization, Zero-Copy mmap IPC, Lock-Free Channels | JARVIS + Prime |
| GPU / SIMD | Metal Compute Shaders, ARM64 NEON SIMD Intrinsics | JARVIS (Rust + C + Assembly) |
| C++ ML (mlforge) | Linear Regression (Ridge/Lasso), Logistic Regression, Decision Tree (Gini), Neural Net (backprop), Matrix Serialization, KD-Tree, Graph (BFS/DFS), Trie | ReactorCore |
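As a flavor of the algorithms listed above, the Multi-Armed Bandit row (Thompson Sampling, Beta-Bernoulli) fits in a few lines. This is a generic sketch; the arm names are hypothetical stand-ins for inference backends, not the shipped code:

```python
import random

class ThompsonSampler:
    """Beta-Bernoulli Thompson Sampling over a set of arms (e.g. inference backends)."""

    def __init__(self, arms):
        # Beta(1, 1) prior on each arm == uniform belief over success rates
        self.state = {arm: [1, 1] for arm in arms}

    def choose(self) -> str:
        # Sample a plausible success rate per arm; exploit the best draw
        return max(self.state, key=lambda a: random.betavariate(*self.state[a]))

    def update(self, arm: str, success: bool) -> None:
        # Bayesian update: successes bump alpha, failures bump beta
        self.state[arm][0 if success else 1] += 1

bandit = ThompsonSampler(["gcp", "local", "api"])
arm = bandit.choose()
bandit.update(arm, success=True)
```

Because each choice samples from the posterior rather than taking the mean, exploration decays naturally as evidence accumulates, with no epsilon schedule to tune.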
JARVIS is not a chatbot wrapper. It is a distributed AI operating system composed of three interdependent repositories — each a standalone production system, together forming a self-improving autonomous intelligence.
- Single command control plane: `python3 unified_supervisor.py` boots Body, Mind, and Forge with deterministic lifecycle ownership
- Trinity operating model: `JARVIS` executes, `JARVIS-Prime` reasons/routes, `ReactorCore` trains and redeploys
- Reliability-first inference: policy-based failover from GCP golden image to local Apple Silicon to API fallback
- Closed learning loop: runtime telemetry flows to Reactor training, then gated deployment returns improved models to Prime
- Native autonomy stack: async agent mesh, Google Workspace workflows, voice biometrics, and vision-driven macOS control
- Safety by design: policy gates, contract checks, kill-switch controls, circuit breakers, and probation-based rollback
```mermaid
flowchart TD
K["UNIFIED SUPERVISOR<br/>single control plane"] --> B["JARVIS (Body)<br/>agents + tools + execution"]
K --> P["JARVIS-Prime (Mind)<br/>routing + reasoning"]
K --> R["ReactorCore (Forge)<br/>training + deployment gates"]
B <--> P
P --> R
R --> P
B --> R
P --> T1["Tier 1: GCP Golden Image"]
T1 -->|"degraded"| T2["Tier 2: Local Apple Silicon"]
T2 -->|"degraded"| T3["Tier 3: API Fallback"]
R --> G["Gate + Probation"]
G -->|"pass"| P
G -->|"fail"| RB["Rollback"]
```
Three repos previously made independent lifecycle decisions (restart/health/kill), which created restart storms, readiness split-brain, and contract drift. This architecture is now unified under a single root authority model.
```mermaid
flowchart TD
U["UNIFIED SUPERVISOR<br/>Root Control Plane"] --> W["RootAuthorityWatcher<br/>Policy Brain"]
U --> O["ProcessOrchestrator<br/>Execution Plane"]
O --> P["JARVIS-Prime<br/>managed mode"]
O --> R["Reactor-Core<br/>managed mode"]
W -->|LifecycleVerdict| O
O -->|ExecutionResult| W
P -->|health + drain contract| W
R -->|health + drain contract| W
W --> H{"Handshake Gate"}
H -->|"schema N/N-1 + capability hash pass"| READY["ALIVE/READY"]
H -->|"contract mismatch"| REJECT["REJECTED"]
W --> E["Escalation Engine"]
E --> D["drain"]
E --> T["SIGTERM"]
E --> K["process-group SIGKILL"]
```
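The escalation ladder in the diagram (drain, then SIGTERM, then process-group SIGKILL) can be sketched as one bounded function. This is a simplified illustration; the `escalate` helper and its drain callable are hypothetical, and the real escalation engine adds identity checks and policy-driven timeouts:

```python
import os
import signal
import subprocess

def escalate(proc: subprocess.Popen, drain, drain_timeout=5.0, term_timeout=5.0) -> str:
    """Bounded kill ladder: drain -> SIGTERM -> process-group SIGKILL."""
    try:
        if drain(timeout=drain_timeout):            # step 1: cooperative drain
            proc.wait(timeout=drain_timeout)
            return "drained"
    except Exception:
        pass                                         # drain failure is expected under crash
    proc.send_signal(signal.SIGTERM)                 # step 2: polite terminate
    try:
        proc.wait(timeout=term_timeout)
        return "terminated"
    except subprocess.TimeoutExpired:
        # step 3: child is its own group leader (start_new_session), kill the whole group
        os.killpg(os.getpgid(proc.pid), signal.SIGKILL)
        proc.wait()
        return "group-killed"

# demo: a child that ignored drain and SIGTERM would reach step 3;
# `sleep` honors SIGTERM, so this run stops at step 2
child = subprocess.Popen(["sleep", "30"], start_new_session=True)
result = escalate(child, drain=lambda timeout: False, term_timeout=2.0)
print(result)  # terminated
```

`start_new_session=True` is what makes step 3 safe: the SIGKILL targets the child's process group, not the supervisor's.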
What we built (21 tasks, 5 waves, 3 repos)
- Wave 0 — Foundation types: canonical lifecycle contracts (`LifecycleAction`, `SubsystemState`, `ProcessIdentity`, `LifecycleVerdict`, policy/timeout structures) + managed-mode contract + golden conformance tests
- Wave 1 — Root authority watcher: lifecycle state machine ownership, verdict emission, incident dedup, and policy/execution separation via `VerdictExecutor`
- Wave 2 — Prime/Reactor conformance: managed-mode behavior (`JARVIS_ROOT_MANAGED`), health envelope enrichment, authenticated `/lifecycle/drain`
- Wave 3 — Orchestrator integration + shadow mode: `ProcessOrchestrator` adapter methods wired; active crash watch (`proc.wait`) + jittered health polling
- Wave 4 — Activation hardening: active verdict dispatch, contract hash gating at boot handshake, policy delegation hooks for restart/health ownership
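Wave 4's contract hash gating can be illustrated with a small sketch: hash a canonicalized contract and admit only peers in the schema N/N-1 window whose capability hash matches. Every name and version number below is an illustrative assumption, not the actual handshake code:

```python
import hashlib
import json

# Hypothetical schema window: accept the current and previous versions only
SUPPORTED_SCHEMAS = {262, 261}

def capability_hash(contract: dict) -> str:
    """Stable hash over a canonicalized contract (sorted keys, no whitespace drift)."""
    canonical = json.dumps(contract, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

def handshake(peer_schema: int, peer_hash: str, local_contract: dict) -> str:
    if peer_schema not in SUPPORTED_SCHEMAS:
        return "REJECTED: schema out of N/N-1 window"
    if peer_hash != capability_hash(local_contract):
        return "REJECTED: contract hash mismatch"
    return "ALIVE/READY"

contract = {"actions": ["drain", "restart"], "health": "v2"}
assert handshake(262, capability_hash(contract), contract) == "ALIVE/READY"
assert handshake(260, capability_hash(contract), contract).startswith("REJECTED")
```

Canonicalizing before hashing is the important detail: two repos that serialize the same contract with different key order must still agree on the hash.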
What this resolved
- Restart storms: single restart policy with budgeted windows and deduplication
- Readiness split-brain: unified two-field liveness/readiness state ownership
- Contract drift: cross-repo managed-mode parity with conformance tests and compatibility gates
- Crash blind spots: ms-latency process-exit detection plus health-path observability
- Competing supervisors: Prime/Reactor demoted to managed mode while root authority owns lifecycle decisions
- Escalation ambiguity: deterministic kill ladder (`drain -> SIGTERM -> process-group SIGKILL`)
- PID reuse risk: identity validation strengthened via multi-factor `ProcessIdentity`
- Control-plane auth gaps: HMAC-authenticated lifecycle commands and session-aware checks
Production rollout path (remaining ops work)
- Shadow soak: run in `shadow` mode and verify decision parity against legacy behavior
- Per-subsystem activation: promote one subsystem at a time (`reactor-core`, then `jarvis-prime`)
- Final policy cut-wire: fully bypass legacy autonomous monitor decisions when delegation flags are enabled
- CI anti-drift: enforce cross-repo parity checks for managed-mode contract files on every PR
Hidden profile bullet packs (copy-ready)
Ultra-short TL;DR
- Triple Authority Fixed: one root control plane governs restart/readiness/lifecycle
- Safe by Contract: managed-mode + authenticated lifecycle endpoints + handshake gating
- Staged Rollout: shadow parity -> subsystem activation -> full active cutover
Recruiter-friendly
- Architecture leadership: unified three competing supervisors into one production control plane
- Reliability outcome: removed restart storms and readiness split-brain via centralized lifecycle policy
- Security hardening: added authenticated lifecycle controls and contract-gated activation
- Operational rigor: designed staged rollout for safe production adoption
Infra-architect
- Control-plane convergence: root watcher owns lifecycle state transitions across Body/Prime/Reactor
- Policy/execution isolation: watcher emits verdicts; orchestrator executes side effects
- Deterministic escalation: bounded `drain -> term -> group-kill` with race-safe identity checks
- Protocol hardening: schema/capability handshake gates + managed-mode health/drain envelopes
- Progressive activation: shadow validation, per-subsystem enablement, legacy path retirement
`unified_supervisor.py` grew into a ~96K-line orchestration monolith with multiple high-impact domains in one file. The risk is not just size; it is coupling density: local edits can create non-local regressions.
```mermaid
flowchart TD
E["Single Entry Point<br/>python3 unified_supervisor.py"] --> S["Kernel Shell (thin)"]
S --> R["Domain Controller Registry"]
R --> L["Lifecycle Controller"]
R --> H["Health Controller"]
R --> W["Workflow Controller"]
R --> M["Resource Controller"]
R --> X["Self-Healing Controller"]
R --> A["AGI/Training Controller"]
L --> C["Contract Boundaries<br/>typed interfaces + DTOs"]
H --> C
W --> C
M --> C
X --> C
A --> C
C --> T["Isolated Domain Tests"]
C --> O["Cross-Domain Observability"]
```
Why this is dangerous
- Reasoning collapse: too many orthogonal responsibilities in one file
- Test isolation gap: difficult to unit-test a single subsystem without broad kernel context
- High merge friction: concentrated edit surface increases conflict rate
- Refactor risk: tooling and human review quality degrade as coupling grows
- Mandate conflict: monolith bottleneck violates "no single structural choke point"
Structural cure path
- Preserve single boot command while shrinking policy from the shell
- Extract domain controllers behind protocol boundaries
- Replace direct cross-calls with typed contract interfaces
- Enforce isolation tests per domain before integration tests
- Ship in waves with parity gates to avoid behavioral drift
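The "typed contract interfaces" step above can be sketched with `typing.Protocol`: extracted controllers satisfy a structural contract instead of inheriting from the kernel. The names below (`DomainController`, `HealthReport`) are illustrative, not the actual extraction targets:

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol, runtime_checkable

@dataclass(frozen=True)
class HealthReport:
    """Typed DTO crossing the domain boundary instead of a raw dict."""
    subsystem: str
    alive: bool
    ready: bool

@runtime_checkable
class DomainController(Protocol):
    """Contract every extracted controller must satisfy."""
    name: str
    async def start(self) -> None: ...
    async def health(self) -> HealthReport: ...
    async def stop(self) -> None: ...

class HealthController:
    # No inheritance from the kernel: conformance is structural
    name = "health"
    async def start(self) -> None: ...
    async def health(self) -> HealthReport:
        return HealthReport(self.name, alive=True, ready=True)
    async def stop(self) -> None: ...

registry: dict[str, DomainController] = {}
ctl = HealthController()
assert isinstance(ctl, DomainController)   # structural check via runtime_checkable
registry[ctl.name] = ctl
print(asyncio.run(ctl.health()).ready)  # True
```

Because conformance is structural, each controller can be unit-tested in isolation against the protocol, with no kernel bootstrapping required.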
Hidden profile bullets (copy-ready)
Ultra-short TL;DR
- Monolith Risk Neutralized (in progress): convert a 96K-line supervisor choke point into contract-bounded controllers
- Single Entry Point Preserved: one boot command, modular internals
- Safer Evolution: isolation tests + parity-gated extraction waves
Recruiter-friendly
- Architecture insight: identified the monolith paradox as the largest systemic reliability and velocity risk
- Execution strategy: designed a phased decomposition that keeps runtime stable while reducing coupling
- Engineering rigor: paired extraction with contract boundaries and isolation testing to prevent regressions
Infra-architect
- Kernel shell model: retain entrypoint authority but move domain policy to controller registry
- Protocol-first decomposition: typed interfaces replace direct cross-domain invocation
- Risk-managed migration: parity validation, observability gates, and staged rollout per domain
Purpose, Problem, Challenge, Solution
- Purpose: Define the three-system operating model (`JARVIS`, `JARVIS-Prime`, `ReactorCore`) under one unified kernel.
- Problem: Most AI systems stop at a single model endpoint and fail at end-to-end autonomy, coordination, and lifecycle management.
- Core Challenge: Keep orchestration, inference, and training decoupled enough to scale independently while still behaving like one product.
- What This Solves: Creates a durable systems contract: `JARVIS` runs operations, `Prime` serves intelligence, `Reactor` continuously improves intelligence.
```mermaid
%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#1a1b27', 'primaryTextColor': '#a9b1d6', 'primaryBorderColor': '#70a5fd', 'lineColor': '#545c7e', 'secondaryColor': '#24283b', 'tertiaryColor': '#1a1b27', 'fontSize': '14px', 'fontFamily': 'JetBrains Mono, monospace' }}}%%
flowchart TD
KERNEL["<b>UNIFIED SUPERVISOR KERNEL</b><br/>Single Entry Point · 50K+ LOC<br/>7-Zone Parallel Initialization"]
KERNEL -->|"orchestrates"| JARVIS
KERNEL -->|"routes inference"| PRIME
KERNEL -->|"triggers training"| REACTOR
subgraph JARVIS["<b>JARVIS — The Body</b> Python / Rust / Swift :8010"]
direction TB
J1["🕸️ Neural Mesh<br/><i>16+ async agents · capability routing</i>"]
J2["🎙️ Voice & Auth<br/><i>ECAPA-TDNN · full-duplex · wake word</i>"]
J3["👁️ Vision & Spatial<br/><i>LLaVA · YOLO · Ghost Display · OCR</i>"]
J4["🍎 macOS Native<br/><i>Swift 203 files · ObjC · Rust · CoreML</i>"]
J5["🧠 Intelligence<br/><i>RAG · Ouroboros · Google Workspace</i>"]
end
subgraph PRIME["<b>JARVIS-Prime — The Mind</b> Python / GGUF :8000-8001"]
direction TB
P1["📡 Task-Type Router<br/><i>11 specialist models · 40.4 GB</i>"]
P2["⚡ Neural Switchboard<br/><i>v98.1 · WebSocket contracts</i>"]
P3["👁️ LLaVA Vision Server<br/><i>multimodal · OpenAI-compatible API</i>"]
P4["💭 Reasoning Engine<br/><i>CoT / ToT / self-reflection</i>"]
P5["📊 Telemetry Capture<br/><i>JSONL · deployment feedback loop</i>"]
end
subgraph REACTOR["<b>ReactorCore — The Forge</b> C++ / Python :8090"]
direction TB
R1["🔥 Training Pipeline<br/><i>LoRA · DPO · RLHF · FSDP</i>"]
R2["🚪 Deployment Gate<br/><i>integrity validation · probation monitor</i>"]
R3["🧬 Model Lineage<br/><i>full provenance chain · append-only JSONL</i>"]
R4["☁️ GCP Spot Recovery<br/><i>checkpoint persistence · 60% cost savings</i>"]
R5["⚙️ C++ Kernels<br/><i>CMake · pybind11 · native performance</i>"]
end
PRIME -.->|"telemetry + experiences"| REACTOR
REACTOR -.->|"improved GGUF models"| PRIME
JARVIS <-.->|"inference requests / responses"| PRIME
REACTOR -.->|"training signals"| JARVIS
style KERNEL fill:#1a1b27,stroke:#70a5fd,stroke-width:2px,color:#70a5fd
style JARVIS fill:#0d1117,stroke:#70a5fd,stroke-width:2px,color:#a9b1d6
style PRIME fill:#0d1117,stroke:#bf91f3,stroke-width:2px,color:#a9b1d6
style REACTOR fill:#0d1117,stroke:#bb9af7,stroke-width:2px,color:#a9b1d6
```
Purpose, Problem, Challenge, Solution
- Purpose: Show the runtime request path from multimodal inputs to routed inference and back to user-visible action.
- Problem: Input streams (voice, screen, command) are heterogeneous and require different model strategies and latencies.
- Core Challenge: Route by task type in real time while capturing high-quality telemetry for future model improvement.
- What This Solves: Demonstrates a closed execution path where each response both serves the user now and improves the system later.
```mermaid
%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#1a1b27', 'primaryTextColor': '#a9b1d6', 'lineColor': '#545c7e', 'fontSize': '13px', 'fontFamily': 'JetBrains Mono, monospace' }}}%%
flowchart LR
A["🎤 Voice Input"] --> B["JARVIS Kernel"]
C["👁️ Screen Capture"] --> B
D["⌨️ User Command"] --> B
B --> E["JARVIS-Prime<br/><i>inference routing</i>"]
E --> F{"Task Type?"}
F -->|"math"| G["Qwen2.5-7B"]
F -->|"code"| H["DeepCoder"]
F -->|"vision"| I["LLaVA"]
F -->|"simple"| J["Fast 2.2GB"]
F -->|"complex"| K["Claude API"]
G & H & I & J & K --> L["Response"]
L --> B
E -->|"telemetry"| M["ReactorCore"]
M -->|"LoRA/DPO training"| N["Improved Model"]
N -->|"deploy + probation"| E
style B fill:#1a1b27,stroke:#70a5fd,stroke-width:2px,color:#70a5fd
style E fill:#1a1b27,stroke:#bf91f3,stroke-width:2px,color:#bf91f3
style M fill:#1a1b27,stroke:#bb9af7,stroke-width:2px,color:#bb9af7
style F fill:#24283b,stroke:#545c7e,stroke-width:1px,color:#a9b1d6
```
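Stripped to its essence, the routing step in the diagram is classify-then-dispatch. The classifier below is a deliberately naive stand-in for the real task-type router, and the route table simply mirrors the diagram's labels; none of it is the shipped code:

```python
import asyncio

# Hypothetical routing table mirroring the diagram above
ROUTES = {
    "math": "qwen2.5-7b",
    "code": "deepcoder",
    "vision": "llava",
    "simple": "fast-2.2gb",
    "complex": "claude-api",
}

def classify(prompt: str, has_image: bool = False) -> str:
    """Toy heuristic classifier standing in for the real task-type router."""
    if has_image:
        return "vision"
    if any(tok in prompt.lower() for tok in ("def ", "class ", "function", "bug")):
        return "code"
    if any(ch.isdigit() for ch in prompt) and any(op in prompt for op in "+-*/="):
        return "math"
    return "simple" if len(prompt) < 120 else "complex"

async def route(prompt: str, **kw) -> tuple[str, str]:
    task = classify(prompt, **kw)
    return task, ROUTES[task]

task, model = asyncio.run(route("fix the bug in this function"))
print(task, model)  # code deepcoder
```

The key property is that the route decision and the telemetry record share the same `task` label, so training data downstream stays aligned with what was actually served.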
Purpose, Problem, Challenge, Solution
- Purpose: Define a deterministic fallback ladder for reliability under changing infrastructure and hardware conditions.
- Problem: A single inference backend is a single point of failure (downtime, cold starts, local resource pressure, API outages).
- Core Challenge: Preserve quality and uptime while controlling cost and avoiding hard dependency on any one execution tier.
- What This Solves: Guarantees service continuity through policy-based failover: `GCP -> Local Metal -> Claude API`.
```mermaid
%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#1a1b27', 'primaryTextColor': '#a9b1d6', 'lineColor': '#545c7e', 'fontSize': '13px', 'fontFamily': 'JetBrains Mono, monospace' }}}%%
flowchart LR
REQ["Inference Request"] --> T1
T1["☁️ Tier 1: GCP Golden Image<br/><i>11 models · ~30s cold start</i>"]
T1 -->|"unavailable"| T2["💻 Tier 2: Local Apple Silicon<br/><i>M1 Metal GPU · on-device</i>"]
T2 -->|"resource constrained"| T3["🔑 Tier 3: Claude API<br/><i>emergency fallback</i>"]
T1 -->|"✅ success"| RES["Response"]
T2 -->|"✅ success"| RES
T3 -->|"✅ success"| RES
style T1 fill:#1a1b27,stroke:#70a5fd,stroke-width:2px,color:#70a5fd
style T2 fill:#1a1b27,stroke:#bf91f3,stroke-width:2px,color:#bf91f3
style T3 fill:#1a1b27,stroke:#bb9af7,stroke-width:2px,color:#bb9af7
style REQ fill:#24283b,stroke:#545c7e,stroke-width:1px,color:#a9b1d6
style RES fill:#24283b,stroke:#545c7e,stroke-width:1px,color:#a9b1d6
```
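A minimal sketch of the ladder's control flow, with jittered per-tier retries before failing over. The backends here are stubs (Tier 1 is simulated as unavailable purely to show the failover path); the real tiers carry health checks and circuit breakers:

```python
import random
import time

class TierUnavailable(Exception): ...

def call_gcp(prompt):                      # stub: simulate Tier 1 cold start / outage
    raise TierUnavailable("cold start")

def call_local(prompt):                    # stub for on-device Metal inference
    return f"[local-metal] {prompt}"

def call_api(prompt):                      # stub for the emergency API fallback
    return f"[claude-api] {prompt}"

TIERS = [("gcp", call_gcp), ("local", call_local), ("api", call_api)]

def infer(prompt: str, retries_per_tier: int = 2) -> tuple[str, str]:
    """Walk the ladder in order; retry each tier with jittered backoff, then fail over."""
    for name, backend in TIERS:
        for attempt in range(retries_per_tier):
            try:
                return name, backend(prompt)
            except TierUnavailable:
                # exponential backoff with full jitter, capped small for the demo
                time.sleep(min(0.05, (2 ** attempt) * 0.01 * random.random()))
    raise RuntimeError("all tiers exhausted")

tier, answer = infer("summarize today's telemetry")
print(tier)  # local
```

In production the same loop would consult a per-tier circuit breaker so a known-dead tier is skipped instantly instead of burning its retry budget.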
Purpose, Problem, Challenge, Solution
- Purpose: Wire autonomy lifecycle events through the Trinity loop so the system can learn from its own autonomous actions.
- Problem: JARVIS Body performs autonomous actions (Google Workspace agent) but the outcomes are not captured as structured training signals.
- Core Challenge: Events must be strictly validated, deduplicated, and classified before reaching the training pipeline — malformed or replayed events would corrupt model weights.
- What This Solves: Creates a closed feedback loop where autonomous actions generate training data, improving future autonomy decisions without manual intervention.
```mermaid
%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#1a1b27', 'primaryTextColor': '#a9b1d6', 'lineColor': '#545c7e', 'fontSize': '13px', 'fontFamily': 'JetBrains Mono, monospace' }}}%%
flowchart TD
AGENT["🤖 Google Workspace Agent<br/><i>execute_task()</i>"]
AGENT -->|"7 event types"| EMIT["📡 _emit_autonomy_event()<br/><i>strict metadata schema</i>"]
EMIT -->|"token-bucket<br/>rate limiter"| FWD["🔀 CrossRepoExperienceForwarder<br/><i>forward_autonomy_event()</i>"]
FWD -->|"ExperienceEvent<br/>(type=METRIC)"| ING["🔬 AutonomyEventIngestor"]
ING --> V{"Validate<br/>7 required keys?"}
V -->|"❌ malformed"| Q["🗃️ Quarantine<br/><i>disk-based · 7d retention</i>"]
V -->|"✅ valid"| D{"Deduplicate<br/>composite key?"}
D -->|"duplicate"| SKIP["⏭️ Skip"]
D -->|"unique"| CLS["🏷️ AutonomyEventClassifier"]
CLS -->|"committed / failed"| TRAIN["🔥 UnifiedPipeline<br/><i>DPO / LoRA training</i>"]
CLS -->|"infrastructure /<br/>excluded"| EXCLUDE["📊 Metrics Only<br/><i>no training</i>"]
AGENT <-.->|"autonomy_policy /<br/>action_plan"| PRIME["💭 JARVIS-Prime<br/><i>policy gate</i>"]
SUP["🛡️ Supervisor Boot"] -->|"check_autonomy_contracts()"| COMPAT{"Schema<br/>Compatible?"}
COMPAT -->|"✅ pass"| FULL["Full Autonomy Mode"]
COMPAT -->|"❌ mismatch"| RO["Read-Only Mode"]
style AGENT fill:#1a1b27,stroke:#70a5fd,stroke-width:2px,color:#70a5fd
style PRIME fill:#1a1b27,stroke:#bf91f3,stroke-width:2px,color:#bf91f3
style ING fill:#1a1b27,stroke:#bb9af7,stroke-width:2px,color:#bb9af7
style TRAIN fill:#1a1b27,stroke:#9ece6a,stroke-width:2px,color:#9ece6a
style Q fill:#1a1b27,stroke:#f7768e,stroke-width:2px,color:#f7768e
style SUP fill:#1a1b27,stroke:#e0af68,stroke-width:2px,color:#e0af68
```
How it works:
- Body emits 7 canonical events — Every autonomous action (email send, calendar create, doc edit) emits a lifecycle event: `intent_written` (about to execute), `committed` (success), `failed` (error), `policy_denied` (blocked by Prime), `deduplicated` (suppressed duplicate), `superseded` (stale intent), `no_journal_lease` (fail-closed safety)
- Strict metadata schema — Each event carries 7 required keys (`autonomy_event_type`, `autonomy_schema_version`, `idempotency_key`, `trace_id`, `correlation_id`, `action`, `request_kind`). Malformed events are quarantined to disk, never silently coerced
- Token-bucket rate limiter — Prevents replay storms during startup reconciliation (default: 50 events/second)
- Effectively-once semantics — Deduplication by composite key `(idempotency_key, autonomy_event_type, trace_id)` with a 50K sliding window
- Centralized classification — `AutonomyEventClassifier` is the single source of truth: only `committed` and `failed` are trainable; infrastructure events are excluded from training but retained for observability
- Boot contract validation — Supervisor checks schema version compatibility across all three repos at startup. Any mismatch degrades to read-only autonomy mode (no autonomous writes)
- Prime as policy gate — Body attaches `autonomy_policy` (allowed/denied actions, risk thresholds) to commands; Prime validates and returns a structured `action_plan` with a `policy_compatible` flag
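The validate, quarantine, and dedup path described above can be sketched in a few lines. The class below is an illustrative reduction, not the production `AutonomyEventIngestor` (the quarantine here is in-memory rather than on disk):

```python
from collections import OrderedDict

REQUIRED_KEYS = {
    "autonomy_event_type", "autonomy_schema_version", "idempotency_key",
    "trace_id", "correlation_id", "action", "request_kind",
}

class AutonomyIngestor:
    """Validate the 7-key envelope, then dedup on a composite key with a bounded window."""

    def __init__(self, window: int = 50_000) -> None:
        self.window = window
        self.seen = OrderedDict()       # insertion-ordered set of composite keys
        self.quarantined = []           # stands in for the disk-based quarantine

    def ingest(self, event: dict) -> str:
        if not REQUIRED_KEYS <= event.keys():
            self.quarantined.append(event)   # never silently coerce malformed events
            return "quarantined"
        key = (event["idempotency_key"], event["autonomy_event_type"], event["trace_id"])
        if key in self.seen:
            return "duplicate"
        self.seen[key] = None
        if len(self.seen) > self.window:     # sliding window: evict the oldest key
            self.seen.popitem(last=False)
        return "accepted"

ing = AutonomyIngestor()
ev = {k: "x" for k in REQUIRED_KEYS}
assert ing.ingest(ev) == "accepted"
assert ing.ingest(ev) == "duplicate"
assert ing.ingest({"action": "send"}) == "quarantined"
```

The bounded window is what makes this "effectively once" rather than "exactly once": a replay older than the eviction horizon could slip through, which is an accepted trade-off for constant memory.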
Purpose, Problem, Challenge, Solution
- Purpose: Enable JARVIS to autonomously detect, generate, validate, and apply code improvements across all three repos (JARVIS, JARVIS-Prime, Reactor-Core) in real time — without human intervention.
- Problem: Cross-repo code application without isolation is dangerous: partial failures leave repos in inconsistent states, no rollback exists, TARGET_MOVED (another commit landing mid-apply) goes undetected, and forensics branches are lost on failure.
- Core Challenge: Production-grade saga apply safety across three independent git repos — ephemeral branch isolation, deterministic lock ordering, ff-only promote gates, and bounded passive observability — all without changing the external execution contract.
- What This Solves (v262.0 B+): Full activation of the autonomous self-development loop with B+ branch-isolated sagas, a passive SagaMessageBus observer, a TestFailureSensor with a real polling watcher, and all 4 P0 config blockers resolved. `JARVIS_SAGA_BRANCH_ISOLATION=true` + `JARVIS_GOVERNANCE_MODE=governed` = fully operational.
```mermaid
%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#1a1b27', 'primaryTextColor': '#a9b1d6', 'lineColor': '#545c7e', 'fontSize': '13px', 'fontFamily': 'JetBrains Mono, monospace' }}}%%
flowchart TD
subgraph INTAKE["Zone 6.9 — Intake Layer (per repo × 3)"]
B["📋 BacklogSensor<br/><i>polls .jarvis/backlog.json · 30s</i>"]
T["🧪 TestFailureSensor + TestWatcher<br/><i>pytest subprocess · streak ≥ 2 · 300s</i>"]
M["⛏️ OpportunityMiner<br/><i>complexity ≥ 10 · 300s</i>"]
V["🎤 VoiceCommandSensor<br/><i>event-driven · always on</i>"]
end
subgraph GLS["Zone 6.8 — Governed Loop Service"]
Q["📥 UnifiedIntakeRouter<br/><i>dedup · priority · human-ack</i>"]
FSM["🔄 PreemptionFsmEngine<br/><i>IDLE→ACTIVE→PAUSED→TERMINAL</i>"]
ORCH["🎯 Orchestrator<br/><i>CLASSIFY→ROUTE→EXPAND→GENERATE→VALIDATE→GATE→APPLY→VERIFY→COMPLETE</i>"]
BUS["📡 SagaMessageBus<br/><i>passive observer · max 500 msgs · TTL 300s</i>"]
end
subgraph SAGA["B+ Saga Apply (branch_isolation=True)"]
PRE["1. Preflight: assert clean worktree"]
BR["2. Create ouroboros/saga-<op_id>/<repo>"]
AP["3. Apply patch + git commit"]
LOCK["Two-Tier Lock:<br/>asyncio.Lock + fcntl.flock<br/><i>sorted order: jarvis → prime → reactor</i>"]
PROM["4. promote_all()<br/><i>check_promote_safe → git merge --ff-only</i>"]
COMP["5. On failure: _bplus_compensate_all()<br/><i>restore original_ref · keep forensics branch</i>"]
end
subgraph JPRIME["GCP J-Prime (Golden Image · 136.113.252.164:8000)"]
GEN["🧠 Code Generation<br/><i>schema 2c.1 · multi-repo patches</i>"]
NOOP["⚡ Noop Fast-Path<br/><i>2b.1-noop → GENERATE→COMPLETE</i>"]
end
B & T & M & V --> Q
Q --> FSM --> ORCH
ORCH -->|"GENERATE"| JPRIME
GEN & NOOP --> ORCH
ORCH -->|"APPLY"| PRE
PRE --> LOCK --> BR --> AP
AP -->|"success"| PROM
AP -->|"failure"| COMP
PROM -->|"SAGA_SUCCEEDED"| BUS
PROM -->|"TARGET_MOVED"| BUS
PROM -->|"SAGA_PARTIAL_PROMOTE"| BUS
COMP -->|"SAGA_ROLLED_BACK"| BUS
ORCH -->|"VERIFY fail"| BUS
style INTAKE fill:#0d1117,stroke:#70a5fd,stroke-width:2px,color:#a9b1d6
style GLS fill:#0d1117,stroke:#bf91f3,stroke-width:2px,color:#a9b1d6
style SAGA fill:#0d1117,stroke:#9ece6a,stroke-width:2px,color:#a9b1d6
style JPRIME fill:#0d1117,stroke:#e0af68,stroke-width:2px,color:#a9b1d6
style BUS fill:#1a1b27,stroke:#7dcfff,stroke-width:2px,color:#7dcfff
style LOCK fill:#1a1b27,stroke:#f7768e,stroke-width:2px,color:#f7768e
```
How it works:
- Zone 6.9 sensors fan out per repo — Each of the three repos (JARVIS, JARVIS-Prime, Reactor-Core) gets its own `BacklogSensor`, `TestFailureSensor` (with a real `TestWatcher` subprocess poller), and `OpportunityMinerSensor`. `VoiceCommandSensor` is always-on and event-driven.
- TestWatcher polls continuously — Runs `pytest` in a subprocess every 300s per repo. Emits a stable `intent:test_failure` envelope only after `streak ≥ 2` consecutive failures, preventing false alarms from transient flakes.
- B+ branch isolation — Every apply creates an ephemeral branch `ouroboros/saga-<op_id>/<repo>`. Patches are committed there. Promote uses `git merge --ff-only`; if the target moved (`TARGET_MOVED`), the gate fails and the saga compensates cleanly.
- Two-tier locking — `asyncio.Lock` (in-process) + `fcntl.flock` (cross-process), acquired in sorted repo name order (jarvis → prime → reactor-core) — deterministic and deadlock-free across concurrent ops.
- SAGA_PARTIAL_PROMOTE — If promotion succeeds for some repos but fails for others, the new `SAGA_PARTIAL_PROMOTE` terminal state triggers a scoped pause (`cross_repo_saga` scope) until the operator reviews the partial state.
- SagaMessageBus — A passive, fault-isolated observer (zero execution authority) records 8 event types: `SAGA_CREATED`, `SAGA_ADVANCED`, `SAGA_COMPLETED`, `SAGA_FAILED`, `SAGA_ROLLED_BACK`, `SAGA_PARTIAL_PROMOTE`, `TARGET_MOVED`, `ANCESTRY_VIOLATION`. Fire-and-forget — a broken bus never blocks an apply.
- SagaLedgerArtifact — A 15-field frozen dataclass records every saga op: original_ref, saga_branch, promoted_sha, rollback_reason, kept_forensics_branches, and timestamp_ns. Full audit trail in the durable ledger.
- J-Prime generates patches — The GCP golden image at `136.113.252.164:8000` generates schema 2c.1 multi-repo patches. A noop fast-path (2b.1-noop) skips directly to `COMPLETE` if the change is already present.
- Voice narration — `VoiceNarrator` announces intent, decision, and postmortem at each significant phase. `OUROBOROS_VOICE_DEBOUNCE_S` prevents over-narration (default 60s).
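The two-tier locking scheme above can be sketched in a few lines. This is a minimal illustration of the sorted-order acquisition idea, not the actual JARVIS implementation; the helper name and lock-file layout are assumptions.

```python
import asyncio
import fcntl
import os
from contextlib import asynccontextmanager

REPOS = ["jarvis", "jarvis-prime", "reactor-core"]
_process_locks = {repo: asyncio.Lock() for repo in REPOS}

@asynccontextmanager
async def acquire_repo_locks(repos, lock_dir="/tmp"):
    """Acquire in-process and cross-process locks in sorted repo order.

    Sorting gives every concurrent operation the same acquisition order,
    which rules out lock-order deadlocks across ops and processes.
    """
    ordered = sorted(repos)
    held_files = []
    held_async = []
    try:
        for repo in ordered:
            await _process_locks[repo].acquire()       # tier 1: in-process
            held_async.append(repo)
            f = open(os.path.join(lock_dir, f"{repo}.lock"), "w")
            fcntl.flock(f, fcntl.LOCK_EX)              # tier 2: cross-process
            held_files.append(f)
        yield
    finally:
        for f in reversed(held_files):                 # release in reverse order
            fcntl.flock(f, fcntl.LOCK_UN)
            f.close()
        for repo in reversed(held_async):
            _process_locks[repo].release()
```

Because every op sorts before acquiring, two ops that want overlapping repo sets always contend on the first shared repo rather than forming a cycle.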
Activation (v262.0 — all green):
# .env (required for full autonomous operation)
JARVIS_GOVERNANCE_MODE=governed
JARVIS_SAGA_BRANCH_ISOLATION=true
JARVIS_SAGA_KEEP_FORENSICS_BRANCHES=true
# Start
python3 unified_supervisor.py --force

What it does:
- Detects opportunities across all 3 repos (test failures, backlog, complexity, voice commands)
- Calls J-Prime on GCP, receives a schema 2c.1 multi-repo patch
- Applies with B+ saga safety — ephemeral branches, two-tier locks, ff-only promote gates, rollback
- Narrates every decision in real time via voice + TUI
- Commits and promotes across jarvis + prime + reactor without human touch
Where it stands vs. Claude Code:
| Capability | Claude Code | Ouroboros v262.0 |
|---|---|---|
| Read arbitrary files during an op | Full Read tool | Partial — TheOracle + context_expander (10 files max) |
| Run bash commands | Yes | No |
| Search the web | Yes | No |
| Edit code iteratively with feedback | Multi-turn, sees results | One-shot patch + apply |
| Test before committing | Runs tests, reads output, fixes | Applies first, verifies after |
| Persistent strategic goal memory | Deep conversation context | Per-op intent only |
Core difference: Claude Code is a full agentic loop with tool use — reads, runs, observes, revises, converges. Ouroboros is a code generation + automated apply pipeline. J-Prime generates a patch once; the B+ saga applies it. No iterative tool-use loop within an operation yet.
What would close the gap:
- Tool use in the generation loop — J-Prime calls `read_file`, `run_command`, and `run_tests` during generation
- Multi-turn op execution — generate → run → observe → revise → converge
- Persistent goal memory — accumulates your long-running intent across sessions into every op's context
- Sandboxed shell — Ouroboros verifies its own changes before committing
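The roadmap items above imply a loop of this shape. This is a hypothetical sketch of the target multi-turn flow, not current Ouroboros behavior; `generate` and `run_tests` are stand-in callables, not existing APIs.

```python
def iterative_op(generate, run_tests, max_rounds=3):
    """Hypothetical generate -> run -> observe -> revise loop.

    `generate(feedback)` returns a candidate patch; `run_tests(patch)`
    returns (passed, output). Neither exists in Ouroboros today; this
    is the target shape, not current behavior.
    """
    feedback = None
    for _ in range(max_rounds):
        patch = generate(feedback)
        passed, output = run_tests(patch)
        if passed:
            return patch                  # converged: tests are green
        feedback = output                 # feed failures back into generation
    return None                           # did not converge within budget
```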
Bottom line: Real, production-grade autonomous code delivery — not a demo. JARVIS will find work, generate patches via J-Prime, and commit across 3 repos without you touching anything. Not yet at Claude Code-level agentic tool use. That is the explicit next evolution.
Purpose, Problem, Challenge, Solution
- Purpose: Run high-throughput inference and training on GCP while preserving local fallback and cost control.
- Problem: On-demand cloud is expensive at scale, while local-only inference cannot absorb peak load or large-model demand.
- Core Challenge: Balance latency, uptime, and spend when Spot VMs can be preempted without warning.
- What This Solves: Introduces hybrid execution with preemption-aware orchestration, checkpoint recovery, and automatic failover to local/API tiers.
%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#1a1b27', 'primaryTextColor': '#a9b1d6', 'lineColor': '#545c7e', 'fontSize': '13px', 'fontFamily': 'JetBrains Mono, monospace' }}}%%
flowchart LR
REQ["Inference / Training Request"] --> ORCH["Hybrid Orchestrator"]
ORCH --> SPOT["GCP Spot VM Pool<br/><i>primary cost-optimized execution</i>"]
ORCH --> LOCAL["Local Apple Silicon Tier<br/><i>low-latency fallback</i>"]
ORCH --> API["Claude API Tier<br/><i>emergency overflow</i>"]
SPOT --> PREEMPT{"Preempted?"}
PREEMPT -->|"no"| RUN["Run Workload"]
PREEMPT -->|"yes"| RECOVER["Resume From Checkpoint"]
RECOVER --> RUN
RUN --> TELE["Telemetry + Cost Signals"]
TELE --> ORCH
RUN --> RES["Response / Model Artifact"]
LOCAL --> RES
API --> RES
style ORCH fill:#1a1b27,stroke:#70a5fd,stroke-width:2px,color:#70a5fd
style SPOT fill:#1a1b27,stroke:#bf91f3,stroke-width:2px,color:#bf91f3
style LOCAL fill:#1a1b27,stroke:#7dcfff,stroke-width:2px,color:#7dcfff
style API fill:#1a1b27,stroke:#bb9af7,stroke-width:2px,color:#bb9af7
style PREEMPT fill:#24283b,stroke:#545c7e,stroke-width:1px,color:#a9b1d6
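The preemption path in the diagram reduces to checkpoint-and-resume. A minimal sketch, assuming JSON checkpoints and a step-indexed workload; the real orchestrator also handles tier failover and cost signals.

```python
import json
import os

def run_with_checkpoints(steps, ckpt_path, work, save_every=10):
    """Resume-from-checkpoint sketch for preemptible Spot workloads.

    `work(step)` performs one unit of work; progress is persisted so a
    preempted VM can restart and continue where it left off.
    """
    start = 0
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            start = json.load(f)["next_step"]     # resume point after preemption
    for step in range(start, steps):
        work(step)
        if (step + 1) % save_every == 0 or step + 1 == steps:
            tmp = ckpt_path + ".tmp"
            with open(tmp, "w") as f:
                json.dump({"next_step": step + 1}, f)
            os.replace(tmp, ckpt_path)            # atomic checkpoint write
    return steps
```

The write-to-temp-then-rename pattern matters here: a preemption mid-write leaves the previous checkpoint intact instead of a torn file.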
Purpose, Problem, Challenge, Solution
- Purpose: Eliminate repeated cold setup by pre-baking model runtimes and dependencies into immutable machine images.
- Problem: Dynamic provisioning causes long startup times, dependency drift, and inconsistent behavior across nodes.
- Core Challenge: Keep images reproducible and secure while continuously shipping model/runtime updates.
- What This Solves: Establishes an immutable golden-image pipeline with validation gates and rollout controls for consistent low-latency boot.
%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#1a1b27', 'primaryTextColor': '#a9b1d6', 'lineColor': '#545c7e', 'fontSize': '13px', 'fontFamily': 'JetBrains Mono, monospace' }}}%%
flowchart LR
SRC["Model + Runtime Source"] --> BUILD["Image Builder Pipeline"]
BUILD --> BAKE["Bake Golden Image<br/><i>models + deps + startup contracts</i>"]
BAKE --> VALIDATE["Validation Gate<br/><i>health, integrity, startup SLA</i>"]
VALIDATE -->|"pass"| REG["Image Registry"]
VALIDATE -->|"fail"| REJECT["Reject Build"]
REG --> SCALE["Autoscaled GCP Inference Nodes"]
SCALE --> PRIME["JARVIS-Prime Router"]
PRIME --> MON["Observability + Drift Monitoring"]
MON --> BUILD
style BUILD fill:#1a1b27,stroke:#70a5fd,stroke-width:2px,color:#70a5fd
style BAKE fill:#1a1b27,stroke:#bf91f3,stroke-width:2px,color:#bf91f3
style VALIDATE fill:#1a1b27,stroke:#7dcfff,stroke-width:2px,color:#7dcfff
style REG fill:#1a1b27,stroke:#bb9af7,stroke-width:2px,color:#bb9af7
style REJECT fill:#1a1b27,stroke:#f7768e,stroke-width:2px,color:#f7768e
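The validation gate can be illustrated as a pure predicate over a build report. The field names here (`health_ok`, `checksums_ok`, `boot_seconds`) are assumptions for illustration, not the pipeline's actual schema.

```python
def validate_image(report, max_boot_s=30.0):
    """Validation-gate sketch: promote only if health, integrity, and the
    startup SLA all hold; otherwise reject with the failing check names.
    """
    checks = {
        "health": report.get("health_ok", False),
        "integrity": report.get("checksums_ok", False),
        "startup_sla": report.get("boot_seconds", float("inf")) <= max_boot_s,
    }
    failed = [name for name, ok in checks.items() if not ok]
    return (len(failed) == 0, failed)   # (promote to registry?, reasons)
```

Keeping the gate a pure function of the report makes pass/fail decisions reproducible and easy to audit alongside the build.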
Purpose, Problem, Challenge, Solution
- Purpose: Separate operational concerns into control, data, and model planes for clearer ownership and safer evolution.
- Problem: Without plane separation, policy, state, and model behavior become tightly coupled and brittle during scale-out.
- Core Challenge: Enforce governance and safety globally while allowing model and data pipelines to move quickly.
- What This Solves: Makes architecture auditable and composable: control governs, data persists context, models execute decisions.
%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#1a1b27', 'primaryTextColor': '#a9b1d6', 'lineColor': '#545c7e', 'fontSize': '13px', 'fontFamily': 'JetBrains Mono, monospace' }}}%%
flowchart TB
subgraph CONTROL["🛡️ Control Plane"]
C1["Policy Engine"]
C2["Auth + Approval Gates"]
C3["Secrets + Key Management"]
C4["Kill Switch + Guardrails"]
end
subgraph DATA["📦 Data Plane"]
D1["JARVIS Runtime Events"]
D2["Redis + Cloud SQL State"]
D3["ChromaDB / FAISS Memory"]
D4["JSONL Telemetry + Lineage"]
end
subgraph MODEL["🧠 Model Plane"]
M1["Prime Inference Router"]
M2["Tiered Execution (GCP/Local/Claude)"]
M3["Reactor Training Pipeline"]
M4["Deployment Gate + Probation"]
end
CONTROL -->|"policy constraints"| DATA
CONTROL -->|"permit / deny"| MODEL
DATA -->|"context + telemetry"| MODEL
MODEL -->|"decisions + artifacts"| DATA
MODEL -->|"health + risk signals"| CONTROL
style CONTROL fill:#0d1117,stroke:#70a5fd,stroke-width:2px,color:#a9b1d6
style DATA fill:#0d1117,stroke:#bf91f3,stroke-width:2px,color:#a9b1d6
style MODEL fill:#0d1117,stroke:#bb9af7,stroke-width:2px,color:#a9b1d6
Purpose, Problem, Challenge, Solution
- Purpose: Govern shared Apple Silicon UMA memory with explicit, lease-based control across model loads, display surfaces, and agent runtime.
- Problem: GPU/compositor pressure is often invisible to process-level memory metrics, so systems can appear healthy while heading into swap thrash.
- Core Challenge: Coordinate memory decisions across heterogeneous consumers while preventing flapping and preserving critical capabilities.
- What This Solves: Introduces deterministic memory governance with pressure-aware lease grants, stepwise shedding, and crash-safe lease reconciliation.
%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#1a1b27', 'primaryTextColor': '#a9b1d6', 'primaryBorderColor': '#70a5fd', 'lineColor': '#545c7e', 'secondaryColor': '#24283b', 'tertiaryColor': '#1a1b27', 'fontSize': '13px', 'fontFamily': 'JetBrains Mono, monospace' }}}%%
flowchart TB
subgraph OBS["📊 UMA Observability"]
Q["MemoryQuantizer<br/><i>system + process sampling</i>"]
S["Frozen MemorySnapshot<br/><i>headroom, pressure tier, thrash state</i>"]
Q --> S
end
subgraph BROKER["🧠 MemoryBudgetBroker"]
B1["Lease Manager<br/><i>grant / deny / preempt</i>"]
B2["Budget Engine<br/><i>tier multipliers + safety reserve</i>"]
B3["Recovery Ledger<br/><i>epoch fencing + stale lease reclaim</i>"]
end
subgraph CONSUMERS["📦 Lease Holders"]
M["Model Loaders<br/><i>LLM, vision, speaker ID</i>"]
A["Agent Runtime<br/><i>mesh workers + queues</i>"]
D["Ghost Display<br/><i>display:ghost@v1</i>"]
end
subgraph CONTROL["🖥️ DisplayPressureController"]
C1["Policy State Machine<br/><i>one-step downgrade invariant</i>"]
C2["Shedding Ladder<br/><i>1080p -> 900p -> 720p -> 576p -> off</i>"]
C3["Flap Guards<br/><i>dwell, cooldown, rate limits</i>"]
end
S -->|"pressure tier + headroom"| B2
B2 --> B1
B3 --> B1
B1 -->|"lease outcomes"| M
B1 -->|"lease outcomes"| A
B1 -->|"lease outcomes"| D
B1 -->|"pressure signal"| C1
C1 --> C2
C2 -->|"resolution action"| D
C1 --> C3
C3 -->|"allow / delay"| C2
D -->|"amend_lease_bytes"| B1
B1 -->|"events + decisions"| T["Telemetry Pipeline"]
T -->|"drift + anomaly feedback"| Q
style OBS fill:#0d1117,stroke:#70a5fd,stroke-width:2px,color:#a9b1d6
style BROKER fill:#0d1117,stroke:#bf91f3,stroke-width:2px,color:#a9b1d6
style CONSUMERS fill:#0d1117,stroke:#bb9af7,stroke-width:2px,color:#a9b1d6
style CONTROL fill:#0d1117,stroke:#7dcfff,stroke-width:2px,color:#a9b1d6
style T fill:#24283b,stroke:#545c7e,stroke-width:1px,color:#a9b1d6
Key design decisions
- Lease-first memory policy — Components must request memory leases before expensive allocations; brokered leases are the source of truth.
- Typed pressure tiers — Budget aggressiveness changes by pressure tier to avoid hardcoded, brittle thresholds.
- Deterministic shedding — Display degradation follows ordered one-step transitions, preventing abrupt multi-level drops.
- Flap prevention controls — Dwell windows, cooldowns, and rate limits stop oscillation under noisy pressure signals.
- Crash-safe reconciliation — Epoch fencing and stale lease recovery reclaim orphaned allocations after process failures.
- Closed-loop observability — Broker and controller events feed telemetry so memory policy can be calibrated over time.
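The deterministic-shedding and flap-guard decisions above combine into a small pure function. A sketch with illustrative pressure-tier names and an assumed dwell window, not the controller's real policy table:

```python
LADDER = ["1080p", "900p", "720p", "576p", "off"]

def next_resolution(current, pressure_tier, seconds_since_change, dwell_s=10.0):
    """One-step shedding sketch: move at most one rung per decision,
    and only after a dwell window, so noisy pressure readings cannot
    cause multi-level drops or oscillation.
    """
    if seconds_since_change < dwell_s:
        return current                           # flap guard: still dwelling
    i = LADDER.index(current)
    if pressure_tier in ("warning", "critical") and i < len(LADDER) - 1:
        return LADDER[i + 1]                     # degrade one step
    if pressure_tier == "normal" and i > 0:
        return LADDER[i - 1]                     # recover one step
    return current
```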
Purpose, Problem, Challenge, Solution
- Purpose: Document the decision policy from risk classification to approval, execution, blocking, and audit.
- Problem: Autonomous systems can perform high-impact actions where incorrect execution is costly or irreversible.
- Core Challenge: Balance autonomy and velocity with explicit human control boundaries for high-risk operations.
- What This Solves: Provides a predictable safety envelope: low-risk auto-exec, medium-risk constrained mode, high-risk human-in-the-loop.
%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#1a1b27', 'primaryTextColor': '#a9b1d6', 'lineColor': '#545c7e', 'fontSize': '13px', 'fontFamily': 'JetBrains Mono, monospace' }}}%%
flowchart LR
IN["Incoming Action"] --> CLASS["Risk Classifier"]
CLASS -->|"low risk"| AUTO["Auto Execute"]
CLASS -->|"medium risk"| SAFE["Safe Mode + Limits"]
CLASS -->|"high risk"| HITL["Human Approval Required"]
SAFE --> EXEC["Controlled Execution"]
HITL -->|"approved"| EXEC
HITL -->|"denied"| BLOCK["Blocked + Logged"]
EXEC --> MON["Runtime Monitor"]
MON -->|"policy violation"| TRIP["Circuit Breaker Trip"]
TRIP --> FB["Fallback Route / Degrade Gracefully"]
MON -->|"healthy"| OK["Commit Result"]
BLOCK --> AUD["Audit Trail"]
FB --> AUD
OK --> AUD
style IN fill:#24283b,stroke:#545c7e,stroke-width:1px,color:#a9b1d6
style CLASS fill:#1a1b27,stroke:#70a5fd,stroke-width:2px,color:#70a5fd
style HITL fill:#1a1b27,stroke:#ffb86c,stroke-width:2px,color:#ffb86c
style TRIP fill:#1a1b27,stroke:#f7768e,stroke-width:2px,color:#f7768e
style AUD fill:#1a1b27,stroke:#bb9af7,stroke-width:2px,color:#bb9af7
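The classification policy above reduces to a small routing function. A sketch with illustrative field names; `approve` stands in for the human-in-the-loop gate and is an assumption, not a real JARVIS API.

```python
def route_action(action, approve):
    """Risk-policy sketch: low risk auto-executes, medium risk runs in
    constrained safe mode, high risk requires human approval before
    execution. Unknown risk defaults to the safest (high-risk) path.
    """
    risk = action.get("risk", "high")            # fail closed, not open
    if risk == "low":
        return "auto_execute"
    if risk == "medium":
        return "safe_mode_execute"
    return "execute" if approve(action) else "blocked_and_logged"
```

Defaulting unclassified actions to the high-risk path is the key safety property: a classifier gap degrades to asking a human, never to silent execution.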
Purpose, Problem, Challenge, Solution
- Purpose: Show how runtime signals become training data, deployment decisions, and measurable model upgrades.
- Problem: Teams often collect telemetry but fail to operationalize it into safe, repeatable improvement cycles.
- Core Challenge: Detect regressions early, gate bad models, and continuously retrain without destabilizing production.
- What This Solves: Establishes a true learning loop: observe → detect → curate → train → gate/probation → deploy or rollback.
%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#1a1b27', 'primaryTextColor': '#a9b1d6', 'lineColor': '#545c7e', 'fontSize': '13px', 'fontFamily': 'JetBrains Mono, monospace' }}}%%
flowchart LR
RUN["Live Inference + Agent Runtime"] --> OTEL["OpenTelemetry Traces/Metrics"]
RUN --> LOGS["Structured JSONL Logs"]
RUN --> COST["LangFuse + Helicone + PostHog"]
OTEL --> HUB["Unified Observability Hub"]
LOGS --> HUB
COST --> HUB
HUB --> ALERT["Anomaly/Regression Detection"]
ALERT -->|"critical"| ROLLBACK["Auto Rollback / Gate Fail"]
ALERT -->|"acceptable"| CURATE["Telemetry Curation"]
CURATE --> TRAIN["Reactor Training (LoRA/DPO/RLHF)"]
TRAIN --> GATE["Deployment Gate + Probation"]
GATE -->|"pass"| PRIME["Prime Model Registry"]
GATE -->|"fail"| ROLLBACK
PRIME --> RUN
style RUN fill:#1a1b27,stroke:#70a5fd,stroke-width:2px,color:#70a5fd
style HUB fill:#1a1b27,stroke:#bf91f3,stroke-width:2px,color:#bf91f3
style TRAIN fill:#1a1b27,stroke:#bb9af7,stroke-width:2px,color:#bb9af7
style ROLLBACK fill:#1a1b27,stroke:#f7768e,stroke-width:2px,color:#f7768e
Agent Architecture
- Neural Mesh — 16+ specialized agents (activity recognition, adaptive resource governor, context tracker, error analyzer, goal inference, Google Workspace, health monitor, memory, pattern recognition, predictive planning, spatial awareness, visual monitor, web search, coordinator) with asynchronous message passing, capability-based routing, and cross-agent data flow
- Autonomous Agent Runtime — multi-step goal decomposition, agentic task execution, tool orchestration, error recovery, and intervention decision engine with human-in-the-loop approval for destructive actions
- AGI OS Coordinator — proactive event stream, notification bridge, owner identity service, voice approval manager, and intelligent startup announcer
Voice and Authentication
- Real-time voice biometric authentication via ECAPA-TDNN speaker verification with cloud/local hybrid inference and multi-factor fusion (voice + proximity + behavioral)
- Real-time voice conversation — full-duplex audio (simultaneous mic + speaker), acoustic echo cancellation (speexdsp), streaming STT (faster-whisper), adaptive turn detection, barge-in control, and sliding 20-turn context window
- Wake word detection (Porcupine/Picovoice), Apple Watch Bluetooth proximity auth, continuous learning voice profiles
- Unified speech state management — STT hallucination guard, voice pipeline orchestration, parallel model loading
Vision and Spatial Intelligence
- Never-skip screen capture — two-phase monitoring (always-capture + conditional-analysis), self-hosted LLaVA multimodal analysis, Claude Vision escalation
- Ghost Display — virtual macOS display for non-intrusive background automation, Ghost Hands orchestrator for autonomous visual workflows
- Claude Computer Use — automated mouse, keyboard, and screenshot interaction via Anthropic's Computer Use API
- OCR / OmniParser — screen text extraction, window analysis, workspace name detection, multi-monitor and multi-space intelligence via yabai window manager
- YOLO + Claude hybrid vision — object detection with LLM-powered semantic understanding
- Rust vision core — native performance for fast image processing, bloom filter networks, and sliding window analysis
macOS Native Integration (Swift / Objective-C / Rust)
- Swift bridge (203 files) — CommandClassifier, SystemControl (preferences, security, clipboard, filesystem), PerformanceCore, ScreenCapture, WeatherKit, CoreLocation GPS
- Objective-C voice unlock daemon — JARVISVoiceAuthenticator, JARVISVoiceMonitor, permission manager, launchd service integration
- Rust performance layer — PyO3 bindings for memory pool management, quantized ML inference, vision fast processor, command classifier, health predictor; ARM64 SIMD assembly optimizations
- CoreML acceleration — on-device intent classification, voice processing
Infrastructure and Reliability
- Parallel initializer with cooperative cancellation, adaptive EMA-based deadlines, dependency propagation, and atomic state persistence
- CPU-pressure-aware cloud shifting — automatic workload offload to GCP when local resources are constrained
- Enterprise hardening — dependency injection container, enterprise process manager, system hardening, governance, Cloud SQL with race-condition-proof proxy management, TLS-safe connection factories, distributed lock manager
- Three-tier inference routing: GCP Golden Image (primary) → Local Apple Silicon (fallback) → Claude API (emergency)
- Trinity event bus — cross-repo IPC hub, heartbeat publishing, knowledge graph, state management, process coordination
- Cost tracking and rate limiting — GCP cost optimization with Bayesian confidence fusion, intelligent rate orchestration
- File integrity guardian — pre-commit integrity verification across the codebase
Intelligence and Learning
- Google Workspace Agent — Gmail read/search/draft, Google Calendar, natural language intent routing via tiered command router
- Proactive intelligence — predictive suggestions, proactive vision monitoring, proactive communication, emotional intelligence module
- RAG pipeline — ChromaDB vector store, FAISS similarity search, embedding service, long-term memory system
- Chain-of-thought / reasoning graph engine — LangGraph-based multi-step reasoning with conditional routing and reflection loops
- Ouroboros (v262.0 B+ — fully activated) — autonomous self-development across JARVIS, JARVIS-Prime, and Reactor-Core: B+ branch-isolated saga applies (ephemeral branches, two-tier locks, ff-only promote gates, rollback-via-branch-delete), SagaMessageBus passive observer, TestFailureSensor with real TestWatcher per repo, GCP J-Prime code generation (schema 2c.1), voice narration at every decision phase
- Web research service — autonomous web search and information synthesis
- A/B testing framework — vision pipeline experimentation
- Repository intelligence — code ownership analysis, dependency analyzer, API contract analyzer, AST transformer, cross-repo refactoring engine
Inference and Routing
- 11 specialist GGUF models (~40.4 GB) pre-baked into a GCP golden image with ~30-second cold starts
- Task-type routing — math queries hit Qwen2.5-7B, code queries hit DeepCoder, simple queries hit a 2.2 GB fast model, vision hits LLaVA
- GCP Model Swap Coordinator with intelligent hot-swapping, per-model configuration, and inference validation
- Neural Switchboard v98.1 — stable public API facade over routing and orchestration with WebSocket integration contracts
- Hollow Client mode for memory-constrained hardware — strict lazy imports, zero ML dependencies at startup on 16 GB machines
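Task-type routing of this kind is essentially a dispatch table with a fast-model default. A sketch; the keys and model labels are illustrative shorthand, not exact registry names.

```python
ROUTES = {
    "math": "qwen2.5-7b",      # math queries -> math specialist
    "code": "deepcoder",       # code queries -> code specialist
    "vision": "llava",         # multimodal queries -> vision model
}

def pick_model(task_type):
    """Task-type routing sketch: specialist models for math, code, and
    vision; everything else falls through to the small fast model.
    """
    return ROUTES.get(task_type, "fast-2.2gb")
```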
Reasoning and Telemetry
- Continuous learning hook — post-inference experience recording for Elastic Weight Consolidation via ReactorCore
- Reasoning engine activation — chain-of-thought scaffolding (CoT/ToT/self-reflection) for high-complexity requests above configurable thresholds
- APARS protocol (Adaptive Progress-Aware Readiness System) — 6-phase startup with real-time health reporting to the supervisor
- LLaVA vision server — multimodal inference on port 8001 with OpenAI-compatible API, semaphore serialization, queue depth cap
- Telemetry capture — structured JSONL interaction logging with deployment feedback loop and post-deployment probation monitoring
Training Pipeline
- Full training pipeline: telemetry ingestion → active learning selection → gatekeeper evaluation → LoRA SFT → GGUF export → deployment gate → probation monitoring → feedback loop
- DeploymentGate validates model integrity before deployment; rejects corrupt or degenerate outputs
- Post-deployment probation — 30-minute health monitoring window with automatic commit or rollback based on live inference quality
- Model lineage tracking — full provenance chain (hash, parent model, training method, evaluation scores, gate decision) in append-only JSONL
- Tier-2/Tier-3 runtime orchestration — curriculum learning, meta-learning (MAML), causal discovery with correlation-based fallback, world model training
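The lineage chain described above can be sketched as append-only JSONL, with each record hashing the artifact and pointing at its parent. The field names mirror the provenance fields listed, but the helper itself is illustrative, not the ReactorCore implementation.

```python
import hashlib
import json

def append_lineage(path, model_bytes, parent, method, scores, gate_decision):
    """Lineage sketch: one JSONL line per model version, forming a
    hash-linked provenance chain that is never rewritten in place.
    """
    record = {
        "hash": hashlib.sha256(model_bytes).hexdigest(),
        "parent_model": parent,               # hash of the model it was trained from
        "training_method": method,            # e.g. "lora_sft", "dpo"
        "evaluation_scores": scores,
        "gate_decision": gate_decision,       # "pass" / "fail"
    }
    with open(path, "a") as f:                # append-only: history is immutable
        f.write(json.dumps(record) + "\n")
    return record
```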
Infrastructure and Integration
- GCP Spot VM auto-recovery with training checkpoint persistence and 60% cost reduction over on-demand instances
- Native C++ training kernels via CMake/pybind11/cpp-httplib for performance-critical operations
- Atomic experience snapshots — buffer drain under async lock, JSONL with DataHash for dataset versioning
- PrimeConnector — WebSocket path rotation, health polling fallback, contract path discovery for cross-repo communication
- Cross-repo integration — Ghost Display state reader, cloud mode detection, Trinity Unified Loop Manager, pipeline event logger with correlation IDs
| Metric | Value |
|---|---|
| Total commits | 3,900+ across 3 repositories |
| Codebase | ~2.5 million lines across 18+ languages |
| Build duration | 12 months, solo |
| Unified kernel | 50,000+ lines in a single orchestration file |
| Neural Mesh agents | 16+ specialized agents with async message passing |
| Models served | 11 specialist GGUF models via task-type routing |
| Inference tiers | GCP Golden Image → Local Metal GPU → Claude API |
| Training pipeline | Automated: telemetry → active learning → gatekeeper → training → GGUF export → deployment gate → probation → feedback |
| Voice auth | Multi-factor: ECAPA-TDNN biometric + Apple Watch proximity + behavioral analysis |
| Vision pipeline | Never-skip capture, LLaVA self-hosted, Claude escalation, YOLO hybrid, OCR/OmniParser |
| Swift components | 203 files — system control, command classifier, screen capture, GPS, weather |
| Rust crates | 5 Cargo workspaces — memory pool, vision processor, ML inference, SIMD optimizations |
| Terraform modules | 7 modules (compute, network, security, storage, monitoring, budget, Spot templates) |
| Dockerfiles | 6 (backend, backend-slim, frontend, training, cloud, GCP inference) |
| GitHub Actions | 20+ workflows (CI/CD, CodeQL, e2e testing, deployment, database validation, file integrity) |
| macOS integration | Native Swift/ObjC daemons, yabai WM, Ghost Display, multi-space/multi-monitor, launchd services |
| Cloud infrastructure | GCP (Compute Engine, Cloud SQL, Cloud Run, Secret Manager, Monitoring), Spot VM auto-recovery |
| Google Workspace | Gmail read/search/draft, Calendar, natural language routing via tiered command router |
I graduated from Cal Poly San Luis Obispo with a B.S. in Computer Engineering after a 10-year non-traditional academic path that started in remedial algebra at community college. I retook courses, studied through the loss of family, and spent most of my twenties earning a degree that others finish in four years. The path was not conventional. The outcome was.
JARVIS is what happens when that level of persistence meets engineering capability. Twelve months of daily commits, architectural decisions at every layer of the stack, and a refusal to ship anything that is not production-grade.





