System Design

Architecture

askLenny is three purpose-built containers communicating over a private Docker network. Each container has a single responsibility and a minimal attack surface.

System overview

Docker network — internal bridge
React 19 + Vite:5173 exposed

Frontend Container

  • Schema Discovery UI
  • AI enrichment controls
  • Natural language query bar
  • SQL viewer + results table
  • Served as static bundle
Calls Python App API for all data operations
Python + FastAPI:8000 internal

App Layer Container

  • Database connectors (6 engines)
  • INFORMATION_SCHEMA discovery
  • Embedding generation
  • LLM integration (configurable)
  • SQL generation + execution
  • Results returned to frontend
Only container with optional external connectivity (LLM endpoint)
Rust + Axum:3030 internal only

Graph Engine Container

  • Binary graph database (LichenEngine)
  • 64-byte cache-line records
  • Memory-mapped file I/O
  • Cosine similarity vector search
  • BFS GraphRAG context extraction
  • Never touches source databases
Persistent volume: nodes.dat · edges.dat · strings.dat · vectors.dat
App Layer only — LLM endpoint (configurable; use local model for zero egress)
App Layer only — Source database connections (INFORMATION_SCHEMA + SQL execution)
01Frontend Container

React dashboard — what the user sees

The frontend container serves a React 19 + Vite single-page application on port 5173. It is the only container you expose to your internal network. Users open it in a browser to manage schema discovery, run AI enrichment, and ask natural-language questions.

The query interface sends the user's question to the Python app layer and receives back both the generated SQL and the actual query results. Results are rendered as a sortable data table — no copy-paste, no manual execution required.

Frontend never touches databases directly

All database operations (discovery, queries, execution) go through the Python app layer API. The browser is isolated from your database network.

Python App Layer API consumed by frontend

MethodEndpoint
GET/api/connectors
GET/api/schema/{db}
POST/api/enrich/column
POST/api/enrich/table
POST/api/commit
POST/api/query
02App Layer Container

Python orchestration — the integration hub

The Python container runs a FastAPI application that acts as the integration layer between the frontend, your source databases, the Rust graph engine, and the AI model. It is the only container that holds database credentials or makes outbound LLM calls.

When a query arrives, it embeds the question, calls the graph engine for context, sends the assembled prompt to the LLM, receives SQL, executes it against the source database, and returns both the SQL and the result rows to the frontend.

SQL Server / Azure SQLpyodbc + ODBC Driver 18
PostgreSQLasyncpg / psycopg2
MySQLPyMySQL
Snowflakesnowflake-connector-python
Oraclecx_Oracle / python-oracledb
SQLitesqlite3 (stdlib)

Graph Engine API consumed by Python layer

MethodEndpoint
POST/nodes
PATCH/nodes/{id}
DELETE/nodes/{id}
POST/edges
POST/query/vector_search
GET/nodes/find
GET/health

Configurable LLM endpoint

Set LLM_BASE_URL to any OpenAI-compatible server:

  • • Google Gemini (default)
  • • Ollama — zero egress, fully on-prem
  • • Azure OpenAI — stays within your tenant
  • • vLLM, LM Studio, LocalAI, or any compatible server
03Graph Engine Container — LichenEngine

Rust graph database — the semantic core

LichenEngine is a custom binary graph database written in Rust, built specifically for AI-assisted schema navigation. It stores nodes (databases, tables, columns) and edges (containment, foreign keys) as fixed 64-byte records aligned to CPU cache lines.

All four data files are memory-mapped. The OS page cache serves as the buffer pool — warm reads hit L3 cache at near-RAM speed. Cold reads cost one disk seek. The engine never issues network calls and never touches your source databases.

Vector search is a brute-force cosine similarity scan over 1536-dimensional float32 embeddings stored in a parallel file. After finding the top matches, BFS traversal through the pointer-linked adjacency lists assembles the full schema context in a single HTTP response.

Persistent volume — four binary files

nodes.dat

Label, description ptr, category, SQL type, is_pk, deleted flag

64 bytes/node

edges.dat

Source/target IDs, join type, cardinality, linked-list pointers

64 bytes/edge

strings.dat

Append-only text heap — labels and AI-generated descriptions

variable

vectors.dat

1536 × float32 semantic embeddings per node

6 144 bytes/node

MethodEndpoint
POST/nodes
PATCH/nodes/{id}
POST/edges
POST/query/vector_search
GET/nodes/find
GET/health

End-to-end data flows

Schema ingestion

  1. 1User opens Schema Discovery in the frontend
  2. 2Frontend calls Python app → app queries INFORMATION_SCHEMA
  3. 3App sends table/column names to Rust engine for lookup (gap analysis)
  4. 4User clicks ✨ AI — app sends name + type to LLM → receives description
  5. 5User clicks Commit → app sends node + embedding to Rust engine
  6. 6Engine writes 64-byte record + 1536-float vector to persistent volume

Natural language query

  1. 1User types question in the frontend query bar
  2. 2Frontend sends question to Python app layer
  3. 3App embeds question → sends 1536-dim vector to Rust engine
  4. 4Engine returns top-5 schema matches + full GraphRAG context markdown
  5. 5App assembles prompt (context + question + dialect) → calls LLM
  6. 6LLM returns SQL — app executes it against source database
  7. 7App returns { sql, rows, row_count } to frontend
  8. 8Frontend renders SQL + results table to the user