System Design

Architecture

askLenny is three purpose-built containers communicating over a private Docker network. Each container has a single responsibility and a minimal attack surface.

System overview

Docker network — internal bridge

React 19 + Vite:5173 exposed

Frontend Container

Schema Discovery UI
AI enrichment controls
Natural language query bar
SQL viewer + results table
Served as static bundle

Calls Python App API for all data operations

Python + FastAPI:8000 internal

App Layer Container

Database connectors (6 engines)
INFORMATION_SCHEMA discovery
Embedding generation
LLM integration (configurable)
SQL generation + execution
Results returned to frontend

Only container with optional external connectivity (LLM endpoint)

Rust + Axum:3030 internal only

Graph Engine Container

Binary graph database (LichenEngine)
64-byte cache-line records
Memory-mapped file I/O
Cosine similarity vector search
BFS GraphRAG context extraction
Never touches source databases

Persistent volume: nodes.dat · edges.dat · strings.dat · vectors.dat

HTTP REST

Frontend → App Layer

HTTP REST

App Layer → Engine

⚠App Layer only — LLM endpoint (configurable; use local model for zero egress)

⚠App Layer only — Source database connections (INFORMATION_SCHEMA + SQL execution)

01Frontend Container

React dashboard — what the user sees

The frontend container serves a React 19 + Vite single-page application on port 5173. It is the only container you expose to your internal network. Users open it in a browser to manage schema discovery, run AI enrichment, and ask natural-language questions.

The query interface sends the user's question to the Python app layer and receives back both the generated SQL and the actual query results. Results are rendered as a sortable data table — no copy-paste, no manual execution required.

Frontend never touches databases directly

All database operations (discovery, queries, execution) go through the Python app layer API. The browser is isolated from your database network.

Python App Layer API consumed by frontend

Method	Endpoint	Purpose
GET	/api/connectors	List configured database connections
GET	/api/schema/{db}	Fetch table + column structure
POST	/api/enrich/column	AI-generate column description
POST	/api/enrich/table	AI-generate table description
POST	/api/commit	Write enriched schema to graph engine
POST	/api/query	NL question → SQL + execute → results

02App Layer Container

Python orchestration — the integration hub

The Python container runs a FastAPI application that acts as the integration layer between the frontend, your source databases, the Rust graph engine, and the AI model. It is the only container that holds database credentials or makes outbound LLM calls.

When a query arrives, it embeds the question, calls the graph engine for context, sends the assembled prompt to the LLM, receives SQL, executes it against the source database, and returns both the SQL and the result rows to the frontend.

SQL Server / Azure SQLpyodbc + ODBC Driver 18

PostgreSQLasyncpg / psycopg2

MySQLPyMySQL

Snowflakesnowflake-connector-python

Oraclecx_Oracle / python-oracledb

SQLitesqlite3 (stdlib)

Graph Engine API consumed by Python layer

Method	Endpoint	Purpose
POST	/nodes	Add schema node (db/table/column)
PATCH	/nodes/{id}	Update node description
DELETE	/nodes/{id}	Soft-delete a node
POST	/edges	Add containment or FK edge
POST	/query/vector_search	Embed → top-k nodes + GraphRAG context
GET	/nodes/find	O(1) label lookup via in-memory index
GET	/health	Node / edge count health check

Configurable LLM endpoint

Set LLM_BASE_URL to any OpenAI-compatible server:

• Google Gemini (default)
• Ollama — zero egress, fully on-prem
• Azure OpenAI — stays within your tenant
• vLLM, LM Studio, LocalAI, or any compatible server

03Graph Engine Container — LichenEngine

Rust graph database — the semantic core

LichenEngine is a custom binary graph database written in Rust, built specifically for AI-assisted schema navigation. It stores nodes (databases, tables, columns) and edges (containment, foreign keys) as fixed 64-byte records aligned to CPU cache lines.

All four data files are memory-mapped. The OS page cache serves as the buffer pool — warm reads hit L3 cache at near-RAM speed. Cold reads cost one disk seek. The engine never issues network calls and never touches your source databases.

Vector search is a brute-force cosine similarity scan over 1536-dimensional float32 embeddings stored in a parallel file. After finding the top matches, BFS traversal through the pointer-linked adjacency lists assembles the full schema context in a single HTTP response.

Deep dive into the engine design →

Persistent volume — four binary files

nodes.dat

Label, description ptr, category, SQL type, is_pk, deleted flag

64 bytes/node

edges.dat

Source/target IDs, join type, cardinality, linked-list pointers

64 bytes/edge

strings.dat

Append-only text heap — labels and AI-generated descriptions

variable

vectors.dat

1536 × float32 semantic embeddings per node

6 144 bytes/node

Method	Endpoint	Purpose
POST	/nodes	Insert schema node + embed vector
PATCH	/nodes/{id}	Update description
POST	/edges	Create containment or FK edge
POST	/query/vector_search	cosine search + BFS GraphRAG
GET	/nodes/find	O(1) label lookup
GET	/health	Node + edge counts

End-to-end data flows

Schema ingestion

1User opens Schema Discovery in the frontend
2Frontend calls Python app → app queries INFORMATION_SCHEMA
3App sends table/column names to Rust engine for lookup (gap analysis)
4User clicks ✨ AI — app sends name + type to LLM → receives description
5User clicks Commit → app sends node + embedding to Rust engine
6Engine writes 64-byte record + 1536-float vector to persistent volume

Natural language query

1User types question in the frontend query bar
2Frontend sends question to Python app layer
3App embeds question → sends 1536-dim vector to Rust engine
4Engine returns top-5 schema matches + full GraphRAG context markdown
5App assembles prompt (context + question + dialect) → calls LLM
6LLM returns SQL — app executes it against source database
7App returns { sql, rows, row_count } to frontend
8Frontend renders SQL + results table to the user

Deep dive: the Rust engine →Deploy with Docker Compose