Architecture
askLenny is three purpose-built containers communicating over a private Docker network. Each container has a single responsibility and a minimal attack surface.
System overview
Frontend Container
- Schema Discovery UI
- AI enrichment controls
- Natural language query bar
- SQL viewer + results table
- Served as static bundle
App Layer Container
- Database connectors (6 engines)
- INFORMATION_SCHEMA discovery
- Embedding generation
- LLM integration (configurable)
- SQL generation + execution
- Results returned to frontend
Graph Engine Container
- Binary graph database (LichenEngine)
- 64-byte cache-line records
- Memory-mapped file I/O
- Cosine similarity vector search
- BFS GraphRAG context extraction
- Never touches source databases
React dashboard — what the user sees
The frontend container serves a React 19 + Vite single-page application on port 5173. It is the only container you expose to your internal network. Users open it in a browser to manage schema discovery, run AI enrichment, and ask natural-language questions.
The query interface sends the user's question to the Python app layer and receives back both the generated SQL and the actual query results. Results are rendered as a sortable data table — no copy-paste, no manual execution required.
Frontend never touches databases directly
All database operations (discovery, queries, execution) go through the Python app layer API. The browser is isolated from your database network.
Python App Layer API consumed by frontend
| Method | Endpoint |
|---|---|
| GET | /api/connectors |
| GET | /api/schema/{db} |
| POST | /api/enrich/column |
| POST | /api/enrich/table |
| POST | /api/commit |
| POST | /api/query |
Python orchestration — the integration hub
The Python container runs a FastAPI application that acts as the integration layer between the frontend, your source databases, the Rust graph engine, and the AI model. It is the only container that holds database credentials or makes outbound LLM calls.
When a query arrives, it embeds the question, calls the graph engine for context, sends the assembled prompt to the LLM, receives SQL, executes it against the source database, and returns both the SQL and the result rows to the frontend.
pyodbc + ODBC Driver 18asyncpg / psycopg2PyMySQLsnowflake-connector-pythoncx_Oracle / python-oracledbsqlite3 (stdlib)Graph Engine API consumed by Python layer
| Method | Endpoint |
|---|---|
| POST | /nodes |
| PATCH | /nodes/{id} |
| DELETE | /nodes/{id} |
| POST | /edges |
| POST | /query/vector_search |
| GET | /nodes/find |
| GET | /health |
Configurable LLM endpoint
Set LLM_BASE_URL to any OpenAI-compatible server:
- • Google Gemini (default)
- • Ollama — zero egress, fully on-prem
- • Azure OpenAI — stays within your tenant
- • vLLM, LM Studio, LocalAI, or any compatible server
Rust graph database — the semantic core
LichenEngine is a custom binary graph database written in Rust, built specifically for AI-assisted schema navigation. It stores nodes (databases, tables, columns) and edges (containment, foreign keys) as fixed 64-byte records aligned to CPU cache lines.
All four data files are memory-mapped. The OS page cache serves as the buffer pool — warm reads hit L3 cache at near-RAM speed. Cold reads cost one disk seek. The engine never issues network calls and never touches your source databases.
Vector search is a brute-force cosine similarity scan over 1536-dimensional float32 embeddings stored in a parallel file. After finding the top matches, BFS traversal through the pointer-linked adjacency lists assembles the full schema context in a single HTTP response.
Persistent volume — four binary files
nodes.datLabel, description ptr, category, SQL type, is_pk, deleted flag
64 bytes/node
edges.datSource/target IDs, join type, cardinality, linked-list pointers
64 bytes/edge
strings.datAppend-only text heap — labels and AI-generated descriptions
variable
vectors.dat1536 × float32 semantic embeddings per node
6 144 bytes/node
| Method | Endpoint |
|---|---|
| POST | /nodes |
| PATCH | /nodes/{id} |
| POST | /edges |
| POST | /query/vector_search |
| GET | /nodes/find |
| GET | /health |
End-to-end data flows
Schema ingestion
- 1User opens Schema Discovery in the frontend
- 2Frontend calls Python app → app queries INFORMATION_SCHEMA
- 3App sends table/column names to Rust engine for lookup (gap analysis)
- 4User clicks ✨ AI — app sends name + type to LLM → receives description
- 5User clicks Commit → app sends node + embedding to Rust engine
- 6Engine writes 64-byte record + 1536-float vector to persistent volume
Natural language query
- 1User types question in the frontend query bar
- 2Frontend sends question to Python app layer
- 3App embeds question → sends 1536-dim vector to Rust engine
- 4Engine returns top-5 schema matches + full GraphRAG context markdown
- 5App assembles prompt (context + question + dialect) → calls LLM
- 6LLM returns SQL — app executes it against source database
- 7App returns { sql, rows, row_count } to frontend
- 8Frontend renders SQL + results table to the user