Architecture
askLenny is three purpose-built containers communicating over a private Docker network. Each container has a single responsibility and a minimal attack surface.
System overview
Frontend Container
- Schema Discovery UI
- AI enrichment controls (per-column + batch)
- FK relationship management
- Natural language query bar
- SQL viewer + results table
- Served as static bundle
App Layer Container
- Database connectors (6 engines)
- INFORMATION_SCHEMA discovery
- Embedding generation
- LLM integration (configurable)
- SQL generation + execution
- Results returned to frontend
Graph Engine Container
- Binary graph database (LichenEngine)
- 64-byte cache-line records
- Memory-mapped file I/O
- Cosine similarity vector search
- BFS GraphRAG context extraction
- Never touches source databases
React dashboard - what the user sees
The frontend container serves a React 19 + Vite single-page application on port 5173. It is the only container you expose to your internal network. Users open it in a browser to manage schema discovery, run AI enrichment, and ask natural-language questions.
The query interface sends the user's question to the Python app layer and receives back both the generated SQL and the actual query results. Results are rendered as a sortable data table - no copy-paste, no manual execution required.
Frontend never touches databases directly
All database operations (discovery, queries, execution) go through the Python app layer API. The browser is isolated from your database network.
Python App Layer API consumed by frontend
| Method | Endpoint |
|---|---|
| GET | /sync/gapdiscovery |
| POST | /sync/enrich |
| POST | /sync/enrich_table |
| POST | /sync/commit |
| POST | /query/ask |
| POST | /query/execute |
Python orchestration - the integration hub
The Python container runs a FastAPI application that acts as the integration layer between the frontend, your source databases, the Rust graph engine, and the AI model. It is the only container that holds database credentials or makes outbound LLM calls.
When a query arrives, it embeds the question, calls the graph engine for context, sends the assembled prompt to the LLM, receives SQL, executes it against the source database, and returns both the SQL and the result rows to the frontend.
pyodbc + ODBC Driver 18asyncpg / psycopg2PyMySQLsnowflake-connector-pythoncx_Oracle / python-oracledbsqlite3 (stdlib)Graph Engine API consumed by Python layer
| Method | Endpoint |
|---|---|
| POST | /nodes |
| PATCH | /nodes/{id} |
| POST | /edges |
| GET | /nodes/find |
| GET | /schema/nodes |
| GET | /schema/edges |
| POST | /query/vector_search |
| GET | /health |
Configurable AI endpoint
Three fields in connectors.yaml control the AI integration:
model_to_useModel name passed to the SDK - change without touching codebase_urlOptional. Set to any OpenAI-compatible endpoint (Ollama, Azure OpenAI, vLLM). Omit for default Gemini.AI_API_KEYAPI key - always from.env, never stored in a config file
Rust graph database - the semantic core
LichenEngine is a custom binary graph database written in Rust, built specifically for AI-assisted schema navigation. It stores nodes (databases, tables, columns) and edges (containment, foreign keys) as fixed 64-byte records aligned to CPU cache lines.
All four data files are memory-mapped. The OS page cache serves as the buffer pool - warm reads hit L3 cache at near-RAM speed. Cold reads cost one disk seek. The engine never issues network calls and never touches your source databases.
Vector search is a brute-force cosine similarity scan over 1536-dimensional float32 embeddings stored in a parallel file. After finding the top matches, BFS traversal through the pointer-linked adjacency lists assembles the full schema context in a single HTTP response.
Persistent volume - four binary files
nodes.datLabel, description ptr, category, SQL type, is_pk, deleted flag
64 bytes/node
edges.datSource/target IDs, join type, cardinality, linked-list pointers
64 bytes/edge
strings.datAppend-only text heap - labels and AI-generated descriptions
variable
vectors.dat1536 × float32 semantic embeddings per node
6 144 bytes/node
| Method | Endpoint |
|---|---|
| POST | /nodes |
| PATCH | /nodes/{id} |
| POST | /edges |
| GET | /nodes/find |
| GET | /schema/nodes |
| GET | /schema/edges |
| POST | /query/vector_search |
| GET | /health |
End-to-end data flows
Schema ingestion
- 1User opens Schema Discovery in the frontend
- 2Frontend calls Python app → app queries INFORMATION_SCHEMA (tables, columns, FK constraints)
- 3App diffs results against graph engine to assign New / Synchronized / Modified status badges
- 4User clicks ✨ AI per column, or ✦ Enrich All for the whole table in a single LLM call
- 5User confirms or overrides detected FK relationships on any column card
- 6User clicks Commit → app writes nodes + embeddings + FK edges to Rust engine
- 7Engine writes 64-byte records + 1536-float vectors to persistent volume
Natural language query
- 1User types question in the frontend query bar
- 2Frontend sends question to Python app layer
- 3App embeds question → sends 1536-dim vector to Rust engine
- 4Engine returns top-5 schema matches + full GraphRAG context markdown
- 5App assembles prompt (context + question + dialect) → calls LLM
- 6LLM returns SQL - app executes it against source database
- 7App returns { sql, rows, row_count } to frontend
- 8Frontend renders SQL + results table to the user