Pipeline

How It Works

askLenny builds a semantic understanding of your databases once, then uses it to answer every question. Here is what happens at each stage.

01 Connect your databases

Configure your database connections

askLenny connects to your databases using a connectors.yaml file. Database passwords are not stored in the file; each connector names an environment variable that holds the password. The actual value is passed at Docker runtime and exists only in the container process environment.

Supported engines: SQL Server · PostgreSQL · MySQL · Snowflake · Oracle · SQLite

askLenny reads only schema structure

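During schema discovery, connections query INFORMATION_SCHEMA for table and column names only. No SELECT * is issued and no row data is read at this stage. Rows are touched only later, when a generated query is executed on your behalf (step 05).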
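A minimal sketch of what a metadata-only query could look like, assuming an engine that exposes the standard INFORMATION_SCHEMA.COLUMNS view. The helper name build_schema_query and the `?` placeholder style are assumptions made for illustration, not askLenny's actual code.

```python
def build_schema_query(schemas=None):
    """Return (sql, params) that list columns from metadata views only."""
    sql = (
        "SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE "
        "FROM INFORMATION_SCHEMA.COLUMNS"
    )
    params = list(schemas or [])
    if params:
        # One placeholder per schema; the '?' style is an assumption and
        # depends on the database driver in use.
        placeholders = ", ".join("?" for _ in params)
        sql += f" WHERE TABLE_SCHEMA IN ({placeholders})"
    sql += " ORDER BY TABLE_SCHEMA, TABLE_NAME, ORDINAL_POSITION"
    return sql, params


sql, params = build_schema_query(["Sales", "HumanResources", "Production"])
print(sql)
print(params)
```

The query touches only the metadata view, so no user table is ever read while building the schema picture.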

connectors.yaml

connectors:
  - id: "production_warehouse"
    display_name: "Production Warehouse"
    engine: "mssql"
    server: "10.0.1.55"
    port: 1433
    username: "asklenny_reader"
    password_env_var: "PROD_DB_PASSWORD"  # ← reads from env, never stored
    database: "AdventureWorks2019"
    schemas:                              # optional - omit to include all schemas
      - "Sales"
      - "HumanResources"
      - "Production"

ai_integration:
  model_to_use: "gemini-2.0-flash-lite"
  # base_url: "http://localhost:11434/v1"  # optional - Ollama, Azure OpenAI, vLLM, etc.
  # API key is never stored here - set AI_API_KEY in your .env file
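How the password_env_var indirection might be resolved at runtime can be sketched as follows. The function name resolve_password and the stand-in environment are hypothetical; the connector is shown as an already-parsed dictionary so the example needs no YAML library.

```python
import os


def resolve_password(connector, env=os.environ):
    """Look up the password named by password_env_var in the environment."""
    var_name = connector["password_env_var"]
    try:
        return env[var_name]
    except KeyError:
        # Fail loudly rather than connecting with an empty password.
        raise RuntimeError(
            f"Environment variable {var_name!r} is not set"
        ) from None


# One connector entry as it might look after parsing connectors.yaml.
connector = {
    "id": "production_warehouse",
    "password_env_var": "PROD_DB_PASSWORD",
}

# Stand-in environment for the demo; in a container this would be os.environ.
fake_env = {"PROD_DB_PASSWORD": "s3cret"}
password = resolve_password(connector, fake_env)
```

Because the file carries only the variable name, connectors.yaml can be committed to version control without exposing a secret.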

02 Schema discovery

Gap analysis - what is new and what has changed

When the Schema Discovery view loads, askLenny reads the current structure of every configured database and compares it against what is already stored in the graph engine. Each table and column receives a status badge.

You only need to act on New and Modified items. Synchronized objects are already in the graph and require no action.

FK relationships detected automatically

askLenny reads foreign key constraints from INFORMATION_SCHEMA and surfaces them as suggested links on each column card. You can confirm, override, or clear any relationship - confirmed links are written as FK edges in the graph and persist across restarts. Relationships defined on either the child or parent column are visible on both sides.
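One way to make a relationship visible from both the child and the parent column is to index each link under both endpoints. The sketch below assumes foreign keys arrive as (child, parent) pairs; the function name and the example column paths are illustrative only.

```python
from collections import defaultdict


def index_foreign_keys(fk_pairs):
    """Map every column to the FK links it takes part in, from either side."""
    links = defaultdict(list)
    for child, parent in fk_pairs:
        links[child].append({"role": "child", "other": parent})
        links[parent].append({"role": "parent", "other": child})
    return dict(links)


# Illustrative pair; real pairs would come from the constraint metadata.
fk_pairs = [
    ("Sales.SalesOrderHeader.CustomerID", "Sales.Customer.CustomerID"),
]
links = index_foreign_keys(fk_pairs)
```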

New: Exists in the source database, not yet in the graph

Synchronized: Present in both; descriptions match

Modified: Description has been updated and not yet committed

Commit Pending: Has a description and is ready to be written to the graph
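The four badges can be read as the result of comparing three things: the source schema, the descriptions already in the graph, and any uncommitted description drafts. The sketch below is one plausible interpretation of those rules, not askLenny's actual logic; in particular, how Modified and Commit Pending are told apart is an assumption.

```python
def classify(column, graph_descriptions, draft_descriptions):
    """Assign a status badge to one column found in the source database."""
    draft = draft_descriptions.get(column)
    if column not in graph_descriptions:
        # Not in the graph yet: a draft description makes it ready to commit.
        return "Commit Pending" if draft else "New"
    if draft is not None and draft != graph_descriptions[column]:
        # In the graph, but the description was edited since the last commit.
        return "Modified"
    return "Synchronized"


graph = {"orders.id": "Unique order identifier."}
drafts = {
    "orders.id": "Primary key of the order.",
    "orders.net_rev_usd": "Net revenue in US dollars after returns",
}

statuses = {
    name: classify(name, graph, drafts)
    for name in ["orders.id", "orders.net_rev_usd", "orders.created_at"]
}
```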

03 AI enrichment

AI-generated descriptions and semantic embeddings

Clicking the ✨ AI button on a column calls the configured AI model with a structured prompt. The returned description is a single, plain-English sentence. You can accept it, edit it, or write your own.

To enrich an entire table at once, click ✦ Enrich All at the top of the column list. This sends all unenriched columns to the AI in a single request, filling every description in one step. Existing descriptions are never overwritten.

Descriptions are important: they become part of the semantic embedding that makes vector search accurate. A column named net_rev_usd with description "Net revenue in US dollars after returns" will match queries about "revenue" and "sales total" - even though neither phrase appears in the column name.

What is sent to the AI model

✓ Database name, table name, column name
✓ SQL data type
✗ No row data - ever

prompt template + response

"You are a database architect. Write a single, concise sentence
explaining the purpose of a column named '{column_name}'
(type: {data_type}) which lives inside the '{table_name}' table
of the '{db_name}' database. Return ONLY the sentence."

→  "Stores the net revenue in US dollars after discounts and
    returns have been applied to the order."
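Filling the template above takes only schema metadata. A minimal sketch, in which the function name build_prompt and the sample table and type values are assumptions:

```python
PROMPT_TEMPLATE = (
    "You are a database architect. Write a single, concise sentence "
    "explaining the purpose of a column named '{column_name}' "
    "(type: {data_type}) which lives inside the '{table_name}' table "
    "of the '{db_name}' database. Return ONLY the sentence."
)


def build_prompt(db_name, table_name, column_name, data_type):
    """Fill the template with names and types only; no row values are used."""
    return PROMPT_TEMPLATE.format(
        db_name=db_name,
        table_name=table_name,
        column_name=column_name,
        data_type=data_type,
    )


prompt = build_prompt("AdventureWorks2019", "orders", "net_rev_usd", "decimal")
print(prompt)
```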

04 Commit to graph

Write the enriched schema to the graph

Clicking Commit to Graph writes everything you have approved into the LichenEngine binary graph database. For each node:

1. The label, description, SQL data type, engine dialect, and primary key flag are written as a 64-byte binary record.
2. The description is converted to a 1536-dimensional float32 embedding vector and stored in the parallel vector file.
3. Containment edges (Database→Table, Table→Column) and ForeignKey edges are created as separate 64-byte records with encoded cardinality and join type.

This step only needs to run when your schema changes. Once committed, the graph persists across container restarts.
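To make the fixed-size records concrete, here is a sketch that packs a node into exactly 64 bytes and an embedding into 1536 float32 values. The field layout (three 8-byte integers, small type codes, padding) is entirely hypothetical; the actual LichenEngine record format is not described here.

```python
import struct

# Hypothetical layout: node id, label offset, description offset (u64 each),
# data type code (u16), dialect code (u8), primary key flag (u8), 36 pad bytes.
NODE_FORMAT = "<QQQHBB36x"
NODE_SIZE = struct.calcsize(NODE_FORMAT)  # 8 + 8 + 8 + 2 + 1 + 1 + 36 = 64

EMBEDDING_DIM = 1536
VECTOR_FORMAT = f"<{EMBEDDING_DIM}f"  # little-endian float32 values


def pack_node(node_id, label_offset, desc_offset, type_code, dialect_code, is_pk):
    """Serialize one node into a fixed 64-byte record."""
    return struct.pack(
        NODE_FORMAT, node_id, label_offset, desc_offset,
        type_code, dialect_code, int(is_pk),
    )


def pack_vector(values):
    """Serialize one embedding as 1536 float32 values (6144 bytes)."""
    if len(values) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} values, got {len(values)}")
    return struct.pack(VECTOR_FORMAT, *values)


record = pack_node(7, 0, 12, 3, 1, True)
vector_bytes = pack_vector([0.0] * EMBEDDING_DIM)
```

Keeping text in a separate heap and storing only offsets in the record is one common way to keep every record the same size.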

Written to persistent volume

nodes.dat: Binary node records - 64 bytes each

edges.dat: Binary edge records - 64 bytes each

strings.dat: Label and description text heap

vectors.dat: 1536 float32 values per node
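Fixed record sizes mean any node or vector can be located by arithmetic alone. The sketch below assumes records are stored back to back with no file header and that node i in nodes.dat corresponds to vector i in vectors.dat; both are assumptions, and an in-memory stream stands in for the real file.

```python
import io

NODE_SIZE = 64            # bytes per record in nodes.dat and edges.dat
VECTOR_SIZE = 1536 * 4    # 1536 float32 values = 6144 bytes per node


def read_record(stream, index, size):
    """Seek to record `index` and read exactly `size` bytes."""
    stream.seek(index * size)
    data = stream.read(size)
    if len(data) != size:
        raise IndexError(f"record {index} is out of range")
    return data


# Three fake 64-byte records filled with 0x00, 0x01, and 0x02.
nodes = io.BytesIO(b"\x00" * 64 + b"\x01" * 64 + b"\x02" * 64)
second = read_record(nodes, 1, NODE_SIZE)
```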

05 Ask a question

Question in - SQL and real results out

A user types a natural-language question. The Python app embeds it, the graph engine finds the relevant schema context, the LLM generates SQL, and the Python app executes it against your source database - returning both the SQL and the actual result rows in a single response.

Row data travels from your database to the user's browser through the Python app container. It is never stored, logged, or forwarded to any third party.
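Returning the SQL and its result rows together can be sketched with an in-memory SQLite database standing in for the source database. The function name run_generated_sql, the response shape, and the toy rows are assumptions made for this example.

```python
import sqlite3


def run_generated_sql(conn, sql):
    """Execute the generated SQL and return it alongside the result rows."""
    cursor = conn.execute(sql)
    columns = [col[0] for col in cursor.description]
    rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
    # Rows are returned to the caller and not written anywhere else.
    return {"sql": sql, "rows": rows}


# Toy stand-in for a source database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (segment TEXT, net_rev_usd REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("Enterprise", 100.0), ("Enterprise", 50.0), ("SMB", 25.0)],
)

response = run_generated_sql(
    conn,
    "SELECT segment, SUM(net_rev_usd) AS revenue "
    "FROM orders GROUP BY segment ORDER BY segment",
)
```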

User question: "Show total revenue by customer segment for Q3"

↓

Embedding: Python app embeds query → 1536 float32 values

↓

Vector search: Graph engine scores all nodes by cosine similarity → top-5

↓

Graph traversal: BFS from each match → tables, columns, types, dialect

↓

LLM prompt: Context + question + dialect rules → SQL generation

↓

SQL generated: SELECT segment, SUM(net_rev_usd) FROM orders GROUP BY…

↓

SQL executed: Python app runs query against your source database

↓

Results shown: [{segment: "Enterprise", revenue: 8420000}, …]
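The vector search and graph traversal stages can be illustrated in plain Python. This is a toy sketch: it uses 3-dimensional vectors instead of 1536, a hand-written adjacency map, and a depth limit that is an assumption. The real engine is described as a Rust component, so this shows the technique rather than its implementation.

```python
import math
from collections import deque


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, node_vectors, k=5):
    """Score every node against the query and keep the k best matches."""
    ranked = sorted(
        node_vectors, key=lambda n: cosine(query, node_vectors[n]), reverse=True
    )
    return ranked[:k]


def bfs(start, edges, max_depth=2):
    """Breadth-first walk that collects schema context around a match."""
    seen, order = {start}, [start]
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue
        for neighbor in edges.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                order.append(neighbor)
                queue.append((neighbor, depth + 1))
    return order


node_vectors = {
    "orders.net_rev_usd": [0.9, 0.1, 0.0],
    "customers.segment": [0.7, 0.7, 0.0],
    "employees.hire_date": [0.0, 0.1, 0.9],
}
edges = {
    "orders.net_rev_usd": ["orders"],
    "orders": ["orders.net_rev_usd", "orders.customer_id"],
    "orders.customer_id": ["orders"],
}

matches = top_k([1.0, 0.2, 0.0], node_vectors, k=2)
context = bfs(matches[0], edges)
```

The collected context (tables, columns, and their neighbors) is what would then be placed in the LLM prompt together with the question and the dialect rules.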

New in this architecture

Previous versions returned SQL only - you had to run it yourself. askLenny now executes the query for you and displays the data directly in the dashboard. The Python app container handles execution; the Rust engine and frontend never touch row data.
