Pipeline

How It Works

askLenny builds a semantic understanding of your databases once, then uses it to answer every question. Here is what happens at each stage.

01 Connect your databases

Configure your database connections

askLenny connects to your databases using a connectors.yaml file. Database passwords are not stored in the file. Each connector references only the name of an environment variable. The actual value is passed to the Docker container at runtime and exists only in the container's process environment.

Supported engines: SQL Server · PostgreSQL · MySQL · Snowflake · Oracle · SQLite

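During setup, askLenny reads only schema structure

During schema discovery and enrichment, connections query schema metadata (INFORMATION_SCHEMA on engines that provide it) for table names, column names, and data types. No SELECT * is issued and no row data is read at this stage. Row data is accessed only when a user's generated query is executed in step 05.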
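The metadata-only read can be sketched with Python's standard sqlite3 module. This is a minimal illustration, not askLenny's connector code. The names INFO_SCHEMA_COLUMNS_SQL, SQLITE_COLUMNS_SQL, and list_columns_sqlite are hypothetical. On engines with INFORMATION_SCHEMA, the query reads names and types from information_schema.columns. SQLite has no INFORMATION_SCHEMA, so the sketch reads its catalog (sqlite_master joined with pragma_table_info) instead. Neither query selects from a user table.

```python
import sqlite3

# Standard catalog query for engines that expose INFORMATION_SCHEMA.
# It reads names and types only and never selects from a user table.
INFO_SCHEMA_COLUMNS_SQL = """
SELECT table_schema, table_name, column_name, data_type
FROM information_schema.columns
ORDER BY table_schema, table_name, ordinal_position
"""

# SQLite has no INFORMATION_SCHEMA. Its catalog is sqlite_master plus the
# pragma_table_info() table-valued function.
SQLITE_COLUMNS_SQL = """
SELECT m.name, p.name, p.type, p.pk
FROM sqlite_master AS m
JOIN pragma_table_info(m.name) AS p
WHERE m.type = 'table' AND m.name NOT LIKE 'sqlite_%'
ORDER BY m.name, p.cid
"""


def list_columns_sqlite(conn: sqlite3.Connection) -> list[tuple[str, str, str, bool]]:
    """Return (table, column, declared_type, is_primary_key) from catalog metadata only."""
    return [
        (table, column, col_type, bool(pk))
        for table, column, col_type, pk in conn.execute(SQLITE_COLUMNS_SQL)
    ]
```

Even if the orders table already holds rows, list_columns_sqlite returns only structure: table and column names, declared types, and primary-key flags.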

connectors.yaml
connectors:
  - id: "production_warehouse"
    display_name: "Production Warehouse"
    engine: "mssql"
    server: "10.0.1.55"
    port: 1433
    username: "asklenny_reader"
    password_env_var: "PROD_DB_PASSWORD"  # ← reads from env, never stored
    database: "Analytics"
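The password indirection can be sketched as a lookup at runtime. This minimal Python example assumes the YAML has already been parsed into a dict, which is what a YAML loader would return. resolve_password is a hypothetical helper, not askLenny's implementation. The config carries only the variable name. The secret comes from the process environment, for example from `docker run -e PROD_DB_PASSWORD=...`.

```python
import os
from collections.abc import Mapping

# Parsed form of the connectors.yaml entry above (a YAML loader would produce this).
connector = {
    "id": "production_warehouse",
    "display_name": "Production Warehouse",
    "engine": "mssql",
    "server": "10.0.1.55",
    "port": 1433,
    "username": "asklenny_reader",
    "password_env_var": "PROD_DB_PASSWORD",
    "database": "Analytics",
}


def resolve_password(conn_cfg: Mapping[str, object], env: Mapping[str, str] = os.environ) -> str:
    """Hypothetical helper: look up the password named by password_env_var at runtime.

    Only the variable *name* lives in the config. The value is read from the
    container's process environment and is never written back to disk.
    """
    var_name = str(conn_cfg["password_env_var"])
    try:
        return env[var_name]
    except KeyError:
        raise RuntimeError(
            f"connector {conn_cfg['id']!r}: environment variable {var_name!r} is not set"
        ) from None
```

If the variable is missing, the helper fails fast with an error that names the variable but never prints a value.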

02 Schema discovery

Gap analysis — what is new and what has changed

When the Schema Discovery view loads, askLenny reads the current structure of every configured database and compares it against what is already stored in the graph engine. Each table and column receives a status badge.

You only need to act on New and Modified items. Synchronized objects are already in the graph and require no action.

  • New: exists in the source database but not yet in the graph
  • Synchronized: present in both, and the descriptions match
  • Modified: the description has been updated but not yet committed
  • Commit Pending: has a description and is ready to be written to the graph
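The badge assignment can be sketched as a comparison of three inputs. These are the objects found in the source, the descriptions already committed to the graph, and the descriptions written or accepted in the UI but not yet committed. This is a minimal sketch under stated assumptions, not askLenny's logic. In particular, it assumes that Commit Pending applies to objects not yet in the graph that already have a description. The gap_analysis function and its input shapes are hypothetical.

```python
from enum import Enum


class Status(Enum):
    NEW = "New"
    SYNCHRONIZED = "Synchronized"
    MODIFIED = "Modified"
    COMMIT_PENDING = "Commit Pending"


def gap_analysis(
    source_objects: set[str],
    graph_descriptions: dict[str, str],
    drafts: dict[str, str],
) -> dict[str, Status]:
    """Hypothetical: assign a badge to every table or column found in the source.

    source_objects:      qualified names read from the source, e.g. "Analytics.orders.net_rev_usd"
    graph_descriptions:  descriptions already committed to the graph
    drafts:              descriptions written or accepted in the UI, not yet committed
    """
    statuses: dict[str, Status] = {}
    for obj in sorted(source_objects):
        draft = drafts.get(obj)
        if obj not in graph_descriptions:
            # Not in the graph yet. Assumed: it is pending once it has a description.
            statuses[obj] = Status.COMMIT_PENDING if draft else Status.NEW
        elif draft is not None and draft != graph_descriptions[obj]:
            # In the graph, but the description was edited and not committed.
            statuses[obj] = Status.MODIFIED
        else:
            statuses[obj] = Status.SYNCHRONIZED
    return statuses
```

Only the objects that come back as NEW or MODIFIED need attention, which matches the guidance above.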

03 AI enrichment

AI-generated descriptions and semantic embeddings

Clicking the ✨ AI button on a table or column calls the configured AI model with a structured prompt. The returned description is a single, plain-English sentence. You can accept it, edit it, or write your own.

The description matters because it becomes part of the semantic embedding used for vector search. A column named net_rev_usd with the description "Net revenue in US dollars after returns" can match queries about "revenue" and "sales total", even though neither phrase appears in the column name.

What is sent to the AI model

  • ✓ Database name, table name, column name
  • ✓ SQL data type
  • No row data is ever sent
prompt template + response
"You are a database architect. Write a single, concise sentence
explaining the purpose of a column named '{column_name}'
(type: {data_type}) which lives inside the '{table_name}' table
of the '{db_name}' database. Return ONLY the sentence."

→  "Stores the net revenue in US dollars after discounts and
    returns have been applied to the order."
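The template's {column_name}-style placeholders match Python's str.format syntax, so filling it can be sketched directly. This is a minimal illustration, not askLenny's code. build_column_prompt and clean_description are hypothetical helpers. The builder accepts only schema metadata, so row values cannot reach the prompt. The cleaner collapses whitespace and strips wrapping quotes from the model's reply.

```python
PROMPT_TEMPLATE = (
    "You are a database architect. Write a single, concise sentence\n"
    "explaining the purpose of a column named '{column_name}'\n"
    "(type: {data_type}) which lives inside the '{table_name}' table\n"
    "of the '{db_name}' database. Return ONLY the sentence."
)


def build_column_prompt(db_name: str, table_name: str, column_name: str, data_type: str) -> str:
    """Hypothetical: fill the template with schema metadata only (no row values)."""
    return PROMPT_TEMPLATE.format(
        db_name=db_name,
        table_name=table_name,
        column_name=column_name,
        data_type=data_type,
    )


def clean_description(raw: str) -> str:
    """Hypothetical: normalize a model reply into one plain sentence."""
    text = " ".join(raw.split())  # collapse newlines and repeated spaces
    return text.strip("\"'")      # drop wrapping quote characters
```

For example, build_column_prompt("Analytics", "orders", "net_rev_usd", "decimal(18,2)") produces a prompt that names only the database, table, column, and type.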

04 Commit to graph

Write the enriched schema to the graph

Clicking Commit to Graph writes everything you have approved into the LichenEngine binary graph database. For each node:

  1. The label, description, SQL data type, engine dialect, and primary key flag are written as a 64-byte binary record. The label and description text itself is stored in the strings.dat heap.
  2. The description is converted to a 1536-dimensional float32 embedding vector (1536 × 4 = 6,144 bytes) and stored in the parallel vector file.
  3. Containment edges (Database→Table, Table→Column) and ForeignKey edges are created as separate 64-byte records with encoded cardinality and join type.
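Fixed 64-byte records that point into a separate text heap can be sketched with Python's struct module. The field order, widths, and code tables below are hypothetical, chosen only to illustrate the idea. They are not LichenEngine's actual layout. Each text field is stored as an (offset, length) pair into the heap. Dialect, flags, edge kind, cardinality, and join type are stored as small integer codes. Padding fills each record to exactly 64 bytes.

```python
import struct

# HYPOTHETICAL layouts, for illustration only (not LichenEngine's real format).
# Node: node_id, label (offset, len), description (offset, len),
#       data type (offset, len) as u32, then dialect code and flags as u8.
NODE_RECORD = struct.Struct("<7I2B34x")  # 28 + 2 + 34 padding = 64 bytes
# Edge: source id and target id as u32, then kind, cardinality, join type as u8.
EDGE_RECORD = struct.Struct("<2I3B53x")  # 8 + 3 + 53 padding = 64 bytes

DIALECTS = {"mssql": 1, "postgresql": 2, "mysql": 3, "snowflake": 4, "oracle": 5, "sqlite": 6}
EDGE_KINDS = {"contains": 1, "foreign_key": 2}
FLAG_PRIMARY_KEY = 0x01


class StringHeap:
    """Append-only UTF-8 text heap, standing in for strings.dat."""

    def __init__(self) -> None:
        self.data = bytearray()

    def add(self, text: str) -> tuple[int, int]:
        """Append text and return its (offset, length) in bytes."""
        encoded = text.encode("utf-8")
        offset = len(self.data)
        self.data += encoded
        return offset, len(encoded)


def pack_node(node_id: int, label: str, description: str, data_type: str,
              dialect: str, is_pk: bool, heap: StringHeap) -> bytes:
    """Write text to the heap, then pack the fixed-size node record."""
    label_off, label_len = heap.add(label)
    desc_off, desc_len = heap.add(description)
    type_off, type_len = heap.add(data_type)
    flags = FLAG_PRIMARY_KEY if is_pk else 0
    return NODE_RECORD.pack(node_id, label_off, label_len, desc_off, desc_len,
                            type_off, type_len, DIALECTS[dialect], flags)


def pack_edge(src_id: int, dst_id: int, kind: str, cardinality: int, join_type: int) -> bytes:
    """Pack a 64-byte edge record. Cardinality and join_type are small integer codes."""
    return EDGE_RECORD.pack(src_id, dst_id, EDGE_KINDS[kind], cardinality, join_type)
```

Because every record has the same size, record i always starts at byte i × 64. A reader can therefore seek straight to it without an index.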

This step needs to run again only when your schema or descriptions change. Once committed, the graph persists across container restarts.

Written to the persistent volume

  • nodes.dat: binary node records, 64 bytes each
  • edges.dat: binary edge records, 64 bytes each
  • strings.dat: heap of label and description text
  • vectors.dat: 1536 float32 values per node
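All of the binary files except strings.dat use fixed-size entries, so a node's record or vector can be found with a single seek. This minimal Python sketch assumes that vectors are stored little-endian, back to back, in node order. That assumption is not documented here. The function names are hypothetical. One vector is 1536 × 4 = 6,144 bytes, so node i's vector starts at byte i × 6,144.

```python
import io
import struct
from collections.abc import Sequence
from typing import BinaryIO

RECORD_SIZE = 64       # one record in nodes.dat or edges.dat
EMBEDDING_DIM = 1536   # float32 values per node in vectors.dat
# Assumption: little-endian float32, vectors stored back to back in node order.
VECTOR = struct.Struct(f"<{EMBEDDING_DIM}f")  # 1536 * 4 = 6,144 bytes


def read_record(f: BinaryIO, index: int) -> bytes:
    """Seek straight to record `index` in nodes.dat or edges.dat."""
    f.seek(index * RECORD_SIZE)
    raw = f.read(RECORD_SIZE)
    if len(raw) != RECORD_SIZE:
        raise IndexError(f"no record at index {index}")
    return raw


def append_vector(f: BinaryIO, values: Sequence[float]) -> int:
    """Append one embedding and return the node index it belongs to."""
    end = f.seek(0, io.SEEK_END)  # seek() returns the new absolute position
    f.write(VECTOR.pack(*values))
    return end // VECTOR.size


def read_vector(f: BinaryIO, index: int) -> tuple[float, ...]:
    """Read the embedding for node `index` with a single seek."""
    f.seek(index * VECTOR.size)
    raw = f.read(VECTOR.size)
    if len(raw) != VECTOR.size:
        raise IndexError(f"no vector for node {index}")
    return VECTOR.unpack(raw)
```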

05 Ask a question

Question in — SQL and real results out

A user types a natural-language question. The Python app embeds it, the graph engine finds the relevant schema context, the LLM generates SQL, and the Python app executes that SQL against your source database. The response contains both the SQL and the actual result rows.

Row data travels from your database to the user's browser only through the Python app container. It is never stored, logged, or forwarded to any third party. The LLM receives only the schema context and the question.

  1. User question: "Show total revenue by customer segment for Q3"
  2. Embedding: the Python app embeds the query → 1536 float32 values
  3. Vector search: the graph engine scores all nodes by cosine similarity → top-5
  4. Graph traversal: BFS from each match → tables, columns, types, dialect
  5. LLM prompt: context + question + dialect rules → SQL generation
  6. SQL generated: SELECT segment, SUM(net_rev_usd) FROM orders GROUP BY…
  7. SQL executed: the Python app runs the query against your source database
  8. Results shown: [{segment: "Enterprise", revenue: 8420000}, …]
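Steps 3 and 4 can be sketched in a few lines of Python: a brute-force cosine scan over every node's vector, followed by a breadth-first walk from each match. This is an illustration, not the Rust engine's code. The function names, the dict-of-vectors input, and the max_depth limit are hypothetical. The test uses tiny 3-dimensional vectors in place of the 1536-dimensional embeddings.

```python
import heapq
import math
from collections import deque
from collections.abc import Iterable, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity. Returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query: Sequence[float], vectors: dict[int, Sequence[float]], k: int = 5) -> list[int]:
    """Score every node against the query and return the k best node ids."""
    return heapq.nlargest(k, vectors, key=lambda node: cosine(query, vectors[node]))


def bfs_context(seeds: Iterable[int], adjacency: dict[int, list[int]], max_depth: int = 1) -> set[int]:
    """Collect every node reachable from the seed matches within max_depth hops.

    adjacency holds containment and foreign-key neighbours, so a matched
    column pulls in its table, and the table pulls in its other columns.
    """
    seen = set(seeds)
    queue = deque((node, 0) for node in seen)
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand past the hop limit
        for neighbour in adjacency.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    return seen
```

A linear scan like top_k touches every vector on every question. That matches the "scores all nodes" description, and it stays cheap at schema scale, where nodes number in the thousands rather than millions.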

New in this architecture

Previous versions returned only the SQL, which you then had to run yourself. askLenny now executes the query for you and displays the data directly in the dashboard. The Python app container handles execution. The Rust engine never touches row data, and the frontend only displays the rows that the Python app returns.
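The execution step can be sketched with standard PEP 249 (DB-API) cursor calls: execute, description, fetchmany, and close. sqlite3 stands in here for whichever driver your engine uses. This is a minimal sketch, not the Python app's code. run_generated_sql and its max_rows cap are hypothetical. The rows exist only in the returned dict, and nothing is written to disk or logged.

```python
import sqlite3
from typing import Any


def run_generated_sql(conn: sqlite3.Connection, sql: str, max_rows: int = 1000) -> dict[str, Any]:
    """Hypothetical: execute generated SQL and return it together with its result rows.

    Only DB-API cursor methods are used, so another DB-API driver could stand in
    for sqlite3. Rows are held in memory for the response and never persisted.
    """
    cur = conn.cursor()
    try:
        cur.execute(sql)
        if cur.description is None:
            raise ValueError("generated SQL did not return a result set")
        columns = [col[0] for col in cur.description]
        rows = [dict(zip(columns, row)) for row in cur.fetchmany(max_rows)]
    finally:
        cur.close()
    return {"sql": sql, "columns": columns, "rows": rows}
```

The returned dict has the same shape as the "Results shown" step above: one dict per row, keyed by column name, alongside the SQL that produced it.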