LichenEngine · Rust

The Engine

LichenEngine is a purpose-built binary graph database written in Rust, optimised from first principles for 64-bit hardware. Every design decision is made for one specific workload: storing SQL schema as a graph and retrieving it by semantic similarity.

A purpose-built engine, not a general-purpose graph database

LichenEngine does not support a query language, arbitrary property graphs, or generic traversal APIs. Every design decision — the fixed record size, the pointer-based edges, the parallel vector array, the GraphRAG context builder — is made specifically for one workload: store SQL schema structure as a graph, attach vector embeddings to every node, and retrieve relevant subgraphs by semantic similarity. Specialisation is the source of the performance.

Hardware Optimisation 1 of 4

64-byte records, aligned to CPU cache lines

Modern CPUs — every x86-64 and ARM64 processor — fetch memory in units called cache lines. A cache line is exactly 64 bytes. When the CPU reads any byte from RAM, the entire 64-byte block containing that byte is loaded into the L1 cache.

By making every record exactly 64 bytes, reading any field of a NodeRecord loads the entire record in a single cache-line fetch. Subsequent reads of other fields in the same record cost zero additional memory operations — the data is already in L1 cache.

The padding: [u8; 5] field at the end of every struct is not wasted space. It is load-bearing: it fills the record to exactly 64 bytes so no record can straddle two cache lines.

64B
every record
1
cache-line fetch per record
0
wasted bandwidth

Traditional variable-width records

43B
71B
← 64B cache line
28B
36B
overflow →
← 64B cache line

Records straddle cache lines — each read may fetch 2 cache lines

LichenEngine — fixed 64-byte records

NodeRecord64 bytes = exactly 1 cache line
EdgeRecord64 bytes = exactly 1 cache line
DbHeader64 bytes = exactly 1 cache line

1 memory fetch = 1 complete record. Zero wasted bandwidth.

NodeRecord binary layout — #[repr(C)]

NodeRecord — #[repr(C)]64 bytes total
8
id
u64
node identifier
8
first_out_edge_ptr
u64
head of outgoing edge list
8
first_in_edge_ptr
u64
head of incoming edge list
8
label_ptr
u64
offset into string heap
8
desc_ptr
u64
offset into string heap
4
label_len
u32
4
desc_len
u32
4
source_id
u32
owning database
2
ordinal
u16
1
category
u8
db / table / column
1
data_type
u8
INT / VARCHAR / etc.
1
is_pk
u8
primary key flag
1
deleted
u8
soft delete marker
1
engine_type
u8
SQL dialect
5
padding
[u8;5]
fills to exactly 64 bytes
Hardware Optimisation 2 of 4

Memory-mapped files — no copies, no buffer pool

All four data files are mapped directly into the process's virtual address space at startup using mmap (via the memmap2 crate). Reading a node record is a pointer dereference — not a read() system call, not a copy from kernel to userspace, not a buffer pool lookup.

The OS page cache serves as the buffer pool automatically. Pages accessed recently remain in RAM. Pages not recently accessed are transparently evicted and reloaded on demand. askLenny writes zero buffer management code.

Read latency by cache level
L1/L2 cache hit (warm)~4–10 cycles
L3 cache hit~40 cycles
RAM (OS page fault, page in cache)~100–200 cycles
Disk (cold page, SSD)~100,000 cycles
Memory access path

Process virtual address space

engine dereferences pointer

↕ transparent page fault

OS page cache (RAM)

warm hit: ~10 cycles · cold miss: load from disk

↕ transparent page fault

Disk (.dat files)

nodes.dat · edges.dat · strings.dat · vectors.dat

Hardware Optimisation 3 of 4

Pointer-based adjacency lists — O(degree) traversal

Each NodeRecord stores the memory address of its first outgoing and incoming edge directly in the struct. Each EdgeRecord stores the address of the next edge in the chain. This forms a singly-linked list of edges per direction, rooted at the node itself.

When a new edge is inserted, it is prepended to the list in O(1) — the new edge's next_out_ptr is set to the old head, and the node's first_out_edge_ptr is updated to the new edge index.

Traversing all outgoing edges from a node costs one pointer dereference per edge. There is no index lookup, no B-tree traversal, no query planner. For a node with 5 outgoing edges, finding all 5 costs 5 memory reads. The cost scales with the degree of the node — not with the total number of edges in the graph.

LichenEngine (linked list)

O(degree)

~10 cycles per hop

Relational DB (B-tree index)

O(log e)

~5–10 reads per node

Traversing outgoing edges from Node A

NodeRecord[A]first_out_edge_ptr = 7
1 pointer dereference (~10 cycles)
EdgeRecord[7]source=A, target=Bnext_out_ptr = 3
1 pointer dereference (~10 cycles)
EdgeRecord[3]source=A, target=Cnext_out_ptr = NULL
↳ traversal complete
Total reads: 3B-tree equivalent: 8–12 reads
Hardware Optimisation 4 of 4

Zero-copy serialisation with zerocopy

Reading a 64-byte struct from the mmap'd buffer without copying it requires the struct's in-memory layout to match its on-disk byte representation exactly. LichenEngine uses the zerocopy crate, which verifies at compile time that a struct has a fixed, stable byte layout (#[repr(C)], explicit padding, no uninitialised bytes) and then provides safe casts between the struct and its raw byte representation.

Reading a NodeRecord from the mmap'd buffer produces a reference directly into the mapped memory — no heap allocation, no byte-by-byte parsing, no copy. Writing is a single copy_from_slice of 64 bytes — exactly the bytes that will be stored on disk.

Reading a node

Cast &[u8; 64] slice → &NodeRecord reference. Zero bytes copied. Zero allocations.

Writing a node

Single copy_from_slice of 64 bytes into the mmap'd buffer. No serialisation step.

Semantic Search

Vector embeddings and cosine similarity

Every node carries a 1536-dimensional float32 embedding stored in vectors.dat at a fixed stride: node_id × 6144 bytes (1536 floats × 4 bytes each). The vector for any node is located with a single multiplication — no secondary index required.

Cosine similarity measures the angle between two vectors, independent of their magnitudes. A score of 1.0 means semantically identical; 0.0 means unrelated. This is what enables matching net_rev_usd with a query for "revenue" — the embedding captures meaning that literal string search cannot.

cosine similarity formula

similarity(q, n) = (q · n) / (|q| × |n|)

What makes this work for schema retrieval

net_rev_usd+"Net revenue in US dollars after discounts and returns"

Matches query: "show me revenue by region"

cust_id+"Unique identifier for each customer account"

Matches query: "list all customers"

ord_dt+"Date and time the order was placed"

Matches query: "orders in Q3 2024"

Context Assembly

GraphRAG — context from the graph, not just the node

Vector similarity finds relevant nodes but not their context. A column node that matches "revenue" is useless to an LLM without knowing which table it belongs to, which database that table is in, and what other columns exist in the same table.

After vector search, LichenEngine performs a breadth-first traversal outward from each matched node, collecting related nodes up to a configurable depth. The entire subgraph — database, tables, columns, types, dialect, primary key flags — is serialised into a structured markdown block and returned in the same HTTP response as the matched node IDs.

The Python layer receives both search results and assembled context in a single round trip. No N+1 problem. No separate context-fetch pass.

GraphRAG context output example
--- Anchor Context for [net_rev_usd] ---
[Database] (ID: 0, Depth: 0)
Label: production_db
Engine Dialect: Microsoft SQL Server (T-SQL)

[Table] (ID: 5, Depth: 1)
Label: orders
Description: One row per customer order

[Column] (ID: 12, Depth: 2) [PRIMARY KEY]
Label: order_id
Data Type: INT/INTEGER

[Column] (ID: 13, Depth: 2)
Label: net_rev_usd
Data Type: DECIMAL/FLOAT
Description: Net revenue after discounts
Current Limitations

What is not built yet

Brute-force vector search

Linear scan over all nodes (O(n × d)). Fast for schemas up to ~100,000 nodes. Approximate nearest-neighbour (HNSW/IVF) is on the roadmap.

Node limit

Current hardcoded limit: 100,000 nodes. The config file specifies 20M — wiring is in progress.

No ACID transactions

Crash-safe header ordering is implemented. Full write-ahead log and MVCC are not yet built.

Fixed embedding dimension

Currently hardcoded to 1536 floats. Making this a runtime config parameter is planned.