IRI Interning & Content-Addressed RDF-star

The Problem with Strings

An IRI like http://www.wikidata.org/entity/Q11064932 is 40 bytes. A triple holds up to three of them, and every index entry stores all three. With three indexes (SPO/POS/OSP), that is 40 × 3 × 3 = 360 bytes per triple just for the keys, before any data structure overhead.

String comparison is also expensive: comparing two 40-byte strings can take up to 40 byte comparisons, and IRIs from the same namespace share a long prefix, so the comparison rarely exits early. B-tree traversal pays this cost at every node.

IRI Interning

Loka maps every IRI, blank node, and literal to a u64 TermId (an 8-byte integer) at write time. For terms that go through the dictionary, a bidirectional lookup table maps strings to IDs and back:

Forward map: "http://example.org/Tokyo" → 42
Reverse map: 42 → "http://example.org/Tokyo"
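
The two maps can be sketched as a small Rust type. This is only an illustration of the idea, not Loka's actual dictionary: the names `Interner`, `intern`, and `resolve` are hypothetical, and handing out IDs sequentially from 0 is an assumption made to keep the reverse map a plain vector.

```rust
use std::collections::HashMap;

/// Term identifier. The text only states that IDs are u64.
pub type TermId = u64;

/// Minimal bidirectional dictionary (hypothetical, for illustration).
#[derive(Default)]
pub struct Interner {
    forward: HashMap<String, TermId>, // string -> ID
    reverse: Vec<String>,             // ID -> string (the ID is the index)
}

impl Interner {
    /// Returns the existing ID for `term`, or assigns the next free one.
    pub fn intern(&mut self, term: &str) -> TermId {
        if let Some(&id) = self.forward.get(term) {
            return id;
        }
        // Assumption: IDs are handed out sequentially, starting at 0.
        let id = self.reverse.len() as TermId;
        self.forward.insert(term.to_owned(), id);
        self.reverse.push(term.to_owned());
        id
    }

    /// Looks up the original string for an ID, if one was ever assigned.
    pub fn resolve(&self, id: TermId) -> Option<&str> {
        self.reverse.get(id as usize).map(String::as_str)
    }
}
```

Interning the same string twice returns the same ID, and `resolve` recovers the string when query results are rendered. A production dictionary would likely avoid storing each string twice, which this sketch does for simplicity.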

Now a triple index entry is a fixed 8 × 3 = 24 bytes, and comparing two terms is a single 64-bit integer comparison, typically one CPU instruction.
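
To make the fixed-size key concrete, here is a rough sketch of how one triple could be turned into keys for the three index orderings. The `Key` and `Triple` types and the `objects_for` helper are hypothetical, and a `BTreeSet` stands in for whatever on-disk B-tree is actually used.

```rust
use std::collections::BTreeSet;
use std::mem::size_of;

/// One index entry: three term IDs, compared field by field in order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Key(pub u64, pub u64, pub u64);

/// A triple as (subject, predicate, object) IDs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Triple {
    pub s: u64,
    pub p: u64,
    pub o: u64,
}

impl Triple {
    /// Key for the subject-predicate-object index.
    pub fn spo(self) -> Key { Key(self.s, self.p, self.o) }
    /// Key for the predicate-object-subject index.
    pub fn pos(self) -> Key { Key(self.p, self.o, self.s) }
    /// Key for the object-subject-predicate index.
    pub fn osp(self) -> Key { Key(self.o, self.s, self.p) }
}

/// Size of one key in bytes: three u64 fields with no padding.
pub const KEY_BYTES: usize = size_of::<Key>();

/// All objects for a given subject and predicate, via a range scan on SPO keys.
pub fn objects_for(spo_index: &BTreeSet<Key>, s: u64, p: u64) -> Vec<u64> {
    spo_index
        .range(Key(s, p, 0)..=Key(s, p, u64::MAX))
        .map(|k| k.2)
        .collect()
}
```

Because the derived ordering compares the three integers in sequence, a prefix lookup such as "all objects for this subject and predicate" becomes a contiguous range scan over the sorted keys.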

Inline Literals

Small values are encoded directly into the TermId without a dictionary lookup:

Integers that fit in 56 bits are packed inline (high bits tag the type)
Booleans are packed inline (true = 1, false = 0, with type tag)

This means a triple like :person :age 42 never touches the dictionary for the literal 42 — it is encoded directly in the index key.
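
One possible bit layout for inline values is sketched below. The text only says that the high bits carry a type tag and that integers get 56 bits; the 8-bit tag width, the specific tag values, and the choice to treat integers as signed are all assumptions made for this example.

```rust
/// Low 56 bits hold the payload; the top 8 bits hold a type tag (assumed layout).
const PAYLOAD_BITS: u32 = 56;
const PAYLOAD_MASK: u64 = (1u64 << PAYLOAD_BITS) - 1;

// Hypothetical tag values. Tag 0 is assumed to mean "dictionary ID".
const TAG_INT: u64 = 0x01;
const TAG_BOOL: u64 = 0x02;

/// A value decoded straight from a TermId, with no dictionary lookup.
#[derive(Debug, PartialEq, Eq)]
pub enum Inline {
    Int(i64),
    Bool(bool),
}

/// Packs a signed integer inline, or returns None if it needs more than 56 bits.
pub fn encode_int(v: i64) -> Option<u64> {
    let min = -(1i64 << (PAYLOAD_BITS - 1));
    let max = (1i64 << (PAYLOAD_BITS - 1)) - 1;
    if v < min || v > max {
        return None; // too large: would fall back to the dictionary
    }
    Some((TAG_INT << PAYLOAD_BITS) | (v as u64 & PAYLOAD_MASK))
}

/// Packs a boolean inline: true = 1, false = 0, plus the type tag.
pub fn encode_bool(b: bool) -> u64 {
    (TAG_BOOL << PAYLOAD_BITS) | b as u64
}

/// Decodes an inline value, or returns None for IDs with any other tag.
pub fn decode(id: u64) -> Option<Inline> {
    let payload = id & PAYLOAD_MASK;
    match id >> PAYLOAD_BITS {
        TAG_INT => {
            // Sign-extend the 56-bit payload back to a full i64.
            let shift = 64 - PAYLOAD_BITS;
            Some(Inline::Int(((payload << shift) as i64) >> shift))
        }
        TAG_BOOL => Some(Inline::Bool(payload == 1)),
        _ => None,
    }
}
```

Under this layout, `encode_int(42)` yields a TermId whose top byte is the integer tag and whose low bits are 42, so the value can be read back from the index key alone.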

Content-Addressed RDF-star IDs

RDF-star allows any triple to be used as a subject or object of another triple (a "quoted triple"). Loka assigns a deterministic u64 ID to each quoted triple by hashing its three components:

quoted_id = hash(subject_id, predicate_id, object_id)

This means:

The same quoted triple always gets the same ID (content-addressed)
No separate reification table needed — the quoted triple ID is just another TermId
Edge annotation (<< :paper :discusses :AI >> :confidence 0.91) needs no extra machinery: it's a regular triple whose subject happens to be a quoted triple ID
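
The hashing step might look like the sketch below. The text does not say which hash function Loka uses, so FNV-1a (64-bit) is swapped in here purely because it is short and deterministic across runs and platforms; the function name `quoted_id` mirrors the formula above but is otherwise hypothetical.

```rust
// Standard 64-bit FNV-1a parameters.
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Derives an ID for a quoted triple from its three component IDs.
/// Assumption: FNV-1a over the 24 little-endian bytes of (s, p, o).
pub fn quoted_id(s: u64, p: u64, o: u64) -> u64 {
    let mut h = FNV_OFFSET;
    for id in [s, p, o] {
        // Fixed byte order keeps the result identical on every platform.
        for byte in id.to_le_bytes() {
            h ^= byte as u64;
            h = h.wrapping_mul(FNV_PRIME);
        }
    }
    h
}
```

Since the ID depends only on the three component IDs, two writers that quote the same triple derive the same ID without coordinating. Any 64-bit hash can collide in principle; how collisions are handled is not covered here.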

Why This Matters

Roughly 5× smaller index entries than string-based keys for IRIs of the length shown above (24 bytes versus well over 100)
Single-instruction comparison instead of byte-by-byte string comparison
SIMD-friendly: a 256-bit AVX2 register holds four u64 values, so packed TermId arrays can be compared 4 at a time with one instruction
Zero-overhead RDF-star: quoted triples are just another integer ID in the index
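
As a rough illustration of the SIMD point, the sketch below scans a packed array of TermIds for a match, four at a time, using the AVX2 intrinsics in Rust's `std::arch`. It is not taken from Loka: the function names are hypothetical, and a scalar fallback covers CPUs or architectures without AVX2.

```rust
/// Scalar fallback: one comparison per element.
fn find_scalar(ids: &[u64], needle: u64) -> Option<usize> {
    ids.iter().position(|&id| id == needle)
}

/// AVX2 path: compares four u64 lanes per instruction.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn find_avx2(ids: &[u64], needle: u64) -> Option<usize> {
    use std::arch::x86_64::*;
    unsafe {
        // Broadcast the needle into all four 64-bit lanes.
        let target = _mm256_set1_epi64x(needle as i64);
        let mut i = 0;
        while i + 4 <= ids.len() {
            // Unaligned load of four consecutive IDs.
            let chunk = _mm256_loadu_si256(ids.as_ptr().add(i) as *const __m256i);
            let eq = _mm256_cmpeq_epi64(chunk, target);
            // One mask bit per lane; bit 0 is the lowest address.
            let mask = _mm256_movemask_pd(_mm256_castsi256_pd(eq));
            if mask != 0 {
                return Some(i + mask.trailing_zeros() as usize);
            }
            i += 4;
        }
        // Fewer than four elements remain: finish with scalar code.
        ids[i..].iter().position(|&id| id == needle).map(|p| i + p)
    }
}

/// Finds the first position of `needle`, using AVX2 when the CPU supports it.
pub fn find_term(ids: &[u64], needle: u64) -> Option<usize> {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was checked at runtime just above.
            return unsafe { find_avx2(ids, needle) };
        }
    }
    find_scalar(ids, needle)
}
```

This only works because every term is the same width. With variable-length string keys there is no fixed lane size to load into a vector register.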
