The Problem with Strings

An IRI like http://www.wikidata.org/entity/Q11064932 is 40 bytes. A triple has three of them, and every index entry stores all three. With three indexes (SPO/POS/OSP), that is 40 × 3 × 3 = 360 bytes per triple for the keys alone, before any data structure overhead.

String comparison is also expensive: comparing two 40-byte strings takes up to 40 byte comparisons, and IRIs from the same namespace share a long prefix (31 bytes here), so most comparisons scan well past it before finding a difference. A B-tree lookup repeats this for every key it examines on the way down.

IRI Interning

Loka interns every IRI, blank node, and literal to a u64 (an 8-byte integer) at write time. A bidirectional lookup table maps strings to IDs and back.
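A minimal sketch of such a table, assuming a hash map for the string-to-ID direction and a vector for the reverse direction. The names Interner, intern, and resolve are illustrative and are not taken from Loka's code, and a real dictionary would also need persistence and concurrency control.

```rust
use std::collections::HashMap;

/// Illustrative alias: the text only says that terms become a u64.
type TermId = u64;

/// Bidirectional dictionary (hypothetical type, not Loka's API).
#[derive(Default)]
struct Interner {
    ids: HashMap<String, TermId>, // forward direction: string -> ID
    terms: Vec<String>,           // reverse direction: the ID is the index
}

impl Interner {
    /// Return the existing ID for `term`, or assign the next free one.
    fn intern(&mut self, term: &str) -> TermId {
        if let Some(&id) = self.ids.get(term) {
            return id;
        }
        let id = self.terms.len() as TermId;
        self.terms.push(term.to_owned());
        self.ids.insert(term.to_owned(), id);
        id
    }

    /// Map an ID back to its string, if that ID was ever assigned.
    fn resolve(&self, id: TermId) -> Option<&str> {
        self.terms.get(id as usize).map(String::as_str)
    }
}
```

Interning the same string twice returns the same ID, so the cost of the string is paid once in the dictionary rather than once per index entry.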

A triple index entry is now a fixed 8 × 3 = 24 bytes, and comparing two terms is a single 64-bit integer comparison, typically one CPU instruction. Comparing two full keys takes at most three of them.
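To make the fixed-width keys concrete, here is a sketch that models an index entry as an array of three u64 values and each index as an ordered set. Using BTreeSet and the helper names below is an assumption made for illustration; Loka's actual index structures are not described here.

```rust
use std::collections::BTreeSet;
use std::mem::size_of;

type TermId = u64;

/// One index entry: three interned IDs, 24 bytes, no pointers.
type Key = [TermId; 3];

// Reorder a (subject, predicate, object) triple for each of the three indexes.
fn spo(s: TermId, p: TermId, o: TermId) -> Key { [s, p, o] }
fn pos(s: TermId, p: TermId, o: TermId) -> Key { [p, o, s] }
fn osp(s: TermId, p: TermId, o: TermId) -> Key { [o, s, p] }

/// Bytes used by the keys of one triple across all three indexes.
fn key_bytes_per_triple() -> usize {
    3 * size_of::<Key>()
}

/// All subjects with predicate `p` and object `o`, found by a range scan
/// over a POS-ordered set. Arrays compare lexicographically, so every key
/// with the prefix (p, o) is contiguous.
fn subjects_with(pos_index: &BTreeSet<Key>, p: TermId, o: TermId) -> Vec<TermId> {
    pos_index
        .range([p, o, TermId::MIN]..=[p, o, TermId::MAX])
        .map(|key| key[2])
        .collect()
}
```

With three indexes, the keys for one triple take 3 × 24 = 72 bytes regardless of how long the original strings were.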

Inline Literals

Small values are encoded directly into the TermId, with no dictionary lookup.

A triple like :person :age 42 therefore never touches the dictionary for the literal 42; the value sits directly in the index key.
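The exact bit layout is not given here, so the following is only a sketch under an assumed layout: the top 4 bits of the u64 act as a type tag and the low 60 bits carry either a dictionary ID or a signed integer. The tag values and function names are hypothetical.

```rust
type TermId = u64;

// Assumed layout (not Loka's documented format): 4 tag bits, 60 payload bits.
const TAG_SHIFT: u32 = 60;
const PAYLOAD_MASK: u64 = (1 << TAG_SHIFT) - 1;
const TAG_DICT: u64 = 0; // payload is a dictionary ID
const TAG_INT: u64 = 1;  // payload is a 60-bit two's-complement integer

/// True if the ID carries its value inline instead of pointing at the dictionary.
fn is_inline(id: TermId) -> bool {
    (id >> TAG_SHIFT) != TAG_DICT
}

/// Encode an integer inline if it fits in 60 bits. On None, the caller
/// would fall back to interning the literal in the dictionary.
fn encode_int(value: i64) -> Option<TermId> {
    let min = -(1i64 << (TAG_SHIFT - 1));
    let max = (1i64 << (TAG_SHIFT - 1)) - 1;
    if value < min || value > max {
        return None;
    }
    Some((TAG_INT << TAG_SHIFT) | ((value as u64) & PAYLOAD_MASK))
}

/// Decode an inline integer. Returns None for IDs that are not inline integers.
fn decode_int(id: TermId) -> Option<i64> {
    if (id >> TAG_SHIFT) != TAG_INT {
        return None;
    }
    // Move the payload to the top bits, then shift back arithmetically
    // so the sign bit of the 60-bit value is extended.
    let unused_bits = 64 - TAG_SHIFT;
    Some((((id & PAYLOAD_MASK) << unused_bits) as i64) >> unused_bits)
}
```

Under this layout, encode_int(42) yields an ID that decode_int turns back into 42 without consulting any table. The trade-off is that dictionary IDs are limited to 60 bits.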

Content-Addressed RDF-star IDs

RDF-star allows any triple to be used as a subject or object of another triple (a "quoted triple"). Loka assigns a deterministic u64 ID to each quoted triple by hashing its three components:

quoted_id = hash(subject_id, predicate_id, object_id)
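The hash function is not named here. As an illustration only, the sketch below uses 64-bit FNV-1a over the little-endian bytes of the three component IDs; this choice is an assumption, not Loka's documented algorithm. A hand-written hash is used because a content-addressed ID must stay stable across processes and releases, which the standard library's default hasher does not promise.

```rust
type TermId = u64;

// Standard 64-bit FNV-1a parameters.
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Deterministic ID for a quoted triple, derived only from its components.
/// Sketch only: a real system would also have to handle hash collisions.
fn quoted_id(subject: TermId, predicate: TermId, object: TermId) -> TermId {
    let mut hash = FNV_OFFSET;
    for component in [subject, predicate, object] {
        // Feed each ID as 8 little-endian bytes so the result does not
        // depend on the platform's byte order.
        for byte in component.to_le_bytes() {
            hash ^= u64::from(byte);
            hash = hash.wrapping_mul(FNV_PRIME);
        }
    }
    hash
}
```

Calling quoted_id with the same three IDs always returns the same value, so a quoted triple's ID can be recomputed from its components.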

This means:

Why This Matters