The Standard Approach: Three Indexes

A typical RDF triplestore maintains three permutation indexes over its triples (some maintain more). Each index sorts the same data in a different key order, enabling efficient lookups regardless of which positions are bound in a query pattern:

| Index | Key Order | Answers |
| --- | --- | --- |
| SPO | Subject → Predicate → Object | "What does Alice know?" (star-shaped queries, prefix scan on subject) |
| POS | Predicate → Object → Subject | "Who is a Person?" (type lookups, reverse resolution) |
| OSP | Object → Subject → Predicate | "What links to Tokyo?" (reverse traversal, incoming edges) |

All three indexes store the same triples; they are redundant by design. The cost is roughly 3× storage. The benefit is that any triple pattern with at least one bound position can be resolved by a prefix scan on the appropriate index, never requiring a full table scan.
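
The index choice described above can be sketched in a few lines. This is a minimal illustration, assuming a pattern is given as optional subject/predicate/object values; the function name `choose_index` and the tie-breaking rule (first index wins) are hypothetical and not taken from any particular store.

```python
from typing import Optional

# Key order of each permutation index, expressed as triple positions.
INDEX_KEY_ORDER = {
    "SPO": ("s", "p", "o"),
    "POS": ("p", "o", "s"),
    "OSP": ("o", "s", "p"),
}


def choose_index(s: Optional[str], p: Optional[str], o: Optional[str]) -> tuple[str, int]:
    """Return (index name, prefix length) for the index whose key order
    begins with the longest run of bound positions in the pattern."""
    bound = {name for name, value in (("s", s), ("p", p), ("o", o)) if value is not None}
    best_name, best_len = "SPO", 0  # prefix length 0 means a full scan
    for name, order in INDEX_KEY_ORDER.items():
        prefix_len = 0
        for position in order:
            if position not in bound:
                break
            prefix_len += 1
        if prefix_len > best_len:
            best_name, best_len = name, prefix_len
    return best_name, best_len


print(choose_index("alice", "knows", None))     # subject and predicate bound
print(choose_index(None, "rdf:type", "Person"))  # predicate and object bound
print(choose_index("alice", None, "Tokyo"))      # subject and object bound
```

Every pattern with at least one bound position gets a prefix of length one or more; only the fully unbound pattern falls back to scanning a whole index.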

The Problem: Vectors Don't Fit

When you add vector search to a triplestore, the traditional approach is to bolt on a separate vector database. This creates two systems with different APIs, different storage, and a JSON handoff between them. The query planner has no visibility into the vector index — it can't reason about whether to use the HNSW index or the SPO index for a given pattern.

Loka's Approach: HNSW as a Fourth Index

Loka treats the HNSW index as a fourth index type that sits alongside SPO/POS/OSP. The query planner sees all four indexes and chooses the best access path for each pattern:

| Index | Key Order | Data Structure | When Used |
| --- | --- | --- | --- |
| SPO | Subject → Predicate → Object | B-tree / LSM | Star-shaped queries, entity lookups |
| POS | Predicate → Object → Subject | B-tree / LSM | Type lookups, reverse resolution |
| OSP | Object → Subject → Predicate | B-tree / LSM | Reverse traversal, incoming edges |
| VECTOR(p) | One per vector predicate | HNSW graph | Approximate nearest neighbor search |

How It Works

When a vector predicate (e.g., :hasEmbedding) is declared, Loka creates an HNSW index for that predicate. The vector literal is stored as a regular triple in SPO/POS/OSP, and the HNSW index maintains a parallel proximity graph keyed by the vector object's TermId.
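
To make the layout concrete, here is a small sketch of the idea, not Loka's implementation. The class names `TinyStore` and `VectorIndexSketch` are hypothetical, the permutation indexes are plain sets rather than B-trees, and the vector index uses exhaustive search as a stand-in for HNSW. What it shows is the dual write: the vector literal goes into SPO/POS/OSP like any other object, and the per-predicate vector index is keyed by that object's TermId.

```python
import math


class VectorIndexSketch:
    """Stand-in for an HNSW index: exact, exhaustive nearest-neighbor search."""

    def __init__(self):
        self.vectors = {}  # TermId -> vector

    def add(self, term_id, vector):
        self.vectors[term_id] = vector

    def nearest(self, query, k):
        ranked = sorted(self.vectors, key=lambda tid: math.dist(self.vectors[tid], query))
        return ranked[:k]


class TinyStore:
    def __init__(self, vector_predicates):
        self.term_ids = {}  # term -> TermId (dictionary encoding)
        self.spo, self.pos, self.osp = set(), set(), set()
        # One vector index per declared vector predicate.
        self.vector_indexes = {p: VectorIndexSketch() for p in vector_predicates}

    def term_id(self, term):
        return self.term_ids.setdefault(term, len(self.term_ids))

    def add(self, s, p, o):
        sid, pid, oid = self.term_id(s), self.term_id(p), self.term_id(o)
        # The triple is stored in all three permutation indexes...
        self.spo.add((sid, pid, oid))
        self.pos.add((pid, oid, sid))
        self.osp.add((oid, sid, pid))
        # ...and, for a vector predicate, also in the parallel vector index.
        if p in self.vector_indexes:
            self.vector_indexes[p].add(oid, o)

    def nearest_subjects(self, p, query, k):
        pid = self.term_ids[p]
        id_to_term = {tid: term for term, tid in self.term_ids.items()}
        subjects = []
        for oid in self.vector_indexes[p].nearest(query, k):
            # A real store would do a (pid, oid) prefix scan on POS here.
            subjects += [id_to_term[ss] for (pp, oo, ss) in self.pos if (pp, oo) == (pid, oid)]
        return subjects


store = TinyStore({":hasEmbedding"})
store.add(":tokyo", ":hasEmbedding", (1.0, 0.0))
store.add(":kyoto", ":hasEmbedding", (0.9, 0.1))
store.add(":paris", ":hasEmbedding", (0.0, 1.0))
store.add(":tokyo", ":locatedIn", ":japan")
print(store.nearest_subjects(":hasEmbedding", (1.0, 0.05), k=2))
```

Because the vector index returns TermIds, its results feed straight back into the permutation indexes without any translation step between two systems.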

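The query planner's decision is straightforward: it estimates the cost of each candidate access path, the HNSW index included, and picks the cheapest.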

This is the same cost-based reasoning the planner already does for SPO vs. POS vs. OSP — it just has a fourth option now.
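
A rough sketch of what such a cost comparison could look like follows. It is an illustration under stated assumptions, not Loka's planner: the function `plan_vector_pattern`, the `ef_search` parameter, and both cost formulas are hypothetical. It assumes the graph-first path costs one exact distance computation per candidate, and that an HNSW search costs roughly the search width times the logarithm of the index size.

```python
import math
from dataclasses import dataclass


@dataclass
class AccessPath:
    index: str
    estimated_cost: float


def plan_vector_pattern(graph_index: str, graph_candidates: int,
                        indexed_vectors: int, k: int,
                        ef_search: int = 64) -> AccessPath:
    """Pick between resolving the graph pattern first and searching the
    vector index first. Both cost formulas are illustrative assumptions."""
    # Graph-first: scan the permutation index, then score every candidate exactly.
    scan_cost = float(graph_candidates)
    # Vector-first: approximate search, assumed logarithmic in index size.
    hnsw_cost = max(k, ef_search) * math.log2(max(indexed_vectors, 2))
    if scan_cost <= hnsw_cost:
        return AccessPath(graph_index, scan_cost)
    return AccessPath("VECTOR(p)", hnsw_cost)


# A selective graph pattern (50 candidates) versus an unselective one (200,000).
selective = plan_vector_pattern("POS", 50, 1_000_000, k=10)
broad = plan_vector_pattern("POS", 200_000, 1_000_000, k=10)
print(selective.index, broad.index)
```

With these assumed costs, a highly selective graph pattern is cheaper to resolve first and score exactly, while a broad one is better served by going to the vector index directly.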

Why This Matters