HNSW is not a separate system. It is the fourth index.
A typical RDF triplestore maintains three permutation indexes over its triples. Each index sorts the same data in a different key order, so lookups stay efficient regardless of which positions are bound in a query pattern:
| Index | Key Order | Answers |
|---|---|---|
| SPO | Subject → Predicate → Object | "What does Alice know?" (star-shaped queries, prefix scan on subject) |
| POS | Predicate → Object → Subject | "Who is a Person?" (type lookups, reverse resolution) |
| OSP | Object → Subject → Predicate | "What links to Tokyo?" (reverse traversal, incoming edges) |
All three indexes store the same triples — they are redundant by design. The cost is 3× storage; the benefit is that any triple pattern can be resolved by a prefix scan on the appropriate index, never requiring a full table scan.
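To make the prefix-scan claim concrete, here is a minimal sketch using sorted in-memory lists in place of B-trees. The data and the helper names `prefix_scan` and `choose_index` are hypothetical, chosen only for illustration; a real store would index integer term IDs rather than strings.

```python
from bisect import bisect_left

# Hypothetical toy data, not taken from any real dataset.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("alice", "type", "Person"),
    ("bob", "type", "Person"),
    ("bob", "livesIn", "tokyo"),
]

# Positions 0, 1, 2 are subject, predicate, object.
ORDERS = {"spo": (0, 1, 2), "pos": (1, 2, 0), "osp": (2, 0, 1)}

# Each index holds the same triples, permuted and sorted by its own key order.
INDEXES = {
    name: sorted(tuple(t[i] for i in order) for t in TRIPLES)
    for name, order in ORDERS.items()
}

def prefix_scan(index_name, prefix):
    """Return every key in the index that starts with `prefix` (a tuple)."""
    keys = INDEXES[index_name]
    start = bisect_left(keys, prefix)  # binary search to the first candidate
    matches = []
    for key in keys[start:]:
        if key[:len(prefix)] != prefix:
            break  # sorted order means no later key can match
        matches.append(key)
    return matches

def choose_index(s=None, p=None, o=None):
    """Pick the index whose key order has the longest fully bound prefix."""
    bound = {0: s, 1: p, 2: o}
    best = None
    for name, order in ORDERS.items():
        prefix = []
        for pos in order:
            if bound[pos] is None:
                break
            prefix.append(bound[pos])
        if best is None or len(prefix) > len(best[1]):
            best = (name, tuple(prefix))
    return best

# "Who is a Person?" binds predicate and object, so POS serves it.
name, prefix = choose_index(p="type", o="Person")
print(name, prefix_scan(name, prefix))
```

Every combination of bound positions yields a non-empty prefix in at least one of the three orders, which is why no pattern with a bound position falls back to a full scan.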
When you add vector search to a triplestore, the traditional approach is to bolt on a separate vector database. This creates two systems with different APIs, different storage, and a JSON handoff between them. The query planner has no visibility into the vector index — it can't reason about whether to use the HNSW index or the SPO index for a given pattern.
Loka treats the HNSW index as a fourth index type that sits alongside SPO/POS/OSP. The query planner sees all four indexes and chooses the best access path for each pattern:
| Index | Key Order | Data Structure | When Used |
|---|---|---|---|
| SPO | Subject → Predicate → Object | B-tree / LSM | Star-shaped queries, entity lookups |
| POS | Predicate → Object → Subject | B-tree / LSM | Type lookups, reverse resolution |
| OSP | Object → Subject → Predicate | B-tree / LSM | Reverse traversal, incoming edges |
| VECTOR(p) | n/a (one index per vector predicate p) | HNSW graph | Approximate nearest neighbor search |
When a vector predicate (e.g., :hasEmbedding) is declared, Loka creates an HNSW index for that predicate. The vector literal is stored as a regular triple in SPO/POS/OSP, and the HNSW index maintains a parallel proximity graph keyed by the vector object's TermId.
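The dual bookkeeping described above can be sketched as follows. `ToyStore` and its methods are hypothetical names, not Loka's API, and the proximity index here is an exact distance scan standing in for a real HNSW graph. The point is only that the vector is stored once as an ordinary triple and referenced a second time by its TermId.

```python
import math

class ToyStore:
    """Hypothetical in-memory stand-in; not Loka's actual API."""

    def __init__(self):
        self.terms = []           # TermId -> term
        self.term_ids = {}        # term -> TermId (dictionary encoding)
        self.triples = set()      # (s, p, o) TermIds; stands in for SPO/POS/OSP
        self.vector_indexes = {}  # predicate TermId -> {object TermId: vector}

    def term_id(self, term):
        if term not in self.term_ids:
            self.term_ids[term] = len(self.terms)
            self.terms.append(term)
        return self.term_ids[term]

    def declare_vector_predicate(self, predicate):
        # Declaring the predicate creates its (initially empty) vector index.
        self.vector_indexes[self.term_id(predicate)] = {}

    def add(self, s, p, o):
        s_id, p_id, o_id = self.term_id(s), self.term_id(p), self.term_id(o)
        self.triples.add((s_id, p_id, o_id))     # always stored as a regular triple
        if p_id in self.vector_indexes:
            self.vector_indexes[p_id][o_id] = o  # also entered in the proximity index

    def nearest_subjects(self, predicate, query, k):
        p_id = self.term_ids[predicate]
        vectors = self.vector_indexes[p_id]
        # Exact ranking by Euclidean distance; HNSW would approximate this.
        hits = sorted(vectors, key=lambda o_id: math.dist(vectors[o_id], query))[:k]
        subjects = []
        for o_id in hits:
            # Map each vector TermId back to its subjects
            # (a POS prefix scan in a real store).
            subjects += sorted(self.terms[s] for s, p, o in self.triples
                               if p == p_id and o == o_id)
        return subjects

store = ToyStore()
store.declare_vector_predicate(":hasEmbedding")
store.add(":alice", ":hasEmbedding", (0.0, 1.0))
store.add(":bob", ":hasEmbedding", (1.0, 0.0))
store.add(":alice", ":knows", ":bob")
print(store.nearest_subjects(":hasEmbedding", (0.9, 0.1), k=1))  # [':bob']
```

Because the vector index returns TermIds, its results join against the permutation indexes like any other binding.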
The query planner's decision is straightforward: it applies the same cost-based reasoning it already uses to choose among SPO, POS, and OSP, with a fourth option added.
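One way to picture that comparison is the sketch below. The `plan` function, the 100x selectivity per bound position, and the logarithmic HNSW estimate are all assumptions made for illustration; they are not Loka's actual cost model or statistics.

```python
import math

def plan(bound, similarity_on=None, k=10, n_triples=1_000_000, n_vectors=100_000):
    """Return the name of the cheapest index for one pattern.

    bound: set of bound positions, e.g. {"s", "p"}.
    similarity_on: a vector predicate, set when the pattern asks for the
        k nearest vectors under that predicate (hypothetical parameter).
    """
    costs = {}
    if similarity_on is None:
        for name in ("SPO", "POS", "OSP"):
            prefix = 0
            for pos in name.lower():
                if pos not in bound:
                    break
                prefix += 1
            # Assumed selectivity: each bound prefix position cuts the scan 100x.
            costs[name] = n_triples / 100 ** prefix
    else:
        # Exact alternative: read every vector under the predicate via POS.
        costs["POS"] = float(n_vectors)
        # Assumed rough estimate for an HNSW search: logarithmic in index size.
        costs[f"VECTOR({similarity_on})"] = k * math.log2(max(n_vectors, 2))
    return min(costs, key=costs.get)

print(plan({"s"}))                                        # SPO
print(plan({"p", "o"}))                                   # POS
print(plan(set(), similarity_on=":hasEmbedding"))         # VECTOR(:hasEmbedding)
print(plan(set(), similarity_on=":hasEmbedding", n_vectors=8))  # POS
```

The last call shows why this belongs in the planner: with only a handful of vectors, an exact scan can be cheaper than a graph search, and only a component that sees both options can make that call.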