Four-Index Architecture

The Standard Approach: Three Indexes

A common design for RDF triplestores is to maintain three permutation indexes over the triples. Each index sorts the same data in a different key order, enabling efficient lookups regardless of which positions are bound in a query pattern:

Index	Key Order	Answers
SPO	Subject → Predicate → Object	"What does Alice know?" (star-shaped queries, prefix scan on subject)
POS	Predicate → Object → Subject	"Who is a Person?" (type lookups, reverse resolution)
OSP	Object → Subject → Predicate	"What links to Tokyo?" (reverse traversal, incoming edges)

All three indexes store the same triples — they are redundant by design. The cost is 3× storage; the benefit is that any triple pattern can be resolved by a prefix scan on the appropriate index, never requiring a full table scan.
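The prefix-scan claim can be illustrated with a minimal sketch. It assumes triples are plain string tuples held in sorted Python lists; the names TRIPLES, choose_index, and scan are hypothetical and are not part of Loka's API. A real store would use B-trees or LSM trees over encoded term IDs rather than in-memory lists.

```python
from bisect import bisect_left

# Hypothetical toy data: (subject, predicate, object) strings.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("alice", "type", "Person"),
    ("bob", "livesIn", "tokyo"),
    ("bob", "type", "Person"),
]

# Each index holds the same triples, reordered by its key order and sorted.
# The numbers are positions in (s, p, o): 0 = subject, 1 = predicate, 2 = object.
ORDERS = {"SPO": (0, 1, 2), "POS": (1, 2, 0), "OSP": (2, 0, 1)}
INDEXES = {
    name: sorted(tuple(t[i] for i in order) for t in TRIPLES)
    for name, order in ORDERS.items()
}


def bound_prefix(order, pattern):
    """Return the leading bound values of `pattern` in this index's key order."""
    prefix = []
    for i in order:
        if pattern[i] is None:
            break
        prefix.append(pattern[i])
    return tuple(prefix)


def choose_index(s=None, p=None, o=None):
    """Pick the index whose key order starts with the most bound positions."""
    pattern = (s, p, o)
    return max(ORDERS, key=lambda name: len(bound_prefix(ORDERS[name], pattern)))


def scan(s=None, p=None, o=None):
    """Resolve a triple pattern with a prefix scan on the chosen index."""
    name = choose_index(s, p, o)
    order = ORDERS[name]
    prefix = bound_prefix(order, (s, p, o))
    rows = INDEXES[name]
    matches = []
    # Binary search to the first row at or after the prefix, then read forward.
    for row in rows[bisect_left(rows, prefix):]:
        if row[: len(prefix)] != prefix:
            break
        triple = [None, None, None]
        for pos, i in enumerate(order):
            triple[i] = row[pos]  # map the key order back to (s, p, o)
        matches.append(tuple(triple))
    return name, matches


print(scan(o="tokyo"))             # incoming edges, served by OSP
print(scan(p="type", o="Person"))  # type lookup, served by POS
```

With these three key orders, every combination of bound positions forms a prefix of at least one index, which is why no pattern falls back to scanning all triples.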

The Problem: Vectors Don't Fit

When you add vector search to a triplestore, the traditional approach is to bolt on a separate vector database. This creates two systems with different APIs, different storage, and a JSON handoff between them. The query planner has no visibility into the vector index — it can't reason about whether to use the HNSW index or the SPO index for a given pattern.

Loka's Approach: HNSW as a Fourth Index

Loka treats the HNSW index as a fourth index type that sits alongside SPO/POS/OSP. The query planner sees all four indexes and chooses the best access path for each pattern:

Index	Key Order	Data Structure	When Used
SPO	Subject → Predicate → Object	B-tree / LSM	Star-shaped queries, entity lookups
POS	Predicate → Object → Subject	B-tree / LSM	Type lookups, reverse resolution
OSP	Object → Subject → Predicate	B-tree / LSM	Reverse traversal, incoming edges
VECTOR(p)	Vector object TermId (one index per vector predicate)	HNSW graph	Approximate nearest neighbor search

How It Works

When a vector predicate (e.g., :hasEmbedding) is declared, Loka creates an HNSW index for that predicate. Each vector literal is stored as the object of a regular triple in SPO/POS/OSP, and the HNSW index maintains a parallel proximity graph keyed by that object's TermId.
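A rough sketch of this layout follows, under loud assumptions: the Dictionary, VectorIndex, and Store classes are hypothetical, TermIds are modeled as small integers, and the vector index is an exact brute-force scan standing in for HNSW (which is approximate and graph-based). Only the relationship between the triple and the parallel index is meant to carry over.

```python
import math


class Dictionary:
    """Interns terms to integer TermIds (hypothetical encoding)."""

    def __init__(self):
        self._ids = {}
        self._terms = []

    def intern(self, term):
        if term not in self._ids:
            self._ids[term] = len(self._terms)
            self._terms.append(term)
        return self._ids[term]

    def term(self, term_id):
        return self._terms[term_id]


class VectorIndex:
    """Stand-in for one per-predicate HNSW index: exact brute-force search."""

    def __init__(self):
        self._vectors = {}  # TermId of the vector literal -> vector

    def add(self, term_id, vector):
        self._vectors[term_id] = vector

    def search(self, query, k):
        ranked = sorted(self._vectors.items(), key=lambda kv: math.dist(query, kv[1]))
        return [term_id for term_id, _ in ranked[:k]]


class Store:
    def __init__(self):
        self.dictionary = Dictionary()
        self.spo = set()          # POS and OSP would hold the same triples
        self.vector_indexes = {}  # predicate TermId -> VectorIndex

    def declare_vector_predicate(self, predicate):
        self.vector_indexes[self.dictionary.intern(predicate)] = VectorIndex()

    def add(self, s, p, o):
        s_id, p_id, o_id = (self.dictionary.intern(t) for t in (s, p, o))
        self.spo.add((s_id, p_id, o_id))  # the vector is an ordinary triple object
        if p_id in self.vector_indexes:
            # Parallel entry, keyed by the vector object's TermId.
            self.vector_indexes[p_id].add(o_id, o)

    def subjects_of(self, p_id, o_id):
        return [s for (s, p, o) in self.spo if p == p_id and o == o_id]


store = Store()
store.declare_vector_predicate(":hasEmbedding")
store.add(":alice", ":hasEmbedding", (0.0, 1.0))
store.add(":bob", ":hasEmbedding", (1.0, 0.0))
store.add(":alice", ":knows", ":bob")

p_id = store.dictionary.intern(":hasEmbedding")
hits = store.vector_indexes[p_id].search((0.1, 0.9), k=1)
nearest = [store.dictionary.term(s) for h in hits for s in store.subjects_of(p_id, h)]
print(nearest)
```

Because the search returns TermIds, the follow-up lookup of the owning subject uses the same identifiers as any other triple pattern.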

The query planner's decision is straightforward:

Subject bound before VECTOR_SIMILAR: Execute the graph pattern first (SPO/POS/OSP) to produce candidate subjects, then run the vector similarity step over only those candidates.
Subject unbound: Execute the HNSW search first (returns top-k candidates), then evaluate graph patterns over those candidates.

This is the same cost-based reasoning the planner already does for SPO vs. POS vs. OSP — it just has a fourth option now.
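The two execution orders can be sketched as follows. This is an illustration under stated assumptions, not Loka's planner: EMBEDDINGS, TYPES, vector_top_k, graph_match, and execute are hypothetical names, the subject_bound flag is passed in by hand rather than derived from a query, and an exact distance sort stands in for the HNSW search.

```python
import math

# Hypothetical toy data: one embedding and one type per subject.
EMBEDDINGS = {"alice": (0.0, 1.0), "bob": (1.0, 0.0), "carol": (0.1, 0.9)}
TYPES = {"alice": "Person", "bob": "Person", "carol": "City"}


def vector_top_k(query, k, candidates=None):
    """Rank subjects by distance to `query`; optionally only over `candidates`."""
    pool = EMBEDDINGS if candidates is None else {s: EMBEDDINGS[s] for s in candidates}
    return sorted(pool, key=lambda s: math.dist(query, pool[s]))[:k]


def graph_match(rdf_type, candidates=None):
    """Stand-in for a graph pattern: keep subjects with the given type."""
    pool = TYPES if candidates is None else candidates
    return [s for s in pool if TYPES[s] == rdf_type]


def execute(query, k, rdf_type, subject_bound):
    if subject_bound:
        # Graph pattern first, then vector similarity over its candidates.
        candidates = graph_match(rdf_type)
        return "graph-first", vector_top_k(query, k, candidates)
    # Vector search first (top-k), then graph pattern over those hits.
    hits = vector_top_k(query, k)
    return "vector-first", graph_match(rdf_type, hits)


print(execute((0.0, 1.0), 2, "Person", subject_bound=True))
print(execute((0.0, 1.0), 2, "Person", subject_bound=False))
```

In this toy data the two orders return different sets: vector-first keeps only the top-k neighbors before the type filter runs, so a non-matching neighbor can crowd out a matching one. How a real planner accounts for that is outside what this sketch shows.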

Why This Matters

No separate vector database. One system, one storage format, one query language.
Unified query planning. The planner can interleave vector and graph operations based on cost, not API boundaries.
No JSON handoff. Results from the HNSW index are TermIds, the same type used by SPO/POS/OSP. No serialization/deserialization between systems.