What SIMD Does

SIMD (Single Instruction, Multiple Data) applies the same operation to multiple values simultaneously. On modern x86 CPUs:

- SSE: 128-bit registers, 4 f32 or 2 u64 lanes per instruction
- AVX2: 256-bit registers, 8 f32 or 4 u64 lanes per instruction
- FMA: fused multiply-add, computing a * b + c across all lanes in one instruction

The key requirement: data must be contiguous in memory. SIMD cannot help with pointer-chasing through B-trees or linked lists — the CPU stalls waiting on cache misses, not arithmetic.
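
A minimal sketch of the idea in Rust using the standard std::arch intrinsics (illustration only, not Loka code): one AVX2 instruction adds eight contiguous f32 values at once, which works precisely because a single load can pull all eight lanes from adjacent memory.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Illustration only: one AVX2 add processes 8 f32 lanes per instruction.
/// Safe to call only when the CPU supports AVX2.
#[target_feature(enable = "avx2")]
unsafe fn add8(a: &[f32; 8], b: &[f32; 8]) -> [f32; 8] {
    let va = _mm256_loadu_ps(a.as_ptr()); // one load pulls 8 contiguous lanes
    let vb = _mm256_loadu_ps(b.as_ptr());
    let sum = _mm256_add_ps(va, vb);      // one instruction, 8 additions
    let mut out = [0.0f32; 8];
    _mm256_storeu_ps(out.as_mut_ptr(), sum);
    out
}
```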

Where Loka Uses SIMD

1. Vector Distance Functions (HNSW hot path)

During HNSW search, every candidate node requires a distance computation against the query vector. For a 1536-dimensional embedding, that is 1536 multiply-add operations per comparison. This is the single hottest loop in vector search.

Loka uses explicit SIMD intrinsics for three distance functions:

| Function | AVX2 | SSE | Use |
|---|---|---|---|
| Dot product | 8 f32/cycle (FMA) | 4 f32/cycle | Cosine similarity (pre-normalized vectors) |
| Squared Euclidean | 8 f32/cycle (FMA) | 4 f32/cycle | Euclidean distance (avoids sqrt) |
| L2 norm | 8 f32/cycle (FMA) | 4 f32/cycle | Vector normalization at insert time |
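
To make the AVX2 + FMA row concrete, here is a hedged sketch of a fused-multiply-add dot product over std::arch intrinsics. The function name and tail handling are ours, not Loka's actual implementation:

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Sketch of an AVX2 + FMA dot product; assumes a.len() == b.len().
#[target_feature(enable = "avx2", enable = "fma")]
unsafe fn dot_avx2(a: &[f32], b: &[f32]) -> f32 {
    let mut acc = _mm256_setzero_ps();
    let chunks = a.len() / 8;
    for i in 0..chunks {
        // Unaligned loads of 8 f32 lanes each.
        let va = _mm256_loadu_ps(a.as_ptr().add(i * 8));
        let vb = _mm256_loadu_ps(b.as_ptr().add(i * 8));
        // acc += va * vb: one fused multiply-add covers 8 lanes.
        acc = _mm256_fmadd_ps(va, vb, acc);
    }
    // Horizontal sum of the 8 accumulator lanes.
    let mut lanes = [0.0f32; 8];
    _mm256_storeu_ps(lanes.as_mut_ptr(), acc);
    let mut sum: f32 = lanes.iter().sum();
    // Scalar tail for the remaining < 8 elements.
    for i in chunks * 8..a.len() {
        sum += a[i] * b[i];
    }
    sum
}
```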

Runtime feature detection dispatches each call to the best available path, with a scalar fallback on non-x86 architectures.
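
A sketch of what that dispatch can look like using the standard is_x86_feature_detected! macro, reusing the dot_avx2 sketch above (the structure is an assumption; Loka's actual dispatch may differ):

```rust
/// Illustrative dispatcher, not Loka's real entry point.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        // Checked inline here for clarity; a real implementation would
        // typically detect once and cache the chosen path.
        if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
            return unsafe { dot_avx2(a, b) };
        }
        // An SSE path would slot in here with the same shape, 4 lanes wide.
    }
    // Scalar fallback: the only path on non-x86 architectures.
    a.iter().zip(b.iter()).map(|(x, y)| x * y).sum()
}
```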

2. Pseudo-Table Column Scanning (columnar hot path)

When a SPARQL query hits a pseudo-table, the executor scans columnar arrays of u64 TermIds. Loka stores these as packed dense arrays (null values use a sentinel) and scans them with SIMD:

| Operation | AVX2 | SSE2 | Use |
|---|---|---|---|
| Equality scan | 4 u64/cycle | 2 u64/cycle | ?x :city :Tokyo |
| Not-null scan | 4 u64/cycle | 2 u64/cycle | ?x :email ?e (has property?) |
| Range scan | Cache-friendly scalar | Cache-friendly scalar | FILTER(?age > 25) |
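
To make the equality-scan row concrete, a hedged sketch of a packed-u64 scan with AVX2 (the names and output format are assumptions, not Loka's API):

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Illustrative AVX2 equality scan over a packed u64 column.
/// Pushes matching row indices into `out`.
#[target_feature(enable = "avx2")]
unsafe fn eq_scan_avx2(col: &[u64], needle: u64, out: &mut Vec<usize>) {
    let target = _mm256_set1_epi64x(needle as i64);
    let chunks = col.len() / 4;
    for i in 0..chunks {
        // Compare 4 u64 lanes at once; equal lanes become all-ones.
        let v = _mm256_loadu_si256(col.as_ptr().add(i * 4) as *const __m256i);
        let eq = _mm256_cmpeq_epi64(v, target);
        // Collapse the 4 lane results into a 4-bit mask.
        let mask = _mm256_movemask_pd(_mm256_castsi256_pd(eq));
        for lane in 0..4 {
            if mask & (1 << lane) != 0 {
                out.push(i * 4 + lane);
            }
        }
    }
    // Scalar tail for the remaining < 4 elements.
    for i in chunks * 4..col.len() {
        if col[i] == needle {
            out.push(i);
        }
    }
}
```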

The packed layout (a dense u64 array with a sentinel for nulls, instead of Option<u64>) halves storage: Option<u64> occupies 16 bytes per element because u64 has no spare bit pattern for the discriminant, while the sentinel encoding keeps each element at 8 bytes and keeps the data contiguous for SIMD loads.
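
A small sketch of that trade-off under an assumed sentinel (NULL_ID is hypothetical, not Loka's actual constant):

```rust
/// Hypothetical sentinel reserved for "no value"; the real choice may differ.
const NULL_ID: u64 = u64::MAX;

struct Column {
    // Dense array: 8 bytes/element vs 16 for Option<u64>,
    // and no gaps, so SIMD loads see contiguous data.
    ids: Vec<u64>,
}

impl Column {
    fn is_null(&self, row: usize) -> bool {
        self.ids[row] == NULL_ID
    }
}
```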

Where SIMD Does NOT Help

Triple index scans (SPO/POS/OSP)

Regular triple pattern lookups traverse a B-tree or LSM tree. The bottleneck is pointer-chasing through tree nodes, not arithmetic comparison. The CPU is waiting on cache misses, not running out of comparison bandwidth. SIMD cannot help here — the data is not contiguous.

This is why pseudo-tables exist: they transform the pointer-chasing problem into a columnar scan problem, which is SIMD-friendly.

Cosine via Normalized Dot Product

Computing cosine similarity directly requires two norms and a dot product per comparison (three passes over the data): cos(a, b) = (a · b) / (||a|| ||b||). Loka follows Qdrant's pattern: normalize vectors at insert time, so ||a|| = ||b|| = 1 and cosine reduces to a plain dot product at search time. This is mathematically equivalent but requires only one pass instead of three.
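
A minimal sketch of the insert-time half, assuming a normalize step like the following (not Loka's actual function):

```rust
/// Sketch: normalize once at insert time so that search-time cosine
/// similarity collapses to a single dot-product pass.
fn normalize_in_place(v: &mut [f32]) {
    // One pass to compute the L2 norm (the "L2 norm" kernel in the table).
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        // Second pass to scale; both passes happen once, at insert.
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
    // Afterward, cos(a, b) = â · b̂: one pass per comparison at search time.
}
```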