The Problem

A SPARQL query with three triple patterns can be evaluated in six different orders. The wrong order might produce millions of intermediate results that the next pattern filters down to ten. The right order might produce ten results immediately.

Structural Heuristic Weights

Before looking at any data statistics, the planner assigns each pattern a structural weight based on which positions are bound or unbound:

This heuristic captures a key insight: run HNSW first when the subject is unbound (it returns few results), run it last when the subject is already bound (the graph pattern was more selective).

Cardinality Estimation

When data statistics are available (per-predicate cardinality from the store), the planner refines structural weights with actual selectivity estimates. The combined score is:

cost = structural_weight × cardinality_estimate

Patterns are evaluated in ascending cost order (cheapest first). This minimizes the number of rows flowing through downstream patterns.

Predicate Pushdown

FILTER expressions are repositioned to execute immediately after the pattern that binds their last required variable. A filter like FILTER(?age > 25) is pushed down to run right after the ?person :age ?age pattern, not at the end of the query.

This reduces the number of rows that flow through subsequent join patterns, sometimes by orders of magnitude.

Pattern Ordering Rules

Pattern TypePlacementReason
Bound subject patternFirstPoint lookup, O(1)
VECTOR_SIMILAR (unbound)EarlyReturns top-k, very selective
Regular pattern (unbound)MiddleOrdered by cardinality estimate
VECTOR_SIMILAR (bound)LateSubject already bound, acts as filter
OPTIONALLastCannot reduce result count, only add columns
UNIONAfter regular patternsBranches evaluated independently, results concatenated