Cost-Based Query Planning

The Problem

A SPARQL query with three triple patterns can be evaluated in six different orders. The wrong order might produce millions of intermediate results that the next pattern filters down to ten. The right order might produce ten results immediately.

Structural Heuristic Weights

Before looking at any data statistics, the planner assigns each pattern a structural weight based on which positions are bound or unbound:

All positions bound: weight 0 (point lookup, cheapest possible)
Each unbound position adds weight proportional to index fanout
VECTOR_SIMILAR with unbound subject: weight 1 (HNSW is highly selective — it returns top-k, not all matches)
VECTOR_SIMILAR with bound subject: weight 5 (must filter already-bound candidates, less selective)

This heuristic captures a key insight: run HNSW first when the subject is unbound (it returns few results), run it last when the subject is already bound (the graph pattern was more selective).

Cardinality Estimation

When data statistics are available (per-predicate cardinality from the store), the planner refines structural weights with actual selectivity estimates. The combined score is:

cost = structural_weight × cardinality_estimate

Patterns are evaluated in ascending cost order (cheapest first). This minimizes the number of rows flowing through downstream patterns.

Predicate Pushdown

FILTER expressions are repositioned to execute immediately after the pattern that binds their last required variable. A filter like FILTER(?age > 25) is pushed down to run right after the ?person :age ?age pattern, not at the end of the query.

This reduces the number of rows that flow through subsequent join patterns, sometimes by orders of magnitude.

Pattern Ordering Rules

Pattern Type	Placement	Reason
Bound subject pattern	First	Point lookup, O(1)
VECTOR_SIMILAR (unbound)	Early	Returns top-k, very selective
Regular pattern (unbound)	Middle	Ordered by cardinality estimate
VECTOR_SIMILAR (bound)	Late	Subject already bound, acts as filter
OPTIONAL	Last	Cannot reduce result count, only add columns
UNION	After regular patterns	Branches evaluated independently, results concatenated