The order you evaluate patterns matters more than how fast you evaluate each one.
A SPARQL query with three triple patterns can be evaluated in six different orders. The wrong order might produce millions of intermediate results that the next pattern filters down to ten. The right order might produce ten results immediately.
Before looking at any data statistics, the planner assigns each pattern a structural weight based on which positions are bound or unbound:
This heuristic captures a key insight: run HNSW first when the subject is unbound (it returns few results), run it last when the subject is already bound (the graph pattern was more selective).
When data statistics are available (per-predicate cardinality from the store), the planner refines structural weights with actual selectivity estimates. The combined score is:
cost = structural_weight × cardinality_estimate
Patterns are evaluated in ascending cost order (cheapest first). This minimizes the number of rows flowing through downstream patterns.
FILTER expressions are repositioned to execute immediately after the pattern that binds their last required variable. A filter like FILTER(?age > 25) is pushed down to run right after the ?person :age ?age pattern, not at the end of the query.
This reduces the number of rows that flow through subsequent join patterns, sometimes by orders of magnitude.
| Pattern Type | Placement | Reason |
|---|---|---|
| Bound subject pattern | First | Point lookup, O(1) |
| VECTOR_SIMILAR (unbound) | Early | Returns top-k, very selective |
| Regular pattern (unbound) | Middle | Ordered by cardinality estimate |
| VECTOR_SIMILAR (bound) | Late | Subject already bound, acts as filter |
| OPTIONAL | Last | Cannot reduce result count, only add columns |
| UNION | After regular patterns | Branches evaluated independently, results concatenated |