RDF has no tables, but relational structure exists. Loka finds it automatically.
When many nodes share the same set of predicates (for example, all Person nodes having name, age, and email), that shared predicate set is called a "characteristic set." The nodes that share it behave like rows in a relational table, even though no table was declared.
Loka discovers these groups automatically during the background maintenance cycle and materializes pseudo-tables: columnar indexes that accelerate the SQL-like portions of SPARQL execution.
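To make the idea concrete, here is a minimal sketch of grouping subjects by their predicate sets. It is not Loka's implementation: the `characteristic_sets` function, the tuple-based triple format, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict

# Toy (subject, predicate, object) triples; all values are illustrative.
triples = [
    ("ex:alice", "name", "Alice"),
    ("ex:alice", "age", 34),
    ("ex:alice", "email", "alice@example.org"),
    ("ex:bob", "name", "Bob"),
    ("ex:bob", "age", 27),
    ("ex:bob", "email", "bob@example.org"),
    ("ex:acme", "name", "Acme"),
    ("ex:acme", "founded", 1999),
]


def characteristic_sets(triples):
    """Group subjects by the exact set of predicates they use (hypothetical helper)."""
    predicates_by_subject = defaultdict(set)
    for subject, predicate, _ in triples:
        predicates_by_subject[subject].add(predicate)

    # A frozenset is hashable, so the predicate set itself can be the group key.
    groups = defaultdict(list)
    for subject, predicates in predicates_by_subject.items():
        groups[frozenset(predicates)].append(subject)
    return dict(groups)


groups = characteristic_sets(triples)
for predicates, subjects in groups.items():
    print(sorted(predicates), "->", sorted(subjects))
```

In this toy data, ex:alice and ex:bob land in one group (name, age, email) and would be candidate rows of a pseudo-table, while ex:acme forms a group of its own.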
A group qualifies when:
Statistical significance testing filters out spurious clusters that appear by chance. Frequency-only thresholds would produce false positives on noisy data.
Rows are stored in segments of ~2048 rows. Each segment maintains per-column statistics (min, max, null count, distinct count). When a query filter doesn't overlap a segment's min/max range, the entire segment is skipped without examining any rows.
Example: if a segment's maximum age is 30 and the query asks for ?age > 50, no row in that segment can match, so the segment is pruned. This is the min/max (zone map) pruning technique used by DuckDB, applied to RDF.
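The pruning check can be sketched in a few lines. This is an illustration under assumptions, not Loka's storage code: the `Segment` class, its single `ages` column, and the `scan_age_greater_than` function are hypothetical, and the segments are far smaller than ~2048 rows to keep the example readable.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One segment of a single column; None marks a missing value."""
    ages: list

    def __post_init__(self):
        # Per-column statistics are computed once, when the segment is built.
        present = [a for a in self.ages if a is not None]
        self.min_age = min(present) if present else None
        self.max_age = max(present) if present else None


def scan_age_greater_than(segments, threshold):
    """Return matching ages and the number of segments skipped via min/max."""
    matches, skipped = [], 0
    for segment in segments:
        # If even the maximum fails the filter, no row in the segment can pass.
        if segment.max_age is None or segment.max_age <= threshold:
            skipped += 1
            continue
        matches.extend(a for a in segment.ages if a is not None and a > threshold)
    return matches, skipped


segments = [Segment([18, 25, 30]), Segment([45, 52, 61]), Segment([None, 70])]
matches, skipped = scan_age_greater_than(segments, 50)
print(matches, skipped)
```

The first segment (maximum age 30) is skipped without reading its rows; the other two are scanned because their maximum exceeds 50.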
The distribution of property coverage across a pseudo-table reveals data quality:
This metric is exposed through the health endpoint and Loka Studio, making pseudo-table discovery double as a data quality audit.
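One plausible reading of property coverage is the fraction of rows in a pseudo-table that have a value for each property. The sketch below computes that fraction; the `property_coverage` function and the dict-per-row layout are assumptions for illustration, and the exact metric Loka reports may differ.

```python
def property_coverage(rows, properties):
    """Fraction of rows with a non-missing value, per property (hypothetical helper)."""
    total = len(rows)
    if total == 0:
        return {}
    return {
        prop: sum(1 for row in rows if row.get(prop) is not None) / total
        for prop in properties
    }


# Toy rows of a Person pseudo-table; a missing key means the property is absent.
rows = [
    {"name": "Alice", "age": 34, "email": "alice@example.org"},
    {"name": "Bob", "age": 27},
    {"name": "Carol", "age": 58},
    {"name": "Dan"},
]

coverage = property_coverage(rows, ["name", "age", "email"])
print(coverage)
```

Here name is present on every row, age on three of four, and email on one of four, which would flag email as a sparsely populated property.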
When a query mixes pseudo-table columns with properties not in the pseudo-table, the planner uses a two-phase strategy:
The columnar phase produces bound subjects cheaply. The join phase is fast because the subject is already bound (point lookup). Neither phase requires a full table scan.
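The two phases can be sketched as follows. This is a simplified illustration, not Loka's planner: the `pseudo_table` and `triple_index` structures, the `nickname` property, and the `two_phase` function are all hypothetical.

```python
# Columnar pseudo-table: column name -> row-aligned list of values.
pseudo_table = {
    "subject": ["ex:alice", "ex:bob", "ex:carol"],
    "age": [34, 27, 58],
}

# Index for properties outside the pseudo-table, keyed by (subject, predicate).
triple_index = {
    ("ex:alice", "nickname"): ["Al"],
    ("ex:carol", "nickname"): ["Caz"],
}


def two_phase(min_age, extra_predicate):
    """Filter on a pseudo-table column, then join via point lookups."""
    # Phase 1 (columnar): evaluate the filter and collect bound subjects.
    subjects = [
        subject
        for subject, age in zip(pseudo_table["subject"], pseudo_table["age"])
        if age > min_age
    ]

    # Phase 2 (join): the subject is already bound, so each probe is a
    # keyed lookup rather than a scan over all triples.
    results = []
    for subject in subjects:
        for value in triple_index.get((subject, extra_predicate), []):
            results.append((subject, value))
    return results


result = two_phase(30, "nickname")
print(result)
```

Only subjects that survive the columnar filter are probed in the second phase, so the cost of the join scales with the number of filtered subjects rather than with the size of the dataset.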