Description
Although TPF (Triple Pattern Fragments) queries are implemented, they are very slow and won't scale: a single TPF query iterates over every individual atom in the store. To solve this, we need some type of index. Since we're using Sled, a key-value store, we can't rely on an SQL-style index, so we need to build it ourselves.
One solution is to create two new Sled trees (each a separate k-v store). In the first one (for searching by Value), every k represents an Atomic Value, and v is a vector of all subjects that have that value. In the second one (for Properties), k = property and v = the subjects that use that property.
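A minimal sketch of these two indexes, to make the key and value layout concrete. It uses `BTreeMap` from the standard library as an in-memory stand-in for the two Sled trees (Sled itself stores raw bytes, so a real version would need serialization). The names `Atom`, `Indexes`, `add_atom`, `subjects_with_value` and `subjects_with_property` are hypothetical, not part of the existing code.

```rust
use std::collections::BTreeMap;

/// One (subject, property, value) triple. All parts are simplified to strings.
#[derive(Clone, Debug, PartialEq)]
pub struct Atom {
    pub subject: String,
    pub property: String,
    pub value: String,
}

/// Hypothetical in-memory stand-ins for the two proposed Sled trees.
#[derive(Default)]
pub struct Indexes {
    /// k = value, v = subjects that have this value
    pub by_value: BTreeMap<String, Vec<String>>,
    /// k = property, v = subjects that use this property
    pub by_property: BTreeMap<String, Vec<String>>,
}

impl Indexes {
    /// Registers one atom in both indexes.
    pub fn add_atom(&mut self, atom: &Atom) {
        push_unique(
            self.by_value.entry(atom.value.clone()).or_default(),
            &atom.subject,
        );
        push_unique(
            self.by_property.entry(atom.property.clone()).or_default(),
            &atom.subject,
        );
    }

    /// Answers `* * <value>` with a single lookup.
    pub fn subjects_with_value(&self, value: &str) -> Vec<String> {
        self.by_value.get(value).cloned().unwrap_or_default()
    }

    /// Answers `* <property> *` with a single lookup.
    pub fn subjects_with_property(&self, property: &str) -> Vec<String> {
        self.by_property.get(property).cloned().unwrap_or_default()
    }
}

/// Appends the subject only if it is not already listed.
fn push_unique(list: &mut Vec<String>, subject: &str) {
    if !list.iter().any(|s| s == subject) {
        list.push(subject.to_string());
    }
}
```

With this layout, a lookup by value or by property no longer touches every atom, but a query that constrains both still has to combine or filter the two result lists.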
However, a very common TPF query will look like this: * isA SomeClass. If we only build the indexes above, this will still be a costly query, because we'll still iterate over many resources: pretty much every resource has the isA property.
We could improve performance if we also stored the Property in the v fields mentioned above, instead of only storing the subjects. To prevent unnecessary data duplication and minimize storage impact, it might make sense not to store entire atoms, but to leave out the part that's already known (the part in the key).
A TPF query such as * isA SomeClass would probably start by using the ValueIndex, which returns all SubjectProperty combinations for that value. Then, the implementation will iterate over those SubjectProperties, filter by property, and return the matching subjects.
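The lookup described above could be sketched roughly as follows. This is an illustration under assumptions, not the actual implementation: `ValueIndex`, `insert` and `subjects` are hypothetical names, strings stand in for real subjects, properties and values, and a `BTreeMap` stands in for the Sled tree.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical value index: k = value, v = set of (subject, property) pairs.
/// The value itself is not repeated inside v, since it is already the key.
#[derive(Default)]
pub struct ValueIndex {
    entries: BTreeMap<String, BTreeSet<(String, String)>>,
}

impl ValueIndex {
    /// Records that `subject` has `property` set to `value`.
    pub fn insert(&mut self, subject: &str, property: &str, value: &str) {
        self.entries
            .entry(value.to_string())
            .or_default()
            .insert((subject.to_string(), property.to_string()));
    }

    /// Answers `* <property> <value>`: one lookup by value, then a filter
    /// on the property part of each stored (subject, property) pair.
    pub fn subjects(&self, property: &str, value: &str) -> Vec<String> {
        self.entries
            .get(value)
            .into_iter()
            .flatten()
            .filter(|(_, p)| p.as_str() == property)
            .map(|(s, _)| s.clone())
            .collect()
    }
}
```

The filter step only visits atoms that share the requested value, so its cost depends on how many atoms point at SomeClass rather than on the size of the whole store.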
I think Atomic Collections will rely on this query quite a bit: make a list of all Persons (or some other class), sorted by some property. This will run such a TPF query using the indexes, then return all subjects.
Another possible optimization strategy is caching Collections (which internally use TPF queries). We could rebuild (or invalidate) them on Commits.
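A rough sketch of the invalidate-on-Commit variant, assuming a Commit can report which property it changed. All names here (`Query`, `CollectionCache`, `get_or_compute`, `invalidate_property`) are hypothetical, and the invalidation rule is deliberately coarse: any cached query on the changed property is dropped.

```rust
use std::collections::HashMap;

/// A cached TPF-style query of the form `* <property> <value>`.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct Query {
    pub property: String,
    pub value: String,
}

/// Hypothetical cache that maps a query to the subjects it returned.
#[derive(Default)]
pub struct CollectionCache {
    members: HashMap<Query, Vec<String>>,
}

impl CollectionCache {
    /// Returns the cached members, running `compute` only on a cache miss.
    pub fn get_or_compute<F>(&mut self, query: &Query, compute: F) -> Vec<String>
    where
        F: FnOnce(&Query) -> Vec<String>,
    {
        self.members
            .entry(query.clone())
            .or_insert_with(|| compute(query))
            .clone()
    }

    /// Called when a Commit changes `property` on some resource.
    /// Drops every cached query on that property, so the next read recomputes it.
    pub fn invalidate_property(&mut self, property: &str) {
        self.members.retain(|query, _| query.property != property);
    }
}
```

Rebuilding instead of invalidating would move the cost from the next read to the Commit itself. Which of the two fits better depends on how often Collections are read relative to how often Commits arrive.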