
TPF indexing in Db #14

@joepio


Although TPF queries are implemented, they are very slow and won't scale: a single TPF query iterates over every individual atom in the store. To solve this, we need some type of index. Since we're using Sled, a key-value store, we can't rely on a built-in SQL-style index, so we need to build it ourselves.

One solution is to create two new Sled trees (each tree is a separate k-v store). In the first one (for searching by Value), every k represents an Atomic Value, and v is a vector of all subjects that have that value. In the second one (for searching by Property), k is a property and v is a vector of all subjects that use it.
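The two-index idea can be sketched as follows. This is a minimal illustration, not the actual implementation: it uses in-memory `BTreeMap`s as stand-ins for the two Sled trees, and the names `Atom`, `Indexes`, `add_atom`, `subjects_with_value` and `subjects_with_property` are hypothetical.

```rust
use std::collections::BTreeMap;

/// One atom: a (subject, property, value) triple.
/// Hypothetical type, with all fields as plain strings for simplicity.
#[derive(Debug, Clone, PartialEq)]
pub struct Atom {
    pub subject: String,
    pub property: String,
    pub value: String,
}

/// In-memory stand-in for the two proposed Sled trees.
#[derive(Default)]
pub struct Indexes {
    /// k = value, v = all subjects that have that value
    pub by_value: BTreeMap<String, Vec<String>>,
    /// k = property, v = all subjects that use that property
    pub by_property: BTreeMap<String, Vec<String>>,
}

impl Indexes {
    /// Registers one atom in both indexes.
    pub fn add_atom(&mut self, atom: &Atom) {
        push_unique(
            self.by_value.entry(atom.value.clone()).or_default(),
            &atom.subject,
        );
        push_unique(
            self.by_property.entry(atom.property.clone()).or_default(),
            &atom.subject,
        );
    }

    /// Looks up subjects by value without scanning every atom.
    pub fn subjects_with_value(&self, value: &str) -> Vec<String> {
        self.by_value.get(value).cloned().unwrap_or_default()
    }

    /// Looks up subjects by property without scanning every atom.
    pub fn subjects_with_property(&self, property: &str) -> Vec<String> {
        self.by_property.get(property).cloned().unwrap_or_default()
    }
}

/// Appends `item` only if it is not already in the list.
fn push_unique(list: &mut Vec<String>, item: &str) {
    if !list.iter().any(|s| s == item) {
        list.push(item.to_string());
    }
}
```

In a Sled-backed version, each map would be its own tree and the vectors would have to be serialized to bytes; that part is left out here.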

However, a very common TPF query will look like this: * isA SomeClass. If we only have the indexes above, this will still be a costly query, because we'll still iterate over many resources: pretty much all resources will have the isA property.

We could improve performance if we also stored the Property in the v fields mentioned above, instead of only storing the subjects. To prevent unnecessary data duplication and minimize storage impact, it might make sense not to store entire atoms, but to leave out the part that's already known (the part in the key).

A TPF query such as * isA SomeClass would probably start with the ValueIndex, which returns all SubjectProperty combinations stored under SomeClass. Then the implementation iterates over these SubjectProperties, filters by property, and returns the matching subjects.
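A rough sketch of that lookup, assuming the ValueIndex stores (subject, property) pairs under each value key. A `BTreeMap` again stands in for the Sled tree, and the `insert` and `subjects` method names are made up for illustration.

```rust
use std::collections::BTreeMap;

/// A (subject, property) pair stored under a value key. The value itself
/// is left out because it is already known from the key.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SubjectProperty {
    pub subject: String,
    pub property: String,
}

/// In-memory stand-in for the value index tree:
/// k = value, v = all SubjectProperty pairs that point at that value.
#[derive(Default)]
pub struct ValueIndex {
    entries: BTreeMap<String, Vec<SubjectProperty>>,
}

impl ValueIndex {
    /// Indexes one atom under its value.
    pub fn insert(&mut self, subject: &str, property: &str, value: &str) {
        let pair = SubjectProperty {
            subject: subject.to_string(),
            property: property.to_string(),
        };
        let list = self.entries.entry(value.to_string()).or_default();
        if !list.contains(&pair) {
            list.push(pair);
        }
    }

    /// Answers a `* property value` query: one key lookup, then a filter
    /// over only the pairs stored under that value.
    pub fn subjects(&self, property: &str, value: &str) -> Vec<String> {
        self.entries
            .get(value)
            .into_iter()
            .flatten()
            .filter(|pair| pair.property == property)
            .map(|pair| pair.subject.clone())
            .collect()
    }
}
```

With this layout, `subjects("isA", "SomeClass")` only touches the atoms whose value is SomeClass, rather than every resource that has an isA property.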

I think Atomic Collections will rely on this query quite a bit: make a list of all Persons (or some other class), sorted by some property. This will run such a TPF query using the indexes, then return all subjects.

Another possible optimization strategy is caching Collections (which internally use TPF queries). We could rebuild (or invalidate) them on Commits.
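The caching idea could look roughly like this. It is only a sketch under assumptions: `CollectionCache`, `get_or_compute` and `invalidate` are hypothetical names, the cache key is simply the (property, value) pair of the underlying TPF query, and the real Commit handling is not modeled.

```rust
use std::collections::HashMap;

/// Cache key: the (property, value) pair of a `* property value` query.
type QueryKey = (String, String);

/// Caches the subjects returned by TPF queries that back Collections.
#[derive(Default)]
pub struct CollectionCache {
    cached: HashMap<QueryKey, Vec<String>>,
}

impl CollectionCache {
    /// Returns the cached subjects, running `run_query` only on a miss.
    pub fn get_or_compute<F>(&mut self, property: &str, value: &str, run_query: F) -> Vec<String>
    where
        F: FnOnce() -> Vec<String>,
    {
        self.cached
            .entry((property.to_string(), value.to_string()))
            .or_insert_with(run_query)
            .clone()
    }

    /// Drops the cached result for one query. Meant to be called when a
    /// Commit adds or removes an atom with this property and value.
    pub fn invalidate(&mut self, property: &str, value: &str) {
        self.cached
            .remove(&(property.to_string(), value.to_string()));
    }
}
```

A Commit that changes a value would need to invalidate both the old and the new (property, value) keys. Rebuilding eagerly instead of invalidating is the other option mentioned above, and it trades write cost for read latency.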
