[RFC] Binomial data files #299
Comments
Nice design! :) What happens exactly when log_async is full? Say that:
The sequence you describe feels like the "default" result of adapting the current merge stacking semantics to this scheme, and it sounds reasonable to me. It's possible that there are cleverer merge strategies that try to merge smaller data segments first in order to unblock the `log_async`. Unless there's an obvious win here, I'm tempted to say that effort spent in that direction would be better spent just making the merges faster so that they stack less often.
This is exactly what I wanted to point out: there may be clever merge strategies. My insight is that there may also be clever parallel merge strategies, where several merges could take place at the same time, on separate domains. I don't have a specific design in mind; I just thought it was worth mentioning.
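To make the idea concrete, here is a minimal sketch of what two independent merges running in parallel could look like with OCaml 5 domains. `merge_slot` is a hypothetical stand-in for a single k-way merge over one set of data files; it is not an existing API in the library:

```ocaml
(* Hypothetical sketch: two merges over disjoint sets of data files,
   each running on its own domain (OCaml 5). [merge_slot] is an
   assumed stand-in for a single k-way merge, not a library call. *)
let parallel_merges ~merge_slot slot_a slot_b =
  let da = Domain.spawn (fun () -> merge_slot slot_a) in
  let db = Domain.spawn (fun () -> merge_slot slot_b) in
  (* The two merges touch disjoint files, so joining the domains is
     the only synchronisation this sketch needs. *)
  (Domain.join da, Domain.join db)
```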
That would indeed be possible.
Since I'm not sure I understand the second solution, I'll try to explain what I understood. The current version was straightforward:
In the following image describing the new version:
Thanks for the RFC ;-)
This is only true if bindings cannot be overwritten. The current semantics is
Actually, searching in a
Ah yes, of course!
Oh right, I forgot about the fan-out, thanks for the clarification.
A note about these binomial data files: in practice the scheme also degrades the performance of additions in the index by
Could the reads be improved by some kind of disk pre-fetching? In practice, it would mean querying the
As a side note: @let-def cleverly noticed that the shape of the data files is exactly the binary representation of the number of merges, where 1 is in position
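A small illustrative sketch of that correspondence (the slot numbering and `occupied_slots` are made up for illustration, not library code):

```ocaml
(* After [m] merges, data slot [k] is occupied exactly when bit [k]
   of [m] is set: merging behaves like binary incrementing. *)
let occupied_slots m =
  let rec go m k acc =
    if m = 0 then List.rev acc
    else go (m lsr 1) (k + 1) (if m land 1 = 1 then k :: acc else acc)
  in
  go m 0 []

(* e.g. after 6 merges (0b110), slots 1 and 2 hold data files. *)
let () = assert (occupied_slots 6 = [1; 2])
```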
Aren't we trying to read from `data_{k-1}` before `data_k`, since an entry can be present in multiple data files and we have to return the most recent one? (It doesn't change anything about pre-fetching, but you would pre-fetch the bigger files while reading the smaller ones.)
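Concretely, a lookup under this scheme might consult the log first, then scan the data files from most recent (smallest slot) to least recent, returning the first hit. `find_in_log` and `find_in_file` are assumed helpers, not library functions:

```ocaml
(* Sketch: smaller slots were (re)built more recently, so we check
   the log, then data_0, data_1, ... and stop at the first hit. *)
let find ~find_in_log ~find_in_file slots key =
  match find_in_log key with
  | Some v -> Some v
  | None ->
      List.find_map
        (function
          | Some file -> find_in_file file key
          | None -> None)
        slots
```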
Pursuing the analogy with binary numbering, it is possible to control the number of merges by using a suitable numbering system (e.g. https://en.wikipedia.org/wiki/Balanced_ternary). This allows merging lazily, so that we never suddenly have to merge e.g. 30 files. The size of individual files will grow, however (though in my benchmarks with a custom repr, the more files there are to merge, the faster the process, up to at least an 8-way merge).
If half of the entries are contained in
This describes a potentially significant improvement to the write amplification properties of large indices, at the cost of a bounded slowdown in read performance.
Currently, Index uses a very simple scheme for maintaining a sorted sequence of entries: all updates first go to the log, and then once the log exceeds a certain size we merge it with the data file. This has the unfortunate property that each merge is `O(n)` in the total number of bindings written so far (and each merge is `O(1)` apart, leading to `O(n²)` write amplification). In an index with 700 million bindings and `log_size = 10 million * len(entry)`, we rewrite 690 million redundant entries per 10 million writes (and this gets worse at a linear rate).

A different scheme might use more than one data file, allowing the sorted property to be only partially restored on each merge. We could arrange these data files in a sequence and size them in powers of two. Each merge then reconstructs the smallest missing data slot by doing a k-way merge on the log + the (k-1) data files below it.
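A rough sketch of that slot-selection rule, under assumed names (`slots` as a list of optional data files ordered smallest-first, `kway_merge` as the actual merging routine; neither exists in the library):

```ocaml
(* Find the smallest missing slot; if every slot is full, the
   sequence grows by one. *)
let rec first_free k = function
  | None :: _ -> k
  | Some _ :: rest -> first_free (k + 1) rest
  | [] -> k

(* One merge step: k-way merge of the log plus the data files below
   the first free slot, which fills that slot and empties the ones
   below it. *)
let merge_step ~kway_merge ~log slots =
  let k = first_free 0 slots in
  let below = List.filteri (fun i _ -> i < k) slots in
  let merged = kway_merge log (List.filter_map Fun.id below) in
  let cleared = List.mapi (fun i s -> if i < k then None else s) slots in
  if k < List.length cleared
  then List.mapi (fun i s -> if i = k then Some merged else s) cleared
  else cleared @ [ Some merged ]
```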
With this scheme, the amortized time complexity of merges is:

... with the worst case still being `O(n)` (or more precisely `O(n log log n)`), for a merge that happens each time the index doubles in size.
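For intuition, a hedged back-of-the-envelope derivation of the total merge cost, assuming exact power-of-two slot sizes and ignoring the shrinking effect of duplicate bindings:

```latex
% Each entry first lands in slot 0 and is rewritten once each time it
% moves up one slot. With n bindings and log size L there are at most
% log2(n / L) slots, so each entry is rewritten O(log n) times:
\[
  \text{total merge work} \;\le\; n \cdot \log_2 \frac{n}{L} \;=\; O(n \log n),
\]
% i.e. O(log n) amortized writes per binding, versus O(n) writes per
% binding under the single-data-file scheme.
```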
Advantages

- Write amplification goes from `O(n²)` to `O(n log n)`. On a Tezos-sized index, we can expect the average duration of merges to decrease by a factor of ~30. (!)

Disadvantages
- Reads become `O(log n)`, since the data files must be read in sequence.
- `len (merge a b)` can be less than `len a + len b` due to duplicate bindings (both inputs may contain an entry for the same key), so data file sizes will not be nice powers of two.