Suggestion for improving Set performance and utility #122
> We don't actually get the "value" of the key -- (*KV)::get() just returns the pointer to the value (unless it's small enough to be inlined in the LSM tree). Then
The biggest thing to answer here is: what problem are you trying to solve? The complexity of the solution must match the complexity of the problem being solved. The main takeaway I get here is the desire to do concurrent writes.

The problem there is that we have to maintain a write-ahead log, and file writes are by default serial. If all writes to disk are serial, then their application to RAM (the LSM tree), whether that happens serially or not, is irrelevant. On top of that, if we update the same key twice and apply the updates concurrently, it's possible that the latter update gets applied before the former, which would result in data loss. So, to apply them concurrently, we'd have to maintain a dependency tree, which further complicates things.

So, concurrent writes won't result in a performance gain, because the writes to disk have to be serialized. This doesn't even consider the fact that we're bound by the efficiency of compactions, not by application to the memtable in the LSM tree. What we have done instead is to allow concurrent reads while writes are going on, and that gives almost all the performance benefits you'd need.

To do atomic increments, the CAS operation would be your friend. It was designed for exactly that purpose. The idea of making CAS optional has some merit, but it requires more care from the user's perspective: they must remember to always do CAS Sets for the same key. A single normal Set would no longer carry a CAS, which would render any future operations on that key sensitive to failure. So it's not a clearly better solution than providing a CAS counter by default and letting the user decide when and how they want to use it.
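For readers unfamiliar with the pattern, here is a minimal sketch of the CAS-based atomic increment being described. The `casStore` interface and `errCasMismatch` sentinel are hypothetical; they mirror the shape discussed in this thread (a value plus a uint16 CAS counter per key), not Badger's actual API.

```go
package main

import (
	"encoding/binary"
	"errors"
)

// Hypothetical sentinel returned when the CAS counter no longer matches.
var errCasMismatch = errors.New("cas mismatch")

// casStore is an invented interface capturing the "CAS counter by default"
// design discussed above; it is not Badger's real API.
type casStore interface {
	// Get returns the current value of key along with its CAS counter.
	Get(key []byte) (val []byte, cas uint16, err error)
	// CompareAndSet writes val only if the key's counter still equals cas;
	// otherwise it returns errCasMismatch.
	CompareAndSet(key, val []byte, cas uint16) error
}

// atomicIncrement retries a read-modify-write until no concurrent writer
// interferes. Every write to the key must go through CAS for this to be safe.
func atomicIncrement(kv casStore, key []byte) error {
	for {
		val, cas, err := kv.Get(key) // val assumed to be an 8-byte counter
		if err != nil {
			return err
		}
		n := binary.BigEndian.Uint64(val)
		buf := make([]byte, 8)
		binary.BigEndian.PutUint64(buf, n+1)
		if err := kv.CompareAndSet(key, buf, cas); err != errCasMismatch {
			return err // nil on success, or a genuine error
		}
		// Lost the race with a concurrent writer; reread and retry.
	}
}
```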
Disclaimer: I know what I'm about to suggest is quite challenging and may not fit with your plans for Badger. But I want to record my thoughts on this, in case you haven't yet considered an approach like this. Or maybe you have, and have found it to be flawed...
How it is
Currently, the various `Set` methods have some drawbacks:

1. CAS requests require multiple traversals of the LSM tree:
   a. `Get` the current value of some key, including its CAS (traversal #1)
   b. `Put` an updated value, which then calls `get` in order to check the CAS (traversal #2)
   c. `Put` then traverses the Skiplist to find the insertion location (traversal #3)
2. CAS counters are uint16 values, which may not be what some applications want or need. It would be much better if applications could define their own CAS values, or not define them at all in order to save space.
3. All mutating requests are funneled to a single goroutine, which executes the requests in series. This means that heavy write loads can only take advantage of a single CPU.
How it could be
Add additional mutating methods to iterators rather than keeping them separate on the KV interface:
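(The code block for this appears to have been lost in this copy of the issue. Below is a reconstructed sketch of what such an API might look like; the signatures are guesses based on the surrounding prose, not the author's exact proposal.)

```go
// Sketch: the read-side iterator grows mutating methods that operate at its
// current position and report conflicts instead of silently overwriting.
type Iterator interface {
	// Existing navigation methods.
	Seek(key []byte)
	Valid() bool
	Key() []byte
	Value() []byte

	// Proposed mutating methods. Each applies at the iterator's current
	// position and returns false if a conflicting update has occurred since
	// the iterator was positioned, letting the caller re-seek and retry.
	Add(key, val []byte) bool // insert a new key; fails if it now exists
	Set(val []byte) bool      // replace the value of the current key
	Delete() bool             // delete the current key
}
```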
Here's how to use the API to atomically create and increment a value when there are concurrent writers:
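(This example is also missing from this copy; the following is a plausible reconstruction against the `Iterator` sketch above. The `encode`/`decode` helpers are introduced here for the example, and `bytes` plus `encoding/binary` are assumed to be imported.)

```go
// Small helpers for storing a uint64 counter as an 8-byte value.
func encode(n uint64) []byte {
	b := make([]byte, 8)
	binary.BigEndian.PutUint64(b, n)
	return b
}

func decode(b []byte) uint64 { return binary.BigEndian.Uint64(b) }

// increment atomically creates-or-increments a counter key, even with
// concurrent writers, by retrying whenever a mutation reports a conflict.
func increment(it Iterator, key []byte) {
	for {
		it.Seek(key)
		if !it.Valid() || !bytes.Equal(it.Key(), key) {
			// Key doesn't exist yet: try to create it with an initial count.
			if it.Add(key, encode(1)) {
				return
			}
			continue // another writer created it first; retry
		}
		// Key exists: bump the count in place. Set returns false if the
		// entry changed after our Seek, in which case we retry.
		if it.Set(encode(decode(it.Value()) + 1)) {
			return
		}
	}
}
```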
Here's how to use the API to utilize a custom int64 CAS value:
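(Again reconstructed, since the original block was lost: the application reserves the first 8 bytes of each stored value for its own int64 CAS stamp, so the store needs no built-in counter at all. The `compareAndSet` function is inferred from the prose below.)

```go
// compareAndSet succeeds only if the stored value's application-level CAS
// stamp (its first 8 bytes) still equals expected; on success it writes
// newVal with the stamp bumped. Values are assumed to be at least 8 bytes.
func compareAndSet(it Iterator, key, newVal []byte, expected int64) bool {
	it.Seek(key)
	if !it.Valid() || !bytes.Equal(it.Key(), key) {
		return false // key missing: nothing to compare against
	}
	stored := it.Value()
	if int64(binary.BigEndian.Uint64(stored[:8])) != expected {
		return false // our snapshot is stale; caller decides what to do
	}
	// Write back with the CAS stamp incremented. it.Set itself still guards
	// against writes that raced in between Seek and Set.
	buf := make([]byte, 8+len(newVal))
	binary.BigEndian.PutUint64(buf[:8], uint64(expected+1))
	copy(buf[8:], newVal)
	return it.Set(buf)
}
```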
What's nice about integrating the mutating operations into the iterator is that it gives callers a great deal of control over the CAS semantics, without sacrificing efficiency. In the common case, the atomic increment operation only needs to traverse the LSM tree once (though the iterator would need to be implemented properly to enable that). And the `compareAndSet` function shows that applications can maintain their own CAS value using whatever mechanism they choose.

To make all this as efficient as possible, consider doing away with the single writer goroutine. All tree traversal can take place without locking. Then, when the Add/Set/Delete methods are called on the iterator, a lock can be acquired while checking whether any conflicting updates have occurred, as well as to make the actual update. In common cases, this can be done very quickly. For example, in most cases the mutable skiplist is already positioned correctly to do the Add/Set/Delete, so no further traversal is necessary - just a quick check to make sure that the skiplist is still mutable. By not traversing inside locks and by allowing many concurrent writes, a heavy "put" load should be able to scale well across CPUs. This would definitely be some tricky code to write, but it would bring some very nice benefits.
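To make that last paragraph concrete, here is an illustrative-only sketch of the locking discipline being proposed: Seek traverses the skiplist with no locks held, and a mutation takes a short critical section just to validate and apply. Every type and field below is invented for the illustration (`sync` is assumed to be imported).

```go
// Minimal stand-ins for the memtable skiplist and its nodes.
type node struct {
	value   []byte
	version uint64 // bumped on every in-place change
}

type skiplist struct {
	mu      sync.Mutex
	mutable bool // false once this memtable is frozen for flushing
}

// memIterator remembers where a lock-free Seek landed.
type memIterator struct {
	list        *skiplist
	node        *node  // position found by the lock-free traversal
	seenVersion uint64 // node version observed at Seek time
}

func (it *memIterator) Set(val []byte) bool {
	it.list.mu.Lock()
	defer it.list.mu.Unlock()
	// Conflict check under the lock: the skiplist must still be the mutable
	// memtable, and the node must be unchanged since we positioned on it.
	if !it.list.mutable || it.node.version != it.seenVersion {
		return false // caller re-seeks and retries
	}
	// Apply in place; no further traversal needed.
	it.node.value = val
	it.node.version++
	it.seenVersion = it.node.version
	return true
}
```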