Collection Merge Operator #3620
@yuslepukhin I am getting a compile error here with Visual Studio that I don't get with clang or gcc, I wonder if there is something simple you could point out to me that is Visual Studio specific? The error is:
I am tied up with some production issues, so it is going to be a while before I take a look at it.
I did not get a chance to fetch and compile. However,
@adamretter Thanks for the effort, and sorry for the delay in reviewing this PR. What is the status on this one? Are you still actively interested in contributing this? The overall idea is fantastic. We have a new API (#5604) to get all the merge operands in the DB, and I was wondering if that can be used to do what is done in this PR. In the value, users could store the operation as you mentioned, [OperationType, Operand], and when they get all the merge operands, user code could simply iterate over them and do whatever it likes, including sorting. That would avoid a lot of serialization/deserialization operations. Let me know your thoughts.
@vjnadimpalli I think these are quite different things, if I understand you correctly. Your feature allows bringing all merge operands into user code, whereas this feature allows you to just call
@adamretter Overall the change is awesome. One thing I was curious about in
@vjnadimpalli We could use a set, but I wanted to avoid making copies, which would use more memory than necessary. Also, the code at the moment is succinct, as the algorithms work on both set-like and vector-like storage. This could of course be changed or enhanced in future if needed, but I think it serves quite well as a first version. I have also rebased as requested.
@pdillinger Can you take a look at this please?
@pdillinger @siying Could you take a look at this PR please?
@pdillinger has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
 */
struct SliceHasher {
  size_t operator()(const Slice& slice) const {
    size_t result = 0;
Use GetSliceNPHash64(slice) cast to size_t.
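A minimal sketch of the shape that suggestion implies, written so it compiles outside the RocksDB tree. The `Slice` struct and `StandInSliceHash64` are hypothetical stand-ins introduced only for illustration: the stand-in is plain 64-bit FNV-1a, not RocksDB's hash. In the PR itself the body would presumably call `GetSliceNPHash64(slice)` instead.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for rocksdb::Slice: a non-owning view over bytes.
struct Slice {
  const char* data_;
  size_t size_;
  Slice(const char* d, size_t n) : data_(d), size_(n) {}
  const char* data() const { return data_; }
  size_t size() const { return size_; }
};

// Stand-in for the 64-bit hash named in the review comment.
// ASSUMPTION: this is 64-bit FNV-1a, used only so the sketch is
// self-contained; it is not RocksDB's implementation.
inline uint64_t StandInSliceHash64(const Slice& s) {
  uint64_t h = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
  for (size_t i = 0; i < s.size(); ++i) {
    h ^= static_cast<unsigned char>(s.data()[i]);
    h *= 1099511628211ULL;  // FNV-1a 64-bit prime
  }
  return h;
}

// The hasher delegates to one 64-bit hash call and casts explicitly to
// size_t, instead of accumulating a result by hand.
struct SliceHasher {
  size_t operator()(const Slice& slice) const {
    return static_cast<size_t>(StandInSliceHash64(slice));
  }
};
```

The explicit `static_cast` keeps the narrowing visible on platforms where `size_t` is narrower than 64 bits.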
 * your records into some format that is readable for you. The default
 * is just a simple hex serialization.
 */
CollectionMergeOperator(const uint16_t fixed_record_len,
Because there is often compiler tolerance for implicit narrowing conversions, I suggest using size_t with a check on the supported record length.
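One possible reading of that suggestion, as a sketch: accept `size_t` and reject out-of-range lengths explicitly, so an oversized argument is not silently truncated. `FixedRecordConfig` is a hypothetical class, not part of the PR, and throwing `std::invalid_argument` is an assumption made for brevity; RocksDB code would more likely assert or report an error status. The upper bound of 65535 is taken from the `uint16_t` parameter in the snippet under review.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical holder for the record length, standing in for the
// relevant part of the merge operator's constructor.
class FixedRecordConfig {
 public:
  // Takes size_t so a too-large argument cannot be narrowed implicitly.
  explicit FixedRecordConfig(size_t fixed_record_len)
      : fixed_record_len_(Validate(fixed_record_len)) {}

  size_t fixed_record_len() const { return fixed_record_len_; }

 private:
  // ASSUMPTION: supported lengths are 1..65535, matching the original
  // uint16_t parameter; a zero-length record is treated as invalid.
  static size_t Validate(size_t len) {
    if (len == 0 || len > std::numeric_limits<uint16_t>::max()) {
      throw std::invalid_argument("unsupported fixed record length");
    }
    return len;
  }

  size_t fixed_record_len_;
};
```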
bool CollectionMergeOperator::exists(Slice& value_record, std::string& existing) const {
  const char* existing_ptr = existing.data();
  for (size_t i = 0; i < existing.size(); i += fixed_record_len_) {
TODO: use binary search if ordered?
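A sketch of what that TODO could look like, assuming the records in `existing` are kept sorted in ascending bytewise (`memcmp`) order. That ordering is an assumption for illustration: the PR lets the user supply a Comparator, in which case the comparison below would have to go through it. `ExistsSorted` is a hypothetical free function, not code from the PR.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Returns true if `record` (exactly record_len bytes) occurs in `existing`,
// which is a concatenation of record_len-byte records.
// ASSUMPTION: records are sorted ascending by memcmp order.
inline bool ExistsSorted(const std::string& existing, const char* record,
                         size_t record_len) {
  if (record_len == 0 || existing.size() % record_len != 0) {
    return false;  // malformed input: not a whole number of records
  }
  size_t lo = 0;
  size_t hi = existing.size() / record_len;  // search window is [lo, hi)
  while (lo < hi) {
    const size_t mid = lo + (hi - lo) / 2;
    const int cmp =
        std::memcmp(existing.data() + mid * record_len, record, record_len);
    if (cmp == 0) {
      return true;
    }
    if (cmp < 0) {
      lo = mid + 1;  // middle record sorts before the target
    } else {
      hi = mid;      // middle record sorts after the target
    }
  }
  return false;
}
```

This replaces the linear scan's O(n) record comparisons with O(log n), but only when ordering is enabled; the unordered case would still need the linear loop.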
// loop through existing records, remove the first that matches the record to remove
const char* existing_ptr = existing.data();
for (size_t j = 0; j < existing_records_len; j++) {
TODO?: consider sharing code with exists()
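One way the sharing could be structured, as a sketch: a single lookup helper returns the byte offset of the first matching record, and both the existence check and the remove-first operation are built on it. All three function names are hypothetical, and bytewise equality via `memcmp` is assumed as the match criterion.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Shared lookup: byte offset of the first record equal to `record`, or
// std::string::npos if there is none. Trailing partial records are ignored.
inline size_t FindRecord(const std::string& existing, const char* record,
                         size_t record_len) {
  if (record_len == 0) {
    return std::string::npos;
  }
  for (size_t off = 0; off + record_len <= existing.size();
       off += record_len) {
    if (std::memcmp(existing.data() + off, record, record_len) == 0) {
      return off;
    }
  }
  return std::string::npos;
}

// Existence check expressed through the shared helper.
inline bool Exists(const std::string& existing, const char* record,
                   size_t record_len) {
  return FindRecord(existing, record, record_len) != std::string::npos;
}

// Removes the first matching record in place; returns whether one was found.
inline bool RemoveFirst(std::string& existing, const char* record,
                        size_t record_len) {
  const size_t off = FindRecord(existing, record, record_len);
  if (off == std::string::npos) {
    return false;
  }
  existing.erase(off, record_len);
  return true;
}
```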
namespace ROCKSDB_NAMESPACE {

std::string Clear() {
We internally have a test that makes sure that everything in the repo can be compiled together. This means we should avoid populating the top-level rocksdb namespace with generically-named but specific-use symbols, even in test code. Can you put this stuff in a sub-namespace and import that in the test?
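A sketch of the requested layout. The sub-namespace name `collection_merge_test` is hypothetical, the `#define` only mirrors the usual default so the snippet compiles on its own, and the body of `Clear()` is a placeholder because the original body is not shown in the review context.

```cpp
#include <string>

// ASSUMPTION: defined here only so the sketch is self-contained; inside the
// RocksDB tree the macro is already provided by the build.
#ifndef ROCKSDB_NAMESPACE
#define ROCKSDB_NAMESPACE rocksdb
#endif

namespace ROCKSDB_NAMESPACE {
namespace collection_merge_test {  // hypothetical sub-namespace name

// Placeholder body: the real helper's return value is not shown above.
inline std::string Clear() { return std::string(); }

}  // namespace collection_merge_test
}  // namespace ROCKSDB_NAMESPACE

// In the test file, import the helpers so call sites stay unqualified.
using namespace ROCKSDB_NAMESPACE::collection_merge_test;
```

With this layout the generically named helper no longer occupies the top-level namespace, so it cannot collide with another `Clear()` when everything in the repo is compiled together.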
namespace ROCKSDB_NAMESPACE {

const uint16_t DEFAULT_TEST_RECORD_SIZE = 4;
Do you have a test using a size other than DEFAULT_TEST_RECORD_SIZE?
  return result;
}

/**
Any thoughts on adding a factory function to MergeOperators?
This is a new merge operator which allows you to store a collection of records within the value of a key/value pair.
A record is just a fixed-length byte string. Features include:
Options for controlling the collection semantics, supporting both:
The Collection Merge Operator also optionally supports ordering, where a Comparator can be provided to ensure the ordering of records.
A simple example of vector-like behaviour without ordering would look like: