-
Notifications
You must be signed in to change notification settings - Fork 6.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow to use a string as a delimiter in StringAppendOperator #8536
Allow to use a string as a delimiter in StringAppendOperator #8536
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I generally agree with these changes (and made many of the same ones in #8481).
My one concern is in the stringappend_test. The way it is set up now it does not look as if the old 'char' interface is ever invoked, which would reduce the test coverage. Is it possible to somehow restructure the tests so that both the new and old constructors can be exercised?
Oh, wow, I should have submitted my PR earlier :) I agree, not invoking the old constructor in tests is my oversight. I will try to get covered again. |
For the history, I brought up this idea in the Google group explaining why we need this at JetBrains specifically (currently we maintain our own fork only to be able to use an empty delimiter on merge). Also, there is an older PR addressing the same limitation, but it seems abandoned for a while. |
@east825 If you wish for your PR to land before mine, I will work on getting it merged once you add the missing tests. I can update my change to support this PR. Alternatively, your PR will have to merge with mine to add the Java APIs/tests. |
@mrambacher Thanks a lot. I'll add these tests today. It indeed might be a bit cumbersome for me to merge after you since I'm not that experienced with the project's internals, but on the whole, I'll be happy with whoever changes adding this functionality to the API. |
Thanks for the contribution! It is a very useful extension to the operator. Please follow Mark's comment. I'm reviewing his 8481 PR and it would better this PR can be landed before his. Thanks! |
@adamretter Can you help to take a look at the Java changes of this PR? |
@@ -51,14 +53,14 @@ bool StringAppendTESTOperator::FullMergeV2( | |||
printDelim = true; | |||
} else if (numBytes) { | |||
merge_out->new_value.reserve( | |||
numBytes - 1); // Minus 1 since we have one less delimiter | |||
numBytes - delim_.size()); // Adjust for one less delimiter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need a simple explanation for why we minus delim_.size()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to reserve the space for the original value and N following operands with preceding delimiters. The size of "N following operands with preceding delimiters" is stored as numBytes
. However, if there is no original value, the very first delimiter becomes redundant, therefore we reduce the amount of space needed by the size of one delimiter.
For instance, if the original value was 'foo' and operands 'bar', 'baz' with delimiter ',', we would need 3 + numBytes = 3 + 2 * (3 + 1) = 11 bytes
, however if there is no original value, we need only 0 + numBytes - 1 = 7 bytes
.
This adjustment just takes into account that a delimiter can now be of an arbitrary length, not only one byte.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Can you add these as the comment to the code?
@@ -20,6 +20,8 @@ class MergeOperators { | |||
static std::shared_ptr<MergeOperator> CreateUInt64AddOperator(); | |||
static std::shared_ptr<MergeOperator> CreateStringAppendOperator(); | |||
static std::shared_ptr<MergeOperator> CreateStringAppendOperator(char delim_char); | |||
static std::shared_ptr<MergeOperator> CreateStringAppendOperator( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is a behavior change of StringAppend merge operator, also mention this change in HISTORY.md.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zhichao-cao This header is not in the public API. Should it still be mentioned in HISTORY or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to mention it in the HISTORY.md, since this merge operator is used by many users and they should be area of this behavior change. So mention it in the HISTORY.md would be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the mention of this specific header from HISTORY.md, leaving just StringAppendOperator
itself.
@zhichao-cao has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
@@ -20,6 +20,8 @@ class MergeOperators { | |||
static std::shared_ptr<MergeOperator> CreateUInt64AddOperator(); | |||
static std::shared_ptr<MergeOperator> CreateStringAppendOperator(); | |||
static std::shared_ptr<MergeOperator> CreateStringAppendOperator(char delim_char); | |||
static std::shared_ptr<MergeOperator> CreateStringAppendOperator( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zhichao-cao This header is not in the public API. Should it still be mentioned in HISTORY or not?
@east825 has updated the pull request. You must reimport the pull request before landing. |
1 similar comment
@east825 has updated the pull request. You must reimport the pull request before landing. |
I went for the simplest solution, switching between the two constructors depending on the length of a delimiter specified in a test. I don't want to add another layer of indirection on top of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for the fix! Can you rebase/update so the fix can be accepted?
An arbitrary string can be used as a delimiter in "StringAppend" merge operator flavor. In particular, it allows to use an empty string, effectively combining binary values for the same key byte-to-byte one next to another.
Expecting the upcoming changes turning the operator into a customizable entity.
4fd7589
to
7783b51
Compare
@east825 has updated the pull request. You must reimport the pull request before landing. |
I can squash everything into one changeset on my side if you want to. In some projects, it's done automatically, not sure which workflow is preferred here. |
No need to squash. As long as you are up-to-date with master and there are no conflicts for the merge, this should be good to go. |
@zhichao-cao has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
3 similar comments
@zhichao-cao has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
@zhichao-cao has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
@zhichao-cao has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
@@ -259,7 +271,7 @@ TEST_P(StringAppendOperatorTest, SimpleTest) { | |||
} | |||
|
|||
TEST_P(StringAppendOperatorTest, SimpleDelimiterTest) { | |||
auto db = OpenDb('|'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the test, can you keep all the original tests and add the new test that uses the string? The old interface is still in use and we do not want to remove the tests until the interface is deprecated. Also, keep the std::shared_ptr OpenTtlDb(char delim_char).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the old interface is still used if the specified string is of length 1, addressing the problem @mrambacher pointed at.
It's a bit tricky to write completely independent tests using the new interface there since they are all parameterized and use two different database opening routines depending on the parameter in their SetUp
. I believe it will complicate the setup step too much.
@east825 has updated the pull request. You must reimport the pull request before landing. |
@zhichao-cao has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
@zhichao-cao merged this pull request in 8f52972. |
An arbitrary string can be used as a delimiter in StringAppend merge operator
flavor. In particular, it allows using an empty string, combining binary values for
the same key byte-to-byte one next to another.