Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow to use a string as a delimiter in StringAppendOperator #8536

Conversation

east825
Copy link
Contributor

@east825 east825 commented Jul 15, 2021

An arbitrary string can be used as a delimiter in StringAppend merge operator
flavor. In particular, it allows using an empty string, combining binary values for
the same key byte-to-byte one next to another.

Copy link
Contributor

@mrambacher mrambacher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally agree with these changes (and made many of the same ones in #8481).

My one concern is in the stringappend_test. The way it is set up now it does not look as if the old 'char' interface is ever invoked, which would reduce the test coverage. Is it possible to somehow restructure the tests so that both the new and old constructors can be exercised?

@east825
Copy link
Contributor Author

east825 commented Jul 15, 2021

Oh, wow, I should have submitted my PR earlier :)

I agree, not invoking the old constructor in tests is my oversight. I will try to get covered again.

@east825
Copy link
Contributor Author

east825 commented Jul 27, 2021

For the history, I brought up this idea in the Google group explaining why we need this at JetBrains specifically (currently we maintain our own fork only to be able to use an empty delimiter on merge). Also, there is an older PR addressing the same limitation, but it seems abandoned for a while.

@mrambacher
Copy link
Contributor

@east825 If you wish for your PR to land before mine, I will work on getting it merged once you add the missing tests. I can update my change to support this PR.

Alternatively, your PR will have to merge with mine to add the Java APIs/tests.

@east825
Copy link
Contributor Author

east825 commented Jul 28, 2021

@mrambacher Thanks a lot. I'll add these tests today. It indeed might be a bit cumbersome for me to merge after you since I'm not that experienced with the project's internals, but on the whole, I'll be happy with whoever changes adding this functionality to the API.

@zhichao-cao
Copy link
Contributor

Thanks for the contribution! It is a very useful extension to the operator. Please follow Mark's comment. I'm reviewing his 8481 PR and it would better this PR can be landed before his. Thanks!

@zhichao-cao
Copy link
Contributor

@adamretter Can you help to take a look at the Java changes of this PR?

@@ -51,14 +53,14 @@ bool StringAppendTESTOperator::FullMergeV2(
printDelim = true;
} else if (numBytes) {
merge_out->new_value.reserve(
numBytes - 1); // Minus 1 since we have one less delimiter
numBytes - delim_.size()); // Adjust for one less delimiter
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need a simple explanation for why we minus delim_.size()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to reserve the space for the original value and N following operands with preceding delimiters. The size of "N following operands with preceding delimiters" is stored as numBytes. However, if there is no original value, the very first delimiter becomes redundant, therefore we reduce the amount of space needed by the size of one delimiter.

For instance, if the original value was 'foo' and operands 'bar', 'baz' with delimiter ',', we would need 3 + numBytes = 3 + 2 * (3 + 1) = 11 bytes, however if there is no original value, we need only 0 + numBytes - 1 = 7 bytes.
This adjustment just takes into account that a delimiter can now be of an arbitrary length, not only one byte.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Can you add these as the comment to the code?

@@ -20,6 +20,8 @@ class MergeOperators {
static std::shared_ptr<MergeOperator> CreateUInt64AddOperator();
static std::shared_ptr<MergeOperator> CreateStringAppendOperator();
static std::shared_ptr<MergeOperator> CreateStringAppendOperator(char delim_char);
static std::shared_ptr<MergeOperator> CreateStringAppendOperator(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a behavior change of StringAppend merge operator, also mention this change in HISTORY.md.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhichao-cao This header is not in the public API. Should it still be mentioned in HISTORY or not?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to mention it in the HISTORY.md, since this merge operator is used by many users and they should be area of this behavior change. So mention it in the HISTORY.md would be better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the mention of this specific header from HISTORY.md, leaving just StringAppendOperator itself.

@facebook-github-bot
Copy link
Contributor

@zhichao-cao has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@@ -20,6 +20,8 @@ class MergeOperators {
static std::shared_ptr<MergeOperator> CreateUInt64AddOperator();
static std::shared_ptr<MergeOperator> CreateStringAppendOperator();
static std::shared_ptr<MergeOperator> CreateStringAppendOperator(char delim_char);
static std::shared_ptr<MergeOperator> CreateStringAppendOperator(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhichao-cao This header is not in the public API. Should it still be mentioned in HISTORY or not?

utilities/merge_operators/string_append/stringappend.h Outdated Show resolved Hide resolved
@facebook-github-bot
Copy link
Contributor

@east825 has updated the pull request. You must reimport the pull request before landing.

1 similar comment
@facebook-github-bot
Copy link
Contributor

@east825 has updated the pull request. You must reimport the pull request before landing.

@east825
Copy link
Contributor Author

east825 commented Jul 29, 2021

@mrambacher

I generally agree with these changes (and made many of the same ones in #8481).

My one concern is in the stringappend_test. The way it is set up now it does not look as if the old 'char' interface is ever invoked, which would reduce the test coverage. Is it possible to somehow restructure the tests so that both the new and old constructors can be exercised?

I went for the simplest solution, switching between the two constructors depending on the length of a delimiter specified in a test. I don't want to add another layer of indirection on top of StringAppendOperatorTest::OpenFuncPtr to accommodate two possible signatures. It will probably be difficult to wrap one's head around.

Copy link
Contributor

@mrambacher mrambacher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the fix! Can you rebase/update so the fix can be accepted?

east825 and others added 4 commits July 29, 2021 17:40
An arbitrary string can be used as a delimiter in "StringAppend" merge operator
flavor. In particular, it allows to use an empty string, effectively combining
binary values for the same key byte-to-byte one next to another.
Expecting the upcoming changes turning the operator into a customizable entity.
@east825 east825 force-pushed the east825/StringAppendOperator-accepts-strings-as-delimiters branch from 4fd7589 to 7783b51 Compare July 29, 2021 14:59
@facebook-github-bot
Copy link
Contributor

@east825 has updated the pull request. You must reimport the pull request before landing.

@east825
Copy link
Contributor Author

east825 commented Jul 29, 2021

I can squash everything into one changeset on my side if you want to. In some projects, it's done automatically, not sure which workflow is preferred here.

@mrambacher
Copy link
Contributor

I can squash everything into one changeset on my side if you want to. In some projects, it's done automatically, not sure which workflow is preferred here.

No need to squash. As long as you are up-to-date with master and there are no conflicts for the merge, this should be good to go.

@facebook-github-bot
Copy link
Contributor

@zhichao-cao has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

3 similar comments
@facebook-github-bot
Copy link
Contributor

@zhichao-cao has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@zhichao-cao has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@zhichao-cao has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

HISTORY.md Outdated Show resolved Hide resolved
@@ -259,7 +271,7 @@ TEST_P(StringAppendOperatorTest, SimpleTest) {
}

TEST_P(StringAppendOperatorTest, SimpleDelimiterTest) {
auto db = OpenDb('|');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the test, can you keep all the original tests and add the new test that uses the string? The old interface is still in use and we do not want to remove the tests until the interface is deprecated. Also, keep the std::shared_ptr OpenTtlDb(char delim_char).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the old interface is still used if the specified string is of length 1, addressing the problem @mrambacher pointed at.

It's a bit tricky to write completely independent tests using the new interface there since they are all parameterized and use two different database opening routines depending on the parameter in their SetUp. I believe it will complicate the setup step too much.

@facebook-github-bot
Copy link
Contributor

@east825 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@zhichao-cao has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@zhichao-cao merged this pull request in 8f52972.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants