Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend StringAppendTESTOperator to use string delimiter of variable length #4806

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

azagrebin
Copy link
Contributor

This PR extend StringAppendTESTOperator class to use std::string delimiter instead of char. char becomes then a custom case of std::string. This allows for delimiter of variable length, including zero length if delimiter is not needed at all and space can be saved. The PR also extends StringAppendOperator in Java API.

super(newSharedStringAppendTESTOperator(delim));
}

public StringAppendOperator(String delim) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should avoid the constructor without the charset.

Copy link
Contributor Author

@azagrebin azagrebin Dec 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not mind deleting it. I was just following usual shortcut idea of Java standard packages where this kind of methods use Charset.defaultCharset() if charset is not passed. Do you think that the user awareness about charset should be explicitly forced?

In case of StringAppendOperator(char delim), char is implicitly converted into ASCII in c++ JNI code.

private native static long newSharedStringAppendOperator(final char delim);
private native static long newSharedStringAppendTESTOperator(final byte[] delim);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this named with "TEST" ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be something to discuss. I tried not to change too much in general.

As I understand there are 2 string appending merge operators in c++ at the moment: StringAppendOperator and StringAppendTESTOperator. The former is the first implementation based on MergeOperator.FullMerge before having MergeOperator.FullMergeV2. The later is more efficient based on the newer MergeOperator.FullMergeV2 interface. I changed the newer one in c++. The Java StringAppendOperator.newSharedStringAppendOperator() also creates the older one and newSharedStringAppendTESTOperator creates the newer one. Basically, now it is in sync with c++ code to some extend but might be a bit confusing in Java code.

I thought about creating another separate Java MergeOperator class which is backed by StringAppendTESTOperator. It could be called e.g. StringAppendOperatorWithVariableDelimitor. I can refactor it if it makes sense.

@azagrebin
Copy link
Contributor Author

@adamretter
Thanks for review, I have left some thoughts about your concerns, any comments would be appreciated.

@azagrebin
Copy link
Contributor Author

@adamretter @sagar0
any update here?

@@ -18,6 +20,19 @@ public StringAppendOperator(char delim) {
super(newSharedStringAppendOperator(delim));
}

public StringAppendOperator(byte[] delim) {
super(newSharedStringAppendTESTOperator(delim));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems a bad idea that, depending on the constructor argument, a different implementation of the string append opertor is used. I wonder if instead it would be better to either i) have a different class in the java API that alwys goes against StringAppendTESTOperator or ii) replace SharedStringAppendOperator with SharedStringAppendTESTOperator in all places here. Is there any scenario where a user would prefer SharedStringAppendOperator over SharedStringAppendTESTOperator?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, while implementing this, I was not sure about all implications for the API.

ii) I do not know exact name origin and all backwards compatibility implications of using only SharedStringAppendTESTOperator, but it seems to me that it is more performant and it uses newer c++ merge interface, maybe @sagar0 could help to answer this question.

if ii) is not applicable, i) is what I suggested in the upper comment to add a separate java class StringAppendOperatorWithVariableDelimitor which uses SharedStringAppendTESTOperator if this looks ok for the API

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for the time being this approach is okay.

  • StringAppendOperator implements the Associative Merge Operator
  • StringAppendTESTOperator implements the Non-Associative Merge Operator API

.setMergeOperator(stringAppendOperator);
final RocksDB db = RocksDB.open(opt, dbFolder.getRoot().getAbsolutePath())) {
// Writing aa under key
db.put("key".getBytes(), "aa".getBytes());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also cover the corner case of value "".

@adamretter
Copy link
Collaborator

I think for the time being this approach is okay.

  • StringAppendOperator implements the Associative Merge Operator
  • StringAppendTESTOperator implements the Non-Associative Merge Operator API

I think in the near future we could improve on this:

  1. Rename C++ StringAppendOperator to StringAppendOperatorAssoc
  2. Rename C++ StringAppendTESTOperator to StringAppendOperatorNonAssoc
  3. Develop a newer performant C++ String Append Operator implementing FullMergeV2 and PartialMergeMulti, and which allows char or std::string delimiters.
  4. Update the Java API StringAppendOperator to consistently use the newer implementation.

@adamretter
Copy link
Collaborator

@siying @gfosco this one is ready to merge please

@mrambacher
Copy link
Contributor

For C++, this was addressed in #8536. I do not believe the Java/JNI code has been added.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants