-
Notifications
You must be signed in to change notification settings - Fork 6.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extend StringAppendTESTOperator to use string delimiter of variable length #4806
base: main
Are you sure you want to change the base?
Conversation
super(newSharedStringAppendTESTOperator(delim)); | ||
} | ||
|
||
public StringAppendOperator(String delim) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should avoid the constructor without the charset
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not mind deleting it. I was just following usual shortcut idea of Java standard packages where this kind of methods use Charset.defaultCharset()
if charset
is not passed. Do you think that the user awareness about charset
should be explicitly forced?
In case of StringAppendOperator(char delim)
, char
is implicitly converted into ASCII in c++ JNI code.
private native static long newSharedStringAppendOperator(final char delim); | ||
private native static long newSharedStringAppendTESTOperator(final byte[] delim); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this named with "TEST" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be something to discuss. I tried not to change too much in general.
As I understand there are 2 string appending merge operators in c++ at the moment: StringAppendOperator
and StringAppendTESTOperator
. The former is the first implementation based on MergeOperator.FullMerge
before having MergeOperator.FullMergeV2
. The later is more efficient based on the newer MergeOperator.FullMergeV2
interface. I changed the newer one in c++. The Java StringAppendOperator.newSharedStringAppendOperator()
also creates the older one and newSharedStringAppendTESTOperator
creates the newer one. Basically, now it is in sync with c++ code to some extend but might be a bit confusing in Java code.
I thought about creating another separate Java MergeOperator
class which is backed by StringAppendTESTOperator
. It could be called e.g. StringAppendOperatorWithVariableDelimitor
. I can refactor it if it makes sense.
@adamretter |
@adamretter @sagar0 |
@@ -18,6 +20,19 @@ public StringAppendOperator(char delim) { | |||
super(newSharedStringAppendOperator(delim)); | |||
} | |||
|
|||
public StringAppendOperator(byte[] delim) { | |||
super(newSharedStringAppendTESTOperator(delim)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems a bad idea that, depending on the constructor argument, a different implementation of the string append opertor is used. I wonder if instead it would be better to either i) have a different class in the java API that alwys goes against StringAppendTESTOperator
or ii) replace SharedStringAppendOperator
with SharedStringAppendTESTOperator
in all places here. Is there any scenario where a user would prefer SharedStringAppendOperator
over SharedStringAppendTESTOperator
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, while implementing this, I was not sure about all implications for the API.
ii) I do not know exact name origin and all backwards compatibility implications of using only SharedStringAppendTESTOperator
, but it seems to me that it is more performant and it uses newer c++ merge interface, maybe @sagar0 could help to answer this question.
if ii) is not applicable, i) is what I suggested in the upper comment to add a separate java class StringAppendOperatorWithVariableDelimitor
which uses SharedStringAppendTESTOperator
if this looks ok for the API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for the time being this approach is okay.
StringAppendOperator
implements the Associative Merge OperatorStringAppendTESTOperator
implements the Non-Associative Merge Operator API
.setMergeOperator(stringAppendOperator); | ||
final RocksDB db = RocksDB.open(opt, dbFolder.getRoot().getAbsolutePath())) { | ||
// Writing aa under key | ||
db.put("key".getBytes(), "aa".getBytes()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could also cover the corner case of value ""
.
I think for the time being this approach is okay.
I think in the near future we could improve on this:
|
For C++, this was addressed in #8536. I do not believe the Java/JNI code has been added. |
This PR extend StringAppendTESTOperator class to use
std::string
delimiter instead ofchar
.char
becomes then a custom case ofstd::string
. This allows for delimiter of variable length, including zero length if delimiter is not needed at all and space can be saved. The PR also extendsStringAppendOperator
in Java API.