Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow keys with internal nul chars in rust bindings #220

Merged
merged 9 commits into from
May 13, 2021

Conversation

gmossessian
Copy link
Contributor

Due to difference between Rust string and C string, it is possible to cause panic by submitting a string with a nul character in the middle to dictionary::get.

This PR returns empty match instead.

Copy link
Contributor

@hendrikmuhs hendrikmuhs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@narekgharibyan is the maintainer of the C and rust binding.

However, I have reservations regarding the approach. The limitation is a bug in the rust binding, keyvi supports zero bytes. Returning an empty match as proposed silences a bug. I would understand if the rust binding errors in some way and this error is handled on client side without causing panic.

Would it be possible to provide an additional get method in the C-API that takes a pointer and a length? I saw a container called keyvi_bytes in the code used in a different context but basically implementing this approach.

rust/Cargo.toml Outdated
@@ -1,6 +1,6 @@
[package]
name = "keyvi"
version = "0.5.3"
version = "0.5.4"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this shouldn't be part of this PR.

We bump versions as part of a release, there are some more things to do for that (branching, tagging, ...)

Having that said, we can create a patch release soonish if you need the fix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you, removed version bump in 5dec183

@narekgharibyan
Copy link
Member

@gmossessian thanks for submitting this PR.

I agree with @hendrikmuhs that better approach to solve this would be to make use of keyvi_bytes and pass a binary representation of str down to keyvi cpp layer. Moreover this will also eliminate additional memory allocation of creating a CString.

Would it be possible to provide an additional get method in the C-API that takes a pointer and a length?

I think there is no need for additional API, and changing existing one is good enough. I don't think we really need backward compatibility here, but it's not hard to add an overload like:

struct keyvi_match* keyvi_dictionary_get(const struct keyvi_dictionary*, const keyvi_bytes);

@hendrikmuhs wdyt ?


And one semi related question.

keyvi itself does support arbitrary binary key, but rust API currently limits that to utf-8 strings by using str type.
On the other hand on python side we already have limitation of accepting only utf-8 strings as keys.

So I think it's fine to leave rust API as is for now.

@hendrikmuhs
Copy link
Contributor

Would it be possible to provide an additional get method in the C-API that takes a pointer and a length?

I think there is no need for additional API, and changing existing one is good enough. I don't think we really need backward compatibility here, but it's not hard to add an overload like:

struct keyvi_match* keyvi_dictionary_get(const struct keyvi_dictionary*, const keyvi_bytes);

@hendrikmuhs wdyt ?

sounds good to me. I don't have strong feelings regarding keeping the old API. It depends on whether somebody uses the C-API for something else. I am not aware of such a case, neither do we advocate for it. Therefore I leave it up to you.

And one semi related question.

keyvi itself does support arbitrary binary key, but rust API currently limits that to utf-8 strings by using str type.
On the other hand on python side we already have limitation of accepting only utf-8 strings as keys.

Good point, I think the python API was designed to be as convenient as possible. Not too long ago we dropped python 2 support. That means string and binary handling became much easier and we could actually add support for byte objects, if we have a usecase for it.

@narekgharibyan
Copy link
Member

Therefore I leave it up to you.

Then I'd say no need for a new one, and we can just change existing one.

That means string and binary handling became much easier and we could actually add support for byte objects, if we have a usecase for it.

Cool, then let's keep rust API part as is for now, and change it in the future if need be.

@@ -37,6 +37,14 @@ mod tests {
assert_eq!(m.matched_string(), "a");
}

#[test]
fn dictionary_get_nulerror() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gmossessian seems this is the formatting issue, failing CI.

Note: you can format the by simply doing:

cargo fmt

@gmossessian
Copy link
Contributor Author

gmossessian commented May 11, 2021

Thanks for the discussion @hendrikmuhs and @narekgharibyan, I agree this is a better approach. I'll take a look at the library and re-request review when it's ready :)

@nararekgharibyan cargo fmt does not return any errors for me locally, furthermore I cannot see CI, I'm not sure it can be run on this PR without approval as I am a first-time contributor. nvm

@gmossessian gmossessian changed the title return empty match instead of NulError panic in rust dictionary binding allow keys with internal nul chars in rust bindings May 12, 2021
Copy link
Contributor

@hendrikmuhs hendrikmuhs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, it was my suggestion to use keyvi_bytes, actually we don't need it.

(FWIW: its odd that we still create a copy, because the the API requires a std::string, but lets ignore that for now, we can switch the dictionary getter to a string_view, but that requires C++17)

@@ -109,8 +109,12 @@ char* keyvi_dictionary_get_statistics(const keyvi_dictionary* dict) {
return std_2_c_string(dict->obj_->GetStatistics());
}

keyvi_match* keyvi_dictionary_get(const keyvi_dictionary* dict, const char* key) {
return new keyvi_match(dict->obj_->operator[](key));
keyvi_match* keyvi_dictionary_get(const keyvi_dictionary* dict, const keyvi_bytes key) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now that I see the API, I realize we don't need to wrap and unwrap it:

keyvi_match* keyvi_dictionary_get(const keyvi_dictionary* dict, const char* key, size_t length)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point! done in 6073fdd

Copy link
Contributor

@hendrikmuhs hendrikmuhs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! LGTM

@gmossessian
Copy link
Contributor Author

Thank you @hendrikmuhs! I'll wait for @narekgharibyan's final approval as well.

Copy link
Member

@narekgharibyan narekgharibyan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gmossessian thanks for fixing it. LGTM!

@narekgharibyan narekgharibyan merged commit b6dcf45 into KeyviDev:master May 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants