Remove nunicode from android binding #12497
Our Android Collator implementation currently uses nunicode for "diacritic folding", not for case shifting, so I don't know a straightforward way to apply the same change there. I'm definitely not saying it's impossible, just that I haven't figured it out.
Did you run the tests in https://github.com/mapbox/mapbox-gl-native/blob/master/test/util/text_conversions.test.cpp?
@@ -0,0 +1,21 @@
#include <mbgl/util/platform.hpp>
Let's rename this file, since it's no longer "stdlib"
platform/android/src/java/lang.cpp
jni::Object<String> String::New(JNIEnv& env, jni::String value) {
    static auto constructor = String::javaClass.GetConstructor<jni::String>(env);
    return String::javaClass.New(env, constructor, value);
}
Why do we need this function?
jni::String is an Object; to be able to hook into toUpperCase/toLowerCase, I'm creating mbgl::android::java::lang::String. I would prefer not to create a new String for each uppercase/lowercase call. Any pointers on how to get this working?
If you remove static constexpr auto Name() { return "java/lang/String"; }; and change jni::Class<String> to jni::Class<jni::StringTag>, you will be able to use value.Call(env, method) directly.
Thanks @jfirebaugh! This makes total sense.
I was having issues running them; I'm making it a priority to run our gtest suite on Android on CI.
Is there anything that's blocking this PR?
Force-pushed from afc8838 to cef6472.
@kkaefer: I addressed #12497 (comment) and would love some input on #12497 (comment), if possible.
Force-pushed from cef6472 to 98a3c0a.
@ChrisLoer, I've been looking at what it would take to remove nunicode completely. I was able to get it to compile by providing a platform-specific implementation of … Thoughts?
The idea sounds attractive to me, but don't you need something that can denormalize instead of normalize? My understanding of the nunicode implementation is that it essentially denormalizes the text and then strips out all the combining characters that are in one of the "accent" blocks:
If you're going to work on normalized text, then you'd have to maintain a much larger map of character transformations.
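A minimal Java sketch of the decompose-then-strip idea described above, using only the JDK: java.text.Normalizer performs the NFD decomposition and Character.UnicodeBlock decides which code points count as accents. The class name and the exact set of blocks are assumptions for illustration; this is not a port of nunicode.

```java
import java.lang.Character.UnicodeBlock;
import java.text.Normalizer;

// Hypothetical sketch: decompose, then drop code points from "accent" blocks.
final class AccentFoldingSketch {
    private AccentFoldingSketch() {}

    // Blocks treated as accents here; "Extended" is left out because its
    // constant is not available on every Java runtime.
    static boolean isAccentMark(int codePoint) {
        UnicodeBlock block = UnicodeBlock.of(codePoint);
        return block == UnicodeBlock.COMBINING_DIACRITICAL_MARKS
            || block == UnicodeBlock.COMBINING_DIACRITICAL_MARKS_SUPPLEMENT
            || block == UnicodeBlock.COMBINING_MARKS_FOR_SYMBOLS;
    }

    static String unaccent(String value) {
        // NFD splits precomposed letters, e.g. U+00E9 becomes 'e' + U+0301.
        String decomposed = Normalizer.normalize(value, Normalizer.Form.NFD);
        StringBuilder out = new StringBuilder(decomposed.length());
        // Keep every code point that is not in one of the accent blocks.
        decomposed.codePoints()
            .filter(cp -> !isAccentMark(cp))
            .forEach(out::appendCodePoint);
        return out.toString();
    }
}
```

For example, unaccent("Ångström") yields "Angstrom", while Cyrillic text without precomposed accents, such as "Москва", passes through unchanged.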
Looks like it supports passing
Thanks for the input! I was able to get it working, and the render tests confirm this: [render test images: Actual, Expected, Diff]. I took the approach of removing diacritics with a regex, as shown in this blog post. I'll clean up the code and merge it into this PR.
platform/android/src/unaccent.cpp
jni::String unaccented = android::java::lang::String::replaceAll(*env, normalized,
    jni::Make<jni::String>(*env, "\\p{InCombiningDiacriticalMarks}+"),
    jni::Make<jni::String>(*env, ""));
return jni::Make<std::string>(*env, unaccented);
Could we implement an unaccent(String str) function in com.mapbox.mapboxsdk.utils.StringUtils that does all of that instead of going back and forth through the JNI? That could potentially lower the binary footprint a bit.
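A sketch of what such a helper could look like, assuming the NFD-plus-regex approach mentioned earlier; StringUtilsSketch is a hypothetical name, not the SDK's StringUtils. It precompiles the pattern because, per its Javadoc, String.replaceAll is equivalent to compiling the regex on every call, which adds up if unaccent runs in a loop. The native side would then need only one JNI call per string.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

// Hypothetical helper; not com.mapbox.mapboxsdk.utils.StringUtils itself.
final class StringUtilsSketch {
    // Compiled once and reused; String.replaceAll would recompile it per call.
    private static final Pattern COMBINING_MARKS =
        Pattern.compile("\\p{InCombiningDiacriticalMarks}+");

    private StringUtilsSketch() {}

    // Decomposes to NFD, then deletes runs of combining diacritical marks.
    static String unaccent(String value) {
        String decomposed = Normalizer.normalize(value, Normalizer.Form.NFD);
        return COMBINING_MARKS.matcher(decomposed).replaceAll("");
    }
}
```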
Yes, this is a much simpler solution; addressed in 5b326ef.
Force-pushed from d52559c to 5b326ef.
platform/android/src/unaccent.cpp
std::string unaccent(const std::string& str) {
    android::UniqueEnv env = android::AttachEnv();
    jni::String unaccented = android::StringUtils::unaccent(*env, jni::Make<jni::String>(*env, str));
Do we need to delete local refs here to avoid local reference table overflows in tight loops?
Yes, looking forward to using jni::SeizeLocal to avoid these issues!
It's going to be even better than that: mapbox/jni.hpp#40.
Force-pushed from 5b326ef to f13eee1.
@NonNull
static String unaccent(@NonNull String value) {
    return Normalizer.normalize(value, Normalizer.Form.NFD)
        .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
Awesome! From https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html, it looks like you can match any Unicode block, in which case we can get a little closer to the nunicode implementation by also matching against Combining Diacritical Marks Extended/for Symbols/Supplement.
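To see what the extra blocks buy, here is a hedged comparison under the JDK's regex engine (Android's engine may behave differently). It strips either the basic block only or the basic block plus "for Symbols" and "Supplement"; the class and field names are hypothetical.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

// Hypothetical comparison of a narrow and a wider set of accent blocks.
final class AccentBlocksSketch {
    // Only "Combining Diacritical Marks" (U+0300..U+036F).
    static final Pattern BASIC = Pattern.compile("\\p{InCombiningDiacriticalMarks}+");

    // Adds "... for Symbols" (U+20D0..U+20FF) and "... Supplement" (U+1DC0..U+1DFF).
    // "Extended" (U+1AB0..U+1AFF) is omitted: not every Java runtime knows that name.
    static final Pattern WIDER = Pattern.compile(
        "[\\p{InCombiningDiacriticalMarks}"
            + "\\p{InCombiningDiacriticalMarksForSymbols}"
            + "\\p{InCombiningDiacriticalMarksSupplement}]+");

    private AccentBlocksSketch() {}

    // Decompose first so precomposed letters expose their marks, then strip.
    static String strip(Pattern marks, String value) {
        String decomposed = Normalizer.normalize(value, Normalizer.Form.NFD);
        return marks.matcher(decomposed).replaceAll("");
    }
}
```

A vector arrow such as "v" followed by U+20D7 (COMBINING RIGHT ARROW ABOVE) survives BASIC but is reduced to "v" by WIDER; both patterns fold "café" to "cafe".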
@ChrisLoer, I'm having a hard time figuring out the correct configuration for the above 😅 Typography is clearly not my cup of tea. Based on the above, should I allow any Unicode character, i.e. something like "[^\\p{ASCII}]"? Or do I need to allow multiple \\p{} items (for example, InCombiningDiacriticalMarks combined with the symbols/supplement configurations from https://www.regular-expressions.info/unicode.html)?
Err, I think it'll be something like this?
(\\p{InCombiningDiacriticalMarks}|\\p{InCombiningDiacriticalMarksExtended}|\\p{InCombiningDiacriticalMarksForSymbols}|\\p{InCombiningDiacriticalMarksSupplement})+
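The difference between the two options in the question can be shown with a small, hypothetical comparison (JDK regex engine assumed). Removing everything outside ASCII also deletes letters from non-Latin scripts, whereas matching only the accent blocks leaves them alone. The "Extended" block is left out of the alternation here because that block name is not recognized by every Java runtime, an assumption worth checking on the target Android versions.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

// Hypothetical comparison of "strip all non-ASCII" vs "strip accent blocks".
final class AsciiVersusBlocksSketch {
    // Option 1: drop every character outside the ASCII range.
    static final Pattern NON_ASCII = Pattern.compile("[^\\p{ASCII}]");

    // Option 2: drop only runs of combining marks from the accent blocks.
    static final Pattern ACCENT_BLOCKS = Pattern.compile(
        "(\\p{InCombiningDiacriticalMarks}"
            + "|\\p{InCombiningDiacriticalMarksForSymbols}"
            + "|\\p{InCombiningDiacriticalMarksSupplement})+");

    private AsciiVersusBlocksSketch() {}

    static String strip(Pattern pattern, String value) {
        String decomposed = Normalizer.normalize(value, Normalizer.Form.NFD);
        return pattern.matcher(decomposed).replaceAll("");
    }
}
```

Both patterns turn "café" into "cafe", but for "café Москва" NON_ASCII leaves only "cafe " while ACCENT_BLOCKS returns "cafe Москва".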
Huh, I wonder why "Extended" wasn't included (although the documentation is pretty vague on the naming; it sounds like you just strip the spaces from the Unicode block name). That will make for a small difference from the nunicode implementation, but we already tolerate differences in collator behavior from device to device and locale to locale, so I think it should be OK.
Force-pushed from f13eee1 to 64bd64c.
Force-pushed from a11979e to 21fc53f.
21fc53f updates this PR to be compatible with jni 0.4.0.
Force-pushed from 21fc53f to ff68253.
platform/android/src/string_util.cpp
std::string uppercase(const std::string& str) {
    auto env{ android::AttachEnv() };
    jni::Local<jni::String> value = jni::Make<jni::String>(*env, str.c_str());
    static auto& javaClass = jni::Class<jni::StringTag>::Singleton(*env);
Let's fold this line into the next one to avoid the need for another static variable. Same for lowercase.
Force-pushed from ff68253 to 67f605f.
…r uppercasing and lowercasing with an Android-specific String.java equivalent
Force-pushed from 67f605f to 4e49c7b.
This PR is the first step toward removing nunicode from the Mapbox Maps SDK for Android. Removing this C++ library would reduce the overall binary size of the SDK.
In this PR, we replace the nunicode uppercase and lowercase invocations by wiring through JNI to the Android platform's java/lang/String#toUpperCase() and java/lang/String#toLowerCase() instead. Note, though, that we can't remove nunicode yet, as it's still used for collator expressions (#12268).
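A hedged note on the Java side of this wiring: the no-argument String#toUpperCase() and String#toLowerCase() use the default locale, so the same input can map differently from device to device. The wrappers below are hypothetical and only make the locale explicit.

```java
import java.util.Locale;

// Hypothetical wrappers that make the locale of case mapping explicit.
final class CaseMappingSketch {
    private CaseMappingSketch() {}

    // toUpperCase() without arguments is toUpperCase(Locale.getDefault()).
    static String upper(String value, Locale locale) {
        return value.toUpperCase(locale);
    }

    // Likewise, toLowerCase() without arguments uses the default locale.
    static String lower(String value, Locale locale) {
        return value.toLowerCase(locale);
    }
}
```

With Locale.ROOT, "istanbul" uppercases to "ISTANBUL", but with the Turkish locale it becomes "İSTANBUL" (dotted capital I). German "straße" uppercases to "STRASSE", so the output length can change too.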
@ChrisLoer, would it be possible to move that implementation to Collator.java in the same way this PR does for uppercasing/lowercasing?
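A hedged sketch of how a Java-side comparison might cover the collator's case and diacritic options with java.text.Collator instead of nunicode. Collator strengths only nest (PRIMARY ignores accents and case, SECONDARY ignores case, TERTIARY distinguishes both), so the case-sensitive, diacritic-insensitive combination is handled here by folding accents before comparing. The class and method names are hypothetical, and Android's ICU-backed Collator may order some strings differently from the JDK.

```java
import java.text.Collator;
import java.text.Normalizer;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical sketch; not the SDK's Collator implementation.
final class CollatorSketch {
    private static final Pattern COMBINING_MARKS =
        Pattern.compile("\\p{InCombiningDiacriticalMarks}+");

    private CollatorSketch() {}

    // Same folding idea as the unaccent helper: NFD, then drop combining marks.
    static String unaccent(String value) {
        String decomposed = Normalizer.normalize(value, Normalizer.Form.NFD);
        return COMBINING_MARKS.matcher(decomposed).replaceAll("");
    }

    // Returns a negative, zero or positive value, like Comparator#compare.
    static int compare(String lhs, String rhs, boolean caseSensitive,
                       boolean diacriticSensitive, Locale locale) {
        // Configured per call for simplicity; a real implementation might cache.
        Collator collator = Collator.getInstance(locale);
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        if (caseSensitive) {
            // TERTIARY distinguishes case and accents, so fold accents first
            // when they should be ignored.
            collator.setStrength(Collator.TERTIARY);
            if (!diacriticSensitive) {
                lhs = unaccent(lhs);
                rhs = unaccent(rhs);
            }
        } else {
            // SECONDARY ignores case but keeps accents; PRIMARY ignores both.
            collator.setStrength(diacriticSensitive ? Collator.SECONDARY : Collator.PRIMARY);
        }
        return collator.compare(lhs, rhs);
    }
}
```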