Add STRING_BYTES_CHARSET define to explicitly set charset for mapping java.lang.String to/from bytes #460

equeim · 2021-02-14T20:14:25Z

Currently, JavaCPP uses either the default Java charset or Modified UTF-8 (if MODIFIED_UTF8_STRING is defined)
when mapping java.lang.String to/from bytes (or to/from containers like std::string via adapters).

Add a STRING_BYTES_CHARSET define to explicitly set the charset used in these conversions.
If defined, STRING_BYTES_CHARSET must be a string literal containing the name or an alias of a charset supported by Java (e.g. "UTF-8").
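To illustrate why the charset choice matters, here is a small standalone Java sketch. It is not JavaCPP code; the class CharsetDemo and its methods are hypothetical. It encodes the same String with a charset looked up by name or alias, the kind of value a STRING_BYTES_CHARSET literal such as "UTF-8" would carry, and with Modified UTF-8. DataOutputStream.writeUTF is used as a stand-in for the Modified UTF-8 format that JNI functions like GetStringUTFChars produce. The two encodings differ for NUL and for supplementary characters.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical demo class, not part of JavaCPP.
class CharsetDemo {
    // Encode using a charset looked up by canonical name or alias (e.g. "UTF-8", "utf8"),
    // the kind of value a STRING_BYTES_CHARSET literal would carry.
    static byte[] encode(String s, String charsetName) {
        return s.getBytes(Charset.forName(charsetName));
    }

    // Decode bytes back into a String with the same named charset.
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    // Modified UTF-8 as written by DataOutputStream.writeUTF, minus its 2-byte length prefix.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try {
            new DataOutputStream(buf).writeUTF(s);
        } catch (IOException e) {
            // ByteArrayOutputStream does not throw; this only satisfies the checked exception.
            throw new UncheckedIOException(e);
        }
        byte[] all = buf.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    public static void main(String[] args) {
        String s = "a\u0000\uD83D\uDE00"; // 'a', NUL, and U+1F600 as a surrogate pair
        System.out.println(encode(s, "UTF-8").length);                    // 1 + 1 + 4 bytes
        System.out.println(modifiedUtf8(s).length);                       // 1 + 2 + 3 + 3 bytes
        System.out.println(decode(encode(s, "utf8"), "UTF-8").equals(s)); // alias round trip
    }
}
```

Running main prints 6, 9, and true: standard UTF-8 encodes NUL as one byte and U+1F600 as four, while Modified UTF-8 uses two bytes (0xC0 0x80) for NUL and three bytes for each half of the surrogate pair. Native code expecting one format but receiving the other would misread such strings.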


saudet · 2021-02-15T03:24:14Z

src/main/java/org/bytedeco/javacpp/tools/Generator.java

@@ -497,6 +498,14 @@ boolean classes(boolean handleExceptions, boolean defineAdapters, boolean conver
        out.println("static jmethodID JavaCPP_stringMID = NULL;");
        out.println("static jmethodID JavaCPP_getBytesMID = NULL;");
        out.println("static jmethodID JavaCPP_toStringMID = NULL;");
+        out.println("#ifdef STRING_BYTES_CHARSET");
+        out.println("#ifdef MODIFIED_UTF8_STRING");
+        out.println("#error \"STRING_BYTES_CHARSET and MODIFIED_UTF8_STRING must not be defined together\"");


Couldn't we just emit a warning here instead? I can see MODIFIED_UTF8_STRING taking priority over STRING_BYTES_CHARSET for backward compatibility.

I don't really see any value in that. These features are mutually exclusive, and to set the String charset, users will need to add a new define, which means they most likely wouldn't have any trouble removing the other one at the same time.

Also, the C++ standard doesn't specify a preprocessor directive for compile-time warnings, only for errors (#error). This means we would have to add our own macro that delegates to compiler-specific directives.

They are mutually exclusive, but say you have an upstream project and your downstream project is using MODIFIED_UTF8_STRING. For some reason, the upstream project starts using STRING_BYTES_CHARSET, which would now prevent the downstream project from compiling. With only a warning though, both projects can continue to function properly.
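The compatibility argument above can be modeled as a simple priority rule. The Java sketch below is hypothetical (StringModeResolver is not part of JavaCPP) and encodes the rule proposed in this thread: MODIFIED_UTF8_STRING wins over STRING_BYTES_CHARSET, and defining both only triggers a warning. It is not taken from the merged code.

```java
// Hypothetical model of the priority rule proposed in this review thread.
final class StringModeResolver {
    enum Mode { MODIFIED_UTF8, EXPLICIT_CHARSET, DEFAULT_CHARSET }

    // modifiedUtf8Defined mirrors "#ifdef MODIFIED_UTF8_STRING";
    // charsetName mirrors the STRING_BYTES_CHARSET literal, or null when it is undefined.
    static Mode resolve(boolean modifiedUtf8Defined, String charsetName) {
        if (modifiedUtf8Defined) {
            return Mode.MODIFIED_UTF8;    // the older define wins for backward compatibility
        }
        if (charsetName != null) {
            return Mode.EXPLICIT_CHARSET; // use the named charset
        }
        return Mode.DEFAULT_CHARSET;      // fall back to the default Java charset
    }

    // Defining both is reported but does not stop the build.
    static boolean shouldWarn(boolean modifiedUtf8Defined, String charsetName) {
        return modifiedUtf8Defined && charsetName != null;
    }
}
```

In the upstream/downstream scenario, resolve(true, "UTF-8") still returns MODIFIED_UTF8, so the downstream project keeps its old behavior and only sees a warning.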

Most compilers support the #warning directive, and for those that don't, it's just going to end up in an error, so that's fine, right? :)

Ah, I see the problem. MSVC is the one that still doesn't support #warning. On the other hand, MSVC supports #pragma message, and Clang and GCC have supported it for a while now, so let's use that. In the worst case, it just gets ignored, which is still better than a failed build.
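Following the suggestion above, a generator could emit the conflict check as a #pragma message rather than #error. The sketch below mimics the out.println style of Generator.java, but GuardEmitter, emitCharsetGuard, and the message text are hypothetical; the exact lines JavaCPP emits after commit 3e1a097 are not shown in this thread.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical emitter in the style of Generator.java; not the actual JavaCPP code.
final class GuardEmitter {
    static String emitCharsetGuard() {
        StringWriter sw = new StringWriter();
        PrintWriter out = new PrintWriter(sw);
        // Standard preprocessor condition: both defines present.
        out.println("#if defined(STRING_BYTES_CHARSET) && defined(MODIFIED_UTF8_STRING)");
        // Compiler-specific, but understood by MSVC, GCC, and Clang; a warning, not an error.
        out.println("#pragma message(\"STRING_BYTES_CHARSET and MODIFIED_UTF8_STRING should not be defined together\")");
        out.println("#endif");
        out.flush();
        return sw.toString();
    }

    public static void main(String[] args) {
        System.out.print(emitCharsetGuard());
    }
}
```

Only the #pragma line is compiler-dependent here; compilers that don't recognize it may ignore it or warn about an unknown pragma, which fails the build only if warnings are treated as errors.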

saudet reviewed Feb 15, 2021


saudet added 3 commits March 1, 2021 20:28

Change #error for a #pragma message warning (3e1a097)
Merge remote-tracking branch 'upstream/master' into string-charset (131d78e)
Update CHANGELOG.md (f76e799)

saudet merged commit 5c976fc into bytedeco:master Mar 1, 2021

saudet mentioned this pull request Mar 1, 2021

NullPointerException in Parser and poor Unicode support #70 (closed)


Review timeline: equeim commented Feb 14, 2021; saudet Feb 15, 2021; equeim Feb 17, 2021; saudet Feb 18, 2021; saudet Mar 1, 2021.
