-
Notifications
You must be signed in to change notification settings - Fork 5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CAMEL-21199] Camel-jackson not properly marshalling 4-byte characters #15515
Conversation
🌟 Thank you for your contribution to the Apache Camel project! 🌟 🤖 CI automation will test this PR automatically. 🐫 Apache Camel Committers, please review the following items:
|
This should be fixed in jackson instead of workaround in camel that can affect others. |
@davsclaus I'll try to provide a fix for Jackson, but if you see the history, it's open from 2015 and there's not much effort to fix it from their side. Several PRs were even rejected as too dangerous for Jackson-core :( |
Yes and if its dangerous for jackson, then why is it not dangerous for camel ? What you can/should do is to create a JIRA ticket, and add a new option to the dataformat, where you can configure this with a boolean (default false) and then only use this new code if enabled. Then existing is not affected at all. |
It was claimed dangerous by Jackson authors to mess with UTF8JsonGenerator (which containts the error, but is also fragile). This solution forces Jackson to use WriterBasedJsonGenerator inside, which seems to work fine.
Seems like an overkill to me |
If its so non dangerous why has jackson not fixed it already. Please respect the large existing user-base of Camel that dont want hacks in commonly used components. Also we need unit tests to go with that thanks. |
Don't ask me. Ask them :) |
@davsclaus I agree that it should be fixed in Jackson. Sadly, it hasn't been fixed for a long time and they don't consider it much of an issue. This hack as you call it just tries to overcome the error by invoking different string-writing mechanism (on their side) and things should be same as soon as calling Jackson marshal method ends. I see no point in providing an option. It's hardly an option for Camel users. It's either fixed, or not.
It cannot be unit-tested. It's integration issue. If you don't want this fix (which I can understand), feel free to close it. I'll then try to create a fix for Jackson itself. But I doubt they will accept it... |
I am sorry but we cannot risk affecting everyone else with some code change that Jackson itself does have out of the box, and its been outstanding since 2015. Would such a change cause unforseen changes to existing users, performance degration or what else. Are there other reported tickets at Jackson than this single instance?
What we ask is that there is a junit test class in the PR that tests this code change.
Maybe take more time to investigate if there are other reports on Jackson. But in the mean time there is only a solution you can do and that is on the Camel side. But that requires more work which you then must step up and do. And it would require a JIRA ticket in the Apache Camel JIRA at |
I'm not sure if we talk about the same thing here. Jackson offers two string generators - UTF8JsonGenerator (which we are using at the moment and which contains the error) and WriterBasedJsonGenerator (which seems to handle 4-byte characters fine). I don't see switching from one to another risky for Camel users, since they should behave exactly the same (not speaking about the error itself). |
5ffcae6
to
0bc57cc
Compare
@davsclaus I've added a test for the fix. Also, the Camel JIRA issue has been created. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likely need to close the OutputStreamWriter
... waiting for clarifications.
...amel-jackson/src/main/java/org/apache/camel/component/jackson/AbstractJacksonDataFormat.java
Outdated
Show resolved
Hide resolved
...amel-jackson/src/main/java/org/apache/camel/component/jackson/AbstractJacksonDataFormat.java
Outdated
Show resolved
Hide resolved
BTW, I am only reviewing the technicalities here. @davsclaus points are a lot more important than my notes. |
.../src/test/java/org/apache/camel/component/jackson/JacksonMarshalMultibyteCharactersTest.java
Outdated
Show resolved
Hide resolved
@orpiske That's fine. Thak you very much for your review! |
We use unicode escape to have foregin symbols in the camel source code. Can you see if you can replace those to escaped values in this PR unit test. |
@Test | ||
public void testMarshal4ByteCharacter() throws Exception { | ||
Map<String, Object> in = new HashMap<>(); | ||
in.put("name", "システム\uD867\uDE3D"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use unicode escape to have foregin symbols in the camel source code.
Can you see if you can replace those to escaped values in this PR unit test.
Added option |
On main branch we have the new option |
@rnetuka what is the status of this ? |
PR for Jackson issue 223 has been merged: FasterXML/jackson-core#1335. There will be a new Jackson option to combine utf-16 surrogate characters into a one unicode char in their UTF8JsonGenerator. I'm closing this PR and will create a replacement with the new option being supported. |
Issue: https://issues.apache.org/jira/browse/CAMEL-21199
Jackson doesn't handle 4-byte characters well. Marshalling a 4-byte Japanese kanji character results in two UTF-16 escapes to be written instead of the character itself. While this is ok for emoji an such, it's not for natural languages.
Jackson issue: FasterXML/jackson-core#223
Reproducer:
with the file input.txt containing:
[{"name": "システム𩸽"}]
Solution:
While it's an error in Jackson-core, it could be avoided by forcing Jackson to use WriterBasedJsonGenerator instead of UTF8JsonGenerator (see the original Jackson issue reproducer in the topmost comment)