i18n.properties responds with wrong charset #340

Closed
htammen opened this issue May 14, 2019 · 10 comments · Fixed by SAP/ui5-server#201
Labels: bug (Something isn't working), module/ui5-server (Related to the UI5 Server module)

htammen commented May 14, 2019

Expected Behavior

When using German umlauts like ä, ü, ö or other language-specific characters in an i18n.properties file, they are expected to be displayed as such in the running UI5 application.

Current Behavior

When an umlaut is used, as in "Anträge", it is displayed as "AntrÃ¤ge".

Steps to reproduce the issue

  1. create a text resource in i18n.properties with an Umlaut, e.g. "requests=Anträge" and save the file with UTF-8 charset
  2. use the text resource in an XML view of a UI5 app, e.g. <Text text="{i18n>requests}" />
  3. run the app

This happens both when using "ui5 serve" to serve the development state of the app and when using "ui5 build" to build the app and then serving the content of the dist folder with a web server.

The i18n.properties file is delivered with Content-Type: text/plain; charset=ISO-8859-1.
If I use, e.g., WebIDE to run my application, the i18n.properties file is delivered with Content-Type: text/plain.
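The first symptom can be reproduced without a browser. A minimal Python sketch, assuming the browser simply decodes the response bytes with the charset announced in the Content-Type header:

```python
# "Anträge" as saved by the editor with UTF-8 encoding
utf8_bytes = "Antr\u00e4ge".encode("utf-8")  # ä becomes the two bytes 0xC3 0xA4

# The server announces charset=ISO-8859-1, so every byte is read as one character
shown = utf8_bytes.decode("iso-8859-1")

print(shown)  # AntrÃ¤ge
```

Each of the two UTF-8 bytes of "ä" turns into its own ISO-8859-1 character ("Ã" and "¤"), which is exactly the garbling reported above.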

When I save the i18n.properties file with charset "ISO 8859-1" and use UI5 tooling to serve the app, the result in the browser looks like this: Antr�ge
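The replacement character in this second case is what a UTF-8 decoder produces for a lone ISO-8859-1 byte. A minimal sketch, assuming some step between the file and the screen decodes the ISO-8859-1 bytes as UTF-8:

```python
# "Anträge" saved with ISO-8859-1: ä is the single byte 0xE4
latin1_bytes = "Antr\u00e4ge".encode("iso-8859-1")

# 0xE4 announces a 3-byte UTF-8 sequence, but "g" is not a continuation byte,
# so a UTF-8 decoder substitutes U+FFFD (the replacement character)
shown = latin1_bytes.decode("utf-8", errors="replace")

print(shown)  # Antr�ge
```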

Context

  • UI5 Module Version (output of ui5 --version when using the CLI): 1.4.0
  • Node.js Version: v10.15.0
  • npm Version: 6.6.0
  • OS/Platform: Mac OS Mojave 10.14.4
  • Browser (if relevant): Chrome 74.0.3729.131, Firefox 66.0.3, Safari 12.1
  • Other information:

Affected components (if known)

Log Output / Stack Trace

@htammen htammen changed the title i18n.properties responded with wrong charset i18n.properties responds with wrong charset May 14, 2019

htammen commented May 15, 2019

Regarding the conventions, take a look at: https://ui5.sap.com/#/topic/753b32617807462d9af483a437874b36

  • Text files must be UTF-8 encoded (HANA); only *.properties and *.hdbtextbundle files must be ISO8859-1 encoded as defined in the corresponding standard.

So when I save my i18n.properties files with ISO-8859-1 encoding, the output of the serve and build commands should be correct.
But, as mentioned above, it is not.

NoelHendrikx commented:

When saving the properties files as ISO 8859-1, Edge works fine, but Chrome doesn't.

RandomByte commented:

You are right that your .properties files must be ISO 8859-1 encoded.

While ISO 8859-1 supports umlauts, from my understanding Chrome and other browsers convert the received strings to UTF8 before displaying them to the user. This goes wrong for ISO 8859-1 encoded umlauts.

Therefore, afaik only ASCII characters can safely be used in .properties files. Other characters must be Unicode-escaped, as in the OpenUI5 Sample App:
https://github.com/SAP/openui5-sample-app/blob/master/webapp/i18n/i18n_de.properties#L3
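The escaping described here can be sketched as follows. `escape_properties_value` is a hypothetical helper, similar in spirit to the JDK's former native2ascii tool: it replaces each non-ASCII character with a `\uXXXX` escape, so the file is plain ASCII and reads identically whether it is decoded as ISO-8859-1 or UTF-8:

```python
def escape_properties_value(text: str) -> str:
    """Escape every non-ASCII character as \\uXXXX so the result is pure ASCII."""
    parts = []
    for ch in text:
        code = ord(ch)
        if code < 0x80:
            parts.append(ch)  # ASCII is identical in ISO-8859-1 and UTF-8
        elif code <= 0xFFFF:
            parts.append("\\u%04X" % code)
        else:
            # Outside the BMP: .properties escapes are UTF-16 code units,
            # so emit a surrogate pair
            code -= 0x10000
            parts.append("\\u%04X\\u%04X" % (0xD800 + (code >> 10), 0xDC00 + (code & 0x3FF)))
    return "".join(parts)


print(escape_properties_value("requests=Antr\u00e4ge"))  # requests=Antr\u00E4ge
```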

Also see https://github.com/SAP/ui5-server/issues/7#issuecomment-401023752

codeworrior commented:

Quoting @RandomByte: "While ISO 8859-1 supports umlauts, from my understanding Chrome and other browsers convert the received strings to UTF8 before displaying them to the user. This goes wrong for ISO 8859-1 encoded umlauts."

To my understanding, this is not true. For each textual resource, browsers determine an encoding and then decode the resource based on that encoding. If the browser is able to determine the right encoding for an ISO 8859-1 file, it should properly decode and display umlauts etc.

The crucial part is the determination of the encoding. For a secondary resource, it is based on several sources of information:

The UI5 runtime mainly relies on the first one. We also suggest a charset meta tag in the main page to ensure a default encoding of UTF-8. For *.properties files, we expect the server to send the correct Content-Type header with charset=ISO-8859-1.
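The header-based step can be sketched like this. It is a simplified Python sketch, assuming the charset parameter of the Content-Type header wins and UTF-8 is the fallback; it ignores byte order marks and sniffing, which real browsers also take into account:

```python
from typing import Optional


def charset_from_content_type(header: Optional[str], default: str = "utf-8") -> str:
    """Return the charset parameter of a Content-Type header, or the default."""
    if header:
        # Parameters follow the media type, separated by ";"
        for param in header.split(";")[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "charset" and value:
                return value.strip().strip('"').lower()
    return default


body = b"requests=Antr\xe4ge"  # ISO-8859-1 bytes as sent by the server
charset = charset_from_content_type("text/plain; charset=ISO-8859-1")
print(charset, body.decode(charset))  # iso-8859-1 requests=Anträge
```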

Bringing all this together, I would have expected the following to work:

  • create and store the file in ISO-8859-1, using any char supported by that encoding
  • use a server that sends the right Content-Type for *.properties files
  • open the app in the browser
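The three steps above can be modeled end to end in a few lines. This is a hypothetical Python sketch: the suffix table, the `build_response` function and the file path are illustrative assumptions, and using text/plain for every file is a simplification:

```python
from pathlib import PurePosixPath
from typing import Dict, Tuple

# Hypothetical mapping: only .properties files deviate from UTF-8
CHARSET_BY_SUFFIX = {".properties": "ISO-8859-1"}


def build_response(path: str, raw: bytes) -> Tuple[Dict[str, str], bytes]:
    """Announce the file's charset and pass its bytes through untouched."""
    charset = CHARSET_BY_SUFFIX.get(PurePosixPath(path).suffix, "UTF-8")
    # text/plain for every file is a simplification for this sketch
    headers = {"Content-Type": "text/plain; charset=" + charset}
    return headers, raw


raw = "requests=Antr\u00e4ge".encode("iso-8859-1")  # step 1: file stored as ISO-8859-1
headers, body = build_response("webapp/i18n/i18n.properties", raw)  # step 2
print(headers["Content-Type"])    # text/plain; charset=ISO-8859-1
print(body.decode("iso-8859-1"))  # step 3: the browser shows requests=Anträge
```

The important property is that the body is the unchanged file content, so the announced charset and the actual bytes agree.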

If you store the file in some other encoding, the server will send an inconsistent response (assuming that the server sends the file as binary or uses the encoding that it knows).
If the server does not send the right encoding, the browser might use a default (UTF-8) or use sniffing, which might or might not find the right encoding (depending on the position of the first non-ASCII byte in the binary stream and on whether that byte gives the right hint regarding the encoding).
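The two ways sniffing can fail can be illustrated with a toy heuristic. This is a hypothetical Python sketch, not any browser's actual algorithm; the 1024-byte window is an arbitrary assumption:

```python
import codecs


def toy_sniff(data: bytes, window: int = 1024, default: str = "utf-8") -> str:
    """Hypothetical sniffer: look only at the first `window` bytes."""
    head = data[:window]
    if all(b < 0x80 for b in head):
        return default  # no non-ASCII byte in the window: nothing to go on
    try:
        # final=False tolerates a multi-byte sequence cut off by the window
        codecs.getincrementaldecoder("utf-8")().decode(head, final=False)
        return "utf-8"
    except UnicodeDecodeError:
        return "iso-8859-1"


print(toy_sniff("Antr\u00e4ge".encode("iso-8859-1")))          # iso-8859-1 (right)
print(toy_sniff(b"x" * 2000 + "\u00e4".encode("iso-8859-1")))  # utf-8 (byte comes too late)
print(toy_sniff("\u00c3\u00a4".encode("iso-8859-1")))          # utf-8 (misleading hint)
```

The second call falls back to the default because the first non-ASCII byte lies outside the window; the third is a valid ISO-8859-1 file ("Ã¤") whose bytes happen to form valid UTF-8, so the heuristic picks the wrong encoding.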

From the initial description, it wasn't clear to me whether this chain works as expected or not.

RandomByte commented:

@codeworrior I tested essentially that with the OpenUI5 Sample App. I re-created the i18n_de.properties file with ISO 8859-1 encoding and umlauts (which are part of the ISO 8859-1 charset). With the UI5 Server, Chrome shows what I think are called "replacement characters":

[Screenshot: OpenUI5 Sample App encoding issue]

To reproduce yourself:

  1. Download encoding-test.zip
  2. Extract and execute npm i && ui5 serve -o index.html?sap-language=de

codeworrior commented:

Indeed, I can reproduce the same behavior locally with the zip.

But when I put the same sources into a Tomcat with an active UI5 servlet (which I only used to get the Content-Type with charset=ISO-8859-1), everything works as expected.

I would therefore assume that some middleware in our ui5 serve reads the files with the wrong encoding (instead of piping them through as binary data).
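The difference suspected here can be sketched in Python: passing the bytes through keeps the file intact, while decoding it with an assumed encoding and re-encoding it does not. Both functions are hypothetical illustrations, not ui5-server code, and UTF-8 as the wrongly assumed encoding is an assumption:

```python
def pass_through(body: bytes) -> bytes:
    """Static serving: the bytes on disk go out unchanged."""
    return body


def text_round_trip(body: bytes, assumed: str = "utf-8") -> bytes:
    """Hypothetical middleware that decodes the file as text in the wrong
    encoding, possibly edits it, and encodes it again."""
    text = body.decode(assumed, errors="replace")  # 0xE4 becomes U+FFFD here
    return text.encode(assumed)


raw = "Antr\u00e4ge".encode("iso-8859-1")  # b"Antr\xe4ge"
print(pass_through(raw))     # b'Antr\xe4ge'
print(text_round_trip(raw))  # b'Antr\xef\xbf\xbdge'
```

Once the original byte has been replaced, no charset header can bring the "ä" back.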

codeworrior commented:

I would rate this as a bug in ui5 serve.

@RandomByte RandomByte transferred this issue from SAP/ui5-tooling Jun 11, 2019
@RandomByte RandomByte self-assigned this Jun 11, 2019
RandomByte commented:

Yes, you are right. This seems to be an issue with the usage of the replaceStream module here:

https://github.com/SAP/ui5-server/blob/d0e747d598b8f6696755581582f53e276260c72c/lib/middleware/serveResources.js#L62-L72

When removing the replaceStream-block, the client receives the correct data.

What bothers me is that I don't yet understand the behavior of replaceStream and Node's http.ServerResponse (which Express's res object extends).

When supplying replaceStream with an "encoding": "latin1" option, it seems to convert the stream to UTF-8 (which roughly matches what this headline promises).
The resulting stream consists of raw buffer chunks that return correct strings when logged with chunk.toString("utf8") (=> ü) and corrupt strings with chunk.toString("latin1") (=> Ã¼).

Piping that stream into the Express res object sends something to the client (what is actually sent on the network is still to be checked). Note that the charset in the response header is still ISO-8859-1, which presumably leads to the same corrupt string on the client (=> Ã¼). Changing the charset to UTF-8 in that case results in the correct string on the client (=> ü).
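These observations are consistent with a stream that decodes its input with the given encoding but emits UTF-8. A minimal Python model of that assumption (it models the behavior described above, not replaceStream's actual code):

```python
on_disk = "\u00fc".encode("latin1")  # ü in ISO-8859-1: b"\xfc"

# Assumed behavior: the input is decoded with the given encoding ("latin1")...
text = on_disk.decode("latin1")      # "ü", read correctly
# ...but the emitted chunks are UTF-8 bytes
chunk = text.encode("utf-8")         # b"\xc3\xbc"

print(chunk.decode("utf-8"))   # ü   (client told charset=utf-8)
print(chunk.decode("latin1"))  # Ã¼  (client told charset=ISO-8859-1)
```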

Long story short, we could exclude non-UTF-8 encoded files from the version string replacement. We already exclude *.properties files when building a UI5 project anyway.

In addition, it turns out that mime-db (which we use through mime-types) does not contain a single type with a charset other than UTF-8. So basically only *.properties files are affected.
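The proposed exclusion can be sketched as a guard in front of the replacement. The suffix table, the `${version}` placeholder and the `replace_version` function are assumptions for illustration, not ui5-server's actual code:

```python
from pathlib import PurePosixPath

# Hypothetical lookup mirroring the observation above: only .properties is not UTF-8
CHARSET_BY_SUFFIX = {".properties": "ISO-8859-1"}
VERSION_PLACEHOLDER = "${version}"  # hypothetical placeholder string


def replace_version(path: str, body: bytes, version: str) -> bytes:
    """Replace the version placeholder, but only in UTF-8 encoded files."""
    charset = CHARSET_BY_SUFFIX.get(PurePosixPath(path).suffix, "UTF-8")
    if charset != "UTF-8":
        return body  # leave non-UTF-8 files byte-for-byte untouched
    text = body.decode("utf-8")
    return text.replace(VERSION_PLACEHOLDER, version).encode("utf-8")


print(replace_version("Component.js", b"v=${version}", "1.0.0"))  # b'v=1.0.0'
props = "requests=Antr\u00e4ge ${version}".encode("iso-8859-1")
print(replace_version("i18n.properties", props, "1.0.0") == props)  # True
```

Replacing an ASCII placeholder at the byte level would also be safe for ISO-8859-1 files; the sketch follows the approach discussed here of simply skipping them.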

Note that we use replaceStream also in the UI5 Builder.

RandomByte referenced this issue in SAP/ui5-server Jun 18, 2019
The replaceStream module converts all strings it processes to UTF-8.
Therefore, stop using it for strings that are not UTF-8 encoded.

Resolves https://github.com/SAP/ui5-server/issues/196
RandomByte referenced this issue in SAP/ui5-server Jun 24, 2019

RandomByte commented Jun 24, 2019

This should be resolved with UI5 CLI v1.5.3

htammen commented Jun 26, 2019

Thanks, it works if I save the i18n.properties files with ISO-8859-1. I don't think this is an international solution, but it is OK for me in the European area.

@RandomByte RandomByte transferred this issue from SAP/ui5-server Nov 20, 2020
@RandomByte RandomByte added bug Something isn't working module/ui5-server Related to the UI5 Server module labels Nov 20, 2020