This repository was archived by the owner on Dec 14, 2018. It is now read-only.

Simplified Chinese characters are garbled in project files #6716

Closed
JohnHe404 opened this issue Aug 26, 2017 · 11 comments

Comments

@JohnHe404

I used Visual Studio Community 2017 15.3.1 to create an ASP.NET Core 2.0 Angular project, but Simplified Chinese characters come out garbled.
I converted
project\ClientApp\app\components\app\app.component.html
project\ClientApp\app\components\home\home.component.html
project\ClientApp\app\components\fetchdata\fetchdata.component.html

from ANSI file format to UTF-8.
After that, Simplified Chinese characters display normally. Can you fix the bug?
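
For readers who want to script the conversion described above, here is a minimal Python sketch. It assumes that "ANSI" on a Simplified Chinese Windows install means the GBK code page (that is a guess, not something stated in this report), and `reencode_to_utf8` is a hypothetical helper name, not part of any template or tool mentioned here.

```python
from pathlib import Path


def reencode_to_utf8(path: Path, source_encoding: str = "gbk", add_bom: bool = False) -> None:
    """Rewrite a text file from a legacy code page to UTF-8.

    source_encoding is an assumption: "ANSI" depends on the machine's locale.
    """
    raw = path.read_bytes()
    # Strict decode: raises UnicodeDecodeError if the guessed code page is wrong.
    text = raw.decode(source_encoding)
    # "utf-8-sig" prepends the 3-byte BOM on encode; plain "utf-8" does not.
    target = "utf-8-sig" if add_bom else "utf-8"
    # Work on bytes so existing line endings are preserved as-is.
    path.write_bytes(text.encode(target))
```

Working with bytes rather than text mode keeps CRLF line endings untouched, which matters for files that live in a Windows-authored project.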

@Eilon
Member

Eilon commented Aug 28, 2017

@HeMinzhang we need more information to understand this issue. Can you please upload an app to GitHub so that we can investigate?

@khellang
Contributor

Sounds like an issue with the templates and which encoding is used when scaffolding a project...

@Eilon
Member

Eilon commented Sep 1, 2017

Certainly could be.

@SteveSandersonMS - these files look like they're from the SPA template - is there any weird file encoding going on there? I think we normally do UTF-8 without BOM.

Eilon changed the title from "I once again found BUG" to "Simplified Chinese characters are garbled in project files" Sep 1, 2017
@SteveSandersonMS
Member

SteveSandersonMS commented Sep 7, 2017

@Eilon It looks like we're inconsistent about whether files include a BOM or not. Most (but not all) of the cshtml ones in StarterWeb-CSharp (i.e., MVC starter site) do have BOMs, whereas most of them (possibly all, didn't check) in RazorWebPages-CSharp do not have BOMs.

As a specific example, the MVC starter site's TwoFactorAuthentication.cshtml does not have a BOM, whereas the file SetPassword.cshtml alongside it in the same directory does have a BOM.

The SPA templates are most similar to the RazorWebPages one in that they don't have BOMs except for some .cs files that were apparently originally copied from something like StarterWeb-CSharp.

The net result of all this is that if you paste a simplified Chinese character into a non-BOM file and save it in VS, then VS will prompt you asking for permission to "change the encoding" (by which it means save with a BOM). If you say yes, all is well. If you say no, well, garbling ensues.
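
The failure mode in the paragraph above can be illustrated with a small Python sketch. It assumes that declining the prompt leaves the file in the machine's legacy code page (GBK is used here as a stand-in), and `sniff_and_decode` is a hypothetical, deliberately simplistic model of how a tool might pick an encoding. It is not how Visual Studio actually works.

```python
import codecs

text = "首页"  # sample Simplified Chinese text

# Assumed outcome of declining the encoding change: legacy code page bytes.
legacy_bytes = text.encode("gbk")

# A consumer that assumes UTF-8 cannot decode those bytes cleanly.
garbled = legacy_bytes.decode("utf-8", errors="replace")

# With a BOM, the file announces its own encoding.
bom_bytes = codecs.BOM_UTF8 + text.encode("utf-8")


def sniff_and_decode(data: bytes, fallback: str = "gbk") -> str:
    """Decode as UTF-8 if a BOM is present, else guess a legacy code page."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    # No BOM: this naive model falls back to the legacy code page,
    # which mangles BOM-less UTF-8 content.
    return data.decode(fallback, errors="replace")
```

In this model, `sniff_and_decode(bom_bytes)` round-trips the text, while BOM-less UTF-8 bytes passed through the fallback come back as different characters.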

What to do about it

At the very least we should be deliberate and consistent about where BOMs are used or not.

As for a general policy, the Unicode spec is ambiguous:

Use of a BOM is neither required nor recommended for UTF-8 [source]

... but do they mean "we recommend that you don't", or do they mean "we are neutral on this subject and offer no recommendation"? Aside from the extra 3 bytes per file and loss of compatibility with ancient pre-UTF-8 editors, I'm not sure if there's any drawback to putting in BOMs everywhere.

It would certainly help developers avoid issues with this if we did put BOMs on all the text files.
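
Whichever policy is chosen, applying it across existing files is mechanical. The following Python sketch is one way it could be done; `set_bom` is a hypothetical helper, not an existing tool in the templates repo.

```python
import codecs
from pathlib import Path


def set_bom(path: Path, want_bom: bool) -> bool:
    """Add or strip a UTF-8 BOM. Return True if the file was changed."""
    data = path.read_bytes()
    has_bom = data.startswith(codecs.BOM_UTF8)
    if has_bom == want_bom:
        return False  # already matches the policy
    if want_bom:
        # Refuse to label content as UTF-8 unless it actually decodes as UTF-8.
        data.decode("utf-8")
        path.write_bytes(codecs.BOM_UTF8 + data)
    else:
        path.write_bytes(data[len(codecs.BOM_UTF8):])
    return True
```

The strict decode before adding a BOM is a design choice: tagging a legacy-encoded file as UTF-8 would hide the problem rather than fix it.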

@JohnHe404
Author

JohnHe404 commented Sep 7, 2017

WebApplication1.zip

Simplified Chinese in "WebApplication1\WebApplication1\ClientApp\app\components\navmenu\navmenu.component.html"

@Eilon
Member

Eilon commented Sep 7, 2017

@SteveSandersonMS I think that phrasing is generally used to mean "it is not required and it is not recommended to use," which agrees with what I've heard from others - i.e. don't use a BOM.

But either way, I completely agree that all the template files need to be consistent. And beyond that, we need a test that checks for BOM presence (or un-presence).
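
A consistency check of the kind suggested above could look roughly like this Python sketch. The suffix list and the `files_violating_bom_policy` name are assumptions for illustration; the real test would live in whatever framework the templates repo uses.

```python
import codecs
from pathlib import Path
from typing import List

# Assumed set of text file types to check; adjust to the actual templates.
TEXT_SUFFIXES = {".cs", ".cshtml", ".html", ".ts", ".json"}


def files_violating_bom_policy(root: Path, want_bom: bool) -> List[Path]:
    """Return text files under root whose BOM state differs from the policy."""
    offenders = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in TEXT_SUFFIXES:
            with path.open("rb") as f:
                # Only the first three bytes are needed to detect a UTF-8 BOM.
                has_bom = f.read(3) == codecs.BOM_UTF8
            if has_bom != want_bom:
                offenders.append(path)
    return offenders
```

A test would then assert that the returned list is empty for the chosen policy, so any newly added template file with the wrong BOM state fails the build.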

@rynowak / @Tratcher - I think maybe you've had opinions on BOMs before. Any thoughts on this?

@Tratcher
Member

Tratcher commented Sep 7, 2017

UTF-8 encode all the files. The BOM should only be excluded when transmitting over the network, as the encoding is specified in a header.

@Eilon
Member

Eilon commented Sep 7, 2017

Everything here is already all UTF-8. The question is only whether the files on disk should have a BOM or not.

@Tratcher
Member

Tratcher commented Sep 7, 2017

Yes? How else do you save the encoding for the file? In additional metadata?

@rynowak
Member

rynowak commented Sep 7, 2017

Yeah, I'm not the expert on this. We should talk to the tooling team.

@Eilon
Member

Eilon commented Sep 8, 2017

This issue was moved to aspnet/Templating#5

Eilon closed this as completed Sep 8, 2017