This repository was archived by the owner on Dec 20, 2018. It is now read-only.

Simplified Chinese characters are garbled in project files #5

Closed
Eilon opened this issue Sep 8, 2017 · 14 comments

Comments

@Eilon
Contributor

Eilon commented Sep 8, 2017

From @HeMinzhang on August 26, 2017 12:23

I used Visual Studio Community 2017 15.3.1 to create an ASP.NET Core 2.0 Angular project, but Simplified Chinese characters in it are garbled.
After I converted the following files from ANSI to UTF-8, the Simplified Chinese characters displayed correctly:

project\ClientApp\app\components\app\app.component.html
project\ClientApp\app\components\home\home.component.html
project\ClientApp\app\components\fetchdata\fetchdata.component.html

Can you fix the bug?
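
For illustration, the ANSI-to-UTF-8 conversion described above could be scripted roughly as follows. This is a minimal sketch, not something from the original report: the helper name `convert_to_utf8` is hypothetical, and it assumes "ANSI" means code page 936 (GBK), which is the usual default on Simplified Chinese Windows. The issue does not say which code page was actually in use.

```python
from pathlib import Path


def convert_to_utf8(path: Path, source_encoding: str = "gbk", add_bom: bool = False) -> None:
    """Re-encode a text file from a legacy code page to UTF-8, in place.

    source_encoding is an assumption (GBK / code page 936); pass the code
    page the file was really saved in.
    """
    # Decode with the legacy code page, then write the same text back as UTF-8.
    text = path.read_bytes().decode(source_encoding)
    # "utf-8-sig" prepends the 3-byte BOM (EF BB BF); plain "utf-8" does not.
    target_encoding = "utf-8-sig" if add_bom else "utf-8"
    path.write_bytes(text.encode(target_encoding))
```

Working on raw bytes avoids any newline translation, so only the character encoding changes.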

Copied from original issue: aspnet/Mvc#6716

@Eilon
Contributor Author

Eilon commented Sep 8, 2017

@HeMinzhang we need more information to understand this issue. Can you please upload an app to GitHub so that we can investigate?

@Eilon
Contributor Author

Eilon commented Sep 8, 2017

From @khellang on August 28, 2017 17:23

Sounds like an issue with the templates and which encoding is used when scaffolding a project...

@Eilon
Contributor Author

Eilon commented Sep 8, 2017

Certainly could be.

@SteveSandersonMS - these files look like they're from the SPA template - is there any weird file encoding going on there? I think we normally do UTF-8 without BOM.

@Eilon
Contributor Author

Eilon commented Sep 8, 2017

From @SteveSandersonMS on September 7, 2017 14:20

@Eilon It looks like we're inconsistent about whether files include a BOM or not. Most (but not all) of the cshtml ones in StarterWeb-CSharp (i.e., MVC starter site) do have BOMs, whereas most of them (possibly all, didn't check) in RazorWebPages-CSharp do not have BOMs.

As a specific example, the MVC starter site's TwoFactorAuthentication.cshtml does not have a BOM, whereas the file SetPassword.cshtml alongside it in the same directory does have a BOM.

The SPA templates are most similar to the RazorWebPages one in that they don't have BOMs except for some .cs files that were apparently originally copied from something like StarterWeb-CSharp.

The net result of all this is that if you paste a simplified Chinese character into a non-BOM file and save it in VS, then VS will prompt you asking for permission to "change the encoding" (by which it means save with a BOM). If you say yes, all is well. If you say no, well, garbling ensues.
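
To make the garbling concrete, here is a small Python sketch of the mismatch: text written in one encoding and read back in another. The function name `roundtrip` is hypothetical, and GBK (code page 936) is only an assumed stand-in for whatever legacy code page the editor falls back to; the thread does not name one.

```python
def roundtrip(text: str, write_encoding: str, read_encoding: str) -> str:
    """Encode text as one tool would save it, then decode it as another tool would read it."""
    data = text.encode(write_encoding)
    # errors="replace" substitutes U+FFFD for bytes that are invalid in read_encoding,
    # which is roughly what a reader sees as garbled characters.
    return data.decode(read_encoding, errors="replace")


sample = "中文"

# Saved and read as UTF-8: the text survives.
same = roundtrip(sample, "utf-8", "utf-8")

# Saved in a legacy code page but read as UTF-8: the text does not survive.
garbled = roundtrip(sample, "gbk", "utf-8")

print(same == sample)     # True
print(garbled == sample)  # False
```

Pure ASCII content round-trips cleanly either way, which is why the problem only appears once non-ASCII characters are pasted into the file.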

What to do about it

At the very least we should be deliberate and consistent about where BOMs are used or not.

As for a general policy, the Unicode spec is ambiguous:

Use of a BOM is neither required nor recommended for UTF-8 [source]

... but do they mean "we recommend that you don't", or do they mean "we are neutral on this subject and offer no recommendation"? Aside from the extra 3 bytes per file and loss of compatibility with ancient pre-UTF-8 editors, I'm not sure if there's any drawback to putting in BOMs everywhere.

It would certainly help developers avoid issues with this if we did put BOMs on all the text files.

@Eilon
Contributor Author

Eilon commented Sep 8, 2017

From @HeMinzhang on September 7, 2017 14:52

WebApplication1.zip

Simplified Chinese in "WebApplication1\WebApplication1\ClientApp\app\components\navmenu\navmenu.component.html"

@Eilon
Contributor Author

Eilon commented Sep 8, 2017

@SteveSandersonMS I think that phrasing is generally used to mean "it is not required and it is not recommended to use," which agrees with what I've heard from others - i.e. don't use a BOM.

But either way, I completely agree that all the template files need to be consistent. And beyond that, we need a test that checks for BOM presence (or un-presence).
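
As a rough idea of what such a consistency check could look like, here is a Python sketch. It is an assumption-laden illustration, not the team's actual test: the helper names `has_utf8_bom` and `find_bom_mismatches`, the glob patterns, and the choice of expected policy are all hypothetical.

```python
from __future__ import annotations

import codecs
from pathlib import Path


def has_utf8_bom(path: Path) -> bool:
    """Return True if the file starts with the UTF-8 BOM bytes EF BB BF."""
    with path.open("rb") as f:
        return f.read(3) == codecs.BOM_UTF8


def find_bom_mismatches(root: Path, patterns: tuple[str, ...], expect_bom: bool) -> list[Path]:
    """List files under root that match a pattern but violate the BOM policy."""
    mismatches = []
    for pattern in patterns:
        for path in root.rglob(pattern):
            if path.is_file() and has_utf8_bom(path) != expect_bom:
                mismatches.append(path)
    return sorted(mismatches)
```

A test could then assert that `find_bom_mismatches(template_root, ("*.cshtml", "*.html", "*.cs"), expect_bom=...)` returns an empty list, with `expect_bom` set to whichever policy is agreed on.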

@rynowak / @Tratcher - I think maybe you've had opinions on BOMs before. Any thoughts on this?

@Eilon
Contributor Author

Eilon commented Sep 8, 2017

From @Tratcher on September 7, 2017 21:42

UTF-8 encode all the files. The BOM should only be excluded when transmitting over the network, as the encoding is specified in a header.

@Eilon
Contributor Author

Eilon commented Sep 8, 2017

Everything here is already all UTF-8. The question is only whether the files on disk should have a BOM or not.

@Eilon
Contributor Author

Eilon commented Sep 8, 2017

From @Tratcher on September 7, 2017 21:45

Yes? How else do you save the encoding for the file? In additional metadata?
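
The point that the BOM is the only in-file record of the encoding can be sketched in Python as below. This is an illustrative assumption about how a reader might behave, not a description of what Visual Studio does; the name `decode_text` and its `fallback` parameter are hypothetical.

```python
import codecs


def decode_text(data: bytes, fallback: str = "utf-8") -> str:
    """Decode file bytes, trusting a UTF-8 BOM when present.

    Without a BOM nothing in the file states its encoding, so the reader
    has to guess; here the guess is the caller-supplied fallback.
    """
    if data.startswith(codecs.BOM_UTF8):
        # The BOM identifies the content as UTF-8; strip it before decoding.
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    return data.decode(fallback)
```

With a BOM, the result does not depend on `fallback` at all; without one, the same bytes decode correctly only if the fallback happens to match how the file was saved.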

@Eilon
Contributor Author

Eilon commented Sep 8, 2017

From @rynowak on September 7, 2017 22:24

Yeah, I'm not the expert on this. We should talk to the tooling team.

@Eilon
Contributor Author

Eilon commented Sep 8, 2017

@Tratcher I had been under the impression that "these days" you can just assume UTF-8. Perhaps that's just not true. I always used to be a huge fan of BOMs because they're explicit.

@SteveSandersonMS
Member

@Eilon What's the latest on this? Did you get consensus?

@Eilon
Contributor Author

Eilon commented Nov 27, 2017

@rynowak do you have any further thoughts on this?

Also adding @mlorbetske and @seancpeters in case they've encountered this in other templates.

@Eilon
Contributor Author

Eilon commented Mar 5, 2018

Closing because we don't have plans to update the files mentioned here w/ regard to BOMs. We haven't heard any other feedback on these files, so it seems like this is not a broad issue.

Eilon closed this as completed Mar 5, 2018