-
-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Preserve BOM in web editor #28935
Preserve BOM in web editor #28935
Conversation
The ToUTF8* functions were stripping BOM, while BOM is actually valid in UTF8, so the stripping must be optional. This does: - Add a options struct to all ToUTF8* functions, that by default will strip BOM to preserve existing behaviour - Remove ToUTF8 function, it was dead code - Rename ToUTF8WithErr to ToUTF8 - Preserve BOM in Monaco Editor - Remove a unnecessary newline in the textarea value. Browsers did ignore it, it seems but it's better not to rely on this behaviour. Fixes: go-gitea#28743 Related: go-gitea#6716
I think this fix is not complete, and it is also related to #28119 (web editor EOL) and others, they are the sample problem. The key problem is not "what MonacoEditor does", but it is "what Gitea should do". Think about some files in a git repo:
By existing approach, they could all be edited by MonacoEditor. The real problem is when "saving" the files, backend code does incorrect assumption: it only recognizes UTF-8 or "\n". So UTF-16 BOM and "\r\n" files are all changed unexpectedly. So, the complete approach could be like this IMO: when saving, the backend code should first detect the file kind (BOM, EOL), then convert the submitted text to the expected format (correct BOM, correct EOL), then everything should work well. |
I'm not sure such detection is neccesary. I would just ensure the file round trips through backend and frontend unmodified while editing. EOL is a more complicated topic because of the possibility for mixed EOL, but BOM is imho quite a clearcut case: If it's there, keep it. Ideally even indicate its precence in the editor. So for BOM, we should render it without BOM for display (see #6716 regarding rendering artifacts). Only when passing to the editor, preserve it and this is exactly what this PR does. I will test if UTF16 BOM works as well with this PR, I assume it will. |
It seems UTF-16 rendering is broken in Gitea and a number of other tools. See this file. Gitea, VSCode, Monaco and Sublime Text all seem to convert it to UTF-8 and they render this: GitHub renders the letters "test" correctly but shows a broken BOM rendering: So I think UTF-16 is a diffferent topic for another PR. We should first render it correctly, then we can talk about BOM there. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe there could be some improvements, eg: Remove ToUTF8WithFallback
, and introduce "AutoFallback" option for "ConvertOpts"). And ToUTF8DropErrors
looks strange, ideally there could be an option like IgnoreBroken
.
And as the comment above, to resolve the UTF-16 BOM and "\r\n" problems, it needs a new approach.
Anyway, they are all legacy problems.
Merging as-is. More improvements can come later. I agree we should move all function variants to the options struct. |
The `ToUTF8*` functions were stripping BOM, while BOM is actually valid in UTF8, so the stripping must be optional depending on use case. This does: - Add a options struct to all `ToUTF8*` functions, that by default will strip BOM to preserve existing behaviour - Remove `ToUTF8` function, it was dead code - Rename `ToUTF8WithErr` to `ToUTF8` - Preserve BOM in Monaco Editor - Remove a unnecessary newline in the textarea value. Browsers did ignore it, it seems but it's better not to rely on this behaviour. Fixes: go-gitea#28743 Related: go-gitea#6716 which seems to have once introduced a mechanism that strips and re-adds the BOM, but from what I can tell, this mechanism was removed at some point after that PR.
Backport #28935 by @silverwind The `ToUTF8*` functions were stripping BOM, while BOM is actually valid in UTF8, so the stripping must be optional depending on use case. This does: - Add a options struct to all `ToUTF8*` functions, that by default will strip BOM to preserve existing behaviour - Remove `ToUTF8` function, it was dead code - Rename `ToUTF8WithErr` to `ToUTF8` - Preserve BOM in Monaco Editor - Remove a unnecessary newline in the textarea value. Browsers did ignore it, it seems but it's better not to rely on this behaviour. Fixes: #28743 Related: #6716 which seems to have once introduced a mechanism that strips and re-adds the BOM, but from what I can tell, this mechanism was removed at some point after that PR. Co-authored-by: silverwind <me@silverwind.io>
* giteaofficial/main: [skip ci] Updated licenses and gitignores Fix bug for generated repository object format (go-gitea#28969) Fixing small space missing in sample config file (go-gitea#28967) Fix inconsistent naming of OAuth 2.0 `ENABLE` setting (go-gitea#28951) Add screenshot for "Profile Readmes" to docs (go-gitea#28964) Simplify how git repositories are opened (go-gitea#28937) Preserve BOM in web editor (go-gitea#28935) Make loading animation less aggressive (go-gitea#28955) Fix SSPI user creation (go-gitea#28948) Strip `/` from relative links (go-gitea#28932) Fix non-alphabetic sorting of repo topics (go-gitea#28938) Don't remove all mirror repository's releases when mirroring (go-gitea#28817) Use new RPM constants (go-gitea#28931) Check for sha256 support to use --object-format flag (go-gitea#28928) fix: update enable_prune even if mirror_interval is not provided (go-gitea#28905) Implement `MigrateRepository` for the actions notifier (go-gitea#28920) Respect branch info for relative links (go-gitea#28909)
The `ToUTF8*` functions were stripping BOM, while BOM is actually valid in UTF8, so the stripping must be optional depending on use case. This does: - Add a options struct to all `ToUTF8*` functions, that by default will strip BOM to preserve existing behaviour - Remove `ToUTF8` function, it was dead code - Rename `ToUTF8WithErr` to `ToUTF8` - Preserve BOM in Monaco Editor - Remove a unnecessary newline in the textarea value. Browsers did ignore it, it seems but it's better not to rely on this behaviour. Fixes: go-gitea#28743 Related: go-gitea#6716 which seems to have once introduced a mechanism that strips and re-adds the BOM, but from what I can tell, this mechanism was removed at some point after that PR.
This PR contains the following updates: | Package | Update | Change | |---|---|---| | [docker.io/gitea/gitea](https://github.com/go-gitea/gitea) | patch | `1.21.4` -> `1.21.5` | --- ### Release Notes <details> <summary>go-gitea/gitea (docker.io/gitea/gitea)</summary> ### [`v1.21.5`](https://github.com/go-gitea/gitea/releases/tag/v1.21.5) [Compare Source](go-gitea/gitea@v1.21.4...v1.21.5) - SECURITY - Prevent anonymous container access if `RequireSignInView` is enabled ([#​28877](go-gitea/gitea#28877)) ([#​28882](go-gitea/gitea#28882)) - Update go dependencies and fix go-git ([#​28893](go-gitea/gitea#28893)) ([#​28934](go-gitea/gitea#28934)) - BUGFIXES - Revert "Speed up loading the dashboard on mysql/mariadb ([#​28546](go-gitea/gitea#28546))" ([#​29006](go-gitea/gitea#29006)) ([#​29007](go-gitea/gitea#29007)) - Fix an actions schedule bug ([#​28942](go-gitea/gitea#28942)) ([#​28999](go-gitea/gitea#28999)) - Fix update enable_prune even if mirror_interval is not provided ([#​28905](go-gitea/gitea#28905)) ([#​28929](go-gitea/gitea#28929)) - Fix uploaded artifacts should be overwritten ([#​28726](go-gitea/gitea#28726)) backport v1.21 ([#​28832](go-gitea/gitea#28832)) - Preserve BOM in web editor ([#​28935](go-gitea/gitea#28935)) ([#​28959](go-gitea/gitea#28959)) - Strip `/` from relative links ([#​28932](go-gitea/gitea#28932)) ([#​28952](go-gitea/gitea#28952)) - Don't remove all mirror repository's releases when mirroring ([#​28817](go-gitea/gitea#28817)) ([#​28939](go-gitea/gitea#28939)) - Implement `MigrateRepository` for the actions notifier ([#​28920](go-gitea/gitea#28920)) ([#​28923](go-gitea/gitea#28923)) - Respect branch info for relative links ([#​28909](go-gitea/gitea#28909)) ([#​28922](go-gitea/gitea#28922)) - Don't reload timeline page when (un)resolving or replying conversation ([#​28654](go-gitea/gitea#28654)) ([#​28917](go-gitea/gitea#28917)) - Only migrate the first 255 chars of a Github issue title ([#​28902](go-gitea/gitea#28902)) ([#​28912](go-gitea/gitea#28912)) - Fix sort bug on repository issues list ([#​28897](go-gitea/gitea#28897)) ([#​28901](go-gitea/gitea#28901)) - Fix `DeleteCollaboration` transaction behaviour ([#​28886](go-gitea/gitea#28886)) ([#​28889](go-gitea/gitea#28889)) - Fix schedule not trigger bug because matching full ref name with short ref name ([#​28874](go-gitea/gitea#28874)) ([#​28888](go-gitea/gitea#28888)) - Fix migrate storage bug ([#​28830](go-gitea/gitea#28830)) ([#​28867](go-gitea/gitea#28867)) - Fix archive creating LFS hooks and breaking pull requests ([#​28848](go-gitea/gitea#28848)) ([#​28851](go-gitea/gitea#28851)) - Fix reverting a merge commit failing ([#​28794](go-gitea/gitea#28794)) ([#​28825](go-gitea/gitea#28825)) - Upgrade xorm to v1.3.7 to fix a resource leak problem caused by Iterate ([#​28891](go-gitea/gitea#28891)) ([#​28895](go-gitea/gitea#28895)) - Fix incorrect PostgreSQL connection string for Unix sockets ([#​28865](go-gitea/gitea#28865)) ([#​28870](go-gitea/gitea#28870)) - ENHANCEMENTS - Make loading animation less aggressive ([#​28955](go-gitea/gitea#28955)) ([#​28956](go-gitea/gitea#28956)) - Avoid duplicate JS error messages on UI ([#​28873](go-gitea/gitea#28873)) ([#​28881](go-gitea/gitea#28881)) - Bump `@github/relative-time-element` to 4.3.1 ([#​28819](go-gitea/gitea#28819)) ([#​28826](go-gitea/gitea#28826)) - MISC - Warn that `DISABLE_QUERY_AUTH_TOKEN` is false only if it's explicitly defined ([#​28783](go-gitea/gitea#28783)) ([#​28868](go-gitea/gitea#28868)) - Remove duplicated checkinit on git module ([#​28824](go-gitea/gitea#28824)) ([#​28831](go-gitea/gitea#28831)) Instances on **[Gitea Cloud](https://cloud.gitea.com)** will be automatically upgraded to this version during the specified maintenance window. </details> --- ### Configuration 📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined). 🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied. ♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about this update again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate). <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNy4xNjUuMCIsInVwZGF0ZWRJblZlciI6IjM3LjE2NS4wIiwidGFyZ2V0QnJhbmNoIjoibWFpbiJ9--> Reviewed-on: https://git.home/nrdufour/home-ops/pulls/355 Co-authored-by: Renovate <renovate@ptinem.io> Co-committed-by: Renovate <renovate@ptinem.io>
The `ToUTF8*` functions were stripping BOM, while BOM is actually valid in UTF8, so the stripping must be optional depending on use case. This does: - Add a options struct to all `ToUTF8*` functions, that by default will strip BOM to preserve existing behaviour - Remove `ToUTF8` function, it was dead code - Rename `ToUTF8WithErr` to `ToUTF8` - Preserve BOM in Monaco Editor - Remove a unnecessary newline in the textarea value. Browsers did ignore it, it seems but it's better not to rely on this behaviour. Fixes: go-gitea#28743 Related: go-gitea#6716 which seems to have once introduced a mechanism that strips and re-adds the BOM, but from what I can tell, this mechanism was removed at some point after that PR.
The
ToUTF8*
functions were stripping BOM, while BOM is actually valid in UTF8, so the stripping must be optional depending on use case. This does:ToUTF8*
functions, that by default will strip BOM to preserve existing behaviourToUTF8
function, it was dead codeToUTF8WithErr
toToUTF8
Fixes: #28743
Related: #6716 which seems to have once introduced a mechanism that strips and re-adds the BOM, but from what I can tell, this mechanism was removed at some point after that PR.