add html utf8 bom signature #268

Merged
8 commits merged into gabriel-vasile:master from napalu:bugfix/html_utf8_bom
Apr 17, 2022

Contributor

@napalu napalu commented Mar 31, 2022

closes #267

coveralls commented Mar 31, 2022

Coverage Status

Coverage remained the same at 96.407% when pulling b0140e7 on napalu:bugfix/html_utf8_bom into 71f0e2b on gabriel-vasile:master.

Owner

@gabriel-vasile gabriel-vasile left a comment

Hi and thank you for your contribution.
There is a problem with the current solution: it does not work when there is whitespace between the BOM and the HTML signature.
I added a test case to show the problem.

I think a solution for this would be to trim the BOM here, before the trimLWS call. Let me know if you have another idea for how to do it.
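
The suggestion above (trim the BOM first, then the leading whitespace, then match the HTML signature) can be sketched as follows. This is a minimal illustration, not the library's code: `trimLeadingWS` is a hypothetical stand-in for the `trimLWS` helper named in the comment, the whitespace set is an assumption, and `looksLikeHTML` checks only two example signatures.

```go
package main

import (
	"bytes"
	"fmt"
)

// utf8BOM is the three-byte UTF-8 byte order mark.
var utf8BOM = []byte{0xEF, 0xBB, 0xBF}

// trimLeadingWS is a hypothetical stand-in for trimLWS.
// The whitespace set (tab, LF, FF, CR, space) is an assumption.
func trimLeadingWS(in []byte) []byte {
	return bytes.TrimLeft(in, "\t\n\x0c\r ")
}

// looksLikeHTML drops an optional BOM, then leading whitespace,
// and only then compares against a couple of sample HTML signatures.
func looksLikeHTML(raw []byte) bool {
	raw = bytes.TrimPrefix(raw, utf8BOM)
	raw = trimLeadingWS(raw)
	for _, sig := range [][]byte{[]byte("<!DOCTYPE HTML"), []byte("<HTML")} {
		if len(raw) >= len(sig) && bytes.EqualFold(raw[:len(sig)], sig) {
			return true
		}
	}
	return false
}

func main() {
	// BOM, then whitespace, then the signature: the failing case from the review.
	fmt.Println(looksLikeHTML([]byte("\xEF\xBB\xBF \n<html>"))) // prints: true
}
```

Because `bytes.TrimPrefix` returns a sub-slice of its input, removing the BOM this way does not copy the data.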

Contributor Author

napalu commented Apr 3, 2022

> Hi and thank you for your contribution. There is a problem with the current solution: it does not work when there is whitespace between the BOM and the HTML signature. I added a test case to show the problem.

Hi - good catch - I'll have a look at it tomorrow and update the PR.

Contributor Author

napalu commented Apr 5, 2022

@gabriel-vasile I think your proposal of trimming the BOM before trimLWS is good. I chose to skip the BOM and then restore it after stripping the whitespace, so its presence can be used later on if needed.
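
For illustration only, the skip-and-restore approach described in this comment might look roughly like the sketch below. `trimKeepBOM` is a hypothetical name, the whitespace set is an assumption, and the actual PR code may differ. Note that putting the BOM back in front of the trimmed data requires building a new slice.

```go
package main

import (
	"bytes"
	"fmt"
)

// utf8BOM is the three-byte UTF-8 byte order mark.
var utf8BOM = []byte{0xEF, 0xBB, 0xBF}

// trimKeepBOM (hypothetical) skips a leading BOM, strips the whitespace
// that follows it, and then restores the BOM in front of the result.
func trimKeepBOM(raw []byte) []byte {
	const ws = "\t\n\x0c\r " // assumed whitespace set
	if !bytes.HasPrefix(raw, utf8BOM) {
		return bytes.TrimLeft(raw, ws)
	}
	rest := bytes.TrimLeft(raw[len(utf8BOM):], ws)
	// Restoring the BOM means allocating and copying into a new slice.
	out := make([]byte, 0, len(utf8BOM)+len(rest))
	out = append(out, utf8BOM...)
	return append(out, rest...)
}

func main() {
	out := trimKeepBOM([]byte("\xEF\xBB\xBF \n<html>"))
	// The BOM is still present, so later code can check for it.
	fmt.Println(bytes.HasPrefix(out, utf8BOM), len(out))
}
```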

Owner

@gabriel-vasile gabriel-vasile left a comment

Hi, I'm not sure I understand the reason for restoring the BOM. The way I see it, trimming the BOM will always be more efficient and will still detect the MIME type correctly.

internal/magic/magic.go (review thread: outdated, resolved)
internal/magic/text.go (review thread: outdated, resolved)
ensure BOM has precedence in html encoding detection
add tests for BOM detection override
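
As a rough sketch of what "BOM has precedence" could mean for HTML encoding detection: when the input starts with a UTF-8 BOM, report UTF-8 even if a `<meta charset>` declaration says otherwise. The function name `htmlCharset` and the simplified regular expression are assumptions made for this example and are not taken from the PR.

```go
package main

import (
	"bytes"
	"fmt"
	"regexp"
	"strings"
)

// utf8BOM is the three-byte UTF-8 byte order mark.
var utf8BOM = []byte{0xEF, 0xBB, 0xBF}

// metaCharset is a deliberately simplified matcher for <meta ... charset=...>.
var metaCharset = regexp.MustCompile(`(?i)<meta[^>]+charset\s*=\s*["']?([a-z0-9_-]+)`)

// htmlCharset (hypothetical) returns the detected charset in lower case,
// or "" when nothing is found. A BOM wins over any meta declaration.
func htmlCharset(raw []byte) string {
	if bytes.HasPrefix(raw, utf8BOM) {
		return "utf-8"
	}
	if m := metaCharset.FindSubmatch(raw); m != nil {
		return strings.ToLower(string(m[1]))
	}
	return ""
}

func main() {
	withBOM := []byte("\xEF\xBB\xBF<meta charset=\"iso-8859-1\">")
	noBOM := []byte("<meta charset=\"ISO-8859-1\">")
	fmt.Println(htmlCharset(withBOM)) // prints: utf-8
	fmt.Println(htmlCharset(noBOM))   // prints: iso-8859-1
}
```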
Owner

@gabriel-vasile gabriel-vasile left a comment

Hi, the code looks all good. Just one small issue with comments to solve before I can merge.

internal/magic/magic.go (review thread: outdated, resolved)
Development

Successfully merging this pull request may close these issues.

Html file with utf-8 byte order mark is misclassified as text/plain