Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strip BOM from output in interactive mode #1938

Merged
merged 9 commits into from
Sep 6, 2022
Merged

Conversation

dag-h
Copy link
Contributor

@dag-h dag-h commented Nov 8, 2021

Fixes #1922

This is my first contribution to an open source project so I apologize in advance for any misunderstandings!

I tried finding somewhere where the output is known to be shown in an interactive mode, and InteractivePrinter looked right on. Removed the BOM if present on the first line. Also made sure the BOM is not stripped if output is piped to a file/other program.

Contents of first line copied from bat Program.cs.txt (see #1922 for example file):

Screenshot from 2021-11-08 09-54-57

Contents of file obtained from bat Program.cs.txt > test.txt:

Screenshot from 2021-11-08 09-58-06

The utf16 test then fails since it expects the output to contain the BOM. As far as I can tell the test runs bat in interactive mode, so I assumed this to be incorrect now. I changed the expected test output to not contain the BOM anymore.

I have probably missed some important things to consider, and there are probably better ways to do this than std::str::replace.

Copy link
Collaborator

@Enselic Enselic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for working on this! I took a high level look and had a high level comment.

@dag-h
Copy link
Contributor Author

dag-h commented Nov 18, 2021

BOM is now stripped from all non-config.loop_through output. I'm not entirely confident in how robust the regression tests are, but I had a go at them as well.

@sharkdp
Copy link
Owner

sharkdp commented Sep 4, 2022

Sorry for leaving this hanging for so long. I think this actually looks quite good! I have one minor question: Is there any chance that this would interfere with any of the other bat command line options? For example, does everything still work in bat -A mode? I.e.: is the BOM still shown?

Or does this interfere/interact somehow with --line-range or similar options?

// Remove byte order mark from the first line if it exists
if line_number == 1 {
match line.strip_prefix('\u{feff}') {
Some(stripped) => stripped.to_string(),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there is an alternative implementation where we do not have to allocate a new string? by operating on &str instead of String.

But .. this is only relevant for files starting with a BOM. So maybe it's not the most urgent question.

@sharkdp
Copy link
Owner

sharkdp commented Sep 4, 2022

Fixed the merge conflicts.

@sharkdp sharkdp requested a review from Enselic September 4, 2022 21:01
@Enselic
Copy link
Collaborator

Enselic commented Sep 5, 2022

Looks like tests are failing in Mac and Windows, but I will be to review this PR once that has been sorted out (I did a quick try at fixing the failures myself but it did not seem trivial to fix)

@sharkdp
Copy link
Owner

sharkdp commented Sep 6, 2022

I did a quick try at fixing the failures myself but it did not seem trivial to fix

Indeed. Two tests on macOS and Windows where failing for two different reasons:

  • macOS has a different color scheme by default (presumably due to the dark mode changes) and one test relied on explicit ANSI color codes => fixed by matching with a regex
  • Windows test was failing because the terminal width was not explicitly set, and the Windows runner (presumably) has a different terminal width.

In addition, the "Run tests with updated syntaxes and themes" check was failing because we have one syntax test with a UTF-16 BOM (PowerShell). Previously, bats highlighted output looked like this:

image

The ef bb bf part is the UTF-8-encoded version of the UTF-16 BOM, which is surrounded by some ANSI highlighting (for whatever reason). This PR improves the output and removes that whole first section (enclosed in red).

Copy link
Collaborator

@Enselic Enselic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Phenomenal! Let's merge this then! Code looks good to me 👍

@Enselic Enselic merged commit 08386da into sharkdp:master Sep 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

A space appears at the beginning of the file (Byte order mark)
3 participants