Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle non-printable characters in diff view #11687

Merged
merged 2 commits into from
Jun 8, 2024

Conversation

pilleye
Copy link
Contributor

@pilleye pilleye commented Jun 2, 2024

Summary

Fixes #10841 by handling non-printable characters in the same way as GNU coreutils cat's --show-nonprinting flag. One downside of this approach is that unicode characters are no longer printed as their actual character, which could be problematic for non-Latin languages.

Test Plan

Tested with the same example in #10841, but would appreciate a pointer to where I can programmatically add some tests for various non-printable ASCII characters.

Screenshot 2024-06-01 at 7 49 00 PM

Example of Unicode Changes

Screenshot 2024-06-01 at 7 56 31 PM

Additional Considerations

As @carljm pointed out, this would make the code diverge from git diff.

Would appreciate any feedback on this quick, slightly hacky diff!

Copy link
Contributor

github-actions bot commented Jun 2, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for looking into this.

I'm a bit surprised that we don't see a single test failure. I think we may also need to make this change in

pub(super) struct MessageCodeFrame<'a> {

and

pub(super) struct Diff<'a> {

}

impl ShowNonprinting for &str {
fn show_nonprinting(&self) -> String {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if we could return a Cow here to avoid allocating a string if it doesn't contain any non printable characters.

We may be able to do something similar to
https://github.com/mitsuhiko/insta/blob/569bbade9d9ff4bd43fe4138bdcbde00a6bf34c4/insta/src/output.rs#L325-L339

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

fn show_nonprinting(&self) -> String {
let mut output = String::with_capacity(self.len());

for byte in self.bytes() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I understand this correctly, that this will escape all non-ascii characters (in addition to some ascii characters)? I think that's undesirable. I would be very confused to see ä escaped in a Diff and wouldn't understand where this is coming from.

I recommend focusing on the non printable characters and would consider using unicode characters for them (similar to what insta does see link above) or biomejs (https://github.com/MichaReiser/biome/blob/96bbf507a006274518e8f5ae5af57c56b7771fb0/crates/biome_diagnostics/src/display/frame.rs#L394-L406)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that was correct. Fixed in the recent version.

@pilleye pilleye force-pushed the pilleye/nonprinting branch from 6c99816 to 359080e Compare June 5, 2024 19:18
@pilleye
Copy link
Contributor Author

pilleye commented Jun 5, 2024

Fixed the unicode issues and Cow issues: Screenshot 2024-06-05 at 3 19 46 PM

Working on adding it to the other spots in the repo.

@pilleye pilleye force-pushed the pilleye/nonprinting branch from 359080e to 434dc95 Compare June 5, 2024 19:29
@pilleye pilleye force-pushed the pilleye/nonprinting branch from 434dc95 to cd1ae4a Compare June 5, 2024 19:33
@pilleye
Copy link
Contributor Author

pilleye commented Jun 5, 2024

Updated the snapshot tests, they look to be much more clear since the ^ is actually pointing to the correct location.

@pilleye pilleye requested a review from MichaReiser June 5, 2024 19:50
@zanieb
Copy link
Member

zanieb commented Jun 5, 2024

Thanks for contributing!

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, this is a huge improvement over what we used to have before.

Comment on lines 237 to 250
match change.tag() {
ChangeTag::Equal => write!(f, " {}", change.value())?,
ChangeTag::Delete => write!(f, "{}{}", "-".red(), change.value().red())?,
ChangeTag::Insert => write!(f, "{}{}", "+".green(), change.value().green())?,
ChangeTag::Equal => write!(f, " {}", change.value().show_nonprinting())?,
ChangeTag::Delete => write!(
f,
"{}{}",
"-".red(),
change.value().show_nonprinting().red()
)?,
ChangeTag::Insert => write!(
f,
"{}{}",
"+".green(),
change.value().show_nonprinting().green()
)?,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, I would introduce a variable with the cleaned-up code to reduce the diff size and reduce the risk that one path misses the call.

let change = change.value().show_nonprinting();

match change.tag {
	ChangeTag::Equal => write!(f, " {}", change)?,
	...
}

@MichaReiser MichaReiser enabled auto-merge (squash) June 8, 2024 06:19
@MichaReiser MichaReiser added the cli Related to the command-line interface label Jun 8, 2024
@MichaReiser MichaReiser merged commit ed94779 into astral-sh:main Jun 8, 2024
18 checks passed
@pilleye pilleye deleted the pilleye/nonprinting branch June 9, 2024 19:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cli Related to the command-line interface
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consider handling of non-printable characters when showing code/diffs
3 participants