Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove each comment's leading whitespace in comment::rewrite_comment #5392

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

ytmimi
Copy link
Contributor

@ytmimi ytmimi commented Jun 20, 2022

Fixes #5391

Previously, leading whitespace at the start of custom comments (CommentStyle::Custom) would lead consume_same_line_comments to not return the correct byte position for the end of the comment.

This issue cascaded into a situation where an extra indented newline was inserted into rustfmts output causing a "left behind trailing whitespace" error.

Removing leading whitespace helps to properly identify the end of the comment and properly rewrite the comment.

ytmimi added 2 commits June 20, 2022 10:44
Fixes 5391

Previously, leading whitespace at the start of custom comments
(`CommentStyle::Custom`) would lead `consume_same_line_comments` to not
return the correct byte position for the end of the comment.

This issue cascaded into a situation where an extra indented newline was
inserted into rustfmts output causing a `"left behind trailing
whitespace"` error.

Removing leading whitespace helps to properly identify the end of
the comment and properly rewrite the comment.
@ytmimi ytmimi changed the title Issue 5391 Remove each comment's leading whitespace in comment::rewrite_comment Jun 20, 2022
@ytmimi
Copy link
Contributor Author

ytmimi commented Jun 21, 2022

Just want to elaborate a little more on the issue that we're experiencing here, and why I think this is the approach we should take. For reference here's the code for identify_comment:

rustfmt/src/comment.rs

Lines 269 to 408 in 3de1a09

fn identify_comment(
orig: &str,
block_style: bool,
shape: Shape,
config: &Config,
is_doc_comment: bool,
) -> Option<String> {
let style = comment_style(orig, false);
// Computes the byte length of line taking into account a newline if the line is part of a
// paragraph.
fn compute_len(orig: &str, line: &str) -> usize {
if orig.len() > line.len() {
if orig.as_bytes()[line.len()] == b'\r' {
line.len() + 2
} else {
line.len() + 1
}
} else {
line.len()
}
}
// Get the first group of line comments having the same commenting style.
//
// Returns a tuple with:
// - a boolean indicating if there is a blank line
// - a number indicating the size of the first group of comments
fn consume_same_line_comments(
style: CommentStyle<'_>,
orig: &str,
line_start: &str,
) -> (bool, usize) {
let mut first_group_ending = 0;
let mut hbl = false;
for line in orig.lines() {
let trimmed_line = line.trim_start();
if trimmed_line.is_empty() {
hbl = true;
break;
} else if trimmed_line.starts_with(line_start)
|| comment_style(trimmed_line, false) == style
{
first_group_ending += compute_len(&orig[first_group_ending..], line);
} else {
break;
}
}
(hbl, first_group_ending)
}
let (has_bare_lines, first_group_ending) = match style {
CommentStyle::DoubleSlash | CommentStyle::TripleSlash | CommentStyle::Doc => {
let line_start = style.line_start().trim_start();
consume_same_line_comments(style, orig, line_start)
}
CommentStyle::Custom(opener) => {
let trimmed_opener = opener.trim_end();
consume_same_line_comments(style, orig, trimmed_opener)
}
// for a block comment, search for the closing symbol
CommentStyle::DoubleBullet | CommentStyle::SingleBullet | CommentStyle::Exclamation => {
let closer = style.closer().trim_start();
let mut count = orig.matches(closer).count();
let mut closing_symbol_offset = 0;
let mut hbl = false;
let mut first = true;
for line in orig.lines() {
closing_symbol_offset += compute_len(&orig[closing_symbol_offset..], line);
let mut trimmed_line = line.trim_start();
if !trimmed_line.starts_with('*')
&& !trimmed_line.starts_with("//")
&& !trimmed_line.starts_with("/*")
{
hbl = true;
}
// Remove opener from consideration when searching for closer
if first {
let opener = style.opener().trim_end();
trimmed_line = &trimmed_line[opener.len()..];
first = false;
}
if trimmed_line.ends_with(closer) {
count -= 1;
if count == 0 {
break;
}
}
}
(hbl, closing_symbol_offset)
}
};
let (first_group, rest) = orig.split_at(first_group_ending);
let rewritten_first_group =
if !config.normalize_comments() && has_bare_lines && style.is_block_comment() {
trim_left_preserve_layout(first_group, shape.indent, config)?
} else if !config.normalize_comments()
&& !config.wrap_comments()
&& !config.format_code_in_doc_comments()
{
light_rewrite_comment(first_group, shape.indent, config, is_doc_comment)
} else {
rewrite_comment_inner(
first_group,
block_style,
style,
shape,
config,
is_doc_comment || style.is_doc_comment(),
)?
};
if rest.is_empty() {
Some(rewritten_first_group)
} else {
identify_comment(
rest.trim_start(),
block_style,
shape,
config,
is_doc_comment,
)
.map(|rest_str| {
format!(
"{}\n{}{}{}",
rewritten_first_group,
// insert back the blank line
if has_bare_lines && style.is_line_comment() {
"\n"
} else {
""
},
shape.indent.to_string(config),
rest_str
)
})
}
}

Taking the example from the original issue identify_comment gets called with orig: &str = " //.Y\n". when we compute the style (line 276) we determine that this is a CommentStyle::DoubleDash comment.

When hitting the match on line 321 we enter the first match arm because we just computed the style as CommentStyle::DoubleDash. This means we call consume_same_line_comments with style = CommentStyle::DoubleDash, orig = " //.Y\n", and line_start = "// ".

Now lets zoom in on the for loop in consume_same_line_comments:

rustfmt/src/comment.rs

Lines 305 to 317 in 3de1a09

for line in orig.lines() {
let trimmed_line = line.trim_start();
if trimmed_line.is_empty() {
hbl = true;
break;
} else if trimmed_line.starts_with(line_start)
|| comment_style(trimmed_line, false) == style
{
first_group_ending += compute_len(&orig[first_group_ending..], line);
} else {
break;
}
}

  • the if statement returns false since the trimmed line isn't empty
  • the else if statement returns false since "//.Y" doesn't start with "// ", and when we recompute the comment style we determine the trimmed comment to be CommentStyle::Custom.
  • Lastly we hit the else case and immediately break out of the loop and return (false, 0) from consume_same_line_comments.

Now on line 364 we split the original string at 0, which gives us ("", " //.Y\n"),

When we finally get to the last conditional statement at the end of identify_comment we end up hitting the else case which recursively calls identify_comment on the trimmed version of rest, which is "//.Y\n". When this recursive call succeeds we also add an additional indented newline.

In the case where the comment is the last item in a block this additional indented newline conflicts with the indented newline added on line 314 of visitor::FmtVisitor::close_block and leads to the "left behind trailing whitespace" error.

rustfmt/src/visitor.rs

Lines 311 to 323 in 2c8b3be

Some(offset) => {
let first_line = &sub_slice[..offset];
self.push_str(first_line);
self.push_str(&self.block_indent.to_string_with_newline(config));
// put the other lines below it, shaping it as needed
let other_lines = &sub_slice[offset + 1..];
let comment_str =
rewrite_comment(other_lines, false, comment_shape, config);
match comment_str {
Some(ref s) => self.push_str(s),
None => self.push_str(other_lines),
}

I suspect something similar happens in the case where the comment isn't the last item in a block, but didn't look too deeply into it since removing leading whitespace up front resolves both cases.

In conclusion, by trimming the comment up front we save ourselves from doing additional work that would lead us to recursively call identify_comment with a trimmed version of the string anyway.

@correabuscar
Copy link

Confirmed this PR works for me on Gentoo with rust 1.78.0 AND with locally built rustfmt1 from latest commit c97996f

diff --git a/src/main.rs b/src/main.rs
index 316bc87..65dfb1a 100644
--- a/src/main.rs
+++ b/src/main.rs
@@ -1,7 +1,7 @@
 mod my_error_things {
     //const CUSTOM_ERROR_MSG_BUFFER_SIZE: usize = 6;
     pub const CUSTOM_ERROR_MSG_BUFFER_SIZE: usize = 4096; // one kernel page?!
-                                                          //^ must be pub due to being used in a macro!
+    //^ must be pub due to being used in a macro!
 }
 fn main() {
     println!("Hello, world!");

Also it passes cargo test locally, so I guess it didn't break anything that was under tests.

Thanks!

Footnotes

  1. though it was kinda difficult to run the locally built rustfmt since it's linked dynamically which was surprising to me, so I had to run it like this:
    $ LD_LIBRARY_PATH="/home/user/.rustup/toolchains/nightly-2023-12-28-x86_64-unknown-linux-gnu/lib:/home/user/.rustup/toolchains/nightly-2023-12-28-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib" /tmp/rustfmt/target/x86_64-unknown-linux-gnu/debug/rustfmt --config version=Two src/main.rs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants