Fixes #5260 #5262

PSeitz · 2022-03-12T05:56:43Z

Fixes #5260 by checking if it is part of a type '::'
Add possibility to run single test e.g. TEST_FILE=5250 cargo test system to run test files containing 5250

src/test/mod.rs

ytmimi · 2022-03-12T06:52:58Z

Thanks for the PR!

As I mentioned in #5260 (comment), I'm wondering if we could take a similar approach to the referenced PR and add another regex check in has_url (linked below for reference):

rustfmt/src/comment.rs

Lines 928 to 936 in 83c1692

    
           /// Returns `true` if the given string MAY include URLs or alike. 
        
           fn has_url(s: &str) -> bool { 
        
               // This function may return false positive, but should get its job done in most cases. 
        
               s.contains("https://") 
        
                   || s.contains("http://") 
        
                   || s.contains("ftp://") 
        
                   || s.contains("file://") 
        
                   || REFERENCE_LINK_URL.is_match(s) 
        
           }

Just wondering if you gave that a try and landed on this approach because the regex route didn't work?
I personally feel that the change would be a little cleaner if it was encapsulated in has_url instead of directly tweaking how we break strings.

PSeitz · 2022-03-12T07:54:30Z

I was considering disabling breaking lines completely if detected, but I think the solution in the PR is closer to what we actually want, since it will still try to split the line.

Currently there is one part missing to apply the result, because the call to
wrap_str(result, fmt.config.max_width(), fmt.shape) accepts only string rewrites if no line is exceeding the max_width limit, which is odd since it is a better formatting result.

I moved the test filtering to this PR #5265

ytmimi · 2022-03-12T18:49:53Z

Thanks for taking the time to move the test filtering code to a different PR!

Agreed that not splitting the line at all isn't ideal, but it's consistent with how we handle other links. I think trying to uphold that consistency would be best, but totally open to considering these changes!

break_string is a source of some other bugs and, that's mostly why I'm reluctant to change it directly to support not breaking on :: (aside from the earlier point of has_url better encapsulating the fix).

Right now I'm leaning towards the has_url solution because the surface for code changes is low and it provides an immediate fix for the problem of breaking qualified paths to Types in markdown URLs. In the future I think it would be cool to use a markdown parser to more intelligently determine line breaks in doc comments for other markdown constructs that we don't currently handle well!

One nit, but something I'd want changed before moving forward is that it looks like the commit history is a bit messed up and these changes were amended to an unrelated commit (or at least the commit message / author info from a previous commit was copied over). When you have a chance could you rebase these changes and update the commit message, Thanks!

PSeitz · 2022-03-13T02:46:25Z

I understand the url approach, but I think this is a more complete solution.

The root cause of the issue is not markdown links, since (ABC)[CDE], does not contain any characters that are considered a line break. It's that : is considered a line break (punctuation), although it's part of a type.

This approach covers more aspects, e.g.

break_string is called from multiple positions, but the url check does not cover e.g. itemized blocks.
We can have very long types longer than max_width. I extended the test to illustrate.

I fixed the commit to not amend to another commit.

In general I think it would better, to handle tokens and not graphemes in break_string.

calebcartwright · 2022-03-16T03:08:37Z

Thank you both for the PR/review and dialog! I've not had a chance to look at the code but wanted to chime in on a couple of the points of dialog.

Re: leading link/url approach vs. double colon search, you've both made some good points. There's an inherent friction in trying to split things between splitting something you don't want to split vs. failing to split something that could be split. I'd say historically the tact has been more aggressive in nature with subsequent attempts to curb or soften when/if breaks will happen, and we're at such a point again. Ultimately we'll have to decide (or expose as yet another config option) whether any :: is strictly part of a type (possibly reasonable/safe) or if there's valid cases where it could appear in some breakable context

In general I think it would better, to handle tokens and not graphemes in break_string

This is a short, subtle sentence but that speaks to a much larger topic that I don't really want to delve into here beyond saying that the utilization of graphemes is intentional to account for the full range of text rustfmt will encounter, including all sorts of unicode fun, rtl vs ltr languages, etc.

If you think there's a better alternative that accounts for everything then certainly feel free to make a separate proposal elsewhere, but it definitely would not be desirable to e.g. move from grapheme processing to more trivial code point/token based processing.

PSeitz · 2022-03-16T04:06:00Z

Ultimately we'll have to decide (or expose as yet another config option) whether any :: is strictly part of a type (possibly reasonable/safe) or if there's valid cases where it could appear in some breakable context

Good point. I'm not aware of any :: which we want to split. I don't think it occurs in natural language.
It does appear in URIs, which we also don't want to split.

If you think there's a better alternative that accounts for everything then certainly feel free to make a separate proposal elsewhere, but it definitely would not be desirable to e.g. move from grapheme processing to more trivial code point/token based processing.

What I meant with tokens is to group graphemes together to a token (e.g. a word).

PSeitz · 2022-04-11T13:30:01Z

@ytmimi @calebcartwright
Can we move forward on this? This fixes an issue I have in documenting and referencing types in tantivy.

calebcartwright · 2022-04-11T23:21:11Z

@ytmimi @calebcartwright Can we move forward on this? This fixes an issue I have in documenting and referencing types in tantivy.

I appreciate you've got some vested interest in seeing the issue resolved, though please keep in mind that's also true for others that have reported and/or proposed changes for things they are encountering.

rustfmt is sadly underserved when it comes to the review/maintenance side of the equation, and as such the PR queue tends to back up. This one isn't too far out now, but we'll get to it when we can.

Thank you for your patience @PSeitz

calebcartwright · 2022-04-28T00:28:45Z

This is up next on my list to review, though before diving in, @ytmimi do you have any remaining concerns with the approach?

I'm probably grossly oversimplifying things, but AIUI the question/approaches were really between whether we'd only refrain from splitting the type if it appeared at the start of a comment line, or alternatively, anywhere within the line?

@PSeitz - definitely no rush, but as this has been waiting on us for a while an we've had to sync some changes from upstream in the interim, would probably be good to do a rebase on master at your convenience

ytmimi · 2022-04-29T02:02:21Z

@calebcartwright I've gone ahead and given this a re-review. Initially I wanted to avoid going down this code path at all. After sitting on it for a little while and taking a look at the implementation again I think I'm warmed up to moving forward with the proposed solution.

The only thing I'm wondering now is if it would be better to generalize is_part_of_type to be more generic and disallow breaking on any consecutive punctuation, but taking a look at the Unicode Category "Other Punctuation" (which is what is_punctuation is implemented with), I don't know if we'll run into very many situations like this.

Maybe breaking at an exclamation point if it's on the comment width boundary e.g: ...(some comment text)!!!! -> ...(some comment text)!!\n!!

I don't think it's necessary to implement a generalized solution, just wondering if including this now would prevent other line break issues down the road. Overall I'm happy to move forward with this.

Fixes rust-lang#5260 by checking if it is part of a type '::'

PSeitz · 2022-05-03T03:35:37Z

I think this generalization to disallow breaking on any consecutive punctuation is a good a idea. I updated the PR and added a test for it.

ytmimi

Thanks for your work on this! @calebcartwright I'm happy to move forward, but will give you a chance to review.

calebcartwright · 2022-05-05T03:05:46Z

I'm open to discussing additional changes beyond what's really in scope for the types-related issue, but I don't want to keep increasing the scope and elongating the amount of time on this particular PR. Let's drop this back to strictly handling types, and open a separate issue for discussion of broader rules, like consecutive punctuation

PSeitz · 2022-05-05T15:15:55Z

I dropped the commit, and created a PR that includes the commit here #5331

ytmimi

@calebcartwright sorry about that. I didn't mean to extend the the scope of the PR with the suggestion that we disallow breaking on any punctuation. Agreed that it's probably best to continue the discussion in a separate issue.

@PSeitz Thanks for reverting the changes LGTM.

PSeitz · 2022-05-05T16:03:52Z

I can’t merge the PR, since I don’t have write access

ytmimi · 2022-05-05T16:12:27Z

I can’t merge the PR, since I don’t have write access

Yup, that's intentional. I think we're good to go here but will hold off on merging this until @calebcartwright gives another review as I'm not sure if he wants to see any other changes besides dropping the double punctuation logic.

calebcartwright

Thanks so much! I had a couple minor thoughts but this is good enough as-is so would rather just move forward 👍

fix link alias after rust-lang/rustfmt#5262 has been merged and released. fix dead links

ytmimi reviewed Mar 12, 2022

View reviewed changes

src/test/mod.rs Outdated Show resolved Hide resolved

ytmimi added the pr-waiting-on-author label Mar 12, 2022

PSeitz force-pushed the master branch 2 times, most recently from 812f4b2 to fbc3497 Compare March 12, 2022 07:56

PSeitz force-pushed the master branch 2 times, most recently from 594993a to cd19e80 Compare March 13, 2022 02:04

calebcartwright added pr-follow-up-review-pending and removed pr-waiting-on-author labels Apr 11, 2022

PSeitz force-pushed the master branch from 2e09276 to 79857ab Compare April 28, 2022 12:10

Fixes rust-lang#5260

74c2c44

Fixes rust-lang#5260 by checking if it is part of a type '::'

PSeitz force-pushed the master branch from ac3ddc6 to 3fd191c Compare May 2, 2022 16:38

ytmimi reviewed May 3, 2022

View reviewed changes

PSeitz force-pushed the master branch from 3fd191c to 74c2c44 Compare May 5, 2022 15:13

ytmimi approved these changes May 5, 2022

View reviewed changes

calebcartwright approved these changes May 8, 2022

View reviewed changes

calebcartwright merged commit c65ba14 into rust-lang:master May 8, 2022

calebcartwright added release-notes Needs an associated changelog entry and removed pr-follow-up-review-pending labels May 8, 2022

calebcartwright removed the release-notes Needs an associated changelog entry label Jun 17, 2022

PSeitz mentioned this pull request Sep 8, 2022

improve docs quickwit-oss/tantivy#1514

Merged

PSeitz added a commit to quickwit-oss/tantivy that referenced this pull request Sep 8, 2022

improve docs

496196d

fix link alias after rust-lang/rustfmt#5262 has been merged and released. fix dead links

fulmicoton pushed a commit to quickwit-oss/tantivy that referenced this pull request Sep 8, 2022

improve docs (#1514)

4592471

fix link alias after rust-lang/rustfmt#5262 has been merged and released. fix dead links

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fixes #5260 #5262

Fixes #5260 #5262

PSeitz commented Mar 12, 2022 •

edited

Loading

ytmimi commented Mar 12, 2022

PSeitz commented Mar 12, 2022 •

edited

Loading

ytmimi commented Mar 12, 2022

PSeitz commented Mar 13, 2022 •

edited

Loading

calebcartwright commented Mar 16, 2022

PSeitz commented Mar 16, 2022

PSeitz commented Apr 11, 2022 •

edited

Loading

calebcartwright commented Apr 11, 2022

calebcartwright commented Apr 28, 2022

ytmimi commented Apr 29, 2022

PSeitz commented May 3, 2022

ytmimi left a comment

calebcartwright commented May 5, 2022

PSeitz commented May 5, 2022

ytmimi left a comment

PSeitz commented May 5, 2022

ytmimi commented May 5, 2022

calebcartwright left a comment

Fixes #5260 #5262

Fixes #5260 #5262

Conversation

PSeitz commented Mar 12, 2022 • edited Loading

ytmimi commented Mar 12, 2022

PSeitz commented Mar 12, 2022 • edited Loading

ytmimi commented Mar 12, 2022

PSeitz commented Mar 13, 2022 • edited Loading

calebcartwright commented Mar 16, 2022

PSeitz commented Mar 16, 2022

PSeitz commented Apr 11, 2022 • edited Loading

calebcartwright commented Apr 11, 2022

calebcartwright commented Apr 28, 2022

ytmimi commented Apr 29, 2022

PSeitz commented May 3, 2022

ytmimi left a comment

Choose a reason for hiding this comment

calebcartwright commented May 5, 2022

PSeitz commented May 5, 2022

ytmimi left a comment

Choose a reason for hiding this comment

PSeitz commented May 5, 2022

ytmimi commented May 5, 2022

calebcartwright left a comment

Choose a reason for hiding this comment

PSeitz commented Mar 12, 2022 •

edited

Loading

PSeitz commented Mar 12, 2022 •

edited

Loading

PSeitz commented Mar 13, 2022 •

edited

Loading

PSeitz commented Apr 11, 2022 •

edited

Loading