Add methods for dealing with whitespaces #474
Hi Marcelo, this is a really useful contribution, thanks a lot! The CI is red because of MiMa: the new methods on the `PatchOps` trait are reported as `ReversedMissingMethodProblem`. The following exclusions would go in https://github.com/scalacenter/scalafix/blob/master/project/Mima.scala#L7:
ProblemFilters.exclude[ReversedMissingMethodProblem]("scalafix.patch.PatchOps.trim")
ProblemFilters.exclude[ReversedMissingMethodProblem]("scalafix.patch.PatchOps.trimRight")
ProblemFilters.exclude[ReversedMissingMethodProblem]("scalafix.patch.PatchOps.trimLeft")
Thank you for this contribution, I am eager to use this myself 😄 I left a few minor comments; otherwise the implementation and tests look great 👍
"trimLeft removes spaces and newline before `val`",
original,
"""// Foobar
  |object a {val x = 2
What happens if there is a // comment after the open curly brace? We should not remove newlines trailing such comments.
We could test this by trimming left of "object"
In the example above, the result would be:
// Foobar object a {
I added some conditions to prevent this issue (comments in the `/* ... */` format are OK). The same would occur when trimming on the right side of a comment, though.
Comments are not the only case where trimming would cause an invalid output. Example:
object Foo {
  val a = 0
  val b = 0
}
ctx.trimLeft(bValToken)
Output:
object Foo {
  val a = 0 val b = 0
}
I feel that the current behavior for newlines is neither intuitive nor very useful in practice. I can't think of a valid scenario where I would want to get rid of the left/right newline and have the remaining code on that line (or the line below) moved up (update: except when the entire line is meant to be removed or when an expression wraps across multiple lines). Is there any use case that comes to your mind?
From the example above and the use case @gabro described in #370, maybe the right API would be something like `ctx.removeLine(token)` (trimming left/right spaces is still useful, though). What do you think?
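To make the comment problem discussed above concrete, here is a minimal sketch of a left trim that refuses to swallow the newline terminating a `//` comment. It assumes a deliberately simplified token model (`Tok`, `Word`, `LineComment`, `Space`, `Newline` are hypothetical names, not scalameta's token classes) and works on indices rather than on scalafix patches, so it only illustrates the guard, not the PR's actual implementation.

```scala
object TrimLeftSketch {
  // Hypothetical, simplified token model (NOT scalameta's Token hierarchy).
  sealed trait Tok { def text: String }
  case class Word(text: String) extends Tok
  case class LineComment(text: String) extends Tok
  case object Space extends Tok { val text = " " }
  case object Newline extends Tok { val text = "\n" }

  private def isWhitespace(t: Tok): Boolean = t == Space || t == Newline

  /** Removes the whitespace run directly left of `tokens(idx)`.
    * If that run follows a `//` comment, one newline is kept so the
    * token at `idx` is not pulled into the comment.
    */
  def trimLeft(tokens: Vector[Tok], idx: Int): Vector[Tok] = {
    val before = tokens.take(idx)
    // Whitespace tokens immediately preceding idx (order is irrelevant here).
    val run = before.reverse.takeWhile(isWhitespace)
    val kept = before.dropRight(run.length)
    val afterLineComment = kept.lastOption.exists(_.isInstanceOf[LineComment])
    val guard: Vector[Tok] =
      if (afterLineComment && run.contains(Newline)) Vector(Newline) else Vector.empty
    kept ++ guard ++ tokens.drop(idx)
  }

  def render(tokens: Vector[Tok]): String = tokens.map(_.text).mkString
}
```

Note that the guard only covers the comment case. The `val a = 0 val b = 0` case above would still be produced by this sketch, which is the reason a line-oriented operation is suggested instead.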
private def canBeTrimmed(tk: Token, includeNewLine: Boolean): Boolean =
  tk match {
    case Space() | Tab() | FF() => true
    case Newline() if includeNewLine => true
I'm wondering if we should limit this to a single newline. In a typical use case, when you remove a statement you usually want to keep the leading/trailing blank lines. What do you think?
Just to understand, given this example:
val x = 42
"foo"
val y = "bar"
would the current implementation be able to produce this diff?
val x = 42
-
- "foo"
val y = "bar"
so that the resulting code is
val x = 42
val y = "bar"
I think the implementation should produce this diff
val x = 42
- "foo"
val y = "bar"
That would leave two blank lines, but it would also preserve a blank line in the following case instead of removing it
val x = 42
- "foo"
val y = "bar"
Right. Probably the job of taking care of multiple blank lines belongs to a formatter
Yes, scalafmt would at least combine the two blank lines into a single blank line.
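The "single newline" idea from this thread can be sketched as follows. This is an illustration under a simplified, hypothetical token model (`Tok`, `Word`, `Space`, `Newline` are not scalameta types): the trim consumes the spaces left of a token and at most one newline, so any blank line further up survives and is left for a formatter to normalize.

```scala
object SingleNewlineSketch {
  // Hypothetical, simplified token model (NOT scalameta's Token hierarchy).
  sealed trait Tok { def text: String }
  case class Word(text: String) extends Tok
  case object Space extends Tok { val text = " " }
  case object Newline extends Tok { val text = "\n" }

  /** Removes the spaces directly left of `tokens(idx)` plus at most ONE newline. */
  def trimLeftOneLine(tokens: Vector[Tok], idx: Int): Vector[Tok] = {
    val before = tokens.take(idx)
    // Count the indentation spaces immediately preceding idx.
    val spaces = before.reverse.takeWhile(_ == Space).length
    val rest = before.dropRight(spaces)
    // Drop a single newline if present; earlier newlines (blank lines) are kept.
    val kept = if (rest.lastOption.contains(Newline)) rest.dropRight(1) else rest
    kept ++ tokens.drop(idx)
  }

  def render(tokens: Vector[Tok]): String = tokens.map(_.text).mkString
}
```

With the input `x`, newline, newline, two spaces, `y`, trimming left of `y` keeps the first newline, whereas a greedy trim would join `x` and `y` on one line.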
 * Remove whitespaces on the left side of the token. Use `includeNewLine` to
 * indicate when newline tokens should be preserved.
 */
def trimLeft(token: Token, includeNewLine: Boolean = true): Patch
What do you think about having a separate method to trim only spaces? trimLeftSpaces, trimSpaces, trimRightSpaces (name up for debate). Tabs can be ignored, I'm not aware of any Scala project using tabs for indent.
I think it could result in a cleaner user API, boolean parameters are quite awkward to both write and read.
You are right, @olafurpg. Boolean parameters often indicate the need for two separate methods. My only concern is the explosion of methods (potentially 6 new ones) in the `RuleCtx` API (those would be inherited from `PatchOps`). If you think that's not a concern, I will go for it.
I think the eventual home for these methods will be simple defs in `object Patch`. I personally think it's OK to have more methods if it buys us a cleaner API.
@marcelocenerine we are thinking of cutting a release today. Do you think you can address the comments above? This is a really valuable contribution and we would be happy to add it to the upcoming version.
@MasseGuillaume, unfortunately I will only be able to address the comments in the evening. Please feel free to continue the work if you guys want it to make the release cut today. Otherwise I will update the PR this evening.
@marcelocenerine cool, I'm going to address the PR comments and include it in this release.
@marcelocenerine don't know enough about token manipulation to address this PR for the release. I'm putting the ball back on your side for the next release. :)
@MasseGuillaume, ok. I will update the PR tonight :)
@olafurpg thanks for your suggestions!!! I updated the PR addressing your comments, but I think we should discuss more on the behavior for newlines. Please see my comment: #474 (comment)
private def isSingleLineComment(tk: Token): Boolean =
  tk.is[Comment] && tk.text.startsWith("//")
private def isWindowsNewline(tk1: Token, tk2: Token): Boolean =
  tk1.is[CR] && tk2.is[LF]
I ran tests on Windows with a file containing CR+LF characters and it seems that scalameta ignores the CR ones, keeping only the LF tokens
We have some Windows line-break-related issues that I have never properly looked into: scalameta/scalameta#443
Hey @olafurpg, did you get a chance to take a look at my latest changes as well as my comment #474 (comment) ? I created this branch master...marcelocenerine:trim_normalize with an alternative implementation ( |
Hey @marcelocenerine so sorry, I have maybe 2-3 different pending reviews for this PR that I never got around to submitting. Every time I look at it I start writing tons of comments, then I change my mind and 20 minutes later I decide to "look at it again later" 😅
I took a look at your trim_normalize branch and I'm not convinced about [...]. I'm wondering if we can simplify things by moving these methods to `TokenList`:
class TokenList {
  def leading
  def leadingSpaces
  def leadingLine // includes spaces + first newline
  def trailing
  def trailingSpaces
  def trailingLine
}
I'm personally most excited about [...] (scalafix/scalafix-core/shared/src/main/scala/scalafix/internal/rule/RemoveUnusedTerms.scala, line 44 in 8bd026d).
Still, that means users will need to feed the output of those methods to `removeTokens`, would that defeat the purpose?
OK, I'm submitting this comment before I change my mind again 😂
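As a rough illustration of the proposal above, the selectors could look like the sketch below. It assumes a simplified, hypothetical model (`SimpleTokenList`, `Tok`, `Word`, `Space`, `Newline`) that addresses tokens by index. The real `TokenList` takes tokens rather than indices, and its exact signatures are not shown in this thread, so everything here is an assumption for illustration.

```scala
object TokenListSketch {
  // Hypothetical, simplified token model (NOT scalameta's Token hierarchy).
  sealed trait Tok
  case class Word(text: String) extends Tok
  case object Space extends Tok
  case object Newline extends Tok

  final class SimpleTokenList(tokens: Vector[Tok]) {
    /** Tokens before index `idx`, nearest first. */
    def leading(idx: Int): Vector[Tok] = tokens.take(idx).reverse

    /** Tokens after index `idx`, nearest first. */
    def trailing(idx: Int): Vector[Tok] = tokens.drop(idx + 1)

    /** Only the run of spaces directly before `idx`. */
    def leadingSpaces(idx: Int): Vector[Tok] = leading(idx).takeWhile(_ == Space)

    /** Only the run of spaces directly after `idx`. */
    def trailingSpaces(idx: Int): Vector[Tok] = trailing(idx).takeWhile(_ == Space)

    /** Leading spaces plus the first newline, if one follows them. */
    def leadingLine(idx: Int): Vector[Tok] = {
      val spaces = leadingSpaces(idx)
      leading(idx).drop(spaces.length).headOption match {
        case Some(Newline) => spaces :+ Newline
        case _             => spaces
      }
    }
  }
}
```

The selectors only find tokens. Removal stays a separate step (feeding the result to something like `removeTokens`), which is the trade-off questioned in the comment above.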
Hey @olafurpg, thanks for your feedback. Your hesitation highlights the fact that this PR needs more work (especially when it comes to handling newlines) 😄
As a rule author I'd be more interested in the [...]. I think that removing newlines is dangerous unless the rule is aware of what tokens exist in the prev/next line as well as in the current line. As you pointed out, [...] I'd be more interested in methods to find the prev/next newline character instead, and then write my own logic to `slice(newline, token)` and figure out what is in between, or what precedes/follows the newline token, so I could determine whether they could be removed.
Or maybe a method that returns all leading/trailing tokens in the same line. What do you think? Should I update the PR following your suggestion? Or do the above instead of [...]
To be honest, I'd prefer to have a rich API focused on finding/selecting tokens and a cleaner API with basic operations for adding/removing/replacing things ([...]). Alternatively, given that additional whitespaces between tokens are irrelevant to the correctness of a program (just an aesthetic concern), should rules just ignore them and let a code formatting tool finish the job?
Would [...]
{
  val x = <<prevNewline>>
If it returns the newline after [...]
I'm glad we agree there!
Trickier cases I think we should delegate to a formatter, but for some simple rewrites I think it would be nice to offer a best effort, for example by trimming enclosing spaces, as in #513.
OK, here is my concrete proposal: [...]
It seems we are still unsure what to do about lines, so we can give that a bit more time to think and leave it for another PR. I think [...]
Yeah, I was thinking something like that but, like you said, it's already easy to achieve that goal through the current API. Except if we wanted to save users from having to deal with the intricacies of Windows/Unix line breaks (as long as scalameta cared about that difference).
Your proposal sounds good to me. In my short experience with scalafix I have come across scenarios where I missed utilities to get rid of whitespaces. I wanted to cover the other use case in #370 (newlines) as well, but I think that it needs some more thought so that whatever comes out of it is generic enough for people to use. I will update the PR shortly 😅
  val tokenList = TokenList(emptyFileTokens)
  assert(tokenList.slice(emptyFileTokens.head, emptyFileTokens.head) == Seq())
}
It's kinda off topic but I think it's worth mentioning: when I was working on the previous implementation I found 2 things in the API that didn't behave as I expected (http://wiki.c2.com/?PrincipleOfLeastAstonishment):
- `prev`/`next` returns the input token if it is the head/last of the tokenList. Would `Option[Token]` make more sense?
- `slice(from, to)` doesn't behave as in Scala's std collection given that `from` is not inclusive. For instance, while removing tokens, what could take a single step (`ctx.removeTokens(tokenList.slice(tk, last))`) actually requires 2 (`ctx.removeToken(tk) + ctx.removeTokens(tokenList.slice(tk, last))`), and it is really easy to introduce bugs if users don't dive into the implementation of `slice`.
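The two surprises listed above can be reproduced in a small, self-contained sketch. It models the token list as a plain `Vector` addressed by index. All names (`sliceExclusiveFrom`, `sliceInclusiveFrom`, `prevOption`, `nextOption`) are hypothetical, and treating `to` as exclusive is an assumption, since the thread only states that `from` is not inclusive.

```scala
object SliceSketch {
  /** `from` exclusive: the behavior described in the comment above
    * (`to` is assumed exclusive for this sketch).
    */
  def sliceExclusiveFrom[A](xs: Vector[A], from: Int, to: Int): Vector[A] =
    xs.slice(from + 1, to)

  /** `from` inclusive, `to` exclusive: the standard-library convention. */
  def sliceInclusiveFrom[A](xs: Vector[A], from: Int, to: Int): Vector[A] =
    xs.slice(from, to)

  /** Previous element, or None at the head instead of echoing the input. */
  def prevOption[A](xs: Vector[A], idx: Int): Option[A] =
    if (idx > 0 && idx < xs.length) Some(xs(idx - 1)) else None

  /** Next element, or None at the last position instead of echoing the input. */
  def nextOption[A](xs: Vector[A], idx: Int): Option[A] =
    if (idx >= 0 && idx + 1 < xs.length) Some(xs(idx + 1)) else None
}
```

With an exclusive `from`, removing "everything from `x` onward" needs the extra step of also removing `x` itself. With the inclusive variant a single slice covers it.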
These are great points, I agree both of these can be improved. Can you open tickets? I am open to fixing `slice` immediately (we can even implement it using slice on views from the stdlib), while prev/next will have to wait until v0.6.
The PR is updated now.
This looks great @marcelocenerine! Thanks a lot for your patience and for iterating on the design. 👍
```
> scalafix --diff-base <tab>
HEAD ag/gh-pages ag/issue-260 ag/issue-319 ag/master ... v0.5.5 v0.5.6
|00| 51abefa -- wip improve sbt completion (13 hours ago)
|01| 3f63b5b -- Merge pull request scalacenter#474 from marcelocenerine/trim_includeNewLine (13 hours ago)
|02| 47b94c0 -- Add leadingSpaces/trailingSpaces methods to TokenList (13 hours ago)
|03| 8bd026d -- Merge pull request scalacenter#503 from MasseGuillaume/feature/368-escape-patches (13 hours ago)
|04| 20e445f -- Escape hatch on Patch (fix scalacenter#368) (13 hours ago)
|05| a2c5d70 -- Merge pull request scalacenter#500 from MasseGuillaume/feature/495-custom-id (13 hours ago)
|06| 59efe7d -- Add CustomMessage to the public api (13 hours ago)
|07| 9ae6071 -- Add id for CustomMessage (fix scalacenter#495) (13 hours ago)
|08| e4a5c35 -- Merge pull request scalacenter#494 from MasseGuillaume/disable-regex (13 hours ago)
|09| a422860 -- Merge pull request scalacenter#497 from olafurpg/disable-signatures (13 hours ago)
|10| 7930947 -- DisableSyntax add regex (13 hours ago)
|11| 5dbdd6b -- IntervalSet test for empty and add toString (13 hours ago)
|12| b022fbd -- DisableSyntax don't repeat DisableSyntax.keyword in message (13 hours ago)
|13| a992b02 -- Assert instead of scalafix:ok (13 hours ago)
|14| 7896ccd -- Refactor Disable to use views. (13 hours ago)
|15| 58acdbe -- Fix scalacenter#493, handle synthetics and symbol signatures in Disable. (13 hours ago)
|16| b48d7f0 -- Merge pull request scalacenter#490 from olafurpg/unmanagedSources (13 hours ago)
|17| e9b2b0a -- s/canFormat/canFix/ (13 hours ago)
|18| 26be6fa -- Use unmanagedSources instead of unmanagedSourceDirectories. (13 hours ago)
|19| 4d46001 -- Merge pull request scalacenter#488 from olafurpg/master (13 hours ago)
```
When working on a scalafix rule for scala/collection-strawman I implemented my own logic to get rid of whitespaces. @olafurpg mentioned in the review that it would be nice to have such functionality supported by Scalafix. Later I realized that there was even an issue filed by @gabro to look into that: #370.
I explored a few alternatives to introduce the functionality. There might be better ways to achieve this, so I'd appreciate your feedback and am willing to accept suggestions.
The code changes in this PR implement the approach that I personally prefer. There are 2 other alternatives available on separate branches (those need some polishing).
1 - `trimLeft`/`trimRight`/`trim` (master...marcelocenerine:trim):
These methods behave similarly to Java's `String.trim` in the sense that they eliminate not only spaces but newline tokens as well. This may not be flexible enough for users who just want to get rid of spaces/tabs and preserve blank lines. The naming is based on what I originally suggested in the comment above as well as by @ShaneDelmore here
2 - `trimLeft`/`trimRight`/`trim` with an `includingNewline` parameter (code in the PR):
This is similar to the alternative above, except that the trimming methods take an optional parameter (set to `true` by default) that can be used to indicate when newlines should be preserved. This is based on @gabro's suggestion on #370
3 - `ctx.whitespaces.leftTo`/`rightTo`/`around` (master...marcelocenerine:trim_whitespaces):
`RuleCtx` already has ~30 methods and adding 3 more would make it more bloated (#380). This approach basically introduces a utility class that groups functions to find/select whitespaces (similar to `MatchingParens` and `AssociatedComments`). The output of its methods can be passed to `ctx.removeTokens` to perform the actual removal. Example: