Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add methods for dealing with whitespaces #474

Merged
merged 1 commit into from
Dec 19, 2017

Conversation

marcelocenerine
Copy link
Contributor

@marcelocenerine marcelocenerine commented Dec 5, 2017

When working on a scalafix rule for scala/collection-strawman I implemented my own logic to get rid of whitespaces. @olafurpg mentioned in the review that it would be nice to have such functionality supported by Scalafix. Later I realized that there was even an issue filed by @gabro to look into that: #370.

I explored a few alternatives to introduce the functionality. There might be better ways to achieve this, so I'd appreciate your feedback and will be willing to accept suggestions.

The code changes in this PR implement the approach that I personally prefer. There are other 2 alternatives available on separate branches (those need some polishing).

1 - trimLeft / trimRight / trim (master...marcelocenerine:trim):
These methods behave similarly to Java's String.trim in the sense that they eliminate not only spaces but newline tokens as well. This may not be flexible enough for users who just want to get rid of spaces/tabs and preserve blank lines.
The naming is based on what I originally suggested in the comment above as well as by @ShaneDelmore here

2 - trimLeft / trimRight / trim with a includingNewline parameter (code in the PR):
This is similar to the alternative above, except that the trimming methods take an optional parameter (set to true by default) that can be used to indicate when newlines should be preserved. This is based on @gabro's suggestion on #370

3 - ctx.whitespaces.leftTo / rightTo / around (master...marcelocenerine:trim_whitespaces):
RuleCtx already has ~30 methods and adding 3 more would make it more bloated (#380). This approach basically introduces an utility class that groups functions to find/select whitespaces (similar to MatchingParens and AssociatedComments). The output of its methods can be passed to ctx.removeTokens to perform the actual removal. Example:

ctx.removeTokens(ctx.whitespaces.leftTo(tk))

@MasseGuillaume
Copy link
Contributor

Hi Marcelo, this is a really useful contribution, thanks a lot!

The CI is red because of mima
https://travis-ci.org/scalacenter/scalafix/jobs/311647980#L4256-L4269

the trait PatchOps has 3 new methods (trimLeft, trimRight, trim) and this is a public api. This is not super useful because the only implementation is via RuleCtix/RuleCtxImpl. Therefore, you can exclude them in the Mima check:

ProblemFilters.exclude[ReversedMissingMethodProblem]("scalafix.patch.PatchOps.trim")
ProblemFilters.exclude[ReversedMissingMethodProblem]("scalafix.patch.PatchOps.trimRight")
ProblemFilters.exclude[ReversedMissingMethodProblem]("scalafix.patch.PatchOps.trimLeft")

in https://github.com/scalacenter/scalafix/blob/master/project/Mima.scala#L7

Copy link
Contributor

@olafurpg olafurpg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this contribution, I am eager to use this myself 😄 left a few minor comments, otherwise implementation and tests look great 👍

"trimLeft removes spaces and newline before `val`",
original,
"""// Foobar
|object a {val x = 2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if there is a // comment after the open curly brace? We should not remove newlines trailing such comments.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could test this by trimming left of "object"

Copy link
Contributor Author

@marcelocenerine marcelocenerine Dec 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the example above, the result would be:

// Foobar object a {

I put some conditions to prevent this issue (comments in the format /* ... */ are OK). The same would occur when trimming on the right side of a comment though....

Comments are not the only case where trimming would cause an invalid output. Example:

object Foo {
   val a = 0
   val b = 0
}

ctx.trimLeft(bValToken)

Output:

object Foo {
   val a = 0 val b = 0
}

I feel that the current behavior for newlines is not very intuitive and useful in practice. I can't think of a valid scenario where I would like to get rid of the left/right newline and have the remaining code on that particular line (or the line below) moved up (update: except when the entire line is intended to be removed or when an expression wraps across multiple lines). Is there any use case that comes to your mind?

From the example given above and the use case @gabro described in #370, maybe the right API would be sth like ctx.removeLine(token) (trim left/right spaces is still useful though). What do you think?

private def canBeTrimmed(tk: Token, includeNewLine: Boolean): Boolean =
tk match {
case Space() | Tab() | FF() => true
case Newline() if includeNewLine => true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we should limit this to a single newline. A typical use case if you want to remove a statement then you usually want to keep the leading / trailing blank lines. What do you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to understand, given this example:

val x = 42

  "foo"

val y = "bar"

would the current implementation be able to produce this diff?

val x = 42
-
-  "foo"

val y = "bar"

so that the resulting code is

val x = 42

val y = "bar"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the implementation should produce this diff

val x = 42

-  "foo"

val y = "bar"

That would leave two blank lines, but it would also preserve a blank line in the following case instead of remove it

val x = 42

-  "foo"
val y = "bar"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Probably the job of taking care of multiple blank lines belongs to a formatter

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, scalafmt would at least combine the two blank lines into a single blank line.

* Remove whitespaces on the left side of the token. Use `includeNewLine` to
* indicate when newline tokens should be preserved.
*/
def trimLeft(token: Token, includeNewLine: Boolean = true): Patch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about having a separate method to trim only spaces? trimLeftSpaces, trimSpaces, trimRightSpaces (name up for debate). Tabs can be ignored, I'm not aware of any Scala project using tabs for indent.

I think it could result in a cleaner user API, boolean parameters are quite awkward to both write and read.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right, @olafurpg. Boolean parameters often indicate the need for two separate methods. My only concern is the explosion of methods (potentially 6 new ones) in the RuleCtx api (those would be inherited from PatchOps). If you think that's not a concern I will go for it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the eventual home for these methods will be simple defs in object Patch. I personally think it's OK to have more methods if it buys us a cleaner api.

@MasseGuillaume
Copy link
Contributor

@marcelocenerine we are thinking of cutting a release today. Do you think you can address the comments above? This is a really valuable contribution and we would be happy to add it to the upcoming version.

@marcelocenerine
Copy link
Contributor Author

@MasseGuillaume, unfortunately I will only be able to address the comments in the evening. Please feel free to continue the work if you guys want it to make the release cut today. Otherwise I will update the PR this evening

@MasseGuillaume
Copy link
Contributor

@marcelocenerine cool, I'm going to address the PR comments and include it in this release.

@MasseGuillaume
Copy link
Contributor

@marcelocenerine don't know enough about token manipulation to address this PR for the release. I'm putting the ball back on your side for the next release. :)

@marcelocenerine
Copy link
Contributor Author

@MasseGuillaume, ok. I will update the PR tonight :)

@marcelocenerine
Copy link
Contributor Author

@olafurpg thanks for your suggestions!!! I updated the PR addressing your comments, but I think we should discuss more on the behavior for newlines. Please see my comment: #474 (comment)

private def isSingleLineComment(tk: Token): Boolean =
tk.is[Comment] && tk.text.startsWith("//")
private def isWindowsNewline(tk1: Token, tk2: Token): Boolean =
tk1.is[CR] && tk2.is[LF]
Copy link
Contributor Author

@marcelocenerine marcelocenerine Dec 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran tests on Windows with a file containing CR+LF characters and it seems that scalameta ignores the CR ones, keeping only the LF tokens

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have some windows line break related issues that I have never properly looked into scalameta/scalameta#443

@marcelocenerine
Copy link
Contributor Author

marcelocenerine commented Dec 17, 2017

Hey @olafurpg, did you get a chance to take a look at my latest changes as well as my comment #474 (comment) ?

I created this branch master...marcelocenerine:trim_normalize with an alternative implementation (removeLine instead of trimming newlines) as well as methods to normalize spaces (names up for debate). Please see what you think ;)

@olafurpg
Copy link
Contributor

Hey @marcelocenerine so sorry, I have maybe 2-3 different pending reviews for this PR that I never got around to submit. Every time I look at it I start writing tons of comments, then I change my mind and 20 minutes I decide to "look at it again later" 😅

I took a look at your trim_normalize branch and I'm not convinced about normalize* and removeLine. Remove line seems particularly dangerous, since it doesn't handle multiline expressions. Normalize seems quite specific so I'm not sure it needs to be in the core API, unless you have a concrete use-case?

I'm wondering if we can simplify things by moving these methods to TokenList

class TokenList {
  def leading
  def leadingSpaces
  def leadingLine // includes spaces + first newline
  def trailing
  def trailingSpaces
  def trailingLine

I'm personally most excited about leadingSpace, since that one is most annoying to implement and it's quite common for me to remove statements like in

.fold(Patch.empty)(token => ctx.removeTokens(token))

Still, that means users will need to feed the output of those methods to removeTokens, would that defeat the purpose?

OK, I'm submitting this comment before I change my mind again 😂

@marcelocenerine
Copy link
Contributor Author

marcelocenerine commented Dec 18, 2017

Hey @olafurpg, thanks for your feedback. Your hesitation highlights the fact that this PR needs more work (especially when it comes to handling newlines) 😄

As a rule author I'd be more interested in the leadingSpaces/trailingSpaces for use cases where I know what tokens can precede/follow a given token that I want to remove/replace (silly example: input: 1 + 2 ; output: 1.+(2)). If I have that awareness and don't care about whitespaces in between, then the leading/trailing spaces methods would work just fine. For the example above, I wouldn't even care whether the whitespaces are spaces, newlines, tabs or a mix of everything.

I think that removing newlines is dangerous unless the rule is aware of what tokens exist in the prev/next line as well as in the current line. As you pointed out, removeLine is indeed dangerous if a rule doesn't have that awareness. I think the same would be true for leadingLine/trailingLine. Those operations would only be useful if rules knew that either the current line could be entirely removed or whatever tokens remained in the current line could be safely moved up/down (e.g.: multiline expressions).

I'd be more interested in methods to find the prev/next newline character instead and then write my own logic to slice(newline, token) and figure out what is in between or what precedes/follows the newline token so I could determine if they could be removed.

def nextNewline: Option[LF] // or Option[Token] or else...
def prevNewline: Option[LF]

or maybe a method that returned all leading/trailing tokens in the same line.

What do you think? Should I update the PR following your suggestion? Or do the above instead of leadingLine/trailingLine? Or just focus on spaces for now?

Still, that means users will need to feed the output of those methods to removeTokens, would that defeat the purpose?

To be honest, I'd prefer to have a rich API focused in finding/selecting tokens and a cleaner API with basic operations for adding/removing/replacing things (PatchOps).

Alternatively, given that additional whitespaces between tokens are irrelevant to determine the correctness of a program (just an aesthetic concern), should rules just ignore them and let a code formatting tool finish the job?

@olafurpg
Copy link
Contributor

I'd be more interested in methods to find the prev/next newline character instead and then write my own logic to slice(newline, token) and figure out what is in between or what precedes/follows the newline token so I could determine if they could be removed.

Would prevNewline skip over non-whitespace tokens? Example, would it return None or Some here?

{
  val x = <<prevNewline>>

If it returns the newline after { (implemented as .find(_.is[Newline])) then it would be less useful IMO.

To be honest, I'd prefer to have a rich API focused in finding/selecting tokens and a cleaner API with basic operations for adding/removing/replacing things (PatchOps).

I'm glad we agree there!

Alternatively, given that additional whitespaces between tokens are irrelevant to determine the correctness of a program (just an aesthetic concern), should rules just ignore them and let a code formatting tool finish the job?

Trickier cases I think we should delegate to a formatter, but for some simple rewrites I think it would be nice to offer a best-effort by for example trimming enclosing spaces, for example in #513

OK, here is my concrete proposal:

  • TokenList.leadingSpaces/trailingSpaces
  • Make TokenList final, adding relevant filters in Mima.

It seems we are still unsure what to do about lines so we can give that a bit more time to think and leave it for another PR. I think leadingLine(Token/Tree) (without trailing option) would be a nice complement to ctx.removeTokens(leadingLine(tree) ++ tree.tokens).

@marcelocenerine
Copy link
Contributor Author

marcelocenerine commented Dec 19, 2017

Would prevNewline skip over non-whitespace tokens?

yeah, I was thinking something like that but, like you said, it's already easy to achieve that goal through the current api. Except if we wanted to save users from having to deal with the intricacies of Windows/Unix line breaks (as long as scalameta cared about that difference).

Your proposal sounds good to me. In my short experience with scalafix I have come across scenarios where I missed utilities to get rid of whitespaces. I wanted to cover the other use case in #370 (newlines) as well but I think that it needs some more thought so that whatever comes out of it is generic enough for people to use.

I will update the PR shortly 😅

val tokenList = TokenList(emptyFileTokens)
assert(tokenList.slice(emptyFileTokens.head, emptyFileTokens.head) == Seq())
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's kinda off topic but I think it's worth mentioning: when I was working on the previous implementation I found 2 things in the API that didn't behave as I expected (http://wiki.c2.com/?PrincipleOfLeastAstonishment):

  • prev/next returns the input token if it is the head/last of the tokenList. Would Option[Token] make more sense?
  • slice(from, to) doesn't behave as in Scala's std collection given that from is not inclusive. For instance, while removing tokens what could take a single step (ctx.removeTokens(tokenList.slice(tk, last)) actually requires 2 (ctx.removeToken(tk) + ctx.removeTokens(tokenList.slice(tk, last)) and is really easy to introduce bugs if users don't dive into the implementation of slice

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are great points, I agree both of these can be improved. Can you open tickets? I am open to fix slice immediately (we can even implement is using slice on views from the stdlib) while prev/next will have to wait until v0.6

Copy link
Contributor Author

@marcelocenerine marcelocenerine Dec 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! Here are the 2 issues I created: #516 & #517

@marcelocenerine
Copy link
Contributor Author

The PR is updated now.

Copy link
Contributor

@olafurpg olafurpg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great @marcelocenerine! Thanks a lot of your patience and iterating on the design. 👍

val tokenList = TokenList(emptyFileTokens)
assert(tokenList.slice(emptyFileTokens.head, emptyFileTokens.head) == Seq())
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are great points, I agree both of these can be improved. Can you open tickets? I am open to fix slice immediately (we can even implement is using slice on views from the stdlib) while prev/next will have to wait until v0.6

@olafurpg olafurpg merged commit 3f63b5b into scalacenter:master Dec 19, 2017
MasseGuillaume added a commit to MasseGuillaume/scalafix that referenced this pull request Dec 20, 2017
```
> scalafix --diff-base <tab>
HEAD
ag/gh-pages
ag/issue-260
ag/issue-319
ag/master
...
v0.5.5
v0.5.6
|00| 51abefa -- wip improve sbt completion (13 hours ago)
|01| 3f63b5b -- Merge pull request scalacenter#474 from marcelocenerine/trim_includeNewLine (13 hours ago)
|02| 47b94c0 -- Add leadingSpaces/trailingSpaces methods to TokenList (13 hours ago)
|03| 8bd026d -- Merge pull request scalacenter#503 from MasseGuillaume/feature/368-escape-patches (13 hours ago)
|04| 20e445f -- Escape hatch on Patch (fix scalacenter#368) (13 hours ago)
|05| a2c5d70 -- Merge pull request scalacenter#500 from MasseGuillaume/feature/495-custom-id (13 hours ago)
|06| 59efe7d -- Add CustomMessage to the public api (13 hours ago)
|07| 9ae6071 -- Add id for CustomMessage (fix scalacenter#495) (13 hours ago)
|08| e4a5c35 -- Merge pull request scalacenter#494 from MasseGuillaume/disable-regex (13 hours ago)
|09| a422860 -- Merge pull request scalacenter#497 from olafurpg/disable-signatures (13 hours ago)
|10| 7930947 -- DisableSyntax add regex (13 hours ago)
|11| 5dbdd6b -- IntervalSet test for empty and add toString (13 hours ago)
|12| b022fbd -- DisableSyntax don't repeat DisableSyntax.keyword in message (13 hours ago)
|13| a992b02 -- Assert instead of scalafix:ok (13 hours ago)
|14| 7896ccd -- Refactor Disable to use views. (13 hours ago)
|15| 58acdbe -- Fix scalacenter#493, handle synthetics and symbol signatures in Disable. (13 hours ago)
|16| b48d7f0 -- Merge pull request scalacenter#490 from olafurpg/unmanagedSources (13 hours ago)
|17| e9b2b0a -- s/canFormat/canFix/ (13 hours ago)
|18| 26be6fa -- Use unmanagedSources instead of unmanagedSourceDirectories. (13 hours ago)
|19| 4d46001 -- Merge pull request scalacenter#488 from olafurpg/master (13 hours ago)
```
MasseGuillaume added a commit to MasseGuillaume/scalafix that referenced this pull request Dec 22, 2017
```
> scalafix --diff-base <tab>
HEAD
ag/gh-pages
ag/issue-260
ag/issue-319
ag/master
...
v0.5.5
v0.5.6
|00| 51abefa -- wip improve sbt completion (13 hours ago)
|01| 3f63b5b -- Merge pull request scalacenter#474 from marcelocenerine/trim_includeNewLine (13 hours ago)
|02| 47b94c0 -- Add leadingSpaces/trailingSpaces methods to TokenList (13 hours ago)
|03| 8bd026d -- Merge pull request scalacenter#503 from MasseGuillaume/feature/368-escape-patches (13 hours ago)
|04| 20e445f -- Escape hatch on Patch (fix scalacenter#368) (13 hours ago)
|05| a2c5d70 -- Merge pull request scalacenter#500 from MasseGuillaume/feature/495-custom-id (13 hours ago)
|06| 59efe7d -- Add CustomMessage to the public api (13 hours ago)
|07| 9ae6071 -- Add id for CustomMessage (fix scalacenter#495) (13 hours ago)
|08| e4a5c35 -- Merge pull request scalacenter#494 from MasseGuillaume/disable-regex (13 hours ago)
|09| a422860 -- Merge pull request scalacenter#497 from olafurpg/disable-signatures (13 hours ago)
|10| 7930947 -- DisableSyntax add regex (13 hours ago)
|11| 5dbdd6b -- IntervalSet test for empty and add toString (13 hours ago)
|12| b022fbd -- DisableSyntax don't repeat DisableSyntax.keyword in message (13 hours ago)
|13| a992b02 -- Assert instead of scalafix:ok (13 hours ago)
|14| 7896ccd -- Refactor Disable to use views. (13 hours ago)
|15| 58acdbe -- Fix scalacenter#493, handle synthetics and symbol signatures in Disable. (13 hours ago)
|16| b48d7f0 -- Merge pull request scalacenter#490 from olafurpg/unmanagedSources (13 hours ago)
|17| e9b2b0a -- s/canFormat/canFix/ (13 hours ago)
|18| 26be6fa -- Use unmanagedSources instead of unmanagedSourceDirectories. (13 hours ago)
|19| 4d46001 -- Merge pull request scalacenter#488 from olafurpg/master (13 hours ago)
```
@marcelocenerine marcelocenerine deleted the trim_includeNewLine branch February 19, 2018 18:49
@olafurpg olafurpg added this to the v0.6.0-M2 milestone Apr 11, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants