
Conversation

@dongjoon-hyun
Member

@dongjoon-hyun dongjoon-hyun commented Jan 10, 2025

What changes were proposed in this pull request?

This PR aims to fix the `lint-scala` script so that it no longer ignores `scalastyle` errors.

Why are the changes needed?

This bug was introduced via the following PR at Apache Spark 3.4.0.
- #38258

After the above PR, `lint-scala` ignores `scalastyle` errors and only considers the exit code of `scalafmt`, as in the following CI result.

Screenshot 2025-01-10 at 07 14 31

Does this PR introduce any user-facing change?

No, this is a dev-only tool change.

How was this patch tested?

Manual review.

Was this patch authored or co-authored using generative AI tooling?

No.

@dongjoon-hyun
Member Author

cc @grundprinzip, @HyukjinKwon

dev/lint-scala Outdated
Contributor


This is a bit of a weird patch. The mvn command should actually keep checking `$?` rather than whether the error output is empty, in particular because it parses the output for "Unformatted files".

Now, the second issue is that `lint-scala` should not care about the code in

resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/ExecutorKubernetesCredentialsFeatureStepSuite.scala

The mvn command explicitly just checks for

    -pl sql/api \
    -pl sql/connect/common \
    -pl sql/connect/server \
    -pl connector/connect/client/jvm \

Contributor

@grundprinzip grundprinzip left a comment


Back when I did the first version of the patch, I got push-back on enforcing scalastyle on pull requests, so this script only enforces the Spark Connect related modules.

@dongjoon-hyun
Member Author

I'm not sure why you say that about `-e`.

This is a bit of a weird patch. The mvn command should actually keep checking `$?` rather than whether the error output is empty, in particular because it parses the output for "Unformatted files".

Now, the second issue is that `lint-scala` should not care about the code in

resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/ExecutorKubernetesCredentialsFeatureStepSuite.scala

The mvn command explicitly just checks for

    -pl sql/api \
    -pl sql/connect/common \
    -pl sql/connect/server \
    -pl connector/connect/client/jvm \

It's a standard way to stop a script if some command exits abnormally. Apache Spark uses it in many scripts. Are you aware of it, @grundprinzip ?

$ git grep 'set -e' | wc -l
      33
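A minimal demonstration of the `set -e` behavior under discussion (a generic sketch, not Spark's actual script): with `set -e`, the shell aborts at the first failing command and the script's exit code reflects that failure; without it, the failure is silently masked.

```shell
# Without set -e inside the child shell: the failing `false` is ignored,
# the child continues, and its exit code is that of the last command (0).
bash -c 'false; echo reached'         # prints "reached"
echo "exit=$?"                        # prints "exit=0"

# With set -e: the child shell aborts at `false`, `echo` never runs,
# and the child's exit code is 1.
bash -c 'set -e; false; echo reached' || echo "exit=$?"   # prints "exit=1"
```

This is exactly why `dev/lint-scala` with `set -e` can no longer swallow a failing `scalastyle` step.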

@grundprinzip
Contributor

I'm not sure why you say that about `-e`.

This is a bit of a weird patch. The mvn command should actually keep checking `$?` rather than whether the error output is empty, in particular because it parses the output for "Unformatted files".
Now, the second issue is that `lint-scala` should not care about the code in

resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/ExecutorKubernetesCredentialsFeatureStepSuite.scala

The mvn command explicitly just checks for

    -pl sql/api \
    -pl sql/connect/common \
    -pl sql/connect/server \
    -pl connector/connect/client/jvm \

It's a standard way to stop a script if some command exits abnormally. Apache Spark uses it in many scripts. Are you aware of it, @grundprinzip ?

$ git grep 'set -e' | wc -l
      33

Yes, using `-e` exits immediately when any bash command returns a non-zero exit code (and with `-u`, even accessing an undefined variable aborts). My point was that I'm not sure why the original command was changed from checking `$?` to grepping the output.

The main reason for checking the error output is that the script should be actionable, with a good error message so the user knows what to do to fix the error. Just failing the command does not help the user understand how to fix it.
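The pattern being argued for can be sketched like this (a hypothetical stand-in: `run_lint` substitutes for the real `mvn ... scalafmt` invocation, which can't run here, and the file name is illustrative): check the exit code, but also keep the captured output so the failure message tells the user what to fix.

```shell
# Hypothetical stand-in for the real mvn/scalafmt call.
run_lint() {
  printf 'Unformatted files:\n  Foo.scala\n'
  return 1
}

STATUS=0
OUTPUT=$(run_lint 2>&1) || STATUS=$?   # capture output AND preserve exit status

if [ "$STATUS" -ne 0 ] || printf '%s' "$OUTPUT" | grep -q 'Unformatted files'; then
  echo "Lint failed; run the formatter on the files below:"
  printf '%s\n' "$OUTPUT"
  LINT_RESULT=1   # a real script would `exit 1` here
else
  LINT_RESULT=0
fi
```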

Lastly, my question still remains: why did this come up for the ExecutorKubernetesCredentialsFeatureStepSuite.scala file, since it's not in scope for the scalastyle check?

@dongjoon-hyun
Member Author

Just failing the command does not help the user to understand how to fix it.

This sounds like a wrong claim to me. As I mentioned in the PR description, it fails with rich information like the following.

$ dev/lint-scala
Using SPARK_LOCAL_IP=localhost
Scalastyle checks failed at following occurrences:
[error] /Users/dongjoon/APACHE/spark-merge/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/ExecutorKubernetesCredentialsFeatureStepSuite.scala:21:0: There should be no empty line separating imports in the same group.
[error] /Users/dongjoon/APACHE/spark-merge/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/ExecutorKubernetesCredentialsFeatureStepSuite.scala:21:0: io.fabric8.kubernetes.api.model.PodSpec is in wrong order relative to org.scalatest.BeforeAndAfter.
[error] Total time: 16 s, completed Jan 10, 2025, 7:32:59 AM

$ echo $?
1

This also looks wrong to me. ExecutorKubernetesCredentialsFeatureStepSuite.scala is Scala code that is within the scope the scalastyle script should cover.

Lastly, my question still remains: why did this come up for the ExecutorKubernetesCredentialsFeatureStepSuite.scala file, since it's not in scope for the scalastyle check?

@dongjoon-hyun
Member Author

dongjoon-hyun commented Jan 10, 2025

Please note that `scalastyle` is the following script, which uses SBT.

spark/dev/scalastyle

Lines 20 to 29 in 9547a47

    SPARK_PROFILES=${1:-"-Pkubernetes -Pyarn -Pspark-ganglia-lgpl -Pkinesis-asl -Phive-thriftserver -Phive -Pvolcano -Pjvm-profiler -Phadoop-cloud"}
    # NOTE: echo "q" is needed because SBT prompts the user for input on encountering a build file
    # with failure (either resolution or compilation); the "q" makes SBT quit.
    ERRORS=$(echo -e "q\n" \
      | build/sbt \
        ${SPARK_PROFILES} \
        -Pdocker-integration-tests \
        -Pkubernetes-integration-tests \
        scalastyle test:scalastyle \
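The excerpt above stops mid-command; the pattern it implements can be sketched generically as follows (a simplified stand-in, since `build/sbt` can't run here and `fake_sbt` with its output lines is illustrative): capture the tool's output, filter for error lines, and fail only when any were found.

```shell
# Hypothetical stand-in for the `build/sbt ... scalastyle` invocation.
fake_sbt() {
  printf '[info] Compiling 3 sources\n'
  printf '[error] Foo.scala:21:0: There should be no empty line separating imports.\n'
}

# Keep only the error lines (dev/scalastyle filters its output similarly).
ERRORS=$(fake_sbt | awk '/error/ { print }')

if [ -n "$ERRORS" ]; then
  echo "Scalastyle checks failed at following occurrences:"
  printf '%s\n' "$ERRORS"
  SCALASTYLE_RESULT=1   # the real script exits 1 here
else
  echo "Scalastyle checks passed."
  SCALASTYLE_RESULT=0
fi
```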

@dongjoon-hyun
Member Author

Could you review this, @panbingkun ? CI started to fail at compilation due to this.

@dongjoon-hyun
Member Author

Also, cc @huaxingao and @viirya . When you have some time, could you review this PR?

@grundprinzip
Contributor

Hi @dongjoon-hyun , I was checking the history of the script and it seems it has never failed scalastyle for the past 5 years. See here for example: https://github.com/apache/spark/blob/f0b6245ea4cef0dc61d4277f1d6874ccf315e47a/dev/lint-scala

I'm absolutely in favor of enforcing scalastyle :)

I tested your PR to see if the error message for Spark Connect is still present. If the goal is to enforce Scalastyle, then I'm +1 on the PR.

@dongjoon-hyun
Member Author

Thank you, @grundprinzip . Yes, you are right. We simply didn't hit this corner case before.

@dongjoon-hyun
Member Author

Let me check my PR once more~

@dongjoon-hyun dongjoon-hyun marked this pull request as draft January 10, 2025 17:01
@dongjoon-hyun
Member Author

To recover the master branch, I merged @panbingkun's PR first because mine is still incomplete.

@dongjoon-hyun
Member Author

I'll revisit this PR later today.

@dongjoon-hyun dongjoon-hyun marked this pull request as ready for review January 10, 2025 17:37
@dongjoon-hyun
Member Author

This PR is ready for review again.

Screenshot 2025-01-10 at 09 55 02

Contributor

@cnauroth cnauroth left a comment


+1 (non-binding) from my perspective. I wonder if we'll find any other pre-existing violations that need a cleanup first.

Thanks for the patch, @dongjoon-hyun .

@dongjoon-hyun
Member Author

Thank you, @grundprinzip , @cnauroth , @viirya .

Merged to master.

There are no previous violations because the Spark SBT build technically enforces this unless we explicitly skip it via the env variable NOLINT_ON_COMPILE.
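A minimal sketch of the gating described above (NOLINT_ON_COMPILE is the real env variable; the surrounding function and echo are illustrative stand-ins for the SBT build logic):

```shell
# Illustrative only: a build step that skips linting when the
# NOLINT_ON_COMPILE environment variable is set.
check_mode() {
  if [ -z "${NOLINT_ON_COMPILE:-}" ]; then
    echo "enforced"   # default: scalastyle runs during compilation
  else
    echo "skipped"    # explicitly opted out
  fi
}

DEFAULT_MODE=$(check_mode)
SKIPPED_MODE=$(NOLINT_ON_COMPILE=1 check_mode)
echo "default: $DEFAULT_MODE, with NOLINT_ON_COMPILE: $SKIPPED_MODE"
```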

dongjoon-hyun added a commit that referenced this pull request Jan 10, 2025
### What changes were proposed in this pull request?

This PR aims to fix `lint-scala` script not to ignore `scalastyle` errors.

### Why are the changes needed?

This bug was introduced via the following PR at Apache Spark 3.4.0.
- #38258

After the above PR, `lint-scala` ignores `scalastyle` error and only considers the exit code of `scalafmt` like the following CI result.
- #49428 (comment)

![Screenshot 2025-01-10 at 07 14 31](https://github.com/user-attachments/assets/bdaa3be3-5daf-401b-a46f-7c02b7610158)

### Does this PR introduce _any_ user-facing change?

No, this is a dev-only tool change.

### How was this patch tested?

Manual review.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #49443 from dongjoon-hyun/SPARK-50784.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 9d4b7a5)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@dongjoon-hyun dongjoon-hyun deleted the SPARK-50784 branch January 10, 2025 20:14