[SPARK-48152][BUILD] Make spark-profiler as a part of release and publish to maven central repo
#46402
Conversation
dongjoon-hyun left a comment
Thank you, @panbingkun.
dongjoon-hyun left a comment
Ur, I agree with publishing this module. However, I don't think we need to include it in the Apache Spark binary or in dev/deps. This module is designed to be like the Kafka module, @panbingkun.
[Only test] make spark-profiler publish snapshot → [SPARK-48152][BUILD] Make the module spark-profiler as a part of Spark release
# dev/create-release/release-build.sh
  HADOOP_MODULE_PROFILES="-Phive-thriftserver -Pkubernetes -Pyarn -Phive \
-   -Pspark-ganglia-lgpl -Pkinesis-asl -Phadoop-cloud"
+   -Pspark-ganglia-lgpl -Pkinesis-asl -Phadoop-cloud -Pjvm-profiler"
Like the Kafka module, we should not include this here.
Do you know how we skip the Kafka module's dependency here?
$ grep kafka dev/deps/spark-deps-hadoop-3-hive-2.3 | wc -l
0
Okay, I will remove it.
But I have a question: if an end user wants to use spark-profiler to profile the executors, do they download 'me.bechberger:ap-loader-all:xxx' from the Maven central repository and use it together with the spark-profiler module? If that's the intended usage, I'm fine with it.
Maybe we need to add a detailed guide in the documentation?
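A minimal sketch of that end-user flow, for illustration only: the artifact coordinates come from this discussion, while the plugin class, the `spark.executor.profiling.*` config names, the Spark version, and the application class/jar are assumptions to verify against connector/profiler/README.md.

```bash
# Hypothetical submit command: fetch both the profiler module and ap-loader
# from Maven Central at submit time instead of bundling them in the assembly.
# The plugin class and profiling configs below are assumptions; check them
# against connector/profiler/README.md before relying on them.
./bin/spark-submit \
  --packages org.apache.spark:spark-profiler_2.13:4.0.0,me.bechberger:ap-loader-all:3.0-8 \
  --conf spark.plugins=org.apache.spark.executor.profiler.ExecutorProfilerPlugin \
  --conf spark.executor.profiling.enabled=true \
  --class org.example.MyApp \
  my-app.jar
```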
> Like the Kafka module, we should not include this here. Do you know how we skip the Kafka module's dependency here?
Let me investigate it.
Yes, it's identical to Apache Spark's Kafka module and the Apache Spark Hadoop Cloud module.
Yeah, let me add a detailed guide to the documentation.
> Like the Kafka module, we should not include this here. Do you know how we skip the Kafka module's dependency here?
I've figured it out: it's done by overriding the Maven dependency scope (marking it as provided). Let me try.
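If the provided-scope approach works, a quick sanity check mirroring the Kafka example above might look like this (manifest file name taken from the earlier comment):

```bash
# With ap-loader marked as provided, it should no longer show up in the pinned
# dependency manifest, just as the Kafka module's dependencies don't.
grep ap-loader dev/deps/spark-deps-hadoop-3-hive-2.3 | wc -l
# expected output: 0
```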
spark-profiler as a part of Spark release → spark-profiler to maven central repository
connector/profiler/pom.xml
Outdated
  <groupId>me.bechberger</groupId>
  <artifactId>ap-loader-all</artifactId>
-   <version>3.0-8</version>
+   <version>3.0-9</version>
Could you proceed with this dependency update separately?
Okay
A separate PR:
#46427
* Linux (musl, x64)
* MacOS

To get maximum profiling information set the following jvm options for the executor:
I'm not sure whether we need to rename the file connector/profiler/README.md to "jvm-profiler-integration.md", move it to docs/jvm-profiler-integration.md, and link it from docs/building-spark.md. Do we need to do this? @dongjoon-hyun
**Note:** The `jvm-profiler` profile builds the assembly without including the dependency `ap-loader`,
you can download it manually from maven central repo and use it together with `spark-profiler_{{site.SCALA_BINARY_VERSION}}`.
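As a sketch of what that note means for a local build (the jars path assumes a standard Scala 2.13 assembly layout and is an assumption here):

```bash
# With the jvm-profiler profile, spark-profiler is built but ap-loader is not
# bundled into the assembly, so this lookup should come back empty.
ls assembly/target/scala-2.13/jars/ | grep -i ap-loader || echo "ap-loader not bundled"
```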
I merged the following. Could you rebase this PR?
  <groupId>me.bechberger</groupId>
  <artifactId>ap-loader-all</artifactId>
  <version>3.0-8</version>
+   <scope>provided</scope>
cc @parthchandra, too
It's great to include this feature in the Spark release. I feel, though, that if we are making it available in the release, the ap-loader dependency should be included as well. (Currently, if we build with the jvm-profiler profile, the dependency is included in the build.)
Surprisingly, the jar file is not very large (I have version 2.9.7 and it is only 830K).
Either way, including this in the release is already a big step, so I'm fine with asking users to download the jar.
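For context, a rough sketch of the build being described here; the profile name comes from this PR and the remaining flags are ordinary Maven options:

```bash
# Build Spark with the JVM profiler module enabled. Before this change the
# ap-loader dependency was pulled into the build; with provided scope it must
# instead be supplied by the user at runtime.
./build/mvn -Pjvm-profiler -DskipTests clean package
```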
Could you revise the PR title, @panbingkun?
spark-profiler to maven central repository → spark-profiler as a part of release and publish to maven central repo
Done.
Done.
dongjoon-hyun left a comment
+1, LGTM for Apache Spark 4.0.0-preview.
- Thank you for completing this area, @panbingkun.
- Thank you for the review and for understanding the current usage, @parthchandra.
Let's bring this into 4.0.0-preview and get more feedback. We can revise later before 4.0.0.
Merged to master.
Thanks. ❤️
I'm so happy to observe that:
### What changes were proposed in this pull request?
Bump ap-loader version from 3.0-8 to 3.0-9.

### Why are the changes needed?
ap-loader has already released v3.0-9, which should bump version from 3.0-8 for `JVMProfiler`.

Backport:
1. apache/spark#46402
2. apache/spark#49440

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
CI.

Closes #3072 from SteNicholas/CELEBORN-1842.
Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: SteNicholas <programgeek@163.com>


What changes were proposed in this pull request?
The PR aims to:
- make the module `spark-profiler` a part of the Spark release
- publish `spark-profiler` to the maven central repository
- document `spark-profiler` in `docs/building-spark.md`

Why are the changes needed?
1. The modules released in the current daily `spark-4.0.0` snapshots do not include `spark-profiler`. I believe that, according to the current logic, `spark-profiler` will not appear in the future official version of Spark.
2. Align with the compilation description of the other modules in `docs/building-spark.md`, e.g.:

Does this PR introduce any user-facing change?
Yes, it makes it easy for users to use `spark-profiler` in future versions of Spark, instead of manually compiling `spark-profiler` from source code.

How was this patch tested?
Manually verified that the snapshot artifact `spark-profiler_2.13` is generated at
https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-profiler_2.13/4.0.0-SNAPSHOT/
Was this patch authored or co-authored using generative AI tooling?
No.