Skip to content

Conversation

@panbingkun
Copy link
Contributor

@panbingkun panbingkun commented May 6, 2024

What changes were proposed in this pull request?

The pr aims to

  • make the module spark-profiler as a part of Spark release
  • publish the module spark-profiler to maven central repository
  • add instructions on how to compile supports spark-profiler in the doc docs/building-spark.md

Why are the changes needed?

1.The modules released in the current daily spark-4.0.0 do not include spark-profiler. I believe that according to the current logic, the spark-profiler will not appear in the future official version of spark.

2.Align the compilation description of other modules in doc docs/building-spark.md, eg:
image

Does this PR introduce any user-facing change?

Yes, make it easy for users to use spark-profiler in the future version of Spark, instead of manually compiling spark-profiler based on source code.

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

No.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @panbingkun .

@panbingkun panbingkun changed the title [Only test] make spark-profiler publish snapshot [SPARK-48152][BUILD] Make the module spark-profiler as a part of Spark release May 7, 2024
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ur, I agree with publishing this module. However, I don't think we need to include this as Apache Spark Binary or dev/deps. This module is designed to be like kafka module, @panbingkun .

[Only test] make spark-profiler publish snapshot [SPARK-48152][BUILD] Make the module spark-profiler as a part of Spark release

# dev/create-release/release-build.sh
HADOOP_MODULE_PROFILES="-Phive-thriftserver -Pkubernetes -Pyarn -Phive \
-Pspark-ganglia-lgpl -Pkinesis-asl -Phadoop-cloud"
-Pspark-ganglia-lgpl -Pkinesis-asl -Phadoop-cloud -Pjvm-profiler"
Copy link
Member

@dongjoon-hyun dongjoon-hyun May 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like Kafka module, we should not include this here.

Do you know how we skip Kafka module's dependency here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$ grep kafka dev/deps/spark-deps-hadoop-3-hive-2.3 | wc -l
       0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I will remove it.
But I have a question. If the end user wants to use spark-profiler to analyze the executor, does he download 'me.bechberger:ap-loader-all:xxx' from the maven central repository and use it together with module spark-profiler? If that's the way it used to be, I am fine to it.
Maybe we need to add detailed guide in some document?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like Kafka module, we should not include this here.

Do you know how we skip Kafka module's dependency here?

Let me investigate it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, let me add detailed guide in some document.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like Kafka module, we should not include this here.

Do you know how we skip Kafka module's dependency here?

I already know about it. It is implemented through maven scope overloading. Let me try.

@panbingkun panbingkun changed the title [SPARK-48152][BUILD] Make the module spark-profiler as a part of Spark release [SPARK-48152][BUILD] Publish the module spark-profiler to maven central repository May 7, 2024
<groupId>me.bechberger</groupId>
<artifactId>ap-loader-all</artifactId>
<version>3.0-8</version>
<version>3.0-9</version>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you proceed this dependency update separately?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A separate PR:
#46427

* Linux (musl, x64)
* MacOS

To get maximum profiling information set the following jvm options for the executor :
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if we need to rename the file connector/profiler/README.md to "jvm-profiler-integration.md" and move it to the directory docs/jvm-profiler-integration.md", while linking it in the file docs/building-spark.md? Do we need to do this? @dongjoon-hyun

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


**Note:** The `jvm-profiler` profile builds the assembly without including the dependency `ap-loader`,
you can download it manually from maven central repo and use it together with `spark-profiler_{{site.SCALA_BINARY_VERSION}}`.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The actual effect is like this:
image

@panbingkun panbingkun marked this pull request as ready for review May 7, 2024 11:55
@dongjoon-hyun
Copy link
Member

I merged the following. Could you rebase this PR?

<groupId>me.bechberger</groupId>
<artifactId>ap-loader-all</artifactId>
<version>3.0-8</version>
<scope>provided</scope>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @parthchandra , too

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's great to include this feature in the spark release. I feel though, that if we are making it available in the release the ap-loader dependency should be included as well. (Currently, if we build with the jvm-profiler profile, the dependency is included in the build).
Surprisingly, the jar file is not very large (I have version 2.9.7 and it is only 830K).
Either way, including this in the release is already a big step so I'm fine with asking users to download the jar.

@dongjoon-hyun
Copy link
Member

Could you revise the PR title, @panbingkun ?

@panbingkun panbingkun changed the title [SPARK-48152][BUILD] Publish the module spark-profiler to maven central repository [SPARK-48152][BUILD] Make spark-profiler as a part of release and publish to maven central repo May 7, 2024
@panbingkun
Copy link
Contributor Author

I merged the following. Could you rebase this PR?

Done.

@panbingkun
Copy link
Contributor Author

Could you revise the PR title, @panbingkun ?

Done.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM for Apache Spark 4.0.0-preview.

  • Thank you for completing area, @panbingkun .
  • Thank you for review and understanding the current usage, @parthchandra .

Let's bring this into 4.0.0-preview and get more feedback. We can revise later before 4.0.0.

@dongjoon-hyun
Copy link
Member

Merged to master.

@panbingkun
Copy link
Contributor Author

Let's bring this into 4.0.0-preview and get more feedback. We can revise later before 4.0.0.

Thanks. ❤️

@panbingkun
Copy link
Contributor Author

It's so happy to observe that maven's snapshot repo has spark-profiler:
https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-profiler_2.13/4.0.0-SNAPSHOT/
image

SteNicholas added a commit to apache/celeborn that referenced this pull request Jan 21, 2025
### What changes were proposed in this pull request?

Bump ap-loader version from 3.0-8 to 3.0-9.

### Why are the changes needed?

ap-loader has already released v3.0-9, which should bump version from 3.0-8 for `JVMProfiler`.

Backport:

1. apache/spark#46402
2. apache/spark#49440

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

CI.

Closes #3072 from SteNicholas/CELEBORN-1842.

Authored-by: SteNicholas <programgeek@163.com>
Signed-off-by: SteNicholas <programgeek@163.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants