
[SPARK-49241][CORE] Add OpenTelemetryPush Sink with opentelemetry profile #47763

Closed
wants to merge 22 commits into from

Conversation

TQJADE
Contributor

@TQJADE TQJADE commented Aug 14, 2024

What changes were proposed in this pull request?

OpenTelemetryPushSink and OpenTelemetryPushReporter have been added to the supported sink classes.

Why are the changes needed?

  • From the OpenTelemetry side, support has been added for a pull model: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/apachesparkreceiver
  • However, for very short jobs only the push model is applicable, so a new sink, OpenTelemetryPushSink, and a new reporter, OpenTelemetryPushReporter, are introduced.

Does this PR introduce any user-facing change?

No

How was this patch tested?

  • Start the open-telemetry collector locally, per: https://opentelemetry.io/docs/collector/quick-start/#generate-and-collect-telemetry

```
docker run \
  -p 127.0.0.1:4317:4317 \
  -p 127.0.0.1:55679:55679 \
  otel/opentelemetry-collector:0.107.0 \
  2>&1 | tee collector-output.txt # Optionally tee output for easier search later
```

  • Start Spark applications and publish the metrics.
  • Observe the metrics being printed.
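For context, Spark sinks are enabled through conf/metrics.properties. The sketch below shows how this sink would presumably be configured; the fully-qualified class name and property keys are assumptions based on Spark's usual sink conventions, while the option names mirror the sink's constructor parameters (host, port, pollInterval, pollUnit) discussed in this PR.

```properties
# Hypothetical sketch of enabling the sink in conf/metrics.properties.
# The class FQN and option keys are assumptions, not confirmed by this PR.
*.sink.opentelemetry.class=org.apache.spark.metrics.sink.OpenTelemetryPushSink
*.sink.opentelemetry.host=http://localhost
*.sink.opentelemetry.port=4317
*.sink.opentelemetry.pollInterval=10
*.sink.opentelemetry.pollUnit=SECONDS
```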

Was this patch authored or co-authored using generative AI tooling?

No

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-49241](Add OpenTelemetryPush Sink) [SPARK-49241][CORE] Add OpenTelemetryPush Sink Aug 14, 2024
@dongjoon-hyun
Member

Thank you for making a PR, @TQJADE .

Member

@dongjoon-hyun dongjoon-hyun left a comment

Could you make CIs happy?

@dongjoon-hyun
Member

dongjoon-hyun commented Aug 16, 2024

  1. It seems that there are multiple failures in CIs. Do you think those failures are related to this PR?
  2. When we enable this profile, this PR introduces many dependencies. Do we need all of them?
$ dev/make-distribution.sh -Popentelemetry-reporter
$ ls -alh dist/jars/*opentelemetry*
-rw-r--r--@ 1 dongjoon  staff   138K Aug 16 09:20 dist/jars/opentelemetry-api-1.40.0.jar
-rw-r--r--@ 1 dongjoon  staff    42K Aug 16 09:20 dist/jars/opentelemetry-api-incubator-1.40.0-alpha.jar
-rw-r--r--@ 1 dongjoon  staff    46K Aug 16 09:20 dist/jars/opentelemetry-context-1.40.0.jar
-rw-r--r--@ 1 dongjoon  staff   100K Aug 16 09:20 dist/jars/opentelemetry-exporter-common-1.40.0.jar
-rw-r--r--@ 1 dongjoon  staff    68K Aug 16 09:20 dist/jars/opentelemetry-exporter-otlp-1.40.0.jar
-rw-r--r--@ 1 dongjoon  staff   188K Aug 16 09:20 dist/jars/opentelemetry-exporter-otlp-common-1.40.0.jar
-rw-r--r--@ 1 dongjoon  staff    25K Aug 16 09:20 dist/jars/opentelemetry-exporter-sender-okhttp-1.40.0.jar
-rw-r--r--@ 1 dongjoon  staff   6.6K Aug 16 09:20 dist/jars/opentelemetry-sdk-1.40.0.jar
-rw-r--r--@ 1 dongjoon  staff    53K Aug 16 09:20 dist/jars/opentelemetry-sdk-common-1.40.0.jar
-rw-r--r--@ 1 dongjoon  staff    19K Aug 16 09:20 dist/jars/opentelemetry-sdk-extension-autoconfigure-spi-1.40.0.jar
-rw-r--r--@ 1 dongjoon  staff    53K Aug 16 09:20 dist/jars/opentelemetry-sdk-logs-1.40.0.jar
-rw-r--r--@ 1 dongjoon  staff   302K Aug 16 09:20 dist/jars/opentelemetry-sdk-metrics-1.40.0.jar
-rw-r--r--@ 1 dongjoon  staff   125K Aug 16 09:20 dist/jars/opentelemetry-sdk-trace-1.40.0.jar
  3. Does this introduce any transitive dependencies too?

core/pom.xml Outdated
<dependency>
<groupId>io.opentelemetry</groupId>
<artifactId>opentelemetry-sdk</artifactId>
<version>1.40.0</version>

Contributor Author

Updated

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-49241][CORE] Add OpenTelemetryPush Sink [SPARK-49241][CORE] Add OpenTelemetryPush Sink with opentelemetry profile Aug 25, 2024
pollInterval: Int = 10,
pollUnit: TimeUnit = TimeUnit.SECONDS,
host: String = "http://localhost",
port: String = "4317",

Contributor Author

@dongjoon-hyun 4317 is the port for gRPC and 4318 is for HTTP/HTTPS. Should I rename the port to grpcPort?

https://opentelemetry.io/docs/languages/sdk-configuration/otlp-exporter/

Member

To @TQJADE , you are using http:// in line 39, aren't you? Are you using gRPC instead of HTTP here?

Member

What does that mean? Do you mean OpenTelemetry requires HTTP with 4317? If we need to use 4317 with HTTP, which HTTP call uses port 4318?

Contributor Author

@TQJADE TQJADE Aug 27, 2024

I tested with the gRPC endpoint; when I tested, the endpoint had to start with http or https. The default OpenTelemetry gRPC endpoint port is 4317; the HTTP endpoint port is 4318.

Member

@dongjoon-hyun dongjoon-hyun Aug 27, 2024

It seems that we are repeating the same questions and answers again and again. :)

What I asked to confirm is why we are not using the default HTTP endpoint port (4318) here when the endpoint starts with http. This PR uses http://localhost:4317 instead of http://localhost:4318, doesn't it? We know that your sentence is correct, but it sounds like a contradiction. We need an explanation for this contradiction.

the endpoint should start with http or https. The default OpenTelemetry gRPC endpoint port is 4317; the HTTP endpoint port is 4318.
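The "http:// with 4317" pattern is how opentelemetry-java spells a plaintext gRPC endpoint: the URL scheme only selects TLS vs. plaintext, while the exporter class selects the transport (gRPC defaults to 4317, OTLP over HTTP/protobuf to 4318). A minimal sketch, assuming the opentelemetry-exporter-otlp 1.40.0 artifact from the dependency list above; it only constructs the two exporters:

```java
import io.opentelemetry.exporter.otlp.http.metrics.OtlpHttpMetricExporter;
import io.opentelemetry.exporter.otlp.metrics.OtlpGrpcMetricExporter;

public class OtlpEndpoints {
  public static void main(String[] args) {
    // OTLP over gRPC: default port 4317. The "http://" scheme only means
    // plaintext (no TLS); the transport is still gRPC.
    OtlpGrpcMetricExporter grpc = OtlpGrpcMetricExporter.builder()
        .setEndpoint("http://localhost:4317")
        .build();

    // OTLP over HTTP/protobuf: default port 4318, with a signal-specific path.
    OtlpHttpMetricExporter http = OtlpHttpMetricExporter.builder()
        .setEndpoint("http://localhost:4318/v1/metrics")
        .build();

    grpc.shutdown();
    http.shutdown();
  }
}
```

This would explain the apparent contradiction: http://localhost:4317 is a plaintext gRPC endpoint, not an OTLP/HTTP one.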


import org.apache.spark.SparkFunSuite

class OpenTelemetryPushSinkSuite extends SparkFunSuite with PrivateMethodTester {
Member

This looks like it has test coverage only for the object OpenTelemetryPushSink.

Could you add more test coverage for the class OpenTelemetryPushSink?


import org.apache.spark.SparkFunSuite

class OpenTelemetryPushReporterSuite extends SparkFunSuite with PrivateMethodTester {
Member

Could you add more test coverage?

core/pom.xml Outdated
<dependency>
<groupId>com.squareup.okhttp3</groupId>
<artifactId>okhttp</artifactId>
<version>3.13.0</version>
Member

@dongjoon-hyun dongjoon-hyun Aug 27, 2024

Please note that Apache Spark's okhttp is 3.12.12. So, we had better use 3.12.12 even for the test case, to make sure that there is no problem inside the Apache Spark 4.0.0 release.

okhttp/3.12.12//okhttp-3.12.12.jar

Member

@dongjoon-hyun dongjoon-hyun left a comment

core/pom.xml Outdated
@@ -651,7 +651,8 @@
<dependency>
<groupId>com.squareup.okhttp3</groupId>
<artifactId>okhttp</artifactId>
<version>3.13.0</version>
<!--version should be same with https://github.com/apache/spark/blob/0895471ad3b2a82668ea7e83dd20dda6b17f145f/dev/deps/spark-deps-hadoop-3-hive-2.3#L226-->
Member

Please remove this comment.

Contributor Author

Removed

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM. Thank you, @TQJADE .
Merged to master.

@LuciferYang
Contributor

Is it necessary to activate the opentelemetry profile to execute unit tests in IntelliJ, regardless of whether it's Maven or SBT? I've found that without activating the opentelemetry profile, I'm unable to execute unit tests in IntelliJ. For example, when I try to execute DataSourceV2SQLSuiteV2Filter in IntelliJ, I encounter the following error:

[screenshot of the error] Although manually activating the `opentelemetry` profile can serve as a workaround, this is not a good experience because `opentelemetry` is not activated by default. Or is there another correct way to ensure it works?

cc @TQJADE @dongjoon-hyun

also cc @panbingkun

@dongjoon-hyun
Member

dongjoon-hyun commented Sep 3, 2024

Got it.

Actually, this PR follows the existing Volcano profile approach, which has the same side effect. However, this seems more disruptive because this is in core, while Volcano affects only the K8s module.

To @TQJADE , could you find a better way to mitigate the above situation?

@mridulm
Contributor

mridulm commented Sep 4, 2024

QQ: Can't this live outside of Apache Spark and use the existing integrations in MetricsSystem?
I am trying to understand why we need to include it as part of Spark, and not as an external package - are the current extension mechanisms not sufficient?

+CC @dongjoon-hyun

@LuciferYang
Contributor

Got it.

Actually, this PR follows the existing Volcano profile approach, which has the same side effect. However, this seems more disruptive because this is in core, while Volcano affects only the K8s module.

To @TQJADE , could you find a better way to mitigate the above situation?

cc @dongjoon-hyun @TQJADE In fact, because we use sbt-pom-reader, on the Maven side we can utilize the build-helper-maven-plugin to dynamically load additional source and test code paths. The configuration for build-helper-maven-plugin can also be activated dynamically in SparkBuild.scala without requiring any additional configuration. I think this approach should resolve the issue I mentioned.

I have submitted a pr: #47986.

Before supplementing the PR description, I will take some time to conduct additional verification.

I am trying to understand why we need to include it as part of Spark, and not as an external package - are the current extension mechanisms not sufficient ?

Of course, this question also requires a clear answer :)
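For reference, the build-helper-maven-plugin approach mentioned above attaches extra source roots during the build. A rough sketch of what such a profile-scoped configuration could look like; the execution id and source path are hypothetical illustrations, not taken from #47986:

```xml
<!-- Sketch: inside the opentelemetry profile of core/pom.xml (paths hypothetical) -->
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>build-helper-maven-plugin</artifactId>
  <executions>
    <execution>
      <id>add-opentelemetry-sources</id>
      <phase>generate-sources</phase>
      <goals>
        <goal>add-source</goal>
      </goals>
      <configuration>
        <sources>
          <source>src/main/opentelemetry/scala</source>
        </sources>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Because the extra roots are attached only when the profile is active, code outside the profile compiles without the OpenTelemetry dependencies.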

@TQJADE
Contributor Author

TQJADE commented Sep 4, 2024

I am trying to understand why we need to include it as part of Spark, and not as an external package - are the current extension mechanisms not sufficient ?

@mridulm @LuciferYang
I would like to use OpenTelemetry instrumentations to “translate” the Codahale metrics of Spark jobs to OpenTelemetry metrics. However, this is disabled by default, with the reasons below:
https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/docs/supported-libraries.md#disabled-instrumentations

Some instrumentations can produce too many spans and make traces very noisy. For this reason, the following instrumentations are disabled by default: dropwizard-metrics, which might create very low-quality metrics data because of the lack of label/attribute support in the Dropwizard metrics API.

That is the reason I would like to contribute it on the Apache Spark side. The OpenTelemetryPushSink is similar to the StatsdSink: it pushes the metrics to target services.
cc: @dongjoon-hyun

@mridulm
Contributor

mridulm commented Sep 5, 2024

I am trying to understand whether there are any limitations due to which this integration has to be included as part of Apache Spark.
In other words, can this be an external integration/library that users include for the Spark apps that need it? I am not familiar with OpenTelemetry, so I am trying to reason about the value of including it within Spark.

@dongjoon-hyun
Member

Your point is correct, @mridulm .

  • There is no limitation in the Spark architecture, and this can be an external library without any problem.
  • Initially, I misunderstood the context of the existing OpenTelemetry “disabled instrumentations” as discontinued or deprecated ones. It's my bad.

Let me revert this commit. Thank you for all your valuable reviews, @mridulm , @LuciferYang . Personally, sorry for wasting your time, @TQJADE .

@dongjoon-hyun
Member

This is reverted via 628cd75

@mridulm
Contributor

mridulm commented Sep 5, 2024

Thanks for the details @dongjoon-hyun ! Really appreciate it.

IvanK-db pushed a commit to IvanK-db/spark that referenced this pull request Sep 20, 2024
… profile

### What changes were proposed in this pull request?
OpenTelemetryPushSink and OpenTelemetryPushReporter have been added to the supported sink classes.

### Why are the changes needed?

- From the OpenTelemetry side, support has been added for a pull model: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/apachesparkreceiver

- However, for very short jobs, only the push model is applicable. Therefore, a new sink OpenTelemetryPushSink and a new reporter OpenTelemetryPushReporter have been introduced here.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
* Build the Apache Spark core snapshot, reference the snapshot, and give the local OpenTelemetry endpoint `127.0.0.1:4317` as the endpoint for the OpenTelemetryPushReporter.
* Start the open-telemetry collector locally, per: https://opentelemetry.io/docs/collector/quick-start/#generate-and-collect-telemetry

```
docker run \
  -p 127.0.0.1:4317:4317 \
  -p 127.0.0.1:55679:55679 \
  otel/opentelemetry-collector:0.107.0 \
  2>&1 | tee collector-output.txt # Optionally tee output for easier search later
```

* Start Spark applications and publish the metrics.
* Observe the metrics being printed.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#47763 from TQJADE/sink.

Authored-by: Qi Tan <qi_tan@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
himadripal pushed a commit to himadripal/spark that referenced this pull request Oct 19, 2024