Allow instrumented Spark trace linked to Openlineage originated context #7450
Merged
12 commits
1f7e31b (yiliangzhou): add OpenlineageContext to capture the parent context from Openlineage…
6fcfb34 (yiliangzhou): create parent span id from airflow task run for OpenLineageParentContext
31c2ac8 (yiliangzhou): Merge branch 'master' into liangzhou.yi/airflow-to-spark-lineage
46d882e (yiliangzhou): Creat root span's trace id and span id for spark application if openl…
66f3390 (yiliangzhou): add tests for OpenlineageParentContext
32116f9 (yiliangzhou): ensure all openlineage context present when we capture it as parent c…
0dc4674 (yiliangzhou): Merge branch 'master' into liangzhou.yi/airflow-to-spark-lineage
0784808 (yiliangzhou): Use SHA-256 in generating trace id and root span id when Openlineage …
fdeb1fb (yiliangzhou): create OpenlineageParentContext only all required fields are present …
33038c8 (yiliangzhou): Merge branch 'master' into liangzhou.yi/airflow-to-spark-lineage
dbf8e55 (yiliangzhou): Merge branch 'master' into liangzhou.yi/airflow-to-spark-lineage
252431c (yiliangzhou): update test to avoid codenarcTestFixtures complaints
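The commits above wire the parent context through three Spark configuration keys. A job launched with these keys set would, per this PR, have its Spark trace linked to the OpenLineage parent run. A sketch of what that launch could look like (the job name, run id, and application jar below are illustrative, not from this PR's runtime):

```
spark-submit \
  --conf spark.openlineage.parentJobNamespace=default \
  --conf spark.openlineage.parentJobName=dag-push-to-s3-spark.upload_to_s3 \
  --conf spark.openlineage.parentRunId=ad3b6baa-8d88-3b38-8dbe-f06232249a84 \
  my-spark-app.jar
```

All three keys must be present, and the run id must be a well-formed UUID, or the instrumentation falls back to an unlinked trace.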
...ion/spark/src/main/java/datadog/trace/instrumentation/spark/OpenlineageParentContext.java
157 additions, 0 deletions
package datadog.trace.instrumentation.spark;

import datadog.trace.api.DDSpanId;
import datadog.trace.api.DDTraceId;
import datadog.trace.api.sampling.PrioritySampling;
import datadog.trace.bootstrap.instrumentation.api.AgentSpan;
import datadog.trace.bootstrap.instrumentation.api.AgentTraceCollector;
import datadog.trace.bootstrap.instrumentation.api.AgentTracer;
import datadog.trace.bootstrap.instrumentation.api.PathwayContext;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Collections;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;
import org.apache.spark.SparkConf;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class OpenlineageParentContext implements AgentSpan.Context {
  private static final Logger log = LoggerFactory.getLogger(OpenlineageParentContext.class);
  private static final Pattern UUID =
      Pattern.compile(
          "^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$");

  private final DDTraceId traceId;
  private final long spanId;
  private final long childRootSpanId;

  private final String parentJobNamespace;
  private final String parentJobName;
  private final String parentRunId;

  public static final String OPENLINEAGE_PARENT_JOB_NAMESPACE =
      "spark.openlineage.parentJobNamespace";
  public static final String OPENLINEAGE_PARENT_JOB_NAME = "spark.openlineage.parentJobName";
  public static final String OPENLINEAGE_PARENT_RUN_ID = "spark.openlineage.parentRunId";

  public static Optional<OpenlineageParentContext> from(SparkConf sparkConf) {
    if (!sparkConf.contains(OPENLINEAGE_PARENT_JOB_NAMESPACE)
        || !sparkConf.contains(OPENLINEAGE_PARENT_JOB_NAME)
        || !sparkConf.contains(OPENLINEAGE_PARENT_RUN_ID)) {
      return Optional.empty();
    }

    String parentJobNamespace = sparkConf.get(OPENLINEAGE_PARENT_JOB_NAMESPACE);
    String parentJobName = sparkConf.get(OPENLINEAGE_PARENT_JOB_NAME);
    String parentRunId = sparkConf.get(OPENLINEAGE_PARENT_RUN_ID);

    if (!UUID.matcher(parentRunId).matches()) {
      return Optional.empty();
    }

    return Optional.of(
        new OpenlineageParentContext(parentJobNamespace, parentJobName, parentRunId));
  }

  OpenlineageParentContext(String parentJobNamespace, String parentJobName, String parentRunId) {
    log.debug(
        "Creating OpenlineageParentContext with parentJobNamespace: {}, parentJobName: {}, parentRunId: {}",
        parentJobNamespace,
        parentJobName,
        parentRunId);

    this.parentJobNamespace = parentJobNamespace;
    this.parentJobName = parentJobName;
    this.parentRunId = parentRunId;

    MessageDigest digest = null;
    try {
      digest = MessageDigest.getInstance("SHA-256");
    } catch (NoSuchAlgorithmException e) {
      log.debug("Unable to find SHA-256 algorithm", e);
    }

    if (digest != null && parentJobNamespace != null && parentRunId != null) {
      traceId = computeTraceId(digest, parentJobNamespace, parentJobName, parentRunId);
      spanId = DDSpanId.ZERO;

      childRootSpanId =
          computeChildRootSpanId(digest, parentJobNamespace, parentJobName, parentRunId);
    } else {
      traceId = DDTraceId.ZERO;
      spanId = DDSpanId.ZERO;

      childRootSpanId = DDSpanId.ZERO;
    }

    log.debug("Created OpenlineageParentContext with traceId: {}, spanId: {}", traceId, spanId);
  }

  private long computeChildRootSpanId(
      MessageDigest digest, String parentJobNamespace, String parentJobName, String parentRunId) {
    byte[] inputBytes =
        (parentJobNamespace + parentJobName + parentRunId).getBytes(StandardCharsets.UTF_8);
    byte[] hash = digest.digest(inputBytes);

    return ByteBuffer.wrap(hash).getLong();
  }

  private DDTraceId computeTraceId(
      MessageDigest digest, String parentJobNamespace, String parentJobName, String parentRunId) {
    byte[] inputBytes =
        (parentJobNamespace + parentJobName + parentRunId).getBytes(StandardCharsets.UTF_8);
    byte[] hash = digest.digest(inputBytes);

    return DDTraceId.from(ByteBuffer.wrap(hash).getLong());
  }

  @Override
  public DDTraceId getTraceId() {
    return traceId;
  }

  @Override
  public long getSpanId() {
    return spanId;
  }

  public long getChildRootSpanId() {
    return childRootSpanId;
  }

  @Override
  public AgentTraceCollector getTraceCollector() {
    return AgentTracer.NoopAgentTraceCollector.INSTANCE;
  }

  @Override
  public int getSamplingPriority() {
    return PrioritySampling.USER_KEEP;
  }

  @Override
  public Iterable<Map.Entry<String, String>> baggageItems() {
    return Collections.<String, String>emptyMap().entrySet();
  }

  @Override
  public PathwayContext getPathwayContext() {
    return null;
  }

  public String getParentJobNamespace() {
    return parentJobNamespace;
  }

  public String getParentJobName() {
    return parentJobName;
  }

  public String getParentRunId() {
    return parentRunId;
  }
}
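The deterministic id derivation above can be exercised standalone. This is a minimal sketch, not the class itself: the Datadog types (DDTraceId, DDSpanId) are replaced with plain longs, and the class name OpenlineageIdSketch is invented for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class OpenlineageIdSketch {
  // Hash namespace + job name + run id with SHA-256 and keep the first
  // 8 bytes of the digest as a signed long, mirroring computeTraceId /
  // computeChildRootSpanId above.
  static long deriveId(String jobNamespace, String jobName, String runId) {
    try {
      MessageDigest digest = MessageDigest.getInstance("SHA-256");
      byte[] hash =
          digest.digest((jobNamespace + jobName + runId).getBytes(StandardCharsets.UTF_8));
      return ByteBuffer.wrap(hash).getLong();
    } catch (NoSuchAlgorithmException e) {
      return 0L; // mirrors the DDTraceId.ZERO / DDSpanId.ZERO fallback
    }
  }

  public static void main(String[] args) {
    // The same triple hashed twice yields the same value, which is why
    // traceId and childRootSpanId are equal for a given parent run.
    long id =
        deriveId(
            "default",
            "dag-push-to-s3-spark.upload_to_s3",
            "ad3b6baa-8d88-3b38-8dbe-f06232249a84");
    // Per the PR's own test fixtures this triple maps to 0xa475569dbce5e6cfL.
    System.out.println(Long.toHexString(id));
  }
}
```

Because the inputs are stable identifiers from the OpenLineage parent facet, any Spark application started for the same parent run derives the same trace id, which is what lets the instrumented trace link back to the Airflow-originated context.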
...stFixtures/groovy/datadog/trace/instrumentation/spark/OpenlineageParentContextTest.groovy
87 additions, 0 deletions
package datadog.trace.instrumentation.spark

import datadog.trace.api.DDSpanId
import org.apache.spark.SparkConf
import spock.lang.Specification

class OpenlineageParentContextTest extends Specification {
  def "should create none empty OpenLineageParentContext using SHA-256 for TraceID and root span SpanId if all required fields are present" () {
    given:
    SparkConf mockSparkConf = Mock(SparkConf)

    when:
    mockSparkConf.contains(OpenlineageParentContext.OPENLINEAGE_PARENT_JOB_NAMESPACE) >> true
    mockSparkConf.contains(OpenlineageParentContext.OPENLINEAGE_PARENT_JOB_NAME) >> true
    mockSparkConf.contains(OpenlineageParentContext.OPENLINEAGE_PARENT_RUN_ID) >> true
    mockSparkConf.get(OpenlineageParentContext.OPENLINEAGE_PARENT_JOB_NAMESPACE) >> "default"
    mockSparkConf.get(OpenlineageParentContext.OPENLINEAGE_PARENT_JOB_NAME) >> "dag-push-to-s3-spark.upload_to_s3"
    mockSparkConf.get(OpenlineageParentContext.OPENLINEAGE_PARENT_RUN_ID) >> parentRunId

    then:
    Optional<OpenlineageParentContext> parentContext = OpenlineageParentContext.from(mockSparkConf)
    parentContext.isPresent()

    parentContext.get().getParentJobNamespace() == "default"
    parentContext.get().getParentJobName() == "dag-push-to-s3-spark.upload_to_s3"
    parentContext.get().getParentRunId() == expectedParentRunId

    parentContext.get().traceId.toLong() == expectedTraceId
    parentContext.get().spanId == DDSpanId.ZERO
    parentContext.get().childRootSpanId == expectedRootSpanId

    where:
    parentRunId                            | expectedParentRunId                    | expectedTraceId     | expectedRootSpanId
    "ad3b6baa-8d88-3b38-8dbe-f06232249a84" | "ad3b6baa-8d88-3b38-8dbe-f06232249a84" | 0xa475569dbce5e6cfL | 0xa475569dbce5e6cfL
    "ad3b6baa-8d88-3b38-8dbe-f06232249a85" | "ad3b6baa-8d88-3b38-8dbe-f06232249a85" | 0x31da6680bd14991bL | 0x31da6680bd14991bL
  }

  def "should create empty OpenLineageParentContext if any required field is missing" () {
    given:
    SparkConf mockSparkConf = Mock(SparkConf)

    when:
    mockSparkConf.contains(OpenlineageParentContext.OPENLINEAGE_PARENT_JOB_NAMESPACE) >> jobNamespacePresent
    mockSparkConf.contains(OpenlineageParentContext.OPENLINEAGE_PARENT_JOB_NAME) >> jobNamePresent
    mockSparkConf.contains(OpenlineageParentContext.OPENLINEAGE_PARENT_RUN_ID) >> runIdPresent

    then:
    Optional<OpenlineageParentContext> parentContext = OpenlineageParentContext.from(mockSparkConf)
    parentContext.isPresent() == expected

    where:
    jobNamespacePresent | jobNamePresent | runIdPresent | expected
    true                | true           | false        | false
    true                | false          | true         | false
    false               | true           | true         | false
    true                | false          | false        | false
    false               | true           | false        | false
    false               | false          | true         | false
    false               | false          | false        | false
  }

  def "should only generate a non-empty OpenlineageParentContext if parentRunId is a valid UUID" () {
    given:
    SparkConf mockSparkConf = Mock(SparkConf)

    when:
    mockSparkConf.contains(OpenlineageParentContext.OPENLINEAGE_PARENT_JOB_NAMESPACE) >> true
    mockSparkConf.contains(OpenlineageParentContext.OPENLINEAGE_PARENT_JOB_NAME) >> true
    mockSparkConf.contains(OpenlineageParentContext.OPENLINEAGE_PARENT_RUN_ID) >> true
    mockSparkConf.get(OpenlineageParentContext.OPENLINEAGE_PARENT_JOB_NAMESPACE) >> "default"
    mockSparkConf.get(OpenlineageParentContext.OPENLINEAGE_PARENT_JOB_NAME) >> "dag-push-to-s3-spark.upload_to_s3"
    mockSparkConf.get(OpenlineageParentContext.OPENLINEAGE_PARENT_RUN_ID) >> runId

    then:
    Optional<OpenlineageParentContext> parentContext = OpenlineageParentContext.from(mockSparkConf)
    parentContext.isPresent() == expected

    where:
    runId                                  | expected
    "6afeb6ee-729d-37f7-ad73-b8e6f47ca694" | true
    " "                                    | false
    "invalid-uuid"                         | false
    "6afeb6ee-729d-37f7-b8e6f47ca694"      | false
    "6AFEB6EE-729D-37F7-AD73-B8E6F47CA694" | true
  }
}
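The gating behaviour these tests exercise reduces to a small predicate: produce a parent context only when all three configuration keys are present and the run id is a well-formed UUID. A sketch of that predicate, with a plain Map standing in for SparkConf and the class name ParentContextGate invented for illustration:

```java
import java.util.Map;
import java.util.regex.Pattern;

public class ParentContextGate {
  // Same UUID pattern as OpenlineageParentContext: case-insensitive hex
  // in the canonical 8-4-4-4-12 grouping.
  static final Pattern UUID_PATTERN =
      Pattern.compile(
          "^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$");

  // True only when every key is present and the run id parses as a UUID,
  // mirroring the checks in OpenlineageParentContext.from(SparkConf).
  static boolean accepts(Map<String, String> conf) {
    String runId = conf.get("spark.openlineage.parentRunId");
    return conf.containsKey("spark.openlineage.parentJobNamespace")
        && conf.containsKey("spark.openlineage.parentJobName")
        && runId != null
        && UUID_PATTERN.matcher(runId).matches();
  }

  public static void main(String[] args) {
    Map<String, String> conf =
        Map.of(
            "spark.openlineage.parentJobNamespace", "default",
            "spark.openlineage.parentJobName", "dag-push-to-s3-spark.upload_to_s3",
            "spark.openlineage.parentRunId", "6afeb6ee-729d-37f7-ad73-b8e6f47ca694");
    System.out.println(accepts(conf)); // all keys present, run id is a valid UUID
  }
}
```

Note that the uppercase run id in the last test row is accepted because the pattern allows both hex cases, while a run id missing one UUID group fails the length check built into the pattern.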