This repository has been archived by the owner on Sep 4, 2024. It is now read-only.

fix: Wrong default value of disabled facets property #359

Merged
merged 1 commit into from
Jul 26, 2024
24 changes: 11 additions & 13 deletions docs/integrations/flink.md
@@ -37,7 +37,7 @@ whether the job runs properly.

## Limitations

-Currently OpenLineage's Flink integration is limited to getting information from jobs running in Application Mode.
+Currently, OpenLineage's Flink integration is limited to getting information from jobs running in Application Mode.

OpenLineage integration extracts lineage only from following `Sources` and `Sinks`:

@@ -76,10 +76,10 @@ In your job, you need to set up `OpenLineageFlinkJobListener`.

For example:
```java
-JobListener listener = JobListener listener = OpenLineageFlinkJobListener.builder()
-.executionEnvironment(streamExecutionEnvironment)
-.build();
-streamExecutionEnvironment.registerJobListener(listener);
+JobListener listener = OpenLineageFlinkJobListener.builder()
+.executionEnvironment(streamExecutionEnvironment)
+.build();
+streamExecutionEnvironment.registerJobListener(listener);
```

Also, OpenLineage needs certain parameters to be set in `flink-conf.yaml`:
@@ -114,17 +114,15 @@ and allows all the configuration features present there to be used. The configur
* `openlineage.yml` file with a environment property `OPENLINEAGE_CONFIG` being set and pointing to configuration file. File structure and allowed options are described [here](https://github.com/OpenLineage/OpenLineage/tree/main/client/java#configuration).
* Standard Flink configuration with the parameters defined below.

-### Flink Configuration parameters
+### Flink Configuration parameters

The following parameters can be specified:

-| Parameter | Definition | Example |
-------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------
-| openlineage.transport.type | The transport type used for event emit, default type is `console` | http |
-| openlineage.facets.disabled | List of facets to disable, enclosed in `[]` (required from 0.21.x) and separated by `;` | \[some_facet1;some_facet1\] |
-| openlineage.job.owners.<ownership-type\> | Specifies ownership of the job. Multiple entries with different types are allowed. Config key name and value are used to create job ownership type and name (available since 1.13). | openlineage.job.owners.team="Some Team" |
-
-
+| Parameter | Definition | Example |
+|------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------|
+| openlineage.transport.type | The transport type used for event emit, default type is `console` | http |
+| openlineage.facets.disabled | List of facets to disable, enclosed in `[]` (required from 0.21.x) and separated by `;`, default is `[spark_unknown;spark.logicalPlan;]` (currently must contain `;`) | \[some_facet1;some_facet1\] |
+| openlineage.job.owners.<ownership-type\> | Specifies ownership of the job. Multiple entries with different types are allowed. Config key name and value are used to create job ownership type and name (available since 1.13). | openlineage.job.owners.team="Some Team" |
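
Reviewer note: the `facets.disabled` value format described in the table (a list enclosed in `[]`, entries separated by `;`) can be illustrated with a small sketch. The class and method names below are hypothetical and are not part of OpenLineage; this only shows how such a string could be split, not how the integration actually parses it.

```java
import java.util.ArrayList;
import java.util.List;

public class DisabledFacetsSketch {

    // Hypothetical helper (not an OpenLineage API): turns "[a;b;]" into ["a", "b"].
    public static List<String> parseDisabledFacets(String value) {
        String trimmed = value.trim();
        if (!trimmed.startsWith("[") || !trimmed.endsWith("]")) {
            throw new IllegalArgumentException("expected a list enclosed in []: " + value);
        }
        // Drop the surrounding brackets, then split on the ';' separator.
        String inner = trimmed.substring(1, trimmed.length() - 1);
        List<String> facets = new ArrayList<>();
        for (String part : inner.split(";")) {
            String name = part.trim();
            if (!name.isEmpty()) { // skips the empty piece produced by "[]" or ";;"
                facets.add(name);
            }
        }
        return facets;
    }

    public static void main(String[] args) {
        // Uses the default value quoted in the table above.
        System.out.println(parseDisabledFacets("[spark_unknown;spark.logicalPlan;]"));
    }
}
```

In this sketch a trailing `;` is harmless, because empty entries are skipped.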

## Transports

30 changes: 15 additions & 15 deletions docs/integrations/spark/configuration/spark_conf.md
@@ -6,18 +6,18 @@ title: Spark Config Parameters

The following parameters can be specified:

-| Parameter | Definition | Example |
-----------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------
-| spark.openlineage.transport.type | The transport type used for event emit, default type is `console` | http |
-| spark.openlineage.namespace | The default namespace to be applied for any jobs submitted | MyNamespace |
-| spark.openlineage.parentJobNamespace | The job namespace to be used for the parent job facet | ParentJobNamespace |
-| spark.openlineage.parentJobName | The job name to be used for the parent job facet | ParentJobName |
-| spark.openlineage.parentRunId | The RunId of the parent job that initiated this Spark job | xxxx-xxxx-xxxx-xxxx |
-| spark.openlineage.appName | Custom value overwriting Spark app name in events | AppName |
-| spark.openlineage.facets.disabled | List of facets to disable, enclosed in `[]` (required from 0.21.x) and separated by `;`, default is `[spark_unknown;]` (currently must contain `;`) | \[spark_unknown;spark.logicalPlan\] |
-| spark.openlineage.capturedProperties | comma separated list of properties to be captured in spark properties facet (default `spark.master`, `spark.app.name`) | "spark.example1,spark.example2" |
-| spark.openlineage.dataset.removePath.pattern | Java regular expression that removes `?<remove>` named group from dataset path. Can be used to last path subdirectories from paths like `s3://my-whatever-path/year=2023/month=04` | `(.*)(?<remove>\/.*\/.*)` |
-| spark.openlineage.jobName.appendDatasetName | Decides whether output dataset name should be appended to job name. By default `true`. | false |
-| spark.openlineage.jobName.replaceDotWithUnderscore | Replaces dots in job name with underscore. Can be used to mimic legacy behaviour on Databricks platform. By default `false`. | false |
-| spark.openlineage.debugFacet | Determines whether debug facet shall be generated and included within the event. Set `enabled` to turn it on. By default, facet is disabled. | enabled |
-| spark.openlineage.job.owners.<ownership-type\> | Specifies ownership of the job. Multiple entries with different types are allowed. Config key name and value are used to create job ownership type and name (available since 1.13). | spark.openlineage.job.owners.team="Some Team" |
+| Parameter | Definition | Example |
+|----------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------|
+| spark.openlineage.transport.type | The transport type used for event emit, default type is `console` | http |
+| spark.openlineage.namespace | The default namespace to be applied for any jobs submitted | MyNamespace |
+| spark.openlineage.parentJobNamespace | The job namespace to be used for the parent job facet | ParentJobNamespace |
+| spark.openlineage.parentJobName | The job name to be used for the parent job facet | ParentJobName |
+| spark.openlineage.parentRunId | The RunId of the parent job that initiated this Spark job | xxxx-xxxx-xxxx-xxxx |
+| spark.openlineage.appName | Custom value overwriting Spark app name in events | AppName |
+| spark.openlineage.facets.disabled | List of facets to disable, enclosed in `[]` (required from 0.21.x) and separated by `;`, default is `[spark_unknown;spark.logicalPlan;]` (currently must contain `;`) | \[spark_unknown;spark.logicalPlan\] |
+| spark.openlineage.capturedProperties | comma separated list of properties to be captured in spark properties facet (default `spark.master`, `spark.app.name`) | "spark.example1,spark.example2" |
+| spark.openlineage.dataset.removePath.pattern | Java regular expression that removes `?<remove>` named group from dataset path. Can be used to last path subdirectories from paths like `s3://my-whatever-path/year=2023/month=04` | `(.*)(?<remove>\/.*\/.*)` |
+| spark.openlineage.jobName.appendDatasetName | Decides whether output dataset name should be appended to job name. By default `true`. | false |
+| spark.openlineage.jobName.replaceDotWithUnderscore | Replaces dots in job name with underscore. Can be used to mimic legacy behaviour on Databricks platform. By default `false`. | false |
+| spark.openlineage.debugFacet | Determines whether debug facet shall be generated and included within the event. Set `enabled` to turn it on. By default, facet is disabled. | enabled |
+| spark.openlineage.job.owners.<ownership-type\> | Specifies ownership of the job. Multiple entries with different types are allowed. Config key name and value are used to create job ownership type and name (available since 1.13). | spark.openlineage.job.owners.team="Some Team" |
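
Reviewer note: the `spark.openlineage.dataset.removePath.pattern` row relies on a Java named capturing group called `remove`. The sketch below shows, using only `java.util.regex`, how the example pattern from the table would cut the trailing partition directories from the example path. The class and method names are hypothetical, the pattern is assumed to define a `remove` group, and this is an illustration of the regex mechanics rather than the integration's own code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RemovePathSketch {

    // Hypothetical helper (not an OpenLineage API): deletes the text matched by
    // the named group "remove"; returns the path unchanged when nothing matches.
    public static String removeNamedGroup(String path, String regex) {
        Matcher matcher = Pattern.compile(regex).matcher(path);
        if (!matcher.find() || matcher.group("remove") == null) {
            return path;
        }
        return path.substring(0, matcher.start("remove"))
                + path.substring(matcher.end("remove"));
    }

    public static void main(String[] args) {
        // Pattern and path are the examples quoted in the table above.
        String pattern = "(.*)(?<remove>\\/.*\\/.*)";
        String path = "s3://my-whatever-path/year=2023/month=04";
        System.out.println(removeNamedGroup(path, pattern)); // prints s3://my-whatever-path
    }
}
```

Because `(.*)` is greedy, the `remove` group captures the shortest suffix that still contains two `/` separators, here `/year=2023/month=04`.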