Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Persistence Component - Big query support #1904

Merged
merged 55 commits into from
Jul 4, 2023

Conversation

prasar-ashutosh
Copy link
Contributor

What type of PR is this?

  • Improvement

What does this PR do / why is it needed ?

  • Big Query Support
  • Interface update to handle Big Query
  • Support for Derived dataset for all milestoning schemes
  • SQL & E2E Tests for Big Query

Which issue(s) this PR fixes:

Fixes #

Other notes for reviewers:

Does this PR introduce a user-facing change?

NO

@prasar-ashutosh prasar-ashutosh requested a review from a team as a code owner June 8, 2023 03:34
@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Jun 8, 2023

CLA Not Signed

@github-actions
Copy link

github-actions bot commented Jun 8, 2023

Test Results

     474 files       474 suites   47m 5s ⏱️
  9 848 tests   9 517 ✔️ 331 💤 0
11 710 runs  11 313 ✔️ 397 💤 0

Results for commit ae4b796.

♻️ This comment has been updated with latest results.

@prasar-ashutosh prasar-ashutosh marked this pull request as draft June 8, 2023 03:37
@prasar-ashutosh prasar-ashutosh changed the title DRAFT: Persistence Component - Big query support Persistence Component - Big query support Jun 15, 2023
@TheJuanAndOnly99
Copy link
Member

/easycla

…y_support

# Conflicts:
#	legend-engine-xt-persistence-component/legend-engine-xt-persistence-component-relational-bigquery/pom.xml
#	legend-engine-xt-persistence-component/legend-engine-xt-persistence-component-relational-bigquery/src/main/java/org/finos/legend/engine/persistence/components/relational/bigquery/BigQuerySink.java
#	legend-engine-xt-persistence-component/legend-engine-xt-persistence-component-relational-bigquery/src/main/java/org/finos/legend/engine/persistence/components/relational/bigquery/executor/BigQueryExecutor.java
#	legend-engine-xt-persistence-component/legend-engine-xt-persistence-component-relational-bigquery/src/main/java/org/finos/legend/engine/persistence/components/relational/bigquery/executor/BigQueryHelper.java
#	legend-engine-xt-persistence-component/legend-engine-xt-persistence-component-relational-bigquery/src/main/java/org/finos/legend/engine/persistence/components/relational/bigquery/executor/BigQueryTransactionManager.java
#	legend-engine-xt-persistence-component/legend-engine-xt-persistence-component-relational-bigquery/src/main/java/org/finos/legend/engine/persistence/components/relational/bigquery/optmizer/StringCaseOptimizer.java
#	legend-engine-xt-persistence-component/legend-engine-xt-persistence-component-relational-bigquery/src/main/java/org/finos/legend/engine/persistence/components/relational/bigquery/sql/visitor/DeleteVisitor.java
#	legend-engine-xt-persistence-component/legend-engine-xt-persistence-component-relational-bigquery/src/main/java/org/finos/legend/engine/persistence/components/relational/bigquery/sql/visitor/FieldVisitor.java
#	legend-engine-xt-persistence-component/legend-engine-xt-persistence-component-relational-bigquery/src/main/java/org/finos/legend/engine/persistence/components/relational/bigquery/sqldom/schemaops/statements/AlterTable.java
#	legend-engine-xt-persistence-component/legend-engine-xt-persistence-component-relational-bigquery/src/test/java/org/finos/legend/engine/persistence/components/BaseTestUtils.java
#	legend-engine-xt-persistence-component/legend-engine-xt-persistence-component-relational-bigquery/src/test/java/org/finos/legend/engine/persistence/components/e2e/BigQueryEndToEndTest.java
#	legend-engine-xt-persistence-component/legend-engine-xt-persistence-component-relational-bigquery/src/test/java/org/finos/legend/engine/persistence/components/ingestmode/BigQueryTestArtifacts.java
#	legend-engine-xt-persistence-component/legend-engine-xt-persistence-component-relational-bigquery/src/test/java/org/finos/legend/engine/persistence/components/ingestmode/BitemporalDeltaSourceSpecifiesFromTest.java
#	legend-engine-xt-persistence-component/legend-engine-xt-persistence-component-relational-bigquery/src/test/java/org/finos/legend/engine/persistence/components/ingestmode/NontemporalDeltaTest.java
#	legend-engine-xt-persistence-component/legend-engine-xt-persistence-component-relational-bigquery/src/test/java/org/finos/legend/engine/persistence/components/logicalplan/operations/AlterTest.java
#	legend-engine-xt-persistence-component/legend-engine-xt-persistence-component-relational-bigquery/src/test/resources/expected/append/data_pass2.csv
#	legend-engine-xt-persistence-component/legend-engine-xt-persistence-component-relational-bigquery/src/test/resources/expected/nontemporal_delta/data_pass2.csv
#	legend-engine-xt-persistence-component/legend-engine-xt-persistence-component-relational-core/src/main/java/org/finos/legend/engine/persistence/components/relational/RelationalSink.java
#	legend-engine-xt-persistence-component/legend-engine-xt-persistence-component-relational-core/src/main/java/org/finos/legend/engine/persistence/components/relational/sqldom/common/Clause.java


@Override
public void close()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this empty block intentional?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, intentional, JDBC Helper actually has a JDBC connection that needs to be closed explicitly, but Big Query doesn't need to close anything, that's why this is empty.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wont be better to have closeTransactionManager as private, and inside this close call that method? That way, you have a single close that cleans all resources, and in the future other resources might be added too...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed as suggested, But keeping closeTransactionManager as public as it's inherited from RelationalExecutionHelper.

implicitDataTypeMapping.put(DataType.INTEGER, new HashSet<>(Arrays.asList(DataType.INT, DataType.BIGINT, DataType.TINYINT, DataType.SMALLINT, DataType.INT64)));
implicitDataTypeMapping.put(DataType.NUMERIC, new HashSet<>(Arrays.asList(DataType.NUMERIC, DataType.NUMBER, DataType.DECIMAL, DataType.INT, DataType.INTEGER, DataType.BIGINT, DataType.TINYINT, DataType.SMALLINT, DataType.INT64)));
implicitDataTypeMapping.put(DataType.FLOAT, new HashSet<>(Arrays.asList(DataType.REAL, DataType.DOUBLE, DataType.FLOAT64, DataType.INT, DataType.INTEGER, DataType.BIGINT, DataType.TINYINT, DataType.SMALLINT, DataType.INT64, DataType.NUMBER, DataType.NUMERIC, DataType.DECIMAL)));
implicitDataTypeMapping.put(DataType.STRING, new HashSet<>(Arrays.asList(DataType.STRING, DataType.CHAR, DataType.CHARACTER, DataType.VARCHAR, DataType.LONGNVARCHAR, DataType.LONGTEXT, DataType.TEXT)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These HashSet should be wrappd on unmodifiable to actually make the IMPLICIT_DATA_TYPE_MAPPING fully unmodifiable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Changed as suggested

explicitDataTypeMapping.put(DataType.NUMBER, new HashSet<>(Arrays.asList(DataType.REAL, DataType.FLOAT, DataType.DOUBLE, DataType.FLOAT64)));
explicitDataTypeMapping.put(DataType.NUMERIC, new HashSet<>(Arrays.asList(DataType.REAL, DataType.FLOAT, DataType.DOUBLE, DataType.FLOAT64)));
explicitDataTypeMapping.put(DataType.DECIMAL, new HashSet<>(Arrays.asList(DataType.REAL, DataType.FLOAT, DataType.DOUBLE, DataType.FLOAT64)));
EXPLICIT_DATA_TYPE_MAPPING = Collections.unmodifiableMap(explicitDataTypeMapping);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above - These HashSet should be wrappd on unmodifiable to actually make the IMPLICIT_DATA_TYPE_MAPPING fully unmodifiable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Changed as suggested



@Override
public void close()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wont be better to have closeTransactionManager as private, and inside this close call that method? That way, you have a single close that cleans all resources, and in the future other resources might be added too...

<dependency>
<groupId>com.google.cloud</groupId>
<artifactId>google-cloud-bigquery</artifactId>
<scope>provided</scope>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How these provided dependencies will work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it's a library, the user of the library is supposed to provide the dependencies.

case TINYINT:
case SMALLINT:
case INT64:
dataType = new Int64();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the rational to fit everything on 64 bit? Seems as something that will lead to incorrect data.

Im not sure how these mappings are used, but if we allow 64 bit on some places that then we try to insert into the database into TinyInt, we might end with bit overflows, and we might not know the value on the DB will be wrong.

Thoughts?

Copy link
Contributor Author

@prasar-ashutosh prasar-ashutosh Jul 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mapping is for mapping logical data types to BigQuery data types.
In Big Query, INT, SMALLINT, INTEGER, BIGINT, TINYINT, and BYTEINT are aliases for INT64.
So we have mapped all of them to INT64.
https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#numeric_types

return FieldType.builder().dataType(DataType.NUMERIC).length(columnSize).scale(decimalDigits).build();
case FLOAT64:
case FLOAT:
return FieldType.builder().dataType(DataType.FLOAT).build();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the same line as prev comment on INT64, FLOAT and FLAOT 64 are two different types that have different ranges of values they can store.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BigQuery only supports FLOAT64, so we have mapped both these logical types to FLOAT64

@epsstan epsstan merged commit 594db31 into finos:master Jul 4, 2023
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants