Destination BigQuery: Consolidation of objects to StreamConfig, cleanup #38131

gisripa · 2024-05-11T00:15:27Z

What

Removes redundant references and duplicate information passed around in WriteConfig objects. There are no functional changes: all the information needed is now retrieved through StreamConfig, and the call sites are adapted accordingly.

This PR should be in a mergeable state with no functional changes after the ones down the stack are published.

Review guide

Removed references to BigQueryWriteConfig and reused the already-built StreamConfig.
Removed the unnecessary StagingOperations interface and made it a concrete class. This will make it easier to add a shim on top of it later and refactor without large changes.
Removed other unnecessary references for getting the dynamic schema, WriteDisposition, etc. These are probably remnants of the bespoke bigquery-denormalized connector.

User Impact

Can this PR be safely reverted and rolled back?

YES 💚
NO ❌

gisripa · 2024-05-11T00:15:42Z

Destinations CDK: Add interfaces for operations by responsibility #38107 : 2 dependent PRs (#38132 , #38173 )
Destination BigQuery: Consolidation of objects to StreamConfig, cleanup #38131 👈
Destinations CDK: Extract generation ID from catalog #38127 : 1 other dependent PR (#38126 )
master

This stack of pull requests is managed by Graphite. Learn more about stacking.

edgao

couple minor comments, nothing blocking (and some of them are about the CDK PR this is stacked on :P )

edgao · 2024-05-13T15:45:19Z

...bigquery/src/main/java/io/airbyte/integrations/destination/bigquery/BigQueryDestination.java

@@ -234,7 +236,8 @@ public SerializedAirbyteMessageConsumer getSerializedMessageConsumer(final JsonN
    final String datasetLocation = BigQueryUtils.getDatasetLocation(config);
    final BigQuerySqlGenerator sqlGenerator = new BigQuerySqlGenerator(config.get(BigQueryConsts.CONFIG_PROJECT_ID).asText(), datasetLocation);
    final Optional<String> rawNamespaceOverride = TypingAndDedupingFlag.getRawNamespaceOverride(RAW_DATA_DATASET);
-    final ParsedCatalog parsedCatalog = parseCatalog(config, catalog, datasetLocation, rawNamespaceOverride);
+    final ParsedCatalog parsedCatalog = parseCatalog(sqlGenerator, defaultNamespace,
+        rawNamespaceOverride.orElse(JavaBaseConstants.DEFAULT_AIRBYTE_INTERNAL_NAMESPACE), catalog);


random thought: now that we're in Kotlin, do you want to make CatalogParser accept a nullable raw namespace (rawNamespace: String?) and default it with rawNamespace ?: DEFAULT_AIRBYTE_INTERNAL? Then we don't need to copy this logic into every connector.

Yeah makes sense to me.
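
A minimal sketch of the idea discussed above, written in Java to match the diff rather than in Kotlin: the parser accepts a possibly-null raw namespace and applies the default itself, so connectors no longer repeat the `orElse` call. `CatalogParserSketch` and its constant are hypothetical stand-ins, not the CDK's actual parser, and the default value `airbyte_internal` is assumed from the review comments in this PR.

```java
import java.util.Objects;

/** Hypothetical stand-in for the CDK catalog parser; not the real class. */
final class CatalogParserSketch {

  // Assumed value, mirroring JavaBaseConstants.DEFAULT_AIRBYTE_INTERNAL_NAMESPACE.
  static final String DEFAULT_AIRBYTE_INTERNAL_NAMESPACE = "airbyte_internal";

  private final String rawNamespace;

  /**
   * @param rawNamespaceOverride the user-supplied raw namespace, or null to use the default
   */
  CatalogParserSketch(final String rawNamespaceOverride) {
    // Java equivalent of Kotlin's `rawNamespace ?: DEFAULT_AIRBYTE_INTERNAL_NAMESPACE`.
    this.rawNamespace = Objects.requireNonNullElse(rawNamespaceOverride, DEFAULT_AIRBYTE_INTERNAL_NAMESPACE);
  }

  String getRawNamespace() {
    return rawNamespace;
  }
}
```

With something like this in place, a connector could pass `rawNamespaceOverride.orElse(null)` and leave the defaulting to the parser.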

edgao · 2024-05-13T15:51:42Z

...c/main/java/io/airbyte/integrations/destination/bigquery/BigQueryStagingConsumerFactory.java

-                                                                        final ParsedCatalog parsedCatalog,
-                                                                        final Function<JsonNode, BigQueryRecordFormatter> recordFormatterCreator,
-                                                                        final Function<String, String> tmpTableNameTransformer) {
+  private Map<StreamDescriptor, StreamConfig> createWriteConfigs(final ConfiguredAirbyteCatalog catalog,


can this just be parsedCatalog.getStreams().stream()? afaict we don't actually need the raw protocol models for anything

(and then we don't need to plumb the raw configured catalog into this method)

I think so, but the called method iterates over the ConfiguredAirbyteCatalog and populates the map. I didn't want to change any functional logic for fear of missing some defaultNamespace plumbing. Up the stack, the whole method will be removed.
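
For illustration, the suggestion could look roughly like the sketch below: the map is keyed by stream descriptor and built from the already-parsed streams, without touching the raw configured catalog. The record types are simplified, hypothetical stand-ins for the CDK and protocol classes, and the sketch assumes that each parsed stream's original namespace already has the default namespace applied, which is exactly the plumbing concern raised in the reply.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class StreamConfigMapSketch {

  // Hypothetical, simplified stand-ins for the real CDK/protocol types.
  record StreamDescriptor(String namespace, String name) {}
  record StreamId(String originalNamespace, String originalName) {}
  record StreamConfig(StreamId id) {}

  /** Builds the descriptor-to-config map directly from the parsed streams. */
  static Map<StreamDescriptor, StreamConfig> byDescriptor(final List<StreamConfig> streams) {
    final Map<StreamDescriptor, StreamConfig> result = new LinkedHashMap<>();
    for (final StreamConfig stream : streams) {
      final StreamDescriptor key =
          new StreamDescriptor(stream.id().originalNamespace(), stream.id().originalName());
      // Fail loudly instead of silently overwriting a stream with the same descriptor.
      if (result.put(key, stream) != null) {
        throw new IllegalStateException("Duplicate stream: " + key);
      }
    }
    return result;
  }
}
```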

edgao · 2024-05-13T15:53:43Z

...c/main/java/io/airbyte/integrations/destination/bigquery/BigQueryStagingConsumerFactory.java

-        // In Destinations V2, we will always use the 'airbyte' schema/namespace for raw tables
+            BigQueryRecordFormatter.SCHEMA_V2, streamConfig.getId().getOriginalName(),
+            tableId, streamConfig.getId().getOriginalName());
+        // In Destinations V2, we will always use the 'airbyte' schema/originalNamespace for raw tables


Suggested change

-        // In Destinations V2, we will always use the 'airbyte' schema/originalNamespace for raw tables
+        // In Destinations V2, we will always use the 'airbyte_internal' schema/originalNamespace for raw tables

😅

edgao · 2024-05-13T15:56:01Z

...irbyte/integrations/destination/bigquery/typing_deduping/BigqueryDestinationHandlerTest.java

        DestinationSyncMode.APPEND_DEDUP,
        List.of(new ColumnId("foo", "bar", "fizz")),
        Optional.empty(),
-        new LinkedHashMap<>());
+        new LinkedHashMap<>(), 0, 0, 0);


.... do you want me to set default values for the generation/sync ID args? These diffs seem kind of dumb
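
One way the default values mentioned above might be provided, sketched in Java: keep the full constructor and add an overload that fills the generation and sync ID arguments with zeros, so existing call sites need no `0, 0, 0` suffix. In Kotlin the same effect would come from default parameter values. `StreamConfigSketch` is a hypothetical, trimmed-down stand-in, and the field names `generationId`, `minimumGenerationId` and `syncId` are assumptions; the diff only shows three trailing `0` arguments.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical, trimmed-down stand-in for StreamConfig; field names are assumed. */
record StreamConfigSketch(String name,
                          Map<String, String> columns,
                          long generationId,
                          long minimumGenerationId,
                          long syncId) {

  /** Convenience overload: defaults the generation and sync IDs to 0. */
  StreamConfigSketch(final String name, final Map<String, String> columns) {
    this(name, columns, 0, 0, 0);
  }

  public static void main(final String[] args) {
    // Existing call sites keep their shorter argument list.
    final StreamConfigSketch config = new StreamConfigSketch("users", new LinkedHashMap<>());
    System.out.println(config);
  }
}
```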

gisripa mentioned this pull request May 11, 2024 in Destinations CDK: Add interfaces for operations by responsibility #38107 (Merged)

octavia-squidington-iii added area/connectors Connector related issues connectors/destination/bigquery labels May 11, 2024

gisripa changed the title ~~Bigquery cdk signature changes~~ Destination BigQuery: CDK signature changes May 11, 2024

gisripa force-pushed the gireesh/05-10-Bigquery_cdk_signature_changes branch 2 times, most recently from b43ce70 to dc2606b Compare May 11, 2024 00:20

gisripa mentioned this pull request May 11, 2024 in Destination BigQuery: Adapt to newer interface for Sync operations #38132 (Merged)

gisripa force-pushed the cdk-ops-refactor branch from 3cd7744 to 3586a5c Compare May 12, 2024 19:17

gisripa force-pushed the gireesh/05-10-Bigquery_cdk_signature_changes branch from dc2606b to f6903b7 Compare May 12, 2024 19:17

gisripa changed the base branch from cdk-ops-refactor to cdk_generation_id May 12, 2024 22:50

gisripa force-pushed the gireesh/05-10-Bigquery_cdk_signature_changes branch from f6903b7 to cf090a7 Compare May 12, 2024 22:50

gisripa force-pushed the gireesh/05-10-Bigquery_cdk_signature_changes branch from cf090a7 to 9c0dfc5 Compare May 13, 2024 03:13

gisripa changed the title ~~Destination BigQuery: CDK signature changes~~ Destination BigQuery: Consolidation of objects to StreamConfig, cleanup May 13, 2024

gisripa force-pushed the cdk_generation_id branch from 0724d77 to 0ded7a7 Compare May 13, 2024 03:27

gisripa force-pushed the gireesh/05-10-Bigquery_cdk_signature_changes branch from 9c0dfc5 to 4328f0a Compare May 13, 2024 03:27

gisripa force-pushed the cdk_generation_id branch from 0ded7a7 to 23a237f Compare May 13, 2024 15:42

edgao approved these changes May 13, 2024

gisripa marked this pull request as ready for review May 13, 2024 16:14

gisripa requested a review from a team as a code owner May 13, 2024 16:14

gisripa force-pushed the gireesh/05-10-Bigquery_cdk_signature_changes branch from 4328f0a to 3de0d51 Compare May 13, 2024 16:36

octavia-squidington-iii added the area/documentation Improvements or additions to documentation label May 13, 2024

vercel bot deployed to Preview May 13, 2024 16:40 View deployment

Base automatically changed from cdk_generation_id to master May 13, 2024 16:59

edgao requested a review from a team as a code owner May 13, 2024 16:59

Bigquery cdk signature changes

6bf8684

gisripa force-pushed the gireesh/05-10-Bigquery_cdk_signature_changes branch from 3de0d51 to 6bf8684 Compare May 13, 2024 17:43

vercel bot deployed to Preview May 13, 2024 17:47 View deployment

gisripa merged commit e0225c1 into master May 13, 2024
33 checks passed

gisripa deleted the gireesh/05-10-Bigquery_cdk_signature_changes branch May 13, 2024 18:09

edgao mentioned this pull request May 15, 2024 in CDK ops test #38173 (Closed)

gisripa mentioned this pull request May 16, 2024 in bq-delete-obsolote #38280 (Merged)

edgao mentioned this pull request May 17, 2024 in Destination bigquery: Bump cdk again #38331 (Merged)
