Seamlessly propagate schema changes made after pipelines starts running for Spanner Change Streams to BigQuery template #1730
Merged
ShuranZhang force-pushed the SchemaUpdateIT branch 3 times, most recently from 2a1f6c0 to 30bc294 (July 16, 2024 16:08)
ShuranZhang force-pushed the SchemaUpdateIT branch from 30bc294 to c83d18a (July 16, 2024 20:04)
ShuranZhang force-pushed the SchemaUpdateIT branch 2 times, most recently from 381b528 to 776929d (July 30, 2024 19:14)
nancyxu123 approved these changes (Aug 2, 2024)
ShuranZhang force-pushed the SchemaUpdateIT branch 2 times, most recently from 82e9deb to 1612c84 (August 6, 2024 17:55)
…ng for Spanner Change Streams to BigQuery template.
ShuranZhang force-pushed the SchemaUpdateIT branch from 1612c84 to 8209708 (August 27, 2024 01:18)
DLPTextToBigQueryStreamingIT.testDLPTextToBigQuery failed due to #1836
Abacn approved these changes (Sep 4, 2024)
ShuranZhang changed the title from "Add IT tests for handling schema update during spanner cdc to big query pipeline running" to "Seamlessly propagate schema changes made after pipelines starts running for Spanner Change Streams to BigQuery template" (Sep 11, 2024)
asthamohta pushed a commit to asthamohta/DataflowTemplates that referenced this pull request (Sep 15, 2024)
…ry pipeline running (GoogleCloudPlatform#1730)
* Seamlessly propagate schema changes made after pipelines starts running for Spanner Change Streams to BigQuery template.
* Resolve merge conflicts
* Fix missing columns in bq when initial record is UPDATE or DELETE
* Improve test coverage for TypesUtils
* Add IT tests for handling schema update during a running pipeline
---------
Co-authored-by: Changyu Li <sherryl1780@gmail.com>
Original feature description from @ChangyuLi28:
This PR adds support for schema updates made after the pipeline starts running.
Before you add a new column to an existing tracked Cloud Spanner table, first add the column to the BigQuery changelog table. The new column must be NULLABLE. Then add the column to the Cloud Spanner table. The new column is automatically populated when the pipeline receives a new record with that column. After updating the schema in BigQuery, wait more than 10 minutes before starting any load that includes the new column, because of this issue.
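The ordering above (BigQuery first, Cloud Spanner second) can be sketched as a small helper that only builds the two DDL strings. This is an illustrative sketch, not code from this PR: the class `AddColumnPlan`, its methods, and the dataset, table, and column names are all hypothetical, and running the statements against either service is left out.

```java
import java.util.List;

/** Hypothetical helper (not part of the template): builds DDL for adding a column. */
final class AddColumnPlan {
  private AddColumnPlan() {}

  /**
   * BigQuery DDL for the changelog table. BigQuery does not allow adding a
   * REQUIRED column to an existing table, so the new column is NULLABLE.
   */
  static String bigQueryDdl(String dataset, String changelogTable, String column, String type) {
    return String.format(
        "ALTER TABLE `%s.%s` ADD COLUMN %s %s", dataset, changelogTable, column, type);
  }

  /** Cloud Spanner (GoogleSQL dialect) DDL for the tracked table. */
  static String spannerDdl(String table, String column, String type) {
    return String.format("ALTER TABLE %s ADD COLUMN %s %s", table, column, type);
  }

  /** Returns the statements in the order they should be applied: BigQuery, then Spanner. */
  static List<String> orderedSteps(String bigQueryDdl, String spannerDdl) {
    return List.of("BigQuery: " + bigQueryDdl, "Spanner: " + spannerDdl);
  }

  public static void main(String[] args) {
    // All names below are assumptions chosen for illustration.
    String bq = bigQueryDdl("my_dataset", "Singers_changelog", "Nickname", "STRING");
    String spanner = spannerDdl("Singers", "Nickname", "STRING(MAX)");
    orderedSteps(bq, spanner).forEach(System.out::println);
  }
}
```

Keeping the two statements in one ordered list makes it harder to apply the Spanner change before the BigQuery one by accident.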
To add a new table, add the table in the Cloud Spanner database first. The table is automatically created when the pipeline receives a record for the new table.
The template doesn't drop tables or columns from BigQuery. If a column is dropped from the Cloud Spanner table, records generated after the drop carry null values in the corresponding changelog columns, unless you manually drop the column from BigQuery.
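The dropped-column behavior described above can be illustrated with a minimal sketch. It assumes a change record is represented as a plain map, which is a simplification for illustration; the class `ChangelogRowSketch` and the method `alignToSchema` are hypothetical and do not reflect how the template builds its rows.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration (not the template's code) of null-filling dropped columns. */
final class ChangelogRowSketch {
  private ChangelogRowSketch() {}

  /**
   * Builds a row with one entry per BigQuery changelog column. Columns that the
   * Spanner record no longer carries (for example, because they were dropped
   * from the Spanner table) are set to null.
   */
  static Map<String, Object> alignToSchema(
      List<String> bigQueryColumns, Map<String, Object> spannerRecord) {
    Map<String, Object> row = new LinkedHashMap<>();
    for (String column : bigQueryColumns) {
      // Map.get returns null when the key is absent, which is the null fill.
      row.put(column, spannerRecord.get(column));
    }
    return row;
  }

  public static void main(String[] args) {
    // "Age" still exists in BigQuery but was dropped from the Spanner table.
    List<String> bigQueryColumns = List.of("SingerId", "FirstName", "Age");
    Map<String, Object> record = Map.of("SingerId", 1L, "FirstName", "Alice");
    System.out.println(alignToSchema(bigQueryColumns, record));
  }
}
```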
The template doesn't support column type updates or column nullable mode updates.
Follow this guide to deploy pipelines with this template. More specific instructions.
Add IT tests for handling schema updates while a Spanner CDC to BigQuery pipeline is running.
Previously, a base-case test that doesn't contain any schema change was added in #1705. This pull request adds two more IT tests following the above workflow:
I have run all IT tests together in parallel more than 15 times locally and observed no flakiness. We can add more test cases later if the tests in this PR run stably.
I also added a 3-minute DLQ retry parameter to all IT tests in case of availability or transient errors on the Spanner side.