🎉 Destination s3 / gcs: add option for uncompressed csv and jsonl format #12167

Merged: tuliren merged 21 commits into master from liren/add-no-compression-option on Apr 22, 2022

@tuliren tuliren commented Apr 20, 2022

What

  • Resolves #12001 (Add option for S3 and GCS connectors to not compress CSV and JSONL formats).
  • Recent versions of the S3 and GCS destination connectors introduced a serialized buffering strategy to improve scalability. Those versions also introduced a breaking change: CSV and JSONL output is always compressed.
  • Staging destinations are not affected: staging CSV and JSONL files remain compressed for efficiency.

How

  • A compression field is added so that users can choose whether CSV and JSONL files are compressed.
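
For reviewers who want the idea before opening the diff, here is a minimal sketch of how a compression choice can drive both the file extension and the output stream. It is not the connector code: `CompressionSketch`, the `Compression` enum, `fileName`, `wrap`, and `serialize` are hypothetical names used only for illustration, and the `.gz` suffix is an assumption. Only `java.util.zip.GZIPOutputStream` and the other JDK classes are real APIs.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

class CompressionSketch {

  /** Hypothetical enum: each option carries the suffix appended to the file name. */
  enum Compression {
    NONE(""),
    GZIP(".gz"); // assumed suffix for gzip output

    private final String fileExtension;

    Compression(String fileExtension) {
      this.fileExtension = fileExtension;
    }

    String fileExtension() {
      return fileExtension;
    }
  }

  /** Builds e.g. "data.csv" or "data.csv.gz" depending on the chosen compression. */
  static String fileName(String base, String formatExtension, Compression compression) {
    return base + formatExtension + compression.fileExtension();
  }

  /** Wraps the raw stream in gzip only when compression is requested. */
  static OutputStream wrap(OutputStream raw, Compression compression) throws IOException {
    return compression == Compression.GZIP ? new GZIPOutputStream(raw) : raw;
  }

  /** Writes the payload through the (optionally compressing) stream into memory. */
  static byte[] serialize(String payload, Compression compression) throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    // Closing the wrapper flushes the gzip trailer before the bytes are read back.
    try (OutputStream out = wrap(sink, compression)) {
      out.write(payload.getBytes(StandardCharsets.UTF_8));
    }
    return sink.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    String csv = "id,name\n1,alice\n";
    System.out.println(fileName("data", ".csv", Compression.NONE));
    System.out.println(fileName("data", ".csv", Compression.GZIP));
    System.out.println(serialize(csv, Compression.NONE).length);
    System.out.println(serialize(csv, Compression.GZIP).length);
  }
}
```

Keeping the extension next to the compression choice means the uploaded object name and its actual encoding cannot drift apart, which is the kind of coupling the "Add file extension method to s3 format config" commit suggests.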

Recommended reading order

  1. spec.json
  2. S3CsvFormatConfig.java
  3. CsvSerializedBuffer.java
  4. The rest

🚨 User Impact 🚨

  • S3 and GCS users can now choose not to compress CSV and JSONL output.

@github-actions github-actions bot added the area/connectors Connector related issues label Apr 20, 2022

tuliren commented Apr 20, 2022

/test connector=connectors/destination-s3

🕑 connectors/destination-s3 https://github.com/airbytehq/airbyte/actions/runs/2194831891
❌ connectors/destination-s3 https://github.com/airbytehq/airbyte/actions/runs/2194831891
🐛

tuliren commented Apr 20, 2022

/test connector=connectors/destination-gcs

🕑 connectors/destination-gcs https://github.com/airbytehq/airbyte/actions/runs/2194832336
✅ connectors/destination-gcs https://github.com/airbytehq/airbyte/actions/runs/2194832336
No Python unittests run

@github-actions github-actions bot added the area/documentation Improvements or additions to documentation label Apr 20, 2022

tuliren commented Apr 20, 2022

/test connector=connectors/destination-s3

🕑 connectors/destination-s3 https://github.com/airbytehq/airbyte/actions/runs/2194922786
✅ connectors/destination-s3 https://github.com/airbytehq/airbyte/actions/runs/2194922786
No Python unittests run

@edgao edgao left a comment

LGTM pending two comments

tuliren commented Apr 22, 2022

/test connector=connectors/destination-s3

🕑 connectors/destination-s3 https://github.com/airbytehq/airbyte/actions/runs/2205946324
❌ connectors/destination-s3 https://github.com/airbytehq/airbyte/actions/runs/2205946324
🐛

tuliren commented Apr 22, 2022

/test connector=connectors/destination-gcs

🕑 connectors/destination-gcs https://github.com/airbytehq/airbyte/actions/runs/2205946622
❌ connectors/destination-gcs https://github.com/airbytehq/airbyte/actions/runs/2205946622
🐛

tuliren commented Apr 22, 2022

/test connector=connectors/destination-bigquery

🕑 connectors/destination-bigquery https://github.com/airbytehq/airbyte/actions/runs/2205947555
✅ connectors/destination-bigquery https://github.com/airbytehq/airbyte/actions/runs/2205947555
Python tests coverage:

Name                                                                                                                            Stmts   Miss  Cover
---------------------------------------------------------------------------------------------------------------------------------------------------
normalization/transform_config/__init__.py                                                                                          2      0   100%
normalization/transform_catalog/reserved_keywords.py                                                                               13      0   100%
normalization/transform_catalog/__init__.py                                                                                         2      0   100%
normalization/destination_type.py                                                                                                  13      0   100%
normalization/__init__.py                                                                                                           4      0   100%
/actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/models/airbyte_protocol.py     124      0   100%
/actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/models/__init__.py               1      0   100%
/actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/__init__.py                      2      0   100%
normalization/transform_catalog/destination_name_transformer.py                                                                   155      8    95%
normalization/transform_config/transform.py                                                                                       159     31    81%
normalization/transform_catalog/table_name_registry.py                                                                            174     34    80%
normalization/transform_catalog/utils.py                                                                                           34      7    79%
normalization/transform_catalog/dbt_macro.py                                                                                       22      7    68%
normalization/transform_catalog/catalog_processor.py                                                                              147     80    46%
normalization/transform_catalog/transform.py                                                                                       61     38    38%
normalization/transform_catalog/stream_processor.py                                                                               534    345    35%
---------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                                                                                                            1447    550    62%

tuliren commented Apr 22, 2022

/test connector=connectors/destination-snowflake

🕑 connectors/destination-snowflake https://github.com/airbytehq/airbyte/actions/runs/2205948078
✅ connectors/destination-snowflake https://github.com/airbytehq/airbyte/actions/runs/2205948078
Python tests coverage:

Name                                                                                                                            Stmts   Miss  Cover
---------------------------------------------------------------------------------------------------------------------------------------------------
normalization/transform_config/__init__.py                                                                                          2      0   100%
normalization/transform_catalog/reserved_keywords.py                                                                               13      0   100%
normalization/transform_catalog/__init__.py                                                                                         2      0   100%
normalization/destination_type.py                                                                                                  13      0   100%
normalization/__init__.py                                                                                                           4      0   100%
/actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/models/airbyte_protocol.py     124      0   100%
/actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/models/__init__.py               1      0   100%
/actions-runner/_work/airbyte/airbyte/airbyte-integrations/bases/airbyte-protocol/airbyte_protocol/__init__.py                      2      0   100%
normalization/transform_catalog/destination_name_transformer.py                                                                   155      8    95%
normalization/transform_config/transform.py                                                                                       159     31    81%
normalization/transform_catalog/table_name_registry.py                                                                            174     34    80%
normalization/transform_catalog/utils.py                                                                                           34      7    79%
normalization/transform_catalog/dbt_macro.py                                                                                       22      7    68%
normalization/transform_catalog/catalog_processor.py                                                                              147     80    46%
normalization/transform_catalog/transform.py                                                                                       61     38    38%
normalization/transform_catalog/stream_processor.py                                                                               534    345    35%
---------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                                                                                                            1447    550    62%

tuliren commented Apr 22, 2022

/test connector=connectors/destination-s3

🕑 connectors/destination-s3 https://github.com/airbytehq/airbyte/actions/runs/2206160148
✅ connectors/destination-s3 https://github.com/airbytehq/airbyte/actions/runs/2206160148
No Python unittests run

tuliren commented Apr 22, 2022

/test connector=connectors/destination-gcs

🕑 connectors/destination-gcs https://github.com/airbytehq/airbyte/actions/runs/2206160312
✅ connectors/destination-gcs https://github.com/airbytehq/airbyte/actions/runs/2206160312
No Python unittests run

tuliren commented Apr 22, 2022

/publish connector=connectors/destination-s3

🕑 connectors/destination-s3 https://github.com/airbytehq/airbyte/actions/runs/2209883900
❌ Failed to publish connectors/destination-s3
❌ Couldn't auto-bump version for connectors/destination-s3

tuliren commented Apr 22, 2022

/publish connector=connectors/destination-s3

🕑 connectors/destination-s3 https://github.com/airbytehq/airbyte/actions/runs/2210297895
🚀 Successfully published connectors/destination-s3
🚀 Auto-bumped version for connectors/destination-s3
✅ connectors/destination-s3 https://github.com/airbytehq/airbyte/actions/runs/2210297895

tuliren commented Apr 22, 2022

/publish connector=connectors/destination-gcs

🕑 connectors/destination-gcs https://github.com/airbytehq/airbyte/actions/runs/2210298193
🚀 Successfully published connectors/destination-gcs
❌ Couldn't auto-bump version for connectors/destination-gcs

@octavia-squidington-iii octavia-squidington-iii temporarily deployed to more-secrets April 22, 2022 22:33 Inactive
@tuliren tuliren merged commit 9a0442c into master Apr 22, 2022
@tuliren tuliren deleted the liren/add-no-compression-option branch April 22, 2022 22:38
@tuliren tuliren temporarily deployed to more-secrets April 22, 2022 22:39 Inactive
suhomud pushed a commit that referenced this pull request May 23, 2022
…mat (#12167)

* Add gzip compression option

* Add file extension method to s3 format config

* Pass gzip compression to serialized buffer

* Add unit test

* Format code

* Update integration test

* Bump version and update doc

* Fix unit test

* Add extra gzip tests for csv and jsonl

* Make compression an oneOf param

* Migrate csv config to new compression spec

* Migrate jsonl config to new compression spec

* Update docs

* Fix unit test

* Fix integration tests

* Format code

* Bump version

* auto-bump connector version

* Bump gcs version in seed

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>