Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Source S3: fix discovery issues #24157

Merged
merged 5 commits into from
Mar 16, 2023

Conversation

davydov-d
Copy link
Collaborator

What

https://github.com/airbytehq/oncall/issues/1664
https://github.com/airbytehq/oncall/issues/1652

How

This PR actually fixes two different problems:

  • Return empty schema if no files found during discovery
  • Do not infer data types when user schema is applied (see more in code comment)

🚨 User Impact 🚨

The second fix will only impact connections with:

  • File type = CSV and
  • User enforced schema and
  • Large files and small block size

Some data may change it's type but still it shouldn't break syncs (see more in code comment)

@octavia-squidington-iii octavia-squidington-iii added area/connectors Connector related issues area/documentation Improvements or additions to documentation labels Mar 16, 2023
@davydov-d
Copy link
Collaborator Author

davydov-d commented Mar 16, 2023

/test connector=connectors/source-s3

🕑 connectors/source-s3 https://github.com/airbytehq/airbyte/actions/runs/4440330375
✅ connectors/source-s3 https://github.com/airbytehq/airbyte/actions/runs/4440330375
Python tests coverage:

Name                                                              Stmts   Miss  Cover
-------------------------------------------------------------------------------------
source_s3/source_files_abstract/storagefile.py                       23      0   100%
source_s3/source_files_abstract/spec.py                              55      0   100%
source_s3/source_files_abstract/formats/parquet_spec.py               9      0   100%
source_s3/source_files_abstract/formats/jsonl_spec.py                13      0   100%
source_s3/source_files_abstract/formats/csv_spec.py                  16      0   100%
source_s3/source_files_abstract/formats/avro_spec.py                  5      0   100%
source_s3/source.py                                                  27      0   100%
source_s3/exceptions.py                                              10      0   100%
source_s3/__init__.py                                                 2      0   100%
source_s3/source_files_abstract/formats/parquet_parser.py            64      2    97%
source_s3/source_files_abstract/formats/abstract_file_parser.py      41      2    95%
source_s3/stream.py                                                  43      3    93%
source_s3/s3file.py                                                  41      3    93%
source_s3/source_files_abstract/formats/avro_parser.py               39      3    92%
source_s3/source_files_abstract/formats/jsonl_parser.py              53      5    91%
source_s3/source_files_abstract/file_info.py                         26      3    88%
source_s3/source_files_abstract/formats/csv_parser.py               130     16    88%
source_s3/source_files_abstract/source.py                            41      7    83%
source_s3/source_files_abstract/stream.py                           198     35    82%
source_s3/s3_utils.py                                                20      4    80%
source_s3/utils.py                                                   31     10    68%
-------------------------------------------------------------------------------------
TOTAL                                                               887     93    90%
Name                                                              Stmts   Miss  Cover
-------------------------------------------------------------------------------------
source_s3/source_files_abstract/formats/parquet_spec.py               9      0   100%
source_s3/source_files_abstract/formats/jsonl_spec.py                13      0   100%
source_s3/source_files_abstract/formats/csv_spec.py                  16      0   100%
source_s3/source_files_abstract/formats/avro_spec.py                  5      0   100%
source_s3/s3_utils.py                                                20      0   100%
source_s3/__init__.py                                                 2      0   100%
source_s3/source_files_abstract/storagefile.py                       23      1    96%
source_s3/stream.py                                                  43      3    93%
source_s3/source_files_abstract/stream.py                           198     14    93%
source_s3/s3file.py                                                  41      3    93%
source_s3/source_files_abstract/formats/abstract_file_parser.py      41      4    90%
source_s3/source.py                                                  27      4    85%
source_s3/source_files_abstract/formats/csv_parser.py               130     40    69%
source_s3/source_files_abstract/file_info.py                         26      8    69%
source_s3/utils.py                                                   31     10    68%
source_s3/exceptions.py                                              10      4    60%
source_s3/source_files_abstract/source.py                            41     18    56%
source_s3/source_files_abstract/spec.py                              55     31    44%
source_s3/source_files_abstract/formats/jsonl_parser.py              53     32    40%
source_s3/source_files_abstract/formats/avro_parser.py               39     25    36%
source_s3/source_files_abstract/formats/parquet_parser.py            64     44    31%
-------------------------------------------------------------------------------------
TOTAL                                                               887    241    73%

Build Passed

Test summary info:

=========================== short test summary info ============================
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:100: The previous and actual specifications are identical.
SKIPPED [5] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:509: The previous and actual discovered catalogs are identical.
================== 95 passed, 6 skipped in 270.05s (0:04:30) ===================

@davydov-d davydov-d requested review from bnchrch and lazebnyi March 16, 2023 18:17
Copy link
Contributor

@bnchrch bnchrch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small nit but otherwise LGTM.

Friendly reminder to Run CAT before publishing.

@davydov-d
Copy link
Collaborator Author

davydov-d commented Mar 16, 2023

/publish connector=connectors/source-s3

🕑 Publishing the following connectors:
connectors/source-s3
https://github.com/airbytehq/airbyte/actions/runs/4441038122


Connector Did it publish? Were definitions generated?
connectors/source-s3

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

@davydov-d davydov-d merged commit db45f05 into master Mar 16, 2023
@davydov-d davydov-d deleted the ddavydov/#1652-1664-source-s3-discovery-fixes branch March 16, 2023 20:39
adriennevermorel pushed a commit to adriennevermorel/airbyte that referenced this pull request Mar 17, 2023
* airbytehq#1652 airbytehq#1664 Source S3: fix discovery issues

* airbytehq#1652 airbytehq#1664 source s3: upd changelog

* airbytehq#1652 airbytehq#1664 source s3: review comments

* auto-bump connector version

---------

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
erohmensing pushed a commit that referenced this pull request Mar 22, 2023
* #1652 #1664 Source S3: fix discovery issues

* #1652 #1664 source s3: upd changelog

* #1652 #1664 source s3: review comments

* auto-bump connector version

---------

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
erohmensing pushed a commit that referenced this pull request Mar 22, 2023
* #1652 #1664 Source S3: fix discovery issues

* #1652 #1664 source s3: upd changelog

* #1652 #1664 source s3: review comments

* auto-bump connector version

---------

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation connectors/source/s3
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants