Source file: do not read whole file on check and discover #24278
/test connector=connectors/source-file
Build Passed
# this is to ensure we make all conditions under which the bug is reproduced, i.e.
# - chunk size < file size
# - column type in the last chunk is not `string`
@patch("source_file.client.Client.CSV_CHUNK_SIZE", 1)
Dropped this test because it no longer represents the expected behavior.
with client.reader.open():
list(client.streams)
return AirbyteConnectionStatus(status=Status.SUCCEEDED)
list(client.streams(empty_schema=True))
Read only the file header when running `check` to ensure the connection succeeds.
reader_options["chunksize"] = self.CSV_CHUNK_SIZE
if skip_data:
    reader_options["nrows"] = 0
    reader_options["index_col"] = 0
yield from reader(fp, **reader_options)
Read only a single chunk (`self.CSV_CHUNK_SIZE`) of data to generate the schema. Otherwise, a timeout is possible when a large file is read.
/publish connector=connectors/source-file
If you have connectors that successfully published but failed definition generation, follow step 4 here.
/publish connector=connectors/source-file-secure
If you have connectors that successfully published but failed definition generation, follow step 4 here.
What
https://github.com/airbytehq/oncall/issues/1681
Fix timing out `check` and `discover` commands.
How
Read only the header of a file on `check`.
Read a single chunk of data on `discover`.
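
The two strategies above can be illustrated with a minimal sketch. It uses Python's standard `csv` module rather than the connector's actual reader, so it is an analogue and not the code in this PR. The names `read_header`, `read_first_chunk`, and `CHUNK_SIZE` are hypothetical and stand in for the header-only path used by `check` and the single-chunk path used by `discover`.

```python
import csv
import io
from itertools import islice

# Hypothetical stand-in for the connector's CSV_CHUNK_SIZE (counted in rows here).
CHUNK_SIZE = 2


def read_header(fp):
    """Header-only read, as for `check`: consume one line and no data rows."""
    return next(csv.reader(fp), [])


def read_first_chunk(fp, chunk_size=CHUNK_SIZE):
    """Single-chunk read, as for `discover`: header plus at most chunk_size rows."""
    reader = csv.reader(fp)
    header = next(reader, [])
    rows = list(islice(reader, chunk_size))  # stops early; the rest of the file is never read
    return header, rows


sample = "id,name\n1,a\n2,b\n3,c\n"
header = read_header(io.StringIO(sample))
chunk_header, chunk_rows = read_first_chunk(io.StringIO(sample))
print(header)
print(chunk_rows)
```

In both cases the work is bounded by the header or the chunk size instead of the file size, which is what keeps a large file from causing a timeout.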