File CDK: Allow skipping unparseable file types #32092
.set_expected_records([])
).build()

# TODO When working on https://github.com/airbytehq/airbyte/issues/31605, this test should be split into two tests:
Would you mind putting a link to this in here so whoever implements it knows to update this?
@@ -14,3 +16,10 @@ class Config:
"unstructured",
const=True,
)

skip_unprocessable_file_types: Optional[bool] = Field(
default=True,
I'd prefer to default to False to be consistent with existing connections. Otherwise, if a user re-runs a sync they could end up syncing different files than they would have previously, despite not actually having signed up for a config change. I believe that would be considered a breaking change.
This won't change existing connections - these don't have the property at all yet, so it will use the default.
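To make the point about defaults concrete, here is a minimal sketch. It uses a plain dataclass as a stand-in for the pydantic model in the diff, and `UnstructuredFormatSketch`, `load_format`, and the stored config dicts are hypothetical names for illustration, not CDK code. It only shows that a stored config without the new property picks up whatever default the field declares:

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class UnstructuredFormatSketch:
    # Stand-in for the model in the diff; only the new field is modeled here.
    skip_unprocessable_file_types: Optional[bool] = True


def load_format(stored: Mapping[str, Any]) -> UnstructuredFormatSketch:
    # Hypothetical loader: pass through only the keys this sketch knows about.
    known = {"skip_unprocessable_file_types"}
    return UnstructuredFormatSketch(**{k: v for k, v in stored.items() if k in known})


# A config saved before the property existed has no such key at all.
existing_connection = {"filetype": "unstructured"}
# A config where the user explicitly turned the toggle off.
explicit_opt_out = {"filetype": "unstructured", "skip_unprocessable_file_types": False}

print(load_format(existing_connection).skip_unprocessable_file_types)  # True
print(load_format(explicit_opt_out).skip_unprocessable_file_types)     # False
```

So the stored config itself is untouched, but the effective value for an existing connection is whatever default ships with the new version.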
I'm thinking of an existing connection that has a sync error for whatever reason, then the connector is upgraded to this version, then the sync is retried. The files that will be processed are different in the two cases.
I don't think this is a huge deal though, so if you feel that it's a better UX to default to True that's fine.
Discussed in Slack. I'm comfortable either way on this. 👍
airbyte-cdk/python/airbyte_cdk/sources/file_based/file_types/unstructured_parser.py
Awesome. 🚀
As noted in Slack, I really like the UX. Very easy to follow and the toggle is very inviting.
…red-filter-options
Allow skipping file types that can't be parsed:
This will still fail
I didn't add the mime type filter: not all file-cdk-based sources support it, so it might lead to unexpected behavior, and I'm not sure how critical this feature is.
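For illustration, the skip-versus-fail behavior of the flag can be sketched roughly as below. This is not the parser code from this PR: `SUPPORTED_EXTENSIONS`, `UnprocessableFileError`, and `select_parsable_files` are hypothetical names, and the extension set is an assumption, not the parser's actual list.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# Hypothetical set of parsable extensions; the real parser's list may differ.
SUPPORTED_EXTENSIONS = {".md", ".txt", ".pdf", ".docx"}


class UnprocessableFileError(Exception):
    """Raised when a file type cannot be parsed and skipping is disabled."""


def select_parsable_files(paths: Iterable[str], skip_unprocessable_file_types: bool) -> List[str]:
    selected = []
    for path in paths:
        suffix = PurePosixPath(path).suffix.lower()
        if suffix in SUPPORTED_EXTENSIONS:
            selected.append(path)
        elif not skip_unprocessable_file_types:
            # Flag off: the first unsupported file type aborts the run.
            raise UnprocessableFileError(f"File type of {path} is not supported")
        # Flag on: the unsupported file is left out and processing continues.
    return selected
```

With the flag on, `select_parsable_files(["a.pdf", "b.csv"], True)` keeps only `a.pdf`; with the flag off, the same input raises. That difference is the one discussed in the review thread about which files a retried sync would process.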
Biggest open question is whether this should default to true or false.
Note: Until #31605 is done, the behavior is a bit weird anyway:
So, until #31605 is done, this flag will only take effect for check and discover (during the connection setup), not during the actual sync.
IMHO this makes it less critical, and we could also wait for #31605 to be implemented before introducing it (which is planned for Q4b at the moment).