Support JsonSchema anyOf when writing Parquet/Avro in S3 destination #4294
Comments
@olivermeyer could you download the full log from the UI and share it here?
Here is the schema of the field causing the problem:

```json
"SystemModstamp": {
  "anyOf": [
    {
      "type": "string",
      "format": "date-time"
    },
    {
      "type": ["string", "null"]
    }
  ]
}
```

The root cause of this failure is that when writing to Parquet, we perform a JSON schema to Parquet schema conversion, and currently we don't support composition keywords such as `anyOf`. But on second thought, it is probably not difficult to support those keywords. Although there are no directly equivalent keywords in the Parquet schema, we can just use a less stringent type union as a workaround.
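To make the suggested fix concrete, here is a minimal sketch of mapping `anyOf` to an Avro union. The `convertSingleType` helper is a hypothetical stand-in for the converter's existing single-type logic; this is not the actual Airbyte converter code.

```java
import java.util.ArrayList;
import java.util.List;

import com.fasterxml.jackson.databind.JsonNode;
import org.apache.avro.Schema;

public class AnyOfUnionSketch {

  // Stand-in for the converter's existing logic that maps a single JSON schema
  // node ("type", "format", ...) to an Avro schema. Hypothetical signature.
  static Schema convertSingleType(JsonNode jsonSchemaNode) {
    return Schema.create(Schema.Type.STRING);
  }

  // "Less stringent type union": convert every anyOf option and collect the
  // results into one Avro union, so no option is lost on write.
  static Schema convertAnyOf(JsonNode anyOfArray) {
    List<Schema> members = new ArrayList<>();
    members.add(Schema.create(Schema.Type.NULL)); // keep the field nullable
    for (JsonNode option : anyOfArray) {
      Schema converted = convertSingleType(option);
      if (!members.contains(converted)) { // Avro unions reject duplicate types
        members.add(converted);
      }
    }
    return Schema.createUnion(members);
  }
}
```

For the `SystemModstamp` field above, both options collapse to the same Avro string type under this stub, giving a union of null and string; a real single-type converter might map the `date-time` option to a timestamp logical type instead, so the union would then contain both.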
Thanks for looking into this @tuliren. That makes sense. I'll follow this issue for a fix :-)
This is a workaround I applied to the schema conversion file, and it seems to work.
@MaxwellJK, thanks for the workaround. It only works for the first element within the `anyOf` list, though.
Yeah, I know. I just quickly developed it to fix the problem I was having with Salesforce.
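Since the patch itself is not shown in this thread, the following is only an illustrative guess at what a first-element-only shortcut looks like, and why it is lossier than the full union sketched above (hypothetical names, not the actual change):

```java
import com.fasterxml.jackson.databind.JsonNode;
import org.apache.avro.Schema;

public class FirstOptionOnlySketch {

  // Stand-in for the converter's single-type conversion (hypothetical).
  static Schema convertSingleType(JsonNode jsonSchemaNode) {
    return Schema.create(Schema.Type.STRING);
  }

  // First-element shortcut: convert only anyOf[0] and ignore the rest.
  // For SystemModstamp the first option carries the useful type, so this
  // works, but every later option (e.g. the nullable plain string) is dropped.
  static Schema convertAnyOfFirstOnly(JsonNode anyOfArray) {
    return convertSingleType(anyOfArray.get(0));
  }
}
```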
Thanks @tuliren!
Expected Behavior
I have a connection between Salesforce and S3 (Parquet). The expected behaviour is that the sync should work, and data should be written to S3.
Current Behavior
The sync starts but quickly hangs with no further messages in the logs.
Logs
Since the Salesforce connector exposes credentials in plain text in the logs, I cannot post them in full. However, I found the following, which seems relevant:
Steps to Reproduce
Set up a Salesforce to S3 (Parquet) connection with the Account stream (might affect other streams as well); trigger the sync and wait.
Severity of the bug for you
Critical - CSV is not acceptable as a file format for us, and not having this connection is an immediate showstopper.
Airbyte Version
0.26.2-alpha
Connector Version (if applicable)
Salesforce: 0.2.1
S3: 0.1.6
Additional context
I tried syncing another stream (`UserPreference`) and ran into the same issue. The logs were similar too. There definitely seems to be a pattern, but I'm not familiar enough with Airbyte's internals to understand it.
I can also confirm that the following works: