Redshift dialect with multi-row insert #39
use multi row insert
@ahmeroxa Hi, thank you for the PR. It looks great. After discussing it with my colleagues, we would kindly ask you to make some changes:
@willyborankin OK, great, those suggestions make sense to me. I can work on those changes. Regarding You can find more information here: https://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-smallest-column-size.html Perhaps it would be best to make that configurable and provide a sane default (either some arbitrary large number or maybe set the default to What do you think?
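A minimal Java sketch of making the string column size configurable, assuming a hypothetical `redshift.varchar.length` property (not an existing connector setting). The default of 256 is an arbitrary placeholder matching the width Redshift gives `TEXT` columns; the thread leaves the actual default open. The upper bound is Redshift's 65535-byte `VARCHAR` limit.

```java
import java.util.Properties;

// Hypothetical config holder: the property name and default are illustrative
// assumptions, not settings that exist in the connector.
public class RedshiftVarcharConfig {
    static final String VARCHAR_LENGTH_KEY = "redshift.varchar.length"; // hypothetical key
    static final int DEFAULT_VARCHAR_LENGTH = 256; // placeholder default (what Redshift gives TEXT)
    static final int MAX_VARCHAR_LENGTH = 65535;   // Redshift's maximum VARCHAR size, in bytes

    final int varcharLength;

    RedshiftVarcharConfig(Properties props) {
        String raw = props.getProperty(VARCHAR_LENGTH_KEY);
        int length = raw == null ? DEFAULT_VARCHAR_LENGTH : Integer.parseInt(raw.trim());
        // Reject sizes Redshift cannot represent instead of failing later at CREATE TABLE time.
        if (length < 1 || length > MAX_VARCHAR_LENGTH) {
            throw new IllegalArgumentException(
                VARCHAR_LENGTH_KEY + " must be between 1 and " + MAX_VARCHAR_LENGTH + ", got " + length);
        }
        this.varcharLength = length;
    }

    // SQL type to use for string fields, e.g. "VARCHAR(5000)".
    String stringColumnType() {
        return "VARCHAR(" + varcharLength + ")";
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        System.out.println(new RedshiftVarcharConfig(props).stringColumnType()); // VARCHAR(256)
        props.setProperty(VARCHAR_LENGTH_KEY, "5000");
        System.out.println(new RedshiftVarcharConfig(props).stringColumnType()); // VARCHAR(5000)
    }
}
```

Note that Redshift measures `VARCHAR` length in bytes, so multibyte UTF-8 strings consume more than one unit per character when choosing a size.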
Of course! I mean that once we have support for multi-row inserts in general (in the existing dialects), adding Redshift would be a small change.
This PR (a spike) introduces a new Redshift dialect and support for multi-row inserts.
Ideally, we can roll Redshift-specific optimizations into the dialect. In particular, the default mapping of Kafka record string fields to Postgres TEXT results in columns being defined as VARCHAR(256) in Redshift. You can read about the Redshift data type mapping here:
https://docs.aws.amazon.com/redshift/latest/dg/r_Character_types.html#r_Character_types-storage-and-ranges
For testing, I have arbitrarily mapped string to VARCHAR(5000), but ideally a more deliberate mapping would be implemented.
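One way a Redshift dialect could map field types to column types is sketched below in Java. The `FieldType` enum is a hypothetical local stand-in mirroring the primitive names in Kafka Connect's `Schema.Type`, so the sketch compiles on its own; the string length is passed in rather than hard-coded to 5000.

```java
// Hypothetical mirror of the primitive Kafka Connect schema types (names follow
// org.apache.kafka.connect.data.Schema.Type), defined locally so the sketch compiles alone.
enum FieldType { INT8, INT16, INT32, INT64, FLOAT32, FLOAT64, BOOLEAN, STRING }

public class RedshiftTypeMapping {
    // Returns the Redshift column type for a field. Strings get an explicit VARCHAR(n)
    // instead of TEXT, which Redshift would otherwise store as VARCHAR(256).
    static String sqlType(FieldType type, int varcharLength) {
        switch (type) {
            case INT8:
            case INT16:   return "SMALLINT"; // Redshift has no 1-byte integer type
            case INT32:   return "INTEGER";
            case INT64:   return "BIGINT";
            case FLOAT32: return "REAL";
            case FLOAT64: return "DOUBLE PRECISION";
            case BOOLEAN: return "BOOLEAN";
            case STRING:  return "VARCHAR(" + varcharLength + ")";
            default: throw new IllegalArgumentException("Unsupported type: " + type);
        }
    }

    public static void main(String[] args) {
        System.out.println(sqlType(FieldType.STRING, 5000)); // VARCHAR(5000)
        System.out.println(sqlType(FieldType.INT64, 5000));  // BIGINT
    }
}
```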
The multi-row insert is aimed at improving the performance of a sink connector writing to Redshift. In my testing, multi-row inserts were orders of magnitude faster than the single-row inserts that the existing insert mode uses. You can read about AWS's recommendation to use multi-row inserts here:
https://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-multi-row-inserts.html
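A minimal Java sketch of how a dialect might build a single parameterized multi-row `INSERT` for a batch of records instead of one statement per record. `buildInsert`, `quote`, and `parameterIndex` are hypothetical helpers for illustration, not the connector's API.

```java
import java.util.Collections;
import java.util.List;
import java.util.StringJoiner;

public class MultiRowInsert {
    // Builds one parameterized INSERT covering rowCount rows, e.g. for columns (id, name)
    // and 2 rows: INSERT INTO "t" ("id", "name") VALUES (?, ?), (?, ?)
    static String buildInsert(String table, List<String> columns, int rowCount) {
        if (columns.isEmpty() || rowCount < 1) {
            throw new IllegalArgumentException("need at least one column and one row");
        }
        StringJoiner cols = new StringJoiner(", ", "(", ")");
        for (String c : columns) cols.add(quote(c));
        // One "(?, ?, ...)" group per row, with one placeholder per column.
        String rowPlaceholders = "(" + String.join(", ", Collections.nCopies(columns.size(), "?")) + ")";
        StringJoiner rows = new StringJoiner(", ");
        for (int i = 0; i < rowCount; i++) rows.add(rowPlaceholders);
        return "INSERT INTO " + quote(table) + " " + cols + " VALUES " + rows;
    }

    // Double-quotes an identifier, escaping any embedded double quotes.
    static String quote(String identifier) {
        return "\"" + identifier.replace("\"", "\"\"") + "\"";
    }

    // 1-based JDBC parameter index for a 0-based (row, column) in the statement above.
    static int parameterIndex(int row, int column, int columnCount) {
        return row * columnCount + column + 1;
    }

    public static void main(String[] args) {
        System.out.println(buildInsert("events", List.of("id", "payload"), 3));
        // INSERT INTO "events" ("id", "payload") VALUES (?, ?), (?, ?), (?, ?)
    }
}
```

Each record's values would then be bound with `PreparedStatement.setObject(parameterIndex(row, col, columns.size()), value)` and the statement executed once per batch. JDBC drivers typically cap the number of bind parameters per statement, so a real implementation would likely need to split large batches across several statements.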