Add extracopyoptions #35
|
Given that the data that we're loading is being generated by |
|
Errors I have seen myself include text fields being too long and invalid UTF-8 characters in strings, although other failures are probably possible. |
|
Ah, gotcha. Hopefully we can address the "text fields too long" issue as part of the fix for #29. I think that the invalid UTF8 character issue may have been fixed by some escaping / quoting fixes in one of my earlier bugfix patches. If you still observe this issue with the current master, though, it would be great if you could file a new issue with an error message so that I can debug. Reading the error message from |
|
Hi @JoshRosen, We're currently using a fork with this patch added, as we keep finding new reasons for it being useful. Here's a little more info about our use cases for context: The data we're processing comes in from mobile clients, so it contains all kinds of invalid mess. Strings with bad chars, timestamps at +/-MAXINT, etc. All things that slip through the Avro type system easily enough, but don't rightly make sense in the Redshift schema. We add validation to our code to filter it out before attempting to write to Redshift, but new things pop up every so often as new data comes in. We'd like to make sure that when new types of bad data come in, we're aware of it, but it's not a catastrophic failure for the whole ETL job. Being AWS-based, the most sensible way for us to do this is for our ETL app to emit CloudWatch metrics, telling us how many rows were added to It'd therefore be pretty useful for us to expose the Redshift |
|
I think something like this could be more generally useful, to allow users to append arbitrary options to the end of the copy command. |
|
I like the idea of adding a What do you think about calling this |
|
One nice additional advantage of |
|
I think that sounds good. It may also mean less of a maintenance burden in keeping up with new parameters that get added to Redshift's copy functionality. Looking at the documentation as it is now, it actually seems like all the things people are likely to want to change work fine at the end of the copy statement. I guess it's possible that some things won't work when simply appended to the end, but if we're clear in the documentation that this is what happens, I think it'll be a useful feature. |
|
Sounds good to me. @emlyn, if you go ahead and add the |
|
@JoshRosen I've renamed it to |
pass -> append
|
Yep, looks good to me. I'm going to merge this now. Thanks! |
Whoops, just realized that this isn't going to be interpolated properly due to a missing s at the start of this string. Will hotfix now. Missed this because the integration tests didn't run for this third-party PR :(
|
@JoshRosen Hey, could you provide an example of how to use the but it didn't work. Moreover, after taking a look at the code and the AWS documentation, I can't figure out how it could work ... |
|
Hey @nhuray, Which version of |
|
Hi @JoshRosen, I used 0.5.2.
Give the user the option to ignore a certain number of bad rows in the import. I've implemented it as a separate option, but it might be preferable to just have something like a freeform copyoptions that gets appended to the Redshift COPY query.
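The idea of a free-form option appended to the COPY query can be sketched roughly as follows. This is only an illustration in Python, not the library's actual implementation: the helper name build_copy_sql, its parameters, the bucket path, and the placeholder credentials string are all assumptions made for the example. MAXERROR is used as the sample option because it is the Redshift COPY parameter for tolerating a number of bad rows.

```python
def build_copy_sql(table, s3_path, credentials, extra_copy_options=""):
    """Hypothetical helper: build a Redshift COPY statement.

    Whatever the caller passes in extra_copy_options is appended
    verbatim to the end of the statement; it is not validated here.
    """
    base = f"COPY {table} FROM '{s3_path}' CREDENTIALS '{credentials}'"
    extra = extra_copy_options.strip()
    # Only add a separating space when there is something to append.
    return f"{base} {extra}" if extra else base


if __name__ == "__main__":
    # Tolerate up to 10 bad rows instead of failing the whole load.
    print(build_copy_sql(
        "events",
        "s3://example-bucket/tmp/",   # assumed path, for illustration only
        "<placeholder-credentials>",  # assumed placeholder, not a real format
        "MAXERROR 10",
    ))
```

Because the options are appended as-is, anything that Redshift only accepts earlier in the statement would not work this way, which is the caveat raised in the discussion above.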