@JoshRosen
Contributor

This patch improves our error reporting for Redshift LOAD errors. When a load error occurs, we will now try to automatically fetch more detailed error information from Redshift's STL_LOAD_ERRORS table.

As an example of the improved error messages:

Old:

java.sql.SQLException: [Amazon](500310) Invalid operation: Load into table 'error_message_when_string_too_long_3596907251636891354' failed.  Check 'stl_load_errors' system table for details.;

New:

java.sql.SQLException: Error #1204 while loading data into Redshift: "String length exceeds DDL length".
Table name: the_table_name
Column name: a
Column type: varchar(256)
Raw line: [...]
Raw field value: [...]
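
A minimal sketch of how a message in the new format could be assembled from one row of STL_LOAD_ERRORS. This is an illustration in Python, not the patch's actual code. The column names (`err_code`, `err_reason`, `colname`, `type`, `col_length`, `raw_line`, `raw_field_value`) are those of Redshift's STL_LOAD_ERRORS table; the function name `format_load_error`, the dict-shaped row, and the lookup query using `pg_last_copy_id()` are assumptions made for the example.

```python
# Hypothetical lookup query: assumes the failed COPY was the most recent one
# in the current session, so pg_last_copy_id() identifies it.
LOAD_ERROR_QUERY = "SELECT * FROM stl_load_errors WHERE query = pg_last_copy_id()"


def format_load_error(row, table_name):
    """Build a readable message from one STL_LOAD_ERRORS row (a dict here).

    The table name is passed in separately because STL_LOAD_ERRORS stores
    only a numeric table ID, not the name.
    """
    def field(name):
        # Values in this system table are space-padded, so trim them.
        return str(row[name]).strip()

    return "\n".join([
        'Error #%s while loading data into Redshift: "%s".'
        % (field("err_code"), field("err_reason")),
        "Table name: %s" % table_name,
        "Column name: %s" % field("colname"),
        "Column type: %s(%s)" % (field("type"), field("col_length")),
        "Raw line: %s" % field("raw_line"),
        "Raw field value: %s" % field("raw_field_value"),
    ])
```

If the lookup returns no row, falling back to the original exception unchanged seems the safest behaviour.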

@JoshRosen JoshRosen added this to the 0.5 milestone Aug 26, 2015
@JoshRosen JoshRosen mentioned this pull request Aug 26, 2015
@codecov-io

Current coverage is 87.22%

Merging #53 into master will increase coverage by +0.87% as of 2840223

@@            master     #53   diff @@
======================================
  Files           10      10       
  Stmts          337     368    +31
  Branches        79      87     +8
  Methods          0       0       
======================================
+ Hit            291     321    +30
  Partial          0       0       
- Missed          46      47     +1

Review entire Coverage Diff as of 2840223

Powered by Codecov. Updated on successful CI builds.

@JoshRosen
Contributor Author

Hmm, I guess this should also include the destination table name in the error message.

@jaley
Contributor

jaley commented Aug 26, 2015

This is awesome, thanks for adding this, much sanity saved here :)

I don't recall exactly what the default permissions situation is for the load errors table. If it's the case that new users need an explicit grant to be run before they can query it, we might want to add a note to the docs that tells users they need to do this to enable better error messages.

It's possible that Redshift actually does something magic where you can only read rows added by loads from your own user account, in which case it should just work I guess.

@emlyn
Contributor

emlyn commented Aug 26, 2015

Nice! I believe anyone can read from stl_load_errors and Redshift will only return rows that relate to the current user (at least that's what I've seen when querying it).

@marmbrus
Contributor

LGTM

@JoshRosen JoshRosen closed this in 9f19e1c Aug 26, 2015
@JoshRosen JoshRosen deleted the load-error-reporting branch August 26, 2015 19:29
JoshRosen added a commit that referenced this pull request Aug 27, 2015
This patch allows users to specify a `maxlength` column metadata entry for string columns in order to control the width of `VARCHAR` columns in generated Redshift table schemas. This is necessary in order to support string columns that are wider than 256 characters. In addition, this configuration can be used as an optimization to achieve space-savings in Redshift. For more background on the motivation of this feature, see #29.

See also: #53 to improve error reporting when LOAD fails.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #54 from JoshRosen/max-length.
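
To illustrate the `maxlength` behaviour described in the commit message above, here is a rough sketch, written in Python rather than the library's own code, of how a per-column metadata entry could select the `VARCHAR` width. The function name `redshift_string_type` and the dict-shaped metadata are hypothetical; the default of 256 is inferred from the commit message's remark about columns wider than 256 characters.

```python
# Assumed default width, inferred from the commit message above.
DEFAULT_VARCHAR_LENGTH = 256


def redshift_string_type(column_metadata):
    """Return the Redshift column type for a string column.

    `column_metadata` is a plain dict standing in for the column's
    metadata; only the `maxlength` key is consulted.
    """
    max_length = column_metadata.get("maxlength", DEFAULT_VARCHAR_LENGTH)
    if not isinstance(max_length, int) or max_length <= 0:
        raise ValueError("maxlength must be a positive integer: %r" % (max_length,))
    return "VARCHAR(%d)" % max_length
```

A column with no `maxlength` entry keeps the default width, while one with `{"maxlength": 1024}` gets `VARCHAR(1024)`. A smaller value such as `{"maxlength": 16}` corresponds to the space-saving use mentioned in the commit message.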