Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correctly escape query used with Redshift UNLOAD #1907

Merged

Conversation

rizzatti
Copy link
Contributor

@rizzatti rizzatti commented Nov 2, 2016

Description

SQL queries used in the context of the UNLOAD command in Redshift need to have any single quotes escaped.

This change fixes a little bug which didn't correctly add the backslashes to the query string.

Have you tested this? If so, how?

While creating some jobs that use RedshiftUnloadTask earlier today, I noticed the issue. This PR fixes it.

@mention-bot
Copy link

@rizzatti, thanks for your PR! By analyzing the history of the files in this pull request, we identified @chenzhan, @ddaniels888 and @steenzout to be potential reviewers.

@Tarrasch
Copy link
Contributor

Tarrasch commented Nov 2, 2016

Can you add a testcase?

@rizzatti
Copy link
Contributor Author

rizzatti commented Nov 2, 2016

@Tarrasch I fixed the current testcase.
It's easy to notice the problem by looking at the test errors from Travis' last run. The expected command:

AssertionError: Expected call: execute("UNLOAD ( 'SELECT 'a' as col_a, current_date as col_b' ) TO 's3://bucket/key' credentials 'aws_access_key_id=AWS_ACCESS_KEY;aws_secret_access_key=AWS_SECRET_KEY' DELIMITER ',' ADDQUOTES GZIP ALLOWOVERWRITE PARALLEL OFF;")

The quoted query 'SELECT 'a' as col_a, current_date as col_b' would be misinterpreted due to the quotes around the 'a' not being properly escaped.

@Tarrasch
Copy link
Contributor

Tarrasch commented Nov 4, 2016

Can you get a redshift person to review this? :)

@rizzatti
Copy link
Contributor Author

rizzatti commented Nov 4, 2016

@Tarrasch, would this help?

Unload broken with luigi 2.3.3

Unload working with PR

@rizzatti
Copy link
Contributor Author

rizzatti commented Nov 4, 2016

These are the contents of example.py in the screenshots above.

# -*- coding: utf-8 -*-

import os
import luigi
import luigi.contrib.redshift


class RedshiftUnloadExample(luigi.contrib.redshift.RedshiftUnloadTask):

    host = os.environ.get('RS_HOST')
    user = os.environ.get('RS_USER')
    password = os.environ.get('RS_PASS')
    database = os.environ.get('RS_DATABASE')
    table = os.environ.get('RS_TABLE')
    aws_access_key_id = os.environ.get('AWS_ACCESS_KEY_ID')
    aws_secret_access_key = os.environ.get('AWS_SECRET_ACCESS_KEY')
    s3_unload_path = os.environ.get('RS_UNLOAD_PATH')

    # This comes straight from test/contrib/redshift_test.py
    def query(self):
        return "SELECT 'a' as col_a, current_date as col_b"


if __name__ == '__main__':
    luigi.run()

@Tarrasch
Copy link
Contributor

Tarrasch commented Nov 4, 2016

@rizzatti, I (or any other single volunteering maintainer) cannot be expected to understand details of every system luigi interoperates. Please find another reviewer.

Copy link
Collaborator

@dlstadther dlstadther left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't systematically use the UNLOAD function in my ETL so I haven't tested this myself. But assuming it worked previously and the only case it failed was when ' was used within the unload query, then I don't see anything wrong with this update to escaping '.

@Tarrasch Tarrasch merged commit 49276c9 into spotify:master Nov 7, 2016
@Tarrasch
Copy link
Contributor

Tarrasch commented Nov 7, 2016

Thanks for review! :)

@rizzatti
Copy link
Contributor Author

rizzatti commented Nov 7, 2016

Thanks guys.

@rizzatti rizzatti deleted the fix_redshift_unload_query_escaping branch November 7, 2016 14:40
This was referenced Jun 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants