Support catalog backed by s3 in run_spark_sql.sh #199
Conversation
regtests/run_spark_sql.sh
Outdated
Unclear what "uses local filesystem" means in this context; we should probably describe what this means in the doc.
Added more context
regtests/run_spark_sql.sh
Outdated
we should probably do a real usage string rather than this:
usage: ./run_spark_sql.sh [s3-path aws-role]
s3-path: Location on S3 to .....
aws-role: arn?role? used for authenticating catalog?client?both?
When path and role are absent, local filesystem is used for ...
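For illustration, a minimal sketch of such a usage string in bash; the wording and descriptions below are placeholders following the reviewer's outline, not text from the PR:

# Sketch only: exact wording is illustrative, not from the PR.
USAGE="Usage: ./run_spark_sql.sh [s3-path aws-role]
  s3-path:   S3 location used as the catalog's default base location
  aws-role:  AWS IAM role the catalog assumes when accessing s3-path
When both arguments are omitted, the catalog is backed by the local filesystem."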
Added more context to explain it is a role used by the catalog
regtests/run_spark_sql.sh
Outdated
Comment here seems to suggest you can set these as ENV variables, but I believe the above two lines would overwrite anything stored in the ENV.
Good catch, removed the environment variables to avoid confusion.
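For reference, a sketch of the fallback pattern that would make the env-variable comment accurate, had that behavior been kept (variable names follow the diff; the pattern itself is illustrative):

# Use the positional argument if given, else fall back to any exported value.
# ${1:-default} expands to $1 when set and non-empty, otherwise to the default.
AWS_BASE_LOCATION="${1:-${AWS_BASE_LOCATION}}"
AWS_ROLE_ARN="${2:-${AWS_ROLE_ARN}}"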
4621135 to 5e0bad3
Thanks @RussellSpitzer for the review. Resolved the comments. Please take another look.
regtests/run_spark_sql.sh
Outdated
# -----------------------------------------------------------------------------
#
# Usage:
# Without arguments: Runs against a catalog backed by the local filesystem
I still think this should be a more traditional usage string:
./run_spark_sql.sh [S3-location AWS-IAM-role]
Fixed
regtests/run_spark_sql.sh
Outdated
# You must run 'use polaris;' as your first query in the spark-sql shell.
# Arguments:
#   [S3 location] - The S3 path to use as the default base location for the catalog.
#   [AWS IAM role] - The AWS IAM role to assume when the catalog accessing the S3 location.
AWS IAM role for the catalog to assume when accessing the S3 Location
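For context, this is the role the Polaris catalog service itself assumes when accessing table data. A rough sketch of how a catalog backed by S3 might be created through the Polaris management API; the endpoint, catalog name, and JSON field names here are assumptions for illustration, not quoted from the script:

# Assumed endpoint and payload shape; all values are placeholders.
curl -s -X POST http://localhost:8181/api/management/v1/catalogs \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -d "{
    \"name\": \"spark_sql_catalog\",
    \"type\": \"INTERNAL\",
    \"properties\": {\"default-base-location\": \"${AWS_BASE_LOCATION}\"},
    \"storageConfigInfo\": {
      \"storageType\": \"S3\",
      \"roleArn\": \"${AWS_ROLE_ARN}\"
    }
  }"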
regtests/run_spark_sql.sh
Outdated
AWS_ROLE_ARN=$2
# Check if AWS variables are set
if [ -z "${AWS_BASE_LOCATION}" ] || [ -z "${AWS_ROLE_ARN}" ]; then
  echo "AWS_BASE_LOCATION or/and AWS_ROLE_ARN not set. Please set them to create a catalog backed by S3."
This is still a bit confusing because these things are not "set"; they are passed through. So I think it would be better if we just check the number of args passed to the command. Then you can just say expected 2 or 0 args and got 1 or 3 or whatever.
if [ $# -ne 0 ] && [ $# -ne 2 ]; then
  echo "run_spark_sql.sh only accepts 0 or 2 arguments"
  echo $USAGE
  exit 1
fi
Feel free to ignore copying the USAGE through if you like, but that would be another nice feature.
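To make the 0-or-2 contract concrete, example invocations under the suggested check (paths and ARNs are placeholders):

./run_spark_sql.sh                       # 0 args: catalog backed by the local filesystem
./run_spark_sql.sh s3://my-bucket/path arn:aws:iam::123456789012:role/my-role   # 2 args: catalog backed by S3
./run_spark_sql.sh s3://my-bucket/path   # 1 arg: rejected, usage printed, exit 1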
Fixed in the new commit
Thanks @RussellSpitzer for the review. Resolved all comments. Ready for another look.
Thanks @flyrain! Merged.
Description
This adds support for a catalog backed by S3 in run_spark_sql.sh.
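As an example, launching against S3 and then issuing the required first query in the spark-sql shell; the bucket, role, and follow-up queries are placeholders, while the 'use polaris;' requirement comes from the script's header comment:

./run_spark_sql.sh s3://my-bucket/polaris-test arn:aws:iam::123456789012:role/polaris-catalog-role
spark-sql> use polaris;
spark-sql> create namespace db1;
spark-sql> show namespaces;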
How Has This Been Tested?
Tested locally with S3 and the local filesystem. Both passed.