Description
I've been working on getting my AWS credentials properly set up for reads and writes, and ran into a couple of issues:
1.) The given example configures AWS access via sc.hadoopConfig.set("fs.s3n.awsAccessKeyId", "YOUR_KEY_ID"), etc., but there is no such method on the PySpark SparkContext. It seems possible to pass a configuration to hadoopFile, but that appears to be specific to whatever file you're pointing it at (see the sketch after this list).
2.) Embedding the credentials in the tempdir URL (s3n://ACCESSKEY:SECRETKEY@bucket/path/to/temp/dir) works absolutely fine when reading from Redshift, but writing the DataFrame back to Redshift throws:
java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3n URL
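
For reference, the closest PySpark equivalent I've found goes through the underlying JavaSparkContext; sc._jsc is a private attribute, so treat this as a sketch rather than a supported API, and the key/secret values are placeholders:

```python
from pyspark import SparkContext

sc = SparkContext(appName="redshift-test")

# Reach the Hadoop configuration through the wrapped JavaSparkContext,
# since PySpark's SparkContext does not expose hadoopConfiguration directly.
hadoop_conf = sc._jsc.hadoopConfiguration()
hadoop_conf.set("fs.s3n.awsAccessKeyId", "YOUR_KEY_ID")
hadoop_conf.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_ACCESS_KEY")
```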
Both work fine if I define the access/secret key environment variables before launching the shell, although the write then throws 'invalid AVRO file found', probably because the column names contain underscores, as per #84.
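
For completeness, here is a sketch of the write call that fails; the JDBC URL, table name, and bucket are placeholders, and df is the DataFrame I read back earlier:

```python
# Write the DataFrame back to Redshift via the spark-redshift data source,
# staging the data in the S3 tempdir (placeholder values throughout).
df.write \
    .format("com.databricks.spark.redshift") \
    .option("url", "jdbc:redshift://host:5439/db?user=USER&password=PASS") \
    .option("dbtable", "my_table") \
    .option("tempdir", "s3n://ACCESSKEY:SECRETKEY@bucket/path/to/temp/dir") \
    .mode("append") \
    .save()
```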