Description
I've been working on getting my AWS credentials properly set up for reads and writes, and ran into a couple of issues:
1.) The given example configures AWS access via sc.hadoopConfig.set("fs.s3n.awsAccessKeyId", "YOUR_KEY_ID"), etc., but there is no such method on the PySpark SparkContext. It seems possible to pass a configuration to hadoopFile, but that appears to be specific to whatever file you're pointing it at (see the sketch after this list).
2.) Embedding the credentials in the tempdir URL (s3n://ACCESSKEY:SECRETKEY@bucket/path/to/temp/dir) works absolutely fine when reading from Redshift, but writing the DataFrame back to Redshift throws:
java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3n URL
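
For reference, the closest PySpark equivalent I've found goes through the underlying JavaSparkContext; sc._jsc is a private attribute, so treat this as a sketch rather than a supported API, and the key/secret values are placeholders:

```python
from pyspark import SparkContext

sc = SparkContext(appName="redshift-test")

# Reach the Hadoop configuration through the wrapped JavaSparkContext,
# since PySpark's SparkContext does not expose hadoopConfiguration directly.
hadoop_conf = sc._jsc.hadoopConfiguration()
hadoop_conf.set("fs.s3n.awsAccessKeyId", "YOUR_KEY_ID")
hadoop_conf.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_ACCESS_KEY")
```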
Both work fine if I define the access/secret key environment variables before launching the shell, although the write then throws 'invalid AVRO file found', probably because the column names contain underscores, as per #84.
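
For completeness, here is a sketch of the write call that fails; the JDBC URL, table name, and bucket are placeholders, and df is the DataFrame I read back earlier:

```python
# Write the DataFrame back to Redshift via the spark-redshift data source,
# staging the data in the S3 tempdir (placeholder values throughout).
df.write \
    .format("com.databricks.spark.redshift") \
    .option("url", "jdbc:redshift://host:5439/db?user=USER&password=PASS") \
    .option("dbtable", "my_table") \
    .option("tempdir", "s3n://ACCESSKEY:SECRETKEY@bucket/path/to/temp/dir") \
    .mode("append") \
    .save()
```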