
AWS config issues (python) #89

@TELSER1

I've been working on getting my AWS credentials properly set up to read/write, and have encountered some issues:

1.) The given example configures AWS access with sc.hadoopConfig.set("fs.s3n.awsAccessKeyId", "YOUR_KEY_ID"), etc., but there is no hadoopConfig method on the PySpark context. It seems possible to pass a configuration to hadoopFile, but that appears to be specific to whatever file you're pointing it at (a workaround is sketched after this list).

2.) A tempdir of s3n://ACCESSKEY:SECRETKEY@bucket/path/to/temp/dir works absolutely fine when reading from Redshift, but throws

java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3n URL

when writing the DataFrame back to Redshift (the read/write calls are sketched below).
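
Roughly, the calls in question look like this (a sketch only: the JDBC URL, table names, and keys are placeholders; sc._jsc is the underlying Java context rather than a public API; the data source name assumed here is the usual com.databricks.spark.redshift):

    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext(appName="redshift-example")
    sqlContext = SQLContext(sc)

    # Workaround for 1.): PySpark has no sc.hadoopConfig, but the underlying
    # Hadoop configuration is reachable through the JVM gateway.
    hadoop_conf = sc._jsc.hadoopConfiguration()
    hadoop_conf.set("fs.s3n.awsAccessKeyId", "YOUR_KEY_ID")
    hadoop_conf.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_ACCESS_KEY")

    # 2.): the same tempdir URL is used in both directions; the read succeeds,
    # the write raises the IllegalArgumentException quoted above.
    tempdir = "s3n://ACCESSKEY:SECRETKEY@bucket/path/to/temp/dir"

    df = sqlContext.read \
        .format("com.databricks.spark.redshift") \
        .option("url", "jdbc:redshift://host:5439/db?user=USER&password=PASS") \
        .option("dbtable", "my_table") \
        .option("tempdir", tempdir) \
        .load()

    df.write \
        .format("com.databricks.spark.redshift") \
        .option("url", "jdbc:redshift://host:5439/db?user=USER&password=PASS") \
        .option("dbtable", "my_table_copy") \
        .option("tempdir", tempdir) \
        .mode("error") \
        .save()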

Both approaches work fine if I define the access/secret key environment variables before launching the shell (sketched below), although the write then throws 'invalid AVRO file found', probably because the column names contain _, as per #84.
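
The environment-variable route amounts to exporting the keys before launching pyspark. A Python-side equivalent is sketched below, assuming the standard AWS variable names and that setting them before the SparkContext (and hence the driver JVM) is created behaves the same way as exporting them in the shell:

    import os

    # Assumed standard AWS credential variable names; they need to be in the
    # environment before the SparkContext (and hence the driver JVM) starts.
    os.environ["AWS_ACCESS_KEY_ID"] = "YOUR_KEY_ID"
    os.environ["AWS_SECRET_ACCESS_KEY"] = "YOUR_SECRET_ACCESS_KEY"

    from pyspark import SparkContext
    sc = SparkContext(appName="redshift-env-creds")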
