S3 access documentation #1643
For the record, I don't need to provide any extra jars when using Elastic MapReduce (EMR) release emr-5.6.0 on AWS (Spark 2.1.1 on Hadoop 2.7.3 YARN, with Ganglia 3.7.2 and Zeppelin 0.7.1): I build ADAM from source against the Hadoop 2.7.3 dependency version, and then
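(The command that followed was not preserved above; a hypothetical sketch of what it might look like, run from the ADAM source checkout on the EMR master node, is:)

```
# hypothetical: launched from the ADAM checkout on the EMR master node;
# no extra AWS jars are needed because EMR's Spark already bundles them
./bin/adam-shell
```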
+1 to documenting, but also +1 to @heuermh's point. I believe most distros (e.g. EMR, Databricks, CDH) build some AWS library in. I know this is true for EMR and Databricks, and I'm pretty sure the same is true for CDH.
Getting Spark to connect to S3 can require a bit of trial and error; it would be good to have the process documented.
This recipe works for me at the moment on my local machine:
I downloaded jars:
https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk/1.7.4
https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-aws/2.7.1
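Those coordinates resolve to the standard Maven Central artifact paths, so the jars can be fetched directly, for example:

```
wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar
wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.7.1/hadoop-aws-2.7.1.jar
```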
I start adam-shell like:
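(The original command was not captured here; a minimal sketch, assuming both jars sit in the current directory and adam-shell is run from the ADAM checkout, is:)

```
./bin/adam-shell --jars aws-java-sdk-1.7.4.jar,hadoop-aws-2.7.1.jar
```

adam-shell passes its arguments through to spark-shell, so the --jars flag puts both jars on the driver and executor classpaths.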
Reading from s3a appears to work:
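For example (a sketch, not the exact session from this comment: the bucket and object names are made up, and the credential lines assume the standard s3a Hadoop configuration keys):

```scala
import org.bdgenomics.adam.rdd.ADAMContext._

// s3a reads credentials from fs.s3a.access.key / fs.s3a.secret.key,
// or from the usual AWS providers (environment variables, instance profile)
sc.hadoopConfiguration.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
sc.hadoopConfiguration.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))

// hypothetical bucket/key; loadAlignments infers the format from the extension
val reads = sc.loadAlignments("s3a://my-bucket/sample.adam")
println(reads.rdd.count())
```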
Attempts that didn't work for me:
TODO:
- It would be good for the dependencies required for S3 access to be in the POM, perhaps activated by a profile.
- Test reading BAM/VCF from S3 (a sketch of such a check follows below).
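For the second item, a hypothetical smoke test (the S3 paths are made up; it assumes the same jars and credentials as above):

```scala
import org.bdgenomics.adam.rdd.ADAMContext._

// both loaders accept s3a:// URLs through the Hadoop filesystem API
val bamReads  = sc.loadAlignments("s3a://my-bucket/sample.bam")
val genotypes = sc.loadGenotypes("s3a://my-bucket/calls.vcf")

println(s"BAM records: ${bamReads.rdd.count()}, genotypes: ${genotypes.rdd.count()}")
```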