Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add maven dependency for gcloud NIO #346

Closed
wants to merge 3 commits into from

Conversation

Georgehe4
Copy link
Contributor

@Georgehe4 Georgehe4 commented Dec 22, 2017

screen shot 2017-12-22 at 8 19 03 pm

Resolve google file system loading issues in mango-python.

Changes for mango-cli will be completed in a different PR.

@Georgehe4 Georgehe4 changed the title Add maven ependency for gs NIO Add maven dependency for gcloud NIO Dec 22, 2017
@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/mango-prb/536/
Test PASSed.

@Georgehe4
Copy link
Contributor Author

Georgehe4 commented Dec 22, 2017

Adding support for only mango-python for now -- changes to mango-cli will need a bit more work.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/mango-prb/537/
Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/mango-prb/538/
Test PASSed.


PY4J_ZIP="$(ls -1 "${SPARK_HOME}/python/lib" | grep py4j)"
export PYTHONPATH=${SPARK_HOME}/python:${SPARK_HOME}/python/lib/${PY4J_ZIP}:${PYTHONPATH}

${PYSPARK} \
--conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
--conf spark.kryo.registrator=org.bdgenomics.mango.serialization.MangoKryoRegistrator \
--jars ${MANGO_CLI_JAR} \
--jars ${MANGO_CLI_JAR},${GOOGLE_CLOUD_NIO_JAR} \
Copy link
Contributor

@akmorrow13 akmorrow13 Dec 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we just add this as a spark argument instead of modifying the file? I dont think this belongs here because users aren't always running on google

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just add it as a spark arg (./bin/mango-notebook --jars ./bin/mango-notebook --jars ${GOOGLENIOJAR} --
This means you don't have to modify the actual file. This flag can be called in https://github.com/bigdatagenomics/mango/pull/340/files.
You can access this through docker by mounting the jar.

Also, this will only work for notebook, not browser.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, you should just be able to do /bin/mango-notebook --jars https://oss.sonatype.org/content/repositories/releases/com/google/cloud/google-cloud-nio/0.22.0-alpha/google-cloud-nio-0.22.0-alpha-shaded.jar --

It acquires a bit of latency on the first call because you are downloading it upon the first count/collect. You can do it either way though.

Copy link
Contributor Author

@Georgehe4 Georgehe4 Jan 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fs doesn't seem to recognize the URL passed in the --jars command.

SPARK_ARGS='--master yarn --jars https://oss.sonatype.org/content/repositories/releases/com/google/cloud/google-cloud-nio/0.22.0-alpha/google-cloud-nio-0.22.0-alpha-shaded.jar'
...

Exception in thread "main" java.io.IOException: No FileSystem for scheme: https

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok then just wget it and add it as --jars

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, this was the workaround used in the other branch (uploaded to gfs to reference).

@akmorrow13
Copy link
Contributor

I don't think we should add a jar to the repo. There can be a separate script that adds --packages with the jar if the user needs it

@Georgehe4
Copy link
Contributor Author

Adding a separate --packages would be ideal except that the NIO file readers rely on Guava 20.0+.

hadoop 2.6+ breaks with guava 17.0+ (https://github.com/bigdatagenomics/adam/blob/49cbdb72e7b4090d4b2840dc72d06b9c68618a3a/pom.xml#L600)

So using the shaded jar is the only way to prevent naming conflicts.
I don't believe we can shade using the --packages flag which is why I've added the jar in this repo.

@akmorrow13
Copy link
Contributor

akmorrow13 commented Dec 30, 2017

where is the jar downloaded from? Can you post the location?

@Georgehe4
Copy link
Contributor Author

Georgehe4 commented Dec 30, 2017

Copy link
Contributor

@akmorrow13 akmorrow13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be best to call a wget on the jar in google-install script in PR https://github.com/bigdatagenomics/mango/pull/340/files instead of putting it in the repo.

@Georgehe4
Copy link
Contributor Author

Closing - moving scripts and discussion over to #340

@Georgehe4 Georgehe4 closed this Jan 18, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants