-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add maven dependency for gcloud NIO #346
Conversation
Test PASSed. |
Adding support for only mango-python for now -- changes to mango-cli will need a bit more work. |
Test PASSed. |
Test PASSed. |
|
||
PY4J_ZIP="$(ls -1 "${SPARK_HOME}/python/lib" | grep py4j)" | ||
export PYTHONPATH=${SPARK_HOME}/python:${SPARK_HOME}/python/lib/${PY4J_ZIP}:${PYTHONPATH} | ||
|
||
${PYSPARK} \ | ||
--conf spark.serializer=org.apache.spark.serializer.KryoSerializer \ | ||
--conf spark.kryo.registrator=org.bdgenomics.mango.serialization.MangoKryoRegistrator \ | ||
--jars ${MANGO_CLI_JAR} \ | ||
--jars ${MANGO_CLI_JAR},${GOOGLE_CLOUD_NIO_JAR} \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we just add this as a spark argument instead of modifying the file? I dont think this belongs here because users aren't always running on google
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can just add it as a spark arg (./bin/mango-notebook --jars ./bin/mango-notebook --jars ${GOOGLENIOJAR} --
This means you don't have to modify the actual file. This flag can be called in https://github.com/bigdatagenomics/mango/pull/340/files.
You can access this through docker by mounting the jar.
Also, this will only work for notebook, not browser.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, you should just be able to do /bin/mango-notebook --jars https://oss.sonatype.org/content/repositories/releases/com/google/cloud/google-cloud-nio/0.22.0-alpha/google-cloud-nio-0.22.0-alpha-shaded.jar --
It acquires a bit of latency on the first call because you are downloading it upon the first count/collect. You can do it either way though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fs doesn't seem to recognize the URL passed in the --jars command.
SPARK_ARGS='--master yarn --jars https://oss.sonatype.org/content/repositories/releases/com/google/cloud/google-cloud-nio/0.22.0-alpha/google-cloud-nio-0.22.0-alpha-shaded.jar'
...
Exception in thread "main" java.io.IOException: No FileSystem for scheme: https
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok then just wget it and add it as --jars
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, this was the workaround used in the other branch (uploaded to gfs to reference).
I don't think we should add a jar to the repo. There can be a separate script that adds --packages with the jar if the user needs it |
Adding a separate --packages would be ideal except that the NIO file readers rely on Guava 20.0+. hadoop 2.6+ breaks with guava 17.0+ (https://github.com/bigdatagenomics/adam/blob/49cbdb72e7b4090d4b2840dc72d06b9c68618a3a/pom.xml#L600) So using the shaded jar is the only way to prevent naming conflicts. |
where is the jar downloaded from? Can you post the location? |
I should mention that we can add this as a maven-shaded dependency as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be best to call a wget on the jar in google-install script in PR https://github.com/bigdatagenomics/mango/pull/340/files instead of putting it in the repo.
Closing - moving scripts and discussion over to #340 |
Resolve google file system loading issues in mango-python.
Changes for mango-cli will be completed in a different PR.