Transform VCF to ADAM: file not found exception #2076

Closed
SatyaGsk opened this issue Oct 30, 2018 · 3 comments

@SatyaGsk

My environment:
Hadoop 2.6.0-cdh5.14.4/Spark 2.2.0/Scala 2.11.8

I was trying to convert a VCF to ADAM format using adam-submit. It does not accept the input from either HDFS or the local filesystem. I tried passing the input file with and without the hdfs:// qualifier in the URL, and also with the namenode port; in all cases it cannot find the input file.
The CLI invocation, the error message, and the HDFS file listing are below:

./adam-submit --packages org.apache.parquet:parquet-hadoop:1.8.2 --deploy-mode cluster --driver-memory 10g --executor-memory 10g --conf spark.driver.cores=12 -- transformAlignments hdfs://bluedata748.corpnet3.com/user/sm/chr17.vcf /user/sm/chr17.adam

client token: Token { kind: YARN_CLIENT_TOKEN, service: }
diagnostics: N/A
ApplicationMaster host: 10.138.44.4
ApplicationMaster RPC port: 0
queue: root.dev_rip_adm_grp
start time: 1540838267324
final status: UNDEFINED
tracking URL: https://bluedata747.corpnet3.com:8090/proxy/application_1540223364056_0014/
user: sm
18/10/29 14:38:06 INFO yarn.Client: Application report for application_1540223364056_0014 (state: FINISHED)
18/10/29 14:38:06 INFO yarn.Client:
client token: N/A
diagnostics: User class threw exception: java.io.FileNotFoundException: Couldn't find any files matching hdfs://bluedata748.corpnet3.com/user/sm/chr17.vcf for the requested PathFilter
ApplicationMaster host: 10.138.44.4
ApplicationMaster RPC port: 0
queue: root.dev_rip_adm_grp
start time: 1540838267324
final status: FAILED
tracking URL: https://bluedata747.corpnet3.com:8090/proxy/application_1540223364056_0014/
user: sm
Exception in thread "main" org.apache.spark.SparkException: Application application_1540223364056_0014 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1146)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1192)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
18/10/29 14:38:06 INFO util.ShutdownHookManager: Shutdown hook called
18/10/29 14:38:06 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-32b7063c-25af-477d-a4b8-cc5fbe4dc596

[sm@bluedata750 bin]$ hadoop fs -ls hdfs://bluedata748.corpnet3.com/user/sm/chr17.vcf
-rw-rw----+ 3 sm supergroup 91866 2018-10-17 10:28 hdfs://bluedata748.corpnet3.com/user/sm/chr17.vcf

@akmorrow13
Contributor

transformAlignments requires an alignment file (i.e. a file such as BAM or SAM). This will not run because VCF files do not contain alignments. The error

diagnostics: User class threw exception: java.io.FileNotFoundException: Couldn't find any files matching hdfs://bluedata748.corpnet3.com/user/sm/chr17.vcf for the requested PathFilter

means the command is looking for an alignment file, but the file you are passing is not actually an alignment file.

You want to use transformVariants:

transformVariants : Convert a file with variants into corresponding ADAM format and vice versa
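
For example, a sketch of the corrected invocation, reusing the same Spark options and paths from the original command (the input and output paths here are simply copied from the earlier attempt, not verified):

./adam-submit --packages org.apache.parquet:parquet-hadoop:1.8.2 --deploy-mode cluster --driver-memory 10g --executor-memory 10g --conf spark.driver.cores=12 -- transformVariants hdfs://bluedata748.corpnet3.com/user/sm/chr17.vcf /user/sm/chr17.adam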

@SatyaGsk
Author

Thanks, that helps.
While trying to use the ADAM-format variant file in Mango, I am getting a Parquet class-not-found error. I tried to pass the Parquet jar via --packages just as with adam-submit.

./mango-submit --packages org.apache.parquet:parquet-hadoop:1.8.2 /home/sm/mango-master/example-files/hg19.17.2bit -genes http://www.biodalliance.org/datasets/ensGene.bb -reads /user/sm/chr17.7500000-7515000.sam.adam -variants /user/sm/chr17.adam -show_genotypes -discover
Using spark-submit=/usr/bin/spark2-submit
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/parquet/hadoop/metadata/CompressionCodecName
at org.bdgenomics.utils.cli.ParquetArgs$class.$init$(ParquetArgs.scala:40)
at org.bdgenomics.mango.cli.VizReadsArgs.(VizReads.scala:252)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.lang.Class.newInstance(Class.java:442)
at org.bdgenomics.utils.cli.Args4j$.apply(Args4j.scala:34)
at org.bdgenomics.mango.cli.VizReads$.apply(VizReads.scala:196)
at org.bdgenomics.utils.cli.BDGCommandCompanion$class.main(BDGCommand.scala:33)
at org.bdgenomics.mango.cli.VizReads$.main(VizReads.scala:125)
at org.bdgenomics.mango.cli.VizReads.main(VizReads.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.parquet.hadoop.metadata.CompressionCodecName
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 21 more
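
A minimal workaround sketch, assuming mango-submit forwards extra Spark options to the underlying spark-submit the same way adam-submit does (that forwarding, and the local jar location, are assumptions, not something confirmed in this thread): download parquet-hadoop-1.8.2.jar locally and pass it explicitly with Spark's --jars flag so the class is on the driver classpath when VizReadsArgs is constructed.

./mango-submit --jars /path/to/parquet-hadoop-1.8.2.jar /home/sm/mango-master/example-files/hg19.17.2bit -genes http://www.biodalliance.org/datasets/ensGene.bb -reads /user/sm/chr17.7500000-7515000.sam.adam -variants /user/sm/chr17.adam -show_genotypes -discover

Here /path/to/parquet-hadoop-1.8.2.jar is a placeholder for wherever the jar lives on the submitting host.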

@heuermh
Member

heuermh commented May 23, 2019

@SatyaGsk I created a new issue on Mango (bigdatagenomics/mango#499) related to your last comment.

heuermh closed this as completed May 23, 2019