BAM/BED to parquet #2376
Comments
Command line
$ adam-submit transformAlignments sample.bam sample.alignments.adam
$ adam-submit transformFeatures annotation.bed annotation.features.adam

Scala
import org.bdgenomics.adam.ds.ADAMContext._
val alignments = sc.loadAlignments("sample.bam")
alignments.saveAsParquet("sample.alignments.adam")
val features = sc.loadFeatures("annotation.bed")
features.saveAsParquet("annotation.features.adam")

Python
from bdgenomics.adam.adamContext import ADAMContext
ac = ADAMContext(sc)
alignments = ac.loadAlignments("sample.bam")
alignments.saveAsParquet("sample.alignments.adam")
features = ac.loadFeatures("annotation.bed")
features.saveAsParquet("annotation.features.adam")

Hope this helps!
Thank you very much for such a quick answer. Bit of a follow up:
Yes, I've never had any issues with Parquet in Apache Arrow. At some point there was a mis-specification between the JVM Parquet and the C++ Parquet implementations with regard to LZ4 compression; I don't know whether that is still a problem. Other compression algorithms should be fine. I did have some issues with incomplete support for Parquet in DuckDB, details here. As of that effort, DuckDB did not support Parquet enums or nested schemas, both features that we use in bdg-formats/ADAM.
Hello, I can confirm that so far I have had no issues reading Parquet files created by ADAM using Python polars. As for the .bed to ADAM/Parquet conversion, I noticed that the 6-column BED was transformed into a 26-column Parquet, with the columns for values not present in the input left empty. Not a problem, just a note that Parquet files created from BED files contain such extra slots. Well, this should let me start experimenting with ADAM after getting back from vacation. Many thanks for your help,
Darek Kedra
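For anyone who would rather not carry the empty slots around after loading, here is a minimal sketch of one way to find and drop them. It assumes the table has already been pulled into a plain dict mapping column name to a list of values (with polars, something like `DataFrame.to_dict(as_series=False)` should give that shape). The helper names and the column names in the sample table are illustrative assumptions, not ADAM's actual schema.

```python
from typing import Any, Dict, List


def empty_columns(columns: Dict[str, List[Any]]) -> List[str]:
    """Return names of columns whose every value is None (or that have no rows)."""
    return [name for name, values in columns.items()
            if all(value is None for value in values)]


def drop_empty_columns(columns: Dict[str, List[Any]]) -> Dict[str, List[Any]]:
    """Return a copy of the table without its all-None columns."""
    empty = set(empty_columns(columns))
    return {name: values for name, values in columns.items() if name not in empty}


# Illustrative table: six BED-style columns plus two placeholder slots that
# stand in for the unused fields. The names are assumptions, not ADAM's schema.
table = {
    "chrom": ["chr1", "chr1"],
    "start": [100, 500],
    "end": [200, 650],
    "name": ["featA", "featB"],
    "score": [0.0, 12.5],
    "strand": ["+", "-"],
    "extra_slot_1": [None, None],
    "extra_slot_2": [None, None],
}

compact = drop_empty_columns(table)
print(sorted(empty_columns(table)))  # ['extra_slot_1', 'extra_slot_2']
print(len(compact))                  # 6
```

Note that a column is only treated as empty when every value is None, so a slot that is populated for even one feature is kept.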
We use a rather rich schema for all the various genomic data types, defined in Avro at The
Hello,
Would it be possible to provide a minimal example, whether in Scala, Python, or the CLI, of how to convert, say, a BAM file to ADAM's Parquet format? The same for a canonical 6-column BED file.
DK