Caching GenomicRDD in pyspark #1883

Closed

akmorrow13 opened this issue Jan 23, 2018 · 1 comment

Contributor

akmorrow13 commented Jan 23, 2018

Can we cache a GenomicRDD in Python?

Contributor Author

akmorrow13 commented Jan 24, 2018

I tried:

reads = ac.loadAlignments(readsPath)
x = reads.toDF().rdd.cache() 
cachedReads = reads._replaceRdd(x)
cachedReads.toDF().rdd.first()

This fails with:

ValueError: Some of types cannot be determined by the first 100 rows, please try again with sampling
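
For context, this message is the one PySpark raises when it infers a DataFrame schema from an RDD of rows and some column is null in every row it samples (the first 100 by default). The sketch below is a simplified pure-Python model of that kind of inference, not PySpark's or ADAM's actual code; the function `infer_column_types` and the field names `readName` and `mateReference` are hypothetical and chosen only for illustration.

```python
# Simplified model of "infer types from the first N rows".
# This is an illustration only, not PySpark's implementation.

def infer_column_types(rows, sample_size=100):
    """Infer one type name per column from the first `sample_size` rows."""
    sample = rows[:sample_size]
    if not sample:
        raise ValueError("cannot infer a schema from an empty dataset")

    inferred = {}
    for row in sample:
        for name, value in row.items():
            # A None value carries no type information, so a column stays
            # unresolved until a non-null value shows up in the sample.
            if inferred.get(name) is None:
                inferred[name] = None if value is None else type(value).__name__

    unresolved = sorted(name for name, tpe in inferred.items() if tpe is None)
    if unresolved:
        raise ValueError(
            "types cannot be determined for columns: " + ", ".join(unresolved)
        )
    return inferred


# Hypothetical records: the second field is null in the first 100 rows.
rows = [{"readName": "read%d" % i, "mateReference": None} for i in range(150)]
rows[120]["mateReference"] = "chr1"  # first non-null value lies past the sample

try:
    infer_column_types(rows)
except ValueError as err:
    print("inference failed:", err)

# Sampling more rows lets the inference see a non-null value.
print(infer_column_types(rows, sample_size=150))
```

In PySpark itself, `createDataFrame` accepts an explicit `schema` or a `samplingRatio`, either of which avoids relying on the first 100 rows. Whether one of these, or caching the DataFrame instead of its `.rdd`, is the right fix for a GenomicRDD is not established in this thread.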

akmorrow13 mentioned this issue

[ADAM-1883] Python and R caching #1885

Closed

fnothaft closed this as completed in

heuermh added this to the 0.24.0 milestone
