Sequential reading for FASTA. #16

Merged: merged 1 commit into master from feature/fasta-sequential-read on Jul 29, 2016


alumi (Member) commented on Jul 27, 2016

This PR adds a function that reads a FASTA file sequentially, as shown below.
It supports uncompressed FASTA (.fa) and compressed FASTA (.fa.gz, .fa.bz2, etc.).

(cljam.fasta/sequential-read "path/to/fasta")
;; => [{:name "chr1", :sequence "ATGC..."}, ...]
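To illustrate what single-pass sequential reading involves, here is a minimal sketch, assuming plain Clojure with only `clojure.java.io`, `clojure.string`, and `java.util.zip`. It is not cljam's implementation: `open-reader`, `parse-fasta-lines`, and `sequential-read-sketch` are hypothetical names. The sketch reads every line exactly once from start to end and folds the lines into the same `{:name ... :sequence ...}` shape as the example above. It handles only `.gz` compression. bzip2 would need an extra library such as Apache Commons Compress.

```clojure
(ns example.fasta-sketch
  (:require [clojure.java.io :as io]
            [clojure.string :as str])
  (:import [java.util.zip GZIPInputStream]))

(defn- open-reader
  "Hypothetical helper: opens path as a BufferedReader, decompressing
  transparently when the name ends in .gz (bzip2 is not handled here)."
  ^java.io.BufferedReader [path]
  (let [in (io/input-stream path)]
    (io/reader (if (str/ends-with? path ".gz") (GZIPInputStream. in) in))))

(defn parse-fasta-lines
  "Hypothetical helper: folds FASTA lines, in order, into a vector of
  {:name ... :sequence ...} maps. Lines before the first header are ignored."
  [lines]
  (loop [lines lines, name nil, sb nil, acc []]
    (if-let [line (first lines)]
      (if (str/starts-with? line ">")
        ;; New header: emit the previous record (if any) and start a new one.
        ;; The name is the header text up to the first whitespace.
        (recur (rest lines)
               (first (str/split (str/triml (subs line 1)) #"\s+"))
               (StringBuilder.)
               (if name (conj acc {:name name :sequence (str sb)}) acc))
        ;; Sequence line: append it to the current record's buffer.
        (do (when sb (.append ^StringBuilder sb (str/trim line)))
            (recur (rest lines) name sb acc)))
      ;; End of input: emit the last record.
      (if name (conj acc {:name name :sequence (str sb)}) acc))))

(defn sequential-read-sketch
  "Hypothetical: reads the whole file in one pass. The loop is eager, so the
  reader is fully consumed before with-open closes it."
  [path]
  (with-open [rdr (open-reader path)]
    (parse-fasta-lines (line-seq rdr))))
```

For example, `(sequential-read-sketch "path/to/fasta.fa.gz")` returns a vector of maps shaped like the `sequential-read` result above. Because everything runs inside one `loop`, the call blocks on a single thread, just like the function this PR describes.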

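When reading an entire file, the function is faster than cljam.fasta/read-sequence. In the three runs below ("Sequential" is sequential-read and "Random" is read-sequence), it takes about 25 s per run versus about 40 s.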

              Run #1        Run #2        Run #3
Sequential    25330 msecs   25258 msecs   25114 msecs
Random        40162 msecs   40287 msecs   40201 msecs

For comparison, cljam.fasta/read takes more than 3 hours on the same FASTA file.

Currently, this is a blocking function that runs on a single thread.
It may be worth implementing an async version using cljam.fasta.reader/sequential-read*.

totakke merged commit 9b54ec4 into master on Jul 29, 2016.
totakke deleted the feature/fasta-sequential-read branch on July 29, 2016 at 05:40.