-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add opt-in rich fasta headers with metadata #3448
base: main
Are you sure you want to change the base?
Conversation
Thanks for looking into this. I just tried it in the preview at https://fasta-header.loculus.org/ebola-sudan/search? , and it didn't seem to yield a difference? I don't know if you saw, but I hacked something similar together at https://github.com/theosanderson/metadata-sequence-combiner to allow #3439 . I'm a bit uncertain about whether we should add this to Loculus when it only support <5000 sequences. I do think it would be possible to stream from LAPIS from metadata and sequences and combine them on the fly in a stream, scaling to an unlimited number of sequences. I think my vote might be to keep this out of Loculus until we are able to do that, and rely on hacks on Pathoplexus to allow Nextclade and other integrations - which I think are the main use case atm? |
ca83099
to
ceb1a8b
Compare
ceb1a8b
to
5277fef
Compare
I hadn't drilled through the new value into the website config (locally I'd just edited the config json directly), should be fixed now.
I think I disagree on the first part (not doing it unless streaming) because imperfect but working is better than not having it all and if there's enough demand one can add streaming later. I see it like that: Right now one can download only 0 sequences with rich header, now one can do 5k, clear improvement, 90+% of total benefit achieved. I do really like the tools UI/UX you've come up with, very smooth. I'm just a bit worried about deviating from our monorepo pattern for something that can be considered a core feature. But nothing stops us from going down the tools route and then insourcing it later, so happy either wray. |
resolves #2284
preview URL: https://fasta-header.loculus.org
Summary
Allow unaligned sequences to be downloaded with a rich FASTA header (configured at organism level for simplicity), e.g.
displayName
instead ofaccessionVersion
.This requires creating a new astro API endpoint, for now
/[organism]/api/sequences
because LAPIS does not want to implement a dynamic FASTA header feature. This endpoint fetches metadata and sequences separately and joins them before responding with the result.To prevent excessive memory consumption, there's a hardcoded limit of 5k sequences that can be accessed via this endpoint (enforced via an initial request to
/aggregated
).Compression is not supported for this endpoint but this is currently not worth the effort (rich header is for ad hoc analysis/viewing not for download of full data).
Screenshot
PR Checklist