Custom alphabet or moltype? #3171

Open
olgabot opened this issue May 21, 2024 · 1 comment

Comments

@olgabot
Collaborator

olgabot commented May 21, 2024

Hello,
Hope you're doing well! I was wondering whether custom moltypes could be supported in the future. For example, I'd like to try a riff on the Dayhoff alphabet in which arginine gets its own category, so that I can look for arginine conservation specifically. Is that something that could be possible?
Thank you so much!
Warmest,
Olga

@ctb
Contributor

ctb commented May 23, 2024

Possible, but not proximal? :(

You could split this into two distinct phases:

  1. Generating the sketches. This is in some sense easy, since you can write your own code to generate hash values and just add them to a MinHash object; @luizirber and I have both done this at various times. The only catch is that you are responsible for catching incompatible sketches yourself: you wouldn't want to compare OlgaCustom sketches to regular protein sketches.
  2. Adding custom sketch types into sourmash. This is valuable and important, but not straightforward at the moment. In brief, the simplest idea would be to add support for different hash function identifier strings in sourmash. Please see the discussion in "What about using different hash functions (reversible, rolling, etc.) with sourmash?" (#1659) and "can we just use the hash function to flag incompatible signatures, instead of DNA/protein/etc?" (#751).

I guess a third would be "implement fast sketching in Rust core", but I would argue that with (2) you don't really need to do this: you can write your own plugin/sketching code as in (1) and have it remain outside of core indefinitely.
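
To make phase (1) concrete, here is a minimal, standard-library-only sketch of what "write your own code to generate hash values" could look like for the arginine variant asked about above. Everything named here is an assumption for illustration and not part of sourmash: the `CUSTOM_DAYHOFF` table (the usual Dayhoff classes, with R pulled out of the H/K/R group into a hypothetical class `g`), the `ALPHABET_ID` string, the `CustomSketch` class, and the use of BLAKE2b as a stand-in 64-bit hash (sourmash itself uses MurmurHash3). The `jaccard` method also shows the do-it-yourself compatibility check mentioned in (1).

```python
import hashlib
from dataclasses import dataclass, field

# Usual Dayhoff classes, except arginine (R) is split out of the H/K/R
# group into its own class "g". This table is a hypothetical example.
CUSTOM_DAYHOFF = {
    "C": "a",
    **dict.fromkeys("AGPST", "b"),
    **dict.fromkeys("DENQ", "c"),
    **dict.fromkeys("HK", "d"),
    "R": "g",
    **dict.fromkeys("ILMV", "e"),
    **dict.fromkeys("FWY", "f"),
}
ALPHABET_ID = "dayhoff-arg-v1"  # hypothetical identifier string


def encode(protein):
    """Translate a protein string into the custom reduced alphabet."""
    # Residues outside the table (e.g. ambiguity codes) become "X".
    return "".join(CUSTOM_DAYHOFF.get(aa, "X") for aa in protein.upper())


def hash_kmer(kmer):
    """Stand-in 64-bit hash (BLAKE2b); not the hash sourmash uses."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


@dataclass
class CustomSketch:
    ksize: int
    scaled: int
    alphabet_id: str = ALPHABET_ID
    hashes: set = field(default_factory=set)

    def add_protein(self, protein):
        # Scaled-style subsampling: keep roughly 1/scaled of all hashes.
        max_hash = 2**64 // self.scaled
        encoded = encode(protein)
        for i in range(len(encoded) - self.ksize + 1):
            h = hash_kmer(encoded[i:i + self.ksize])
            if h < max_hash:
                self.hashes.add(h)

    def jaccard(self, other):
        # The caller-side compatibility check: refuse to compare sketches
        # built with a different alphabet, k-mer size, or scaled value.
        mine = (self.alphabet_id, self.ksize, self.scaled)
        theirs = (other.alphabet_id, other.ksize, other.scaled)
        if mine != theirs:
            raise ValueError(f"incompatible sketches: {mine} vs {theirs}")
        union = self.hashes | other.hashes
        return len(self.hashes & other.hashes) / len(union) if union else 0.0
```

The resulting integer hash values are what you would then add to a MinHash object as described in (1); to the best of my knowledge sourmash's Python `MinHash` exposes `add_hash` and `add_many` for this, but check the current API before relying on it. Keeping an identifier string next to the hashes is a small-scale version of the "hash function identifier strings" idea in (2).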
