Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] Move greyhound-core into sourmash #1238

Merged
merged 17 commits into from
Feb 14, 2022
Merged

[MRG] Move greyhound-core into sourmash #1238

merged 17 commits into from
Feb 14, 2022

Conversation

luizirber
Copy link
Member

@luizirber luizirber commented Nov 17, 2020

This PR moves the greyhound-core (RevIndex and gather) into sourmash. It doesn't bring the CLI, web server or browser frontend.
(The CLI should probably be exposed here at some point, the web server and frontend should go into wort).

I created a new feature on the Rust side, "experimental". The idea is to allow experimentation without making stability guarantees, including passing all checks required for merging (like wasm support). #1221 is another example of an "experimental" feature.
In order to avoid piling up experimental features, I also propose a requirement that sourmash-Python CAN'T use the "experimental" feature. This keeps us honest, and force stabilization in the Rust side =]
This is sort of equivalent to the nightly features in the Rust compiler.

Lots left to do, but a small list:

  • properly use the parallel feature to expose rayon
  • fix wasm compilation

@codecov
Copy link

codecov bot commented Nov 17, 2020

Codecov Report

Merging #1238 (f016123) into latest (f9a2e96) will increase coverage by 6.35%.
The diff coverage is 65.17%.

Impacted file tree graph

@@            Coverage Diff             @@
##           latest    #1238      +/-   ##
==========================================
+ Coverage   83.58%   89.94%   +6.35%     
==========================================
  Files         113       88      -25     
  Lines       12185     8592    -3593     
  Branches     1684     1705      +21     
==========================================
- Hits        10185     7728    -2457     
+ Misses       1743      600    -1143     
- Partials      257      264       +7     
Flag Coverage Δ
python 89.94% <65.17%> (-0.32%) ⬇️
rust ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/sourmash/utils.py 78.94% <0.00%> (ø)
src/sourmash/index/revindex.py 63.46% <63.46%> (ø)
src/sourmash/index/__init__.py 95.96% <100.00%> (ø)
src/core/src/encodings.rs
src/core/src/ffi/mod.rs
src/core/src/ffi/utils.rs
src/core/src/index/linear.rs
src/core/src/index/mod.rs
src/core/src/index/sbt/mhbt.rs
src/core/src/index/sbt/mod.rs
... and 20 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f9a2e96...f016123. Read the comment docs.

@ctb
Copy link
Contributor

ctb commented Nov 17, 2020

In order to avoid piling up experimental features, I also propose a requirement that sourmash-Python CAN'T use the "experimental" feature. This keeps us honest, and force stabilization in the Rust side =]

Hmm. 👎

I instead propose that we add an "experimental" subcommand in sourmash CLI that is explicitly not stable.

Alternatively, as long as it's exposed by the Rust API to Python, I can use it in scripts. So we could keep it away from the CLI, at the cost of not being able to evolve a better CLI as we play with use cases.

Motivation: I do most of my experimentation in Python, as do others, and I think sourmash has benefited substantially from this kind of experimentation. So I want to keep this available!

@luizirber luizirber force-pushed the greyhound branch 2 times, most recently from ab74189 to b37b51f Compare November 17, 2020 17:49
@luizirber
Copy link
Member Author

In order to avoid piling up experimental features, I also propose a requirement that sourmash-Python CAN'T use the "experimental" feature. This keeps us honest, and force stabilization in the Rust side =]

Hmm. -1

I instead propose that we add an "experimental" subcommand in sourmash CLI that is explicitly not stable.

Alternatively, as long as it's exposed by the Rust API to Python, I can use it in scripts. So we could keep it away from the CLI, at the cost of not being able to evolve a better CLI as we play with use cases.

Motivation: I do most of my experimentation in Python, as do others, and I think sourmash has benefited substantially from this kind of experimentation. So I want to keep this available!

Fair point. Note that we can request "experimental" features from the Rust side on any sourmash PRs just fine (add "--features experimental" to the cargo invocation in setup.py), what I'm blocking is merging PRs that use the experimental feature without also "stabilizing" it on the Rust side.

@ctb
Copy link
Contributor

ctb commented Nov 17, 2020

is there value in combining this with #1131 at all? (I don't really know, am wondering.)

@luizirber
Copy link
Member Author

is there value in combining this with #1131 at all? (I don't really know, am wondering.)

Probably... I see the LcaDB as a superset of RevIndex, and could probably extend the latter with the taxinfo bits from the former, but this becomes a larger PR than this one.

@luizirber
Copy link
Member Author

as I suspected, shoehorning greyhound into how sourmash does gather is more
complicated than it looks.

First hurdle is that I did the greyhound demos using on-disk signatures (and paths),
but sourmash indices also work with in-memory signatures
(create an empty index, add in-memory sigs).
So, I've been modifying greyhound to support both (not hard, but does change
some of the internal logic).

Next, the gather issue discussed in #1263
I was thinking about making another method in Index for now, counter_gather,
for the greyhound behavior (returning a Counter for the matches in the index),
with a blanket implementation behaving like a LinearIndex (which... I think it
would also work by default for the SBT?), and figuring out later how to
integrate back into the regular Index.gather.
(side benefit: testing how well counter-gather works on SBTs, which I think
might not be terrible on runtime and actually stellar on memory consumption?)

@ctb
Copy link
Contributor

ctb commented Feb 11, 2022

Is fine by me!

add getset, wip parallel feature flag

wip colors

simpler impl first

parallel hash to color construction

wip

Revert "wip"

This reverts commit d65da76.

must insert small_color into large_color before setting it

trying out a small set impl

try compressing colors inside reduce

size test

use new released vec-collections

update cbindgen

make parallel/sequential more maintainable

some notes on partial serde

start revindex in py

start ffi

first test passing

second test passing

modify colors.update to accept an iter instead of slices

color count tracker

update sourmash.h

blanket implementation for counter_gather

start working on memstorage

niv update

avoid a mut ref in save by using lots of mutexes

fix codecov path fixes

expose InnerStorage

basic test passing in-memory sigs working

revert counter_gather to gather in search.py

lint cleanup

cbindgen fixes

moved MemStorage to #1463

implement signatures()

fix initialization
@luizirber luizirber changed the title [WIP] move greyhound-core into sourmash [MRG] Move greyhound-core into sourmash Feb 14, 2022
@luizirber luizirber merged commit 13bf0d5 into latest Feb 14, 2022
@luizirber luizirber deleted the greyhound branch February 14, 2022 05:59
@mr-eyes
Copy link
Member

mr-eyes commented Feb 15, 2022

🎉 🎉 🎉 🎉 🎉 🎉

@ctb
Copy link
Contributor

ctb commented Feb 17, 2022

After a frustrating 15 minutes of debugging, I feel like the reorganization of index.py into index/ should have received a mention in the PR description :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants