Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Oxidized LinearIndex #1526

Closed
wants to merge 8 commits into from
Closed

[WIP] Oxidized LinearIndex #1526

wants to merge 8 commits into from

Conversation

luizirber
Copy link
Member

@luizirber luizirber commented May 15, 2021

The main benefit of oxidizing LinearIndex is the ability to run in parallel (using rayon).
While parallelization is doable in Python, it involves a lot of pickling if using multiprocessing-like libraries.

Pulling bits from #1238, especially the FFI and delayed initialization ideas.

15 3 2 (before find implementation)
17 7 failing tests during the find implementation, but... it is starting to work?

TODO

  • implement find in Rust
  • Implement search/gather?
  • performance measures (nagging feeling it might be slower because it pulls .signatures() from the Rust layer)
  • Implement a new_with_paths constructor that defers loading to Rust, especially for all the --filelist cases (places in the CLI that convert a list of files into a LinearIndex)

@luizirber luizirber force-pushed the rust_linear_index branch from 0c3176a to f50a8d4 Compare May 15, 2021 21:21
@codecov
Copy link

codecov bot commented May 15, 2021

Codecov Report

Merging #1526 (4e71e88) into latest (516dc53) will decrease coverage by 0.63%.
The diff coverage is 38.19%.

Impacted file tree graph

@@            Coverage Diff             @@
##           latest    #1526      +/-   ##
==========================================
- Coverage   90.26%   89.63%   -0.64%     
==========================================
  Files         126      129       +3     
  Lines       21271    21458     +187     
  Branches     1595     1605      +10     
==========================================
+ Hits        19201    19233      +32     
- Misses       1843     1994     +151     
- Partials      227      231       +4     
Flag Coverage Δ
python 95.06% <93.67%> (-0.20%) ⬇️
rust 64.52% <1.66%> (-2.07%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/core/src/ffi/index/linear.rs 0.00% <0.00%> (ø)
src/core/src/ffi/index/mod.rs 0.00% <0.00%> (ø)
src/core/src/ffi/mod.rs 0.00% <ø> (ø)
src/core/src/ffi/search.rs 0.00% <0.00%> (ø)
src/core/src/ffi/utils.rs 0.00% <ø> (ø)
src/core/src/index/mod.rs 44.44% <0.00%> (-17.10%) ⬇️
src/core/src/index/linear.rs 10.00% <6.45%> (-7.31%) ⬇️
src/sourmash/search.py 95.92% <85.71%> (-0.34%) ⬇️
src/sourmash/index.py 93.62% <94.44%> (-4.02%) ⬇️
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 516dc53...4e71e88. Read the comment docs.

@luizirber luizirber force-pushed the rust_linear_index branch from f50a8d4 to fdc60ba Compare May 15, 2021 21:33
@luizirber
Copy link
Member Author

@ctb not ready for review yet, but the two failing tests are related to mocking with FakeSignature. Probably better to fix the mocking than changing the rest of the code to accomodate?

@ctb
Copy link
Contributor

ctb commented May 16, 2021

@ctb not ready for review yet, but the two failing tests are related to mocking with FakeSignature. Probably better to fix the mocking than changing the rest of the code to accomodate?

absolutely - do you need me to do anything?

@ctb
Copy link
Contributor

ctb commented May 16, 2021

@ctb not ready for review yet, but the two failing tests are related to mocking with FakeSignature. Probably better to fix the mocking than changing the rest of the code to accomodate?

absolutely - do you need me to do anything?

Looking into this a bit more - this is really a test of LazyLinearIndex. I'm not sure of the best way to fix it, but maybe we could mock a new Index class to support this test? It just needs select and signatures methods.

@luizirber luizirber force-pushed the rust_linear_index branch from 75ba62a to 7038293 Compare May 25, 2021 21:57
@luizirber
Copy link
Member Author

Hmm, thinking about circling back to #1238 and applying the search functions parts there.

My initial plan to explore parallelism with LinearIndex was based on taking advantage of --from-pathlist to load all the signatures into one LinearIndex, and then search in parallel in Rust. But path lists support signatures, SBT and LCA indices, and each signature gets loaded into an individual LinearIndex (and all indices wrapped under a MultiIndex), which means that some logic is required to decide if something is a signature, SBT or LCA index in the path list, which would probably require loading the path, defeating the purpose of making it faster.

@ctb
Copy link
Contributor

ctb commented Jun 21, 2021

manifests. just sayin'.

@luizirber
Copy link
Member Author

#2230 is implementing this on top of manifests and a shared impl for revindex/linearindex

@luizirber luizirber closed this Feb 13, 2023
@luizirber luizirber deleted the rust_linear_index branch February 13, 2023 04:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants