Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prediction refactoring #29

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Conversation

jsgounot
Copy link

@jsgounot jsgounot commented Jan 5, 2022

Potential related issues

All issues related to long runtime: #21, #24, #27

Description

After spending a day figuring out why the software takes hours to finish on some sequences, I found several issues:

  • Most of the software runtime is not linked to prediction but to pvalue calculation. This was previously done using pure python code and could be massively speed up using numpy.
  • The multiple CPUs option python multiprocessing module sometimes does not work. An issue which might be related to numpy inference on CPU attribution.
  • The code is not well written and could be fixed in multiple ways.

Pull request description

Complete rewritting of the main dvf.py script. This should produce exactly the same results than before. The results order might change however.

With one core and using the CRC test file, old version took 102 seconds while new one and in 29 seconds, meaning a 3.5 times improvement. I suspect the improvement to be actually much better than this since half of the time (~ 15 seconds) is used here to load models. New version of the software does not include the multiprocessing step.

I implement a random check that new pvalue equals old value every 1,000 predictions but this has never been trigger for all my tests.

I added a compare_results.py script to compare results regardless of the order.

Refactoring of all dvf code. Add compare_results.py script.
@ShailNair
Copy link

@jsgounot Hi, Thanks for the commit. I tried to run the modified dvf.py file. However, I get a segmentation fault error.
Here is my code

python /home/soft/DeepVirFinder/dvf_mod.py
-i 0.1.assembly/0.3.merged.contigs/merged.contigs.fix.fasta
-o 0.3.metavirome/0.1.identify_vcontig/0.2.deepvir`

pid 148590's current affinity mask: ffffffffffffffffffffffffff
pid 148590's new affinity mask: ff
Using Theano backend.
WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.

  1. Loading Models. Model directory /home/soft/DeepVirFinder/models
  2. Encoding and Predicting Sequences.
    Predict for c_000000000001
    [Segmentation fault (core dumped)`

Also ther is no -c option for multithreading

@jsgounot
Copy link
Author

HI @ShailNair, it's been a while since I did this and to be honest, I don't remember most of it. Regarding your segmentation fault, I guess this can be linked directly to your numpy and keras installation (my guess would be during the prediction part). One way to quickly check this would be to use multiple print statements. Concerning the multi threading, as I said in my original post, the previous implementation leads to error on my server which made me remove this part. I would be happy to answer some questions but please do not expect me to provide full support, I'm not one of the author of this tool and people behind this project completely ignored all the merge requests before.

@ShailNair
Copy link

ShailNair commented Sep 19, 2022

@jsgounot Thanks for the quick reply.

Regarding your segmentation fault, I guess this can be linked directly to your numpy and keras installation (my guess would be during the prediction part). One way to quickly check this would be to use multiple print statements.

Yes. I will check regarding this.

I'm not one of the author of this tool and people behind this project completely ignored all the merge requests before.

That's unfortunate as multiple users have faced the speed issue with DeepVirFinder. I hope the developers @jessieren soon take an action.

owsky added a commit to owsky/DeepVirFinder that referenced this pull request May 24, 2023
Updated Python version to 3.9 alongside the dependencies
Replaced dvf.py with the one provided on pull request jessieren#29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants