Prediction refactoring #29

jsgounot · 2022-01-05T04:06:29Z

Potential related issues

All issues related to long runtime: #21, #24, #27

Description

After spending a day figuring out why the software takes hours to finish on some sequences, I found several issues:

Most of the software runtime is not linked to prediction but to pvalue calculation. This was previously done using pure python code and could be massively speed up using numpy.
The multiple CPUs option python multiprocessing module sometimes does not work. An issue which might be related to numpy inference on CPU attribution.
The code is not well written and could be fixed in multiple ways.

Pull request description

Complete rewritting of the main dvf.py script. This should produce exactly the same results than before. The results order might change however.

With one core and using the CRC test file, old version took 102 seconds while new one and in 29 seconds, meaning a 3.5 times improvement. I suspect the improvement to be actually much better than this since half of the time (~ 15 seconds) is used here to load models. New version of the software does not include the multiprocessing step.

I implement a random check that new pvalue equals old value every 1,000 predictions but this has never been trigger for all my tests.

I added a compare_results.py script to compare results regardless of the order.

Refactoring of all dvf code. Add compare_results.py script.

ShailNair · 2022-09-19T01:55:14Z

@jsgounot Hi, Thanks for the commit. I tried to run the modified dvf.py file. However, I get a segmentation fault error.
Here is my code

python /home/soft/DeepVirFinder/dvf_mod.py
-i 0.1.assembly/0.3.merged.contigs/merged.contigs.fix.fasta
-o 0.3.metavirome/0.1.identify_vcontig/0.2.deepvir`

pid 148590's current affinity mask: ffffffffffffffffffffffffff
pid 148590's new affinity mask: ff
Using Theano backend.
WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.

Loading Models. Model directory /home/soft/DeepVirFinder/models

Encoding and Predicting Sequences.
Predict for c_000000000001
[Segmentation fault (core dumped)`

Also ther is no -c option for multithreading

jsgounot · 2022-09-19T02:55:27Z

HI @ShailNair, it's been a while since I did this and to be honest, I don't remember most of it. Regarding your segmentation fault, I guess this can be linked directly to your numpy and keras installation (my guess would be during the prediction part). One way to quickly check this would be to use multiple print statements. Concerning the multi threading, as I said in my original post, the previous implementation leads to error on my server which made me remove this part. I would be happy to answer some questions but please do not expect me to provide full support, I'm not one of the author of this tool and people behind this project completely ignored all the merge requests before.

ShailNair · 2022-09-19T05:39:34Z

@jsgounot Thanks for the quick reply.

Regarding your segmentation fault, I guess this can be linked directly to your numpy and keras installation (my guess would be during the prediction part). One way to quickly check this would be to use multiple print statements.

Yes. I will check regarding this.

I'm not one of the author of this tool and people behind this project completely ignored all the merge requests before.

That's unfortunate as multiple users have faced the speed issue with DeepVirFinder. I hope the developers @jessieren soon take an action.

Updated Python version to 3.9 alongside the dependencies Replaced dvf.py with the one provided on pull request jessieren#29

Prediciton refactoring

d6c65b2

Refactoring of all dvf code. Add compare_results.py script.

owsky added a commit to owsky/DeepVirFinder that referenced this pull request May 24, 2023

Refactored the code

a63c010

Updated Python version to 3.9 alongside the dependencies Replaced dvf.py with the one provided on pull request jessieren#29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Prediction refactoring #29

Prediction refactoring #29

jsgounot commented Jan 5, 2022

ShailNair commented Sep 19, 2022

jsgounot commented Sep 19, 2022

ShailNair commented Sep 19, 2022 •

edited

Loading

Prediction refactoring #29

Are you sure you want to change the base?

Prediction refactoring #29

Conversation

jsgounot commented Jan 5, 2022

Potential related issues

Description

Pull request description

ShailNair commented Sep 19, 2022

jsgounot commented Sep 19, 2022

ShailNair commented Sep 19, 2022 • edited Loading

ShailNair commented Sep 19, 2022 •

edited

Loading