trained model not found #65

Closed
aniko-meijer opened this issue Nov 3, 2022 · 7 comments

@aniko-meijer

Hi Simon, thank you for creating xTEA!

I have been trying to run xtea on Illumina reads, but at the genotyping step I get the following error message:

Traceback (most recent call last):
  File "/home/aniko.meijer/software/xTea/xtea/x_TEA_main.py", line 932, in <module>
    gc.predict_for_site(sf_model, sf_xTEA, sf_new)
  File "/home/aniko.meijer/software/xTea/xtea/x_genotype_classify.py", line 137, in predict_for_site
    rf_model_df21.load(sf_model)
  File "/home/aniko.meijer/anaconda3/envs/myXtea/lib/python3.7/site-packages/deepforest/cascade.py", line 1292, in load
    d = _io.model_loadobj(dirname, "param")
  File "/home/aniko.meijer/anaconda3/envs/myXtea/lib/python3.7/site-packages/deepforest/_io.py", line 300, in model_loadobj
    raise RuntimeError(msg.format(dirname))
RuntimeError: Cannot find the target directory: /home/aniko.meijer/software/xTea/xtea/genotyping/trained_model_ssc_py2_random_forest_two_category.pkl.
sort: cannot read: /mnt/test/scratch/aniko.meijer/Liz_9_7_51_transposons/xTEA/mark_shortread/L1/candidate_disc_filtered_cns.txt.high_confident.post_filtering_with_gene_gntp.txt: No such file or directory
Traceback (most recent call last):
  File "/home/aniko.meijer/software/xTea/xtea/x_TEA_main.py", line 964, in <module>
    gvcf.cvt_raw_rslt_to_gvcf(s_sample_id, sf_bam, sf_raw_rslt, i_rep_type, sf_ref, sf_vcf)
  File "/home/aniko.meijer/software/xTea/xtea/x_gvcf.py", line 199, in cvt_raw_rslt_to_gvcf
    with open(sf_raw_rslt_sorted) as fin_rslt:
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/test/scratch/aniko.meijer/Liz_9_7_51_transposons/xTEA/mark_shortread/L1/candidate_disc_filtered_cns.txt.high_confident.post_filtering_with_gene_gntp.txt.sorted'

It seems that the genotyping trained model "/home/aniko.meijer/software/xTea/xtea/genotyping/trained_model_ssc_py2_random_forest_two_category.pkl" can't be found, but the file exists in the specified directory.

I tried renaming the file to "trained_model_ssc_py2_random_forest_two_category.pkl." because of the dot that appears after the file name in the error message, but that did not solve the problem. I also tried copying the file to another location and changing the --model parameter in the run_xtea_pipeline.sh file, but the program could not find the file in the new location either, even though it was there.

Do you know what causes this error? I'm using Python 3.7.12 and deep-forest 0.1.7. Thank you!

The commands from the run_xtea_pipeline.sh file:

python ${XTEA_PATH}"x_TEA_main.py" -C -i ${BAM_LIST} --lc 3 --rc 3 --cr 1  -r ${L1_COPY_WITH_FLANK}  -a ${ANNOTATION} --cns ${L1_CNS} --ref ${REF} -p ${TMP} -o ${PREFIX}"candidate_list_from_clip.txt"  -n 12 --cp /mnt/test/scratch/aniko.meijer/Liz_9_7_51_transposons/xTEA/mark_shortread/pub_clip/     --resume
python ${XTEA_PATH}"x_TEA_main.py"  -D -i ${PREFIX}"candidate_list_from_clip.txt" --nd 5 --ref ${REF} -a ${ANNOTATION} -b ${BAM_LIST} -p ${TMP} -o ${PREFIX}"candidate_list_from_disc.txt" -n 12    --resume
python ${XTEA_PATH}"x_TEA_main.py" -N --cr 3 --nd 5 -b ${BAM_LIST} -p ${TMP_CNS} --fflank ${SF_FLANK} --flklen 3000 -n 12 -i ${PREFIX}"candidate_list_from_disc.txt" -r ${L1_CNS} --ref ${REF} -a ${ANNOTATION} -o ${PREFIX}"candidate_disc_filtered_cns.txt"    --resume
python ${XTEA_PATH}"x_TEA_main.py" --transduction --cr 3 --nd 5 -b ${BAM_LIST} -p ${TMP_TNSD} --fflank ${SF_FLANK} --flklen 3000 -n 12 -i ${PREFIX}"candidate_disc_filtered_cns.txt" -r ${L1_CNS} --ref ${REF} --input2 ${PREFIX}"candidate_list_from_disc.txt.clip_sites_raw_disc.txt" --rtype 1 -a ${ANNOTATION1}   --resume -o ${PREFIX}"candidate_disc_filtered_cns2.txt"
python ${XTEA_PATH}"x_TEA_main.py" --sibling --cr 3 --nd 5 -b ${BAM_LIST} -p ${TMP_TNSD} --fflank ${SF_FLANK} --flklen 3000 -n 12 -i ${PREFIX}"candidate_disc_filtered_cns2.txt" -r ${L1_CNS} --ref ${REF} --input2 ${PREFIX}"candidate_list_from_disc.txt.clip_sites_raw_disc.txt" --rtype 1 -a ${ANNOTATION1} --blacklist ${BLACK_LIST}   --resume -o ${PREFIX}"candidate_sibling_transduction2.txt"
python ${XTEA_PATH}"x_TEA_main.py" --postF --rtype 1 -p ${TMP_CNS} -n 12 -i ${PREFIX}"candidate_disc_filtered_cns2.txt" -a ${ANNOTATION1}  -o ${PREFIX}"candidate_disc_filtered_cns_post_filtering.txt"
python ${XTEA_PATH}"x_TEA_main.py" --postF --rtype 1 -p ${TMP_CNS} -n 12 -i ${PREFIX}"candidate_disc_filtered_cns2.txt.high_confident" -a ${ANNOTATION1} --blacklist ${BLACK_LIST}  -o ${PREFIX}"candidate_disc_filtered_cns.txt.high_confident.post_filtering.txt"
python ${XTEA_PATH}"x_TEA_main.py" --gene -a ${GENE} -i ${PREFIX}"candidate_disc_filtered_cns.txt.high_confident.post_filtering.txt"  -n 12 -o ${PREFIX}"candidate_disc_filtered_cns.txt.high_confident.post_filtering_with_gene.txt"
python ${XTEA_PATH}"x_TEA_main.py" --gntp_classify -i ${PREFIX}"candidate_disc_filtered_cns.txt.high_confident.post_filtering_with_gene.txt"  -n 1 --model ${XTEA_PATH}"genotyping/trained_model_ssc_py2_random_forest_two_category.pkl"  -o ${PREFIX}"candidate_disc_filtered_cns.txt.high_confident.post_filtering_with_gene_gntp.txt"
python ${XTEA_PATH}"x_TEA_main.py" --gVCF -i ${PREFIX}"candidate_disc_filtered_cns.txt.high_confident.post_filtering_with_gene_gntp.txt"  -o ${PREFIX} -b ${BAM_LIST} --ref ${REF} --rtype 1
@simoncchu
Collaborator

simoncchu commented Nov 3, 2022

Could you try the latest version of xtea from GitHub? For example, run git clone https://github.com/parklab/xTea.git and rerun the pipeline with that copy.

@aniko-meijer
Author

That worked! I ran into the bamsnap issue that was raised previously (#19), but following your advice I'll ignore the error.

Thank you!

@simoncchu
Collaborator

Good to know that you have solved the problem. I'll close this issue for now.

@wanqingshao

Hello,

I'm having the same issue. Cloning the newest xTea repo didn't solve the problem. I have tried Python 3.7.12 + deep-forest 0.1.7 (installed through conda xTea) and Python 3.6 + deep-forest 0.1.5 (installed using pip for individual packages); neither worked.

In Python, running model = CascadeForestClassifier() and then model.load(path_to_pkl) shows that the model cannot find the .pkl file even though it is present. Renaming trained_model_ssc_py2_random_forest_two_category.pkl to param.pkl and passing the folder path to model.load makes the function read the param.pkl file, but it then runs into other package compatibility issues.
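
Judging from the traceback above (the "Cannot find the target directory" message raised from model_loadobj(dirname, "param")) and from the param.pkl experiment, deep-forest's load() appears to expect a directory containing a param.pkl file, not a single pickle file. A minimal sketch of a pre-flight check for the --model path follows; the helper name describe_model_path is hypothetical and it only inspects the filesystem, it does not call deep-forest.

```python
import os


def describe_model_path(path):
    """Classify a --model path the way deep-forest's load() appears to treat it.

    Assumption (from the traceback in this thread): load() wants a directory
    that contains a file named param.pkl. This helper is hypothetical and is
    not part of xTea or deep-forest.
    """
    if os.path.isdir(path):
        # A directory is what load() seems to look for; check for param.pkl.
        if os.path.isfile(os.path.join(path, "param.pkl")):
            return "directory with param.pkl"
        return "directory without param.pkl"
    if os.path.isfile(path):
        # A lone .pkl file exists on disk but is still "not found" as a directory.
        return "regular file, not a directory"
    return "missing"
```

Calling describe_model_path on the path passed to --model would distinguish "the file is there but a directory was expected" from a genuinely missing path, which is the confusion described in this issue.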

Could you help point to the correct version of deep-forest to use? Any other suggestions on how to fix it would be really appreciated.

Thanks in advance for the help! Can't wait to try this awesome tool!

-Wanqing

@wanqingshao

Okay, I might have found the problem. My xtea was initially installed with conda install -y xtea=0.1.6, as suggested by the README, and I had been using the xtea that came with that conda install, which points the model to the trained_model_ssc_py2_random_forest_two_category.pkl file. Switching to the xtea in the bin folder of this repo points the model path to the DF21_model_1_2 folder.
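
One way to tell which of the two situations a generated run_xtea_pipeline.sh is in is to look at the value it passes to --model. The sketch below is only an illustration: the function names are hypothetical, and treating a value that ends in .pkl as the older single-file model is an assumption drawn from this thread, not from the xTea documentation.

```python
import shlex


def model_args(script_text):
    """Return the value that follows each --model flag in a shell script."""
    found = []
    for line in script_text.splitlines():
        try:
            # shlex joins adjacent quoted pieces, so ${VAR}"a/b" becomes ${VAR}a/b.
            tokens = shlex.split(line)
        except ValueError:
            continue  # skip lines shlex cannot tokenize (e.g. unbalanced quotes)
        for flag, value in zip(tokens, tokens[1:]):
            if flag == "--model":
                found.append(value)
    return found


def looks_like_legacy_pickle(model_value):
    """Assumption: a --model value ending in .pkl is the old single-file model."""
    return model_value.endswith(".pkl")
```

Reading the script with open("run_xtea_pipeline.sh").read() and passing the text to model_args lists the model paths without expanding shell variables; any value for which looks_like_legacy_pickle returns True suggests the script was generated by the older conda-installed xtea.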

@simoncchu
Collaborator

Hi, have you solved this problem? If not, please reopen this ticket.

@wanqingshao

Hi Simon,

Thanks a lot for the reply! It seems to be working. I'm starting a new run and will reopen if it errors out. It might be good to update the conda xtea version in the README, or add a note, so other people won't run into this.

Thanks for creating the tool!

-Wanqing
