trained model not found #65

aniko-meijer · 2022-11-03T17:54:39Z

Hi Simon, thank you for creating xTEA!

I have been trying to run xTea on Illumina reads, but at the genotyping step I get the following error message:

Traceback (most recent call last):
  File "/home/aniko.meijer/software/xTea/xtea/x_TEA_main.py", line 932, in <module>
    gc.predict_for_site(sf_model, sf_xTEA, sf_new)
  File "/home/aniko.meijer/software/xTea/xtea/x_genotype_classify.py", line 137, in predict_for_site
    rf_model_df21.load(sf_model)
  File "/home/aniko.meijer/anaconda3/envs/myXtea/lib/python3.7/site-packages/deepforest/cascade.py", line 1292, in load
    d = _io.model_loadobj(dirname, "param")
  File "/home/aniko.meijer/anaconda3/envs/myXtea/lib/python3.7/site-packages/deepforest/_io.py", line 300, in model_loadobj
    raise RuntimeError(msg.format(dirname))
RuntimeError: Cannot find the target directory: /home/aniko.meijer/software/xTea/xtea/genotyping/trained_model_ssc_py2_random_forest_two_category.pkl.
sort: cannot read: /mnt/test/scratch/aniko.meijer/Liz_9_7_51_transposons/xTEA/mark_shortread/L1/candidate_disc_filtered_cns.txt.high_confident.post_filtering_with_gene_gntp.txt: No such file or directory
Traceback (most recent call last):
  File "/home/aniko.meijer/software/xTea/xtea/x_TEA_main.py", line 964, in <module>
    gvcf.cvt_raw_rslt_to_gvcf(s_sample_id, sf_bam, sf_raw_rslt, i_rep_type, sf_ref, sf_vcf)
  File "/home/aniko.meijer/software/xTea/xtea/x_gvcf.py", line 199, in cvt_raw_rslt_to_gvcf
    with open(sf_raw_rslt_sorted) as fin_rslt:
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/test/scratch/aniko.meijer/Liz_9_7_51_transposons/xTEA/mark_shortread/L1/candidate_disc_filtered_cns.txt.high_confident.post_filtering_with_gene_gntp.txt.sorted'

It seems that the genotyping trained model "/home/aniko.meijer/software/xTea/xtea/genotyping/trained_model_ssc_py2_random_forest_two_category.pkl" can't be found, but the file exists in the specified directory.

I tried renaming the file to "trained_model_ssc_py2_random_forest_two_category.pkl." (with the trailing dot that appears after the file name in the error message), but that did not solve the problem. I also tried copying the file to another location and changing the --model parameter in run_xtea_pipeline.sh, but the program could not find the file in the new location either, even though it was there.

Do you know what causes this error? I'm using python 3.7.12 and deep-forest 0.1.7. Thank you!
The commands from the run_xtea_pipeline.sh file:

python ${XTEA_PATH}"x_TEA_main.py" -C -i ${BAM_LIST} --lc 3 --rc 3 --cr 1  -r ${L1_COPY_WITH_FLANK}  -a ${ANNOTATION} --cns ${L1_CNS} --ref ${REF} -p ${TMP} -o ${PREFIX}"candidate_list_from_clip.txt"  -n 12 --cp /mnt/test/scratch/aniko.meijer/Liz_9_7_51_transposons/xTEA/mark_shortread/pub_clip/     --resume
python ${XTEA_PATH}"x_TEA_main.py"  -D -i ${PREFIX}"candidate_list_from_clip.txt" --nd 5 --ref ${REF} -a ${ANNOTATION} -b ${BAM_LIST} -p ${TMP} -o ${PREFIX}"candidate_list_from_disc.txt" -n 12    --resume
python ${XTEA_PATH}"x_TEA_main.py" -N --cr 3 --nd 5 -b ${BAM_LIST} -p ${TMP_CNS} --fflank ${SF_FLANK} --flklen 3000 -n 12 -i ${PREFIX}"candidate_list_from_disc.txt" -r ${L1_CNS} --ref ${REF} -a ${ANNOTATION} -o ${PREFIX}"candidate_disc_filtered_cns.txt"    --resume
python ${XTEA_PATH}"x_TEA_main.py" --transduction --cr 3 --nd 5 -b ${BAM_LIST} -p ${TMP_TNSD} --fflank ${SF_FLANK} --flklen 3000 -n 12 -i ${PREFIX}"candidate_disc_filtered_cns.txt" -r ${L1_CNS} --ref ${REF} --input2 ${PREFIX}"candidate_list_from_disc.txt.clip_sites_raw_disc.txt" --rtype 1 -a ${ANNOTATION1}   --resume -o ${PREFIX}"candidate_disc_filtered_cns2.txt"
python ${XTEA_PATH}"x_TEA_main.py" --sibling --cr 3 --nd 5 -b ${BAM_LIST} -p ${TMP_TNSD} --fflank ${SF_FLANK} --flklen 3000 -n 12 -i ${PREFIX}"candidate_disc_filtered_cns2.txt" -r ${L1_CNS} --ref ${REF} --input2 ${PREFIX}"candidate_list_from_disc.txt.clip_sites_raw_disc.txt" --rtype 1 -a ${ANNOTATION1} --blacklist ${BLACK_LIST}   --resume -o ${PREFIX}"candidate_sibling_transduction2.txt"
python ${XTEA_PATH}"x_TEA_main.py" --postF --rtype 1 -p ${TMP_CNS} -n 12 -i ${PREFIX}"candidate_disc_filtered_cns2.txt" -a ${ANNOTATION1}  -o ${PREFIX}"candidate_disc_filtered_cns_post_filtering.txt"
python ${XTEA_PATH}"x_TEA_main.py" --postF --rtype 1 -p ${TMP_CNS} -n 12 -i ${PREFIX}"candidate_disc_filtered_cns2.txt.high_confident" -a ${ANNOTATION1} --blacklist ${BLACK_LIST}  -o ${PREFIX}"candidate_disc_filtered_cns.txt.high_confident.post_filtering.txt"
python ${XTEA_PATH}"x_TEA_main.py" --gene -a ${GENE} -i ${PREFIX}"candidate_disc_filtered_cns.txt.high_confident.post_filtering.txt"  -n 12 -o ${PREFIX}"candidate_disc_filtered_cns.txt.high_confident.post_filtering_with_gene.txt"
python ${XTEA_PATH}"x_TEA_main.py" --gntp_classify -i ${PREFIX}"candidate_disc_filtered_cns.txt.high_confident.post_filtering_with_gene.txt"  -n 1 --model ${XTEA_PATH}"genotyping/trained_model_ssc_py2_random_forest_two_category.pkl"  -o ${PREFIX}"candidate_disc_filtered_cns.txt.high_confident.post_filtering_with_gene_gntp.txt"
python ${XTEA_PATH}"x_TEA_main.py" --gVCF -i ${PREFIX}"candidate_disc_filtered_cns.txt.high_confident.post_filtering_with_gene_gntp.txt"  -o ${PREFIX} -b ${BAM_LIST} --ref ${REF} --rtype 1
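The traceback above goes through deepforest's load, which calls model_loadobj(dirname, "param") and complains about a missing "target directory", so one thing worth checking is whether the --model path is a plain file or a directory. The sketch below is only a diagnostic under that assumption; classify_model_path is a hypothetical helper, not part of xTea or deep-forest, and the param.pkl file name is an assumption based on the "param" argument in the traceback.

```python
import os


def classify_model_path(path, param_name="param.pkl"):
    """Describe `path` from the point of view of a loader that expects a
    model *directory* containing `param_name` (assumed name, see above)."""
    if os.path.isfile(path):
        # A single .pkl file: exists on disk, but is not a directory.
        return "file"
    if os.path.isdir(path):
        if os.path.isfile(os.path.join(path, param_name)):
            return "dir_with_param"
        return "dir_without_param"
    return "missing"
```

For example, classify_model_path("/path/to/trained_model.pkl") returning "file" would be consistent with the error above: the file is present, yet a loader looking for a directory would still report that it cannot find it.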


simoncchu · 2022-11-03T18:24:11Z

Could you try the latest version of xTea from GitHub (git clone https://github.com/parklab/xTea.git) and rerun with it?

aniko-meijer · 2022-11-04T07:56:22Z

That worked! I ran into the bamsnap issue that was raised previously (#19), but following your advice I'll ignore the error.

Thank you!

simoncchu · 2022-11-04T11:44:38Z

Good to know that you have solved the problem. I'll close this issue for now.

wanqingshao · 2022-11-18T17:11:40Z

Hello,

I'm having the same issue, and cloning the newest xTea repo didn't solve it. I have tried Python 3.7.12 + deep-forest 0.1.7 (installed through conda xtea) and Python 3.6 + deep-forest 0.1.5 (installed with pip as individual packages); neither worked.

In Python, model = CascadeForestClassifier() followed by model.load(path_to_pkl) shows that the model cannot find the .pkl file even though it is present. Renaming trained_model_ssc_py2_random_forest_two_category.pkl to param.pkl and passing the folder path to model.load makes the function read param.pkl, but it then runs into other package compatibility issues.

Could you point me to the correct version of deep-forest to use? Any other suggestions on how to fix this would be really appreciated.

Thanks in advance for the help! Can't wait to try this awesome tool!

-Wanqing

wanqingshao · 2022-11-18T17:44:38Z

Okay, I might have found the problem. My xtea was initially installed with conda install -y xtea=0.1.6, as suggested by the README, and I had been using the xtea that came with that conda install, which points the model to the trained_model_ssc_py2_random_forest_two_category.pkl file. Switching to the xtea in the bin folder of this repo points the model path to the DF21_model_1_2 folder instead.
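For anyone who wants to confirm which model their checkout actually ships, here is a rough sketch that lists directories that look loadable. It assumes a usable model folder (such as DF21_model_1_2) contains a param.pkl file, which is inferred from this thread rather than from the xTea docs; find_model_dirs is a hypothetical helper name.

```python
import os


def find_model_dirs(root, param_name="param.pkl"):
    """Return, sorted, every directory under `root` that directly contains
    `param_name` (assumed marker of a directory-style saved model)."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if param_name in filenames:
            hits.append(dirpath)
    return sorted(hits)
```

Running find_model_dirs on the xTea checkout and on the conda-installed copy should show whether either one contains a directory-style model; an empty list would suggest that copy only ships the older single-file .pkl model.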

simoncchu · 2022-11-18T17:57:14Z

Hi, have you solved this problem? If not, please reopen this ticket.

wanqingshao · 2022-11-18T18:12:16Z

Hi Simon,

Thanks a lot for the reply! It seems to be working. I'm starting a new run and will reopen if it errors out. It might be good to update the conda xtea version in the README, or add a note, so other people don't run into this.

Thanks for creating the tool!

-Wanqing

simoncchu closed this as completed Nov 4, 2022
