Breaking with SMAD4 sequence in 2022 and 2024 JASPAR releases #9

Open
ievarau opened this issue Dec 18, 2024 · 1 comment

ievarau commented Dec 18, 2024

Hello,

We have found that the inference tool breaks on the SMAD4 protein sequence with the 2022 and 2024 releases of JASPAR; the 2020 release seems to work. I ran the inference in a dedicated environment against specific versions of this repo to see what the error is. The output is posted below.

The SMAD4 protein sequence I used:
MDNMSITNTPTSNDACLSIVHSLMCHRQGGESETFAKRAIESLVKKLKEKKDELDSLITAITTNGAHPSKCVTIQRTLDGRLQVAGRKGFPHVIYARLWRWPDLHKNELKHVKYCQYAFDLKCDSVCVNPYHYERVVSPGIDLSGLTLQSNAPSMLVKDEYVHDFEGQPSLPTEGHSIQTIQHPPSNRASTETYSAPALLAPAESNATSTTNFPNIPVASTSQPASILAGSHSEGLLQIASGPQPGQQQNGFTAQPATYHHNSTTTWTGSRTAPYTPNLPHHQNGHLQHHPPMPPHPGHYWPVHNELAFQPPISNHPAPEYWCSIAYFEMDVQVGETFKVPSSCPVVTVDGYVDPSGGDRFCLGQLSNVHRTEAIERARLHIGKGVQLECKGEGDVWVRCLSDHAVFVQSYYLDREAGRAPGDAVHKIYPSAYIKVFDLRQCHRQMQQQAATAQAAAAAQAAAVAGNIPGPGSVGGIAPAISLSAAAGIGVDDLRRLCILRMSFVKGWGPDYPRQSIKETPCWIEIHLHRALQLLDEVLHTMPIADPQPLD

Running with commit used in 2020:

HEAD is now at faa66049 Update infer_profile.py

(inference) [ievarau@biotin3 JASPAR-inference-tool]$ ./infer_profile.py --fasta-file ./examples/smad.fa --files-dir ./files/ --models-dir ./models/ --latest

0it [00:00, ?it/s]
Query	TF Name	TF Matrix	E-value	Query Start-End	TF Start-End	DBD %ID	Similarity Regression

Running with commit used in 2022:

(inference) [ievarau@biotin3 JASPAR-inference-tool]$ git checkout "5fa64a79c79763384b484d9345bf112bd7dcbf11"

Updating files: 100% (4967/4967), done.
Previous HEAD position was faa66049 Update infer_profile.py
HEAD is now at 5fa64a79 changed modes

(inference) [ievarau@biotin3 JASPAR-inference-tool]$ ./infer_profile.py --latest ./examples/smad.fa

Traceback (most recent call last):
  File "./infer_profile.py", line 661, in <module>
    main()
  File "./infer_profile.py", line 88, in main
    args.output_file, args.threads, args.latest, args.rost, args.taxon)
  File "./infer_profile.py", line 123, in infer_profiles
    pool = Pool(min([threads, len(seq_records)]))
  File "/div/pythagoras/u2/ievarau/micromamba/envs/inference/lib/python3.6/multiprocessing/context.py", line 119, in Pool
    context=self.get_context())
  File "/div/pythagoras/u2/ievarau/micromamba/envs/inference/lib/python3.6/multiprocessing/pool.py", line 167, in __init__
    raise ValueError("Number of processes must be at least 1")
ValueError: Number of processes must be at least 1

Running with jaspar-2024 tag:

(inference) [ievarau@biotin3 JASPAR-inference-tool]$ git checkout "jaspar-2024"

Updating files: 100% (2774/2774), done.
Previous HEAD position was 5fa64a79 changed modes
HEAD is now at 4b852a51 check pfam version

(inference) [ievarau@biotin3 JASPAR-inference-tool]$ ./infer_profile.py --latest ./examples/smad.fa

Traceback (most recent call last):
  File "./infer_profile.py", line 662, in <module>
    main()
  File "./infer_profile.py", line 88, in main
    args.output_file, args.threads, args.latest, args.rost, args.taxon)
  File "./infer_profile.py", line 123, in infer_profiles
    pool = Pool(min([threads, len(seq_records)]))
  File "/div/pythagoras/u2/ievarau/micromamba/envs/inference/lib/python3.6/multiprocessing/context.py", line 119, in Pool
    context=self.get_context())
  File "/div/pythagoras/u2/ievarau/micromamba/envs/inference/lib/python3.6/multiprocessing/pool.py", line 167, in __init__
    raise ValueError("Number of processes must be at least 1")
ValueError: Number of processes must be at least 1
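
Both tracebacks end at `Pool(min([threads, len(seq_records)]))`. `multiprocessing.Pool` raises this `ValueError` whenever the requested process count is below 1, which would happen if `seq_records` were empty or `threads` were less than 1. The sketch below is a hypothetical guard, not code from `infer_profile.py`; the name `pool_size` and its error messages are assumptions made for illustration.

```python
def pool_size(threads, n_records):
    """Return a worker count that is safe to pass to multiprocessing.Pool.

    Hypothetical helper (not part of infer_profile.py): it fails early with
    a descriptive message instead of letting Pool raise
    "Number of processes must be at least 1".
    """
    if n_records == 0:
        # min([threads, 0]) would be 0, which Pool rejects.
        raise ValueError("no sequence records were parsed from the input file")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return min(threads, n_records)
```

With a check like this, an input that yields zero records would produce a message pointing at the input file rather than at the process pool.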

ievarau commented Dec 18, 2024

Update: it actually works. The input file must have FASTA headers; once I added one to the input file, the tool ran. However, on the JASPAR website sequences are pasted without a header, and then the tool does not work. The example sequence works, though.
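
One possible way to handle pasted sequences is to normalize the input before it reaches the tool. The following is a minimal sketch under that assumption; `ensure_fasta_header` and the default identifier `query` are hypothetical names and are not part of the JASPAR inference tool or the website.

```python
def ensure_fasta_header(text, default_id="query"):
    """Return `text` as FASTA, adding a header line if it has none.

    Hypothetical helper: a bare pasted sequence gets the header
    ">query"; input that already starts with ">" is returned unchanged
    apart from surrounding whitespace.
    """
    stripped = text.strip()
    if not stripped:
        raise ValueError("empty sequence input")
    if stripped.startswith(">"):
        return stripped + "\n"
    return ">{}\n{}\n".format(default_id, stripped)


# Usage: a headerless paste becomes a single-record FASTA string.
fasta_text = ensure_fasta_header("MDNMSITNTPTSNDACLSIVHSLMCHRQGG")
```

The result could then be written to a file and passed to `infer_profile.py` in the same way as `./examples/smad.fa`.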
