
create_cistarget_motif_databases #54

Open
gilgolan73 opened this issue Dec 2, 2024 · 11 comments

@gilgolan73

Describe the bug
Hi,
I am trying to create a custom cisTarget motif database following the tutorial:
https://scenicplus.readthedocs.io/en/latest/human_cerebellum_ctx_db.html
However, when I run create_cistarget_motif_databases.py, I get an error saying that none of the motifs were scored (see the error message below).
Please help.

To Reproduce
"${SCRIPT_DIR}/create_cistarget_motif_databases.py" -f ${FASTA_FILE} -M ${CBDIR} -m ${MOTIF_LIST} -o ${OUT_DIR}/${DATABASE_PREFIX} --bgpadding 1000 -t 20 -c "/home/gilgolan/bioinfo_analysis/scenicplus_multiomic/cistarget/cbust"

Error output
For every motif, I get a message like this:
Error: Non-zero exit status for: '/home/gilgolan/bioinfo_analysis/scenicplus_multiomic/cistarget/cbust -f 4 -c 0.0 -r 10000 -b 1000 -t 1 /home/gilgolan/bioinfo_analysis/scenicplus_multiomic/cistarget/aertslab_motif_colleciton/v10nr_clust_public/singletons/transfac_public__M00159.cb /home/gilgolan/bioinfo_analysis/scenicplus_multiomic/cistarget/mm10.p3_pituitary_multiomic.with_1kb_bg_padding.fa'
Error: Non-zero exit status for: '/home/gilgolan/bioinfo_analysis/scenicplus_multiomic/cistarget/cbust -f 4 -c 0.0 -r 10000 -b 1000 -t 1 /home/gilgolan/bioinfo_analysis/scenicplus_multiomic/cistarget/aertslab_motif_colleciton/v10nr_clust_public/singletons/yetfasco__YDR174W_2249.cb /home/gilgolan/bioinfo_analysis/scenicplus_multiomic/cistarget/mm10.p3_pituitary_multiomic.with_1kb_bg_padding.fa'

Scoring 10249 motifs with Cluster-Buster took: 305.962677 seconds

Error: None of 10249 motifs were scored successfully.

Expected behavior
The motifs are scored and a cisTarget database is created.


Version:

Python: 3.11.10
SCENIC+: 1.0a1

@ghuls
Member

ghuls commented Dec 9, 2024

What is the output of one of the cbust commands?

/home/gilgolan/bioinfo_analysis/scenicplus_multiomic/cistarget/cbust \
    -f 4 \
    -c 0.0 \
    -r 10000 \
    -b 1000 \
    -t 1 \
    /home/gilgolan/bioinfo_analysis/scenicplus_multiomic/cistarget/aertslab_motif_colleciton/v10nr_clust_public/singletons/yetfasco__YDR174W_2249.cb \
    /home/gilgolan/bioinfo_analysis/scenicplus_multiomic/cistarget/mm10.p3_pituitary_multiomic.with_1kb_bg_padding.fa

It likely gives an error at the moment.

@gilgolan73
Author

Hi,
I get the following output: "Illegal instruction (core dumped)"

Thank you,
Gil

@ghuls
Member

ghuls commented Dec 9, 2024

It looks like your CPU does not support all instructions used in the precompiled version of Cluster-Buster.

You will have to compile Cluster-Buster yourself (skip the cbust_amd_libm_aocc lines):
https://github.com/aertslab/create_cisTarget_databases?tab=readme-ov-file#compile-from-source
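A quick way to confirm this diagnosis, before and after recompiling, is to look at the exit status of a single cbust run. The sketch below assumes bash on Linux, where a process killed by a signal is reported with exit status 128 plus the signal number; SIGILL ("Illegal instruction") is signal 4, so the status is 132. The function name check_cbust is hypothetical, and the Cluster-Buster options are copied from the failing command above.

```shell
#!/usr/bin/env bash
# Run Cluster-Buster once and classify the result. When a process is killed
# by a signal, bash reports exit status 128 + signal number; on Linux SIGILL
# ("Illegal instruction") is signal 4, so the status is 132.
check_cbust() {
    local cbust="$1" motif_cb="$2" fasta="$3"
    local status=0
    # Same Cluster-Buster options as in the failing command; output is discarded.
    "${cbust}" -f 4 -c 0.0 -r 10000 -b 1000 -t 1 "${motif_cb}" "${fasta}" \
        > /dev/null 2>&1 || status=$?
    if (( status == 0 )); then
        echo "ok"
    elif (( status == 128 + 4 )); then
        echo "illegal-instruction: compile Cluster-Buster from source"
    else
        echo "failed with exit status ${status}"
    fi
}

# Example (paths are placeholders):
#   check_cbust ./cbust motif.cb sequences.fa
```

With a self-compiled binary that runs on the local CPU, the same check should print ok.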

@gilgolan73
Author

Hello,
I compiled Cluster-Buster and now the command works.
I am now running "${SCRIPT_DIR}/create_cistarget_motif_databases.py" -f ${FASTA_FILE} -M ${CBDIR} -m ${MOTIF_LIST} -o ${OUT_DIR}/${DATABASE_PREFIX} --bgpadding 1000 -t 20 -c "/home/gilgolan/bioinfo_analysis/scenicplus_multiomic/cistarget/cbust", but it is very slow (new output appears on the screen only every few minutes).
How long should it take? Is there any way to make it faster?

Thanks,
Gil

@ghuls
Member

ghuls commented Dec 11, 2024

Depending on the motif file, it takes between 2 and 40 minutes per motif (the latter only for a few very big metacluster motifs).

@gilgolan73
Copy link
Author

Hi,
The motif file includes ~10,000 motifs. So I guess it will take a few weeks. Is there a way to filter motifs to shorten the time?

Thank you,
Gil

@ghuls
Member

ghuls commented Dec 11, 2024

It won't take a few weeks, normally only a few hours, as 20 motifs (the number of threads you specified) are scored at the same time. Also take into account that the motifs that take the longest are scored first (so we don't end up in a situation where a single slow motif file keeps running alone at the end for 40 minutes).
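Scoring the slowest motifs first is a longest-job-first ordering. As a rough illustration only (the real script is Python and may estimate runtime differently), the sketch below orders Cluster-Buster motif files by file size, assuming that bigger files, such as metacluster motifs, take longer to score. The function name order_motifs_longest_first is hypothetical.

```shell
#!/usr/bin/env bash
# Print motif IDs (file names without .cb) from a Cluster-Buster motif
# directory, largest file first. File size is used as a rough proxy for
# scoring time (an assumption: big metacluster motifs are the slow ones).
order_motifs_longest_first() {
    local motifs_dir="$1" f
    # ls -S sorts by file size, largest first. It runs in a subshell inside
    # the directory so that only short file names are passed to ls.
    ( cd "${motifs_dir}" && ls -S -- *.cb ) | while IFS= read -r f; do
        printf '%s\n' "${f%.cb}"   # strip the .cb extension -> motif ID
    done
}

# Example: score up to 20 motifs at a time, slowest first (paths are
# placeholders; this only illustrates the scheduling and does not collect
# the scores into a database):
#   order_motifs_longest_first "${motifs_dir}" \
#       | xargs -P 20 -I {} ./cbust -f 4 -c 0.0 -r 10000 -b 1000 -t 1 \
#             "${motifs_dir}/{}.cb" sequences.fa
```

Starting the long jobs first means the short ones fill in the gaps at the end, instead of one long job running alone after everything else has finished.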

@gilgolan73
Copy link
Author

OK,
I will try to run it. Is there a recommended number of threads? (I have ~14 virtual CPUs.)

Gil

@ghuls
Copy link
Member

ghuls commented Dec 11, 2024

Preferably run with as many threads as you have CPUs (or with one less, to account for some overhead), so in your case with -t 14 or -t 13.

I have run it here before on one node of the cluster with 72 CPUs with -t 71.
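Following that rule of thumb (one thread per CPU, minus one for overhead), the thread count can be derived from nproc (GNU coreutils, so this assumes Linux). The estimate_minutes function is a hypothetical back-of-the-envelope helper: it divides the total motif scoring time by the number of threads and ignores overhead and uneven motif runtimes, so its result is best read as a lower bound.

```shell
#!/usr/bin/env bash
# Suggested thread count: number of CPUs minus one (to leave room for
# overhead), but never less than 1. nproc is from GNU coreutils.
pick_threads() {
    local cpus
    cpus="$(nproc)"
    if (( cpus > 1 )); then
        echo $(( cpus - 1 ))
    else
        echo 1
    fi
}

# Hypothetical estimate of wall-clock minutes:
# (number of motifs x average minutes per motif) / threads, rounded up.
# Integer minutes only; ignores overhead and uneven motif runtimes.
estimate_minutes() {
    local nbr_motifs="$1" avg_minutes="$2" nbr_threads="$3"
    echo $(( (nbr_motifs * avg_minutes + nbr_threads - 1) / nbr_threads ))
}

# Example:
#   nbr_threads="$(pick_threads)"
#   estimate_minutes 10249 2 "${nbr_threads}"
```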

If you have access to a cluster, I recommend running it there, preferably with the original Cluster-Buster binary, as it is optimized for speed.

In case running it with 13 or 14 threads is too slow and you have access to multiple machines, you could run the motif scoring on multiple nodes, but in that case you will have to run an additional script to combine the results afterwards.

https://github.com/aertslab/create_cisTarget_databases?tab=readme-ov-file#score-motifs-in-different-parts-and-generate-rankings-in-a-separate-step

# Directory with the create_cisTarget_databases scripts.
create_cistarget_databases_dir=
# FASTA file with sequences per region IDs / gene IDs.
fasta_filename=
# Directory with motifs in Cluster-Buster format.
motifs_dir=
# File with motif IDs (base name of motif file in ${motifs_dir}).
motifs_list_filename=
# cisTarget motif database output prefix.
db_prefix=

nbr_threads=13
# For example, if you have 10 machines available.
nbr_total_parts=10

# Create a partial directory, so partial cisTarget database files can be deleted easily afterwards.
mkdir -p partial

# Each invocation of the for loop (with a different ${current_part}) can also be submitted to a different node to speed up
# the motif scoring.

for current_part in $(seq 1 "${nbr_total_parts}") ; do
    # Print the command line for partial motif scoring (-p parameter added) (and run one line on each node).
    echo "${create_cistarget_databases_dir}/create_cistarget_motif_databases.py" \
         -f "${fasta_filename}" \
         -M "${motifs_dir}" \
         -m "${motifs_list_filename}" \
         --bgpadding 1000 \
         -p "${current_part}" "${nbr_total_parts}" \
         -o "partial/${db_prefix}" \
         -t "${nbr_threads}"
done

# After partial motif scoring, combine the results into one database:
"${create_cistarget_databases_dir}/combine_partial_motifs_or_tracks_vs_regions_or_genes_cistarget_dbs.py" \
    -i partial/ \
    -o .

# Partial cisTarget databases can be removed.
#rm -r partial

@gilgolan73
Copy link
Author

I understand. Thank you very much for the help!

@gilgolan73
Copy link
Author

gilgolan73 commented Dec 22, 2024

Hi @ghuls,
I managed to build the cisTarget database and used it for eGRN inference with the SCENIC+ pipeline.
When I looked at the results, I saw that some TFs that are known to be important regulators in this data are missing.
The same happened when I used the precomputed mouse cisTarget database.
When I looked at the cisTarget databases (either custom or precomputed), I saw that I could find their motifs.

Is there a way to manually add their motifs to the motif collection (from which the cisTarget database is built)? These TFs have annotated motifs in the JASPAR database, and their motifs also appear in the SCENIC database.

** When I look at the motif annotation database for mouse (https://resources.aertslab.org/cistarget/motif2tf/motifs-v10nr_clust-nr.mgi-m0.001-o0.0.tbl), I see these TFs (in the gene_name column).

Thank you,
Gil
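As a possible first diagnostic, one could check which motifs the annotation table links to a given TF and whether those motif IDs are in the motif list that was used to build the database. The sketch assumes the .tbl file is tab-separated, with a header row that names a motif_id column and the gene_name column mentioned above (a leading # on the header is stripped); the function name motifs_for_tf and the example TF and file names are hypothetical.

```shell
#!/usr/bin/env bash
# List motif IDs that a motif2tf annotation table links to one TF, and mark
# whether each ID is present in the motif list used to build the database.
# Assumptions: the table is tab-separated, the first row is a header (a
# leading '#' is stripped), and it has "motif_id" and "gene_name" columns.
motifs_for_tf() {
    local tbl="$1" tf="$2" motifs_list_filename="$3"
    awk -F '\t' -v tf="${tf}" '
        # First file: the motif list, one motif ID per line.
        FILENAME == ARGV[1] { in_list[$1] = 1; next }
        # Header row of the annotation table: map column names to indices.
        FNR == 1 {
            sub(/^#/, "", $1)
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        # Rows for the requested TF; print each motif ID only once.
        $col["gene_name"] == tf && !seen[$col["motif_id"]]++ {
            id = $col["motif_id"]
            print id "\t" ((id in in_list) ? "in_list" : "not_in_list")
        }
    ' "${motifs_list_filename}" "${tbl}"
}

# Example (the TF name and motif list file are placeholders):
#   motifs_for_tf motifs-v10nr_clust-nr.mgi-m0.001-o0.0.tbl SomeTF motifs.txt
```

Motifs reported as not_in_list were not scored, so they cannot be in a database built from that motif list.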
