create_cistarget_motif_databases #54

gilgolan73 · 2024-12-02T09:40:47Z

Describe the bug
Hi,
I am trying to create a custom cisTarget motif database by following the tutorial:
https://scenicplus.readthedocs.io/en/latest/human_cerebellum_ctx_db.html
However, when I run create_cistarget_motif_databases.py, I get an error saying that none of the motifs were scored (see the error message below).
Please help.

To Reproduce
"${SCRIPT_DIR}/create_cistarget_motif_databases.py" -f ${FASTA_FILE} -M ${CBDIR} -m ${MOTIF_LIST} -o ${OUT_DIR}/${DATABASE_PREFIX} --bgpadding 1000 -t 20 -c "/home/gilgolan/bioinfo_analysis/scenicplus_multiomic/cistarget/cbust"

Error output
For every motif, I get a message like this:
Error: Non-zero exit status for: '/home/gilgolan/bioinfo_analysis/scenicplus_multiomic/cistarget/cbust -f 4 -c 0.0 -r 10000 -b 1000 -t 1 /home/gilgolan/bioinfo_analysis/scenicplus_multiomic/cistarget/aertslab_motif_colleciton/v10nr_clust_public/singletons/transfac_public__M00159.cb /home/gilgolan/bioinfo_analysis/scenicplus_multiomic/cistarget/mm10.p3_pituitary_multiomic.with_1kb_bg_padding.fa'
Error: Non-zero exit status for: '/home/gilgolan/bioinfo_analysis/scenicplus_multiomic/cistarget/cbust -f 4 -c 0.0 -r 10000 -b 1000 -t 1 /home/gilgolan/bioinfo_analysis/scenicplus_multiomic/cistarget/aertslab_motif_colleciton/v10nr_clust_public/singletons/yetfasco__YDR174W_2249.cb /home/gilgolan/bioinfo_analysis/scenicplus_multiomic/cistarget/mm10.p3_pituitary_multiomic.with_1kb_bg_padding.fa'

Scoring 10249 motifs with Cluster-Buster took: 305.962677 seconds

Error: None of 10249 motifs were scored successfully.

Expected behavior
Motifs will be scored, a cistarget database will be created.

Version (please complete the following information):

Python: 3.11.10
SCENIC+: 1.0a1

ghuls · 2024-12-09T10:03:01Z

What is the output of one of the cbust commands?

/home/gilgolan/bioinfo_analysis/scenicplus_multiomic/cistarget/cbust \
    -f 4 \
    -c 0.0 \
    -r 10000 \
    -b 1000 \
    -t 1 \
    /home/gilgolan/bioinfo_analysis/scenicplus_multiomic/cistarget/aertslab_motif_colleciton/v10nr_clust_public/singletons/yetfasco__YDR174W_2249.cb \
    /home/gilgolan/bioinfo_analysis/scenicplus_multiomic/cistarget/mm10.p3_pituitary_multiomic.with_1kb_bg_padding.fa

It likely gives an error at the moment.
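When a command like this fails without a clear message, its exit status often says more than the wrapper script's "Non-zero exit status" line. Below is a minimal sketch of a helper that decodes a shell exit status, assuming the usual shell convention that a status above 128 means the process was killed by signal (status - 128). The function name `describe_exit_status` is hypothetical and not part of Cluster-Buster or create_cisTarget_databases.

```shell
# Hypothetical helper: turn a shell exit status into a readable description.
describe_exit_status() {
    local status="$1"
    local sig_name
    if [ "${status}" -gt 128 ]; then
        # By shell convention, status > 128 means "killed by signal (status - 128)".
        sig_name="$(kill -l "$((status - 128))" 2>/dev/null || echo "unknown")"
        echo "killed by signal SIG${sig_name}"
    elif [ "${status}" -ne 0 ]; then
        echo "exited with error code ${status}"
    else
        echo "success"
    fi
}
```

To use it, run the cbust command from above and then call `describe_exit_status $?` on the next line. A program can also return a status above 128 on its own, so treat the decoded signal as a hint rather than proof.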

gilgolan73 · 2024-12-09T11:27:44Z

Hi,
I get the following output: "Illegal instruction (core dumped)"

Thank you,
Gil

ghuls · 2024-12-09T17:11:54Z

It looks like your CPU does not support all instructions used in the precompiled version of Cluster-Buster.

You will have to compile Cluster-Buster yourself (skip cbust_amd_libm_aocc lines):
https://github.com/aertslab/create_cisTarget_databases?tab=readme-ov-file#compile-from-source
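
Before recompiling, it can help to see which instruction-set extensions the CPU reports. The sketch below assumes an x86 Linux machine where /proc/cpuinfo has a `flags` line. The flag names in the example call are an assumption for illustration only, because the thread does not say which instructions the precompiled cbust binary needs. `list_cpu_flags` is a hypothetical helper name.

```shell
# List which of the given instruction-set flags appear in a cpuinfo-style file.
# Usage: list_cpu_flags FILE FLAG...
list_cpu_flags() {
    local cpuinfo="$1"; shift
    local flag
    for flag in "$@"; do
        # -w matches whole words only, so "avx" does not match "avx2".
        if grep -m1 '^flags' "${cpuinfo}" | grep -qw -- "${flag}"; then
            echo "${flag}: supported"
        else
            echo "${flag}: missing"
        fi
    done
}

# Example flag list (an assumption, not a documented requirement of cbust).
if [ -r /proc/cpuinfo ]; then
    list_cpu_flags /proc/cpuinfo sse4_2 avx avx2 fma
fi
```

A "missing" entry only shows that the CPU lacks that extension. It does not prove that the binary uses it, so compiling from source as suggested above remains the actual fix.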

gilgolan73 · 2024-12-10T11:24:19Z

Hello,
I compiled Cluster-Buster and now the command works.
I am now trying to run the command below, but it is very slow (output appears on the screen only every few minutes):
"${SCRIPT_DIR}/create_cistarget_motif_databases.py" -f ${FASTA_FILE} -M ${CBDIR} -m ${MOTIF_LIST} -o ${OUT_DIR}/${DATABASE_PREFIX} --bgpadding 1000 -t 20 -c "/home/gilgolan/bioinfo_analysis/scenicplus_multiomic/cistarget/cbust"
How long should it take? Is there any way to make it faster?

Thanks,
Gil

ghuls · 2024-12-11T09:14:38Z

Depending on the motif file, it takes between 2 minutes and 40 minutes (the latter only for a few very big metacluster motifs) per motif.

gilgolan73 · 2024-12-11T09:18:35Z

Hi,
The motif file includes ~10,000 motifs. So I guess it will take a few weeks. Is there a way to filter motifs to shorten the time?

Thank you,
Gil

ghuls · 2024-12-11T09:53:04Z

It won't take a few weeks, normally only a few hours as 20 motifs (number of threads you specified) are being scored at the same time. Also take into account that the motifs that take the longest time are scored at the start (so we don't run into a situation that only 1 slow motif file keeps running at the end for 40 minutes).
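
The difference between the serial and the parallel estimate can be sketched with some back-of-the-envelope arithmetic. The helper below is hypothetical and deliberately crude: it assumes every motif takes the same time and that threads are perfectly used. The real per-motif time depends on the FASTA file, the motif, and the Cluster-Buster binary, so measure a few motifs on your own data rather than trusting the example numbers.

```shell
# Rough wall-clock estimate in minutes (rounded up), assuming a uniform
# per-motif time and perfect use of all threads.
# Usage: estimate_wall_minutes NBR_MOTIFS MINUTES_PER_MOTIF NBR_THREADS
estimate_wall_minutes() {
    local nbr_motifs="$1" minutes_per_motif="$2" nbr_threads="$3"
    echo "$(( (nbr_motifs * minutes_per_motif + nbr_threads - 1) / nbr_threads ))"
}

# Illustrative inputs taken from this thread: 10249 motifs, the lower
# 2-minute figure, and 1 thread versus 20 threads.
serial_minutes="$(estimate_wall_minutes 10249 2 1)"
parallel_minutes="$(estimate_wall_minutes 10249 2 20)"
echo "1 thread:   about $(( serial_minutes / 60 / 24 )) days"
echo "20 threads: about $(( parallel_minutes / 60 )) hours"
```

With these inputs the single-thread figure lands in the range of weeks and the 20-thread figure in the range of hours, which is the point being made above. Treat the exact numbers as an order-of-magnitude guide only.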

gilgolan73 · 2024-12-11T09:56:43Z

OK,
I will try to run it. Is there a recommended number of threads? (I have ~14 virtual CPUs.)

Gil

ghuls · 2024-12-11T10:11:29Z

Preferably run with as many threads as you have CPUs (or one fewer, to leave room for some overhead), so in your case with -t 14 or -t 13.

I have run it here before on one node of the cluster with 72 CPUs with -t 71.
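
The "CPUs minus one" rule can be derived on the machine itself. This is a minimal sketch that assumes GNU coreutils `nproc` is available. `pick_nbr_threads` is a hypothetical helper name, and the result is meant to be passed to the `-t` option used earlier in this thread.

```shell
# Pick a thread count: available CPUs minus one, but never less than 1.
pick_nbr_threads() {
    local cpus="$1"
    if [ "${cpus}" -gt 1 ]; then
        echo "$(( cpus - 1 ))"
    else
        echo 1
    fi
}

# nproc reports the number of processing units available to this process.
nbr_threads="$(pick_nbr_threads "$(nproc)")"
echo "suggested option: -t ${nbr_threads}"
```

On a shared cluster node, `nproc` may report fewer CPUs than the machine has if the scheduler restricts the job, which is usually the number you want here.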

If you have access to a cluster, I recommend running it there, preferably with the original Cluster-Buster binary, as it is optimized for speed.

In case running it with 13 or 14 threads is too slow and you have access to multiple machines, you could run the motif scoring on multiple nodes, but in that case you will have to run an additional script to combine the results afterwards.

https://github.com/aertslab/create_cisTarget_databases?tab=readme-ov-file#score-motifs-in-different-parts-and-generate-rankings-in-a-separate-step

# FASTA file with sequences per region IDs / gene IDs.
fasta_filename=
# Directory with motifs in Cluster-Buster format.
motifs_dir=
# File with motif IDs (base name of motif file in ${motifs_dir}).
motifs_list_filename=
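# cisTarget motif database output prefix.
db_prefix=
# Directory of the create_cisTarget_databases checkout (contains the scripts used below).
create_cistarget_databases_dir=

nbr_threads=13
# For example, if you have 10 machines available.
nbr_total_parts=10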

# Create a partial directory, so partial cisTarget database files can be deleted easily afterwards.
mkdir partial

# Each iteration of the for loop (with a different ${current_part}) can also be submitted to a different node
# to speed up the motif scoring.

for current_part in $(seq 1 ${nbr_total_parts}) ; do
    # Print command line for partial motif scoring (-p parameter added) (and run one line on each node).
    echo "${create_cistarget_databases_dir}/create_cistarget_motif_databases.py" \
         -f "${fasta_filename}" \
         -M "${motifs_dir}" \
         -m "${motifs_list_filename}" \
         --bgpadding 1000 \
         -p "${current_part}" "${nbr_total_parts}" \
         -o "partial/${db_prefix}" \
         -t "${nbr_threads}"
done

# After partial motif scoring, combine the results into one database:
"${create_cistarget_databases_dir}/combine_partial_motifs_or_tracks_vs_regions_or_genes_cistarget_dbs.py" \
    -i partial/ \
    -o .

# Partial cisTarget databases can be removed.
#rm -r partial

gilgolan73 · 2024-12-11T10:13:23Z

I understand, thank you very much for the help

gilgolan73 · 2024-12-22T14:52:49Z

Hi @ghuls,
I managed to build the cistarget database, and used it for eGRN inference using the scenic+ pipeline.
When I looked at the results, I saw that some TFs are missing (which are known to be important regulators in this data).
The same thing happened when I used the pre-computed mouse cisTarget database.
When I looked at the cistarget databases (either custom or pre-computed), I saw that I can find their motifs.

Is there a way to manually add their motifs to the motif collection (from which the cisTarget database is built)? These TFs have annotated motifs in the JASPAR database, and their motifs also appear in the SCENIC database.

Note: when I look at the motif annotation table for mouse (https://resources.aertslab.org/cistarget/motif2tf/motifs-v10nr_clust-nr.mgi-m0.001-o0.0.tbl), I see these TFs in the gene_name column.

Thank you,
Gil
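
One way to follow up on a missing TF is to list the motif IDs annotated to it in the motif2tf table, then check whether those IDs appear in the motif list that was passed with -m. The sketch below assumes the table is tab-separated, has a header row containing a `gene_name` column (the column mentioned above), and keeps the motif ID in the first column. `motifs_for_tf` is a hypothetical helper and the TF name in the usage note is only an example.

```shell
# Print the first column (assumed to be the motif ID) of every row whose
# gene_name column equals the given TF name.
# Usage: motifs_for_tf TABLE TF_NAME
motifs_for_tf() {
    awk -F '\t' -v tf="$2" '
        # Find the gene_name column by its header instead of hard-coding an index.
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "gene_name") col = i; next }
        col && $col == tf { print $1 }
    ' "$1"
}
```

For example, `motifs_for_tf motifs-v10nr_clust-nr.mgi-m0.001-o0.0.tbl Pou1f1` would print the motif IDs linked to that gene under these assumptions. This only shows which motifs are annotated to the TF. It does not explain by itself why the TF is absent from the eGRN results.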
