create_cistarget_motif_databases #54
Comments
What is the output of one of the cbust commands?
/home/gilgolan/bioinfo_analysis/scenicplus_multiomic/cistarget/cbust \
    -f 4 \
    -c 0.0 \
    -r 10000 \
    -b 1000 \
    -t 1 \
    /home/gilgolan/bioinfo_analysis/scenicplus_multiomic/cistarget/aertslab_motif_colleciton/v10nr_clust_public/singletons/yetfasco__YDR174W_2249.cb \
    /home/gilgolan/bioinfo_analysis/scenicplus_multiomic/cistarget/mm10.p3_pituitary_multiomic.with_1kb_bg_padding.fa
It likely gives an error at the moment.
Hi, Thank you,
It looks like your CPU does not support all instructions used in the precompiled version of Cluster-Buster. You will have to compile Cluster-Buster yourself (skip
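One way to check this diagnosis is to look at the exit status of a single cbust run. The following is a minimal sketch, assuming a Linux shell, where a process killed by SIGILL (illegal instruction, signal 4) is reported with exit status 132 (128 + 4). `describe_exit_status` is a hypothetical helper, not part of Cluster-Buster, and the paths in the usage comment are placeholders.

```shell
# Hypothetical helper: translate a shell exit status into a short hint.
describe_exit_status() {
    local status="$1"
    if [ "${status}" -eq 0 ]; then
        echo "success"
    elif [ "${status}" -eq 132 ]; then
        # 132 = 128 + 4 (SIGILL on Linux): the CPU hit an instruction it does not support.
        echo "illegal instruction (SIGILL)"
    elif [ "${status}" -gt 128 ]; then
        echo "killed by signal $(( status - 128 ))"
    else
        echo "failed with exit status ${status}"
    fi
}

# Usage (placeholder paths): run one cbust command from the error log, then inspect "$?".
# /path/to/cbust -f 4 -c 0.0 -r 10000 -b 1000 -t 1 motif.cb regions.fa > /dev/null
# describe_exit_status "$?"
```

If the helper reports an illegal instruction, that would be consistent with the precompiled binary using CPU instructions the machine lacks; any other status points to a different problem, such as a wrong path or a malformed input file.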
Hello, Thanks,
Depending on the motif file, scoring takes between 2 and 40 minutes per motif (the latter only for a few very large metacluster motifs).
Hi, Thank you,
It won't take a few weeks; normally it takes only a few hours, as 20 motifs (the number of threads you specified) are scored at the same time. Also take into account that the motifs that take the longest are scored first (so we don't end up with a single slow motif file still running for 40 minutes at the end).
OK, Gil
Preferably run with the same number of threads as you have CPUs (or one fewer to account for some overhead), so in your case with […]. I have run it here before on one node of the cluster with 72 CPUs with […]. If you have access to a cluster, I recommend running it there, preferably with the original Cluster-Buster binary, as it is optimized for speed. If running it with 13 or 14 threads is too slow and you have access to multiple machines, you could run the motif scoring on multiple nodes, but in that case you will have to run an additional script to combine the results afterwards.
# FASTA file with sequences per region IDs / gene IDs.
fasta_filename=
# Directory with motifs in Cluster-Buster format.
motifs_dir=
# File with motif IDs (base name of motif file in ${motifs_dir}).
motifs_list_filename=
# cisTarget motif database output prefix.
db_prefix=
# Directory with the create_cisTarget_databases scripts.
create_cistarget_databases_dir=
nbr_threads=13
# For example if you have 10 machines available
nbr_total_parts=10
# Create a partial directory, so partial cisTarget database files can be deleted easily afterwards.
mkdir partial
# Each invocation of the for loop (with a different ${current_part}) can also be submitted to a different node to speed up
# the motif scoring.
for current_part in $(seq 1 ${nbr_total_parts}) ; do
    # Print command line for partial motif scoring (-p parameter added) (and run one line on each node).
    echo "${create_cistarget_databases_dir}/create_cistarget_motif_databases.py" \
        -f "${fasta_filename}" \
        -M "${motifs_dir}" \
        -m "${motifs_list_filename}" \
        --bgpadding 1000 \
        -p "${current_part}" "${nbr_total_parts}" \
        -o "partial/${db_prefix}" \
        -t "${nbr_threads}"
done
# After partial motif scoring, combine the results to one database:
"${create_cistarget_databases_dir}/combine_partial_motifs_or_tracks_vs_regions_or_genes_cistarget_dbs.py" \
    -i partial/ \
    -o .
# Partial cisTarget databases can be removed.
#rm -r partial
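The thread-count advice above (as many threads as CPUs, or one fewer to leave room for overhead) can be derived on each machine instead of being hard-coded. This is a minimal sketch, assuming GNU coreutils `nproc` is available; the variable names `nbr_cpus` and `nbr_threads` are chosen here to match the script and are otherwise arbitrary.

```shell
# Number of CPUs available to this process (nproc is part of GNU coreutils).
nbr_cpus=$(nproc)

# Leave one CPU free for overhead, but never go below one thread.
if [ "${nbr_cpus}" -gt 1 ]; then
    nbr_threads=$(( nbr_cpus - 1 ))
else
    nbr_threads=1
fi

echo "Using ${nbr_threads} of ${nbr_cpus} CPUs"
```

The resulting `${nbr_threads}` can then be passed to the `-t` option in place of a fixed value. On a cluster node with a restricted CPU allocation, `nproc` reports the CPUs available to the current process, which is usually the number you want.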
I understand, thank you very much for the help.
Hi @ghuls, Is there a way to manually add their motifs to the motif collection (from which the cisTarget database is built)? These TFs have annotated motifs in the JASPAR database, and their motifs also appear in the SCENIC database. ** When I look at the motif annotation database for mouse (https://resources.aertslab.org/cistarget/motif2tf/motifs-v10nr_clust-nr.mgi-m0.001-o0.0.tbl), I see these TFs (in the column gene_name). Thank you,
Describe the bug
Hi,
I am trying to create a custom cisTarget motif database following the tutorial:
https://scenicplus.readthedocs.io/en/latest/human_cerebellum_ctx_db.html
However, when I run create_cistarget_motif_databases.py, I get an error saying that scoring failed for all of the motifs (see the error message below).
Please help.
To Reproduce
"${SCRIPT_DIR}/create_cistarget_motif_databases.py" -f ${FASTA_FILE} -M ${CBDIR} -m ${MOTIF_LIST} -o ${OUT_DIR}/${DATABASE_PREFIX} --bgpadding 1000 -t 20 -c "/home/gilgolan/bioinfo_analysis/scenicplus_multiomic/cistarget/cbust"
Error output
For every motif, I get a message like this:
Error: Non-zero exit status for: '/home/gilgolan/bioinfo_analysis/scenicplus_multiomic/cistarget/cbust -f 4 -c 0.0 -r 10000 -b 1000 -t 1 /home/gilgolan/bioinfo_analysis/scenicplus_multiomic/cistarget/aertslab_motif_colleciton/v10nr_clust_public/singletons/transfac_public__M00159.cb /home/gilgolan/bioinfo_analysis/scenicplus_multiomic/cistarget/mm10.p3_pituitary_multiomic.with_1kb_bg_padding.fa'
Error: Non-zero exit status for: '/home/gilgolan/bioinfo_analysis/scenicplus_multiomic/cistarget/cbust -f 4 -c 0.0 -r 10000 -b 1000 -t 1 /home/gilgolan/bioinfo_analysis/scenicplus_multiomic/cistarget/aertslab_motif_colleciton/v10nr_clust_public/singletons/yetfasco__YDR174W_2249.cb /home/gilgolan/bioinfo_analysis/scenicplus_multiomic/cistarget/mm10.p3_pituitary_multiomic.with_1kb_bg_padding.fa'
Scoring 10249 motifs with Cluster-Buster took: 305.962677 seconds
Error: None of 10249 motifs were scored successfully.
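To see why an individual scoring job failed, it can help to pull one of the failing commands out of the log and rerun it by hand. The following is a minimal sketch, assuming the error lines have exactly the `Error: Non-zero exit status for: '...'` form shown above; `extract_failed_commands` is a hypothetical helper, and the sample log line uses placeholder paths.

```shell
# Hypothetical helper: print the command from each "Non-zero exit status" line read on stdin.
extract_failed_commands() {
    sed -n "s/^Error: Non-zero exit status for: '\(.*\)'\$/\1/p"
}

# Example with a made-up log (paths are placeholders, not real files).
sample_log="Error: Non-zero exit status for: '/path/to/cbust -f 4 -c 0.0 -r 10000 -b 1000 -t 1 motif.cb regions.fa'
Scoring 1 motifs with Cluster-Buster took: 0.1 seconds"

# Keep only the first failing command; rerunning one is usually enough to see the real error.
first_failed_command=$(printf '%s\n' "${sample_log}" | extract_failed_commands | head -n 1)
echo "${first_failed_command}"
```

Running the extracted command directly in a terminal shows Cluster-Buster's own error message and exit status, which the summary lines above do not include.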
Expected behavior
The motifs are scored and a cisTarget database is created.
Version (please complete the following information):
Python: 3.11.10
SCENIC+: 1.0a1