I encountered a Bus error while trying to cluster a nearly 800GB FASTA file using mmseqs easy-linclust. Below are my command, error message, and system configuration details. I would appreciate your guidance on resolving this issue.
Command:
#!/bin/bash
#SBATCH --job-name=clust                  # Job name
#SBATCH --output=logs/easy_clust_%j.log   # Output log file (%j will be replaced with the job ID)
#SBATCH --error=logs/easy_clust_%j.log    # Error log file (%j will be replaced with the job ID)
#SBATCH --ntasks=1                        # Number of tasks
#SBATCH --nodes=1                         # Number of nodes
#SBATCH --cpus-per-task=40
#SBATCH --gres=gpu:1                      # Number of GPUs
#SBATCH --partition=stdg_defq             # Partition name
#SBATCH --time=168:00:00                  # Time limit (hh:mm:ss)

# Load necessary modules
module load mamba-24.3   # Example: load any necessary modules
source activate /exchange/xx

# Print job information
echo "Job ID: $SLURM_JOB_ID"
echo "Node List: $SLURM_JOB_NODELIST"
echo "Submit Directory: $SLURM_SUBMIT_DIR"

# Run your application
mmseqs easy-linclust /dfs/is/home/x266288/data_process/assets/FASTA/merged_all.fasta /dfs/is/home/x266288/data_process/assets/db/clustered/indi+oas/clustedRes /dfs/is/home/x266288/data_process/tmp_dir/tmp_dir --min-seq-id 0.95 --cov-mode 1 -c 1.0   # Replace with your actual application command
Error Message:
Job ID: 192313
Node List: stdg22
Submit Directory: /home-cdo/x266288/data_process/utils
easy-linclust /dfs/is/home/x266288/data_process/assets/FASTA/merged_all.fasta /dfs/is/home/x266288/data_process/assets/db/clustered/indi+oas/clustedRes /dfs/is/home/x266288/data_process/tmp_dir/tmp_dir --min-seq-id 0.95 --cov-mode 1 -c 1.0
MMseqs Version: 13.45111
Cluster mode 0
Max connected component depth 1000
Similarity type 2
Threads 40
Compressed 0
Verbosity 3
Substitution matrix nucl:nucleotide.out,aa:blosum62.out
Add backtrace false
Alignment mode 0
Alignment mode 0
Allow wrapped scoring false
E-value threshold 0.001
Seq. id. threshold 0.95
Min alignment length 0
Seq. id. mode 0
Alternative alignments 0
Coverage threshold 1
Coverage mode 1
Max sequence length 65535
Compositional bias 1
Max reject 2147483647
Max accept 2147483647
Include identical seq. id. false
Preload mode 0
Pseudo count a 1
Pseudo count b 1.5
Score bias 0
Realign hits false
Realign score bias -0.2
Realign max seqs 2147483647
Gap open cost nucl:5,aa:11
Gap extension cost nucl:2,aa:1
Zdrop 40
Alphabet size nucl:5,aa:21
k-mers per sequence 21
Spaced k-mers 0
Spaced k-mer pattern
Scale k-mers per sequence nucl:0.200,aa:0.000
Adjust k-mer length false
Mask residues 1
Mask lower case residues 0
k-mer length 0
Shift hash 67
Split memory limit 0
Include only extendable false
Skip repeating k-mers false
Rescore mode 0
Remove hits by seq. id. and coverage false
Sort results 0
Remove temporary files true
Force restart with latest tmp false
MPI runner
Database type 0
Shuffle input database true
Createdb mode 1
Write lookup file 0
Offset of numeric ids 0
linclust /dfs/is/home/x266288/data_process/tmp_dir/tmp_dir/1053738512421706396/input /dfs/is/home/x266288/data_process/tmp_dir/tmp_dir/1053738512421706396/clu /dfs/is/home/x266288/data_process/tmp_dir/tmp_dir/1053738512421706396/clu_tmp -e 0.001 --min-seq-id 0.95 -c 1 --cov-mode 1 --spaced-kmer-mode 0 --remove-tmp-files 1
Set cluster mode GREEDY MEM.
kmermatcher /dfs/is/home/x266288/data_process/tmp_dir/tmp_dir/1053738512421706396/input /dfs/is/home/x266288/data_process/tmp_dir/tmp_dir/1053738512421706396/clu_tmp/12397887837406899853/pref --sub-mat nucl:nucleotide.out,aa:blosum62.out --alph-size nucl:5,aa:13 --min-seq-id 0.95 --kmer-per-seq 21 --spaced-kmer-mode 0 --kmer-per-seq-scale nucl:0.200,aa:0.000 --adjust-kmer-len 0 --mask 0 --mask-lower-case 0 --cov-mode 1 -k 0 -c 1 --max-seq-len 65535 --hash-shift 67 --split-memory-limit 0 --include-only-extendable 0 --ignore-multi-kmer 0 --threads 40 --compressed 0 -v 3
kmermatcher /dfs/is/home/x266288/data_process/tmp_dir/tmp_dir/1053738512421706396/input /dfs/is/home/x266288/data_process/tmp_dir/tmp_dir/1053738512421706396/clu_tmp/12397887837406899853/pref --sub-mat nucl:nucleotide.out,aa:blosum62.out --alph-size nucl:5,aa:13 --min-seq-id 0.95 --kmer-per-seq 21 --spaced-kmer-mode 0 --kmer-per-seq-scale nucl:0.200,aa:0.000 --adjust-kmer-len 0 --mask 0 --mask-lower-case 0 --cov-mode 1 -k 0 -c 1 --max-seq-len 65535 --hash-shift 67 --split-memory-limit 0 --include-only-extendable 0 --ignore-multi-kmer 0 --threads 40 --compressed 0 -v 3
Database size: 2080936687 type: Nucleotide
Not enough memory to process at once need to split
[=================================================================] 2.08B 33m 39s 920ms
Process file into 11 parts
Generate k-mers list for 1 split
[=================================================================] 2.08B 37m 43s 776ms
Adjusted k-mer length 19
Sort kmer 0h 4m 42s 840ms
Sort by rep. sequence 0h 1m 40s 458ms
Generate k-mers list for 2 split
[=================================================================] 2.08B 37m 40s 661ms
Adjusted k-mer length 19
Sort kmer 0h 2m 55s 392ms
Sort by rep. sequence 0h 1m 43s 902ms
Generate k-mers list for 3 split
[=================================================================] 2.08B 36m 51s 84ms
Adjusted k-mer length 19
Sort kmer 0h 2m 55s 543ms
Sort by rep. sequence 0h 1m 41s 750ms
Generate k-mers list for 4 split
[=================================================================] 2.08B 37m 24s 796ms
Adjusted k-mer length 19
Sort kmer 0h 2m 52s 357ms
Sort by rep. sequence 0h 1m 40s 557ms
Generate k-mers list for 5 split
[=================================================================] 2.08B 37m 57s 412ms
Adjusted k-mer length 19
Sort kmer 0h 2m 57s 804ms
Sort by rep. sequence 0h 1m 39s 453ms
Generate k-mers list for 6 split
[=================================================================] 2.08B 37m 10s 891ms
Adjusted k-mer length 19
Sort kmer 0h 2m 55s 794ms
Sort by rep. sequence 0h 1m 38s 542ms
Generate k-mers list for 7 split
[=================================================================] 2.08B 36m 53s 9ms
Adjusted k-mer length 19
Sort kmer 0h 2m 55s 788ms
Sort by rep. sequence 0h 1m 40s 551ms
Generate k-mers list for 8 split
[=================================================================] 2.08B 36m 54s 754ms
Adjusted k-mer length 19
Sort kmer 0h 2m 49s 532ms
Sort by rep. sequence 0h 1m 40s 244ms
Generate k-mers list for 9 split
[=================================================================] 2.08B 36m 24s 93ms
Adjusted k-mer length 19
Sort kmer 0h 2m 58s 556ms
Sort by rep. sequence 0h 1m 37s 893ms
Generate k-mers list for 10 split
[=================================================================] 2.08B 36m 46s 198ms
Adjusted k-mer length 19
Sort kmer 0h 2m 57s 392ms
Sort by rep. sequence 0h 1m 36s 238ms
Generate k-mers list for 11 split
[=================================================================
/dfs/is/home/x266288/data_process/tmp_dir/tmp_dir/1053738512421706396/clu_tmp/12397887837406899853/linclust.sh: line 26: 23857 Bus error (core dumped) $RUNNER"$MMSEQS" kmermatcher "$INPUT""${TMP_PATH}/pref"${KMERMATCHER_PAR}
Error: kmermatcher died
Error: Search died
System Configuration:
MMseqs2 Version: 13.45111
MEM: 378G
From the error message, it seems related to memory allocation or hardware limitations, but I am unsure how to debug or fix this issue. If you could provide any suggestions or debugging tips, it would be greatly appreciated!
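As a first debugging step, it can help to confirm exactly how far kmermatcher got before it died. The sketch below counts finished splits using only the log lines shown in this post ("Process file into N parts" and "Sort by rep. sequence"). The function name `split_progress` is a hypothetical helper, not part of MMseqs2, and it assumes those lines start at the beginning of a line, as they do in the pasted log.

```shell
# Hypothetical helper: report how many k-mer splits finished in an MMseqs2 log.
# Assumes the log format pasted in this issue; other versions may differ.
split_progress() {
    log_file="$1"
    # Total number of splits announced by kmermatcher.
    total=$(sed -n 's/^Process file into \([0-9][0-9]*\) parts.*/\1/p' "$log_file" | head -n 1)
    # Each finished split ends with a "Sort by rep. sequence" line.
    # grep -c exits non-zero when nothing matches, so guard it with "|| true".
    finished=$(grep -c '^Sort by rep\. sequence' "$log_file" || true)
    echo "finished ${finished} of ${total:-unknown} splits"
}

# Example call (the log path follows the --output pattern and job ID above):
# split_progress logs/easy_clust_192313.log
```

On the log as pasted here, this would report 10 of 11 splits finished, which matches the crash happening while the last split was being generated.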
While the final split was being processed, the program appeared to hang for a long time. During that period, htop showed a large portion of memory still free, and CPU utilization was low.
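Free RAM together with low CPU use suggests the process may have been waiting on I/O rather than on memory. A Bus error (SIGBUS) can be raised when a program touches a memory-mapped file whose filesystem has run out of space or hit a quota. Whether that applies to this run is an assumption, since nothing in the log confirms it. A minimal sketch for checking free space on the filesystem that holds the tmp directory follows; `free_kib` is a hypothetical helper name.

```shell
# Hypothetical helper: free space (in KiB) on the filesystem holding a path.
free_kib() {
    # -P forces POSIX single-line output, -k reports 1024-byte blocks;
    # column 4 of the second line is the available space.
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Example call with the tmp directory from the command above:
# echo "free on tmp filesystem: $(free_kib /dfs/is/home/x266288/data_process/tmp_dir/tmp_dir) KiB"
```

Running this on the compute node, before and during the final split, would show whether the tmp filesystem is filling up. On a shared filesystem a per-user quota may apply that `df` does not report, so this check can rule the cause in but not fully rule it out.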