error with 470 genomes #947

macmanes · 2024-12-02T13:36:28Z

Hi All,

I'm having an issue running OrthoFinder on 470 genomes in protein space. It occurs at the end of the "initial processing" steps. The full error message is below, but I think it boils down to this line:

numpy.core._exceptions._ArrayMemoryError: Unable to allocate 1.63 MiB for an array with shape (428224,) and data type int32

Full error message:

2024-12-01 17:49:48 : Initial processing of species 466 complete
2024-12-01 18:09:00 : Initial processing of species 468 complete
2024-12-01 18:16:52 : Initial processing of species 469 complete
2024-12-01 18:21:37 : Initial processing of species 470 complete
Process Process-95:
Traceback (most recent call last):
  File "/mnt/lustre/software/anaconda/colsa/envs/orthofinder-2.5.5/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/mnt/lustre/software/anaconda/colsa/envs/orthofinder-2.5.5/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/mnt/gpfs01/software/anaconda/colsa/envs/orthofinder-2.5.5/bin/scripts_of/__main__.py", line 560, in Worker_ConnectCognates
    WaterfallMethod.ConnectCognates(*args, d_pickle=d_pickle)
  File "/mnt/gpfs01/software/anaconda/colsa/envs/orthofinder-2.5.5/bin/scripts_of/__main__.py", line 549, in ConnectCognates
    B = matrices.LoadMatrixArray("B", seqsInfo, iSpecies, d_pickle)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/gpfs01/software/anaconda/colsa/envs/orthofinder-2.5.5/bin/scripts_of/matrices.py", line 54, in LoadMatrixArray
    matrixArray.append(LoadMatrix(name, iSpecies, jSpecies, d_pickle))
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/gpfs01/software/anaconda/colsa/envs/orthofinder-2.5.5/bin/scripts_of/matrices.py", line 47, in LoadMatrix
    M = pic.load(picFile)
        ^^^^^^^^^^^^^^^^^


...


MemoryError
Process Process-111:
Traceback (most recent call last):
  File "/mnt/lustre/software/anaconda/colsa/envs/orthofinder-2.5.5/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
  File "/mnt/lustre/software/anaconda/colsa/envs/orthofinder-2.5.5/lib/python3.12/multiprocessing/process.py", line 108, in run
  File "/mnt/gpfs01/software/anaconda/colsa/envs/orthofinder-2.5.5/bin/scripts_of/__main__.py", line 560, in Worker_ConnectCognates
  File "/mnt/gpfs01/software/anaconda/colsa/envs/orthofinder-2.5.5/bin/scripts_of/__main__.py", line 550, in ConnectCognates
  File "/mnt/gpfs01/software/anaconda/colsa/envs/orthofinder-2.5.5/bin/scripts_of/__main__.py", line 620, in ConnectAllBetterThanAnOrtholog_s
  File "/mnt/gpfs01/software/anaconda/colsa/envs/orthofinder-2.5.5/bin/scripts_of/__main__.py", line 589, in GetMostDistant_s
  File "/mnt/lustre/software/anaconda/colsa/envs/orthofinder-2.5.5/lib/python3.12/site-packages/scipy/sparse/_lil.py", line 412, in tocsr
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 1.63 MiB for an array with shape (428224,) and data type int32
Process Process-108:
Traceback (most recent call last):
  File "/mnt/lustre/software/anaconda/colsa/envs/orthofinder-2.5.5/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/mnt/lustre/software/anaconda/colsa/envs/orthofinder-2.5.5/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/mnt/gpfs01/software/anaconda/colsa/envs/orthofinder-2.5.5/bin/scripts_of/__main__.py", line 560, in Worker_ConnectCognates
    WaterfallMethod.ConnectCognates(*args, d_pickle=d_pickle)
  File "/mnt/gpfs01/software/anaconda/colsa/envs/orthofinder-2.5.5/bin/scripts_of/__main__.py", line 549, in ConnectCognates
    B = matrices.LoadMatrixArray("B", seqsInfo, iSpecies, d_pickle)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I have 700 GB of RAM available and am using 64-bit Python. Slurm gives no indication that this is a RAM or disk issue.
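A failed 1.63 MiB allocation on a node with this much RAM could mean the process is hitting a per-process limit rather than running out of physical memory. That is only a guess, but it is cheap to check from inside the job. The snippet below is a minimal sketch, assuming Linux and the standard-library `resource` module; the helper names `describe_limit` and `address_space_limits` are made up for illustration and are not part of OrthoFinder.

```python
import resource


def describe_limit(value):
    """Render an rlimit value in GiB, or 'unlimited' if no limit is set."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value / 2**30:.1f} GiB"


def address_space_limits():
    """Return the (soft, hard) virtual address space limits of this process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    return describe_limit(soft), describe_limit(hard)


if __name__ == "__main__":
    soft, hard = address_space_limits()
    # A finite value well below the node's RAM would point to a ulimit or
    # scheduler-imposed cap rather than a genuine shortage of memory.
    print(f"RLIMIT_AS soft={soft} hard={hard}")
```

Running this inside the same Slurm allocation as the OrthoFinder job shows the limits the worker processes would inherit; running it on a login node may give different values.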

Thoughts about this? Any help appreciated.

Jonathan-Holmes-Bioinformatics · 2024-12-05T14:45:44Z

Hi macmanes,

Running 470 species on orthofinder-2.5.5 is quite a challenge (16+ days). You will also be creating a very large matrix file, which might max out your RAM. Are you running this with MAFFT or DendroBLAST?

I would recommend considering a switch to the new --core --assign function. To do this, sample a subset of your proteomes to build a core, and then assign the remaining proteomes using --assign. This is described on the main GitHub page.
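The subsetting step could be sketched as below. This is only an illustration: the function name `split_proteomes`, the core size of 64, the file names, and the use of random sampling are all assumptions. A phylogenetically representative core is likely a better choice than a random one, and the exact OrthoFinder commands should be taken from the GitHub page.

```python
import random


def split_proteomes(fasta_paths, n_core, seed=0):
    """Split proteome files into a core subset and the remainder.

    Sampling is random but reproducible for a given seed. Both returned
    lists keep the sorted order of the input.
    """
    paths = sorted(fasta_paths)
    if not 0 < n_core <= len(paths):
        raise ValueError("n_core must be between 1 and the number of proteomes")
    rng = random.Random(seed)
    chosen = set(rng.sample(paths, n_core))
    core = [p for p in paths if p in chosen]
    rest = [p for p in paths if p not in chosen]
    return core, rest


if __name__ == "__main__":
    # Hypothetical file names standing in for the 470 proteomes.
    proteomes = [f"species_{i:03d}.fa" for i in range(470)]
    core, rest = split_proteomes(proteomes, n_core=64)
    print(len(core), "core proteomes;", len(rest), "to assign afterwards")
```

The two lists can then be copied or symlinked into separate directories, one for building the core and one for the proteomes to be assigned.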