error with 470 genomes #947

Open
macmanes opened this issue Dec 2, 2024 · 1 comment

macmanes commented Dec 2, 2024

Hi All,

I'm having an issue running OrthoFinder on 470 genomes in protein space. The failure occurs at the end of the "initial processing" steps. The full error message is below, but I think it boils down to this line:

numpy.core._exceptions._ArrayMemoryError: Unable to allocate 1.63 MiB for an array with shape (428224,) and data type int32
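
As a quick sanity check on the size quoted in that message, the sketch below recomputes it from the reported shape and dtype. It assumes only that NumPy is installed. The request itself is tiny, which suggests the process was already at some memory ceiling when this particular allocation was attempted, rather than this array being the problem.

```python
import numpy as np

# Values copied from the error message above.
shape = (428224,)
dtype = np.dtype(np.int32)

# Bytes requested = number of elements * bytes per element.
n_bytes = int(np.prod(shape)) * dtype.itemsize
n_mib = n_bytes / 2**20

print(f"{n_bytes} bytes = {n_mib:.2f} MiB")  # 1712896 bytes = 1.63 MiB
```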

Full error message:

2024-12-01 17:49:48 : Initial processing of species 466 complete
2024-12-01 18:09:00 : Initial processing of species 468 complete
2024-12-01 18:16:52 : Initial processing of species 469 complete
2024-12-01 18:21:37 : Initial processing of species 470 complete
Process Process-95:
Traceback (most recent call last):
  File "/mnt/lustre/software/anaconda/colsa/envs/orthofinder-2.5.5/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/mnt/lustre/software/anaconda/colsa/envs/orthofinder-2.5.5/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/mnt/gpfs01/software/anaconda/colsa/envs/orthofinder-2.5.5/bin/scripts_of/__main__.py", line 560, in Worker_ConnectCognates
    WaterfallMethod.ConnectCognates(*args, d_pickle=d_pickle)
  File "/mnt/gpfs01/software/anaconda/colsa/envs/orthofinder-2.5.5/bin/scripts_of/__main__.py", line 549, in ConnectCognates
    B = matrices.LoadMatrixArray("B", seqsInfo, iSpecies, d_pickle)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/gpfs01/software/anaconda/colsa/envs/orthofinder-2.5.5/bin/scripts_of/matrices.py", line 54, in LoadMatrixArray
    matrixArray.append(LoadMatrix(name, iSpecies, jSpecies, d_pickle))
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/gpfs01/software/anaconda/colsa/envs/orthofinder-2.5.5/bin/scripts_of/matrices.py", line 47, in LoadMatrix
    M = pic.load(picFile)
        ^^^^^^^^^^^^^^^^^


...


MemoryError
Process Process-111:
Traceback (most recent call last):
  File "/mnt/lustre/software/anaconda/colsa/envs/orthofinder-2.5.5/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
  File "/mnt/lustre/software/anaconda/colsa/envs/orthofinder-2.5.5/lib/python3.12/multiprocessing/process.py", line 108, in run
  File "/mnt/gpfs01/software/anaconda/colsa/envs/orthofinder-2.5.5/bin/scripts_of/__main__.py", line 560, in Worker_ConnectCognates
  File "/mnt/gpfs01/software/anaconda/colsa/envs/orthofinder-2.5.5/bin/scripts_of/__main__.py", line 550, in ConnectCognates
  File "/mnt/gpfs01/software/anaconda/colsa/envs/orthofinder-2.5.5/bin/scripts_of/__main__.py", line 620, in ConnectAllBetterThanAnOrtholog_s
  File "/mnt/gpfs01/software/anaconda/colsa/envs/orthofinder-2.5.5/bin/scripts_of/__main__.py", line 589, in GetMostDistant_s
  File "/mnt/lustre/software/anaconda/colsa/envs/orthofinder-2.5.5/lib/python3.12/site-packages/scipy/sparse/_lil.py", line 412, in tocsr
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 1.63 MiB for an array with shape (428224,) and data type int32
Process Process-108:
Traceback (most recent call last):
  File "/mnt/lustre/software/anaconda/colsa/envs/orthofinder-2.5.5/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/mnt/lustre/software/anaconda/colsa/envs/orthofinder-2.5.5/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/mnt/gpfs01/software/anaconda/colsa/envs/orthofinder-2.5.5/bin/scripts_of/__main__.py", line 560, in Worker_ConnectCognates
    WaterfallMethod.ConnectCognates(*args, d_pickle=d_pickle)
  File "/mnt/gpfs01/software/anaconda/colsa/envs/orthofinder-2.5.5/bin/scripts_of/__main__.py", line 549, in ConnectCognates
    B = matrices.LoadMatrixArray("B", seqsInfo, iSpecies, d_pickle)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I do have 700 GB of RAM available and 64-bit Python. There is no indication from Slurm that this is a RAM or disk issue.
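
One thing that could be checked from inside the job is whether a per-process limit is lower than the RAM on the node. The sketch below is only a diagnostic idea, not something from the OrthoFinder documentation: it uses Python's standard `resource` module (Unix only) to print the soft and hard limits the interpreter sees. The helper name `describe_limit` is made up for illustration, and limits enforced in other ways (for example by a scheduler's cgroup) would not show up here.

```python
import resource


def describe_limit(name):
    """Return a readable soft/hard value for one RLIMIT_* constant (hypothetical helper)."""
    soft, hard = resource.getrlimit(getattr(resource, name))

    def fmt(value):
        if value == resource.RLIM_INFINITY:
            return "unlimited"
        return f"{value / 2**30:.1f} GiB"

    return f"{name}: soft={fmt(soft)}, hard={fmt(hard)}"


# Only query limits that exist on this platform.
candidates = ("RLIMIT_AS", "RLIMIT_DATA", "RLIMIT_RSS")
for limit_name in (n for n in candidates if hasattr(resource, n)):
    print(describe_limit(limit_name))
```

Running this inside the same Slurm allocation as OrthoFinder, rather than on a login node, would show the limits that the worker processes actually inherit.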

Thoughts about this? Any help appreciated.

@Jonathan-Holmes-Bioinformatics

Hi macmanes,

Running 470 species on orthofinder-2.5.5 is quite a challenge (+16 days). You will also be making a very large matrix file, which might max out your RAM. Are you running this with MAFFT or DendroBLAST?
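
To give a feel for the scale, here is a rough sketch of how the matrix count grows with the number of species. It assumes one matrix per ordered species pair, which is what the `LoadMatrix(name, iSpecies, jSpecies, d_pickle)` call in the traceback suggests, and that each worker holds the matrices for one species against all others at the same time. The function names, the worker count, and the per-matrix size are placeholders for illustration, not measurements.

```python
def n_pair_matrices(n_species: int) -> int:
    """Number of ordered species pairs (i, j), including i == j."""
    return n_species * n_species


def estimate_peak_bytes(n_workers: int, n_species: int, mean_matrix_bytes: int) -> int:
    """Rough peak if every worker holds one matrix per species simultaneously.

    All three arguments are assumptions supplied by the caller.
    """
    return n_workers * n_species * mean_matrix_bytes


print(n_pair_matrices(470))  # 220900

# Placeholder inputs only: 16 workers, 50 MiB per matrix.
peak = estimate_peak_bytes(n_workers=16, n_species=470, mean_matrix_bytes=50 * 2**20)
print(f"{peak / 2**30:.1f} GiB")
```

The point of the second function is that memory use scales with the number of parallel workers, so reducing parallelism for this step is one way the estimate could be brought down.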

I would recommend switching to the new --core --assign function. To do this, sample a subset of your proteomes to build a core, and then assign the remaining proteomes using --assign. You can find this information on the main GitHub page.
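
The sampling step could be sketched as below. This is only one possible approach: it picks the core uniformly at random, whereas a core chosen for phylogenetic breadth may be preferable. The function name `split_core_and_rest`, the file suffixes, and the core size are assumptions for illustration, and the demo uses empty placeholder files in a temporary directory. The exact OrthoFinder command lines are left to the main GitHub page.

```python
import random
import tempfile
from pathlib import Path


def split_core_and_rest(proteome_dir, n_core, seed=0,
                        suffixes=(".fa", ".faa", ".fasta")):
    """Split proteome files into a random core subset and the remainder."""
    files = sorted(
        p for p in Path(proteome_dir).iterdir()
        if p.is_file() and p.suffix in suffixes
    )
    if n_core > len(files):
        raise ValueError(f"asked for {n_core} core files, found {len(files)}")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    core = sorted(rng.sample(files, n_core))
    core_set = set(core)
    rest = [p for p in files if p not in core_set]
    return core, rest


# Demo with 10 empty placeholder files; real use would point at the proteome folder.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(10):
        (Path(tmp) / f"species_{i:03d}.faa").touch()
    core, rest = split_core_and_rest(tmp, n_core=3)
    print(len(core), len(rest))  # 3 7
```

The two returned lists could then be copied or symlinked into separate folders, one for building the core and one for the proteomes to be assigned afterwards.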
