3D Classification gets stuck immediately after noise estimation #473

Closed
clemensgrimm opened this issue May 23, 2019 · 18 comments

Comments

@clemensgrimm

I am experiencing a problem with a particular data.star file during 3D classification. The run always hangs immediately after noise estimation when launched with that file, regardless of what else is changed (see attached log).

A 3D refinement launched with the same data.star file runs fine. Using the data.star file from this refinement run as input to another 3D classification again stalls the classification run.

So far I have not been able to identify any obvious problem with the data.

Thanks for your help,
Clemens


RELION version: 3.0.5
Precision: BASE=double, CUDA-ACC=single

=== RELION MPI setup ===

  • Number of MPI processes = 6
  • Number of threads per MPI process = 8
  • Total number of threads therefore = 48
  • Master (0) runs on host = wbbc148
  • Slave 1 runs on host = wbbc148
  • Slave 2 runs on host = wbbc148
  • Slave 3 runs on host = wbbc148
  • Slave 4 runs on host = wbbc148
  • Slave 5 runs on host = wbbc148
=================
Running CPU instructions in double precision.
Estimating initial noise spectra
11.82/11.82 min ............................................................~~(,_,">
Setting subset size to 4500 particles

HANGS HERE <<<<<<<<<<<<<<<<<<<<<

@biochem-fan
Member

Can you find out which particle is problematic? For example, you can split your input particles in half and see which half causes the issue. Repeat this until the dataset becomes sufficiently small.
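The halving strategy suggested here can be sketched as a small search routine. This is only an illustration, not part of RELION: `find_bad_item` and the `is_bad` callback are hypothetical names, and the sketch assumes that exactly one particle triggers the problem and that the check gives the same answer every time. In practice `is_bad` would mean "run the job on this subset and see whether it hangs".

```python
from typing import Callable, Sequence


def find_bad_item(items: Sequence[str],
                  is_bad: Callable[[Sequence[str]], bool]) -> str:
    """Return the single item that makes is_bad() true, by repeated halving.

    Assumes exactly one offending item and a deterministic is_bad().
    """
    if not is_bad(items):
        raise ValueError("the full set does not reproduce the problem")
    lo, hi = 0, len(items)          # the culprit lies somewhere in items[lo:hi]
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(items[lo:mid]):   # culprit is in the first half
            hi = mid
        else:                       # otherwise it must be in the second half
            lo = mid
    return items[lo]


if __name__ == "__main__":
    # Made-up particle names; the lambda stands in for a real test run.
    particles = [f"particle_{i:04d}" for i in range(1000)]
    hangs = lambda subset: "particle_0637" in subset
    print(find_bad_item(particles, hangs))
```

With about 1000 particles this needs roughly ten test runs instead of one per particle. If the problem does not depend on a single particle, the search will not converge on anything meaningful, which is itself a useful hint.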

@biochem-fan
Member

@clemensgrimm
Author

clemensgrimm commented May 23, 2019 via email

@biochem-fan
Member

Unfortunately no.

@clemensgrimm
Author

OK, after splitting the data into four parts, all four parts work. I do not see any NaNs in the data.
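A check for NaN values like the one mentioned here can be automated rather than done by eye. The following is a minimal sketch, assuming a STAR file whose data rows are whitespace-separated and whose header lines start with `data_`, `loop_`, `_` or `#`. The function name `find_non_finite` and the sample rows are made up for illustration.

```python
from typing import Iterable, List, Tuple

# Spellings of non-finite numbers that commonly appear in text output.
BAD_TOKENS = {"nan", "-nan", "+nan", "inf", "-inf", "+inf"}


def find_non_finite(lines: Iterable[str]) -> List[Tuple[int, int, str]]:
    """Return (line_number, column, token) for every NaN/Inf token in data rows."""
    hits = []
    for line_no, line in enumerate(lines, start=1):
        stripped = line.strip()
        # Skip blank lines, comments, block names, loop markers and column labels.
        if not stripped or stripped.startswith(("#", "data_", "loop_", "_")):
            continue
        for col, token in enumerate(stripped.split(), start=1):
            if token.lower() in BAD_TOKENS:
                hits.append((line_no, col, token))
    return hits


if __name__ == "__main__":
    # Hypothetical three-particle file; the values are invented.
    sample = [
        "data_",
        "",
        "loop_",
        "_rlnImageName #1",
        "_rlnDefocusU #2",
        "000001@stack.mrcs 10234.5",
        "000002@stack.mrcs nan",
        "000003@stack.mrcs 9876.1",
    ]
    for line_no, col, token in find_non_finite(sample):
        print(f"line {line_no}, column {col}: {token}")
```

To scan a real file, pass an open file handle instead of the sample list. The check only looks at the metadata in the STAR file, not at the pixel values of the particle images themselves.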

@clemensgrimm
Author

... there was an issue during splitting which limited the subset size to 100 lines.

According to the documentation, the --size_split option should be ignored when giving --nr_split. This might be a bug ...

@biochem-fan
Member

You are right. It is a bug in relion_star_handler. Thank you very much for reporting. I will fix this in the next update.

Meanwhile, you can split the files (roughly) in half with a text editor.
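As an alternative to editing by hand, the split can be scripted. The sketch below is not a replacement for relion_star_handler. It assumes a STAR file with a single data block and a single `loop_` (one header followed by data rows), and `split_star` is a hypothetical helper name.

```python
from typing import List


def split_star(text: str, n_parts: int) -> List[str]:
    """Split the data rows of a single-loop STAR file into n_parts texts.

    Every part gets a copy of the header (block name, loop_ and column labels).
    """
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    header, rows = [], []
    in_rows = False
    for line in text.splitlines():
        stripped = line.strip()
        if not in_rows:
            # Header: blank lines, comments, data_, loop_ and _label lines.
            if not stripped or stripped.startswith(("#", "data_", "loop_", "_")):
                header.append(line)
                continue
            in_rows = True
        if stripped:
            rows.append(line)
    # Spread the rows as evenly as possible; the first parts take the remainder.
    base, extra = divmod(len(rows), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        size = base + (1 if i < extra else 0)
        parts.append("\n".join(header + rows[start:start + size]) + "\n")
        start += size
    return parts


if __name__ == "__main__":
    # Invented five-particle example.
    text = "data_\n\nloop_\n_rlnImageName #1\n_rlnDefocusU #2\n"
    text += "".join(f"{i:06d}@stack.mrcs 10000.0\n" for i in range(1, 6))
    for number, part in enumerate(split_star(text, 2), start=1):
        print(f"--- part {number} ---")
        print(part, end="")
```

Writing each returned string to its own file gives inputs that can be tested separately. Files with more than one data block would need a more careful parser.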

@clemensgrimm
Author

clemensgrimm commented May 23, 2019 via email

@clemensgrimm
Author

clemensgrimm commented May 23, 2019 via email

@biochem-fan
Member

What is the box size and the angular sampling? What happens if you use the non-MPI version?

@clemensgrimm
Author

... after a while the small datasets (1567) have proceeded to the iterations. So 'stalled' is probably better described as pausing for at least several hours. As far as I can remember, there used to be almost no lag between noise estimation and the first expectation iteration, at least with related datasets and earlier RELION versions ...

@clemensgrimm
Author

The box size is 504. I am currently trying the non-MPI version ...

@clemensgrimm
Author

clemensgrimm commented May 23, 2019 via email

@clemensgrimm
Author

As the issue could have been just a lack of patience, I will give the original dataset another try and wait overnight ...

@clemensgrimm
Author

clemensgrimm commented May 24, 2019 via email

@biochem-fan
Member

This bug is hopefully fixed. I am testing it internally and will upload the fix in a few days.

biochem-fan added a commit that referenced this issue Jun 21, 2019
…gnored when calculating ranks. This fixes the GitHub issue #473 and problems reported in CCPEM by Wim Hagen, Dieter Blaas and others.
@biochem-fan
Member

@clemensgrimm Hopefully the above commit fixes this issue. Please try and let me know if it works.

@clemensgrimm
Author

clemensgrimm commented Jun 22, 2019 via email
