How to load a real world dataset? #3

Open
dugu9sword opened this issue Aug 25, 2022 · 3 comments

Comments

@dugu9sword

Hi,

I have downloaded the EMPIAR 10049 data following this link: https://github.com/zhonge/cryodrgn_empiar

But I have some questions:

  • An error occurs when loading the provided .star file (traceback below). How can I resolve it?
  • How do I load ctf.pkl?

Traceback (most recent call last):
  File "/root/cryoAI/src/reconstruct/main.py", line 273, in <module>
    retval, status_message = main()
  File "/root/cryoAI/src/reconstruct/main.py", line 260, in main
    train(config)
  File "/root/cryoAI/src/reconstruct/train.py", line 20, in experiment
    dataset = StarfileDataLoader(config)
  File "/root/cryoAI/src/dataio.py", line 48, in __init__
    self.true_sidelen = self.df['optics']['rlnImageSize'][0]
  File "/root/miniconda3/envs/cryoai/lib/python3.7/site-packages/pandas/core/frame.py", line 3458, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/root/miniconda3/envs/cryoai/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
    raise KeyError(key) from err
KeyError: 'optics'

Error: Training failed.

Thanks!

@bHimes
Collaborator

bHimes commented Aug 25, 2022

Hi @dugu9sword

Based on your error message, your star file is missing the optics group, which is where cryoAI looks up the 'rlnImageSize' column.

The easiest way around this is to modify your star file. Optics groups are defined on the RELION wiki and have something like five columns, if memory serves.

@fredericpoitevin I think I had run into this too, but haven't had time to work with cryoAI much : (

If I could suggest: this case could easily be caught with a dictionary's get method, which returns None instead of raising a KeyError when the key is missing. For some keys, like this one, the value is implicitly defined by the input data, so cryoAI could deduce it.
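A minimal sketch of that suggestion, assuming the parsed .star file is available as a mapping from block name to columns. The function `resolve_image_size` and the argument `stack_shape` are hypothetical names, not part of cryoAI, and the fallback assumes square particle images whose side length can be read from the image stack itself.

```python
from typing import Mapping, Sequence


def resolve_image_size(star: Mapping, stack_shape: Sequence[int]) -> int:
    """Return the particle side length, preferring the optics block if present.

    `star` is assumed to map block names (e.g. 'optics') to column mappings;
    `stack_shape` is the (n_images, height, width) shape of the particle stack.
    Both are hypothetical inputs used for illustration only.
    """
    optics = star.get("optics")  # None instead of KeyError when the block is absent
    column = optics.get("rlnImageSize") if optics is not None else None
    if column is not None:
        return int(column[0])
    # Fallback: the value is implied by the data, so deduce it from the images.
    height, width = stack_shape[-2], stack_shape[-1]
    if height != width:
        raise ValueError(f"expected square particles, got {height}x{width}")
    return int(height)


old_style = {"particles": {"rlnImageName": {0: "1@stack.mrcs"}}}
new_style = {"optics": {"rlnImageSize": {0: 128}}}
print(resolve_image_size(old_style, (1000, 256, 256)))  # falls back to the stack shape
print(resolve_image_size(new_style, (1000, 256, 256)))  # uses the optics block
```

The explicit fallback keeps the failure mode visible: a missing block no longer crashes, but a non-square stack still raises a clear error.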

@dugu9sword
Author

dugu9sword commented Aug 30, 2022

Hi @bHimes ,

Thanks a lot for your helpful suggestions! I checked RELION's wiki and found a workaround. I re-used the optics parameters that CryoAI uses for generating synthetic EMPIAR-10028 data, and trained on the real-world 10028 dataset (https://github.com/zhonge/cryodrgn_empiar).

"optics": {
    'rlnVoltage': {0: 300.0},
    'rlnSphericalAberration': {0: 2.7},
    'rlnAmplitudeContrast': {0: 0.1},
    'rlnOpticsGroup': {0: 1},
    'rlnImageSize': {0: 128},
    'rlnImagePixelSize': {0: 3.77}
},

The reconstructed volume (at step 52648) looks poor. That is acceptable, since amortized inference for pose estimation is still a proof-of-concept technology in this area.

Is there any advice or best practice for running CryoAI on real-world data? Thanks!

[3 images: drawing]

@ff98li

ff98li commented Dec 8, 2022

I ran into the same KeyError for the missing optics group in the input .star file when trying to load a real EMPIAR dataset. Based on what I have read in RELION's docs, the .star file parser used by CryoAI assumes the optics group, a feature introduced in RELION 3.1, which may be missing from cryo-EM datasets released before 2020. For now, the best fix is probably to use RELION 3.1 or later to convert the old file format into the new one (with the optics group added).
As for @dugu9sword's question about reconstruction quality: if you look at the .star file of the 10028 dataset, you will find that rlnSphericalAberration is actually 2.000000 (the fifth column), rather than the 2.7 used in your input. I presume this could be a source of the poor reconstruction quality. I'm not a cryo-EM expert, but I hope this helps.
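One way to catch this kind of discrepancy early is to compare the hand-written optics values against whatever the dataset's own .star file reports. The sketch below is illustrative only: `find_optics_mismatches` is a hypothetical helper, the "assumed" values are the ones posted earlier in this thread, and the only value taken from the 10028 .star file is the spherical aberration quoted above.

```python
import math


def find_optics_mismatches(assumed, from_star, rel_tol=1e-6):
    """Return {key: (assumed_value, star_value)} for keys whose values differ.

    Keys present in only one of the two mappings are ignored, because the
    older .star format does not carry every optics field.
    """
    mismatches = {}
    for key, star_value in from_star.items():
        if key not in assumed:
            continue
        if not math.isclose(float(assumed[key]), float(star_value), rel_tol=rel_tol):
            mismatches[key] = (assumed[key], star_value)
    return mismatches


# Values entered by hand (from the optics block posted in this thread).
assumed = {
    "rlnVoltage": 300.0,
    "rlnSphericalAberration": 2.7,
    "rlnAmplitudeContrast": 0.1,
}
# Value reported for the 10028 .star file in the comment above.
from_star = {"rlnSphericalAberration": 2.0}

print(find_optics_mismatches(assumed, from_star))
```

Any key that shows up in the result is worth checking against the dataset's documentation before training.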

Edit: 2022.12.11

I have found a solution to the OP's KeyError: 'optics'. Again, the problem comes from CryoAI assuming that the .star file contains the optics group, a feature introduced in RELION 3.1. However, even if you convert your raw .star file (for example, shiny_2sets.star in empiar-10028) to the updated format with the optics group by running relion_convert_star with RELION 3.1+, you will still get another key error for the missing rlnAngleRot, a parameter that exists only if you have already performed 3D refinement...well, since this is supposed to be an ab initio reconstruction pipeline...☹️
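Both conditions can be checked before launching a run. The following is a rough sketch, not CryoAI's parser: it scans the raw text of a .star file for a `data_optics` block and an `_rlnAngleRot` column. The helper name `inspect_star_text` and the two sample files are made up for illustration.

```python
def inspect_star_text(text):
    """Report which data blocks and which key columns a .star file declares."""
    blocks = []
    columns = set()
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("data_"):
            # Block headers look like 'data_optics', 'data_particles' or bare 'data_'.
            blocks.append(line[len("data_"):])
        elif line.startswith("_"):
            # Column labels look like '_rlnAngleRot #3'.
            columns.add(line.split()[0].lstrip("_"))
    return {
        "blocks": blocks,
        "has_optics": "optics" in blocks,
        "has_angle_rot": "rlnAngleRot" in columns,
    }


# Made-up miniature files, one per format, for illustration only.
legacy = """
data_

loop_
_rlnImageName #1
_rlnDefocusU #2
1@stack.mrcs 12000.0
"""

converted = """
data_optics

loop_
_rlnOpticsGroup #1
_rlnImageSize #2
1 128

data_particles

loop_
_rlnImageName #1
_rlnAngleRot #2
1@stack.mrcs 12.5
"""

print(inspect_star_text(legacy))
print(inspect_star_text(converted))
```

A file that reports `has_optics` as false needs conversion first; one that reports `has_angle_rot` as false will hit the second key error described above.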

Luckily, among the preprocessed cryo-EM dataset files provided by Zhong, the .cs file contains all the information you need to run CryoAI. To make CryoAI work with empiar-10028, do the following:

Step 1. Install pyem

Step 2. Run csparc2star.py to convert cryosparc_P11_J4_003_particles.cs into a .star file (remember to update your .ini file)

Step 3. If you open your converted .star file, you will notice that the .mrcs particle stacks are referenced by a relative path such as J1/imported/MRC_1901/. So, in the directory where you saved your converted .star file, run mkdir -p J1/imported and mv the two particle stack directories into it.
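Step 3 can also be scripted. This is a minimal sketch using only the standard library; `relocate_stacks` is a hypothetical helper, and the directory names in the demo are the two stack directories mentioned in this thread, created here inside a temporary directory so nothing real is touched.

```python
import shutil
import tempfile
from pathlib import Path


def relocate_stacks(star_dir, stack_dirs, prefix="J1/imported"):
    """Move particle stack directories under <star_dir>/<prefix>.

    `stack_dirs` may be absolute paths or paths relative to `star_dir`.
    Directories that do not exist are skipped. Returns the new locations.
    """
    target = Path(star_dir) / prefix
    target.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`
    moved = []
    for name in stack_dirs:
        source = Path(star_dir) / name
        if source.is_dir():
            destination = target / source.name
            shutil.move(str(source), str(destination))
            moved.append(destination)
    return moved


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("MRC_0601", "MRC_1901"):
        (Path(tmp) / name).mkdir()
    moved = relocate_stacks(tmp, ["MRC_0601", "MRC_1901"])
    print([str(p.relative_to(tmp)) for p in moved])
```

If the prefix in your converted .star file differs from J1/imported, pass it as `prefix` instead.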

Step 4. In your first run, you may encounter a warning about invalid particles.

In my case, the images in two particle stack files, MRC_0601/095_particles_shiny_nb50_new.mrcs and MRC_0601/408_particles_shiny_nb50_new.mrcs, are invalid. The fix is simple: open your converted .star file, remove the records associated with these two files, and rerun CryoAI. It should then train without issues.
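Removing those records by hand is tedious for large files, so the filtering can be sketched as a plain text pass. `drop_records` is a hypothetical helper and the sample rows are invented; the only real names are the two stack files listed above. It assumes each particle record sits on a single line that contains the stack path.

```python
def drop_records(star_text, bad_stacks):
    """Return the .star text without any line that mentions a bad stack file."""
    kept = [
        line
        for line in star_text.splitlines()
        if not any(bad in line for bad in bad_stacks)
    ]
    return "\n".join(kept) + "\n"


bad_stacks = [
    "MRC_0601/095_particles_shiny_nb50_new.mrcs",
    "MRC_0601/408_particles_shiny_nb50_new.mrcs",
]

# Invented sample rows for illustration only.
sample = "\n".join([
    "data_particles",
    "loop_",
    "_rlnImageName #1",
    "000001@J1/imported/MRC_0601/095_particles_shiny_nb50_new.mrcs",
    "000001@J1/imported/MRC_0601/001_particles_shiny_nb50_new.mrcs",
    "000002@J1/imported/MRC_0601/408_particles_shiny_nb50_new.mrcs",
])

cleaned = drop_records(sample, bad_stacks)
print(cleaned)
```

Write the result to a new file rather than overwriting the converted .star file, so the original stays available for comparison.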

Nevertheless, I'm also getting a poor reconstruction, like the OP's:

Reconstruction for empiar-10028 after 50 epochs (82000 steps):
[image: Volume]

Losses over 50 epochs:
[image: loss]
