Add example configs #166

Merged
merged 2 commits into from
May 2, 2024

Conversation

ordabayevy
Contributor

Resolves #114

@ordabayevy ordabayevy force-pushed the example-configs branch 5 times, most recently from 55f0969 to e91fe62 Compare April 13, 2024 20:27

### data

Configure the `DistributedAnnDataCollection`. Here we validate `obs` columns that are used by the transforms and the model (`total_mrna_umis`):
Member

Say a word about what it means to "validate" an `obs` column.
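One possible reading, sketched here only as an illustration: validating an `obs` column means checking that every data shard carries that column with the same dtype as a reference shard. This is an assumption, not a description of how `DistributedAnnDataCollection` actually implements it. The function `validate_obs_columns` and the dict-based schema (column name mapped to dtype name) are hypothetical stand-ins for real `obs` tables.

```python
# Hypothetical sketch: each shard's `obs` schema is modeled as {column: dtype name}.
# This is NOT the cellarium-ml implementation, only one plausible meaning of "validate".


def validate_obs_columns(reference, shard, columns):
    """Raise ValueError if a validated column is missing or its dtype differs."""
    for col in columns:
        if col not in reference:
            raise ValueError(f"reference shard is missing column {col!r}")
        if col not in shard:
            raise ValueError(f"shard is missing column {col!r}")
        if shard[col] != reference[col]:
            raise ValueError(
                f"column {col!r}: expected dtype {reference[col]}, got {shard[col]}"
            )


reference_obs = {"total_mrna_umis": "float32", "cell_type": "category"}
good_shard = {"total_mrna_umis": "float32", "cell_type": "object"}
bad_shard = {"total_mrna_umis": "int64", "cell_type": "category"}

# Only the listed columns are checked, so the differing `cell_type` dtype is ignored.
validate_obs_columns(reference_obs, good_shard, ["total_mrna_umis"])

try:
    validate_obs_columns(reference_obs, bad_shard, ["total_mrna_umis"])
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

Under this reading, listing only the columns that the transforms and the model consume (here `total_mrna_umis`) keeps unrelated columns from causing failures.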

> attr: model.var_names_g
> convert_fn: numpy.ndarray.tolist
```

Member

Maybe add a note below that cellarium-ml does not validate the content of data loaded from checkpoints, or check whether it is consistent with the rest of the configuration. For example, the mean and std were calculated in a prior step from data that had been subject to the normalize-total and log1p transforms. If the user inadvertently forgets to apply the same transforms here before z-scoring, the workflow will run without any error but will produce wrong results.
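A toy illustration of that failure mode, in plain Python rather than cellarium-ml code: the mean and std below stand in for values loaded from a checkpoint. The helper names, the target count of 10,000, and the toy count matrix are all assumptions made for this sketch.

```python
import math
import statistics


def normalize_total_log1p(counts, target=10_000.0):
    """Scale a cell's counts to `target` total, then apply log1p."""
    total = sum(counts)
    return [math.log1p(c * target / total) for c in counts]


def z_score(x, mean, std):
    """Standardize one cell using per-gene mean and std."""
    return [(xi - m) / s for xi, m, s in zip(x, mean, std)]


# Toy count matrix: rows are cells, columns are genes.
cells = [[1, 9, 0], [4, 4, 2], [0, 5, 5]]

# "Prior step": statistics computed on the transformed data.
transformed = [normalize_total_log1p(c) for c in cells]
mean_g = [statistics.fmean(g) for g in zip(*transformed)]
std_g = [statistics.pstdev(g) for g in zip(*transformed)]

# Correct pipeline: same transforms, then z-score.
correct = [z_score(x, mean_g, std_g) for x in transformed]

# Mistaken pipeline: transforms forgotten. Nothing raises an error.
wrong = [z_score(x, mean_g, std_g) for x in cells]

# Largest elementwise discrepancy between the two outputs.
max_abs_diff = max(
    abs(a - b) for row_c, row_w in zip(correct, wrong) for a, b in zip(row_c, row_w)
)
```

Both pipelines run to completion and return arrays of the same shape; only comparing the values reveals that the second one is wrong.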

> n_components: 50
> perform_mean_correction: true
```

Member

Again, add a sticky note that since we have z-scored the data, mean correction is not strictly necessary, but its presence may help mitigate roundoff errors. It is a good exercise for the user to ascertain that the mean learned by IPCA is close to 0.

Contributor Author

I set `perform_mean_correction: true` by mistake. It should be `false` in this case because of the z-score!
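To make the reasoning concrete, here is a small plain-Python check (not the IPCA implementation): after z-scoring, the per-feature mean is zero up to roundoff, so the covariance computed with and without centering agrees. The toy data and the helper `second_moment` are assumptions made for this sketch.

```python
import statistics

# Toy data: rows are cells, columns are features.
data = [[2.0, 10.0], [4.0, 14.0], [9.0, 12.0], [5.0, 20.0]]

# Z-score each feature with its own mean and population std.
mean = [statistics.fmean(c) for c in zip(*data)]
std = [statistics.pstdev(c) for c in zip(*data)]
z = [[(x - m) / s for x, m, s in zip(row, mean, std)] for row in data]


def second_moment(rows, center):
    """Return the d x d matrix of averaged products, optionally mean-centered."""
    n, d = len(rows), len(rows[0])
    mu = [statistics.fmean(c) for c in zip(*rows)] if center else [0.0] * d
    return [
        [sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / n for j in range(d)]
        for i in range(d)
    ]


cov_centered = second_moment(z, center=True)     # with mean correction
cov_uncentered = second_moment(z, center=False)  # without mean correction

# The mean a PCA fit would learn on z-scored data, and the gap between the two matrices.
learned_mean = [statistics.fmean(c) for c in zip(*z)]
max_diff = max(
    abs(a - b)
    for row_a, row_b in zip(cov_centered, cov_uncentered)
    for a, b in zip(row_a, row_b)
)
```

The two matrices differ only at the level of floating-point roundoff, which is why mean correction is redundant here.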


### train

Change the number of devices, change the strategy to `ddp_find_unused_parameters_true` (because the trained PCA model contains parameters that are fixed during training), set the number of epochs, and set the path for logs and weights:
Member

Perhaps give a reference to the PL docs related to this?

Member

@mbabadi mbabadi left a comment

Nicely done, just a few small suggestions.

@ordabayevy ordabayevy merged commit 35183f0 into main May 2, 2024
6 checks passed
@ordabayevy ordabayevy deleted the example-configs branch May 2, 2024 19:43
Successfully merging this pull request may close these issues.

Add example config files
2 participants