feat: standardize and simplify curation of yeast-GEM #303

Closed
8 of 10 tasks
edkerk opened this issue Mar 15, 2022 · 10 comments

Comments

@edkerk
Member

edkerk commented Mar 15, 2022

Description of the issue:

Currently, code/ and data/ contain various scripts and datasets in a variety of formats, which have previously been used to curate yeast-GEM. However, these are quite heterogeneous, so it is not straightforward to reuse them for future model curations.

In addition, a few other issues are hurdles to contributing to the development of yeast-GEM. One is that model input and output requires both MATLAB and COBRA toolboxes, which unnecessarily increases software dependencies. This also raises the risk of conflicts between model files.

These and similar issues should be addressed to make yeast-GEM more accessible, reproducible and easier to contribute to. This consists of the following steps:

  • Introduce a function (curateMetsRxnsGenes()) and a standardized table format (as *.tsv file) that can be reused for adding new metabolites/reactions/genes, or curation of existing metabolites/reactions/genes. While this does not cover all types of curations (it does not allow for deletion of model entities), it would simplify & standardize many of the curations. Note that this can also be used to e.g. add or correct MetaNetX identifiers for all reactions, or change the subSystem assignment of reactions. This function is introduced in add-rxn.prop: Hydrogen Sulfide Addition #300.
  • Reduce software dependencies for contributing to model development. As RAVEN is relatively lightweight, has essential functions and we have control over its codebase, we will only rely on RAVEN for generic functions (such as model in- and output, modifying biomass, etc.). Users remain free to use any other software, and specific curation scripts may still use e.g. COBRA if required, but the generic functions should depend only on RAVEN. This is introduced in refactor: reduce dependencies and remove metabolite ID suffixes #301.
  • Related to the above, the documentation surrounding model curation should be updated. The README.md is modified in refactor: reduce dependencies and remove metabolite ID suffixes #301, but CONTRIBUTING.md should be overhauled, to reflect the above changes, and to give clear examples of how to implement this.
  • Ideally, we would have scripts that convert one model version to the next, similar to what is done in Sco-GEM. Such a script would then call, for instance, curateMetsRxnsGenes and refer to the relevant data files, or could even make changes to the model directly.
  • The existing scripts and data files should be reorganized so that those that have generic use are readily available in the code and data folders, while files that have only been used once (to update one version to another) can be gathered in specific folders.
  • .... further ideas are welcome!
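
To make the table-driven idea from the first bullet concrete, here is a minimal Python sketch. It is not the curateMetsRxnsGenes() implementation (that function is MATLAB/RAVEN-based), and the column names (rxnID, rxnName, subSystem), the reaction IDs and the dict-based model are all hypothetical, chosen only for illustration. It shows the add-or-update behaviour described above, without any deletion of model entities.

```python
import csv
import io

# Hypothetical table layout: these column names and IDs are illustrative
# assumptions, not the format used by curateMetsRxnsGenes().
EXAMPLE_TSV = (
    "rxnID\trxnName\tsubSystem\n"
    "r_0001\texample reaction A (renamed)\texample pathway\n"
    "r_9999\texample reaction B\texample pathway\n"
)


def read_curation_table(text):
    """Parse tab-separated text into a list of row dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def apply_reaction_rows(model, rows):
    """Add new reactions or update existing ones in place; never delete.

    `model` is a plain dict mapping reaction ID -> dict of fields.
    Returns (added, updated) lists of reaction IDs.
    """
    added, updated = [], []
    for row in rows:
        rxn_id = row["rxnID"]
        # Empty cells are skipped so they do not overwrite existing values.
        fields = {k: v for k, v in row.items() if k != "rxnID" and v}
        if rxn_id in model:
            model[rxn_id].update(fields)
            updated.append(rxn_id)
        else:
            model[rxn_id] = fields
            added.append(rxn_id)
    return added, updated


# Toy model with one existing reaction that lacks a subsystem.
model = {"r_0001": {"rxnName": "example reaction A", "subSystem": ""}}
added, updated = apply_reaction_rows(model, read_curation_table(EXAMPLE_TSV))
```

One table can then both introduce new entries and correct annotations (names, subsystems, identifiers) of existing ones, which is the reuse that the bullet aims for.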

I hereby confirm that I have:

  • Tested my code with all requirements for running the model
  • Done this analysis in the main branch of the repository
  • Checked that a similar issue does not exist already
  • If needed, asked first in the Gitter chat room about the issue
@mihai-sysbio
Member

Great to see this @edkerk.

  • Reduce software dependencies for contributing to model development.

Any thoughts on the /requirements folder?

@edkerk
Member Author

edkerk commented Mar 15, 2022

That folder seems to refer strictly to Python/cobrapy, particularly useful for the GitHub Actions. But MATLAB-specific requirements are not part of that. The above points do not directly refer to GitHub Actions / CI, where the model is just loaded and its content tested. I tried to clarify this in the README.md in PR #301, but this can probably be improved?

Perhaps the point you raise should be rephrased as "Reduce software dependencies for contributing to MATLAB-based model development." So at least to reduce the complexity of the MATLAB-based pipeline. Contribution by using Python is also very welcome, but most of the generic functions are only available for MATLAB and I'm not sufficiently cobrapy-fluent to correct this.

@edkerk edkerk mentioned this issue May 12, 2022
3 tasks
@edkerk
Member Author

edkerk commented May 26, 2022

This is now being implemented in PR #313, where multiple curations (incl. #305 and #306) are all documented in one script that can change the current yeast-GEM release 8.6.0 to its next version.
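
As a rough illustration of the "one script per version update" idea (not the actual script in PR #313, which is MATLAB-based), the sketch below runs an ordered list of curation steps on a copy of a model and records what was applied. The step functions, the reaction ID and the dict-based model are hypothetical placeholders.

```python
import copy


def add_reaction(model):
    # Hypothetical curation: add one placeholder reaction.
    model["rxns"].append("r_new_example")


def fix_subsystem(model):
    # Hypothetical curation: assign a subsystem to that reaction.
    model["subSystems"]["r_new_example"] = "example pathway"


# All curations for one version update, listed in the order they are applied.
CURATION_STEPS = [
    ("add example reaction", add_reaction),
    ("assign example subsystem", fix_subsystem),
]


def update_model(model, steps):
    """Run every curation step in order on a copy; return (new_model, log)."""
    new_model = copy.deepcopy(model)
    log = []
    for description, step in steps:
        step(new_model)
        log.append(description)
    return new_model, log


old_model = {"rxns": ["r_0001"], "subSystems": {}}
new_model, log = update_model(old_model, CURATION_STEPS)
```

Keeping the steps in a single ordered list means the script itself documents how one release was turned into the next, and rerunning it on the old release reproduces the update.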

@edkerk
Member Author

edkerk commented May 28, 2022

A little road map:

Once version 8.6.0 is released, it will probably become clearer how well this new approach works in practice.

@mihai-sysbio
Member

@edkerk coming back to the /requirements folder, one idea could be to move it under /code. To me, it belongs there more. If it were used just by GH Actions, it could then be moved to where the workflows are stored.

@edkerk
Member Author

edkerk commented May 29, 2022

Sounds reasonable, but probably good to then move it to /code, as /code/io.py is not only for GH Actions.

@edkerk
Member Author

edkerk commented Jun 16, 2022

requirements/ is now moved to code/requirements/ in dcf1cae

@mihai-sysbio
Member

requirements/ is now moved to code/requirements/ in dcf1cae

Very nice @edkerk. I think there are more places that should be updated, such as the Contributing guidelines.

@edkerk
Member Author

edkerk commented Jun 17, 2022

I thought I had found all references, but I missed the Contributing guidelines.

@mihai-sysbio
Member

With the recent deprecation of old files in /code and /data in #345, perhaps this issue can be considered complete, with any further ideas opened as new issues?

@edkerk edkerk closed this as completed Aug 16, 2023