Welcome and thank you for considering a contribution to xOmics! We are an open-source project focusing on interpretable protein prediction. Your involvement is invaluable to us. Contributions can be made in the following ways:
- Filing bug reports or feature suggestions on our GitHub issue tracker.
- Submitting improvements via Pull Requests.
- Participating in project discussions.
Newcomers can start by tackling issues labeled 'good first issue'. Please email stephanbreimann@gmail.com for further questions or suggestions.
Our goals:
- Establish a toolkit for explainable omics analysis, focusing on protein/gene-centric analysis.
- Offer flexible interoperability with other omics analysis software such as MaxQuant or gProfiler.
We avoid:
- Reimplementing existing solutions.
- Ignoring the biological context.
- Cherry-picking biological hits.
Our principles:
- Algorithms should be biologically inspired and combine empirical insights with cutting-edge computational methods.
- We're committed to offering diverse evaluation metrics and interpretable visualizations, aiming to extend to other aspects of interpretable data analysis and explainable AI, such as causal inference.
For effective bug reports, please include a Minimal Reproducible Example (MRE):
- Minimal: Include the least amount of code to demonstrate the issue.
- Self-contained: Ensure all necessary data and imports are included.
- Reproducible: Confirm the example reliably replicates the issue.
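For orientation, a bug report sketch might look like this (the xomics call below is a placeholder, not a real API; replace it with the call that fails for you):
import pandas as pd
# import xomics as xo  # include your version, e.g., print(xo.__version__)

# Self-contained input data: no external files required
df = pd.DataFrame({"protein_id": ["P1", "P2"], "log2fc": [1.5, -0.3]})

# Minimal failing call (placeholder): replace with the actual xomics call
# result = xo.some_function(df)  # e.g., raises ValueError: ...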
Further guidelines can be found here.
To install the latest development version using pip, execute the following:
pip install git+https://github.com/breimanntools/xomics.git@master
- Fork the repository
- Clone your fork:
git clone https://github.com/YOUR_USERNAME/xomics.git
- Navigate to the project folder and set up the Python environment:
cd xomics
2a. Using conda for Environment Setup
Create and activate a new conda environment named 'venv', using Python 3.9:
conda create -n venv python=3.9
conda activate venv
2b. Using venv for Environment Setup
Alternatively, create and activate a virtual environment within the project folder using venv:
python -m venv venv
source venv/bin/activate # Use `venv\Scripts\activate` on Windows
3a. Installing Dependencies with poetry
Install dependencies as defined in 'pyproject.toml' using poetry:
poetry install
3b. Installing Dependencies with pip
Alternatively, use pip to install dependencies from 'requirements.txt' and additional development requirements:
pip install -r requirements.txt
pip install -r docs/source/requirements_docs.txt
General Notes
- Additional Requirements: Some non-Python utilities might need to be installed separately, such as Pandoc.
- Manage Dependencies: Ensure dependencies are updated as specified in 'pyproject.toml' or 'requirements.txt' after pulling updates from the repository.
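For example, after pulling updates you can re-sync your environment with the commands from the installation steps above:
git pull
poetry install  # or: pip install -r requirements.txt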
We utilize pytest and hypothesis. To run the full test suite, execute:
pytest
This will execute all the test cases in the tests/ directory. Check out our README on testing. See further useful commands in our Project Cheat Sheet.
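For faster iteration, standard pytest options let you run a subset of the suite (the file path below is illustrative):
pytest tests/test_preprocessing.py  # run a single test file (illustrative path)
pytest -k "loading"  # run only tests whose names match a keyword
pytest -x --lf  # stop at the first failure; rerun only the tests that failed last time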
For substantial changes, start by opening an issue for discussion. For minor changes like typos, submit a pull request directly.
Ensure your pull request:
- Is focused and concise.
- Has a descriptive and clear branch name, like fix/data-loading-issue or doc/update-readme.
- Is up-to-date with the master branch and passes all tests.
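For example, with standard git commands (assuming your fork's remote for the main repository is named 'upstream'):
git checkout -b fix/data-loading-issue  # create a descriptively named branch
git fetch upstream  # fetch the latest changes from the main repository
git rebase upstream/master  # bring your branch up-to-date with master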
To preview documentation changes in pull requests, follow the "docs/readthedocs.org" check link under "All checks have passed".
Documentation is a crucial part of the project. If you make any modifications to the documentation, please ensure they render correctly.
We strive for consistency of our public interfaces with well-established libraries like scikit-learn, pandas, matplotlib, and seaborn.
We primarily use one class template for organizing our codebase:
- Tool: Standalone classes that focus on specialized tasks, such as feature engineering for protein prediction. They feature .run and .eval methods to carry out the complete processing pipeline and generate various evaluation metrics (see the sketch after this list).
The remaining classes fulfill two further purposes and are not implemented via class inheritance:
- Data visualization: Supplementary plotting classes for Tool classes. These classes implement an .eval method to visualize the key evaluation measures.
- Analysis support: Supportive pre-processing classes for Tool classes.
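A minimal sketch of the Tool pattern described above (only the .run/.eval method names come from this guide; the class names and logic are illustrative assumptions):
import pandas as pd

class Tool:
    """Template for standalone analysis classes (illustrative)."""

    def run(self, df: pd.DataFrame) -> pd.DataFrame:
        """Carry out the complete processing pipeline."""
        raise NotImplementedError

    def eval(self, df: pd.DataFrame) -> dict:
        """Generate evaluation metrics for the processed data."""
        raise NotImplementedError

class ExampleImputer(Tool):
    """Hypothetical Tool: fills missing intensity values with column means."""

    def run(self, df: pd.DataFrame) -> pd.DataFrame:
        return df.fillna(df.mean(numeric_only=True))

    def eval(self, df: pd.DataFrame) -> dict:
        return {"n_missing": int(df.isna().sum().sum())}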
We semi-strictly adhere to the naming conventions established by the aforementioned libraries. Functions and methods that process data values should use names corresponding to our primary pd.DataFrame columns, as defined in xomics/_utils/_utils_constants.py.
We aim for a modular, robust, and easily extendable codebase. We therefore prefer flat class hierarchies (only inheriting from Tool is recommended; classes serve as containers for data and functionality) and functional programming principles, as outlined in A Philosophy of Software Design. Our goal is to provide a user-friendly public interface with concise descriptions and Python type hints (see PEP 484 or the Robust Python book). For the validation of user inputs, we use comprehensive checking functions with descriptive error messages.
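As an illustrative sketch (the name check_df and its signature are assumptions, not the project's actual helpers), such a checking function could look like this:
import pandas as pd

def check_df(df=None, name="df", required_cols=None):
    """Validate a user-provided DataFrame, raising descriptive errors."""
    if not isinstance(df, pd.DataFrame):
        raise ValueError(f"'{name}' should be a pd.DataFrame, but got '{type(df).__name__}'")
    if required_cols is not None:
        missing = [col for col in required_cols if col not in df.columns]
        if missing:
            raise ValueError(f"'{name}' is missing the required columns: {missing}")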
- Docstring Style: We use the Numpy Docstring style and adhere to the PEP 257 docstring conventions.
- Code Style: Please follow the PEP 8 and PEP 20 style guides for Python code.
- Markup Language: Documentation is written in reStructuredText (.rst). For an introduction, see the reStructuredText Primer; for cheat sheets, see the reStructuredText Cheatsheet or the Sphinx Tutorial.
- Autodoc: We use Sphinx with its autodoc, napoleon, and sphinx-design extensions to automatically include docstrings in the documentation.
- Further Details: See our conf.py for more.
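For illustration, a NumPy-style docstring could look as follows (the function is hypothetical):
def filter_proteins(df, min_intensity=0.0):
    """Filter out proteins below a minimum intensity.

    Parameters
    ----------
    df : pd.DataFrame
        Input data with one row per protein and an 'intensity' column.
    min_intensity : float, default=0.0
        Rows with an intensity below this value are dropped.

    Returns
    -------
    pd.DataFrame
        Filtered copy of 'df'.
    """
    return df[df["intensity"] >= min_intensity].copy()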
This project's documentation is organized across four distinct layers, each with a specific focus and level of detail:
- Docstrings: Concise code description, with minimal usage examples and references to other layers (in 'See also').
- Usage Principles: Bird's-eye view with background and key principles, reflected in selected code examples.
- Tutorial: Close-up on the public interface; a step-by-step guide to essential usage with medium detail.
- Tables: Close-up on data or other tabular overviews, with detailed explanation of columns and critical values.
Our reference order is as follows (exceptions prove the rule):
The API showcases Docstrings for our public objects and functions. Within these docstrings, scientific References may be mentioned in their extended sections. For additional links in docstrings, use the See Also section in this order: Usage Principles, Tables, Tutorials. Only include external library references when absolutely necessary. Note that the Usage Principles documentation may link directly to References, Tutorials, and Tables, which may in turn link to References.
To generate the documentation locally:
- Go to the docs directory:
cd docs
- Run make html:
make html
- Open _build/html/index.html in a browser.
To optimize testing, use ChatGPT with the template below, filling in the blank spaces between START OF CODE and END OF CODE. Examples of testing templates can be found here.
"
Generate test functions for a given TARGET FUNCTION using the style of the provided TESTING TEMPLATE. Please take your time to ensure thoroughness and accuracy.
Inputs:
TARGET FUNCTION:
- START OF CODE
-----------------------------------------
[your code here]
-----------------------------------------
- END OF CODE
TESTING TEMPLATE:
- START OF CODE
-----------------------------------------
[your code]
-----------------------------------------
- END OF CODE
**Key Directive**: For the Normal Cases Test Class, EACH function MUST test ONLY ONE individual parameter of the TARGET FUNCTION using Hypothesis for property-based testing. This is crucial.
Requirements:
1. Normal Cases Test Class:
- Name: 'Test[TARGET FUNCTION NAME]'.
- Objective: Test EACH parameter *INDIVIDUALLY*.
- Tests: Test EACH parameter, at least 10 positive and 10 negative tests for this class.
2. Complex Cases Test Class:
- Name: 'Test[TARGET FUNCTION NAME]Complex'.
- Objective: Test combinations of the TARGET FUNCTION parameters.
- Tests: At least 5 positive and 5 negative tests that intricately challenge the TARGET FUNCTION.
3. General Guidelines:
- Use Hypothesis for property-based testing, but test parameters individually for the Normal Cases Test Class.
- Tests should be clear, concise, and non-redundant.
- Code must be complete, without placeholders like 'TODO', 'Fill this', or 'Add ...'.
- Explain potential issues in the TARGET FUNCTION.
Output Expectations:
- Two test classes: one for normal cases (individual parameters) and one for complex cases (combinations).
- In Normal Cases, one function = one parameter tested.
- Aim for at least 30 unique tests, totaling 150+ lines of code.
Reminder: In Normal Cases, it's crucial to test parameters individually. Take your time and carefully create the Python code for all cases!
"
ChatGPT has a token limit, which may truncate responses. To continue, simply ask 'continue processing' or something similar. Repeat as necessary and compile the results.
We recommend the following workflow:
- Repeat the prompt in new ChatGPT sessions until most of the positive test cases are covered.
- Adjust the testing script manually such that all positive tests are passed.
- Continue in the same session, sharing the revised script, and request the creation of negative tests.
- Finally, provide the complete testing script, including positive and negative cases, and request the development of complex test cases.
Leverage ChatGPT to generate testing scripts and refine your code's functionality and its interface. If ChatGPT struggles or produces erroneous tests, it often indicates ambiguities or complexities in your function's logic, variable naming, or documentation gaps, especially regarding edge cases. Address these insights to ensure intuitive and robust code design through the TGD approach.
Essential Strategies for Effective TGD:
- Isolated Functionality Testing: Test one function or method at a time, adhering to unit testing principles. Provide the complete, well-documented function: the better the docstring, the more comprehensive the automatically generated tests will be.
- Isolated Test Sessions: Start each test scenario in a new ChatGPT session to maintain clarity and prevent context overlap, ensuring focused and relevant test generation.
- Consistent Template Usage: Align your test creation with existing templates for similar functionalities, utilizing them as a structured guide to maintain consistency in your test design.
- Initial Test Baseline: Aim for an initial set of tests where about 25% pass, providing a foundational baseline that identifies primary areas for iterative improvement in both tests and code.
- Iterative Refinement and Simplification: Use ChatGPT-generated tests to iteratively refine your code, especially if repeated test failures indicate areas needing clarification or simplification in your function's design.
Through an iterative TGD process, you can systematically uncover and address any subtleties or complexities in your code, paving the way for a more robust and user-friendly application.