Contributing

Welcome and thank you for considering a contribution to xOmics! We are an open-source project focusing on interpretable protein prediction. Your involvement is invaluable to us. Contributions can be made in the following ways:

  • Filing bug reports or feature suggestions on our GitHub issue tracker.
  • Submitting improvements via Pull Requests.
  • Participating in project discussions.

Newcomers can start by tackling issues labeled good first issue. Please email stephanbreimann@gmail.com with further questions or suggestions.

Objectives

  • Establish a toolkit for explainable omics analysis, focusing on protein/gene-centric analysis.
  • Offer flexible interoperability with other omics analysis software such as MaxQuant or gProfiler.

Non-goals

  • Reimplementation of existing solutions.
  • Ignoring the biological context.
  • Cherry-picking of biological hits.

Principles

  • Algorithms should be biologically inspired and combine empirical insights with cutting-edge computational methods.
  • We're committed to offering diverse evaluation metrics and interpretable visualizations, aiming to extend to other aspects of interpretable data analysis and explainable AI such as causal inference.

For effective bug reports, please include a Minimal Reproducible Example (MRE):

  • Minimal: Include the least amount of code to demonstrate the issue.
  • Self-contained: Ensure all necessary data and imports are included.
  • Reproducible: Confirm the example reliably replicates the issue.
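For illustration, a minimal bug report might look like the sketch below; the xomics call is a placeholder, since the exact function depends on where your issue occurs:

import pandas as pd
import xomics as xo

# Minimal, self-contained input that triggers the issue
df = pd.DataFrame({"protein_id": ["P1", "P2"], "log2_fc": [1.5, -0.7]})

# Placeholder: replace `some_function` with the actual call that fails for you
result = xo.some_function(df)
print(result)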

Further guidelines can be found here.

Latest Version

To install the latest development version using pip, execute the following:

pip install git+https://github.com/breimanntools/xomics.git@master

Local Development Environment

Fork and Clone the Repository

  1. Fork the repository
  2. Clone your fork:
git clone https://github.com/YOUR_USERNAME/xomics.git

Install Dependencies

Navigate to the project folder and set up the Python environment.

  1. Navigate to project folder:
cd xomics

2a. Using conda for Environment Setup

Create and activate a new conda environment named 'venv', using Python 3.9:

conda create -n venv python=3.9
conda activate venv

2b. Using venv for Environment Setup

Alternatively, create and activate a virtual environment within the project folder using venv:

python -m venv venv
source venv/bin/activate  # Use `venv\Scripts\activate` on Windows

3a. Installing Dependencies with poetry

Install dependencies as defined in 'pyproject.toml' using poetry:

poetry install

3b. Installing Dependencies with pip

Alternatively, use pip to install dependencies from 'requirements.txt' and additional development requirements:

pip install -r requirements.txt
pip install -r docs/source/requirements_docs.txt

General Notes

  • Additional Requirement: Some non-Python utilities may need to be installed separately, such as Pandoc.
  • Manage Dependencies: Ensure dependencies are updated as specified in 'pyproject.toml' or 'requirements.txt' after pulling updates from the repository.

Run Unit Tests

We use pytest for unit testing and hypothesis for property-based testing. Run the full test suite with:

pytest

This will execute all the test cases in the tests/ directory. Check out our README on testing. See further useful commands in our Project Cheat Sheet.
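As a minimal sketch of the property-based style used with hypothesis (the normalize function below is a stand-in, not part of the xOmics codebase):

from hypothesis import given, strategies as st

def normalize(values):
    """Stand-in function under test: scale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi != lo else 0.0 for v in values]

@given(st.lists(st.floats(min_value=-1e6, max_value=1e6), min_size=2))
def test_normalize_within_range(values):
    # Property: every normalized value must fall within [0, 1]
    assert all(0.0 <= v <= 1.0 for v in normalize(values))

Running pytest collects such test functions automatically.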

For substantial changes, start by opening an issue for discussion. For minor changes like typos, submit a pull request directly.

Ensure your pull request:

  • Is focused and concise.
  • Has a descriptive and clear branch name like fix/data-loading-issue or doc/update-readme.
  • Is up-to-date with the master branch and passes all tests.

Preview Changes

To preview documentation changes in pull requests, follow the "docs/readthedocs.org" check link under "All checks have passed".

Documentation is a crucial part of the project. If you make any modifications to the documentation, please ensure they render correctly.

Naming Conventions

We strive for consistency of our public interfaces with well-established libraries like scikit-learn, pandas, matplotlib, and seaborn.

Class Templates

We primarily use one class template for organizing our codebase:

  • Tool: Standalone classes that focus on specialized tasks, such as feature engineering for protein prediction. They feature .run and .eval methods to carry out the complete processing pipeline and generate various evaluation metrics (see the sketch after this section).

The remaining classes fulfill two further purposes without being directly implemented via class inheritance:

  • Data visualization: Supplementary plotting classes for Tool classes. These classes implement an .eval method to visualize the key evaluation measures.
  • Analysis support: Supportive pre-processing classes for Tool classes.
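A minimal sketch of the Tool pattern described above; the class name, signatures, and bodies are illustrative, not the actual xOmics interface:

import pandas as pd

class ExampleTool:
    """Illustrative Tool-style class exposing .run and .eval methods."""

    def run(self, df: pd.DataFrame) -> pd.DataFrame:
        # Carry out the complete processing pipeline on the input data
        return df.copy()

    def eval(self, df: pd.DataFrame) -> pd.DataFrame:
        # Generate evaluation metrics for the processed results
        return df.describe()

Plotting and pre-processing classes then build on such Tool classes without inheriting from them.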

Function and Method Naming

We semi-strictly adhere to the naming conventions established by the aforementioned libraries. Functions and methods that process data values should match the column names of our primary pd.DataFrame, as defined in xomics/_utils/_utils_constants.py.
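For illustration, with a hypothetical column constant (the actual names are defined in xomics/_utils/_utils_constants.py):

import pandas as pd

# Hypothetical column constant; the real ones live in xomics/_utils/_utils_constants.py
COL_LOG2_FC = "log2_fc"

def filter_log2_fc(df: pd.DataFrame, min_log2_fc: float = 1.0) -> pd.DataFrame:
    """Function name mirrors the DataFrame column it processes."""
    return df[df[COL_LOG2_FC].abs() >= min_log2_fc]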

Code Philosophy

We aim for a modular, robust, and easily extendable codebase. We therefore adhere to flat class hierarchies (only inheriting from Tool is recommended; classes serve as containers for data and functionality) and functional programming principles, as outlined in A Philosophy of Software Design. Our goal is to provide a user-friendly public interface with concise descriptions and Python type hints (see PEP 484 or the Robust Python book). To validate user inputs, we use comprehensive checking functions with descriptive error messages.
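A brief sketch of this style, assuming a hypothetical checking helper (not actual xOmics code):

from typing import List, Optional
import pandas as pd

# Hypothetical helper illustrating type-hinted, descriptive input validation
def check_df(df: pd.DataFrame, cols_required: Optional[List[str]] = None) -> None:
    """Validate user input and fail early with a descriptive error message."""
    if not isinstance(df, pd.DataFrame):
        raise ValueError(f"'df' should be a pandas DataFrame, not {type(df).__name__}.")
    missing = [col for col in (cols_required or []) if col not in df.columns]
    if missing:
        raise ValueError(f"'df' is missing the required columns: {missing}.")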

Documentation Style

Documentation Layers

This project's documentation is organized across four distinct layers, each with a specific focus and level of detail:

  • Docstrings: Concise code description, with minimal usage examples and references to other layers (in 'See also').
  • Usage Principles: Bird's-eye view with background and key principles, reflected in selected code examples.
  • Tutorial: Close-up on the public interface, as a step-by-step guide to essential usage with medium detail.
  • Tables: Close-up on data or other tabular overviews, with detailed explanation of columns and critical values.

See our reference order here (exceptions prove the rule):

/docs/source/_artwork/diagrams/ref_order.png

The API showcases Docstrings for our public objects and functions. Within these docstrings, scientific References may be mentioned in their extended sections. For additional links in docstrings, use the See Also section in this order: Usage Principles, Tables, Tutorials. Only include External library references when absolutely necessary. Note that the Usage Principles documentation may link directly to References, Tutorials, and Tables, which can also include links to References.
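A sketch of a docstring following this order; the function and link targets are placeholders:

# Placeholder function illustrating the See Also ordering described above
def example_function(df):
    """Concise description of what the function does.

    See Also
    --------
    * Usage Principles: <link to the relevant background page>
    * Tables: <link to the relevant table overview>
    * Tutorials: <link to the step-by-step guide>
    """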

Building the Docs

To generate the documentation locally:

  • Go to the docs directory and run make html:

cd docs
make html

  • Open _build/html/index.html in a browser.

To streamline testing, use ChatGPT with the template below and fill in the blank spaces between START OF CODE and END OF CODE. Examples of our testing templates can be found here.

"
Generate test functions for a given TARGET FUNCTION using the style of the provided TESTING TEMPLATE. Please take your time to ensure thoroughness and accuracy.

Inputs:
TARGET FUNCTION:
- START OF CODE
-----------------------------------------
[your code here]
-----------------------------------------
- END OF CODE

TESTING TEMPLATE:
- START OF CODE
-----------------------------------------
[your code]
-----------------------------------------
- END OF CODE

**Key Directive**: For the Normal Cases Test Class, EACH function MUST test ONLY ONE individual parameter of the TARGET FUNCTION using Hypothesis for property-based testing. This is crucial.

Requirements:

1. Normal Cases Test Class:
- Name: 'Test[TARGET FUNCTION NAME]'.
- Objective: Test EACH parameter *INDIVIDUALLY*.
- Tests: Test EACH parameter, with at least 10 positive and 10 negative tests in this class.

2. Complex Cases Test Class:
- Name: 'Test[TARGET FUNCTION NAME]Complex'.
- Objective: Test combinations of the TARGET FUNCTION parameters.
- Tests: At least 5 positive and 5 negative tests that intricately challenge the TARGET FUNCTION.

3. General Guidelines:
- Use Hypothesis for property-based testing, but test parameters individually for the Normal Cases Test Class.
- Tests should be clear, concise, and non-redundant.
- Code must be complete, without placeholders like 'TODO', 'Fill this', or 'Add ...'.
- Explain potential issues in the TARGET FUNCTION.

Output Expectations:
- Two test classes: one for normal cases (individual parameters) and one for complex cases (combinations).
- In Normal Cases, one function = one parameter tested.
- Aim for at least 30 unique tests, totaling 150+ lines of code.

Reminder: In Normal Cases, it's crucial to test parameters individually. Take your time and carefully create the Python code for all cases!
"

ChatGPT has a token limit, which may truncate responses. To continue, simply ask 'continue processing' or something similar. Repeat as necessary and compile the results.

We recommend the following workflow:

  1. Repeat the prompt in new ChatGPT sessions until most of the positive test cases are covered.
  2. Adjust the testing script manually such that all positive tests are passed.
  3. Continue in the same session, sharing the revised script, and request the creation of negative tests.
  4. Finally, provide the complete testing script, including positive and negative cases, and request the development of complex test cases.

Test Guided Development (TGD)

Leverage ChatGPT to generate testing scripts and refine your code's functionality and its interface. If ChatGPT struggles or produces erroneous tests, it often indicates ambiguities or complexities in your function's logic, variable naming, or documentation gaps, especially regarding edge cases. Address these insights to ensure intuitive and robust code design through the TGD approach.

Essential Strategies for Effective TGD:

  • Isolated Functionality Testing: Test one function or method at a time, adhering to unit testing principles. Provide the entire, well-documented function: the better the docstring, the more comprehensive the automatically generated tests will be.
  • Isolated Test Sessions: Start each test scenario in a new ChatGPT session to maintain clarity and prevent context overlap, ensuring focused and relevant test generation.
  • Consistent Template Usage: Align your test creation with existing templates for similar functionalities, utilizing them as a structured guide to maintain consistency in your test design.
  • Initial Test Baseline: Aim for an initial set of tests where about 25% pass, providing a foundational baseline that identifies primary areas for iterative improvement in both tests and code.
  • Iterative Refinement and Simplification: Use ChatGPT-generated tests to iteratively refine your code, especially if repeated test failures indicate areas needing clarification or simplification in your function's design.

Through an iterative TGD process, you can systematically uncover and address any subtleties or complexities in your code, paving the way for a more robust and user-friendly application.