Skip to content

Conversation

@marcbal77
Copy link
Member

@marcbal77 marcbal77 commented Aug 5, 2025

Description

This PR introduces a minimal framework for integrating third-party API models into biolearn, with Hurdle Bio's Inflammage model as the first implementation.

Key Changes

  • Add class to for API-based predictions
  • Integrate into for consistent usage
  • Implement privacy-first approach with explicit user consent before sending data
  • Add automatic imputation for missing CpG sites (uses 0.5 for missing values)
  • Include comprehensive error handling with user-friendly messages

Security & Privacy Features

  • API keys managed via environment variables () or direct input
  • Explicit consent required before sending data to external servers
  • Updated to exclude credential files
  • Clear warnings about third-party data sharing

Documentation

  • User guide in with setup instructions
  • Working example in
  • Placeholder file for Hurdle CpG sites list

Testing

  • Comprehensive test suite in
  • Mock API calls to avoid requiring credentials in tests
  • Tests cover consent flow, error handling, and API integration

Usage Example

# Set API key
export HURDLE_API_KEY="your_key"

# Use the model
from biolearn.model_gallery import ModelGallery

gallery = ModelGallery()
model = gallery.get("HurdleInflammage")
predictions = model.predict(methylation_data)

@marcbal77 marcbal77 changed the title feat: Add support for third-party API models with Hurdle implementation feat: Add support for third-party API models Aug 5, 2025
@marcbal77 marcbal77 force-pushed the feature/third-party-api-models branch from 2a46602 to 4be828e Compare August 16, 2025 02:33
@marcbal77 marcbal77 force-pushed the feature/third-party-api-models branch from 0c2c5ef to ad8b326 Compare October 14, 2025 07:19
@marcbal77 marcbal77 self-assigned this Oct 17, 2025
@marcbal77 marcbal77 added the enhancement New feature or request label Oct 17, 2025
@marcbal77 marcbal77 force-pushed the feature/third-party-api-models branch from 947bf6d to c3d75fe Compare January 12, 2026 23:48
This commit introduces a minimal framework for integrating third-party API models
into biolearn, with Hurdle Bio's Inflammage model as the first implementation.

Key changes:
- Add HurdleAPIModel class to model.py for API-based predictions
- Integrate HurdleAPIModel into ModelGallery for consistent usage
- Implement privacy-first approach with explicit user consent
- Add automatic imputation for missing CpG sites
- Include comprehensive error handling with user-friendly messages

Security & Privacy:
- API keys managed via environment variables or direct input
- Explicit consent required before sending data to external servers
- Updated .gitignore to exclude credential files

Documentation:
- Add user guide in docs/hurdle_api_guide.md
- Include working example in examples/hurdle_api_example.py
- Placeholder for Hurdle CpG sites list

Testing:
- Add test suite for HurdleAPIModel functionality
- Mock API calls to avoid requiring credentials in tests

This implementation minimizes changes to the existing codebase while providing
a robust foundation for future API model integrations.
- Add actual CpG sites to Hurdle_CpGs.csv.example file (first 100 sites)
- Fix test_consent_denied decorator issue (missing mock_input parameter)
- Skip HurdleAPIModel in main test_model.py suite (requires API credentials)
- Update test data to include proper CpG site indices for Hurdle tests
- Fix file path to use Hurdle_CpGs.csv.example instead of .csv
- Format biolearn/test/test_hurdle_model.py
- Format biolearn/test/test_model.py
- Add real CpG sites to example file, improve error handling and validation
- Simplify consent mechanism, reduce test redundancy, add type hints
- Move documentation to doc/ folder, clean up examples
- Update HurdleAPIModel implementation to match latest codebase
- Fix test imports and references after rebase
- Maintain compatibility with existing model architecture
- Add self._consent_given flag to track consent state
- Only ask for consent if not already given
- Remove duplicate base_url initialization
- Fixes failing test test_consent_only_asked_once
- Format base_url assignment as multi-line for consistency
- Fixes CI formatting check failure
…ntation

- Rename HurdleInflammage to HurdleInflammAge
- Switch to production API (use_production=True)
- Update registration URL to https://dashboard.hurdle.bio/register
- Add non-commercial use disclaimer to documentation and docstring
- Rename Hurdle_CpGs.csv.example to Hurdle_CpGs.csv
- Document 0.5 imputation for missing CpG sites
- Update example script with better error handling
- Update all references in tests and documentation
- Model now errors if any required CpG sites are missing
- Provides informative error with count and examples of missing sites
- Directs users to use ModelGallery imputation methods
- Updated documentation to explain missing data handling
- Added test for missing CpG error
- All tests passing (143 passed, 4 skipped)
@marcbal77 marcbal77 force-pushed the feature/third-party-api-models branch from c3d75fe to 418a91b Compare January 12, 2026 23:58
@marcbal77
Copy link
Member Author

This PR is ready for review (and to be merged)

I rebased on latest master, all CI passing. Tested with production API key - smoke test successful (5 samples returned valid InflammAge predictions). There ended up being a couple fixes: corrected sex mapping (0=female per biolearn standard that we previously merged and I hadn't updated to prior), added methylation_sites() for imputation support, removed empty row in CpG file that somehow I missed prior. @sarudak

@marcbal77 marcbal77 requested a review from sarudak January 13, 2026 00:13
Copy link
Member

@sarudak sarudak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Just a few minor concerns

# API credentials and sensitive files
*_api_key*
*_credentials*
HurdleTesting.ipynb
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This probably doesn't need to be here. We already have an excluded folder for notebooks

def __init__(
self,
api_key: Optional[str] = None,
use_production: bool = False,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't production be the default for users of the library?

# Biolearn standard: 0=female, 1=male (metadata_standard.rst)
sample_meta["sex"] = (
"f"
if sex_value in [0, "f", "F", "female"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel a bit uncomfortable with this kind of logic. Perhaps we should only accept biolearn standard and if for some reason their data has values other than 0 and 1 we throw an exception. We could add a validate_sex function to the geodata. That way it won't proceed silently on an incorrect assumption about what the sex data is.

from biolearn.model_gallery import ModelGallery


class TestHurdleAPIModel:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Would be nice to have at least one integration test that can be run if you have the key

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the example kinda counts as an integration test. Required some effort to actually run though

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants