-
Notifications
You must be signed in to change notification settings - Fork 34
feat: Add support for third-party API models #164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
feat: Add support for third-party API models #164
Conversation
2a46602 to
4be828e
Compare
0c2c5ef to
ad8b326
Compare
947bf6d to
c3d75fe
Compare
This commit introduces a minimal framework for integrating third-party API models into biolearn, with Hurdle Bio's Inflammage model as the first implementation. Key changes: - Add HurdleAPIModel class to model.py for API-based predictions - Integrate HurdleAPIModel into ModelGallery for consistent usage - Implement privacy-first approach with explicit user consent - Add automatic imputation for missing CpG sites - Include comprehensive error handling with user-friendly messages Security & Privacy: - API keys managed via environment variables or direct input - Explicit consent required before sending data to external servers - Updated .gitignore to exclude credential files Documentation: - Add user guide in docs/hurdle_api_guide.md - Include working example in examples/hurdle_api_example.py - Placeholder for Hurdle CpG sites list Testing: - Add test suite for HurdleAPIModel functionality - Mock API calls to avoid requiring credentials in tests This implementation minimizes changes to the existing codebase while providing a robust foundation for future API model integrations.
- Add actual CpG sites to Hurdle_CpGs.csv.example file (first 100 sites) - Fix test_consent_denied decorator issue (missing mock_input parameter) - Skip HurdleAPIModel in main test_model.py suite (requires API credentials) - Update test data to include proper CpG site indices for Hurdle tests - Fix file path to use Hurdle_CpGs.csv.example instead of .csv
- Format biolearn/test/test_hurdle_model.py - Format biolearn/test/test_model.py
- Add real CpG sites to example file, improve error handling and validation - Simplify consent mechanism, reduce test redundancy, add type hints - Move documentation to doc/ folder, clean up examples
- Update HurdleAPIModel implementation to match latest codebase - Fix test imports and references after rebase - Maintain compatibility with existing model architecture
- Add self._consent_given flag to track consent state - Only ask for consent if not already given - Remove duplicate base_url initialization - Fixes failing test test_consent_only_asked_once
- Format base_url assignment as multi-line for consistency - Fixes CI formatting check failure
…ntation - Rename HurdleInflammage to HurdleInflammAge - Switch to production API (use_production=True) - Update registration URL to https://dashboard.hurdle.bio/register - Add non-commercial use disclaimer to documentation and docstring - Rename Hurdle_CpGs.csv.example to Hurdle_CpGs.csv - Document 0.5 imputation for missing CpG sites - Update example script with better error handling - Update all references in tests and documentation
- Model now errors if any required CpG sites are missing - Provides informative error with count and examples of missing sites - Directs users to use ModelGallery imputation methods - Updated documentation to explain missing data handling - Added test for missing CpG error - All tests passing (143 passed, 4 skipped)
c3d75fe to
418a91b
Compare
|
This PR is ready for review (and to be merged) I rebased on latest master, all CI passing. Tested with production API key - smoke test successful (5 samples returned valid InflammAge predictions). There ended up being a couple fixes: corrected sex mapping (0=female per biolearn standard that we previously merged and I hadn't updated to prior), added methylation_sites() for imputation support, removed empty row in CpG file that somehow I missed prior. @sarudak |
sarudak
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Just a few minor concerns
| # API credentials and sensitive files | ||
| *_api_key* | ||
| *_credentials* | ||
| HurdleTesting.ipynb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: This probably doesn't need to be here. We already have an excluded folder for notebooks
| def __init__( | ||
| self, | ||
| api_key: Optional[str] = None, | ||
| use_production: bool = False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't production be the default for users of the library?
| # Biolearn standard: 0=female, 1=male (metadata_standard.rst) | ||
| sample_meta["sex"] = ( | ||
| "f" | ||
| if sex_value in [0, "f", "F", "female"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel a bit uncomfortable with this kind of logic. Perhaps we should only accept biolearn standard and if for some reason their data has values other than 0 and 1 we throw an exception. We could add a validate_sex function to the geodata. That way it won't proceed silently on an incorrect assumption about what the sex data is.
| from biolearn.model_gallery import ModelGallery | ||
|
|
||
|
|
||
| class TestHurdleAPIModel: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Would be nice to have at least one integration test that can be run if you have the key
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the example kinda counts as an integration test. Required some effort to actually run though
Description
This PR introduces a minimal framework for integrating third-party API models into biolearn, with Hurdle Bio's Inflammage model as the first implementation.
Key Changes
Security & Privacy Features
Documentation
Testing
Usage Example