Define an API surface between Spezi Data Pipeline and Python notebook #10

Merged: 61 commits, Apr 22, 2024

Collaborator

@Vicbi Vicbi commented Mar 22, 2024

Define an API surface between Spezi Data Pipeline and Python notebook

♻️ Current situation & Problem

This PR addresses issue #7. The current helper.py file lacks structure and does not offer a clear API surface.

⚙️ Release Notes

  • Introduced the basis for a structured API surface for the Spezi Data Pipeline to facilitate easier integration with Python notebooks.
  • Utilized the fhir.resources package for describing FHIR resources.
  • Implemented a Firebase query for retrieving specific LOINC codes directly from Firebase, in order to avoid unnecessary data downloads and optimize data retrieval.
  • Added a process_FHIR_data() function for basic filtering and analysis of acquired data, streamlining data processing tasks for users' convenience.
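
To illustrate the LOINC-based selection described above, here is a minimal in-memory sketch on plain dictionaries shaped like FHIR Observations. It is not the Firebase query from this PR: the helper names `has_loinc_code` and `select_by_loinc` are hypothetical, and the sample records are made up for illustration.

```python
from typing import Iterable, List, Optional

LOINC_SYSTEM = "http://loinc.org"


def has_loinc_code(observation: dict, loinc_codes: Iterable[str]) -> bool:
    """Return True if any coding of the observation is one of the wanted LOINC codes."""
    wanted = set(loinc_codes)
    codings = observation.get("code", {}).get("coding", [])
    return any(
        c.get("system") == LOINC_SYSTEM and c.get("code") in wanted for c in codings
    )


def select_by_loinc(
    observations: List[dict], loinc_codes: Optional[List[str]] = None
) -> List[dict]:
    """Keep only observations with a wanted LOINC code; None means keep everything."""
    if loinc_codes is None:
        return list(observations)
    return [o for o in observations if has_loinc_code(o, loinc_codes)]


# Made-up sample records (8867-4: heart rate, 55423-8: step count).
sample = [
    {"id": "a", "code": {"coding": [{"system": LOINC_SYSTEM, "code": "8867-4"}]}},
    {"id": "b", "code": {"coding": [{"system": LOINC_SYSTEM, "code": "55423-8"}]}},
    {"id": "c", "code": {"coding": [{"system": "http://example.org", "code": "8867-4"}]}},
]

heart_rate_ids = [o["id"] for o in select_by_loinc(sample, ["8867-4"])]
all_ids = [o["id"] for o in select_by_loinc(sample)]
```

Doing the same selection on the server side, as the PR does, avoids downloading the records that this local filter would simply throw away.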

📚 Documentation

API Surface Structure

spezi_data_pipeline/
│
├── spezi_data_pipeline/
│   ├── __init__.py                          
│   ├── data_access/
│   │   ├── __init__.py
│   │   ├── firebase_FHIR_data_access.py
│   │       ├── class EnhancedObservation
│   │       |   ├── __init__(self, observation: Observation, UserId=None)
│   │       ├── class FirebaseFHIRAccess
│   │           ├── __init__(self, service_account_key_file: str, project_id: str) -> None
│   │           ├── connect(self) -> None
│   │           └── fetch_data(self, collection_name: str = 'users', subcollection_name: str = 'HealthKit', loinc_codes: Optional[List[str]] = None) -> List[EnhancedObservation]
│   │
│   ├── data_flattening/                  
│   │   ├── __init__.py
│   │   └── FHIR_data_flattener.py          
│   │       ├── class FHIRDataFrame
│   │       |   ├── __init__(self, data: pd.DataFrame, resource_type: str = "Observation") -> None
│   │       |   ├── df(self) -> pd.DataFrame
│   │       └── flatten_FHIR_resources(FHIR_resources: List[EnhancedObservation]) -> FHIRDataFrame
│   │
│   ├── data_analysis/
│   │   ├── __init__.py
│   │   ├── data_analyzer.py                    
│   │   |   └── class FHIRDataProcessor
│   │   |       ├── __init__(self)
│   │   |       ├── process_FHIR_data(self, flattened_FHIRDataFrame: FHIRDataFrame) -> FHIRDataFrame
│   │   |       ├── calculate_daily_data(self, group_FHIRDataFrame: FHIRDataFrame) -> FHIRDataFrame
│   │   |       ├── calculate_average_data(self, group_FHIRDataFrame: FHIRDataFrame) -> FHIRDataFrame
│   │   |       ├── _finalize_group(self, original_df: pd.DataFrame, aggregated_df: pd.DataFrame, prefix: str) -> pd.DataFrame
│   │   |       ├── filter_outliers(self, flattened_FHIRDataFrame: FHIRDataFrame, value_range=None) -> FHIRDataFrame
│   │   |       ├── validate_columns(self, flattened_FHIRDataFrame: FHIRDataFrame) -> None
│   │   |       ├── select_data_by_user(self, flattened_FHIRDataFrame: FHIRDataFrame, user_id: str) -> FHIRDataFrame
│   │   |       ├── select_data_by_dates(self, flattened_FHIRDataFrame: FHIRDataFrame, start_date: str, end_date: str) -> FHIRDataFrame
│   │   |       └── calculate_moving_average(self, flattened_FHIRDataFrame: FHIRDataFrame, n=7) -> FHIRDataFrame
│   │
│   ├── data_visualization/
│   │   ├── __init__.py
│   │   ├── data_visualizer.py              
│   │   │   └── class DataVisualizer(FHIRDataProcessor)
│   │   |       ├── __init__(self)
│   │   |       ├── set_date_range(self, start_date: str, end_date: str)
│   │   |       ├── set_user_ids(self, user_ids: List[str])
│   │   |       ├── set_y_bounds(self, y_lower: float, y_upper: float)
│   │   |       ├── set_same_plot(self, same_plot: bool)
│   │   |       ├── set_dpi(self, dpi: float)
│   │   │       └── create_static_plot(self, flattened_FHIRDataFrame: FHIRDataFrame) -> Optional[plt.Figure]
│   │
│   ├── data_export/
│   │   ├── __init__.py
│   │   ├── data_exporter.py
│   │   │   └── class DataExporter(DataVisualizer)
│   │   |       ├── __init__(self, flattened_FHIRDataFrame: FHIRDataFrame)
│   │   |       ├── export_to_csv(self, filename)
│   │   │       └── create_and_save_plot(self, filename)
│   │
│   └── utils/
│       ├── __init__.py
│       └── helpers.py
│           └── snake_case(s: str) -> str
|
├── tests/ (currently missing)
│   ├── __init__.py
│   ├── test_data_access/
│   ├── ...
│   └── test_export/
│
├── docs/ (currently missing)
│   ├── index.md
│   ├── setup.md
│   └── usage.md
│
├── examples/ (currently missing)
│   └── example_usage.py                                     
│
├── setup.py (currently missing)
├── README.md
├── LICENSE
└── .gitignore
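
As a rough illustration of how the flattening and analysis stages in the tree above could fit together, the following standard-library sketch flattens Observation-like dictionaries into rows, sums values per user and day, and applies a trailing moving average. It does not use the package's classes. The function names, the row fields, the choice of a daily sum, and the trailing window are all assumptions, since the tree only lists signatures.

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Dict, List, Tuple

LOINC_SYSTEM = "http://loinc.org"


def flatten_observation(user_id: str, observation: dict) -> dict:
    """Turn one nested Observation-like dict into a flat row (hypothetical fields)."""
    # Pick the LOINC coding by its system instead of relying on list position.
    loinc = next(
        c for c in observation["code"]["coding"] if c.get("system") == LOINC_SYSTEM
    )
    quantity = observation["valueQuantity"]
    return {
        "user_id": user_id,
        "date": datetime.fromisoformat(observation["effectiveDateTime"]).date(),
        "loinc_code": loinc["code"],
        "value": float(quantity["value"]),
        "unit": quantity.get("unit", ""),
    }


def daily_totals(rows: List[dict]) -> Dict[Tuple[str, date], float]:
    """Sum row values per (user_id, date) pair."""
    totals: Dict[Tuple[str, date], float] = defaultdict(float)
    for row in rows:
        totals[(row["user_id"], row["date"])] += row["value"]
    return dict(totals)


def moving_average(values: List[float], n: int = 7) -> List[float]:
    """Trailing moving average; early entries use the shorter window available."""
    averaged = []
    for i in range(len(values)):
        window = values[max(0, i - n + 1): i + 1]
        averaged.append(sum(window) / len(window))
    return averaged


def make_steps(timestamp: str, value: float) -> dict:
    """Build a made-up step-count observation for the example."""
    return {
        "code": {"coding": [{"system": LOINC_SYSTEM, "code": "55423-8"}]},
        "effectiveDateTime": timestamp,
        "valueQuantity": {"value": value, "unit": "steps"},
    }


observations = [
    make_steps("2024-03-21T08:00:00", 1000),
    make_steps("2024-03-21T18:30:00", 500),
    make_steps("2024-03-22T09:15:00", 2000),
]
rows = [flatten_observation("user-1", o) for o in observations]
totals = daily_totals(rows)
smoothed = moving_average([1.0, 2.0, 3.0, 4.0], n=2)
```

Keeping each stage a plain function over rows mirrors the separation between `data_flattening` and `data_analysis` in the tree, so that each stage can be tested on its own.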

📝 Code of Conduct & Contributing Guidelines

By creating this pull request, you agree to follow our Code of Conduct and Contributing Guidelines.

Member

@PSchmiedmayer PSchmiedmayer left a comment

@Vicbi Thank you for the PR and first version of the decomposed architecture; it is great to see some next steps and to see the architecture and elements that we discussed taking shape within the PR!

I started to dive into the code but came up with a lot of basic code style feedback, so I decided to first install a linter in a GitHub Action, and I added a proposed lint configuration file to define a common code style we can agree on.

I think this would be a great next step. Let me know what you think about this configuration. Feel free to adapt it and incorporate the elements that it provides as feedback into this PR so we have a well-formatted codebase to start with.

In addition to that, I have also provided some high-level feedback on some functions and elements; I can provide some more aspects once we have an agreement about the code style guidelines 🚀

Resolved review threads (outdated) on:
  • .github/workflows/build-and-test.yml
  • SpeziDataPipelineTemplate.ipynb (2 threads)
  • data_access/firebase_FHIR_data_access.py (2 threads)
  • data_analysis/data_analyzer.py
  • data_export/data_exporter.py
  • data_flattening/FHIR_data_flattener.py (2 threads)
  • data_visualization/data_visualizer.py
@PSchmiedmayer PSchmiedmayer added the enhancement New feature or request label Apr 6, 2024
Collaborator Author

Vicbi commented Apr 11, 2024

Hi @PSchmiedmayer, I've pushed the code updates to the defineAPISurface branch. I chose to keep the ECGObservation class (potentially, it could become a wrapper for the Observation class). This approach simplifies method selection across modules based on data type and removes the need for conditions that exclude the ECG LOINC code from the FHIRDataFrame object. Moving forward, we might consider renaming ECGObservation so that it covers all LOINC codes with attributes similar to ECG data.

Thank you for reviewing the changes. Please note that the README hasn't been updated to include these adjustments yet, but it will be shortly. However, the docstrings within the modules are up-to-date.

Feel free to reach out if you need any help!

Member

@PSchmiedmayer PSchmiedmayer left a comment

Thank you for all the work here @Vicbi; happy to see this merged soon! 🚀

I took a look at the code and have a few high-level suggestions:

  • ECG Parsing: The parsing currently assumes a rigid structure in which the elements appear in a fixed order. We can't guarantee that order in the JSON files and should instead use the coding schemes attached to the elements to identify and parse them.
  • Unit Testing CI Setup: We currently only execute the notebook in the CI setup but none of the unit tests, as far as I can see. It would be good to execute them as part of the GitHub Actions CI setup.
  • Related to that: Test Coverage. Thank you for writing some first tests. I think it would be important to automatically assess which lines are already covered and where tests are missing, to ensure that the code works as expected. We currently use Codecov (https://about.codecov.io) for all our projects, and I would suggest using it here as well to generate automated coverage reports. They have good documentation for Python code.
  • Code Linting: The current lint rules still report some errors that should be addressed before we merge the PR.
  • As discussed, we will also need to adjust the README to reflect the latest changes.
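
The first suggestion, looking elements up by their coding rather than by position, might look like the sketch below. The system URL `http://example.org/ecg` and the codes `sampling-frequency` and `voltage-count` are placeholders invented for this example, not the codes used by the pipeline or by HealthKit, and `find_component` and `sampling_frequency` are hypothetical helpers.

```python
from typing import Optional

# Placeholder coding system for this example only (an assumption, not a real scheme).
ECG_SYSTEM = "http://example.org/ecg"


def find_component(observation: dict, system: str, code: str) -> Optional[dict]:
    """Return the first component whose codings include (system, code), else None."""
    for component in observation.get("component", []):
        for coding in component.get("code", {}).get("coding", []):
            if coding.get("system") == system and coding.get("code") == code:
                return component
    return None


def sampling_frequency(observation: dict) -> Optional[float]:
    """Read the sampling frequency by its coding, independent of component order."""
    component = find_component(observation, ECG_SYSTEM, "sampling-frequency")
    return None if component is None else component["valueQuantity"]["value"]


def make_component(code: str, value: float, unit: str) -> dict:
    """Build a made-up component entry for the example."""
    return {
        "code": {"coding": [{"system": ECG_SYSTEM, "code": code}]},
        "valueQuantity": {"value": value, "unit": unit},
    }


ecg = {
    "resourceType": "Observation",
    "component": [
        make_component("sampling-frequency", 512, "Hz"),
        make_component("voltage-count", 15360, "count"),
    ],
}
# Same content with the components in the opposite order.
reordered = dict(ecg, component=list(reversed(ecg["component"])))

frequency_original = sampling_frequency(ecg)
frequency_reordered = sampling_frequency(reordered)
```

Because the lookup matches on system and code, both orderings yield the same value, whereas indexing `component[0]` would silently read the wrong element after a reorder.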

Apart from this, I would suggest merging the PR without too many smaller changes so that we have a reworked state in the main branch; we can use subsequent smaller PRs and our meetings to discuss smaller changes that refine the API.

Resolved review threads on:
  • .reuse/dep5
  • SpeziDataPipelineTemplate.ipynb
  • tests/test_data_access.py (outdated)
  • data_flattening/fhir_resources_flattener.py (outdated)

codecov bot commented Apr 22, 2024

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

Thanks for integrating Codecov - We've got you covered ☂️

Member

@PSchmiedmayer PSchmiedmayer left a comment

Great job! 🎉

@Vicbi Vicbi merged commit c519aa9 into main Apr 22, 2024
9 checks passed
@Vicbi Vicbi deleted the defineAPISurface branch April 22, 2024 23:17