Clean compute params (#24)
Aremaki authored Aug 9, 2023
1 parent a0c9685 commit 479a47e
Showing 53 changed files with 822 additions and 832 deletions.
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
@@ -59,7 +59,7 @@ pip install edsteva
We recommend pinning the library version in your projects, or using a strict package manager like [Poetry](https://python-poetry.org/).

```
pip install edsteva==0.2.4
pip install edsteva==0.2.5
```
## Example

7 changes: 7 additions & 0 deletions changelog.md
@@ -1,4 +1,11 @@
# Changelog
## v0.2.5 - 10-08-2023

- New Probe parameters:
    - Age range: age of the patient at the visit
    - Provenance source: where the patient came from before the visit (emergency, consultation, etc.)
    - Stay source: type of care (MCO, PSY, SSR)
- Refactor the parameter types to make them more uniform.
## v0.2.4 - 28-07-2023

- Viz: Simplify normalized probe plot
2 changes: 1 addition & 1 deletion docs/assets/charts/estimates_densities.json

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/assets/charts/fitted_visit.json

Large diffs are not rendered by default.

14 changes: 7 additions & 7 deletions docs/assets/charts/interactive_fitted_visit.html

Large diffs are not rendered by default.

14 changes: 7 additions & 7 deletions docs/assets/charts/interactive_visit.html

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/assets/charts/normalized_probe.json

Large diffs are not rendered by default.

22 changes: 11 additions & 11 deletions docs/assets/charts/normalized_probe_dashboard.html

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/assets/charts/visit.json

Large diffs are not rendered by default.

20 changes: 13 additions & 7 deletions docs/components/probe.md
@@ -51,14 +51,10 @@ It must include one and only one time related column:

- **`date`**: date of the event associated with the target variable (by default, the dates are truncated to the month in which the event occurs).

It must include the following string type column :
Then, it can have any other string type column such as:

- **`care_site_level`**: care site hierarchic level (`uf`, `pole`, ``hospital``).
- **`care_site_id`**: care site unique identifier.
- **`care_site_short_name`**: care site short name used for visualization.

Then, it can have any other string type column such as:

- **`stay_type`**: type of stay (``hospitalisés``, ``urgence``, ``hospitalisation incomplète``, ``consultation externe``).
- **`note_type`**: type of note (``CRH``, ``Ordonnance``, ``CR Passage Urgences``).

@@ -104,9 +100,19 @@ from edsteva.probes import BaseProbe

# Definition of a new Probe class
class CustomProbe(BaseProbe):
_index = ["my_custom_column_1", "my_custom_column_2"]
def __init__(
self,
):
self._index = ["my_custom_column_1", "my_custom_column_2"]
super().__init__(
index=self._index,
)

def compute_process(self, data: Data):
def compute_process(
self,
data: Data,
**kwargs,
):
# query using Pandas API
return custom_predictor
```
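The pattern in the snippet above (the index is set in `__init__` and passed to the base class, and `compute_process` accepts extra keyword arguments) can be sketched in a self-contained way. This is only an illustration: `SketchBaseProbe` is a hypothetical stand-in written for this example, not the real `edsteva.probes.BaseProbe`, and the `min_count` parameter and the toy normalization are assumptions.

```python
from abc import ABC, abstractmethod
from typing import List

import pandas as pd


class SketchBaseProbe(ABC):
    """Hypothetical stand-in for BaseProbe (NOT the edsteva class)."""

    # Columns every predictor must have, as in the new `_schema`.
    _schema = ["date", "c"]

    def __init__(self, index: List[str]):
        self._index = index
        self.predictor = None

    @abstractmethod
    def compute_process(self, data: pd.DataFrame, **kwargs) -> pd.DataFrame:
        """Turn raw data into a predictor table."""

    def compute(self, data: pd.DataFrame, **kwargs) -> None:
        # Probe-specific parameters travel through **kwargs.
        predictor = self.compute_process(data, **kwargs)
        missing = set(self._schema + self._index) - set(predictor.columns)
        if missing:
            raise ValueError(f"Missing columns: {sorted(missing)}")
        self.predictor = predictor


class CustomProbe(SketchBaseProbe):
    def __init__(self):
        # The index is declared per instance and handed to the base class.
        super().__init__(index=["my_custom_column_1"])

    def compute_process(self, data: pd.DataFrame, min_count: int = 1, **kwargs):
        # Count rows per (custom column, date); `min_count` is illustrative.
        counts = (
            data.groupby(["my_custom_column_1", "date"], as_index=False)
            .size()
            .rename(columns={"size": "n"})
        )
        counts = counts[counts["n"] >= min_count].copy()
        # Normalize by the per-group maximum to obtain a value in [0, 1].
        group_max = counts.groupby("my_custom_column_1")["n"].transform("max")
        counts["c"] = counts["n"] / group_max
        return counts


data = pd.DataFrame(
    {
        "my_custom_column_1": ["a", "a", "a", "b"],
        "date": ["2021-01", "2021-01", "2021-02", "2021-01"],
    }
)
probe = CustomProbe()
probe.compute(data)
print(probe.predictor)
```

Keeping only `date` and `c` in the shared schema lets each probe decide which grouping columns it needs, while `**kwargs` lets `compute` stay unchanged when a probe adds a parameter.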
33 changes: 20 additions & 13 deletions docs/index.md
@@ -101,11 +101,11 @@ color:green Successfully installed edsteva
We recommend pinning the library version in your projects, or using a strict package manager like [Poetry](https://python-poetry.org/).

```
pip install edsteva==0.2.4
pip install edsteva==0.2.5
```
## Working example: administrative records relative to visits

Let's consider a basic category of data: administrative records relative to visits. Visits are characterized by a stay type (full hospitalisation, emergency, consultation, etc.). In this example, the objective is to estimate the availability of visit records with respect to time, care site and stay type.
Let's consider a basic category of data: administrative records relative to visits. A visit is characterized by a care site, a length of stay, a stay type (full hospitalisation, emergency, consultation, etc.) and other characteristics. In this example, the objective is to estimate the availability of visit records with respect to time, care site and stay type.

### 1. Load your [data][loading-data]

@@ -164,7 +164,7 @@ As detailed in [the dedicated section][loading-data], EDS-TeVa is expecting to
!!! info "Probe"
A [Probe][probe] is a python class designed to compute a completeness predictor $c(t)$ that characterizes data availability of a target variable over time $t$.

In this example, $c(t)$ predicts the availability of administrative records relative to visits. It is defined for each care site and stay type as the number of visits $n_{visit}(t)$ per month $t$, normalized by the maximum number of records per month $n_{max} = \max_{t}(n_{visit}(t))$ computed over the entire study period:
In this example, $c(t)$ predicts the availability of administrative records relative to visits. It is defined for each characteristic (care site, stay type, age range, length of stay, etc.) as the number of visits $n_{visit}(t)$ per month $t$, normalized by the maximum number of records per month $n_{max} = \max_{t}(n_{visit}(t))$ computed over the entire study period:

$$
c(t) = \frac{n_{visit}(t)}{n_{max}}
@@ -188,19 +188,29 @@ probe_path = "my_path/visit.pkl"
visit = VisitProbe()
visit.compute(
data,
care_site_levels=["Hospital", "Pole", "UF"], # (1)
stay_types={
"All": ".*",
"Urg_Hospit": "urgence|hospitalisés", # (1)
"Urg_Hospit": "urgence|hospitalisés", # (2)
},
care_site_levels=["Hospital", "Pole", "UF"], # (2)
care_site_specialties=None, # (3)
stay_sources=None, # (4)
length_of_stays=None, # (5)
provenance_sources=None, # (6)
age_ranges=None, # (7)
)
visit.save(path=probe_path) # (3)
visit.save(path=probe_path) # (8)
visit.predictor.head()
```

1. The stay_types argument expects a python dictionary with labels as keys and regex as values.
2. The care sites are articulated into levels (cf. [AP-HP's reference structure](https://doc-new.eds.aphp.fr/donnees_dispo/donnees_par_domaine/R%C3%A9f%C3%A9rentielsStructures)).
3. Saving the Probe after computation saves you from having to compute it again. You just use `VisitProbe.load(path=probe_path)`.
1. The care sites are organized into levels (cf. [AP-HP's reference structure](https://bigdata-pages.eds.aphp.fr/omop/dev/clinical_data/care_site/#schema-de-la-hierarchie-des-structures)). In this example, we are only interested in these three levels.
2. The ``stay_types`` argument expects a Python dictionary with labels as keys and regular expressions as values.
3. In this example, we ignore the care site specialty (e.g., Cardiology, Pediatrics).
4. In this example, we ignore the stay source (e.g., MCO, SSR, PSY).
5. In this example, we ignore the length of stay (e.g., $\geq$ 7 days, $\leq$ 2 days).
6. In this example, we ignore the provenance source (e.g., service d'urgence, d'une unité de soins de courte durée).
7. In this example, we ignore the age range (e.g., 0-18 years, 18-25 years, 25-30 years).
8. Saving the Probe after computation saves you from having to compute it again. To reload it, call `VisitProbe.load(path=probe_path)`.

``Saved to /my_path/visit.pkl``
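
To make the definition $c(t) = n_{visit}(t) / n_{max}$ concrete, here is a minimal pandas sketch. It is not the `VisitProbe` implementation: the toy `visits` table, its column names, and the use of `str.contains` to apply the `stay_types` regex dictionary are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical visit table; column names are assumptions for this sketch.
visits = pd.DataFrame(
    {
        "care_site_id": [1, 1, 1, 1, 2, 2],
        "stay_type": [
            "urgence",
            "hospitalisés",
            "urgence",
            "consultation externe",
            "urgence",
            "urgence",
        ],
        "date": pd.to_datetime(
            [
                "2021-01-03",
                "2021-01-20",
                "2021-02-11",
                "2021-02-15",
                "2021-01-05",
                "2021-01-25",
            ]
        ),
    }
)

# Labels as keys, regular expressions as values.
stay_types = {"All": ".*", "Urg_Hospit": "urgence|hospitalisés"}


def completeness(visits: pd.DataFrame, stay_types: dict) -> pd.DataFrame:
    # One copy of the matching visits per label, relabelled with that label.
    frames = [
        visits[visits["stay_type"].str.contains(pattern, regex=True)].assign(
            stay_type=label
        )
        for label, pattern in stay_types.items()
    ]
    labelled = pd.concat(frames, ignore_index=True)
    # Truncate dates to the first day of the month.
    labelled["date"] = labelled["date"].dt.to_period("M").dt.to_timestamp()
    index = ["care_site_id", "stay_type"]
    # n_visit(t): number of visits per group and month.
    counts = (
        labelled.groupby(index + ["date"], as_index=False)
        .size()
        .rename(columns={"size": "n_visit"})
    )
    # n_max: maximum monthly count per group over the whole period.
    counts["n_max"] = counts.groupby(index)["n_visit"].transform("max")
    counts["c"] = counts["n_visit"] / counts["n_max"]
    return counts


predictor = completeness(visits, stay_types)
print(predictor)
```

Because each group is normalized by its own maximum, $c(t)$ always lies in $[0, 1]$ and reaches 1 in the busiest month of each care site and stay type.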

@@ -214,7 +224,7 @@ visit.predictor.head()

#### 2.2 Filter your Probe

In this example, we consider the poles of three hospitals. We consequently filter data before any further analysis.
In this example, we are interested in three hospitals. We consequently filter data before any further analysis.

```python
from edsteva.probes import VisitProbe
@@ -238,7 +248,6 @@ from edsteva.viz.dashboards import probe_dashboard

probe_dashboard(
probe=filtered_visit,
care_site_level="Pole",
)
```
Interactive dashboard is available [here](assets/charts/interactive_visit.html)
@@ -331,7 +340,6 @@ from edsteva.viz.dashboards import probe_dashboard
probe_dashboard(
probe=filtered_visit,
fitted_model=step_function_model,
care_site_level="Pole",
)
```
Interactive dashboard is available [here](assets/charts/interactive_fitted_visit.html).
@@ -391,7 +399,6 @@ from edsteva.viz.dashboards import estimates_dashboard
estimates_dashboard(
probe=filtered_visit,
fitted_model=step_function_model,
care_site_level="Pole",
)
```
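
The estimates shown in these dashboards come from the fitted `step_function_model`, whose docstring in this commit lists the coefficients `t_0` and `c_0` and a `loss_minimization` algorithm. As a rough illustration only, the sketch below fits a step function by brute force. It assumes the model takes the form $f(t) = c_0$ for $t \geq t_0$ and $0$ before, with a squared-error loss and integer month indices; the function name `fit_step_function` is hypothetical and this is not edsteva's algorithm.

```python
import numpy as np


def fit_step_function(t, c):
    """Brute-force least squares for f(t) = c_0 if t >= t_0 else 0.

    Illustrative sketch under stated assumptions, not edsteva's code.
    """
    t = np.asarray(t, dtype=float)
    c = np.asarray(c, dtype=float)
    best = None
    # Try every observed time as a candidate break point t_0.
    for t_0 in np.unique(t):
        after = t >= t_0
        # For a fixed t_0, the squared loss is minimized by the mean after t_0.
        c_0 = c[after].mean()
        loss = float(np.sum((c - np.where(after, c_0, 0.0)) ** 2))
        if best is None or loss < best["loss"]:
            best = {"t_0": float(t_0), "c_0": float(c_0), "loss": loss}
    return best


# Toy completeness predictor: near 0 for three months, then near 1.
months = np.arange(8)
c = np.array([0.0, 0.05, 0.0, 0.9, 1.0, 0.95, 0.9, 1.0])
estimates = fit_step_function(months, c)
print(estimates)
```

Under these assumptions, `t_0` reads as the date from which records become available and `c_0` as the completeness level reached afterwards.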

2 changes: 1 addition & 1 deletion edsteva/__init__.py
@@ -1,4 +1,4 @@
__version__ = "0.2.4"
__version__ = "0.2.5"


import importlib
@@ -42,6 +42,7 @@ def loss_minimization(
$c(t)$ computed in the Probe.
index : List[str]
Variable from which data is grouped.
**EXAMPLE**: `["care_site_level", "stay_type", "note_type", "care_site_id"]`
x_col : str, optional
Column name for the time variable $t$.
13 changes: 9 additions & 4 deletions edsteva/models/rectangle_function/rectangle_function.py
@@ -23,16 +23,20 @@ class RectangleFunction(BaseModel):
Attributes
----------
_algo: List[str]
Algorithm used to compute the estimates
Algorithm used to compute the estimates.
**VALUE**: ``"loss_minimization"``
_coefs: List[str]
Model coefficients
Model coefficients.
**VALUE**: ``["t_0", "c_0", "t_1"]``
_default_metrics: List[str]
Metrics to use by default
Metrics to use by default.
**VALUE**: ``[error_between_t0_t1]``
_viz_config: List[str]
Dictionary of configuration for visualization purpose.
**VALUE**: ``{}``
Example
@@ -87,7 +91,8 @@ def fit_process(
predictor : pd.DataFrame
Target variable to be fitted
index : List[str], optional
Variable from which data is grouped
Variable from which data is grouped.
**EXAMPLE**: `["care_site_level", "stay_type", "note_type", "care_site_id"]`
"""
return algos.get(self._algo)(predictor=predictor, index=index, **kwargs)
13 changes: 9 additions & 4 deletions edsteva/models/step_function/step_function.py
@@ -22,16 +22,20 @@ class StepFunction(BaseModel):
Attributes
----------
_algo: List[str]
Algorithm used to compute the estimates
Algorithm used to compute the estimates.
**VALUE**: ``"loss_minimization"``
_coefs: List[str]
Model coefficients
Model coefficients.
**VALUE**: ``["t_0", "c_0"]``
_default_metrics: List[str]
Metrics to use by default
Metrics to use by default.
**VALUE**: ``[error_after_t0]``
_viz_config: List[str]
Dictionary of configuration for visualization purpose.
**VALUE**: ``{}``
Example
@@ -86,7 +90,8 @@ def fit_process(
predictor : pd.DataFrame
Target variable to be fitted
index : List[str], optional
Variable from which data is grouped
Variable from which data is grouped.
**EXAMPLE**: `["care_site_level", "stay_type", "note_type", "care_site_id"]`
"""

23 changes: 4 additions & 19 deletions edsteva/probes/base/base.py
@@ -1,6 +1,6 @@
import datetime
from abc import ABCMeta, abstractmethod
from typing import ClassVar, Dict, List, Union
from typing import ClassVar, List, Union

import altair as alt
import pandas as pd
@@ -21,9 +21,9 @@ class BaseProbe(metaclass=ABCMeta):
Attributes
----------
_schema: List[str]
The columns a predictor must have
The columns a predictor must have.
**VALUE**: ``["care_site_id", "care_site_level", "stay_type", "date", "c"]``
**VALUE**: ``["date", "c"]``
predictor: pd.DataFrame
Available with the [``compute()``][edsteva.probes.base.BaseProbe.compute] method
_cache_predictor: pd.DataFrame
@@ -36,7 +36,7 @@ class BaseProbe(metaclass=ABCMeta):
It describes the care site structure (cf. [``prepare_care_site_relationship()``][edsteva.probes.utils.prepare_df.prepare_care_site_relationship])
"""

_schema: ClassVar[List[str]] = ["care_site_level", "care_site_id", "date", "c"]
_schema: ClassVar[List[str]] = ["date", "c"]

def __init__(
self,
@@ -127,9 +127,6 @@ def compute_process(
care_site_relationship: pd.DataFrame,
start_date: datetime,
end_date: datetime,
care_site_levels: List[str],
stay_types: Union[str, Dict[str, str]],
care_site_ids: List[int],
**kwargs,
) -> pd.DataFrame:
"""Process the data in order to obtain a predictor table"""
@@ -139,9 +136,6 @@ def compute(
data: Data,
start_date: datetime = None,
end_date: datetime = None,
care_site_levels: List[str] = None,
stay_types: Union[str, Dict[str, str]] = None,
care_site_ids: List[int] = None,
with_cache: bool = True,
**kwargs,
) -> None:
@@ -165,12 +159,6 @@ def compute(
**EXAMPLE**: `"2019-05-01"`
end_date : datetime, optional
**EXAMPLE**: `"2021-07-01"`
care_site_levels : List[str], optional
**EXAMPLE**: `["Hospital", "Pole", "UF"]`
stay_types : Union[str, Dict[str, str]], optional
**EXAMPLE**: `{"All": ".*"}` or `{"All": ".*", "Urg_and_consult": "urgences|consultation"}` or `"hospitalisés"`
care_site_ids : List[int], optional
**EXAMPLE**: `[8312056386, 8312027648]`
Attributes
----------
@@ -215,9 +203,6 @@ def compute(
care_site_relationship=care_site_relationship,
start_date=start_date,
end_date=end_date,
care_site_levels=care_site_levels,
stay_types=stay_types,
care_site_ids=care_site_ids,
**kwargs,
)
self.is_computed_probe()