Clean compute params (#24)

Clean compute params
aphp · Aug 9, 2023 · 479a47e
1 parent a0c9685
commit 479a47e
Showing 53 changed files with 822 additions and 832 deletions.
diff --git a/README.md b/README.md
@@ -59,7 +59,7 @@ pip install edsteva
 We recommend pinning the library version in your projects, or using a strict package manager like [Poetry](https://python-poetry.org/).
 
 ```
-pip install edsteva==0.2.4
+pip install edsteva==0.2.5
 ```
 ## Example
 

diff --git a/changelog.md b/changelog.md
@@ -1,4 +1,11 @@
 # Changelog
+## v0.2.5 - 10-08-2023
+
+- New Probe parameters:
+  - Age range: Age of patient at visit
+  - Provenance source: Where the patient came from before the visit (emergency, consultation, etc.)
+  - Stay source: Type of care (MCO, PSY, SSR)
+- Refactor the parameter types to make them more uniform.
 ## v0.2.4 - 28-07-2023
 
 - Viz: Simplify normalized probe plot

diff --git a/docs/assets/charts/estimates_densities.json b/docs/assets/charts/estimates_densities.json
diff --git a/docs/assets/charts/fitted_visit.json b/docs/assets/charts/fitted_visit.json
diff --git a/docs/assets/charts/interactive_fitted_visit.html b/docs/assets/charts/interactive_fitted_visit.html
diff --git a/docs/assets/charts/interactive_visit.html b/docs/assets/charts/interactive_visit.html
diff --git a/docs/assets/charts/normalized_probe.json b/docs/assets/charts/normalized_probe.json
diff --git a/docs/assets/charts/normalized_probe_dashboard.html b/docs/assets/charts/normalized_probe_dashboard.html
diff --git a/docs/assets/charts/visit.json b/docs/assets/charts/visit.json
diff --git a/docs/components/probe.md b/docs/components/probe.md
@@ -51,14 +51,10 @@ It must include one and only one time related column:
 
 - **`date`**: date of the event associated with the target variable (by default, the dates are truncated to the month in which the event occurs).
 
-It must include the following string type column :
+Then, it can have any other string type column such as:
 
 - **`care_site_level`**: care site hierarchical level (`uf`, `pole`, ``hospital``).
 - **`care_site_id`**: care site unique identifier.
-- **`care_site_short_name`**: care site short name used for visualization.
-
-Then, it can have any other string type column such as:
-
 - **`stay_type`**: type of stay (``hospitalisés``, ``urgence``, ``hospitalisation incomplète``, ``consultation externe``).
 - **`note_type`**: type of note (``CRH``, ``Ordonnance``, ``CR Passage Urgences``).
 
@@ -104,9 +100,19 @@ from edsteva.probes import BaseProbe
 
 # Definition of a new Probe class
 class CustomProbe(BaseProbe):
-    _index = ["my_custom_column_1", "my_custom_column_2"]
+    def __init__(
+        self,
+    ):
+        self._index = ["my_custom_column_1", "my_custom_column_2"]
+        super().__init__(
+            index=self._index,
+        )
 
-    def compute_process(self, data: Data):
+    def compute_process(
+        self,
+        data: Data,
+        **kwargs,
+    ):
         # query using Pandas API
         return custom_predictor
 ```
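
The documentation change above moves `_index` from a class attribute into `__init__` and lets `compute_process` accept `**kwargs`. The following is a minimal sketch of that pattern, not EDS-TeVa code: `StandInBaseProbe` is a hypothetical stand-in for `BaseProbe`, and the toy row-counting predictor is an assumption made only so the example runs without the library.

```python
from collections import Counter
from typing import Dict, List


class StandInBaseProbe:
    """Hypothetical stand-in for edsteva's BaseProbe (not the real class)."""

    def __init__(self, index: List[str]):
        self._index = index


class CustomProbe(StandInBaseProbe):
    def __init__(self):
        # The index is built per instance and handed to the parent class.
        self._index = ["my_custom_column_1", "my_custom_column_2"]
        super().__init__(index=self._index)

    def compute_process(self, data: List[Dict[str, str]], **kwargs) -> List[dict]:
        # Toy predictor (assumption): number of rows per combination of index columns.
        counts = Counter(tuple(row[col] for col in self._index) for row in data)
        return [dict(zip(self._index, key), n=n) for key, n in counts.items()]


# Fabricated rows, for illustration only.
rows = [
    {"my_custom_column_1": "a", "my_custom_column_2": "x"},
    {"my_custom_column_1": "a", "my_custom_column_2": "x"},
    {"my_custom_column_1": "b", "my_custom_column_2": "y"},
]
probe = CustomProbe()
other = CustomProbe()
# Extra keyword arguments are accepted and ignored by this toy implementation.
predictor = probe.compute_process(rows, unused_option=True)
```

One consequence of the instance-level definition is that each probe object owns its own index list, so changing it on one instance does not leak into another.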

diff --git a/docs/index.md b/docs/index.md
@@ -101,11 +101,11 @@ color:green Successfully installed edsteva
 We recommend pinning the library version in your projects, or using a strict package manager like [Poetry](https://python-poetry.org/).
 
 ```
-pip install edsteva==0.2.4
+pip install edsteva==0.2.5
 ```
 ## Working example: administrative records relative to visits
 
-Let's consider a basic category of data: administrative records relative to visits. Visits are characterized by a stay type (full hospitalisation, emergency, consultation, etc.). In this example, the objective is to estimate the availability of visits records with respect to time, care site and stay type.
+Let's consider a basic category of data: administrative records relative to visits. A visit is characterized by a care site, a length of stay, a stay type (full hospitalisation, emergency, consultation, etc.) and other characteristics. In this example, the objective is to estimate the availability of visit records with respect to time, care site and stay type.
 
 ### 1. Load your [data][loading-data]
 
@@ -164,7 +164,7 @@ As detailed in [the dedicated section][loading-data], EDS-TeVa is expecting to
 !!! info "Probe"
     A [Probe][probe] is a Python class designed to compute a completeness predictor $c(t)$ that characterizes data availability of a target variable over time $t$.
 
-In this example, $c(t)$ predicts the availability of administrative records relative to visits. It is defined for each care site and stay type as the number of visits $n_{visit}(t)$ per month $t$, normalized by the maximum number of records per month $n_{max} = \max_{t}(n_{visit}(t))$ computed over the entire study period:
+In this example, $c(t)$ predicts the availability of administrative records relative to visits. It is defined for each characteristic (care site, stay type, age range, length of stay, etc.) as the number of visits $n_{visit}(t)$ per month $t$, normalized by the maximum number of records per month $n_{max} = \max_{t}(n_{visit}(t))$ computed over the entire study period:
 
 $$
 c(t) = \frac{n_{visit}(t)}{n_{max}}
@@ -188,19 +188,29 @@ probe_path = "my_path/visit.pkl"
 visit = VisitProbe()
 visit.compute(
     data,
+    care_site_levels=["Hospital", "Pole", "UF"],  # (1)
     stay_types={
         "All": ".*",
-        "Urg_Hospit": "urgence|hospitalisés",  # (1)
+        "Urg_Hospit": "urgence|hospitalisés",  # (2)
     },
-    care_site_levels=["Hospital", "Pole", "UF"],  # (2)
+    care_site_specialties=None,  # (3)
+    stay_sources=None,  # (4)
+    length_of_stays=None,  # (5)
+    provenance_sources=None,  # (6)
+    age_ranges=None,  # (7)
 )
-visit.save(path=probe_path)  # (3)
+visit.save(path=probe_path)  # (8)
 visit.predictor.head()
 ```
 
-1. The stay_types argument expects a python dictionary with labels as keys and regex as values.
-2. The care sites are articulated into levels (cf. [AP-HP's reference structure](https://doc-new.eds.aphp.fr/donnees_dispo/donnees_par_domaine/R%C3%A9f%C3%A9rentielsStructures)).
-3. Saving the Probe after computation saves you from having to compute it again. You just use `VisitProbe.load(path=probe_path)`.
+1. The care sites are organized into hierarchical levels (cf. [AP-HP's reference structure](https://bigdata-pages.eds.aphp.fr/omop/dev/clinical_data/care_site/#schema-de-la-hierarchie-des-structures)). Here, as an example, we are only interested in those three levels.
+2. The ``stay_types`` argument expects a Python dictionary with labels as keys and regular expressions as values.
+3. In this example we want to ignore the care site specialty (e.g., Cardiology, Pediatrics).
+4. In this example we want to ignore the stay source (e.g., MCO, SSR, PSY).
+5. In this example we want to ignore the length of stay (e.g., $\geq$ 7 days, $\leq$ 2 days).
+6. In this example we want to ignore the provenance source (e.g., service d'urgence, d'une unité de soins de courte durée).
+7. In this example we want to ignore the age range (e.g., 0-18 years, 18-25 years, 25-30 years).
+8. Saving the Probe after computation saves you from having to compute it again. You just use `VisitProbe.load(path=probe_path)`.
 
 ``Saved to /my_path/visit.pkl``
 
@@ -214,7 +224,7 @@ visit.predictor.head()
 
 #### 2.2 Filter your Probe
 
-In this example, we consider the poles of three hospitals. We consequently filter data before any further analysis.
+In this example, we are interested in three hospitals. We consequently filter data before any further analysis.
 
 ```python
 from edsteva.probes import VisitProbe
@@ -238,7 +248,6 @@ from edsteva.viz.dashboards import probe_dashboard
 
 probe_dashboard(
     probe=filtered_visit,
-    care_site_level="Pole",
 )
 ```
 An interactive dashboard is available [here](assets/charts/interactive_visit.html).
@@ -331,7 +340,6 @@ from edsteva.viz.dashboards import probe_dashboard
 probe_dashboard(
     probe=filtered_visit,
     fitted_model=step_function_model,
-    care_site_level="Pole",
 )
 ```
 Interactive dashboard is available [here](assets/charts/interactive_fitted_visit.html).
@@ -391,7 +399,6 @@ from edsteva.viz.dashboards import estimates_dashboard
 estimates_dashboard(
     probe=filtered_visit,
     fitted_model=step_function_model,
-    care_site_level="Pole",
 )
 ```
 
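
To make the definition $c(t) = n_{visit}(t) / n_{max}$ and the `stay_types` mapping from the `docs/index.md` changes more concrete, here is a minimal standard-library sketch. It is not EDS-TeVa's implementation: the toy `visits` records, the `completeness` helper and the choice of `re.search` for matching labels are all assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict

# Toy visit records (care_site_id, stay_type, month), fabricated for illustration.
visits = [
    ("A", "urgence", "2021-01"),
    ("A", "urgence", "2021-02"),
    ("A", "hospitalisés", "2021-02"),
    ("A", "consultation externe", "2021-02"),
    ("A", "consultation externe", "2021-02"),
    ("A", "urgence", "2021-03"),
    ("A", "consultation externe", "2021-03"),
]

# Labels as keys, regular expressions as values, as in the documented example.
stay_types = {"All": ".*", "Urg_Hospit": "urgence|hospitalisés"}


def completeness(visits, stay_types):
    """Return {(care_site_id, label): {month: c(t)}} with c(t) = n_visit(t) / n_max."""
    counts = defaultdict(Counter)
    for care_site_id, stay_type, month in visits:
        for label, pattern in stay_types.items():
            # Assumption: a visit belongs to a label when the regex matches anywhere.
            if re.search(pattern, stay_type):
                counts[(care_site_id, label)][month] += 1
    predictor = {}
    for key, per_month in counts.items():
        n_max = max(per_month.values())  # maximum monthly count over the study period
        predictor[key] = {month: n / n_max for month, n in per_month.items()}
    return predictor


predictor = completeness(visits, stay_types)
```

In this sketch a month with no matching visit is simply absent from the result rather than reported with $c(t) = 0$; a fuller version would fill in the missing months over the study period.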

diff --git a/edsteva/__init__.py b/edsteva/__init__.py
@@ -1,4 +1,4 @@
-__version__ = "0.2.4"
+__version__ = "0.2.5"
 
 
 import importlib

diff --git a/edsteva/models/rectangle_function/algos/loss_minimization.py b/edsteva/models/rectangle_function/algos/loss_minimization.py
@@ -42,6 +42,7 @@ def loss_minimization(
         $c(t)$ computed in the Probe.
     index : List[str]
         Variable from which data is grouped.
+
         **EXAMPLE**: `["care_site_level", "stay_type", "note_type", "care_site_id"]`
     x_col : str, optional
         Column name for the time variable $t$.

diff --git a/edsteva/models/rectangle_function/rectangle_function.py b/edsteva/models/rectangle_function/rectangle_function.py
@@ -23,16 +23,20 @@ class RectangleFunction(BaseModel):
     Attributes
     ----------
     _algo: List[str]
-        Algorithm used to compute the estimates
+        Algorithm used to compute the estimates.
+
         **VALUE**: ``"loss_minimization"``
     _coefs: List[str]
-        Model coefficients
+        Model coefficients.
+
         **VALUE**: ``["t_0", "c_0", "t_1"]``
     _default_metrics: List[str]
-        Metrics to used by default
+        Metrics to use by default.
+
         **VALUE**: ``[error_between_t0_t1]``
     _viz_config: List[str]
         Dictionary of configuration for visualization purpose.
+
         **VALUE**: ``{}``
 
     Example
@@ -87,7 +91,8 @@ def fit_process(
         predictor : pd.DataFrame
             Target variable to be fitted
         index : List[str], optional
-            Variable from which data is grouped
+            Variable from which data is grouped.
+
             **EXAMPLE**: `["care_site_level", "stay_type", "note_type", "care_site_id"]`
         """
         return algos.get(self._algo)(predictor=predictor, index=index, **kwargs)

diff --git a/edsteva/models/step_function/step_function.py b/edsteva/models/step_function/step_function.py
@@ -22,16 +22,20 @@ class StepFunction(BaseModel):
     Attributes
     ----------
     _algo: List[str]
-        Algorithm used to compute the estimates
+        Algorithm used to compute the estimates.
+
         **VALUE**: ``"loss_minimization"``
     _coefs: List[str]
-        Model coefficients
+        Model coefficients.
+
         **VALUE**: ``["t_0", "c_0"]``
     _default_metrics: List[str]
-        Metrics to used by default
+        Metrics to use by default.
+
         **VALUE**: ``[error_after_t0]``
     _viz_config: List[str]
         Dictionary of configuration for visualization purpose.
+
         **VALUE**: ``{}``
 
     Example
@@ -86,7 +90,8 @@ def fit_process(
         predictor : pd.DataFrame
             Target variable to be fitted
         index : List[str], optional
-            Variable from which data is grouped
+            Variable from which data is grouped.
+
             **EXAMPLE**: `["care_site_level", "stay_type", "note_type", "care_site_id"]`
         """
 

diff --git a/edsteva/probes/base/base.py b/edsteva/probes/base/base.py
@@ -1,6 +1,6 @@
 import datetime
 from abc import ABCMeta, abstractmethod
-from typing import ClassVar, Dict, List, Union
+from typing import ClassVar, List, Union
 
 import altair as alt
 import pandas as pd
@@ -21,9 +21,9 @@ class BaseProbe(metaclass=ABCMeta):
     Attributes
     ----------
     _schema: List[str]
-        The columns a predictor must have
+        The columns a predictor must have.
 
-        **VALUE**: ``["care_site_id", "care_site_level", "stay_type", "date", "c"]``
+        **VALUE**: ``["date", "c"]``
     predictor: pd.DataFrame
         Available with the [``compute()``][edsteva.probes.base.BaseProbe.compute] method
     _cache_predictor: pd.DataFrame
@@ -36,7 +36,7 @@ class BaseProbe(metaclass=ABCMeta):
         It describes the care site structure (cf. [``prepare_care_site_relationship()``][edsteva.probes.utils.prepare_df.prepare_care_site_relationship])
     """
 
-    _schema: ClassVar[List[str]] = ["care_site_level", "care_site_id", "date", "c"]
+    _schema: ClassVar[List[str]] = ["date", "c"]
 
     def __init__(
         self,
@@ -127,9 +127,6 @@ def compute_process(
         care_site_relationship: pd.DataFrame,
         start_date: datetime,
         end_date: datetime,
-        care_site_levels: List[str],
-        stay_types: Union[str, Dict[str, str]],
-        care_site_ids: List[int],
         **kwargs,
     ) -> pd.DataFrame:
         """Process the data in order to obtain a predictor table"""
@@ -139,9 +136,6 @@ def compute(
         data: Data,
         start_date: datetime = None,
         end_date: datetime = None,
-        care_site_levels: List[str] = None,
-        stay_types: Union[str, Dict[str, str]] = None,
-        care_site_ids: List[int] = None,
         with_cache: bool = True,
         **kwargs,
     ) -> None:
@@ -165,12 +159,6 @@ def compute(
             **EXAMPLE**: `"2019-05-01"`
         end_date : datetime, optional
             **EXAMPLE**: `"2021-07-01"`
-        care_site_levels : List[str], optional
-            **EXAMPLE**: `["Hospital", "Pole", "UF"]`
-        stay_types : Union[str, Dict[str, str]], optional
-            **EXAMPLE**: `{"All": ".*"}` or `{"All": ".*", "Urg_and_consult": "urgences|consultation"}` or `"hospitalisés`
-        care_site_ids : List[int], optional
-            **EXAMPLE**: `[8312056386, 8312027648]`
 
         Attributes
         ----------
@@ -215,9 +203,6 @@ def compute(
             care_site_relationship=care_site_relationship,
             start_date=start_date,
             end_date=end_date,
-            care_site_levels=care_site_levels,
-            stay_types=stay_types,
-            care_site_ids=care_site_ids,
             **kwargs,
         )
         self.is_computed_probe()
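
The hunks above drop `care_site_levels`, `stay_types` and `care_site_ids` from the signature of `BaseProbe.compute`, leaving `**kwargs` to carry probe-specific parameters down to `compute_process`, and they reduce `_schema` to `["date", "c"]`. The sketch below illustrates that forwarding idea only. `StandInBaseProbe`, `StandInVisitProbe`, the `check_columns` helper and the placeholder rows are hypothetical and do not reproduce the library's actual classes or checks.

```python
from typing import Dict, List, Optional


class StandInBaseProbe:
    """Hypothetical stand-in for BaseProbe; only the forwarding logic is modeled."""

    _schema = ["date", "c"]

    def __init__(self, index: List[str]):
        self._index = index

    def compute(self, data, start_date=None, end_date=None, **kwargs):
        # Probe-specific parameters are not named here; they travel in kwargs.
        predictor = self.compute_process(
            data=data, start_date=start_date, end_date=end_date, **kwargs
        )
        self.check_columns(predictor)
        self.predictor = predictor

    def check_columns(self, predictor):
        # Assumed check: every row carries the schema columns plus the index columns.
        required = set(self._schema) | set(self._index)
        for row in predictor:
            missing = required - set(row)
            if missing:
                raise ValueError(f"Missing columns: {sorted(missing)}")


class StandInVisitProbe(StandInBaseProbe):
    def __init__(self):
        super().__init__(index=["care_site_level", "stay_type"])

    def compute_process(
        self,
        data,
        start_date,
        end_date,
        care_site_levels: Optional[List[str]] = None,
        stay_types: Optional[Dict[str, str]] = None,
        **kwargs,
    ):
        levels = care_site_levels or ["Hospital"]
        labels = list(stay_types) if stay_types else ["All"]
        # Placeholder rows: one per (level, label) pair, with a dummy date and c.
        return [
            {"care_site_level": level, "stay_type": label, "date": "2021-01", "c": 1.0}
            for level in levels
            for label in labels
        ]


probe = StandInVisitProbe()
probe.compute(
    data=None,
    care_site_levels=["Hospital", "Pole"],
    stay_types={"All": ".*"},
)
```

With this layout the base class stays agnostic of which parameters a given probe understands, and each subclass declares its own in `compute_process`.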