Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

equipment pkgs #1378

Conversation

EvaJanouskova
Copy link
Collaborator

No description provided.

Copy link
Collaborator

@tbhallett tbhallett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As these call (to lookup_item_codes_from_pkg_name()) are relatively "expensive" (string comparisons for every row in the pandas data frame), I think we want to avoid repeating them. However, as written currently, every time the HSI_Event is run, we have to do the look-up.

So my proposal would be to do this:

  1. In Equipment.__init__(), add something like:
self._pkg_lookup = self._create_pkg_lookup()
  1. In self._create_pkg_lookup(), use the ResourceFile to return a dict of the form: {PackageName: Set(item_codes_in_package)}.

  2. Re-write lookup_item_codes_from_pkg_name() so that it is looking inside that dict (self._pkg_lookup) rather than searching through the ResourceFile. e.g.

def lookup_item_codes_from_pkg_name(self, pkg_name: str) -> Set[int]:
   if pkg_name in self._pkg_lookup:
        return self._pkg_lookup.get(pkg_name)
   else:
        raise ValueError(f"Package Name not recognised: {pkg_name}")
  1. Lastly, we could rename lookup_item_codes_from_pkg_name to from_pkg_name to make it slicker.

What do you think?

@EvaJanouskova
Copy link
Collaborator Author

EvaJanouskova commented May 28, 2024

@tbhallett, I mostly agree. Only to call the pkgs, it is still too long:
self.add_equipment(self.healthcare_system.equipment.from_pkg_name('Major Surgery'))

I think, we could make it more user friendly, if we have either

  • two fncs: add_equipment() and add_equipment_from_pkg(), or
  • an input parametr for add_equipment() fnc, allowing to use pkg_names instead of item_names, smt like
    def add_equipment(self, item_codes: Union[int, str, Iterable[int], Iterable[str]], from_pkg=False) -> None:

Also, shouldn't we call the input item_codes differently since it can be code(s) or name(s) of the item(s)? Maybe just items or equip_items?

@tbhallett
Copy link
Collaborator

@tbhallett, I mostly agree. Only to call the pkgs, it is still too long:

self.add_equipment(self.healthcare_system.equipment.from_pkg_name('Major Surgery'))

I think, we could make it more user friendly, if we have either

  • two fncs: add_equipment() and add_equipment_from_pkg(), or

  • an input parametr for add_equipment() fnc, allowing to use pkg_names instead of item_names, smt like

def add_equipment(self, item_codes: Union[int, str, Iterable[int], Iterable[str]], from_pkg=False) -> None:

I know what you mean, but I think it's ok. That statement may be a bit long to write but is the most transparent and avoids us making big changes to the equipment module at this point.

(Originally the two steps were intended to be separate (like consumables) where the module looks up the item codes and save them, and the HSI EVENT just refers to the save information on the module. But now we have the cached look-up this is not necessary.)

I think the ultimate thing would be just to pass in a package name string as any other string and for it to be automatically detected as being a package.

Please could you raise this as an issue? (Could click "create issue from comment" on this box!)

Also, shouldn't we call the input item_codes differently since it can be code(s) or name(s) of the item(s)? Maybe just items or equip_items?

Yes, good point. We could call the argument where we accept int
or str just 'items'. This can be a separate PR to tidy that up if you'd like to raise it. Thanks.

@EvaJanouskova EvaJanouskova force-pushed the hallett/equipment_changes_and_structure_UpdateEJ branch from 7745176 to 6687db9 Compare May 30, 2024 22:22
@EvaJanouskova
Copy link
Collaborator Author

I did rebase this on hallett/equipment_changes_and_structure and updated the test_core_functionality_of_equipment_class in test_equipment along with that.

It seems my commits broke the tests. I'm looking into it but so far no success.

@EvaJanouskova
Copy link
Collaborator Author

EvaJanouskova commented May 30, 2024

I did rebase this on hallett/equipment_changes_and_structure and updated the test_core_functionality_of_equipment_class in test_equipment along with that.

It seems my commits broke the tests. I'm looking into it but so far no success.

I found one thing: I added two items to the RF_EquipmentCatalogue, but they also need to be added in the data_availability resource file. Feel free to proceed with that if you find time before I do. (I'll continue tomorrow in the late morning/early afternoon.)

@sakshi, @tbhallett, I added the new items (last 2 lines: Endoscope, item code 402; Electrocardiogram, item code 403) of equipment in the ResourceFile_EquipmentCatalogue.csv which were added in the original PR of equipment implementation but were lost in the process of creating the new PR.

Could you please add these items along with estimated availability into the script generating the ResourceFile_Equipment_Availability_Estimates.csv and regenerate this RF?

@joehcollins
Copy link
Collaborator

@EvaJanouskova - i've added ANC packages but the tests are failing due to an assert error at line 126 of equipment.py. Are test passing on your machine?

@EvaJanouskova
Copy link
Collaborator Author

EvaJanouskova commented May 31, 2024

@EvaJanouskova - i've added ANC packages but the tests are failing due to an assert error at line 126 of equipment.py. Are test passing on your machine?

Thank you Joe.

No, the tests are failing because of what I state in the comment above.

If you like to test it on your machine, you can remove last 2 lines in the ResourceFile_EquipmentCatalogue.csv and then all the tests in test_equipment should pass.

…t its functionality

diff --git src/tlo/methods/equipment.py src/tlo/methods/equipment.py
index 1759a994f..099226748 100644
--- src/tlo/methods/equipment.py
+++ src/tlo/methods/equipment.py
@@ -253,8 +253,17 @@ class Equipment:
         It is expected that this is used by the disease module once and then the resulting equipment item_codes are
         saved on the module."""
         df = self.catalogue
+        item_codes = set()

-        if pkg_name not in df['Pkg_Name'].unique().split(", "):
+        item_codes.update(df.loc[df['Pkg_Name'] == pkg_name, 'Item_Code'].values)
+
+        all_pkg_names = set(df['Pkg_Name'].unique()[~pd.isnull(df['Pkg_Name'].unique())])
+        all_multiple_pkg_names = [name for name in all_pkg_names if ", " in name]
+        for multiple_pkg_name in all_multiple_pkg_names:
+            if pkg_name in multiple_pkg_name.split(", "):
+                item_codes.update(df.loc[df['Pkg_Name'] == multiple_pkg_name, 'Item_Code'].values)
+
+        if item_codes:
+            return item_codes
+        else:
             raise ValueError(f'That Pkg_Name is not in the catalogue: {pkg_name=}')
-
-        return set(df.loc[df['Pkg_Name'] == pkg_name, 'Item_Code'].values)
diff --git tests/test_equipment.py tests/test_equipment.py
index a02ea282f..a39124454 100644
--- tests/test_equipment.py
+++ tests/test_equipment.py
@@ -22,14 +22,18 @@ def test_core_functionality_of_equipment_class(seed):

     # Create toy data
     catalogue = pd.DataFrame(
+        # PkgWith0+1 stands alone or as multiple pkgs for one item; PkgWith1 is only as multiple pkgs
+        # for one item; PkgWith3 only stands alone
         [
             {"Item_Description": "ItemZero", "Item_Code": 0, "Pkg_Name": 'PkgWith0+1'},
-            {"Item_Description": "ItemOne", "Item_Code": 1, "Pkg_Name": 'PkgWith0+1'},
+            {"Item_Description": "ItemOne", "Item_Code": 1, "Pkg_Name": 'PkgWith0+1, PkgWith1'},
             {"Item_Description": "ItemTwo", "Item_Code": 2, "Pkg_Name": float('nan')},
+            {"Item_Description": "ItemThree", "Item_Code": 3, "Pkg_Name": float('PkgWith3')},
         ]
     )
     data_availability = pd.DataFrame(
-        # item 0 is not available anywhere; item 1 is available everywhere; item 2 is available only at facility_id=1
+        # item 0 is not available anywhere; item 1 is available everywhere; item 2 is available only at facility_id=1;
+        # availability not defined for item 3
         [
             {"Item_Code": 0, "Facility_ID": 0, "Pr_Available": 0.0},
             {"Item_Code": 0, "Facility_ID": 1, "Pr_Available": 0.0},
@@ -134,17 +138,22 @@ def test_core_functionality_of_equipment_class(seed):

     # Lookup the item_codes that belong in a particular package.
     # - When package is recognised
-    assert {0, 1} == eq_default.lookup_item_codes_from_pkg_name(pkg_name='PkgWith0+1')  # these items are in the same
-    #                                                                                     package
+    # if items are in the same package (once standing alone, once within multiple pkgs defined for item)
+    assert {0, 1} == eq_default.lookup_item_codes_from_pkg_name(pkg_name='PkgWith0+1')
+    # if the pkg within multiple pkgs defined for item
+    assert {1} == eq_default.lookup_item_codes_from_pkg_name(pkg_name='PkgWith1')
+    # if the pkg only stands alone
+    assert {3} == eq_default.lookup_item_codes_from_pkg_name(pkg_name='PkgWith3')
+
     # - Error thrown when package is not recognised
     with pytest.raises(ValueError):
         eq_default.lookup_item_codes_from_pkg_name(pkg_name='')

-
 equipment_item_code_that_is_available = [0, 1, ]
 equipment_item_code_that_is_not_available = [2, 3,]

+
 def run_simulation_and_return_log(
     seed, tmpdir, equipment_in_init, equipment_in_apply
 ) -> Dict:
…pensive" version, lookup fnc renamed to from_pkg_names()

diff --git src/tlo/methods/equipment.py src/tlo/methods/equipment.py
index 099226748..3a4fb24ba 100644
--- src/tlo/methods/equipment.py
+++ src/tlo/methods/equipment.py
@@ -1,6 +1,6 @@
 import warnings
 from collections import defaultdict
-from typing import Counter, Iterable, Literal, Set, Union
+from typing import Counter, Dict, Iterable, Literal, Set, Union

 import numpy as np
 import pandas as pd
@@ -77,6 +77,7 @@ class Equipment:

         # - Data structures for quick look-ups for items and descriptors
         self._item_code_lookup = self.catalogue.set_index('Item_Description')['Item_Code'].to_dict()
+        self._pkg_lookup = self._create_pkg_lookup()
         self._all_item_descriptors = set(self._item_code_lookup.keys())
         self._all_item_codes = set(self._item_code_lookup.values())
         self._all_fac_ids = self.master_facilities_list['Facility_ID'].unique()
@@ -248,22 +249,37 @@ class Equipment:
                 data=row.to_dict(),
             )

-    def lookup_item_codes_from_pkg_name(self, pkg_name: str) -> Set[int]:
-        """Convenience function to find the set of item_codes that are grouped under a package name in the catalogue.
-        It is expected that this is used by the disease module once and then the resulting equipment item_codes are
-        saved on the module."""
-        df = self.catalogue
-        item_codes = set()
-
-        item_codes.update(df.loc[df['Pkg_Name'] == pkg_name, 'Item_Code'].values)
-
-        all_pkg_names = set(df['Pkg_Name'].unique()[~pd.isnull(df['Pkg_Name'].unique())])
-        all_multiple_pkg_names = [name for name in all_pkg_names if ", " in name]
-        for multiple_pkg_name in all_multiple_pkg_names:
-            if pkg_name in multiple_pkg_name.split(", "):
-                item_codes.update(df.loc[df['Pkg_Name'] == multiple_pkg_name, 'Item_Code'].values)
-
-        if item_codes:
-            return item_codes
+    def from_pkg_names(self, pkg_names: Union[str, Iterable[str]]) -> Set[int]:
+        """Convenience function to find the set of item_codes that are grouped under requested package name(s) in the
+        catalogue."""
+        # Make into a set if it is not one already
+        if isinstance(pkg_names, (str, int)):
+            pkg_names = set([pkg_names])
         else:
-            raise ValueError(f'That Pkg_Name is not in the catalogue: {pkg_name=}')
+            pkg_names = set(pkg_names)
+
+        item_codes = set()
+        for pkg_name in pkg_names:
+            if pkg_name in self._pkg_lookup.keys():
+                item_codes.update(self._pkg_lookup[pkg_name])
+            else:
+                raise ValueError(f'That Pkg_Name is not in the catalogue: {pkg_name=}')
+
+        return item_codes
+
+    def _create_pkg_lookup(self) -> Dict[str, Set[int]]:
+        df = self.catalogue
+        pkg_lookup = dict()
+
+        pkg_names_raw = set(df['Pkg_Name'].unique()[~pd.isnull(df['Pkg_Name'].unique())])
+        all_multiple_pkg_names = set(name for name in pkg_names_raw if ", " in name)
+        all_pkg_names = pkg_names_raw - all_multiple_pkg_names
+        for pkg_name in all_pkg_names:
+            pkg_lookup[pkg_name] = set(df.loc[df['Pkg_Name'] == pkg_name, 'Item_Code'].values)
+        for multiple_pkg_name in all_multiple_pkg_names:
+            for pkg_name in multiple_pkg_name.split(", "):
+                if pkg_name not in all_pkg_names:
+                    pkg_lookup[pkg_name] = set()
+                    all_pkg_names.update({pkg_name})
+                pkg_lookup[pkg_name].update(set(df.loc[df['Pkg_Name'] == multiple_pkg_name, 'Item_Code'].values))
+        return pkg_lookup
diff --git tests/test_equipment.py tests/test_equipment.py
index a39124454..0a462f08f 100644
--- tests/test_equipment.py
+++ tests/test_equipment.py
@@ -139,15 +139,15 @@ def test_core_functionality_of_equipment_class(seed):
     # Lookup the item_codes that belong in a particular package.
     # - When package is recognised
     # if items are in the same package (once standing alone, once within multiple pkgs defined for item)
-    assert {0, 1} == eq_default.lookup_item_codes_from_pkg_name(pkg_name='PkgWith0+1')
+    assert {0, 1} == eq_default.from_pkg_names(pkg_name='PkgWith0+1')
     # if the pkg within multiple pkgs defined for item
-    assert {1} == eq_default.lookup_item_codes_from_pkg_name(pkg_name='PkgWith1')
+    assert {1} == eq_default.from_pkg_names(pkg_name='PkgWith1')
     # if the pkg only stands alone
-    assert {3} == eq_default.lookup_item_codes_from_pkg_name(pkg_name='PkgWith3')
+    assert {3} == eq_default.from_pkg_names(pkg_name='PkgWith3')

     # - Error thrown when package is not recognised
     with pytest.raises(ValueError):
-        eq_default.lookup_item_codes_from_pkg_name(pkg_name='')
+        eq_default.from_pkg_names(pkg_names='')

 equipment_item_code_that_is_available = [0, 1, ]
@EvaJanouskova EvaJanouskova force-pushed the hallett/equipment_changes_and_structure_UpdateEJ branch from b34fb01 to 7e5940f Compare May 31, 2024 11:13
@EvaJanouskova
Copy link
Collaborator Author

rebased on hallett/equipment_changes_and_structure branch

@tbhallett
Copy link
Collaborator

I did rebase this on hallett/equipment_changes_and_structure and updated the test_core_functionality_of_equipment_class in test_equipment along with that.
It seems my commits broke the tests. I'm looking into it but so far no success.

I found one thing: I added two items to the RF_EquipmentCatalogue, but they also need to be added in the data_availability resource file. Feel free to proceed with that if you find time before I do. (I'll continue tomorrow in the late morning/early afternoon.)

@sakshi, @tbhallett, I added the new items (last 2 lines: Endoscope, item code 402; Electrocardiogram, item code 403) of equipment in the ResourceFile_EquipmentCatalogue.csv which were added in the original PR of equipment implementation but were lost in the process of creating the new PR.

Could you please add these items along with estimated availability into the script generating the ResourceFile_Equipment_Availability_Estimates.csv and regenerate this RF?

The file is this one: src/scripts/data_file_processing/healthsystem/equipment/equipment_availability_estimation.py. It should run for you. It will automatically incorproate item_codes you have added to the catalogue.

@EvaJanouskova
Copy link
Collaborator Author

I did rebase this on hallett/equipment_changes_and_structure and updated the test_core_functionality_of_equipment_class in test_equipment along with that.
It seems my commits broke the tests. I'm looking into it but so far no success.

I found one thing: I added two items to the RF_EquipmentCatalogue, but they also need to be added in the data_availability resource file. Feel free to proceed with that if you find time before I do. (I'll continue tomorrow in the late morning/early afternoon.)

@sakshi, @tbhallett, I added the new items (last 2 lines: Endoscope, item code 402; Electrocardiogram, item code 403) of equipment in the ResourceFile_EquipmentCatalogue.csv which were added in the original PR of equipment implementation but were lost in the process of creating the new PR.
Could you please add these items along with estimated availability into the script generating the ResourceFile_Equipment_Availability_Estimates.csv and regenerate this RF?

The file is this one: src/scripts/data_file_processing/healthsystem/equipment/equipment_availability_estimation.py. It should run for you. It will automatically incorproate item_codes you have added to the catalogue.

Doesn't it require to define availability for those items somewhere? If so, where and to what values?

It gives me this error message:

Traceback (most recent call last):
  File "/home/eva/PycharmProjects/TLOmodel/src/scripts/data_file_processing/healthsystem/equipment/equipment_availability_estimation.py", line 337, in <module>
    assert final_equipment_availability_export_full.notnull().all()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError

@tbhallett tbhallett merged commit 135fc2e into hallett/equipment_changes_and_structure May 31, 2024
10 of 11 checks passed
@tbhallett tbhallett deleted the hallett/equipment_changes_and_structure_UpdateEJ branch May 31, 2024 15:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants