equipment pkgs #1378
As these calls (to `lookup_item_codes_from_pkg_name()`) are relatively "expensive" (string comparisons for every row in the pandas data frame), I think we want to avoid repeating them. However, as currently written, we have to do the look-up every time the `HSI_Event` is run.

So my proposal would be to do this:
- In `Equipment.__init__()`, add something like:

      self._pkg_lookup = self._create_pkg_lookup()

- In `self._create_pkg_lookup()`, use the ResourceFile to return a `dict` of the form `{PackageName: Set(item_codes_in_package)}`.
- Re-write `lookup_item_codes_from_pkg_name()` so that it looks inside that dict (`self._pkg_lookup`) rather than searching through the ResourceFile, e.g.

      def lookup_item_codes_from_pkg_name(self, pkg_name: str) -> Set[int]:
          if pkg_name in self._pkg_lookup:
              return self._pkg_lookup.get(pkg_name)
          else:
              raise ValueError(f"Package Name not recognised: {pkg_name}")

- Lastly, we could rename `lookup_item_codes_from_pkg_name` to `from_pkg_name` to make it slicker.

What do you think?
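The proposal above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the module's actual code: `EquipmentSketch` is a hypothetical stand-in for `Equipment`, and the catalogue is a plain list of row dicts rather than the pandas data frame loaded from the ResourceFile.

```python
from typing import Dict, List, Set


class EquipmentSketch:
    """Hypothetical stand-in for Equipment; the catalogue is a list of row dicts."""

    def __init__(self, catalogue: List[dict]) -> None:
        self.catalogue = catalogue
        # Build the look-up once, so that later calls are plain dict accesses
        self._pkg_lookup = self._create_pkg_lookup()

    def _create_pkg_lookup(self) -> Dict[str, Set[int]]:
        """Return a dict of the form {package_name: set_of_item_codes}."""
        lookup: Dict[str, Set[int]] = {}
        for row in self.catalogue:
            pkg_name = row["Pkg_Name"]
            if isinstance(pkg_name, str):  # skip rows with no package (e.g. NaN)
                lookup.setdefault(pkg_name, set()).add(row["Item_Code"])
        return lookup

    def from_pkg_name(self, pkg_name: str) -> Set[int]:
        """Return the item codes of a package, using the cached look-up."""
        if pkg_name in self._pkg_lookup:
            return self._pkg_lookup[pkg_name]
        raise ValueError(f"Package Name not recognised: {pkg_name}")


# Toy catalogue (assumed data, loosely following the test in this PR)
eq = EquipmentSketch(
    [
        {"Item_Code": 0, "Pkg_Name": "PkgWith0+1"},
        {"Item_Code": 1, "Pkg_Name": "PkgWith0+1"},
        {"Item_Code": 2, "Pkg_Name": float("nan")},
    ]
)
assert eq.from_pkg_name("PkgWith0+1") == {0, 1}
```

The scan over the catalogue happens once, when the object is constructed; each later call costs one dict access, however often the `HSI_Event` runs.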
@tbhallett, I mostly agree. The only issue is that the statement needed to call the pkgs is still too long: I think we could make it more user-friendly if we have either
Also, shouldn't we call the input |
I know what you mean, but I think it's ok. That statement may be a bit long to write, but it is the most transparent and avoids us making big changes to the equipment module at this point. (Originally the two steps were intended to be separate, as for consumables: the module looks up the item codes and saves them, and the HSI_Event just refers to the saved information on the module. But now that we have the cached look-up, this is not necessary.) I think the ultimate thing would be just to pass in a package name string like any other string and for it to be automatically detected as being a package. Please could you raise this as an issue? (You could click "create issue from comment" on this box!)
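The "ultimate thing" described above, where a package name is passed like any other string and detected automatically, might look roughly like the sketch below. Everything here is hypothetical: `parse_items`, `ITEM_CODE_LOOKUP` and `PKG_LOOKUP` are illustrative names, not part of the equipment module, and the choice to check package names before item descriptors is an assumption.

```python
from typing import Dict, Iterable, Set, Union

# Hypothetical look-ups, as might be built once when the module is initialised
ITEM_CODE_LOOKUP: Dict[str, int] = {"ItemZero": 0, "ItemOne": 1, "ItemTwo": 2}
PKG_LOOKUP: Dict[str, Set[int]] = {"PkgWith0+1": {0, 1}}


def parse_items(items: Union[int, str, Iterable[Union[int, str]]]) -> Set[int]:
    """Resolve item codes, item descriptors and package names to item codes."""
    # Wrap a single value so that the loop below always sees an iterable
    if isinstance(items, (int, str)):
        items = [items]

    item_codes: Set[int] = set()
    for item in items:
        if isinstance(item, int):
            item_codes.add(item)  # already an item code
        elif item in PKG_LOOKUP:
            item_codes |= PKG_LOOKUP[item]  # a package: add all of its items
        elif item in ITEM_CODE_LOOKUP:
            item_codes.add(ITEM_CODE_LOOKUP[item])  # a single item descriptor
        else:
            raise ValueError(f"Not a recognised item or package: {item!r}")
    return item_codes


assert parse_items(["PkgWith0+1", "ItemTwo"]) == {0, 1, 2}
```

One design question this leaves open is what should happen if a package name and an item descriptor are identical; the sketch simply gives packages priority.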
Yes, good point. We could call the argument where we accept int |
7745176 to 6687db9
I did rebase this on hallett/equipment_changes_and_structure and updated the It seems my commits broke the tests. I'm looking into it but so far no success. |
@sakshi, @tbhallett, I added the new items (last 2 lines: Endoscope, item code 402; Electrocardiogram, item code 403) of equipment in the Could you please add these items along with estimated availability into the script generating the |
@EvaJanouskova - I've added ANC packages, but the tests are failing due to an assert error at line 126 of equipment.py. Are the tests passing on your machine? |
Thank you, Joe. No, the tests are failing because of what I state in the comment above. If you would like to test it on your machine, you can remove the last 2 lines in the |
…within multiple pkgs
…t its functionality

    diff --git src/tlo/methods/equipment.py src/tlo/methods/equipment.py
    index 1759a994f..099226748 100644
    --- src/tlo/methods/equipment.py
    +++ src/tlo/methods/equipment.py
    @@ -253,8 +253,17 @@ class Equipment:
             It is expected that this is used by the disease module once and then the resulting equipment item_codes are
             saved on the module."""
             df = self.catalogue
    +        item_codes = set()
     
    -        if pkg_name not in df['Pkg_Name'].unique().split(", "):
    +        item_codes.update(df.loc[df['Pkg_Name'] == pkg_name, 'Item_Code'].values)
    +
    +        all_pkg_names = set(df['Pkg_Name'].unique()[~pd.isnull(df['Pkg_Name'].unique())])
    +        all_multiple_pkg_names = [name for name in all_pkg_names if ", " in name]
    +        for multiple_pkg_name in all_multiple_pkg_names:
    +            if pkg_name in multiple_pkg_name.split(", "):
    +                item_codes.update(df.loc[df['Pkg_Name'] == multiple_pkg_name, 'Item_Code'].values)
    +
    +        if item_codes:
    +            return item_codes
    +        else:
                 raise ValueError(f'That Pkg_Name is not in the catalogue: {pkg_name=}')
    -
    -        return set(df.loc[df['Pkg_Name'] == pkg_name, 'Item_Code'].values)
    diff --git tests/test_equipment.py tests/test_equipment.py
    index a02ea282f..a39124454 100644
    --- tests/test_equipment.py
    +++ tests/test_equipment.py
    @@ -22,14 +22,18 @@ def test_core_functionality_of_equipment_class(seed):
     
         # Create toy data
         catalogue = pd.DataFrame(
    +        # PkgWith0+1 stands alone or as multiple pkgs for one item; PkgWith1 is only as multiple pkgs
    +        # for one item; PkgWith3 only stands alone
             [
                 {"Item_Description": "ItemZero", "Item_Code": 0, "Pkg_Name": 'PkgWith0+1'},
    -            {"Item_Description": "ItemOne", "Item_Code": 1, "Pkg_Name": 'PkgWith0+1'},
    +            {"Item_Description": "ItemOne", "Item_Code": 1, "Pkg_Name": 'PkgWith0+1, PkgWith1'},
                 {"Item_Description": "ItemTwo", "Item_Code": 2, "Pkg_Name": float('nan')},
    +            {"Item_Description": "ItemThree", "Item_Code": 3, "Pkg_Name": float('PkgWith3')},
             ]
         )
         data_availability = pd.DataFrame(
    -        # item 0 is not available anywhere; item 1 is available everywhere; item 2 is available only at facility_id=1
    +        # item 0 is not available anywhere; item 1 is available everywhere; item 2 is available only at facility_id=1;
    +        # availability not defined for item 3
             [
                 {"Item_Code": 0, "Facility_ID": 0, "Pr_Available": 0.0},
                 {"Item_Code": 0, "Facility_ID": 1, "Pr_Available": 0.0},
    @@ -134,17 +138,22 @@ def test_core_functionality_of_equipment_class(seed):
     
         # Lookup the item_codes that belong in a particular package.
         # - When package is recognised
    -    assert {0, 1} == eq_default.lookup_item_codes_from_pkg_name(pkg_name='PkgWith0+1')  # these items are in the same
    -    #                                                                                        package
    +    # if items are in the same package (once standing alone, once within multiple pkgs defined for item)
    +    assert {0, 1} == eq_default.lookup_item_codes_from_pkg_name(pkg_name='PkgWith0+1')
    +    # if the pkg within multiple pkgs defined for item
    +    assert {1} == eq_default.lookup_item_codes_from_pkg_name(pkg_name='PkgWith1')
    +    # if the pkg only stands alone
    +    assert {3} == eq_default.lookup_item_codes_from_pkg_name(pkg_name='PkgWith3')
    +
         # - Error thrown when package is not recognised
         with pytest.raises(ValueError):
             eq_default.lookup_item_codes_from_pkg_name(pkg_name='')
     
     
    -
     equipment_item_code_that_is_available = [0, 1, ]
     equipment_item_code_that_is_not_available = [2, 3,]
     
    +
     def run_simulation_and_return_log(
         seed, tmpdir, equipment_in_init, equipment_in_apply
     ) -> Dict:
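The behaviour this commit tests, where one catalogue row may list several packages separated by `", "`, can be illustrated with a small stand-alone sketch. It is an assumption-laden toy rather than the module's code: the catalogue is a plain list of dicts, and `build_pkg_lookup` is a hypothetical helper. The sketch uses the plain string `'PkgWith3'` for item 3, because `float('PkgWith3')` raises a `ValueError` in Python.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Toy catalogue mirroring the test data, kept as plain dicts for the sketch
catalogue: List[dict] = [
    {"Item_Code": 0, "Pkg_Name": "PkgWith0+1"},
    {"Item_Code": 1, "Pkg_Name": "PkgWith0+1, PkgWith1"},
    {"Item_Code": 2, "Pkg_Name": float("nan")},  # item that is in no package
    {"Item_Code": 3, "Pkg_Name": "PkgWith3"},
]


def build_pkg_lookup(rows: List[dict]) -> Dict[str, Set[int]]:
    """Map each individual package name to the item codes listed under it."""
    lookup = defaultdict(set)
    for row in rows:
        cell = row["Pkg_Name"]
        if not isinstance(cell, str):
            continue  # a missing value (NaN) means the item has no package
        # A cell may hold one name or several names separated by ", "
        for name in cell.split(", "):
            lookup[name].add(row["Item_Code"])
    return dict(lookup)


pkg_lookup = build_pkg_lookup(catalogue)
assert pkg_lookup["PkgWith0+1"] == {0, 1}
assert pkg_lookup["PkgWith1"] == {1}
assert pkg_lookup["PkgWith3"] == {3}
```

Splitting every cell once while the look-up is built gives the same three results that the new assertions in the test expect, without rescanning the catalogue on each call.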
…pensive" version, lookup fnc renamed to from_pkg_names()

    diff --git src/tlo/methods/equipment.py src/tlo/methods/equipment.py
    index 099226748..3a4fb24ba 100644
    --- src/tlo/methods/equipment.py
    +++ src/tlo/methods/equipment.py
    @@ -1,6 +1,6 @@
     import warnings
     from collections import defaultdict
    -from typing import Counter, Iterable, Literal, Set, Union
    +from typing import Counter, Dict, Iterable, Literal, Set, Union
     
     import numpy as np
     import pandas as pd
    @@ -77,6 +77,7 @@ class Equipment:
     
             # - Data structures for quick look-ups for items and descriptors
             self._item_code_lookup = self.catalogue.set_index('Item_Description')['Item_Code'].to_dict()
    +        self._pkg_lookup = self._create_pkg_lookup()
             self._all_item_descriptors = set(self._item_code_lookup.keys())
             self._all_item_codes = set(self._item_code_lookup.values())
             self._all_fac_ids = self.master_facilities_list['Facility_ID'].unique()
    @@ -248,22 +249,37 @@ class Equipment:
                     data=row.to_dict(),
                 )
     
    -    def lookup_item_codes_from_pkg_name(self, pkg_name: str) -> Set[int]:
    -        """Convenience function to find the set of item_codes that are grouped under a package name in the catalogue.
    -        It is expected that this is used by the disease module once and then the resulting equipment item_codes are
    -        saved on the module."""
    -        df = self.catalogue
    -        item_codes = set()
    -
    -        item_codes.update(df.loc[df['Pkg_Name'] == pkg_name, 'Item_Code'].values)
    -
    -        all_pkg_names = set(df['Pkg_Name'].unique()[~pd.isnull(df['Pkg_Name'].unique())])
    -        all_multiple_pkg_names = [name for name in all_pkg_names if ", " in name]
    -        for multiple_pkg_name in all_multiple_pkg_names:
    -            if pkg_name in multiple_pkg_name.split(", "):
    -                item_codes.update(df.loc[df['Pkg_Name'] == multiple_pkg_name, 'Item_Code'].values)
    -
    -        if item_codes:
    -            return item_codes
    +    def from_pkg_names(self, pkg_names: Union[str, Iterable[str]]) -> Set[int]:
    +        """Convenience function to find the set of item_codes that are grouped under requested package name(s) in the
    +        catalogue."""
    +        # Make into a set if it is not one already
    +        if isinstance(pkg_names, (str, int)):
    +            pkg_names = set([pkg_names])
             else:
    -            raise ValueError(f'That Pkg_Name is not in the catalogue: {pkg_name=}')
    +            pkg_names = set(pkg_names)
    +
    +        item_codes = set()
    +        for pkg_name in pkg_names:
    +            if pkg_name in self._pkg_lookup.keys():
    +                item_codes.update(self._pkg_lookup[pkg_name])
    +            else:
    +                raise ValueError(f'That Pkg_Name is not in the catalogue: {pkg_name=}')
    +
    +        return item_codes
    +
    +    def _create_pkg_lookup(self) -> Dict[str, Set[int]]:
    +        df = self.catalogue
    +        pkg_lookup = dict()
    +
    +        pkg_names_raw = set(df['Pkg_Name'].unique()[~pd.isnull(df['Pkg_Name'].unique())])
    +        all_multiple_pkg_names = set(name for name in pkg_names_raw if ", " in name)
    +        all_pkg_names = pkg_names_raw - all_multiple_pkg_names
    +        for pkg_name in all_pkg_names:
    +            pkg_lookup[pkg_name] = set(df.loc[df['Pkg_Name'] == pkg_name, 'Item_Code'].values)
    +        for multiple_pkg_name in all_multiple_pkg_names:
    +            for pkg_name in multiple_pkg_name.split(", "):
    +                if pkg_name not in all_pkg_names:
    +                    pkg_lookup[pkg_name] = set()
    +                    all_pkg_names.update({pkg_name})
    +                pkg_lookup[pkg_name].update(set(df.loc[df['Pkg_Name'] == multiple_pkg_name, 'Item_Code'].values))
    +        return pkg_lookup
    diff --git tests/test_equipment.py tests/test_equipment.py
    index a39124454..0a462f08f 100644
    --- tests/test_equipment.py
    +++ tests/test_equipment.py
    @@ -139,15 +139,15 @@ def test_core_functionality_of_equipment_class(seed):
         # Lookup the item_codes that belong in a particular package.
         # - When package is recognised
         # if items are in the same package (once standing alone, once within multiple pkgs defined for item)
    -    assert {0, 1} == eq_default.lookup_item_codes_from_pkg_name(pkg_name='PkgWith0+1')
    +    assert {0, 1} == eq_default.from_pkg_names(pkg_name='PkgWith0+1')
         # if the pkg within multiple pkgs defined for item
    -    assert {1} == eq_default.lookup_item_codes_from_pkg_name(pkg_name='PkgWith1')
    +    assert {1} == eq_default.from_pkg_names(pkg_name='PkgWith1')
         # if the pkg only stands alone
    -    assert {3} == eq_default.lookup_item_codes_from_pkg_name(pkg_name='PkgWith3')
    +    assert {3} == eq_default.from_pkg_names(pkg_name='PkgWith3')
     
         # - Error thrown when package is not recognised
         with pytest.raises(ValueError):
    -        eq_default.lookup_item_codes_from_pkg_name(pkg_name='')
    +        eq_default.from_pkg_names(pkg_names='')
     
     
     equipment_item_code_that_is_available = [0, 1, ]
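The argument handling in `from_pkg_names()` (accepting either one package name or several) can be tried in isolation with the sketch below. It is a simplified, module-level illustration under assumptions: `PKG_LOOKUP` is a hypothetical stand-in for `self._pkg_lookup`, and the function is not a method of `Equipment`.

```python
from typing import Dict, Iterable, Set, Union

# Hypothetical stand-in for the cached self._pkg_lookup
PKG_LOOKUP: Dict[str, Set[int]] = {
    "PkgWith0+1": {0, 1},
    "PkgWith1": {1},
    "PkgWith3": {3},
}


def from_pkg_names(pkg_names: Union[str, Iterable[str]]) -> Set[int]:
    """Return the union of item codes over the requested package name(s)."""
    # A bare string is itself iterable (character by character), so wrap it
    # before looping; otherwise 'PkgWith3' would be read as 'P', 'k', 'g', ...
    if isinstance(pkg_names, str):
        pkg_names = {pkg_names}

    item_codes: Set[int] = set()
    for pkg_name in pkg_names:
        if pkg_name not in PKG_LOOKUP:
            raise ValueError(f"That Pkg_Name is not in the catalogue: {pkg_name=}")
        item_codes |= PKG_LOOKUP[pkg_name]
    return item_codes


assert from_pkg_names("PkgWith3") == {3}
assert from_pkg_names(["PkgWith0+1", "PkgWith3"]) == {0, 1, 3}
```

Note that with a parameter named `pkg_names`, as in the signature in the diff above, a call that passes the keyword `pkg_name=` raises a `TypeError` for an unexpected keyword argument; callers need to use `pkg_names=` or pass the value positionally.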
b34fb01 to 7e5940f
rebased on |
The file is this one: |
Doesn't availability need to be defined for those items somewhere? If so, where, and with what values? It gives me this error message:
|
135fc2e into hallett/equipment_changes_and_structure
No description provided.