Implement filtering by variable depth (#886)

IAMconsortium · Nov 3, 2024 · 86faf0d
1 parent 735c243

Showing 4 changed files with 86 additions and 29 deletions.
diff --git a/RELEASE_NOTES.md b/RELEASE_NOTES.md
@@ -1,7 +1,8 @@
 # Next release
 
+- [#886](https://github.com/IAMconsortium/pyam/pull/886) Implement filtering by variable `depth`
 - [#880](https://github.com/IAMconsortium/pyam/pull/880) Use `pd.Series.iloc[pos]` for forward-compatibility
-- [#877](https://github.com/IAMconsortium/pyam/pull/xxx) Support `engine` and other `pd.ExcelFile` keywords.
+- [#877](https://github.com/IAMconsortium/pyam/pull/877) Support `engine` and other `pd.ExcelFile` keywords.
 
 # Release v2.2.4
 
@@ -109,9 +110,11 @@ method. This feature is now implemented via the **nomenclature.RegionProcessor**
 - [#773](https://github.com/IAMconsortium/pyam/pull/773) Remove `map_regions()` and default mappings csv
 - [#772](https://github.com/IAMconsortium/pyam/pull/772) Show all missing rows for `require_data()`
 - [#771](https://github.com/IAMconsortium/pyam/pull/771) Refactor to start a separate validation module
-- [#766](https://github.com/IAMconsortium/pyam/pull/766) Use **ixmp4** for credentials to access a Scenario Explorer database
+- [#766](https://github.com/IAMconsortium/pyam/pull/766) Use **ixmp4** for credentials to access a Scenario Explorer
+  database
 - [#764](https://github.com/IAMconsortium/pyam/pull/764) Clean-up exposing internal methods and attributes
-- [#763](https://github.com/IAMconsortium/pyam/pull/763) Implement a fix against carrying over unused levels when initializing from an indexed pandas object
+- [#763](https://github.com/IAMconsortium/pyam/pull/763) Implement a fix against carrying over unused levels when
+  initializing from an indexed pandas object
 - [#759](https://github.com/IAMconsortium/pyam/pull/759) Excise "exclude" column from meta and add a own attribute
 - [#747](https://github.com/IAMconsortium/pyam/pull/747) Drop support for Python 3.7
 
@@ -172,10 +175,12 @@ Bump minimum version of **pandas** to v1.2.0 to support automatic engine selecti
 ## Individual updates
 
 - [#715](https://github.com/IAMconsortium/pyam/pull/715) Add a `require_data()` method
-- [#713](https://github.com/IAMconsortium/pyam/pull/713) Informative error when using lists for filter by level, `level` now a forbidden column.
+- [#713](https://github.com/IAMconsortium/pyam/pull/713) Informative error when using lists for filter by level, `level`
+  now a forbidden column.
 - [#709](https://github.com/IAMconsortium/pyam/pull/709) Hotfix ops to support `fillna=0`
 - [#708](https://github.com/IAMconsortium/pyam/pull/708) Remove 'xls' as by-default-supported file format
-- [#686](https://github.com/IAMconsortium/pyam/pull/686) Add support for (weighted) quantile timeseries as `df.compute.quantiles()` with a [tutorial](https://pyam-iamc.readthedocs.io/en/stable/tutorials/quantiles.html)
+- [#686](https://github.com/IAMconsortium/pyam/pull/686) Add support for (weighted) quantile timeseries as
+  `df.compute.quantiles()` with a [tutorial](https://pyam-iamc.readthedocs.io/en/stable/tutorials/quantiles.html)
 
 # Release v1.6.0
 
@@ -193,15 +198,18 @@ dependency for better performance.
 ## Individual updates
 
 - [#702](https://github.com/IAMconsortium/pyam/pull/702) Migrate `compute_bias()` to `compute` module
-- [#701](https://github.com/IAMconsortium/pyam/pull/701) Add **xlsxwriter** as dependency to improve `to_excel()` performance
-- [#699](https://github.com/IAMconsortium/pyam/pull/699) Add filter options to IIASA API `index()`, `meta()` and `properties()` methods
+- [#701](https://github.com/IAMconsortium/pyam/pull/701) Add **xlsxwriter** as dependency to improve `to_excel()`
+  performance
+- [#699](https://github.com/IAMconsortium/pyam/pull/699) Add filter options to IIASA API `index()`, `meta()` and
+  `properties()` methods
 - [#697](https://github.com/IAMconsortium/pyam/pull/697) Add warning if IIASA API returns empty result
 - [#696](https://github.com/IAMconsortium/pyam/pull/696) Added ability to load preferentially from a local cache
 - [#695](https://github.com/IAMconsortium/pyam/pull/695) Remove unused meta levels during initialization
 - [#688](https://github.com/IAMconsortium/pyam/pull/688) Remove ixmp as optional dependency
 - [#684](https://github.com/IAMconsortium/pyam/pull/684) Use new IIASA-manager API with token refresh
 - [#679](https://github.com/IAMconsortium/pyam/pull/679) `set_meta()` now supports pandas.DataFrame as an argument
-- [#674](https://github.com/IAMconsortium/pyam/pull/674) Support filtering data by model-scenario pairs with the `index` argument to `filter()` and `slice()`
+- [#674](https://github.com/IAMconsortium/pyam/pull/674) Support filtering data by model-scenario pairs with the `index`
+  argument to `filter()` and `slice()`
 
 # Release v1.5.0
 
@@ -213,7 +221,8 @@ class that allows faster filtering and inspection of an **IamDataFrame**.
 ## Individual updates
 
 - [#668](https://github.com/IAMconsortium/pyam/pull/668) Allow renaming of empty IamDataFrame objects
-- [#665](https://github.com/IAMconsortium/pyam/pull/665) Provide better support for IamDataFrame objects with non-standard index dimensions
+- [#665](https://github.com/IAMconsortium/pyam/pull/665) Provide better support for IamDataFrame objects with
+  non-standard index dimensions
 - [#659](https://github.com/IAMconsortium/pyam/pull/659) Add an `offset` method
 - [#657](https://github.com/IAMconsortium/pyam/pull/657) Add an `IamSlice` class
 
@@ -242,7 +251,8 @@ an empty **IamDataFrame**. Previously, this raised an error.
 
 ## Individual updates
 
-- [#651](https://github.com/IAMconsortium/pyam/pull/651) Pin `pint<=0.18` as a quickfix for a regression in the latest release
+- [#651](https://github.com/IAMconsortium/pyam/pull/651) Pin `pint<=0.18` as a quickfix for a regression in the latest
+  release
 - [#650](https://github.com/IAMconsortium/pyam/pull/650) Add IPCC AR6 WGIII colors to PYAM_COLORS
 - [#647](https://github.com/IAMconsortium/pyam/pull/647) Pin `unfccc-di-api` to latest release
 - [#634](https://github.com/IAMconsortium/pyam/pull/634) Better error message when initializing with invisible columns
@@ -263,13 +273,15 @@ pandas [v1.4.0](https://pandas.pydata.org/docs/whatsnew/v1.4.0.html).
 
 ## Individual updates
 
-- [#608](https://github.com/IAMconsortium/pyam/pull/608) The method `assert_iamframe_equals()` passes if an all-nan-col is present
+- [#608](https://github.com/IAMconsortium/pyam/pull/608) The method `assert_iamframe_equals()` passes if an all-nan-col
+  is present
 - [#604](https://github.com/IAMconsortium/pyam/pull/604) Add an annualized-growth-rate method
 - [#602](https://github.com/IAMconsortium/pyam/pull/602) Add a `compute` module/accessor and a learning-rate method
 - [#600](https://github.com/IAMconsortium/pyam/pull/600) Add a `diff()` method
 - [#592](https://github.com/IAMconsortium/pyam/pull/592) Fix for running in jupyter-lab notebooks
 - [#590](https://github.com/IAMconsortium/pyam/pull/590) Update expected figures of plotting tests to use matplotlib 3.5
-- [#586](https://github.com/IAMconsortium/pyam/pull/586) Improve error reporting for non-numeric data in any value column
+- [#586](https://github.com/IAMconsortium/pyam/pull/586) Improve error reporting for non-numeric data in any value
+  column
 
 # Release v1.2.0
 
@@ -287,10 +299,12 @@ was added as a dependency.
 
 ## Individual updates
 
-- [#585](https://github.com/IAMconsortium/pyam/pull/585) Include revisions to the ORE manuscript source code following acceptance/publication
+- [#585](https://github.com/IAMconsortium/pyam/pull/585) Include revisions to the ORE manuscript source code following
+  acceptance/publication
 - [#583](https://github.com/IAMconsortium/pyam/pull/583) Add profiler module for performance benchmarking
 - [#579](https://github.com/IAMconsortium/pyam/pull/579) Increase performance of IamDataFrame initialization
-- [#572](https://github.com/IAMconsortium/pyam/pull/572) Unpinned the requirements for xlrd and added openpyxl as a requirement to ensure ongoing support of both `.xlsx` and `.xls` files out of the box
+- [#572](https://github.com/IAMconsortium/pyam/pull/572) Unpinned the requirements for xlrd and added openpyxl as a
+  requirement to ensure ongoing support of both `.xlsx` and `.xls` files out of the box
 
 # Release v1.1.0
 

diff --git a/docs/api/filtering.rst b/docs/api/filtering.rst
@@ -23,8 +23,9 @@ Timeseries data coordinates
 - Any *column* of the :attr:`IamDataFrame.coordinates <pyam.IamDataFrame.coordinates>`
   ('**region**', '**variable**', '**unit**'): string or list of strings
 - '**measurand**': a tuple (or list of tuples) of '*variable*' and '*unit*'
-- '**level**': the "depth" of entries in the '*variable*' column (number of '|')
-  (excluding the strings in the '*variable*' argument, if given)
+- '**depth**': the "depth" of entries in the '*variable*' column (number of '|')
+- '**level**': the "depth" of entries in the '*variable*' column (number of '|'),
+  excluding the strings in the '*variable*' argument (if given)
 - '**year**': takes an integer (int/:class:`numpy.int64`), a list of integers or
   a range. Note that the last year of a range is not included,
   so ``range(2010, 2015)`` is interpreted as ``[2010, ..., 2014]``

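The two options above differ only in where counting starts. `depth` counts every '|' in the full variable name. `level` counts only the '|' that come after the part matched by the '*variable*' pattern. The plain-Python sketch below illustrates both. `depth_test`, `count_pipes` and `select` are hypothetical helpers, not pyam's API; pyam delegates this work to its internal `find_depth()` function.

The sketch assumes three criterion forms: an integer (exact match), `"N+"` (at least N) or `"N-"` (at most N). Anything else, such as `"1/"` or a list, raises a `ValueError`. This is consistent with the `"0+"` and `"1/"` cases in the tests of this commit.

```python
import numbers
import re


def depth_test(spec):
    """Return a predicate for a depth criterion (hypothetical helper).

    An integer matches exactly, "N+" means "at least N" and "N-" means
    "at most N"; anything else (e.g. "1/" or a list) raises a ValueError.
    """
    if isinstance(spec, numbers.Integral) and not isinstance(spec, bool):
        return lambda n: n == spec
    if isinstance(spec, str) and spec[:-1].isdigit():
        if spec.endswith("+"):
            return lambda n: n >= int(spec[:-1])
        if spec.endswith("-"):
            return lambda n: n <= int(spec[:-1])
    raise ValueError(f"Unknown depth type: {spec!r}")


def count_pipes(variable, pattern=""):
    """Count '|' in `variable` after the prefix matched by `pattern`.

    An empty pattern gives the absolute `depth`; a non-empty pattern gives
    the relative `level`. Returns None if the pattern does not match.
    """
    # '*' acts as a wildcard; a trailing '*' is dropped so that the
    # remainder of the name (after the matched prefix) is what gets counted
    regex = "^" + re.escape(pattern.rstrip("*")).replace(r"\*", ".*")
    match = re.match(regex, variable)
    if match is None:
        return None
    return variable[match.end():].count("|")


def select(variables, spec, pattern=""):
    """Keep variables matching `pattern` whose pipe count satisfies `spec`."""
    test = depth_test(spec)
    kept = []
    for var in variables:
        n = count_pipes(var, pattern)
        if n is not None and test(n):
            kept.append(var)
    return kept
```

With the illustrative names `["Primary Energy", "Primary Energy|Coal", "Primary Energy|Coal|Hard"]`, each call selects different variables. `select(variables, 1)` (a `depth` of 1) keeps only `"Primary Energy|Coal"`. `select(variables, 0, pattern="*rimary*C*")` (a `level` of 0 relative to the pattern) also keeps `"Primary Energy|Coal"`, because no '|' follows the matched prefix `"Primary Energy|C"`.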
diff --git a/pyam/core.py b/pyam/core.py
@@ -1905,7 +1905,7 @@ def filter(self, *, keep=True, inplace=False, **kwargs):
         if not inplace:
             return ret
 
-    def _apply_filters(self, level=None, **filters):  # noqa: C901
+    def _apply_filters(self, level=None, depth=None, **filters):  # noqa: C901
         """Determine rows to keep in data for given set of filters
 
         Parameters
@@ -1918,6 +1918,9 @@ def _apply_filters(self, level=None, **filters):  # noqa: C901
         regexp = filters.pop("regexp", False)
         keep = np.ones(len(self), dtype=bool)
 
+        if level is not None and depth is not None:
+            raise ValueError("Filter by `level` and `depth` not supported")
+
         if "variable" in filters and "measurand" in filters:
             raise ValueError("Filter by `variable` and `measurand` not supported")
 
@@ -2003,10 +2006,13 @@ def _apply_filters(self, level=None, **filters):  # noqa: C901
             keep = np.logical_and(keep, keep_col)
 
         if level is not None and not ("variable" in filters or "measurand" in filters):
-            # if level and variable/measurand is given, level-filter is applied there
+            # if level is given without variable/measurand, it is equivalent to depth
+            depth = level
+
+        if depth is not None:
             col = "variable"
             lvl_index, lvl_codes = get_index_levels_codes(self._data, col)
-            matches = find_depth(lvl_index, level=level)
+            matches = find_depth(lvl_index, level=depth)
             keep_col = get_keep_col(lvl_codes, matches)
 
             keep = np.logical_and(keep, keep_col)

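The branching added to `_apply_filters()` decides which absolute depth, if any, is applied. It can be condensed into a small standalone function. The name `resolve_depth` is hypothetical and used only for illustration. In the actual code, the resolved value is passed to `find_depth()`, and the resulting mask is combined with `keep` via `np.logical_and`.

```python
def resolve_depth(level=None, depth=None, filters=None):
    """Return the absolute depth to filter on, or None (hypothetical helper).

    Follows the control flow added to `_apply_filters()` in this commit:
    - giving both `level` and `depth` is rejected;
    - `level` without `variable`/`measurand` is equivalent to `depth`;
    - `level` with `variable`/`measurand` is applied relative to that
      pattern inside the variable filter, so it yields no absolute depth.
    """
    filters = filters or {}
    if level is not None and depth is not None:
        raise ValueError("Filter by `level` and `depth` not supported")
    if level is not None and not ("variable" in filters or "measurand" in filters):
        return level
    # an explicit `depth` is always applied to the full variable name
    return depth
```

One consequence of this design is that `depth` composes with a `variable` pattern as an extra absolute filter. By contrast, `level` changes its meaning depending on whether a pattern is given.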
diff --git a/tests/test_core.py b/tests/test_core.py
@@ -431,22 +431,46 @@ def test_filter_empty_df():
     assert len(obs) == 0
 
 
-def test_variable_and_measurand_raises(test_df):
-    pytest.raises(ValueError, test_df.filter, variable="foo", measurand=("foo", "bar"))
+def test_filter_variable_and_measurand_raises(test_df):
+    with pytest.raises(ValueError, match="Filter by `variable` and `measurand` not"):
+        test_df.filter(variable="foo", measurand=("foo", "bar"))
+
+
+def test_filter_level_and_depth_raises(test_df):
+    with pytest.raises(ValueError, match="Filter by `level` and `depth` not"):
+        test_df.filter(level=1, depth=2)
 
 
 @pytest.mark.parametrize(
     "filter_args",
     (dict(variable="*rimary*C*"), dict(measurand=("*rimary*C*", "EJ/*"))),
 )
-def test_filter_variable_and_depth(test_df, filter_args):
+def test_filter_variable_and_level(test_df, filter_args):
     obs = test_df.filter(**filter_args, level=0).variable
     assert obs == ["Primary Energy|Coal"]
 
+    obs = test_df.filter(**filter_args, level="0+").variable
+    assert obs == ["Primary Energy|Coal"]
+
     obs = test_df.filter(**filter_args, level=1).variable
     assert obs == []
 
 
+@pytest.mark.parametrize(
+    "filter_args",
+    (dict(variable="*rimary*C*"), dict(measurand=("*rimary*C*", "EJ/*"))),
+)
+def test_filter_variable_and_depth(test_df, filter_args):
+    obs = test_df.filter(**filter_args, depth=1).variable
+    assert obs == ["Primary Energy|Coal"]
+
+    obs = test_df.filter(**filter_args, depth="0+").variable
+    assert obs == ["Primary Energy|Coal"]
+
+    obs = test_df.filter(**filter_args, depth=0).variable
+    assert obs == []
+
+
 def test_filter_measurand_list(test_df):
     data = test_df.data
     data.loc[4, "variable"] = "foo"
@@ -460,18 +484,30 @@ def test_filter_measurand_list(test_df):
     assert obs.scenario == ["scen_b"]
 
 
-def test_variable_depth_0_keep_false(test_df):
-    obs = test_df.filter(level=0, keep=False).variable
+@pytest.mark.parametrize(
+    "filter_name",
+    ("level", "depth"),
+)
+def test_variable_depth_0_keep_false(test_df, filter_name):
+    obs = test_df.filter(**{filter_name: 0}, keep=False).variable
     assert obs == ["Primary Energy|Coal"]
 
 
-def test_variable_depth_raises(test_df):
-    pytest.raises(ValueError, test_df.filter, level="1/")
+@pytest.mark.parametrize(
+    "filter_name",
+    ("level", "depth"),
+)
+def test_variable_depth_raises(test_df, filter_name):
+    pytest.raises(ValueError, test_df.filter, **{filter_name: "1/"})
 
 
-def test_variable_depth_with_list_raises(test_df):
-    pytest.raises(ValueError, test_df.filter, level=["1", "2"])
-    pytest.raises(ValueError, test_df.filter, level=[1, 2])
+@pytest.mark.parametrize(
+    "filter_name",
+    ("level", "depth"),
+)
+def test_variable_depth_with_list_raises(test_df, filter_name):
+    pytest.raises(ValueError, test_df.filter, **{filter_name: ["1", "2"]})
+    pytest.raises(ValueError, test_df.filter, **{filter_name: [1, 2]})
 
 
 def test_timeseries(test_df):