[ENH] Add global config interface #149

Open
wants to merge 13 commits into
base: main

RNKuhns
Contributor

@RNKuhns RNKuhns commented Mar 13, 2023

Reference Issues/PRs

What does this implement/fix? Explain your changes.

This adds a global config interface. It allows the retrieval of the global config via get_config, updating the global config via set_config, finding out what skbase's default config is via get_default_config, resetting the config to the default via reset_config, and a config context manager via config_context.
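
As a rough illustration of the interface described above, here is a minimal, self-contained sketch. The function names come from the description; the module-level dictionaries, the keyword-only signature, and the default value of `print_changed_only` are assumptions for illustration, not the actual code in skbase.config.

```python
import copy
from contextlib import contextmanager

# Hypothetical module-level state; the real internals may differ.
_DEFAULT_CONFIG = {"print_changed_only": True}  # default value is assumed
_GLOBAL_CONFIG = copy.deepcopy(_DEFAULT_CONFIG)


def get_default_config():
    """Return a copy of the default config."""
    return copy.deepcopy(_DEFAULT_CONFIG)


def get_config():
    """Return a copy of the current global config."""
    return copy.deepcopy(_GLOBAL_CONFIG)


def set_config(*, print_changed_only=None):
    """Update the global config; None leaves a setting unchanged."""
    if print_changed_only is not None:
        _GLOBAL_CONFIG["print_changed_only"] = print_changed_only


def reset_config():
    """Restore the global config to its defaults."""
    _GLOBAL_CONFIG.clear()
    _GLOBAL_CONFIG.update(get_default_config())


@contextmanager
def config_context(**new_config):
    """Temporarily apply settings, restoring the previous config on exit."""
    old_config = get_config()
    set_config(**new_config)
    try:
        yield
    finally:
        set_config(**old_config)
```

In this sketch, returning copies from get_config keeps callers from mutating the global state by accident, and the keyword-only signature means positional or unknown arguments raise a TypeError.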

The PR also updates the local config interface's config retrieval (e.g., BaseObject().get_config()) so that it will retrieve the global config and then update it based on local config.

It also adds the __skbase_get_config__ extension point. If it is defined on a descendant class and returns a dict, the returned dict is used to update the local copy of the global config. This is useful for downstream packages that use skbase: they can implement their own extension that reads their own package's global config, so that the local config retrieval returns both skbase's config and the downstream package's config. It also allows the local config (e.g., BaseObject.set_config or config defined in BaseObject._config) to override both the skbase config and the downstream package's config, which is what we want so that the interface works for downstream users.

Order of precedence is:

  1. Retrieve a copy of the global config
  2. If __skbase_get_config__ is defined and returns a dict, then it is used to update the copy of the global config
  3. Use local config interface to update the copy of the global config
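
The three steps above could be sketched roughly as follows. `SketchBaseObject`, `DownstreamObject`, `GLOBAL_CONFIG`, and `downstream_option` are hypothetical stand-ins used only for illustration; the actual BaseObject implementation in the PR may differ.

```python
# Hypothetical stand-in for the skbase global config.
GLOBAL_CONFIG = {"print_changed_only": True}


class SketchBaseObject:
    """Simplified stand-in for BaseObject; not the actual skbase class."""

    def __init__(self):
        self._config = {}

    def set_config(self, **config):
        # Only the instance-level config changes; the global config is untouched.
        self._config.update(config)
        return self

    def get_config(self):
        # 1. Start from a copy of the global config.
        config = dict(GLOBAL_CONFIG)
        # 2. Apply the optional extension hook if it exists and returns a dict.
        hook = getattr(self, "__skbase_get_config__", None)
        if callable(hook):
            extension = hook()
            if isinstance(extension, dict):
                config.update(extension)
        # 3. The local config wins over both earlier layers.
        config.update(self._config)
        return config


class DownstreamObject(SketchBaseObject):
    """Hypothetical class from a downstream package."""

    def __skbase_get_config__(self):
        # Stand-in for reading the downstream package's own global config.
        return {"print_changed_only": False, "downstream_option": "a"}
```

Because each layer is applied with `dict.update`, a later layer only overrides the keys it actually sets, and keys it does not mention fall through from the earlier layers.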

One point I was not entirely sure about is how to treat a local configuration (an override of the global config) that uses invalid values for skbase global configuration variables. For example, print_changed_only is a skbase global config (it can be configured on an object or globally) that determines whether the pretty-printed representations of BaseObjects should print only the parameters that differ from their defaults or all parameters. It is boolean.

Suppose that someone tries to do:

BaseObject().set_config(print_changed_only = 7)

Should we stop that from occurring in cases where we know it is invalid? (We only know that for skbase configurable parameters, not extension configurations.) Right now, I'm allowing the assignment to occur, but when get_config is called, the local config does not override the global config if the local value is invalid (in this case, not boolean).
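
One way to picture the current behavior is the sketch below. The helper name `get_valid_value_or_fallback` and its signature are hypothetical; the sketch only assumes that an invalid local value is ignored in favor of the global one, with a UserWarning so the fallback is not silent.

```python
import warnings


def get_valid_value_or_fallback(name, local_value, global_value, expected_type=bool):
    """Return local_value if it has the expected type, else global_value.

    A UserWarning is emitted when the local value is rejected.
    """
    if isinstance(local_value, expected_type):
        return local_value
    warnings.warn(
        f"Invalid value {local_value!r} for config {name!r}; "
        f"using global value {global_value!r} instead.",
        UserWarning,
        stacklevel=2,
    )
    return global_value
```

With this approach, `set_config(print_changed_only=7)` would still store 7 on the instance, but the resolved config would keep the global boolean. The alternative is to raise at assignment time, which fails earlier but only works for parameters whose valid values skbase knows.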

Does your contribution introduce a new dependency? If yes, which one?

No new dependencies are added.

What should a reviewer concentrate their feedback on?

Does the design make sense? Do the tests cover all the cases we can think of?

Any other comments?

PR checklist

For all contributions
  • I've reviewed the project documentation on contributing
  • I've added myself to the list of contributors.
  • The PR title starts with either [ENH], [CI/CD], [MNT], [DOC], or [BUG] indicating whether
    the PR topic is related to enhancement, CI/CD, maintenance, documentation, or a bug.
For code contributions
  • Unit tests have been added covering code functionality
  • Appropriate docstrings have been added (see documentation standards)
  • New public functionality has been added to the API Reference

@RNKuhns RNKuhns changed the title Add global config interface [ENH] Add global config interface Mar 13, 2023
@@ -1014,33 +1046,184 @@
self.c = 84


def test_set_get_config():
"""Test logic behind get_config, set_config.
class AnotherConfigTester(BaseObject):

Check warning

Code scanning / CodeQL

`__eq__` not overridden when adding attributes

The class 'AnotherConfigTester' does not override ['__eq__'](1), but adds the new attribute [a](2). The class 'AnotherConfigTester' does not override ['__eq__'](1), but adds the new attribute [b](3). The class 'AnotherConfigTester' does not override ['__eq__'](1), but adds the new attribute [c](4).
self.c = 84


class ConfigExtensionInterfaceTester(BaseObject):

Check warning

Code scanning / CodeQL

`__eq__` not overridden when adding attributes

The class 'ConfigExtensionInterfaceTester' does not override ['__eq__'](1), but adds the new attribute [a](2). The class 'ConfigExtensionInterfaceTester' does not override ['__eq__'](1), but adds the new attribute [b](3). The class 'ConfigExtensionInterfaceTester' does not override ['__eq__'](1), but adds the new attribute [c](4).
skbase/tests/test_base.py Fixed
skbase/tests/test_base.py Fixed
skbase/config/__init__.py Fixed
@codecov-commenter

codecov-commenter commented Mar 13, 2023

Codecov Report

Merging #149 (91cbe4f) into main (18435de) will increase coverage by 1.55%.
The diff coverage is 98.09%.

@@            Coverage Diff             @@
##             main     #149      +/-   ##
==========================================
+ Coverage   82.68%   84.24%   +1.55%     
==========================================
  Files          32       37       +5     
  Lines        2327     2589     +262     
==========================================
+ Hits         1924     2181     +257     
- Misses        403      408       +5     
| Impacted Files | Coverage Δ |
|---|---|
| skbase/base/_base.py | 81.78% <77.27%> (-0.44%) ⬇️ |
| skbase/config/__init__.py | 100.00% <100.00%> (ø) |
| skbase/config/_config.py | 100.00% <100.00%> (ø) |
| skbase/config/_config_param_setting.py | 100.00% <100.00%> (ø) |
| skbase/config/tests/__init__.py | 100.00% <100.00%> (ø) |
| skbase/config/tests/test_config.py | 100.00% <100.00%> (ø) |

@RNKuhns RNKuhns requested a review from fkiraly March 13, 2023 06:45
@RNKuhns
Contributor Author

RNKuhns commented Mar 13, 2023

@fkiraly I plan on adding a few more simple tests of uncovered edge cases to increase coverage. But this is ready for feedback in the interim.

@RNKuhns RNKuhns mentioned this pull request Mar 13, 2023
13 tasks
Contributor

@fkiraly fkiraly left a comment

Hm - looks good but I am not sure I fully understand the design.

If this were a downstream package, I would agree.

But what is the extension case here? That is, someone using skbase to make a package.

I think this is going to be problematic, as it now ties every descendant of BaseObject to the global config in skbase.

Instead, I would invert the pointer, say, add a method get_global_config or similar that by default does nothing in the downstream package, but can be linked to a global config in the downstream package (which in turn is templated by skbase).
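
A minimal sketch of this inverted-pointer idea might look like the following. The method name get_global_config is taken from the suggestion above; the class names and `DOWNSTREAM_GLOBAL_CONFIG` are hypothetical, and this is one possible reading of the proposal rather than an agreed design.

```python
class SketchBaseObject:
    """Simplified stand-in for BaseObject; not the actual skbase class."""

    def __init__(self):
        self._config = {}

    def get_global_config(self):
        # Default: no global config is linked, so nothing is inherited.
        return {}

    def set_config(self, **config):
        self._config.update(config)
        return self

    def get_config(self):
        # Copy the linked global config (if any), then apply local overrides.
        config = dict(self.get_global_config())
        config.update(self._config)
        return config


# Hypothetical global config owned by a downstream package.
DOWNSTREAM_GLOBAL_CONFIG = {"print_changed_only": True}


class DownstreamBase(SketchBaseObject):
    """Hypothetical base class of a downstream package."""

    def get_global_config(self):
        # The downstream package opts in by pointing at its own config.
        return DOWNSTREAM_GLOBAL_CONFIG
```

Under this reading, a plain descendant has no tie to any global config unless the downstream package overrides the method.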

@RNKuhns
Contributor Author

RNKuhns commented Mar 15, 2023

@fkiraly good questions.

If this were a downstream package, I would agree.
But what is the extension case here? That is, someone using skbase to make a package.
I think this is going to be problematic, as it now ties every descendant of BaseObject to the global config in skbase.

There are two things at play:

  1. Reliance of downstream packages on the config or options of an upstream package.
  2. Should we let users override global skbase functionality at the instance level via BaseObject.set_config?

I've included more details on both below.

Reliance of downstream packages on the config or options of an upstream package.

There are multiple cases where downstream Python packages need to call an upstream package's config/options to change behavior in a way that affects downstream code. I'm showing a scikit-learn example, but other packages like pandas, dask, ray, etc. all have config/options/settings that change how things run in the downstream packages that use them.

For example, packages that use scikit-learn's BaseEstimator are already tied to setting scikit-learn's global config to change the behavior of BaseEstimator. The code below shows a stylized pretty-printing example:

from sklearn import set_config
from sklearn.base import BaseEstimator

class SomeDownstreamPackageClass(BaseEstimator):
    """This class implements some awesome estimator."""
    
    def __init__(self, some_param=7):
        self.some_param = some_param

downstream_estimator = SomeDownstreamPackageClass()
print(downstream_estimator)
# prints SomeDownstreamPackageClass()

set_config(print_changed_only=False)  # True is default
print(downstream_estimator)
# prints SomeDownstreamPackageClass(some_param=7)

The first part of the global config interface (the part importable from skbase.config) essentially adds the ability to globally configure how skbase code artifacts should behave (for now it is just pretty printing, but it could be more in the future).

But on its own this has the downside that the local BaseObject.set_config does not affect the instance's pretty-printing behavior.
As it stands now, if an sktime or skbase object calls its local set_config method, that call cannot override the scikit-learn global configuration. The second part of the PR, which ties into BaseObject.get_config, makes this possible. There isn't really a cost here, since we get a copy of the skbase config and then potentially override it. The packages already depend on the skbase config because it controls how the pretty printing works, so we are really just giving them extra flexibility to override it at the instance level.

Should we let users override global skbase functionality at the instance level via BaseObject.set_config?

I think a skbase.BaseObject should be able to return all the configurations that affect how it behaves. All BaseObjects are affected by the skbase configuration for how they should be pretty printed. So any descendant of BaseObject should be able to report that dependency via BaseObject.get_config and override the behavior locally on the object via BaseObject.set_config. Users who want to override the behavior globally for all their descendant objects can use the skbase.config.set_config interface, similar to how a user would override the pretty-printing behavior of sktime objects using scikit-learn's set_config.

The question about how downstream users can extend the BaseObject to allow overrides of their own global interface is discussed in the next section.

Instead, I would invert the pointer, say, add a method get_global_config or similar that by default does nothing in the downstream package, but can be linked to a global config in the downstream package (which in turn is templated by skbase).

This is similar to what the __skbase_get_config__ dunder method does in the PR. If it is not present, nothing happens. But if it is present (and returns a dictionary), it is treated as the entry point for downstream users to provide their global config to the BaseObject. I'm open to it being a BaseObject method that by default returns an empty dict and that we tell downstream users to override for this behavior. But I also think we should consider using a __skbase_some_functionality__ dunder method interface for this and other extension points. Another such point is letting downstream packages provide custom cloning logic to be used by the clone method, for cases where special consideration needs to be given to certain parameter values beyond what is done for the random number generators. In that case a non-public method would also work.

But from a configuration perspective, all objects in a downstream class are potentially affected by:

  1. skbase global config (pretty printing currently. Potentially other behavior in the future)
  2. global config of downstream package
  3. Local user config on instance

BaseObject.get_config as set up in the PR retrieves the configurations in that order. BaseObject.set_config lets users change the local instance config, but does not affect the skbase global config or the downstream package config.

# When calling the method with invalid value it should raise user warning
# And the warning should start with `msg` if it is passed
with pytest.warns(UserWarning, match=r"some message.*"):
returned_value = some_config_param.get_valid_param_or_default(

Check notice

Code scanning / CodeQL

Unused local variable

Variable returned_value is not used.
set_config(do_something_else=True)

with pytest.raises(TypeError):
    set_config(True)

Check failure

Code scanning / CodeQL

Wrong number of arguments in a call

Call to [function set_config](1) with too many arguments; should be no more than 0.
def test_set_config_invalid_keyword_argument():
    """Test set_config behavior when invalid keyword argument passed."""
    with pytest.raises(TypeError):
        set_config(do_something_else=True)

Check failure

Code scanning / CodeQL

Wrong name for an argument in a call

Keyword argument 'do_something_else' is not a supported parameter name of [function set_config](1).
3 participants