Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

interpret verbosity using logging #745

Merged
merged 12 commits into from
Nov 6, 2023

Conversation

GStechschulte
Copy link
Collaborator

@GStechschulte GStechschulte commented Oct 26, 2023

Resolves PR #740. I am opening as a draft PR to collect feedback on my initial idea and design.

In order to make interpret verbose about some of the "hidden" computations such as default values, I am creating a config instance (similar to formulae) that allows the user to optionally turn on or off the logging of various information. To log information to the console, I want to use the built in logging module to create a logger object specific to the interpret module.

Then, I create a log_interpret_functions decorator so I can simply decorate the desired functions in utils.py that compute default values. This allows me to extend the functionality of the existing util functions without changing the underlying code, and thereby separating the logging code into a different file logs.py.

Below is a demo:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

import bambi as bmb

# logger is the config instance and we set messages to True (default is False)
bmb.interpret.logger.messages = True

data = bmb.load_data("mtcars")
data["am"] = pd.Categorical(data["am"], categories=[0, 1], ordered=True)
model = bmb.Model("mpg ~ hp * drat * am", data)
idata = model.fit(tune=250, draws=250, random_seed=1234)

Calling predictions:

# model includes hp, am, drat so defaults computed for am and drat
df = bmb.interpret.predictions(
    model,
    idata,
    conditional={
        "hp": [100, 150]
    }
)
INFO:root:Default values computed for: {'am', 'drat'}

Calling plot_predictions

# model includes hp, am, drat so defaults computed for am
bmb.interpret.plot_predictions(
    model,
    idata,
    conditional={
        "hp": [100, 150],
        "drat": [2.5, 3.5]
    }
);
INFO:root:Default values computed for: {'am'}

Calling predictions where no default values are computed means no messages.

df = bmb.interpret.predictions(
    model,
    idata,
    conditional={
        "hp": [100, 150],
        "drat": [2.5, 3.5],
        "am": [0, 1],
    }
)

I need to make sure the logger object I am creating and referencing is not messing with the "bambi" logger in bambi/__init__.py (which I think it is because of the INFO:root: tag in the console logs above). Using logging in multiple modules should be possible according to the logging cookbook. Thus, Bambi would have two loggers: one for the main module bambi, and then one for interpret.

Before I move forward with this implementation, I want to get feedback from the community and or anyone with more experience in logging. Thanks!



def log_interpret_defaults(func):
interpret_logger = logger.get_logger()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I recall correctly, you need to pass getLogger a name to get the same logger. As it's done here

_log = logging.getLogger("bambi")
.

So you could do logger.get_logger("bambi_interpret")

Copy link
Collaborator Author

@GStechschulte GStechschulte Oct 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct. When I did this, it wasn't returning the info logs to the console. Once I removed the name, it returned the info logs. I will look into it more.


def log_interpret_defaults(func):
interpret_logger = logger.get_logger()
interpret_logger.setLevel(logging.INFO)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think you need this

bambi/bambi/__init__.py

Lines 32 to 36 in d001e06

if not logging.root.handlers:
_log.setLevel(logging.INFO)
if len(_log.handlers) == 0:
handler = logging.StreamHandler()
_log.addHandler(handler)

I don't remember the details of when it was added. But I think it makes sure the messages are sent once to the place where you want to send them.

if not logger.messages:
return func(*args, **kwargs)

if func.__name__ == "set_default_values":
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if we should handle specific cases here. I think it would be better to allow the decorator to receive arguments and then determine what to do from those.

Also, in this case, the function receives a dictionary and returns a dictionary. So you compare keys. Are there any other cases we may be interested?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if we should handle specific cases here. I think it would be better to allow the decorator to receive arguments and then determine what to do from those.

I agree. That would generalize better.

Also, in this case, the function receives a dictionary and returns a dictionary. So you compare keys. Are there any other cases we may be interested?

There are two functions that compute default values. Both in utils.py.

  1. set_default_values that computes default values for covariates specified in the model, but not passed to conditional.
  2. set_default_variable_values that computes default contrast or wrt values for comparisons and slopes, respectively if the user does not pass a value (but they obviously need to pass the variable name of interest).

These are the functions that I initially wanted to decorate. However, in other scenarios, e.g. a user passes a list of covariate names to conditional, but no values, then the following functions are called to compute default values:

  1. make_main_values that computes main values based on original data using a grid of evenly spaced values for numeric predictors and unique levels for categoric predictors.
  2. make_group_values that computes group values based on original data using unique levels for categoric predictors and quantiles for numeric predictors.

Knowing that a np.linspace or np.quantile would also be informative for the user. I will decorate these functions also since they compute default values as well.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds great

@tomicapretto
Copy link
Collaborator

I didn't have the decorator idea in my mind when we discussed logging. But I do think it's a very neat idea. I'm not 100% sure if it generalizes well. Maybe, it's enough for it to generalize to the cases we care about. Do we know if we're going to decorate functions with different a wider variety of input/output types?

@GStechschulte
Copy link
Collaborator Author

GStechschulte commented Oct 27, 2023

I didn't have the decorator idea in my mind when we discussed logging. But I do think it's a very neat idea. I'm not 100% sure if it generalizes well. Maybe, it's enough for it to generalize to the cases we care about. Do we know if we're going to decorate functions with different a wider variety of input/output types?

That is also what I am questioning myself on after replying to your comment above with the list of functions that compute default values. Most function parameters accept an array and return an array. Generally speaking, you will see the following:

set_default_values(dict) -> dict
set_default_variable_values(self) -> array (this is a method in class VariableInfo)
make_main_values(array) -> array
make_group_values(array) -> array

Unless I added parameters to the decorator or had different decorators for different functions, then I don't see how it will generalize. I am open to other ideas. I don't have too much time sunk into this PR. hah

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@GStechschulte GStechschulte marked this pull request as ready for review November 1, 2023 09:39
@GStechschulte
Copy link
Collaborator Author

GStechschulte commented Nov 1, 2023

Based on the discussion above, the following functions that compute default values

  • set_default_values
  • set_default_variable_values
  • make_main_values
  • make_group_panel_values: this is decorated instead of make_group_values as this function is called within make_group_panel_values

now have a decorator @log_interpret_defaults that log a message(s) to console if the user sets bmb.interpret.logger.messages = True.

I decided to keep the handling of specific cases, i.e., if func.__name__ == <some func that computes default values> as this allows more detailed messages to be logged as the logger can determine if a default is computed for the contrast or wrt variable, or conditional covariates. For example (using the model from above):

bmb.interpret.comparisons(
    model,
    idata,
    contrast="hp",
    conditional="drat"
)
Default computed for contrast variable: hp
Default computed for main variable: drat
Default computed for unspecified variable: am

I am open to changing the specific cases if anyone has a better design idea. I have also added tests test_interpret_messages.py that tests comparisons, predictions, and slopes for different scenarios of default computations and the expected log messages.

@codecov-commenter
Copy link

codecov-commenter commented Nov 1, 2023

Codecov Report

Merging #745 (eaf14d7) into main (602bcc6) will increase coverage by 0.05%.
The diff coverage is 92.18%.

@@            Coverage Diff             @@
##             main     #745      +/-   ##
==========================================
+ Coverage   89.78%   89.84%   +0.05%     
==========================================
  Files          43       45       +2     
  Lines        3604     3664      +60     
==========================================
+ Hits         3236     3292      +56     
- Misses        368      372       +4     
Files Coverage Δ
bambi/__init__.py 78.94% <100.00%> (+1.16%) ⬆️
bambi/interpret/create_data.py 100.00% <100.00%> (+1.75%) ⬆️
bambi/interpret/utils.py 88.37% <100.00%> (+0.27%) ⬆️
bambi/config.py 95.00% <95.00%> (ø)
bambi/interpret/logs.py 96.29% <96.29%> (ø)
bambi/interpret/__init__.py 66.66% <50.00%> (-33.34%) ⬇️

... and 1 file with indirect coverage changes

📣 Codecov offers a browser extension for seamless coverage viewing on GitHub. Try it in Chrome or Firefox today!

@tomicapretto
Copy link
Collaborator

You successfully implemented a solution and I think that's great. As far as I understand, the decorator doesn't have arguments that determine how to behave with that function. Instead, the decorator "knows" about some functions, and the functionality for those is hardcoded. I think it's fine and I wouldn't invest a lot of effort in modifying it.

The only thing I want to raise is that I'm not sure why we need an InterpretLogger class. My points are the following

  • Its only method, get_logger(), is the same as doing logging.getLogger("interpret"). I would replace it with logging.getLogger("bambi-interpret") which identifies the logger in a more unique manner. All the initialization stuff related to the handler could be handled in bambi/interpret/__init__.py in a similar way to the logger we have in bambi/__init__.py.
  • It also provides configuration facilities, the messages attributes. But not much is guaranteed. I could set it to whatever I wanted and Bambi will not complain. This is Python and it's very flexible. But as this is a configuration variable, I think it would be nice to handle this better. What I have in mind for this configuration variable is something like the configuration in formulae, which is the single place for configuration variables in formulae and it also checks users pass values that are allowed. https://github.com/bambinos/formulae/blob/master/formulae/config.py

Regarding these last two points, I'm more than happy to add an implementation to this existing PR as you have already done a lot of effort @GStechschulte, just let me know what you think

@GStechschulte
Copy link
Collaborator Author

Thank you @tomicapretto, and your interpretation is correct. The two points you bring up are great, and I think they would both enhance this PR. Since you have already implemented this in formulae, and if you don't think it will take you long, that would be greatly appreciated 😄

Copy link
Collaborator

@tomicapretto tomicapretto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels weird to be approving my own changes haha. @GStechschulte I think this is good to go. Let me know if you have questions or suggestions about the changes.

@GStechschulte
Copy link
Collaborator Author

GStechschulte commented Nov 6, 2023

Feels weird to be approving my own changes haha. @GStechschulte I think this is good to go. Let me know if you have questions or suggestions about the changes.

LGTM 👍🏼 Thanks a lot! I was testing it with some of our docs notebooks and it works great. However, we log messages to the console by default when using interpret functions. Do we want to keep the default as True, or set to False? To achieve this, we just need to make sure False is the first element of the tuple in the following:

__FIELDS = {"INTERPRET_VERBOSE": (False, True)}

I am updating docs related to interpret to reflect this feature request. Then, I will merge.

@tomicapretto
Copy link
Collaborator

@GStechschulte i would set it to True by default. Otherwise, I think people not familiar with how it works will not realize this is happening. This way, the message will be a signal telling something is happening.

@GStechschulte
Copy link
Collaborator Author

GStechschulte commented Nov 6, 2023

@tomicapretto I updated the three notebooks related to interpret to inform the user the purpose of this logging and how to turn it off. I will merge once the checks are ✅ Many thanks!

@GStechschulte GStechschulte merged commit ee266d9 into bambinos:main Nov 6, 2023
4 checks passed
@GStechschulte GStechschulte deleted the interpret-verbose branch January 21, 2024 20:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants