Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

openml registers handler on root logger #736

Closed
PGijsbers opened this issue Jul 11, 2019 · 5 comments · Fixed by #861
Closed

openml registers handler on root logger #736

PGijsbers opened this issue Jul 11, 2019 · 5 comments · Fixed by #861

Comments

@PGijsbers
Copy link
Collaborator

PGijsbers commented Jul 11, 2019

openml registers a streamhandler with the root logger, because it uses logging.basicConfig. That functions registers a logging.StreamHandler with the root logger if it does not yet have a handler. This means that logging of other packages may automatically be processed by this streamhandler. An example to demonstrate:

import logging

if __name__ == '__main__':
    #  Commenting out the line below prevents `basicConfig` from registering a streamhandler
    #  logging.getLogger().addHandler(logging.NullHandler())
    
    log = logging.getLogger('mypackage')
    log.setLevel(logging.DEBUG)

    log.info("Message one")  # first call, no log handler registered
    import openml
    log.info("Message two")  # second call, log handler registered by openml

output:

[INFO] [14:47:46:mypackage] Message two

Due to the logging.basicConfig behavior, if a handler already is registered (such as is done in the commented out line), it will not add a logging.StreamHandler. So there is no output if the relevant line of code is commented out.

I also see a verbosity in the config defaults, but I don't see it used anywhere. Am I missing something?

I think adding a logging.StreamHandler to the root logger is undesirable behavior. Developers should have full autonomy over their logging handlers without resorting to adding logging.NullHandler. I think we should avoid using logging.basicConfig to set our streamhandler. Instead, we should register a streamhandler to an openml specific logger instead of the root (i.e. log = logging.getLogger('openml') or log = logging.getLogger(__name__)). We should make sure to take into account the verbosity level as defined in the config when doing so.

@mfeurer
Copy link
Collaborator

mfeurer commented Jul 12, 2019

Thanks for bringing that up. Actually, the package should have a good default logging behavior, but should allow flexible configuration and fully disabling it, too. I currently don't know the best way to achieve this unfortunately.

@PGijsbers
Copy link
Collaborator Author

What do you consider 'good default logging behavior'? For me it would be:

  • Have a package's root logger (e.g. logging.getLogger('openml'), not logging.getLogger()) configured at logging.DEBUG. This means any logging.Handler registered will by default pick up all messages, limited by the logging.Handler.level.
  • Have each module log to its own logger (i.e. logging.getLogger(__name__)).
    As the name will be openml. ... any logging.Handler registered with the package root logger will also pick up on messages sent by the package root logger (it keeps a hierarchy based on name). But if we're alternatively only interested in just registering for messages from submodules, we can directly register a logging.Handler with their logger.
  • My preference would be that any logging.Handler registers only with the package root logger, and only if the openml config file specifies it (but this can be a default). By default, I would only consider using a logging.FileHandler to write the log to a file. Perhaps writing to a new file each day, and removing any files older than a week automatically (again exposing freedom for how long (if at all) to retain logs through the configuration file).

@mfeurer
Copy link
Collaborator

mfeurer commented Jul 15, 2019

Have a package's root logger (e.g. logging.getLogger('openml'), not logging.getLogger()) configured at logging.DEBUG. This means any logging.Handler registered will by default pick up all messages, limited by the logging.Handler.level

I would not show debug messages by default, but I agree on the logger name.

Have each module log to its own logger (i.e. logging.getLogger(name)).
As the name will be openml. ... any logging.Handler registered with the package root logger will also pick up on messages sent by the package root logger (it keeps a hierarchy based on name). But if we're alternatively only interested in just registering for messages from submodules, we can directly register a logging.Handler with their logger.

Agreed.

My preference would be that any logging.Handler registers only with the package root logger, and only if the openml config file specifies it (but this can be a default). By default, I would only consider using a logging.FileHandler to write the log to a file. Perhaps writing to a new file each day, and removing any files older than a week automatically (again exposing freedom for how long (if at all) to retain logs through the configuration file).

I didn't think of logging to the openml directory by default, but that appears to be a great idea. The user could still configure the package to output to the command line.

@PGijsbers
Copy link
Collaborator Author

I would not show debug messages by default, but I agree on the logger name.

I'm not advocating showing (to console?) debug messages by default either. There's two levels to set, that of the logger and those of the logger.Handler. No messages would be shown if no handlers are set to the logger. A message is only processed if the level is met by both the log-object and the handler. By default no handlers would be set. To illustrate:

>>> import logging
>>> openml_log = logging.getLogger('openml')
>>> openml_log.setLevel(logging.DEBUG)
>>> openml_log.error("With the exception of error-level messages")
With the exception of error-level messages
>>> openml_log.info("No messages are written to console.")
>>> # if you want to change this default behavior, and have not even error messages
... # written to stdout, you can add a `logging.NullHandler()`.
... # Now we add a 'logging.StreamHandler' to actually write to stdout
...
>>> import sys
>>> sh = logging.StreamHandler(sys.stdout)
>>> sh.setLevel(logging.INFO)
>>> openml_log.addHandler(sh)
>>> openml_log.info("Now info messages are also written to stdout.")
Now info messages are also written to stdout.
>>> openml_log.debug("but not debug messages.")  # because `sh.level == logging.INFO`
>>> sh.setLevel(logging.DEBUG)
>>> openml_log.debug("Now even DEBUG messages are written to stdout.")
Now even DEBUG messages are written to stdout.
>>> openml_log.setLevel(logging.ERROR)
>>> openml_log.warning("By setting the log level of the log differently, handlers wont receive all messages.")  # because `openml_log.level == logging.ERROR`
>>> # Nothing written to stdout

By setting the log level of the log (not the handler!) to logging.DEBUG, we allow any handler to subscribe to any level of messages it is interested in.


I think the proposed changes will be relatively easy to implement. I don't have the time currently (when I do, I'll first finish the 'short names' PR). It's a low priority issue but if it stays open I can tackle it end of the month/in August.

@mfeurer
Copy link
Collaborator

mfeurer commented Jul 17, 2019

Got it, that totally makes sense.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants