improve/fix glue job logs printing #30886
Glue logging is a mess, I like this solution. Left a couple comments.
LGTM
Even if this is closed, I would like to make a note on Glue logging.

    import logging
    import sys
    from typing import Any, Optional

    DEFAULT_LOG_FORMAT = "%(levelname)s:%(name)s:%(module)s:%(message)s"


    def get_logger(name: Optional[str] = None, level: Any = logging.INFO, log_format: str = DEFAULT_LOG_FORMAT) -> logging.Logger:
        """Returns a logger configured for Glue jobs"""
        formatter = logging.Formatter(fmt=log_format)

        # glue sets its own handlers by default, but they suck.
        # this handler redirects INFO, WARNING and DEBUG to sys.stdout
        stdout_handler = logging.StreamHandler(sys.stdout)
        stdout_handler.setLevel(logging.DEBUG)
        stdout_handler.addFilter(lambda record: record.levelno < logging.ERROR)
        stdout_handler.setFormatter(formatter)

        # this handler redirects ERROR (and anything above it) to sys.stderr
        stderr_handler = logging.StreamHandler(sys.stderr)
        stderr_handler.setLevel(logging.ERROR)
        stderr_handler.setFormatter(formatter)

        # drop whatever handlers are already attached, then install ours
        logger = logging.getLogger(name=name)
        logger.handlers.clear()
        logger.setLevel(level)
        logger.addHandler(stdout_handler)
        logger.addHandler(stderr_handler)
        return logger

In effect, this will log all INFO, WARNING and DEBUG to /output and all ERROR to /error.
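For anyone who wants to check the routing locally before putting it in a job script, here is a minimal sketch. It builds the same two handlers but writes to in-memory buffers instead of `sys.stdout` / `sys.stderr`. The names `build_split_logger` and `demo_routing`, the shortened format string, and the `propagate = False` line are assumptions made for the demo and are not part of the helper above.

```python
import io
import logging


def build_split_logger(name, out_stream, err_stream):
    """Hypothetical variant of the helper with injectable streams."""
    formatter = logging.Formatter("%(levelname)s:%(name)s:%(message)s")

    # Everything below ERROR goes to the "stdout" stream.
    out_handler = logging.StreamHandler(out_stream)
    out_handler.setLevel(logging.DEBUG)
    out_handler.addFilter(lambda record: record.levelno < logging.ERROR)
    out_handler.setFormatter(formatter)

    # ERROR and above go to the "stderr" stream.
    err_handler = logging.StreamHandler(err_stream)
    err_handler.setLevel(logging.ERROR)
    err_handler.setFormatter(formatter)

    logger = logging.getLogger(name)
    logger.handlers.clear()
    logger.setLevel(logging.DEBUG)
    # Assumption for the demo only: do not forward records to root handlers.
    logger.propagate = False
    logger.addHandler(out_handler)
    logger.addHandler(err_handler)
    return logger


def demo_routing():
    """Log one record per level and return what each stream received."""
    out, err = io.StringIO(), io.StringIO()
    log = build_split_logger("glue_demo", out, err)
    log.info("starting job")
    log.warning("slow partition")
    log.error("job failed")
    return out.getvalue().splitlines(), err.getvalue().splitlines()
```

Calling `demo_routing()` returns two lists, one per stream, so it is easy to see that the INFO and WARNING records end up on one side and the ERROR record on the other.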
@IAL32 Neat. Where does that helper live? Is it added to the script that the job executes?
Exactly. As far as I know, this is the only way to get logging to work on Glue Jobs. Also note: when grabbing the root logger (
There were several problems with the current implementation:
Since we now display both streams, I hesitated between interleaving messages from both (sorted by timestamp) and leaving them separated. I ended up keeping them separated so the experience stays consistent with what the user would see in CloudWatch, but I'd be happy to change that to chronological order if people think it's better.
Also: added it to the system test (for better visibility for users) and added a unit test.
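To make the two display options concrete, here is a rough sketch of both. It is not the code in this PR. It assumes each stream is a list of dicts with `timestamp` and `message` keys, already sorted by timestamp within the stream. The function names `separated` and `interleaved` are hypothetical.

```python
import heapq


def separated(output_events, error_events):
    """Show each stream as its own block: all output first, then all errors."""
    return [e["message"] for e in output_events] + [e["message"] for e in error_events]


def interleaved(output_events, error_events):
    """Merge both streams into one chronological listing.

    Assumes each input list is already sorted by timestamp.
    """
    merged = heapq.merge(output_events, error_events, key=lambda e: e["timestamp"])
    return [e["message"] for e in merged]


# Made-up sample events, for illustration only.
sample_output = [
    {"timestamp": 1, "message": "start"},
    {"timestamp": 3, "message": "done"},
]
sample_error = [
    {"timestamp": 2, "message": "boom"},
]
```

With the sample data, `separated` keeps "boom" after both output lines, while `interleaved` places it between "start" and "done". The trade-off is the one described above: the first matches how the log groups look side by side, the second reads as a single timeline.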
cc @ferruzzi