Issue 155: log metadata prior to verification #160
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -20,14 +20,18 @@ | |
from __future__ import annotations | ||
|
||
import datetime | ||
import io | ||
import logging | ||
import pathlib | ||
import re | ||
import weakref | ||
from collections.abc import Callable, Iterable | ||
from typing import Any | ||
|
||
import libzim.writer # pyright: ignore | ||
import PIL.Image | ||
|
||
from zimscraperlib import logger | ||
from zimscraperlib.constants import ( | ||
DEFAULT_DEV_ZIM_METADATA, | ||
FRONT_ARTICLE_MIMETYPES, | ||
|
@@ -62,6 +66,9 @@ | |
) | ||
|
||
|
||
TUPLE_SIZE_2D = 2 | ||
|
||
|
||
def mimetype_for( | ||
path: str, | ||
content: bytes | str | None = None, | ||
|
@@ -146,7 +153,39 @@ def config_indexing( | |
self.__indexing_configured = True | ||
return self | ||
|
||
def _is_illustration_metadata_name(self, name: str) -> bool: | ||
"""Return True if name is a valid illustration metadata name""" | ||
return name.startswith("Illustration_") | ||
|
||
def _get_illustration_metadata_details( | ||
Review comment: This function is not very appealing. You want to know the image format and its dimensions. Why not return those instead of building a string in some util function? Reply: Okay, now it won't crash quite as easily. Agreed that this could return a richer data structure, but doing so would complicate the caller. Avoiding that complication by pushing the logging into this function would spread knowledge of the logging format around. It would also prevent converting this whole thing from N logging calls to 1 call containing N items; the latter may be preferable if functions that do other logging end up being called, causing the metadata logging to be spread out rather than in a block. |
||
self, | ||
value: bytes, | ||
) -> str | None: | ||
"""Return image format for debug logging of illustration metadata""" | ||
try: | ||
with PIL.Image.open(io.BytesIO(value)) as img: | ||
if ( | ||
img is not None | ||
and isinstance(img.size, tuple) | ||
and len(img.size) >= TUPLE_SIZE_2D | ||
): | ||
return f"{img.format} {img.size[0]}x{img.size[1]}" | ||
except Exception as e: | ||
return f"Image format issue: {e}" | ||
return f"Unknown image format, {len(value)} bytes" | ||
|
||
def _log_metadata(self): | ||
for name, value in sorted(self._metadata.items()): | ||
if self._is_illustration_metadata_name(name) and isinstance(value, bytes): | ||
illus_md = self._get_illustration_metadata_details(value) | ||
Review comment: Why not log directly? Then you have all the flexibility you want. Reply: Log illus_md inside _get_illustration_metadata_details? I could, but it would spread out knowledge of the metadata logging format. It's preferable to keep that format info in one function. |
||
else: | ||
Review comment: This will log everything that's not Reply: That would indeed be unfortunate. The preferred ways to set up metadata, though, go through typed handlers that ensure it will be bytes. Worst case, IIUC, is that it gets logged as is. That might be annoying if it's a 47K character string or somesuch, but it's not terrible. |
||
illus_md = None | ||
logger.debug(f"Metadata: {name} = {(illus_md if illus_md else value)}") | ||
|
||
def start(self): | ||
if logger.isEnabledFor(logging.DEBUG): | ||
self._log_metadata() | ||
|
||
if not all(self._metadata.get(key) for key in MANDATORY_ZIM_METADATA_KEYS): | ||
raise ValueError("Mandatory metadata are not all set.") | ||
|
||
|
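The review thread on _get_illustration_metadata_details weighs two ideas: returning structured image details instead of a preformatted string, and emitting one log call that contains all N items. The sketch below is one possible way to combine them, not the code from this pull request. Every name in it (IllustrationDetails, probe_png, describe_metadata, log_metadata, the logger name) is hypothetical, and probe_png is a stdlib-only stand-in for the PIL-based probe that only reads PNG headers. It also assumes that bytes values which cannot be probed are shown by length, which differs from the diff above.

```python
from __future__ import annotations

import logging
import struct
from typing import NamedTuple

logger = logging.getLogger("metadata-sketch")  # hypothetical logger name

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


class IllustrationDetails(NamedTuple):
    """Hypothetical structured result: image format plus pixel dimensions."""

    image_format: str
    width: int
    height: int


def probe_png(value: bytes) -> IllustrationDetails | None:
    """Stand-in for a PIL-based probe; only understands PNG headers."""
    # A PNG starts with an 8-byte signature followed by the IHDR chunk:
    # 4-byte length, 4-byte type, then big-endian width and height.
    if len(value) < 24 or not value.startswith(PNG_SIGNATURE):
        return None
    if value[12:16] != b"IHDR":
        return None
    width, height = struct.unpack(">II", value[16:24])
    return IllustrationDetails("PNG", width, height)


def describe_metadata(metadata: dict[str, object]) -> list[str]:
    """Build one line per item; the only function that knows the log format."""
    lines = []
    for name, value in sorted(metadata.items()):
        details = None
        if name.startswith("Illustration_") and isinstance(value, bytes):
            details = probe_png(value)
        if details is not None:
            shown = f"{details.image_format} {details.width}x{details.height}"
        elif isinstance(value, bytes):
            # Assumption: summarize raw bytes instead of dumping them.
            shown = f"{len(value)} bytes"
        else:
            shown = str(value)
        lines.append(f"{name} = {shown}")
    return lines


def log_metadata(metadata: dict[str, object]) -> None:
    """Emit all metadata in a single DEBUG record so the block stays together."""
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("Metadata:\n%s", "\n".join(describe_metadata(metadata)))
```

With this split, the probe returns data, the formatting lives in describe_metadata alone, and the caller stays a single loop, which may address the concern about complicating the caller.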
Review comment: You are using it once and it's a very simple check; why the extra function?
Reply: Giving it a name is a way of making the code a little more self-documenting. Really, the prefix should be put somewhere shared and reused as well, but I didn't want to edit all the other places the constant is used in this change.
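The reply above mentions putting the prefix somewhere shared. A minimal sketch of what that could look like follows, assuming a module-level constant; ILLUSTRATION_METADATA_PREFIX and both function names are hypothetical and do not come from zimscraperlib, and the name pattern with size and scale is assumed from the usual ZIM illustration naming.

```python
# Hypothetical shared constant; the pull request keeps the literal inline.
ILLUSTRATION_METADATA_PREFIX = "Illustration_"


def is_illustration_metadata_name(name: str) -> bool:
    """Return True if name uses the illustration metadata prefix."""
    return name.startswith(ILLUSTRATION_METADATA_PREFIX)


def illustration_metadata_name(size: int, scale: int = 1) -> str:
    """Build a name such as Illustration_48x48@1 from the same constant."""
    return f"{ILLUSTRATION_METADATA_PREFIX}{size}x{size}@{scale}"
```

Keeping both the check and the name builder on one constant means a change to the prefix happens in a single place, at the cost of touching every call site that currently spells out the literal.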