-
-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue 155: log metadata prior to verification #160
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tried to give you as much help as possible. Hope it will be enough.
Now passing tests and precommits. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, good start, of course few things left to fix.
Please also add an entry to the CHANGELOG.md
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you! @rgaudin could you please do a final review should I have missed something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you ; see my few suggestions.
Here's what I have in zimls
. Goal is different than yours but you get the idea.
print("Metadata:")
for name in zim.metadata_keys:
item = zim.get_metadata_item(name)
if item.mimetype.startswith("text/plain"):
try:
preview = bytes(item.content).decode("UTF-8")
except UnicodeDecodeError as exc:
preview = bytes(item.content)
elif as_base64:
preview = (
f"{item.mimetype} ({item.size} bytes) binary: {to_base64(item.content)}"
)
else:
preview = f"{item.mimetype} binary ({item.size} bytes)"
print(f" - {name}: {preview}")
@richterdavid I see you've resolved all conversations. Is it ready for re-review? Also, our convention is that the PR submitter only resolves a conversation should it have been fixed in code directly. |
Yes, please do re-review. I think my resolves complied with that convention. Feel free to reopen any you see fit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, see my inline comments
@@ -146,7 +150,35 @@ def config_indexing( | |||
self.__indexing_configured = True | |||
return self | |||
|
|||
def _is_illustration_metadata_name(self, name: str) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are using it once and it's a very simple check ; why the extra function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Giving it a name is a way of making the code a little more self-documenting. Really the prefix should be put somewhere shared and reused as well, but I didn't want to edit all the other places the constant is used in this change.
def _log_metadata(self): | ||
for name, value in sorted(self._metadata.items()): | ||
if self._is_illustration_metadata_name(name) and isinstance(value, bytes): | ||
illus_md = self._get_illustration_metadata_details(value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not logging directly? Then you have all the flexibility you want
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logging illus_md inside of get_illustration_metadata_details? I could, but it would spread out knowledge of the metadata logging format. It's preferable to keep that format info in one function.
"""Return True if name is a valid illustration metadata name""" | ||
return name.startswith("Illustration_") | ||
|
||
def _get_illustration_metadata_details( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this function is not very appealing. You want to know the image format and its dimensions. Why not returning this instead of building a string in some util function?
As per error handling, you have the choice to handle it when calling or inside the function but anyway, better safe than sorry and handle every exception. Currently if PIL cannot read (and it's not UnidentifiedImageError
or if size could not be set to a tuple, it will crash
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, now it won't crash quite as easily. Agreed that this could return a richer data-structure. Doing so would complicate the caller. Avoiding the complication in the caller by pushing the logging into this function smears the knowledge of the logging format in use around. It also prevents converting this whole thing from N logging calls to 1 call contains N items; the latter may be preferable if functions that do other logging end up being called, causing the metadata logging to be spread out rather than in a block.
for name, value in sorted(self._metadata.items()): | ||
if self._is_illustration_metadata_name(name) and isinstance(value, bytes): | ||
illus_md = self._get_illustration_metadata_details(value) | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will log everything that's not bytes
. What if it cannot be represented as string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would indeed be unfortunate. The preferred ways to set up metadata, though, go through typed handlers that ensure it will be bytes. Worst case, IIUC, is that it gets logged as is. That might be annoying if it's a 47K character string or somesuch, but it's not terrible.
@richterdavid we would like to release 3.4.0 in the coming days, we would appreciate if you could achieve fix the few remaining conversations since 3.4.0 ; otherwise this PR will be postponed to 3.5, which is not a concern. |
Please take another look. |
@rgaudin could you maybe propose a commit with what seems reasonable to fix your points. It looks like we are stuck in a silly conversation on "best design" which I've never seen leading to anywhere but lengthy debates. Since we are the maintainers of this lib on the long term (we will have to "pay" for improper design) and since you have more experience on this project, I think that you can deserve to have the "last word". |
Will do |
See #172 as I can't push here |
Issue 155: log metadata prior to verification (PR #160 fixes)
Superseeded by #172 |
Fix #155
There's a few things not working right yet, and this isn't passing pre-commits yet, so I'm pushing this just for advice at this time.