Significantly enhance the safety of metadata manipulation #221
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

    @@            Coverage Diff             @@
    ##              main      #221    +/-   ##
    ==========================================
      Coverage   100.00%   100.00%
    ==========================================
      Files           38        39      +1
      Lines         2227      2447    +220
      Branches       426       335     -91
    ==========================================
    + Hits          2227      2447    +220
Wow, that's a lot of changes!
See inline comments; maybe we should discuss it live once you've looked at them.
Here's my attempt at making those explicit, dedicated metadata easier to apprehend and use. I wrote this quickly at the time, then a long time passed before I looked at it again and had to fix the tests to make it work.

I chose an approach in which the flexibility is built into the base Metadata class, using class variables that get overridden by the subclasses and can be shadowed by instances should the need arise. One thing that's a tad annoying is that […]

Another take was to stop accepting bytes as well as strings as inputs where there's no good reason to. For instance, DateMetadata now takes a date or datetime only. Supporting those extra types is an additional burden and there's no real value for our use case.

Major changes that have to be introduced (we'll work on the CHANGELOG if we pursue this): […]
We need to validate this but this frees us from a lot of meaningless tests and really simplifies dev and maintenance.
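To make the class-variable approach described above a bit more concrete, here is a minimal sketch. It is not the code from this PR: the attribute names `is_required` and `accepted_types` are hypothetical, chosen only to show defaults living on the base class, being overridden by a subclass, and being shadowed on a single instance.

```python
import datetime


class Metadata:
    """Hypothetical base class; attribute names here are illustrative only."""

    # Class-level defaults: subclasses override them, instances may shadow them.
    is_required = False
    accepted_types = (str,)

    def __init__(self, value):
        # Validation happens once, at initialization time.
        if not isinstance(value, self.accepted_types):
            expected = ", ".join(t.__name__ for t in self.accepted_types)
            raise TypeError(f"{type(self).__name__} expects {expected}")
        self.value = value


class DateMetadata(Metadata):
    # Overrides the class-level defaults; datetime is a subclass of date,
    # so both are accepted while str and bytes are rejected.
    is_required = True
    accepted_types = (datetime.date,)


date_meta = DateMetadata(datetime.date(2024, 1, 1))
date_meta.is_required = False  # shadows the class variable on this instance only
```

With this layout, `DateMetadata.is_required` stays `True` for every other instance, and passing a `str` to `DateMetadata` raises `TypeError` right at construction.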
I wonder if we should offer a way to create the StandardMetadataSet with values directly (to make it less verbose). Would be quite easy now with something like this (assuming the expects_* decorators add that type):

    class StandardMetadataList:
        ...
        @classmethod
        def from_values(
            cls,
            Name: NameMetadata.input_type,
            Language: LanguageMetadata.input_type,
            Title: TitleMetadata.input_type,
            Creator: CreatorMetadata.input_type,
            Publisher: PublisherMetadata.input_type,
            Date: DateMetadata.input_type,
            Illustration_48x48_at_1: DefaultIllustrationMetadata.input_type,
            Description: DescriptionMetadata.input_type,
            LongDescription: LongDescriptionMetadata.input_type | None = None,
            Tags: TagsMetadata.input_type | None = None,
            Scraper: ScraperMetadata.input_type | None = None,
            Flavour: FlavourMetadata.input_type | None = None,
            Source: SourceMetadata.input_type | None = None,
            License: LicenseMetadata.input_type | None = None,
            Relation: RelationMetadata.input_type | None = None,
        ): ...

All tests are passing. I did not write new ones but I had to update […]
- Explicit callback definition
- Simplified delete_callback to be a dumb callback (not chaining)

Reasoning: coverage reported a lot of missing lines on zim/metadata.py with the previous version. Also includes auto-linting where the new ruff complained.

In order to properly expose the input type in `__init__` (for pyright and user assistance), use one base class (subclassing `Metadata`) per input type. Can't get rid of the `Any` on the `Metadata` init (otherwise we would need to re-implement the init everywhere). Used the opportunity to remove the `expecting` classvar and modified the tests accordingly.

- Also fixed a minor issue in bytes reading by seeking back to the previous position and not zero.
- Also shared the binary reading logic inside the main base class (it was already there) so it can be reused in illustration.
- Now explicitly states the type of stored data (which can differ from the inputs in the somewhat flexible ones).
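For readers unfamiliar with the bytes-reading fix mentioned above, the idea can be sketched as follows. This is only an illustration under assumptions: the helper name `read_all_preserving_position` is hypothetical, and it assumes the goal is to read a seekable stream's full content without disturbing the caller's cursor.

```python
import io
from typing import BinaryIO


def read_all_preserving_position(stream: BinaryIO) -> bytes:
    """Return the full content of a seekable stream, leaving its cursor untouched."""
    previous = stream.tell()  # remember where the caller was
    try:
        stream.seek(0)
        return stream.read()
    finally:
        # Restore the caller's position rather than resetting to zero.
        stream.seek(previous)


buffer = io.BytesIO(b"abcdef")
buffer.seek(3)
content = read_all_preserving_position(buffer)
```

Resetting to zero instead would silently change what the caller reads next whenever the stream had already been partially consumed.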
Thanks a lot, nothing left to add, I like it! Glad we've made this "not-negligible" move. I just force-pushed to fix up the commits and rebase on main.
Fix #205

This is a full rewrite of #217, so I've opened a new PR, since the changes made since the last review no longer made sense from my PoV.

- […] `zim.metadata.check_metadata_conventions`
- […] `zim.creator.Creator.config_metadata` by using these types and being more strict:
  - […] `StandardMetadata` class for standard metadata, including the list of mandatory ones
  - […] `X-` prefix […] `fail_on_missing_prefix` argument
- […] `add_metadata`, use the same metadata types
- […] `zim.creator.Creator.start` with new types, and drop all metadata from memory after being passed to the libzim
- […] `zim.creator.convert_and_check_metadata` (not useful anymore, simply use the proper metadata type)
- Move `MANDATORY_ZIM_METADATA_KEYS` and `DEFAULT_DEV_ZIM_METADATA` from `constants` to `zim.metadata` to avoid circular dependencies
- […] `inputs.unique_values` utility function to compute the list of unique values from a given list, while preserving the initial list order
- In `__init__` of `zim.creator.Creator`, rename `disable_metadata_checks` to `check_metadata_conventions` for clarity and brevity
  - […] `zim.metadata.check_metadata_conventions`, so if you have many creators running in parallel, they can't have different settings; the last one initialized will "win"

Nota:

- […] `tests/zim/test_zim_creator.py` to `tests/zim/test_metadata.py` since most checks are now done at metadata initialization instead of when `config_metadata` or `start` are called, but coverage is similar
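As an illustration of what an order-preserving `unique_values` helper can look like, here is a minimal sketch. It is not necessarily how `inputs.unique_values` is implemented in this PR; it assumes the items are hashable and relies on `dict` keeping insertion order (guaranteed since Python 3.7).

```python
from collections.abc import Iterable
from typing import TypeVar

T = TypeVar("T")


def unique_values(items: Iterable[T]) -> list[T]:
    """Return the distinct items, keeping the first occurrence of each in order."""
    # dict.fromkeys drops duplicates while preserving insertion order.
    return list(dict.fromkeys(items))


tags = unique_values(["wikipedia", "_category:other", "wikipedia", "offline"])
```

This kind of helper fits the tag-like metadata case, where a duplicate value is harmless and can be dropped silently while the author's ordering is kept.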