Display in DEBUG all the important ZIM metadata #123
Comments
Should this issue be moved to the repo for warc2zim?
"Confirming output is writable using" is at Line 267 in 367844d
"Found WARC record for favicon" is at Line 563 in 367844d
@richterdavid Yes, it is something to implement in warc2zim.
I'd like to work on this.
I don't think this issue is relevant anymore, at least not as it is phrased and oriented today. What has already been decided for all scrapers is that we must use the
This still has to be implemented in warc2zim; I've just opened #123. Do we consider that it would still help to display all metadata once validated, e.g. for the case where the validation checks are improperly implemented? If yes, it would anyway be better to do this in python-scraperlib so that all scrapers benefit from this enhancement.
Oops, I'm wrong: the validation does not display the offending value. Maybe we should consider displaying the offending value as well.
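To make the idea concrete, here is a minimal sketch of a length check that reports the offending value along with the error. It is not scraperlib's actual validation code: `MAX_LENGTHS` and `check_lengths` are hypothetical names, and the 80 and 4000 character limits are my reading of the wiki's Metadata page, to be double-checked. It also counts Python characters with `len`, which may not match how the real check measures length.

```python
# Hypothetical limits, assumed from the openZIM Metadata wiki page.
MAX_LENGTHS = {"Description": 80, "LongDescription": 4000}


def check_lengths(metadata: dict[str, str]) -> list[str]:
    """Return one message per over-long value, including the value itself."""
    problems = []
    for name, limit in MAX_LENGTHS.items():
        value = metadata.get(name)
        if value is not None and len(value) > limit:
            # Show the actual length, the limit, and the offending value.
            problems.append(
                f"{name} is too long ({len(value)} > {limit} chars): {value!r}"
            )
    return problems


if __name__ == "__main__":
    for message in check_lengths({"Description": "x" * 81}):
        print(message)
```

With the value in the message, a failed run can be diagnosed from the log alone, without re-running the scraper to find out what was actually passed.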
Yes, this is so important!
I guess this is the wrong issue number
Yes and yes
Yes, the proper issue is #235
Closing this issue then. We will implement what is necessary in python-scraperlib, and the next upgrade of the dependency in warc2zim will deploy it "automatically". I don't think we need to track this here.
I've just opened it: openzim/python-scraperlib#155
@benoit74 Few points regarding the process
My last comment is mostly invalidated by the newly created ticket at scraperlib. @benoit74 thx
Next time I will open the ticket before closing the other one ^^ #235 is indeed a different thing. I will most probably implement it, but it is not necessary. It is not urgent because we already validate metadata with recent scraperlib. The goal of #235 is to do it as early as possible, i.e. to improve the current behavior, where validation happens a bit late: it is not part of the "early check" done by zimit, but runs after the crawl and after some processing by warc2zim.
If I look at a log, I see something like this:
I would like to list here all the important ZIM metadata, in particular the Description and the LongDescription, which might make the whole process die shortly afterwards (if bigger than the maximal size). See https://wiki.openzim.org/wiki/Metadata for the whole list of ZIM metadata.
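For illustration, the request could look roughly like the sketch below. It is not warc2zim code: `format_metadata`, `log_metadata` and the logger name are hypothetical, and the metadata is assumed to be a plain dict of strings. Each entry is logged at DEBUG level with its length, so an over-long Description or LongDescription is visible before the process fails.

```python
import logging

# Hypothetical logger name; the real scraper has its own logger.
logger = logging.getLogger("metadata-debug-sketch")


def format_metadata(metadata: dict[str, str]) -> list[str]:
    """Build one line per metadata entry, with its length and its value."""
    return [
        f"Metadata {name} ({len(value)} chars): {value!r}"
        for name, value in metadata.items()
    ]


def log_metadata(metadata: dict[str, str]) -> None:
    """Emit every metadata entry at DEBUG level."""
    for line in format_metadata(metadata):
        logger.debug(line)


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    log_metadata({"Title": "Demo", "Description": "A short demo archive"})
```

Logging the length next to the value keeps the output useful even when a long value is hard to measure by eye.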