Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log validated metadata. #154

Closed
wants to merge 1 commit into from

Conversation

richterdavid
Copy link

Add logging for validated metadata per discussion on openzim/warc2zim#123 and openzim/warc2zim#233

@richterdavid
Copy link
Author

richterdavid commented Apr 18, 2024

This is probably fine but I haven't run local tests yet. The instructions I found for testing are problematic. #153

@richterdavid richterdavid marked this pull request as draft April 18, 2024 22:10
@benoit74
Copy link
Collaborator

See #155 ; I don't think you've implemented this in the right place, but let's wait a bit for discussions to settle

@rgaudin
Copy link
Member

rgaudin commented Apr 19, 2024

  • won't work. this custom logger is not configured to display debug
  • why creating a logger from scratch?
  • how will this be toggled?
  • Do we want this key-value log without context?
  • Do we want to print validated metadata only?

@richterdavid
Copy link
Author

won't work. this custom logger is not configured to display debug
why creating a logger from scratch?

With respect to accessing and configuring the logger, should everything in scraperlib be using this?

logger = getLogger(NAME, level=stdlogging.DEBUG if debug else stdlogging.INFO)

It's evident that folks want unvalidated metadata logged, and presumably referenced by validation failures later. Consider: "Validation of Scrape URL failed" vs "Validation of something failed". Even better f"Validation of Scrape URL failed: {value}" so you don't need to look back to the earlier logging of the relevant metadata.

how will this be toggled?

Ideas?

Do we want this key-value log without context?

Context makes sense in the new proposal to log all metadata early in start(). Given how metadata is added incrementally in this class it didn't make much sense here.

Do we want to print validated metadata only?

Current proposals say no.

Okay, I'm not going to work on this branch further; once there's a plan I'll send a new PR for issue #155

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants