Deprecated Flag for Uncrawlable Publishers #534

Merged: 12 commits, Jun 19, 2024

addie9800
Collaborator

This PR adds a flag for uncrawlable publishers. The Publisher Coverage Script is updated to skip publishers marked as deprecated. Deprecated publishers are also shown with strikethrough in supported_publishers.md.
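As a rough illustration of the idea (not the actual Fundus code), the sketch below shows how a deprecated flag could drive both the coverage check and the strikethrough in a Markdown table. `PublisherSpec`, `publishers_to_check`, `render_name`, and the example publishers are hypothetical names made up for this sketch.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class PublisherSpec:
    # Hypothetical stand-in for a publisher entry; not the Fundus class.
    name: str
    domain: str
    deprecated: bool = False


def publishers_to_check(publishers: Iterable[PublisherSpec]) -> List[PublisherSpec]:
    """Return the publishers a coverage run should still visit."""
    return [p for p in publishers if not p.deprecated]


def render_name(publisher: PublisherSpec) -> str:
    """Render a name for a Markdown table, struck through if deprecated."""
    return f"~~{publisher.name}~~" if publisher.deprecated else publisher.name


# Made-up example data for the sketch.
PUBLISHERS = [
    PublisherSpec("Example Daily", "https://example.com/"),
    PublisherSpec("Defunct Times", "https://example.org/", deprecated=True),
]

if __name__ == "__main__":
    for publisher in PUBLISHERS:
        print(render_name(publisher))
```

Keeping the flag on the publisher entry means the coverage script and the table generator can both read it without maintaining a separate exclusion list.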

Collaborator

@MaxDall MaxDall left a comment

Thanks for adding! Looks awesome 👍

src/fundus/scraping/crawler.py: 3 review comments (outdated, resolved)
@MaxDall
Collaborator

MaxDall commented Jun 18, 2024

@addie9800 I moved the ignore_deprecated flag from the crawl method to Crawler. The paradigm of crawl is to only include parameters that affect all crawlers (forward, backward). Since the deprecation flag does not serve any purpose for backward crawling, I decided to move it to the crawler. What do you think about that?
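To make the design choice concrete, here is a minimal sketch that keeps the flag on the constructor and leaves `crawl` with only general parameters. `SketchCrawler`, `PublisherSpec`, and `max_articles` are hypothetical stand-ins rather than the real Fundus signatures, and the sketch assumes that `ignore_deprecated=True` means deprecated publishers are skipped.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass(frozen=True)
class PublisherSpec:
    # Hypothetical stand-in for a publisher entry; not the Fundus class.
    name: str
    deprecated: bool = False


class SketchCrawler:
    """Toy crawler: crawler-specific options live on the constructor."""

    def __init__(self, *publishers: PublisherSpec, ignore_deprecated: bool = False) -> None:
        # Assumption: ignore_deprecated=True drops deprecated publishers up front.
        self.publishers: List[PublisherSpec] = [
            p for p in publishers if not (ignore_deprecated and p.deprecated)
        ]

    def crawl(self, max_articles: Optional[int] = None) -> Iterator[str]:
        # crawl() only takes parameters that make sense for any crawler.
        count = 0
        for publisher in self.publishers:
            if max_articles is not None and count >= max_articles:
                return
            yield f"article from {publisher.name}"
            count += 1


active = PublisherSpec("Example Daily")
retired = PublisherSpec("Defunct Times", deprecated=True)

if __name__ == "__main__":
    for article in SketchCrawler(active, retired, ignore_deprecated=True).crawl():
        print(article)
```

With this split, a crawler type for which deprecation is meaningless simply does not accept the argument, and the shared `crawl` signature stays the same across crawler types.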

@addie9800
Collaborator Author

> @addie9800 I moved the ignore_deprecated flag from the crawl method to Crawler. The paradigm of crawl is to only include parameters that affect all crawlers (forward, backward). Since the deprecation flag does not serve any purpose for backward crawling, I decided to move it to the crawler. What do you think about that?

I think it's a good idea. I also updated the docs a bit; let me know what you think about that as well.

@MaxDall
Collaborator

MaxDall commented Jun 19, 2024

@addie9800 thanks for updating the documentation. Unfortunately, the latest documentation commit breaks a lot of links. Also, tutorial 4 now seems overwhelming and out of focus, while tutorial 5 looks like it was cannibalized. The purpose of tutorial 4 is to enable the user to filter for specific articles. Search and deprecated aren't nearly as important here; they are mostly there for convenience. I would leave tutorial 4 as it was and add the deprecation mechanics to the former tutorial 5.

@addie9800
Collaborator Author

Oh, I thought I had caught the broken links. I've undone the previous changes and just appended the deprecation part to section 5 of the tutorial.

Collaborator

@MaxDall MaxDall left a comment

Thanks a lot! Looks really great :)

@addie9800 addie9800 merged commit e0d1708 into master Jun 19, 2024
5 checks passed
@addie9800 addie9800 deleted the add-deprecated-flag branch June 19, 2024 19:03