v0.4.3

@MaxDall MaxDall released this 04 Sep 12:07
· 29 commits to master since this release
ccf5a80

Introducing New Publishers from Canada, Germany, and India 🚀

This release includes:

  • Support for five new publishers (three from Canada, one from India, and one from Germany)
  • Article filtering based on robots.txt

New Features

With this update, we've implemented article filtering using robots.txt. Each URL fetched is now evaluated against the path and user-agent restrictions specified by publishers in their robots.txt files. This feature is enabled by default, but users can disable it by setting ignore_robots=True in the Crawler constructor.
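To illustrate what evaluating a URL against robots.txt rules means in practice, here is a minimal sketch using Python's standard-library `urllib.robotparser`. It is not the library's internal implementation; the robots.txt content, the user-agent names (`MyCrawler`, `ExampleBot`), the URLs, and the helper functions `build_parser` and `filter_urls` are all hypothetical and chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for this illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: ExampleBot
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def filter_urls(urls, parser, user_agent):
    """Keep only the URLs that the given user agent is allowed to fetch."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


parser = build_parser(ROBOTS_TXT)
urls = [
    "https://example.com/news/article-1",
    "https://example.com/private/draft",
]

# "MyCrawler" falls under the wildcard rules: only /private/ is off limits.
allowed = filter_urls(urls, parser, "MyCrawler")

# "ExampleBot" is disallowed everywhere, so nothing passes the filter.
blocked = filter_urls(urls, parser, "ExampleBot")
```

Both kinds of restriction mentioned above show up here: the path rule removes `/private/draft` for every agent, while the user-agent rule removes everything for `ExampleBot`. Passing `ignore_robots=True` to the Crawler constructor skips this kind of check entirely.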

New Publishers

Canada (CA)

India (IND)

Germany (DE)

Updates

We've updated our APNews parser so that it parses authors correctly again, and applied additional fixes.

Bug Fixes

Full Changelog: v0.4.2...v0.4.3