@Benjamin2107 When I run the code snippet above, only the filter specified in the Kicker publisher enum works as intended, and that was only fixed recently with #459. Could you confirm whether this is also the case for you?
You can get the debugging logging messages enabled with
Describe the bug
While working on #464, I had trouble filtering URLs with a regex in the url_filter of PublisherSpec.
All unit tests pass, but after testing the crawler myself I noticed that videos and slideshows from my selected newspaper don't get filtered out.
Is this a bug or is this my fault?
How to reproduce
Expected behavior.
Only "*/article*" URLs should be shown. Instead, URLs containing "*/video*" or "*/slideshow*" also appear (depending on whether the last 20 news items contain any videos or slideshows).
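To make the expected behavior concrete, here is a minimal standalone sketch of regex-based URL filtering. It does not use the Fundus API. The names `drop_if_matches` and `keep_unfiltered`, the example URLs, and the convention that a filter returns True for URLs to discard are all assumptions made for illustration.

```python
import re
from typing import Callable, List

# Assumed convention for this sketch: a URL filter returns True
# when the URL should be dropped.
URLFilter = Callable[[str], bool]


def drop_if_matches(pattern: str) -> URLFilter:
    """Build a filter that drops every URL the regex matches anywhere."""
    compiled = re.compile(pattern)
    return lambda url: bool(compiled.search(url))


def keep_unfiltered(urls: List[str], url_filter: URLFilter) -> List[str]:
    """Return only the URLs the filter does not drop."""
    return [url for url in urls if not url_filter(url)]


# Hypothetical example URLs, not taken from any real publisher.
urls = [
    "https://example.com/article/transfer-news",
    "https://example.com/video/match-highlights",
    "https://example.com/slideshow/best-goals",
]

unwanted = drop_if_matches(r"/(video|slideshow)")
print(keep_unfiltered(urls, unwanted))
```

With a filter like this, only the "/article" URL should remain. If video or slideshow URLs still come through in the real crawler, one thing worth checking is whether the filter is actually applied to every URL source.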
Logs and Stack traces
No response
Screenshots
No response
Additional Context
No response
Environment
OS: Windows 11
Fundus master branch + my new feature (See #464)
python 3.9
Installed packages:
attrs==23.2.0
black==23.1.0
Brotli==1.1.0
certifi==2024.2.2
chardet==5.2.0
charset-normalizer==3.3.2
click==8.1.7
colorama==0.4.6
cssselect==1.2.0
dill==0.3.8
exceptiongroup==1.2.1
FastWARC==0.14.6
feedparser==6.0.11
-e (fundus)
idna==3.7
iniconfig==2.0.0
isort==5.12.0
langdetect==1.0.9
lxml==4.9.4
more-itertools==9.1.0
mypy==1.9.0
mypy-extensions==1.0.0
packaging==24.0
pathspec==0.12.1
platformdirs==4.2.0
pluggy==1.5.0
pytest==7.2.2
python-dateutil==2.9.0.post0
requests==2.31.0
tqdm==4.66.2
types-beautifulsoup4==4.12.0.20240229
types-colorama==0.4.15.20240311
types-html5lib==1.1.11.20240228
types-lxml==2024.4.14
types-python-dateutil==2.9.0.20240316
types-requests==2.31.0.20240406
typing_extensions==4.11.0
urllib3==2.2.1
validators==0.28.1