Add Support for "Washington Post" #467
This looks really good. Thanks for adding this 👍
Everything you have implemented so far looks good. What remains open is a function for the topics. If you add that, you also need to run
Unfortunately, I am unsure how to extract the values of the "keywords" tag with the methods Fundus provides, or without making the topics method huge. I have tried several options, so far without success. An alternative would be to extract only the "article:section" value from the meta section, but that would be very broad and return just one topic per article, which is not ideal. Additionally, adding the extra RSS feeds you provided seems to have caused the main page of the Washington Post ( https://www.washingtonpost.com/ ) to be treated as an article as well. When this occurs, no article text or publishing date is returned, and Fundus reports "--missing plaintext--". In the meantime, I have fixed the tests. They should run fine now.
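For illustration, the two options discussed above (the "keywords" tag, with "article:section" as a fallback) could be sketched roughly as follows. This is a standalone sketch that uses only the Python standard library and deliberately avoids Fundus internals; the names `MetaCollector` and `parse_topics` are hypothetical, and it assumes the keywords are comma-separated, which would need to be checked against real Washington Post pages.

```python
from html.parser import HTMLParser
from typing import Dict, List


class MetaCollector(HTMLParser):
    """Collects <meta> name/property -> content pairs (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Meta tags identify themselves via either "name" or "property".
        key = attributes.get("name") or attributes.get("property")
        content = attributes.get("content")
        if key and content is not None:
            self.meta[key] = content


def parse_topics(html: str) -> List[str]:
    """Return topics from "keywords", falling back to "article:section"."""
    collector = MetaCollector()
    collector.feed(html)
    raw = collector.meta.get("keywords") or collector.meta.get("article:section") or ""
    topics: List[str] = []
    # Assumption: keywords are comma-separated; drop blanks and duplicates.
    for part in raw.split(","):
        topic = part.strip()
        if topic and topic not in topics:
            topics.append(topic)
    return topics
```

Keeping the splitting and de-duplication in a small helper like this would keep the parser's topics method itself to a single call.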
You are right: for some reason the RSS feeds sometimes don't contain the actual link to the article and just lead to the homepage. I don't know why. This is what the url_filter attribute is for, and since it was only a small change, I added it to the PR.
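As a rough idea of what such a filter has to decide, the sketch below flags URLs that point at a site's root rather than at an article. It is not the filter that was added to the PR: the function name `is_homepage` is hypothetical, and the convention that returning `True` means "skip this URL" is an assumption about how a URL filter would be consumed.

```python
from urllib.parse import urlparse


def is_homepage(url: str) -> bool:
    """Return True if the URL has no path beyond '/', i.e. it is a site root.

    Assumed convention: True means the URL should be skipped.
    """
    path = urlparse(url).path
    return path.strip("/") == ""


# The homepage link from the RSS feed would be skipped.
print(is_homepage("https://www.washingtonpost.com/"))
```

A real filter would more likely match the publisher's article URL pattern, but the root check already covers the homepage case described above.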
I have added support for the US publisher "Washington Post" (https://www.washingtonpost.com/).
I have run the tests as instructed and no errors were produced.