Unable to read individual articles for Atom and RSS 1.0 feeds #10
Comments
It doesn't work for this RSS v1.0 feed either: http://feeds.bbci.co.uk/news/rss.xml
Hmm, this is an interesting case. These feeds are just links to the articles and don't contain the content. We could attempt to fetch the content from the URL, but the markdown conversion from a full HTML page is likely to be funky. One option here would be to just open the links that have no content in the browser using Would that suit your use case? I can try to look at parsing, but this opens up a large can of worms; even the Times site in your example requires a Times account to actually get the content.
Hi @guyfedwards
Yes, it might be a bit of a challenge. You'll want to find a library to strip
Not quite. I'm interested in reading the article in the terminal via the RSS reader without leaving the app.
I'm able to get the article content via
Well, fetching only the article from a web page is a little tricky. I'm not sure there is any good way of scraping only the article contents from HTML while stripping out headers, navigation, sidebars, footers, etc., as the structure of a web page containing an article isn't standardized. In fact, I can build a whole page with any of the blocks out of styled Returning back to the solution on how to fetch article contents from a webpage... One way I can think of is finding the element with the highest word density (after all the tags are stripped), maybe? All in all, this is quite a hefty task. If something like this is implemented, that'd be great. I personally have some feeds like that in my newsboat config, and those are displayed with only a link to the article.
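For what it's worth, the word-count idea can be tried out with only the Go standard library. The following is a rough sketch, not something nom does today: it approximates "highest word density" by picking the container element whose own text has the most words, and it leans on `encoding/xml` in non-strict mode as a stand-in HTML tokenizer, which will struggle on badly malformed pages. The function name `ExtractMainText`, the `sample` page, and the tag lists are all assumptions made up for illustration.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// skipped lists elements treated as page chrome (an assumed, incomplete list).
var skipped = map[string]bool{"script": true, "style": true, "nav": true,
	"header": true, "footer": true, "aside": true}

// containers lists elements considered as candidate article blocks (assumed).
var containers = map[string]bool{"article": true, "main": true,
	"section": true, "div": true, "td": true}

// frame is one open element; buf is non-nil only for candidate containers.
type frame struct {
	name string
	buf  *strings.Builder
}

// ExtractMainText returns the text of the container with the most words.
// Text is credited to the nearest enclosing container only.
func ExtractMainText(page string) string {
	dec := xml.NewDecoder(strings.NewReader(page))
	dec.Strict = false                // tolerate missing end tags
	dec.AutoClose = xml.HTMLAutoClose // treat <br>, <img>, ... as self-closing
	dec.Entity = xml.HTMLEntity       // resolve &nbsp; and friends

	var stack []frame
	skipDepth := 0
	best, bestWords := "", 0

	for {
		tok, err := dec.Token()
		if err != nil {
			break // io.EOF at end of input; on a parse error keep the best so far
		}
		switch t := tok.(type) {
		case xml.StartElement:
			name := strings.ToLower(t.Name.Local)
			f := frame{name: name}
			if containers[name] {
				f.buf = &strings.Builder{}
			}
			if skipped[name] {
				skipDepth++
			}
			stack = append(stack, f)
		case xml.EndElement:
			if len(stack) == 0 {
				continue
			}
			f := stack[len(stack)-1]
			stack = stack[:len(stack)-1]
			if skipped[f.name] {
				skipDepth--
			}
			if f.buf != nil && skipDepth == 0 {
				words := strings.Fields(f.buf.String())
				if len(words) > bestWords {
					best, bestWords = strings.Join(words, " "), len(words)
				}
			}
		case xml.CharData:
			if skipDepth > 0 {
				continue // inside nav, footer, script, ...
			}
			for i := len(stack) - 1; i >= 0; i-- {
				if stack[i].buf != nil {
					stack[i].buf.Write(t)
					stack[i].buf.WriteByte(' ')
					break
				}
			}
		}
	}
	return best
}

// sample is a made-up page used only to exercise the heuristic.
const sample = `<html><body>
<nav><a href="/">Home</a> <a href="/world">World</a></nav>
<div id="sidebar"><p>Related links</p></div>
<article><h1>Title</h1><p>This is the first paragraph of the story.</p><p>And here is a second one with more words.</p></article>
<footer>Copyright notice</footer>
</body></html>`

func main() {
	fmt.Println(ExtractMainText(sample))
}
```

On real news pages the winning block may well be a comment section or a cookie banner, so this is more a way to see how far the heuristic gets than a finished extractor.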
I think that, short term, opening the link is a sufficient solution. Longer term, we can look at adding HTML parsing capability, but that will be a bit more of a challenge.
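As a rough illustration of the short-term option, here is a sketch of the fallback decision in Go. Everything in it is an assumption rather than nom's actual code: the `Item` type, `needsBrowser`, and `browserCommand` are hypothetical names, and the commands chosen (`xdg-open`, `open`, `rundll32 url.dll,FileProtocolHandler`) are just the usual per-OS ways to hand a URL to the default browser.

```go
package main

import (
	"fmt"
	"strings"
)

// Item is a hypothetical feed entry; nom's real type may differ.
type Item struct {
	Title   string
	Link    string
	Content string
}

// needsBrowser reports whether an item has a link but no readable content.
func needsBrowser(it Item) bool {
	return strings.TrimSpace(it.Content) == "" && it.Link != ""
}

// browserCommand returns the usual "open this URL" command for an OS name
// in the form reported by runtime.GOOS. Unknown systems fall back to xdg-open.
func browserCommand(goos, url string) (string, []string) {
	switch goos {
	case "darwin":
		return "open", []string{url}
	case "windows":
		return "rundll32", []string{"url.dll,FileProtocolHandler", url}
	default:
		return "xdg-open", []string{url}
	}
}

func main() {
	it := Item{Title: "Example", Link: "https://example.com/article"}
	if needsBrowser(it) {
		name, args := browserCommand("linux", it.Link)
		// A real caller would pass runtime.GOOS and run
		// exec.Command(name, args...).Start() instead of printing.
		fmt.Println(name, strings.Join(args, " "))
	}
}
```

Keeping the command selection separate from the code that launches it makes the fallback easy to test without actually opening a browser.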
The circumflex program allows you to read Hacker News articles in the terminal. They accomplish that by using the Go-Readability package to Maybe a similar approach would work here?
nom successfully lists the feed items, but attempting to read an individual article only shows the title and date. Tested with https://rss.nytimes.com/services/xml/rss/nyt/World.xml