Please add support for "Web View" #14
I've found that rendering the entire webpage into an epub creates nearly unreadable content, which is why we clean and reflow the content in the first place. Because this content comes from many arbitrary sources rather than something predictable like the Article View, or, for example, the feed from a single website, cleaning and rendering each article reliably would be very nontrivial. I would be willing to review and merge a PR, but I am not willing to write this myself, since it is an edge case that would generate a LOT of code.
I definitely wouldn't call it an edge case. The issue is that rendering the Article View into an epub creates nearly unreadable content. I used this recipe to download ~30 articles, and nearly all of them had major errors or were missing key parts of the article I had saved. No Wikipedia article rendered properly. Articles from other websites had strange non-printing characters showing up. All of them render perfectly readably in Web View on my phone and would certainly be much more readable on my Kindle than the Article View is. Maybe I'm underestimating the complexity of adding this option. I can look into the code at some point, but I'm currently swamped with other projects. Still, I'm surprised that it would take more than a few lines to take each link in the RSS feed and download the content of that link. If this change really is going to generate a lot of code, would it be possible to change the script to download all images in an article instead? That would fix about 80% of the readability issues for Wikipedia articles.
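The "take each link in the RSS feed and download it" step can be sketched with the Python standard library alone. This is a minimal sketch, assuming a plain RSS 2.0 feed whose `<item>` elements each carry a `<link>`; the function names are hypothetical and not part of the recipe. Note that it only fetches the raw pages, with no cleanup or reflow.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen


def rss_item_links(rss_xml):
    """Return the <link> URL of every <item> in an RSS 2.0 document (assumed format)."""
    root = ET.fromstring(rss_xml)
    links = []
    for item in root.iter('item'):
        link = item.findtext('link')
        if link:
            links.append(link.strip())
    return links


def fetch_raw_pages(rss_xml):
    """Download each linked page as-is (raw bytes, nothing stripped or reflowed)."""
    return {url: urlopen(url).read() for url in rss_item_links(rss_xml)}
```

Fetching is the easy half; what comes back is the whole page, navigation and all, which is the problem raised in the reply below.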
One complaint in a few years is an edge case to me. I understand this affects you, and that's not good, but I haven't had the problems you've had. Then again, I don't save wiki links to Pocket, just news.
Likewise
Making it pull the source website instead of the Article View is EASY; I have accidentally done it a few times in dev builds. The problem is reliably reflowing nearly random source content and making it readable on the Nook/Kindle. If you just take the raw page and cram it into an epub, it will be nightmarish, because it will grab EVERYTHING on that page, and I mean everything. The power of the calibre recipe is its reflowing of the page content into a proper epub, not just taping the page into the file.
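To make "it will grab EVERYTHING" concrete, here is a minimal sketch of the stripping half of the cleanup, using only Python's `html.parser`. The set of tags treated as site chrome is a heuristic assumption, and the sketch does no reflowing at all: it just drops text inside those elements and keeps the rest. Within a calibre recipe, the declarative `remove_tags` and `keep_only_tags` attributes are the usual place for this kind of rule.

```python
from html.parser import HTMLParser

# Elements that usually hold site chrome rather than article text (heuristic choice).
BOILERPLATE_TAGS = {'nav', 'header', 'footer', 'aside', 'script', 'style'}


class TextExtractor(HTMLParser):
    """Collects text from an HTML page, skipping anything inside BOILERPLATE_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a boilerplate element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in BOILERPLATE_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in BOILERPLATE_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text only when we are outside every boilerplate element.
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return '\n'.join(parser.chunks)
```

Real pages often put navigation in plain `<div>`s with site-specific classes, which is why a general-purpose version of this needs far more rules than a tag list.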
Yes, and I have this on my list of things to do, but it's not high priority, sorry.
For anyone interested in reading content that Pocket's Article View doesn't work for, you can download the Web View as follows. In the recipe, set `articles_are_obfuscated` to `False`, and in the `parse_index` function, change

`'url': u'{0}/a/read/{1}'.format(self.index_url, pocket_article[0]),`
`'real_url': pocket_article[1]['resolved_url'],`

to
I did this, and all of the articles are now readable, which is a big improvement for my needs. The primary downside is that the pages' navigation elements end up in the articles. Perhaps in a few months, when I have time to learn Python's HTML libraries, I'll add functionality to strip nav elements. I'll also make Web View optional, whereas the changes above permanently replace Article View with Web View.
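Making Web View optional rather than permanent could look roughly like the helper below. It is a hypothetical sketch, not code from the recipe: it assumes each `pocket_article` is an `(item_id, item_dict)` pair, which is what `pocket_article[0]` and `pocket_article[1]['resolved_url']` in the recipe lines above suggest, and it assumes a `resolved_title` key and a `use_web_view` option that the recipe may not have. When the Web View URL is used, `articles_are_obfuscated` still needs to be `False`, as described above.

```python
def build_article_entry(index_url, pocket_article, use_web_view=False):
    """Build one article dict for a recipe's parse_index() (hypothetical helper).

    Assumes pocket_article is (item_id, item_dict) with item_dict['resolved_url'];
    'resolved_title' is an assumed optional key.
    """
    item_id, item = pocket_article
    real_url = item['resolved_url']
    if use_web_view:
        # Web View: fetch the original page directly.
        url = real_url
    else:
        # Article View: fetch Pocket's reader page for this item.
        url = '{0}/a/read/{1}'.format(index_url, item_id)
    return {
        'title': item.get('resolved_title') or real_url,
        'url': url,
        'real_url': real_url,
    }
```

Keeping both URLs in the entry means the choice can be flipped by one flag instead of editing `parse_index` each time.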
It might be interesting to add Web View to the branch that pulls by tag, using Web View for articles tagged with 'webview'.
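Per-article selection by tag could be sketched as follows. This is a minimal sketch under stated assumptions: the item's tags are assumed to arrive as a dict keyed by tag name (absent when the item is untagged), the `webview` tag name comes from the suggestion above, and the function names are hypothetical. How this interacts with the recipe's single `articles_are_obfuscated` setting is not addressed here.

```python
WEB_VIEW_TAG = 'webview'  # tag name from the suggestion above


def wants_web_view(item, tag=WEB_VIEW_TAG):
    """True if the item carries the tag (assumes item['tags'] is keyed by tag name)."""
    return tag in (item.get('tags') or {})


def choose_url(index_url, item_id, item):
    """Web View for tagged items, Pocket's Article View for everything else."""
    if wants_web_view(item):
        return item['resolved_url']
    return '{0}/a/read/{1}'.format(index_url, item_id)
```

Tagging only the problem articles (e.g. Wikipedia pages) keeps the cleaner Article View for news, which matches how the maintainer uses the recipe.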
Pocket's "Article View" generally mangles most of what I want to read into an unreadable format, deleting images and sometimes grabbing only a fraction of the article I added. The Pocket app on Android offers a "Web View" that isn't subject to these bugs.
Since it seems to be non-trivial to scrape Pocket's Article View, and in particular since that scraping frequently removes essential images from the article, it would be very useful to have a way to set calibre to download the article the way it was meant to be viewed in the browser.