Some websites don't have feeds #222
Some thoughts about how to implement this in the parser: If there are multiple things to be retrieved, we can't return them as a single file object; also, we may fabricate "composite" caching headers. I see two options:
The first one would look something like this:

    # RetrieveResult is renamed to FileResult, and in its place there's a union.
    # RetrieverType continues to return ContextManager[Optional[RetrieveResult]]
    RetrieveResult = Union[FileResult, ParsedFeed]

    # class Parser:
    def __call__(self, url, http_etag, http_last_modified):
        parser = self.get_parser_by_url(url)
        ...
        # Must be able to match schemes like magic+http://.
        # Note that prefix match is not enough,
        # magic+file.txt == file:///magic+file.txt;
        # normalizing the URL beforehand could work.
        retriever = self.get_retriever(url)
        with retriever(url, http_etag, http_last_modified, ...) as result:
            if not result:
                return None
            # Parsing already done, return the result (this is new).
            if isinstance(result, ParsedFeed):
                return result
            # Continue with the old logic.
            if not parser:
                ...
            feed, entries = parser(url, result.file, result.headers)
            return ParsedFeed(feed, entries, result.http_etag, result.http_last_modified)
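
The scheme-matching caveat in the comments above (a plain prefix match would treat the relative path magic+file.txt as a magic URL) can be illustrated with a small standalone sketch. It is not part of the parser; `MAGIC_PREFIX` and `unwrap_magic_url` are hypothetical names used only for illustration, and the sketch assumes that looking at the parsed scheme (via the standard library's `urllib.parse.urlsplit`) is an acceptable stand-in for "normalizing the URL beforehand".

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical prefix, modeled on the magic+http:// example in the comment.
MAGIC_PREFIX = "magic+"


def unwrap_magic_url(url: str) -> Optional[str]:
    """Return the inner URL if url has a magic+<scheme> scheme, else None.

    The decision is based on the parsed scheme, not on a string prefix,
    so a relative path such as magic+file.txt is not mistaken for a
    magic URL.
    """
    url = url.strip()
    scheme = urlsplit(url).scheme  # lowercased; empty if there is no scheme
    if not scheme.startswith(MAGIC_PREFIX):
        return None
    if scheme == MAGIC_PREFIX:
        # "magic+://..." has no inner scheme to hand to another retriever.
        return None
    # Drop the prefix and keep the rest of the URL untouched.
    return url[len(MAGIC_PREFIX):]


if __name__ == "__main__":
    for candidate in ("magic+http://example.com/page", "magic+file.txt"):
        print(candidate, "->", unwrap_magic_url(candidate))
```

A get_retriever-style lookup could call something like this first and fall back to the regular retrievers when it returns None; whether the real lookup should work this way is an open design choice.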
Examples:
It should be relatively easy to have a retriever/parser pair that handles URLs like (newlines added for clarity):
to mean:
entries anchor CSS selector
content CSS selector
Instead of magic-content, we could also use some library that guesses what the content is (there must be some out there).

In its best form, this should also cover the functionality of the sqlite_releases plugin. Of note is that magic-content wouldn't work here, since there's no container for the whole content; also, some of the old versions don't actually have a link.

This will also be a good test of the internal retriever/parser API we implemented in #205.
Open questions: