JSON Feed support #205
OK, to implement this in a modular way, we'll split the current "subparsers" (HTTPParser/FileParser) into a Retriever and a (Sub)Parser. The Retriever:
The (Sub)Parser:
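The Retriever/(Sub)Parser split above could be sketched as two protocols. The names and signatures below are assumptions inferred from the pseudo-code that follows, not a final API:

```python
# Hypothetical sketch of the Retriever / (Sub)Parser split; the method names
# and return shapes are assumptions based on the pseudo-code, not a final API.
from __future__ import annotations

from typing import Any, BinaryIO, Optional, Protocol, Tuple


class Retriever(Protocol):
    """Knows how to fetch a URL (e.g. over HTTP, or from the filesystem)."""

    def matches(self, url: str) -> bool:
        """Whether this retriever can handle the URL (e.g. by scheme)."""
        ...

    def get(
        self, url: str, caching_headers: dict, http_accept: str
    ) -> Tuple[BinaryIO, Optional[str], dict, dict]:
        """Return (file, mime_type, caching_headers, headers)."""
        ...


class SubParser(Protocol):
    """Knows how to parse one feed format (e.g. feedparser, JSON Feed)."""

    # MIME types this parser handles, with q-values
    accept_header: str

    def __call__(self, url: str, file: BinaryIO, headers: dict) -> Any:
        """Parse the feed and return a parsed feed object."""
        ...
```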
Here's pseudo-code of how they all fit together in the (Meta)Parser (the current Parser class):

```python
import mimetypes

# input
url: str = ...
# currently http_etag and http_last_modified
caching_headers: dict = ...

# actually stored on a Parser instance
RETRIEVERS = [HTTPRetriever(), FileRetriever()]
PARSERS = [JSONFeedParser(), FeedparserParser()]

# actually a Parser method
retriever = get_retriever(url)

http_accept = merge_accept_headers(p.accept_headers for p in PARSERS)

file, mime_type, caching_headers, headers = retriever.get(
    url, caching_headers, http_accept
)

if not mime_type:
    mime_type, _ = mimetypes.guess_type(url)

# actually a Parser method
parser = get_parser(mime_type)

parsed_feed = parser(url, file, headers)

rv = parsed_feed, caching_headers
```

Here's how (sub)parser selection works:

```python
from werkzeug.datastructures import MIMEAccept
from werkzeug.http import parse_accept_header, parse_options_header

# the accept headers come from parser.accept_header,
# except for the wildcard, which is added manually;
# in practice, feedparser and feedparser (catch-all) are the same object
PARSERS = [
    (parse_accept_header(a, MIMEAccept), parser)
    for a, parser in [
        # everything in feedparser.http.ACCEPT, except the wildcard (*/*);
        # only a few included for brevity
        ("application/atom+xml,application/xml;q=0.9", "feedparser"),
        ("application/feed+json,application/json;q=0.9", "jsonfeed"),
        # for backwards compatibility
        ("*/*;q=0.1", "feedparser (catch-all)"),
    ]
]


def get_parser(mime_type):
    for accept, parser in PARSERS:
        if accept.best_match([mime_type]):
            return parser


def merge_accept_headers():
    values = []
    for accept, _ in PARSERS:
        values.extend(accept)
    return MIMEAccept(values).to_header()


print(merge_accept_headers())

content_types = [
    "application/xml; charset=ISO-8859-1",
    "application/xml",
    "application/whatever+xml",
    "application/json",
    "unknown/type",
]
for content_type in content_types:
    mime_type, _ = parse_options_header(content_type)
    print(content_type, '->', get_parser(mime_type))

"""
application/atom+xml,application/feed+json,application/xml;q=0.9,application/json;q=0.9,*/*;q=0.1
application/xml; charset=ISO-8859-1 -> feedparser
application/xml -> feedparser
application/whatever+xml -> feedparser (catch-all)
application/json -> jsonfeed
unknown/type -> feedparser (catch-all)
"""
```
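For reference, a minimal JSONFeedParser body could map the spec's fields roughly like this. This is only a sketch under simplified assumptions: the dict-based feed/entry shapes and the field mapping here are illustrative, not reader's actual model:

```python
import json


def parse_json_feed(url, file):
    """Minimal JSON Feed parsing sketch (simplified; not the real parser)."""
    data = json.load(file)

    feed = {
        "url": url,
        "title": data.get("title"),
        "link": data.get("home_page_url"),
    }
    entries = [
        {
            # the spec requires id; url is a common fallback in practice
            "id": item.get("id") or item.get("url"),
            "link": item.get("url"),
            "title": item.get("title"),
            "updated": item.get("date_modified") or item.get("date_published"),
            # content_html and content_text are both optional in the spec
            "content": item.get("content_html") or item.get("content_text"),
        }
        for item in data.get("items", [])
    ]
    return feed, entries
```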
To do:
OK, I added / updated all the feeds below:
Most things look fine: authors, dates, attachments, HTML, titles. The only issue is that feed.updated isn't set (the spec doesn't specify one); we should fall back to the newest entry for it. Update: This is not specific to JSON feeds; cut #214 for it. |
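The newest-entry fallback could look something like this (a hypothetical helper, assuming entries carry datetime-or-None updated/published attributes):

```python
def feed_updated_fallback(entries):
    """Use the newest entry date as feed.updated when the feed itself has none.

    Hypothetical helper; assumes each entry has .updated / .published
    attributes that are either datetimes or None.
    """
    dates = [e.updated or e.published for e in entries]
    dates = [d for d in dates if d is not None]
    return max(dates) if dates else None
```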
Time spent:
https://en.m.wikipedia.org/wiki/JSON_Feed
https://jsonfeed.org/
Asked about in https://www.reddit.com/r/selfhosted/comments/kioq3g/comment/ggs3kuk?context=3
Question: Is this worth supporting, or a case of featuritis?
The Wikipedia page mentions NPR as a publisher that supports it, and the latest version of the spec mentions about 10 other websites.
Update: Here are some more users: https://indieweb.org/JSON_Feed
We could make it a plug-in.
Regardless of the support required, this is an interesting use case, since to implement it as a separate parser we'd need a way of delegating by extension and/or MIME type.
At the moment, we can only delegate to a parser by feed URL prefix (and making people add "json+http://..." to their feeds is not exactly user friendly).
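For contrast, prefix-based delegation amounts to roughly the following (names are assumed, for illustration only); it works, but it forces the format choice into the URL the user types:

```python
def get_parser_by_url(url, parsers_by_prefix):
    """Pick a parser by URL prefix (sketch of prefix delegation; names assumed).

    Returns the parser for the first matching prefix, or None.
    """
    for prefix, parser in parsers_by_prefix.items():
        if url.startswith(prefix):
            return parser
    return None
```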