Automate processing of IETF (group) drafts #1135
The code now recognizes IETF draft documents that have a `datatracker.ietf.org` URL:

- It associates them with the IETF organization.
- It can compute a useful shortname (in theory, that code can return a truncated shortname because there is no direct way to validate that the Internet Draft name contains a group ID).
- It extracts the group's ID from the nightly URL (that code could be further improved to fetch the actual group name; right now the code only knows about the "HTTP" working group).
- It associates IETF documents from the HTTP WG with the right repository.
- It computes the better-looking nightly URL at `www.ietf.org`, or at `httpwg.org` for HTTP WG documents.

This makes it possible to simplify IETF data in `specs.json` a bit.

Note that the code still cannot automatically process drafts that have been submitted by individuals, even when these drafts are targeted at a group. Such drafts should be associated with the individuals that submitted them and not with any group. A couple of spec entries, which incorrectly referenced the Network WG or the HTTP WG, were fixed accordingly in `specs.json`.

This fixes #1122, but note that the code does not need to fetch the datatracker page for the time being.
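To illustrate the kind of URL-based heuristic described above, here is a minimal sketch in Python (not the project's actual code). The function name `parse_datatracker_url`, the returned keys, and the rule that a group draft is named `draft-ietf-<group>-<title>` are assumptions made for illustration. As the description warns, nothing here can validate that the third name segment really is a group ID, so the shortname may come out truncated.

```python
import re
from urllib.parse import urlparse


def parse_datatracker_url(url):
    """Guess IETF metadata from a datatracker URL, or return None.

    Hypothetical helper: the naming rules below are assumptions.
    """
    parsed = urlparse(url)
    if parsed.hostname != "datatracker.ietf.org":
        return None
    segments = [s for s in parsed.path.split("/") if s]
    if len(segments) < 2 or segments[0] != "doc":
        return None
    # Drop a trailing two-digit revision such as "-03", if present.
    name = re.sub(r"-\d{2}$", "", segments[-1])
    if not name.startswith("draft-"):
        return None
    parts = name.split("-")
    info = {
        "name": name,
        "organization": "IETF",
        "group_id": None,
        "shortname": name,
    }
    # Assumed convention: "draft-ietf-<group>-<title>" marks a group draft.
    # The group segment is taken on faith, it is not validated.
    if len(parts) >= 4 and parts[1] == "ietf":
        info["group_id"] = parts[2]
        info["shortname"] = "-".join(parts[3:])
    return info
```

For an individual submission such as `draft-zern-webp`, this sketch leaves `group_id` unset and keeps the full draft name as the shortname.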
This looks good to me, but while researching this PR, I discovered the datatracker API that I think would make a lot of this more reliable:
https://datatracker.ietf.org/api/
See e.g. https://datatracker.ietf.org/api/v1/doc/document/?name=draft-ietf-httpbis-digest-headers&format=json and https://datatracker.ietf.org/api/v1/group/group/1718/?format=json
Ah, I was looking into datatracker docs and was about to make the same point :)

Perhaps even simpler, there is a simplified JSON file that contains all the information we need and that would be more readily ingestible (if we use the API, we'll have to hop through a few endpoints to collect the info), e.g.: https://datatracker.ietf.org/doc/draft-zern-webp/doc.json

Also, I don't really understand how to get information about a document published as an RFC with the main API document endpoint, e.g. https://datatracker.ietf.org/api/v1/doc/document/?name=rfc6797&format=json does not work. It seems that one has to know the draft name instead, as in https://datatracker.ietf.org/api/v1/doc/document/?name=draft-ietf-websec-strict-transport-sec&format=json and the easiest way to map the RFC to the draft name would be to fetch the related (Edit: The datatracker API way to map the RFC to the draft name seems to be through the /doc/docalias endpoint: https://datatracker.ietf.org/api/v1/doc/docalias/rfc6797/)
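For reference, the three endpoint shapes mentioned in this thread can be built as follows. This is only a sketch: the helper names are made up, and the URL patterns are copied from the examples above rather than from any formal API contract.

```python
from urllib.parse import quote, urlencode

BASE = "https://datatracker.ietf.org"


def simplified_doc_url(name):
    """URL of the simplified JSON file for a document, e.g. a draft."""
    return f"{BASE}/doc/{quote(name)}/doc.json"


def api_document_url(name):
    """URL of the main API document endpoint, queried by document name."""
    query = urlencode({"name": name, "format": "json"})
    return f"{BASE}/api/v1/doc/document/?{query}"


def api_docalias_url(alias):
    """URL of the docalias endpoint, used above to map an RFC to its draft."""
    return f"{BASE}/api/v1/doc/docalias/{quote(alias)}/"
```

The simplified file needs a single request per document, whereas the main API would require hopping through several endpoints (document, group, alias) to collect the same information.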
The code now fetches all the information it needs for IETF drafts and RFCs from the IETF datatracker using the Simplified Documents API: https://datatracker.ietf.org/api/#simplified-documents

This makes it possible to retrieve the latest revision of a document to build the nightly URL, and to fetch information about the group that standardizes the document, if any.

IETF documents may be linked to a group, an area, or be part of what IETF calls individual submissions. Areas and individual submissions still link to a "group" page at IETF, so the code just takes that info from datatracker as-is. As a result, individual submissions are no longer associated with the author who submitted the document, but that does not seem needed in any case.

The code throws when an IETF document that it knows under a certain name has been published under a different name, to alert us that the canonical URL needs to change in browser-specs. Name changes typically happen when a document transitions to a working group, or when it gets published as an RFC.
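The logic described in this commit message could look roughly like the Python sketch below, which handles drafts only. Several things are assumptions and not confirmed by this thread: the function name `describe_ietf_draft`, the `name`, `rev` and `group` fields of the datatracker payload, and the `https://www.ietf.org/archive/id/<name>-<rev>.html` pattern for the nightly URL.

```python
def describe_ietf_draft(known_name, doc):
    """Derive nightly URL and group info from a datatracker payload.

    `doc` stands for the parsed JSON of the simplified document file.
    The field names used here ("name", "rev", "group") are assumptions.
    """
    if doc["name"] != known_name:
        # A rename usually means the draft was adopted by a working group
        # or published as an RFC, so the canonical URL must be updated.
        raise ValueError(
            f"{known_name} is now published as {doc['name']}, "
            "the canonical URL needs to change"
        )
    # Assumed URL pattern for the latest revision of a draft.
    nightly_url = (
        f"https://www.ietf.org/archive/id/{doc['name']}-{doc['rev']}.html"
    )
    # Group info is taken as-is, whether it describes a working group,
    # an area, or the placeholder used for individual submissions.
    group = doc.get("group") or {}
    return {"nightly_url": nightly_url, "group": group.get("name")}
```

Raising an error on a name mismatch, instead of silently following the rename, keeps a human in the loop for changes to the canonical URL.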
Take 3 :)

PR #1135 actually had a couple of issues that made the code essentially useless because it only ran on a handful of IETF specs:

- the code favored info from Specref over info from IETF
- the code only really applied to drafts due to a buggy RegExp

Fixing these problems yielded a new issue: the assumption that HTTP WG specs are always available under `httpwg.org` turns out to be wrong. Also, there are other specs that are not published by the HTTP WG but that still have an `httpwg.org` version. The code now looks at the actual list of specs in the underlying GitHub repository: https://github.com/httpwg/httpwg.github.io.

As a result, the nightly URL of all IETF specs that have an `httpwg.org` version now targets that version, implementing the suggestion in #933 (see that issue for the list of affected specs). A companion PR was sent to Specref to implement a similar switch there: tobie/specref#766

The code also looks at the obsolescence data in datatracker and sets the `standing` and `obsoletedBy` properties accordingly. This fixes #327.
Take 3 :)

PR #1135 actually had a couple of issues that made the code essentially useless because it only ran on a handful of IETF specs:

- the code favored info from Specref over info from IETF
- the code only really applied to drafts due to a buggy RegExp

Fixing these problems yielded a new issue: the assumption that HTTP WG specs are always available under `httpwg.org` turns out to be wrong. Also, there are other specs that are not published by the HTTP WG but that still have an `httpwg.org` version. The code now looks at the actual list of specs in the underlying GitHub repository: https://github.com/httpwg/httpwg.github.io.

As a result, the nightly URL of all IETF specs that have an `httpwg.org` version now targets that version, implementing the suggestion in #937. A companion PR was sent to Specref to implement a similar switch there: tobie/specref#766

The code also looks at the obsolescence data in datatracker and sets the `standing` and `obsoletedBy` properties accordingly. This fixes #327.
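As a rough illustration of the last two points, the Python sketch below prefers an `httpwg.org` version when one is known to exist and flags obsoleted documents. The function name, the `httpwg_urls` mapping (imagined as built beforehand from the listing of the httpwg.github.io repository), the `nightly_url` key, and the `"discontinued"` value for `standing` are all assumptions, not taken from the actual code.

```python
def apply_httpwg_and_obsolescence(spec, httpwg_urls, obsoleted_by):
    """Return a copy of `spec` with nightly URL and standing adjusted.

    spec:         dict with at least "name" and "nightly_url" (hypothetical)
    httpwg_urls:  dict mapping document names to their httpwg.org URL,
                  assumed to come from the httpwg.github.io repository
    obsoleted_by: names of documents that obsolete this one, assumed to
                  come from datatracker
    """
    updated = dict(spec)
    # Only switch when the httpwg.org version is known to exist, instead
    # of assuming that every HTTP WG spec has one.
    httpwg_url = httpwg_urls.get(spec["name"])
    if httpwg_url:
        updated["nightly_url"] = httpwg_url
    if obsoleted_by:
        updated["standing"] = "discontinued"  # assumed value
        updated["obsoletedBy"] = list(obsoleted_by)
    return updated
```

Looking documents up by name, not by publishing group, also covers specs that have an `httpwg.org` version without being HTTP WG deliverables.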