Automate processing of IETF drafts - Take 2
The code now fetches all the information it needs for IETF drafts and RFCs from
the IETF datatracker using the Simplified Documents API:
https://datatracker.ietf.org/api/#simplified-documents

This makes it possible to retrieve the latest revision of a document to build
the nightly URL, and to fetch information about the group that standardizes the
document, if any.
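
As a rough illustration of the nightly URL logic described above, here is a minimal sketch. It assumes the datatracker response exposes `group.acronym` and a `rev_history` array whose entries carry `name` and `rev`, which mirrors the fields this commit reads. The `computeNightlyUrl` helper and the sample document are hypothetical and only serve the example.

```javascript
// Sketch: derive the nightly URL from a datatracker "doc.json" response.
// Field names follow what the commit reads; the sample data is made up.
function computeNightlyUrl(doc) {
  // The last entry in the revision history is taken as the latest revision
  const lastRevision = doc.rev_history[doc.rev_history.length - 1];

  // HTTP WG drafts are better read on httpwg.org (no revision in the URL)
  if (doc.group?.acronym === "httpbis") {
    return `https://httpwg.org/http-extensions/${lastRevision.name}.html`;
  }

  // Other drafts use the IETF archive, which needs the revision number
  return `https://www.ietf.org/archive/id/${lastRevision.name}-${lastRevision.rev}.html`;
}

// Hypothetical sample document (not real datatracker output)
const sampleDoc = {
  name: "draft-example-sample",
  group: { acronym: "none", type: "Individual", name: "Individual Submissions" },
  rev_history: [
    { name: "draft-example-sample", rev: "00" },
    { name: "draft-example-sample", rev: "01" }
  ]
};

console.log(computeNightlyUrl(sampleDoc));
// → https://www.ietf.org/archive/id/draft-example-sample-01.html
```

Reading the last entry by index, rather than calling `pop()`, leaves the response object untouched in case other code still needs the full history.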

IETF documents may be linked to a group, an area, or be part of what IETF calls
individual submissions. Areas and individual submissions still link to a "group"
page at IETF, so the code just takes that info from datatracker as-is. As a
result, individual submissions are no longer associated with the author who
submitted the document, but that does not seem needed in any case.
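
The group mapping described above could be sketched as follows. This is a simplified illustration, not the code in `src/fetch-groups.js`: the `groupsFromDatatracker` helper and the sample documents are hypothetical, and the `group.type`, `group.name` and `group.acronym` fields are assumed to look the way this commit reads them.

```javascript
// Sketch: map the "group" object of a datatracker document to the "groups"
// array used in browser-specs. Sample inputs below are made up.
function groupsFromDatatracker(doc) {
  const group = doc.group;
  if (!group || !["WG", "Individual", "Area"].includes(group.type)) {
    throw new Error(`Unknown group type "${group?.type}" for ${doc.name}`);
  }

  // Working groups get an explicit suffix. Areas and the "none" group used
  // for individual submissions are reused as-is.
  const name = (group.type === "WG") ?
    `${group.name} Working Group` :
    group.name;

  return [{ name, url: `https://datatracker.ietf.org/wg/${group.acronym}/` }];
}

// Hypothetical sample documents (not real datatracker output)
const individualDoc = {
  name: "draft-example-sample",
  group: { type: "Individual", name: "Individual Submissions", acronym: "none" }
};
const wgDoc = {
  name: "draft-ietf-httpbis-sample",
  group: { type: "WG", name: "HTTP", acronym: "httpbis" }
};

console.log(groupsFromDatatracker(individualDoc));
console.log(groupsFromDatatracker(wgDoc));
```

Any other group type throws, so an unexpected value surfaces right away instead of producing a misleading group entry.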

The code throws when an IETF document that it knows under a certain name has
been published under a different name, to alert us that the canonical URL needs
to change in browser-specs. Name changes typically happen when a document
transitions to a working group, or when it gets published as an RFC.
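
The name check described above can be pictured with the sketch below. It assumes, as the commit does, that the last entry of `rev_history` carries the most recent name of the document. The `checkCanonicalName` helper, the draft names and the sample object are hypothetical.

```javascript
// Sketch: detect that a document known under one name was later published
// under another one. Sample data below is made up.
function checkCanonicalName(specUrl, doc) {
  const lastRevision = doc.rev_history[doc.rev_history.length - 1];
  if (!lastRevision) {
    throw new Error(`No revision history found for ${specUrl}`);
  }
  if (lastRevision.name !== doc.name) {
    throw new Error(`IETF spec ${specUrl} published under a new name "${lastRevision.name}". Canonical URL must be updated accordingly.`);
  }
  return lastRevision;
}

// Hypothetical individual draft that was adopted by a working group
const renamedDoc = {
  name: "draft-example-sample",
  rev_history: [
    { name: "draft-example-sample", rev: "00" },
    { name: "draft-ietf-examplewg-sample", rev: "00" }
  ]
};

try {
  checkCanonicalName(
    "https://datatracker.ietf.org/doc/html/draft-example-sample",
    renamedDoc);
}
catch (err) {
  // Reports the new name so that the entry in specs.json can be updated
  console.log(err.message);
}
```

Throwing instead of silently following the new name keeps the change visible, since the canonical URL in the list has to be edited by hand.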
tidoust committed Nov 22, 2023
1 parent 4ee73b6 commit f37b62b
Showing 8 changed files with 154 additions and 146 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -610,6 +610,7 @@ The `excludePaths` property is seldom set.
The provenance for the `title` and `nightly` property values. Can be one of:
- `w3c`: information retrieved from the [W3C API](https://w3c.github.io/w3c-api/)
- `specref`: information retrieved from [Specref](https://www.specref.org/)
- `ietf`: information retrieved from the [IETF datatracker](https://datatracker.ietf.org)
- `spec`: information retrieved from the spec itself

The `source` property is always set.
2 changes: 1 addition & 1 deletion schema/definitions.json
@@ -60,7 +60,7 @@

"source": {
"type": "string",
"enum": ["w3c", "specref", "spec"]
"enum": ["w3c", "specref", "spec", "ietf"]
},

"release": {
52 changes: 4 additions & 48 deletions specs.json
@@ -18,41 +18,21 @@
"https://console.spec.whatwg.org/",
{
"url": "https://datatracker.ietf.org/doc/html/draft-cutler-httpbis-partitioned-cookies",
"groups": [
{
"name": "Dylan Cutler (individual)",
"url": "https://datatracker.ietf.org/person/dylancutler@google.com"
}
],
"nightly": {
"url": "https://dcthetall.github.io/CHIPS-spec/draft-cutler-httpbis-partitioned-cookies.html",
"repository": "https://github.com/DCtheTall/CHIPS-spec"
}
},
{
"url": "https://datatracker.ietf.org/doc/html/draft-davidben-http-client-hint-reliability",
"groups": [
{
"name": "David Benjamin (individual)",
"url": "https://datatracker.ietf.org/person/davidben@google.com"
}
],
"nightly": {
"repository": "https://github.com/davidben/http-client-hint-reliability",
"sourcePath": "draft-davidben-http-client-hint-reliability.md"
}
},
"https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-digest-headers",
"https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-rfc6265bis",
{
"url": "https://datatracker.ietf.org/doc/html/draft-zern-webp/",
"groups": [
{
"name": "James Zern (individual)",
"url": "https://datatracker.ietf.org/person/jzern@google.com"
}
]
},
"https://datatracker.ietf.org/doc/html/draft-zern-webp/",
"https://dom.spec.whatwg.org/",
"https://drafts.css-houdini.org/css-typed-om-2/ delta",
{
@@ -586,15 +566,7 @@
"https://wicg.github.io/webpackage/loading.html",
"https://wicg.github.io/webusb/",
"https://wicg.github.io/window-controls-overlay/",
{
"url": "https://www.rfc-editor.org/rfc/rfc2397",
"groups": [
{
"name": "Network Working Group",
"url": "https://datatracker.ietf.org/group/app/"
}
]
},
"https://www.rfc-editor.org/rfc/rfc2397",
{
"url": "https://www.rfc-editor.org/rfc/rfc4120",
"shortTitle": "Kerberos"
@@ -604,15 +576,7 @@
"url": "https://www.rfc-editor.org/rfc/rfc6266",
"shortTitle": "Content-Disposition in HTTP"
},
{
"url": "https://www.rfc-editor.org/rfc/rfc6386",
"groups": [
{
"name": "Independent Submission",
"url": "https://datatracker.ietf.org/stream/ise/"
}
]
},
"https://www.rfc-editor.org/rfc/rfc6386",
"https://www.rfc-editor.org/rfc/rfc6454",
"https://www.rfc-editor.org/rfc/rfc6797",
"https://www.rfc-editor.org/rfc/rfc7034",
@@ -632,15 +596,7 @@
"https://www.rfc-editor.org/rfc/rfc7725",
"https://www.rfc-editor.org/rfc/rfc7838",
"https://www.rfc-editor.org/rfc/rfc8246",
{
"url": "https://www.rfc-editor.org/rfc/rfc8288",
"groups": [
{
"name": "Mark Nottingham (individual)",
"url": "https://datatracker.ietf.org/person/mnot@mnot.net"
}
]
},
"https://www.rfc-editor.org/rfc/rfc8288",
"https://www.rfc-editor.org/rfc/rfc8297",
"https://www.rfc-editor.org/rfc/rfc8470",
"https://www.rfc-editor.org/rfc/rfc8942",
7 changes: 4 additions & 3 deletions src/compute-shortname.js
@@ -109,9 +109,10 @@ function computeShortname(url) {
}

// Handle IETF individual drafts, stripping group name
// (NB: there is no sure way to tell that the first token is a group name,
// it may be the beginning of the shortname. Code below could return a name
// that is truncated as a result)
// TODO: retrieve the list of IETF groups to make sure that the group name
// is an actual group name and not the beginning of the shortname:
// https://datatracker.ietf.org/api/v1/group/group/
// (multiple requests needed due to pagination, "?limit=1000" is the max)
const ietfIndDraft = url.match(/\/datatracker\.ietf\.org\/doc\/html\/draft-[^\-]+-([^\/]+)/);
if (ietfIndDraft) {
if (ietfIndDraft[1].indexOf('-') !== -1) {
78 changes: 26 additions & 52 deletions src/fetch-groups.js
@@ -44,63 +44,37 @@ module.exports = async function (specs, options) {
for (const spec of specs) {
const info = parseSpecUrl(spec.url);
if (!info) {
// There is no direct way to find the name of the group behind an IETF
// document. The name of the draft document must follow the rules in:
// https://authors.ietf.org/naming-your-internet-draft
// If the document has already been published as an RFC, we can retrieve
// the name of the Internet Draft from the "draft" property in:
// https://www.rfc-editor.org/in-notes/rfcXXX.json
if (spec.url.match(/rfc-editor\.org/) ||
spec.url.match(/datatracker\.ietf\.org/)) {
// For IETF documents, retrieve the group info from datatracker
const ietfName =
spec.url.match(/rfc-editor\.org\/rfc\/([^\/]+)/) ??
spec.url.match(/datatracker\.ietf\.org\/doc\/html\/([^\/]+)/);
if (ietfName) {
spec.organization = spec.organization ?? "IETF";
let wgName = null;
let wgId = null;
if (spec.groups) continue;
if (spec.url.match(/rfc-editor\.org/)) {
const rfcNumber = spec.url.slice(spec.url.lastIndexOf('/') + 1);
const rfcJSON = await fetchJSON(`https://www.rfc-editor.org/in-notes/${rfcNumber}.json`);
if (!rfcJSON.draft) {
throw new Error(`Cannot derive IETF group for ${spec.url}.
No draft URL found. Is it an individual submission?`);
}
wgId = rfcJSON.draft.split('-')[2];
if (!wgId) {
throw new Error (`Cannot derive IETF group for ${spec.url}.
Draft URL ${rfcJSON.draft} does not seem to contain a group ID.`);
}
wgName = rfcJSON.source;
if (!wgName) {
throw new Error (`The RFC info for ${spec.url} does not contain a group name.`);
}
const ietfJson = await fetchJSON(`https://datatracker.ietf.org/doc/${ietfName[1]}/doc.json`);
if (ietfJson.group?.type === "WG") {
spec.groups = [{
name: `${ietfJson.group.name} Working Group`,
url: `https://datatracker.ietf.org/wg/${ietfJson.group.acronym}/`
}];
continue;
}
else if ((ietfJson.group?.type === "Individual") ||
(ietfJson.group?.type === "Area")) {
// Document uses the "Individual Submissions" stream, linked to the
// "none" group in IETF: https://datatracker.ietf.org/group/none/
// or to an IETF area, which isn't truly a group but still looks like
// one. That's fine, let's reuse that info.
spec.groups = [{
name: ietfJson.group.name,
url: `https://datatracker.ietf.org/wg/${ietfJson.group.acronym}/`
}];
continue;
}
else {
const draftName = spec.url.match(/\/(draft-ietf-[^\/]+)/);
if (!draftName) {
throw new Error(`Cannot derive IETF group for ${spec.url}. Individual submission?`);
}
wgId = draftName[1].split('-')[2];
wgName = wgId;
if (wgId === 'http') {
// Someone forgot to update their reference...
wgId = 'httpbis';
}
if (wgId === 'httpbis') {
wgName = 'HTTP';
}
else {
// TODO: fetch actual group name from https://datatracker.ietf.org/wg/${wgId}/
throw new Error(
`Found unknown IETF group ID "${wgId}" for ${spec.url}.
Group name should appear in https://datatracker.ietf.org/wg/${wgId}/`
);
}
throw new Error(`Could not derive IETF group for ${spec.url}.
Unknown group type found in https://datatracker.ietf.org/doc/${ietfName[1]}/doc.json`);
}

spec.groups = [{
name: `${wgName} Working Group`,
url: `https://datatracker.ietf.org/wg/${wgId}/`
}];
continue;
}
if (!spec.groups) {
throw new Error(`Cannot extract any useful info from ${spec.url}`);
85 changes: 65 additions & 20 deletions src/fetch-info.js
@@ -240,11 +240,69 @@ async function fetchInfoFromSpecref(specs, options) {
}


async function fetchInfoFromIETF(specs, options) {
const info = await Promise.all(specs.map(async spec => {
// IETF can only provide information about IETF specs
if (!spec.url.match(/\.ietf\.org/)) {
return;
}

// Retrieve information about the spec
const draftName =
spec.url.match(/rfc-editor\.org\/rfc\/([^\/]+)/) ??
spec.url.match(/datatracker\.ietf\.org\/doc\/html\/([^\/]+)/);
if (!draftName) {
throw new Error(`IETF document follows an unexpected URL pattern: ${spec.url}`);
}
const url = `https://datatracker.ietf.org/doc/${draftName[1]}/doc.json`;
const res = await throttledFetch(url, options);
if (res.status !== 200) {
throw new Error(`IETF datatracker returned an error, status code is ${res.status}`);
}
let body;
try {
body = await res.json();
}
catch (err) {
throw new Error(`IETF datatracker returned invalid JSON for ${url}`);
}

const lastRevision = body.rev_history.pop();
if (lastRevision.name !== body.name) {
throw new Error(`IETF spec ${spec.url} published under a new name "${lastRevision.name}". Canonical URL must be updated accordingly.`);
}

// Prefer the httpwg.org version for HTTP WG drafts
const nightly = (body.group?.acronym === 'httpbis') ?
`https://httpwg.org/http-extensions/${lastRevision.name}.html` :
`https://www.ietf.org/archive/id/${lastRevision.name}-${lastRevision.rev}.html`;

return {
title: body.title,
nightly: nightly,
state: body.state
};
}));

// TODO: use "state" to return a better status than "Editor's Draft".
const results = {};
specs.forEach((spec, idx) => {
if (info[idx]) {
results[spec.shortname] = {
nightly: { url: info[idx].nightly, status: "Editor's Draft" },
title: info[idx].title
};
}
});
return results;
}


async function fetchInfoFromSpecs(specs, options) {
const browser = await puppeteer.launch();

async function fetchInfoFromSpec(spec) {
let url = spec.nightly?.url || spec.url;
const url = spec.nightly?.url || spec.url;
const page = await browser.newPage();

// Inner function that returns a network interception method for Puppeteer,
@@ -338,24 +396,6 @@ async function fetchInfoFromSpecs(specs, options) {
}
}

// For IETF drafts, look at the front matter to extract the name of the
// Internet Draft and compute the nightly URL from that name
if (!spec.nightly?.url && spec.url.match(/datatracker\.ietf\.org/)) {
const draftName = await page.evaluate(_ => {
const el = document.querySelector('.internet-draft');
if (el) {
return el.innerText.trim();
}
});
if (draftName) {
url = `https://www.ietf.org/archive/id/${draftName}.html`;
if (draftName.match(/^draft-ietf-http(bis)?-/)) {
// Prefer the httpwg.org version for HTTP WG drafts
url = `https://httpwg.org/http-extensions/${draftName.replace(/-\d+$/, '')}.html`;
}
}
}

const titleAndStatus = await page.evaluate(_ => {
// Extract first heading when set
let title = document.querySelector("h1");
@@ -453,15 +493,20 @@ async function fetchInfo(specs, options) {
remainingSpecs = remainingSpecs.filter(spec => !w3cInfo[spec.shortname]);
const specrefInfo = await fetchInfoFromSpecref(remainingSpecs, options);

// Extract information directly from the spec for remaining specs
// Extract information from IETF datatracker for remaining specs
remainingSpecs = remainingSpecs.filter(spec => !specrefInfo[spec.shortname]);
const ietfInfo = await fetchInfoFromIETF(remainingSpecs, options);

// Extract information directly from the spec for remaining specs
remainingSpecs = remainingSpecs.filter(spec => !ietfInfo[spec.shortname]);
const specInfo = await fetchInfoFromSpecs(remainingSpecs, options);

// Merge results
const results = {};
specs.map(spec => spec.shortname).forEach(name => results[name] =
(w3cInfo[name] ? Object.assign(w3cInfo[name], { source: "w3c" }) : null) ||
(specrefInfo[name] ? Object.assign(specrefInfo[name], { source: "specref" }) : null) ||
(ietfInfo[name] ? Object.assign(ietfInfo[name], { source: "ietf" }) : null) ||
(specInfo[name] ? Object.assign(specInfo[name], { source: "spec" }) : null));

// Add series info from W3C API
18 changes: 18 additions & 0 deletions test/fetch-groups.js
@@ -75,6 +75,24 @@ describe("fetch-groups module (without API keys)", function () {
}]);
});

it("handles IETF individual drafts", async () => {
const res = await fetchGroupsFor("https://datatracker.ietf.org/doc/html/draft-cutler-httpbis-partitioned-cookies");
assert.equal(res.organization, "IETF");
assert.deepStrictEqual(res.groups, [{
name: "Individual Submissions",
url: "https://datatracker.ietf.org/wg/none/"
}]);
});

it("handles IETF area drafts", async () => {
const res = await fetchGroupsFor("https://datatracker.ietf.org/doc/html/draft-zern-webp");
assert.equal(res.organization, "IETF");
assert.deepStrictEqual(res.groups, [{
name: "Applications and Real-Time Area",
url: "https://datatracker.ietf.org/wg/art/"
}]);
});

it("preserves provided info", async () => {
const spec = {
url: "https://url.spec.whatwg.org/",