Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Publish crawled data with fixes to NPM? #63

Closed
foolip opened this issue Sep 23, 2020 · 9 comments · Fixed by #68
Closed

Publish crawled data with fixes to NPM? #63

foolip opened this issue Sep 23, 2020 · 9 comments · Fixed by #68

Comments

@foolip
Copy link
Member

foolip commented Sep 23, 2020

I just did an experiment in to package up webref/ed/idl/*.idl with fixes from a branch and publish it as https://www.npmjs.com/package/webidl-dfns.

IDL with fixes applied is needed for mdn-bcd-collector, and I wanted to have it as a regular dependency rather than pulling it from a branch that I keep changing.

However, having done that experiment, I wonder if there would be interest in publishing webref directly to NPM? In addition to include ed/ and tr/ as-is, we would also provide fixed versions of ed/idl/* which we guarantee will parse with a given version of webidl2.js.

One could go further if one wanted to with global consistency guarantees such as:

  • No duplicate definitions
  • No unknown types used
  • No circular inheritance

mdn-bcd-collector already does some of this, but if it could be centralized, I think WPT would also be better off making use of released data with some guarantees.

@tidoust
Copy link
Member

tidoust commented Sep 29, 2020

Totally fine publishing webref to NPM! We hadn't done it until now simply because no one had asked ;)
Happy to work on it, time permitting as usual. Feel free to beat me to it!

Also OK to include additional views with IDL guarantees in that package. I suppose that requires maintaining:

Code could either go in Reffy, in this repo directly (for instance as an action that runs any time a push is made), or perhaps in a separate "webgap" repo whose final goal is to disappear...

@foolip, would you like to propose some code to create the fixed version?

@foolip
Copy link
Member Author

foolip commented Sep 29, 2020

We have most of the ingredients needed for this, the question is mainly how to maintain the fixes and what should be included in the NPM package. While spec updates will often break things, it would be great if significant manual work is only needed each time broken IDL is added or changed. So the ingredients of a solution might be:

  • Detailed consistency tests, which if they pass we will trust that we can make a release.
  • A release branch with fixes, where we'll merge in updates from the main branch.
  • A weekly action that tries to merge in the main branch, run tests and cut a release. If that fails, human intervention will be needed.

I'm not sure what should be included in the package, but would suggest starting small, so that we don't include crawl.json, or study.json in particular.

@tidoust how does that sound? Does the w3c already have a NPM account, or whose secret would we add to the repo?

@tidoust
Copy link
Member

tidoust commented Sep 30, 2020

That seems good.

If we release one package with both the raw IDL and the fixed IDL, we could have something like the following:

|_ ed
|  |_ index.json
|  |_ css
|  |_ dfns
|  |_ headings
|  |_ idl
|  |_ idlfixed
|  |_ idlparsed
|  |_ links
|  |_ refs
|_ tr
   |_ index.json
   |_ css
   |_ dfns
   |_ headings
   |_ idl
   |_ idlparsed
   |_ links
   |_ refs

We probably don't want to provide fixed IDL extracts for specs under tr/, unless someone specifically asks for them.

The idlparsed folder contains the result of parsing the raw IDL in idl with webidl2.js, along with additional info about globals, functions, etc. It may be useful to create an idlfixedparsed folder as well with the results of parsing the fixed IDL in idlfixed. Or perhaps we simply shouldn't include these folders, so that the final package only contains data directly extracted from specifications, and not data derived from it.

Alternatively, we could release two packages:

  • one that contains raw IDL extracts, that gets automatically generated whenever a new crawl gets made, and that does not make any guarantee whatsoever.
  • one that contains fixed IDL extracts (and not the raw ones), that gets prepared as you suggest in a separate branch, automatically when possible, manually when not.

W3C already has an NPM account, although I still need to figure out who actually runs it. We may start with my account. I don't think NPM has any real notion of ownership for packages, merely of collaborators entitled to publish the package, so we can easily change accounts later on.

Folder names are open for comments as usual :)

@foolip
Copy link
Member Author

foolip commented Oct 6, 2020

@tidoust do you think we strictly speaking need, idlparsed, or could we have webidl2.js as a peerDependency, and provide a parse() method somewhere on the objects inside ed.idl and tr.idl? It would only be for the fixed IDL that we guarantee that parsing will work at all.

@tidoust
Copy link
Member

tidoust commented Oct 8, 2020

No, idlparsed is not needed.

Discussing this again with @dontcallmedom, we rather propose to publish a minimal set of dedicated packages, that only change when relevant info changes, instead of a single package that contains everything and that is basically going to be updated each time a spec gets a minor update, making it harder to track relevant updates.

In other words, we're more considering something like:

  • A package that contains (fixed versions, when needed, of) ed/css. Such a package would be useful for WPT.
  • A package that contains ed/dfns and possibly ed/headings. Such a package could be useful for the MDN browser compat data project.
  • A package that contains (fixed versions, when needed, of) ed/idl, and that could have webidl2.js as a peer dependency. Such a package would be useful for WPT and other projects that need to access Web IDL definitions.
  • Additional packages when needed, meaning when projects ask for them.

Packages would likely integrate an index file, generated from index.json.

A project that needs to mix info would import multiple packages. We could also create a global package, we're just unaware of projects that need to have all the info at once for now (or that would a package with tr data).

@foolip
Copy link
Member Author

foolip commented Oct 8, 2020

That sounds great actually! Do you think we should have the code for building and releasing these packages here in webref, or as separate repos? I'd like to work on the ed/idl bit first, is there a name you have in mind? Would a @webref org on NPM make sense, so that we can be sure we can get any future package names we want? @webref/idl would then be the thing I work on.

@foolip
Copy link
Member Author

foolip commented Oct 21, 2020

I'd still like to work on this. Can I just go ahead with the assumption that the package name will be @webref/idl and that it will be produced from this very repo, as opposed to some other repo which more closely resembles what will be in the published package?

@saschanaz
Copy link
Member

saschanaz commented Oct 25, 2020

foolip added a commit that referenced this issue Oct 26, 2020
Lerna is a tool for managing monorepos, and while @webref/idl is the
only package here now, the packages/* structure makes sense.

Fixes #63.
foolip added a commit that referenced this issue Oct 26, 2020
Lerna is a tool for managing monorepos, and while @webref/idl is the
only package here now, the packages/* structure makes sense.

Fixes #63.
@foolip
Copy link
Member Author

foolip commented Oct 26, 2020

I've suggested a setup for this in #68. It's still missing a mechanism to actually maintain the fixes, and the tests to ensure consistency of all of the IDL when interpreted together.

foolip added a commit that referenced this issue Feb 10, 2021
Lerna is a tool for managing monorepos, and while @webref/idl is the
only package here now, the packages/* structure makes sense.

Fixes #63.
foolip added a commit that referenced this issue Feb 10, 2021
Lerna is a tool for managing monorepos, and while @webref/idl is the
only package here now, the packages/* structure makes sense.

Fixes #63.
foolip added a commit that referenced this issue Feb 10, 2021
Lerna is a tool for managing monorepos, and while @webref/idl is the
only package here now, the packages/* structure makes sense.

Fixes #63.
foolip added a commit that referenced this issue Feb 10, 2021
Lerna is a tool for managing monorepos, and while @webref/idl is the
only package here now, the packages/* structure makes sense.

Fixes #63.
foolip added a commit that referenced this issue Feb 10, 2021
Lerna is a tool for managing monorepos, and while @webref/idl is the
only package here now, the packages/* structure makes sense.

Fixes #63.
tidoust pushed a commit that referenced this issue Feb 12, 2021
Package gets prepared and published from the `packages/*` subfolder with Lerna.
Further packages are expected to be created.

Note this introduces a couple of line ending fixes for scripts to run on Windows:
- The .gitattributes file tells Git not to convert LF line endings of the bash
script. The script cannot run on typical Git configs on Windows otherwise.
- The `--binary` option tells `patch` to disable any CRLF to LF conversion
when applying the patches. That is also needed for patches to run smoothly on
Windows.

Fixes #63.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants
@foolip @tidoust @saschanaz and others