refactor(aria_metadata): generate ARIA metadata from specification #4055
Parser conformance results on: js/262, jsx/babel, symbols/microsoft, ts/babel, ts/microsoft
CodSpeed Performance Report: merging #4055 will improve performance by 6.28%.
@ematipico I chose |
A DOM library like happy-dom or jsdom is perfectly fine and gets the job done. I just wanted to note that for most scraping of mostly static content you don't really need a DOM; a simple XML/HTML parser like cheerio is enough. That is normally orders of magnitude faster, and some tools, like surgeon, simplify this further with a scraping-focused API. That said, if this only needs to run manually every once in a while, a DOM library is perfectly fine. However, if there are plans to run it frequently (e.g. in CI for every PR), I'd strongly suggest switching to a parser approach.
We will rarely run this job, considering that it's computed from source.
Thanks for the pointers.
Summary
Our ARIA metadata are currently handwritten and mostly based on ARIA 1.1 (some of them are in ARIA 1.2).
Some of these metadata are erroneous or incomplete.
I first started to update them manually. However, this took too much time and is error-prone.
The aria-query NPM package provides a JSON file with the ARIA metadata.
This seems to be manually updated.
I also noticed several issues and some lack of fidelity with the specification (they mix related and base concepts for example).
Thus, I decided to write a script that extracts ARIA metadata directly from the specification.
The script is written in JavaScript and relies on the `happy-dom` NPM package, which parses the HTML page of the ARIA specification.
I chose `happy-dom` because it has fewer dependencies than `jsdom` and `linkedom`. It is also more widely used than `linkedom`.
This is a best-effort approach; however, the generated data seems pretty good (I tried it on the three available versions of ARIA).
I placed the script in `packages/aria-data`.
I updated the `biome_aria_metadata` `build.rs` script to generate its code from `aria-data.json`.
For now, `aria-data.json` is symlinked to `packages/aria-data/aria-data-1-2.json`.
I used the `prettyplease` crate (widely used in the community) to format the output of `build.rs`.
I think it is a better alternative to starting a process to run `rustfmt`.
Now that we have a pretty reliable ARIA metadata source, we will be able to transfer more stuff from `biome_aria` to `biome_aria_metadata`.
Test Plan
CI must pass.
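
For readers unfamiliar with build-time code generation, here is a rough, std-only sketch of the idea behind the `build.rs` step described in the summary: turn a list of role names into Rust source for an enum. It is not the code in this PR. The `Role` struct, the `AriaRole` enum name, and the helper functions are hypothetical. The real script reads `aria-data.json` and formats its output with `prettyplease`, whereas this sketch hardcodes its input and builds the source as a plain string.

```rust
use std::fmt::Write;

/// Hypothetical, simplified view of one role entry.
/// The actual schema of `aria-data.json` may differ.
struct Role {
    name: &'static str,
    is_abstract: bool,
}

/// Converts a dashed or plain lowercase name to PascalCase,
/// e.g. "button" becomes "Button" and "aria-label" becomes "AriaLabel".
fn to_pascal_case(name: &str) -> String {
    name.split('-')
        .filter(|part| !part.is_empty())
        .map(|part| {
            let mut chars = part.chars();
            match chars.next() {
                Some(first) => first.to_ascii_uppercase().to_string() + chars.as_str(),
                None => String::new(),
            }
        })
        .collect()
}

/// Emits the source of an enum with one variant per concrete (non-abstract) role.
fn generate_role_enum(roles: &[Role]) -> String {
    let mut out = String::from("pub enum AriaRole {\n");
    for role in roles.iter().filter(|role| !role.is_abstract) {
        // Writing to a String cannot fail, so unwrap is safe here.
        writeln!(out, "    {},", to_pascal_case(role.name)).unwrap();
    }
    out.push_str("}\n");
    out
}
```

In a real build script, the returned string would typically be written to a file under `OUT_DIR` and pulled in with `include!`. Skipping abstract roles is an assumption of this sketch, not something this PR states.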