Parser: Add a JEST test to build and compare language agnostic parsers #6030
Conversation
Force-pushed from 64b52cc to 615f179 (compare)
Thanks for jumping on this!

`binary` is somewhat misleading: we're not asking for a binary and we'll rarely provide one. What we're asking for is an executable path, and the goal is to run a parser, right? Why not call it something like that?

Second, I guess there's value in running this against our fixtures, but I find the need to store the results of the parse in those files a bit dubious. These tests are for comparing parser output, so maybe we just run both parsers each time we test and forget saving the results; the provided parser need only accept the input document on `stdin` and produce the parse on `stdout`. Note that if we rely on the snapshots we add needless obstacles for someone who wants to change the parser and is making sure they can replicate that change in their home-built parser: we'd have to update the snapshot every time we change the official PEG.
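For what it's worth, a minimal sketch of that contract could look like this (the `./my-parser` module and its `parse` export here are hypothetical placeholders for a home-built parser, not part of this PR):

```js
#!/usr/bin/env node
// Sketch: read the post on stdin, write the parse to stdout.
const { parse } = require( './my-parser' ); // hypothetical parser module

let input = '';
process.stdin.setEncoding( 'utf8' );
process.stdin.on( 'data', ( chunk ) => ( input += chunk ) );
process.stdin.on( 'end', () => {
	process.stdout.write( JSON.stringify( parse( input ) ) );
} );
```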
To that end we can eliminate the new `test/parser/bin/peg-parser.js` file and the Jest test, or at least simplify it (no file access, no `rimraf`, no snapshots, etc.).

Further, with snapshot testing we're not testing that the parses produce equal output in the type of equality we care about, but rather a byte-for-byte equality that doesn't matter. We don't care if the output objects for each block have keys in differing orders, or if one parser produces an empty `attrs` object while another omits it.

I propose that we focus on testing the semantic equality of the parsers, so that we don't break trust in the tests.
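As a rough illustration of what I mean by semantic equality (a sketch, not code from this PR; the block shape `{ attrs, innerBlocks, ... }` is assumed):

```js
const assert = require( 'assert' );

// Normalize a block so that key order and `attrs: {}` vs. a missing
// `attrs` don't register as differences.
function normalizeBlock( block ) {
	const { attrs, innerBlocks = [], ...rest } = block;
	return {
		...rest,
		// treat an empty attrs object the same as an omitted one
		...( attrs && Object.keys( attrs ).length ? { attrs } : {} ),
		innerBlocks: innerBlocks.map( normalizeBlock ),
	};
}

// deepStrictEqual already ignores key order, so after normalizing we
// compare the kind of equality we actually care about.
function semanticallyEqual( parseA, parseB ) {
	try {
		assert.deepStrictEqual(
			parseA.map( normalizeBlock ),
			parseB.map( normalizeBlock )
		);
		return true;
	} catch ( e ) {
		return false;
	}
}
```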
Did we consider providing this outside of the Gutenberg repo? I can see several benefits to doing that in terms of keeping the parser development cycle lean and the tests easy to understand and discover. Ideally we could expose this as a web service where someone pastes in their code, we execute it, and then we provide an easily navigable results panel; think of something like REPL.it with a diff output.

Anyway, I think a web service is down the road, but having a separate repo could be valuable now.

Thanks!
Agreed, maybe I can change it to
They don't need to replicate anything; the idea is that they'll run the tests with the tool provided here, and it will automatically check against the provided snapshots.
Yes, and that's fine because we want to validate that the output is similar across all parsers. I guess the alternative would be to consider the PEG parser the source of truth instead of the snapshots. I don't think that's a great idea; we should treat the PEG parser as just another parser, and avoid running it while running another parser's tests because it can have an impact on performance (misleading the results).
The fact that I'm running
Sounds like a good idea; let's not create another repository right now, though. Let's focus on getting the tool right, and we can extract it later if needed (the same way we did for other packages).
Yeah, the point is that I don't see any reason to make this as complicated as we have made it. We're not incorporating other parsers into the master branch of this project; we're just providing a tool to answer the question "is my parser, or my version of the parser, identical in its output to the official parser?"

I think we have to make the PEG parser the source of truth because we have no other specification. That parser is the specification right now, whereas snapshot tests not only don't specify anything but also don't present themselves in a readable way. They can also change unexpectedly, and we'd be none the wiser unless we caught it in review. If the parser itself gets changed, we would all know right away when the app breaks.
The advantage of keeping the snapshots is that they also serve as tests for the PEG parser itself, but I'm open to change. Let's see what the others think.
Yeah, that seems fine, but it also seems like something for the project itself to test for unanticipated regressions; not an implicit and poorly-defined specification, and not a way of supporting third-party parsers. I think those are separate things.
Worth noting that the environment variable isn't properly propagated, likely because the

Without changing the script, what works for me is exporting the variable:

```sh
export BINARY=test/parser/bin/peg-parser.js
npm run test-parser
```

(Secondly, I'd also prefer a term like `parser` or `executable` over `binary`.)
Weird; I just tested and it works for me in both cases (`npx` and `npm run`).
I removed the snapshots and added the Gutenberg demo post as a fixture. Feel free to add other long posts as fixtures if needed.
Related: I built an in-browser test that accepts three inputs…

The parsers are JS source code that will be evaluated, for example:

```js
// length parser - not entirely spec-compliant to gutenberg
;( function() {
	return function( content ) {
		return content.length;
	};
} )();
```

The post list is a list of URLs, one per line, where … The page will randomize trials of each parser against the input documents and update the report as it goes. It will continue to iterate until told to stop.
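A single trial in such a harness could be timed roughly like this sketch (the names here are illustrative, not the actual tool's API):

```js
// Illustrative sketch only: evaluate the pasted parser source (the IIFE
// above yields the parser function) and time one parse of one document.
function timeTrial( parserSource, postContent ) {
	const parser = ( 0, eval )( parserSource ); // indirect eval, global scope
	const start = performance.now();
	const result = parser( postContent );
	return { result, ms: performance.now() - start };
}
```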
Force-pushed from 7edaf2b to 4e88683 (compare)
See the latest commits from WordPress/gutenberg#6030.
```js
const inputFile = path.resolve( fixturesDir, f + '.html' );
const start = Date.now();
exec( `cat ${ inputFile } | ${ parserExecutable }`, ( err, parserOutput ) => {
	const end = Date.now();
```
This approach largely measures the boot time of the `parserExecutable`. In the case of a script written with NodeJS, whatever you use inside your NodeJS file (a native module, plain JS, or WASM) makes almost no difference; a difference only becomes visible when you use another kind of executable, like a native Rust binary, for instance.

I think these numbers can mislead the user. Should we remove them and come up with a more robust in-browser approach?
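One possible mitigation, sketched under the current shell-out approach (not code from this PR): estimate the boot overhead with an empty input and subtract it. It's still noisy, but less dominated by process startup:

```js
const { execSync } = require( 'child_process' );

// Sketch: time the executable on an empty document to approximate its
// boot overhead, then subtract that from the fixture parse time.
function timedParse( parserExecutable, inputFile ) {
	const bootStart = Date.now();
	execSync( `echo "" | ${ parserExecutable }` );
	const bootMs = Date.now() - bootStart;

	const start = Date.now();
	execSync( `cat ${ inputFile } | ${ parserExecutable }` );
	return Date.now() - start - bootMs;
}
```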
Force-pushed from 4e88683 to e4ac586 (compare)
Rebased to merge in updates from master.
While working on this I came across some issues that made me feel this may not end up working well with the current approach. Although quite incomplete, I have written down my thoughts on how one solution to our concerns could look. Given that, do we think this is still worth merging?
Closing as superseded by other efforts.
The idea of this PR is to provide the tools (validation and performance) needed to build block parsers.

It adds a Jest test that accepts a binary (via an environment variable) as its entry point, parses a list of fixtures using the given binary, and compares the results with snapshots.
To test this, you can generate the pegjs-based binary by running:

```sh
npm run build-parser
```

and then run the test like this:

```sh
BINARY=test/parser/bin/peg-parser.js npm run test-parser
```
To build and test your own custom parser, you need to build a binary (using any language: PHP, JS, …). This binary should accept two args: the first one is the input file (an HTML file), and the second one is the file where the JSON output should be written. Something like this:

```sh
mybinary myfixture.html myfixture.json
```

And then you can test your parser like this:

```sh
BINARY=mybinary npm run test-parser
```
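For example, a minimal Node-based parser honoring that two-argument contract might look like the following sketch (the `./my-parser` module is a placeholder for your own implementation, not something this PR provides):

```js
#!/usr/bin/env node
// Sketch of the contract described above: mybinary <input.html> <output.json>
const fs = require( 'fs' );
const { parse } = require( './my-parser' ); // hypothetical parser module

const [ inputFile, outputFile ] = process.argv.slice( 2 );
const html = fs.readFileSync( inputFile, 'utf8' );
fs.writeFileSync( outputFile, JSON.stringify( parse( html ), null, 2 ) );
```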
Right now I've just added a small file as a fixture; we'd have to add a bunch of examples, including long posts, to compare performance properly.