Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Ajv.compile speed #995

Open
mcollina opened this issue Apr 18, 2019 · 8 comments
Open

Improve Ajv.compile speed #995

mcollina opened this issue Apr 18, 2019 · 8 comments

Comments

@mcollina
Copy link

What version of Ajv you are you using?

v6

What problem do you want to solve?

We are investigating slow startup time in Fastify, and we identified this library as our primary bottleneck. We have been able to squeeze 30% by using ajv-pack in a couple of places.

Our current issue it is on how Ajv resolves URIs with https://www.npmjs.com/package/uri-js in https://github.com/epoberezkin/ajv/blob/bc993deceada5cc152ba0fd3b2e300012b2330a0/lib/compile/resolve.js#L235 and https://github.com/epoberezkin/ajv/blob/bc993deceada5cc152ba0fd3b2e300012b2330a0/lib/compile/resolve.js#L207-L216.

This is a flamegraph of the startup time. Apparently, the bottleneck is normalizing IPv6 addresses (which we are not using) in https://github.com/garycourt/uri-js/blob/4f6f600fade03398c08adf2755c3a2ad66d31b3c/src/uri.ts#L175.

Note that that RegExp is:

/^\[?((?:(?:(?:(?:[0-9A-Fa-f]{1,4})\:){6}(?:(?:(?:[0-9A-Fa-f]{1,4})\:(?:[0-9A-Fa-f]{1,4}))|(?:(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9])\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9])\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9])\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9]))))|(?:\:\:(?:(?:[0-9A-Fa-f]{1,4})\:){5}(?:(?:(?:[0-9A-Fa-f]{1,4})\:(?:[0-9A-Fa-f]{1,4}))|(?:(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9])\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9])\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9])\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9]))))|(?:(?:(?:[0-9A-Fa-f]{1,4}))?\:\:(?:(?:[0-9A-Fa-f]{1,4})\:){4}(?:(?:(?:[0-9A-Fa-f]{1,4})\:(?:[0-9A-Fa-f]{1,4}))|(?:(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9])\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9])\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9])\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9]))))|(?:(?:(?:(?:[0-9A-Fa-f]{1,4})\:){0,1}(?:[0-9A-Fa-f]{1,4}))?\:\:(?:(?:[0-9A-Fa-f]{1,4})\:){3}(?:(?:(?:[0-9A-Fa-f]{1,4})\:(?:[0-9A-Fa-f]{1,4}))|(?:(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9])\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9])\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9])\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9]))))|(?:(?:(?:(?:[0-9A-Fa-f]{1,4})\:){0,2}(?:[0-9A-Fa-f]{1,4}))?\:\:(?:(?:[0-9A-Fa-f]{1,4})\:){2}(?:(?:(?:[0-9A-Fa-f]{1,4})\:(?:[0-9A-Fa-f]{1,4}))|(?:(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9])\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9])\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9])\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9]))))|(?:(?:(?:(?:[0-9A-Fa-f]{1,4})\:){0,3}(?:[0-9A-Fa-f]{1,4}))?\:\:(?:[0-9A-Fa-f]{1,4})\:(?:(?:(?:[0-9A-Fa-f]{1,4})\:(?:[0-9A-Fa-f]{1,4}))|(?:(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9])\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9])\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9])\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9]))))|(?:(?:(?:(?:[0-9A-Fa-f]{1,4})\:){0,4}(?:[0-9A-Fa-f]{1,4}))?\:\:(?:(?:(?:[0-9A-Fa-f]{1,4})\:(?:[0-9A-Fa-f]{1,4}))|(?:(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9])\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9])\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9])\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9][0-9])|(?:0?[1-9][0-9])|0?0?[0-9]))))|(?:(?:(?:(?:[0-9A-Fa-f]{1,4})\:){0,5}(?:[0-9A-Fa-f]{1,4}))?\:\:(?:[0-9A-Fa-f]{1,4}))|(?:(?:(?:(?:[0-9A-Fa-f]{1,4})\:){0,6}(?:[0-9A-Fa-f]{1,4}))?\:\:)))(?:(?:\%25|\%(?![0-9A-Fa-f]{2}))((?:(?:[A-Za-z0-9\-\.\_\~\xA0-\u200D\u2010-\u2029\u202F-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF]|(?:(?:%[EFef][0-9A-Fa-f]%[0-9A-Fa-f][0-9A-Fa-f]%[0-9A-Fa-f][0-9A-Fa-f])|(?:%[89A-Fa-f][0-9A-Fa-f]%[0-9A-Fa-f][0-9A-Fa-f])|(?:%[0-9A-Fa-f][0-9A-Fa-f])))+)))?\]?$/

This PR is costly to evaluate at a 5-10ms each time on my machine (Node 10)

What do you think is the correct solution to problem?

Faster startup time.

Will you be able to implement it?

Yes and no. We can do the work, but we need to identify where and what to do. Would you be open to a refactor of https://github.com/epoberezkin/ajv/blob/bc993deceada5cc152ba0fd3b2e300012b2330a0/lib/compile/resolve.js#L235 and https://github.com/epoberezkin/ajv/blob/bc993deceada5cc152ba0fd3b2e300012b2330a0/lib/compile/resolve.js#L207-L216 so that we just extract the relevant part?
Alternatively do you recommend opening up an issue on URI.js to simplify/rework that regexp?

@epoberezkin
Copy link
Member

Hi Matteo. I think opening an issue in uri-is can be helpful, regardless what we decide to do here. Could you do it to see what Gary says?

A possible solution could be to provide a way to replace uri-js with the core node Legacy url api that has the same api (https://nodejs.org/api/url.html#url_legacy_url_api) or with any custom implementation.

I am ok to refactor here but not to reimplement URI parsing - could you maybe show in the fork how it would look?

@mcollina
Copy link
Author

Hi Matteo. I think opening an issue in uri-is can be helpful, regardless what we decide to do here. Could you do it to see what Gary says?

I'll open an issue there, thanks.

I am ok to refactor here but not to reimplement URI parsing - could you maybe show in the fork how it would look?

I'm just wondering how much of the validation that URI.parse() does is needed in Ajv and why you are doing a full parse/serialize cycle instead of simply id.split('#')[0] + '#'.
An alternative can be to use WHATWG URL: https://nodejs.org/api/url.html#url_the_whatwg_url_api. It's implemented in C++ and pretty fast overall.

@epoberezkin
Copy link
Member

The problem is that JSON schema spec requires support of URIs that include URNs rather than just URLs, that was the reason to switch to uri-js: #423

Could you try in the fork if it still passes the tests if you change like you suggest? I don’t remember whether there was the reason...

@epoberezkin
Copy link
Member

Probably it would work, I don’t see why not :)

@epoberezkin
Copy link
Member

@mcollina a simple implementation with split passes the tests, so it should work in most cases:

function getFullPath(id) {
  return (id ? id.split('#')[0] : '') + '#';
}

But it would produce different results with IDs that are IRI: https://www.npmjs.com/package/uri-js#iri-support

So probably you should change it in a fork and ask to optimise in uri-js.

@mcollina
Copy link
Author

Thanks for the verification, and for the full check about IRIs.

I’m currently on vacation, I plan to take a look again when I came back.

@jimmywarting
Copy link

Can't we rely on just new URL() instead of depending on some external dependency?

@epoberezkin
Copy link
Member

@jimmywarting if it provides all the functionality and the tests pass - it can be replaced.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

3 participants