MIMEType Performance #38
Comments
The implementation is different in how they parse the […]. Among some other differences in how the util API is made and exported, we can use the […]. I'll dedicate time to this during the week 🙂
cc @nodejs/undici
Undici implements its own content-type parser which, like the Node.js one, is 100% spec compliant. As pointed out above, it's unlikely that the other userland modules are fully compliant.
Some updates: I've been doing some optimizations at both ends, starting with Undici. Results:
One important thing to keep in mind is that being spec compliant might limit the room for optimizations, meaning that most likely any optimization applied at Undici or Node is unlikely to fully reach or surpass […]. I'll be aiming to have another round of optimizations, now including […]. Out of Scope
Thanks @metcoder95 for the awesome work. I didn't get a chance to look over the implementation, but I'm curious: would it be faster if we implemented the exact same code in C++?
It's not a performance concern in undici - it's only used for parsing the content-type in […]
The […]
About specification compliance: an implementation can either follow the exact steps or optimize the steps while keeping the output compliant.
Sure thing!
+1
I'll continue with the research and open subsequent PRs 🙂
Opened a PR as discussed; with the insights gathered from it, I'll proceed with the […]. Reference: Undici#1871
The PR is merged; I'll use the insights from the assessment to continue with […]
After having played with the […]
Baseline:
-> With string […]
Scenario - 1
Over the same string […]
It showed ~11% improvement.
Scenario - 2
Over the same string […]
Similar improvements in the results.
The biggest difference is how Undici parses the parameters of the MIME string, as Node.js uses a more direct approach by making constant use of […]. My take is that an iterative approach will make things easier and improve performance, as it will also reduce the usage of primordials, which are well known for adding slight overhead. I'll prepare a PR next week with the proposal so this is more tangible and easier to give feedback on and discuss 🙂
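To illustrate what an "iterative approach" to parameter parsing could look like, here is a minimal sketch that walks the string with indexOf and slice only. It is an assumption-laden illustration, not the Undici or Node.js implementation: the function name parseParamsIteratively is hypothetical, quoted-string values and token validation are left out, and trim() strips more than HTTP whitespace, so it is not spec complete.

```javascript
// Simplified sketch (hypothetical helper, NOT spec complete):
// scans "type/subtype; name=value; ..." using indexOf/slice, without regular expressions.
function parseParamsIteratively(input) {
  const params = new Map();
  let position = input.indexOf(';');
  if (position === -1) return params; // no parameters at all
  position += 1;

  while (position < input.length) {
    // Find the end of the current "name=value" segment.
    let end = input.indexOf(';', position);
    if (end === -1) end = input.length;

    const segment = input.slice(position, end);
    const eq = segment.indexOf('=');
    if (eq !== -1) {
      const name = segment.slice(0, eq).trim().toLowerCase();
      const value = segment.slice(eq + 1).trim();
      // The first occurrence of a parameter name wins; later duplicates are ignored.
      if (name !== '' && value !== '' && !params.has(name)) {
        params.set(name, value);
      }
    }
    position = end + 1;
  }
  return params;
}

const parsed = parseParamsIteratively('text/html;charset=gbk;charset=windows-1255');
console.log(parsed.get('charset'));
```

The single pass keeps the number of intermediate strings low, which is the kind of saving the comment above is hinting at; whether it is actually faster has to be confirmed with the benchmarks.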
Hey! To give some heads-up about the progress of the work 🙂
After several evaluations I was able to come up with the following results:
confidence improvement accuracy (*) (**) (***)
util/mime-parser.js n=100000 strings='application/json; charset="utf-8"' *** 12.07 % ±2.78% ±3.72% ±4.87%
util/mime-parser.js n=100000 strings='text/html ;charset=gbk' *** 8.13 % ±2.18% ±2.91% ±3.79%
util/mime-parser.js n=100000 strings='text/html; charset=gbk' *** 4.20 % ±2.35% ±3.13% ±4.08%
util/mime-parser.js n=100000 strings='text/html;charset= "gbk"' *** 10.10 % ±2.02% ±2.69% ±3.50%
util/mime-parser.js n=100000 strings='text/html;charset=GBK' *** 11.53 % ±2.61% ±3.48% ±4.57%
util/mime-parser.js n=100000 strings='text/html;charset=gbk' *** 8.37 % ±2.23% ±2.96% ±3.86%
util/mime-parser.js n=100000 strings='text/html;charset=gbk;charset=windows-1255' *** 14.77 % ±1.66% ±2.21% ±2.88%
util/mime-parser.js n=100000 strings='text/html;x=(;charset=gbk' *** 12.14 % ±3.32% ±4.41% ±5.75%
The improvements were between ~8-12% on average, which is good, but I'm still not sure it's enough to call it successful.
the new baseline varied to:
Almost ~50%, which seems quite far off compared to the results of the Node benchmarks. I wanted to put them on the table, but I can't guarantee they are precise, as they varied by ~20% across several iterations. Looking forward to your feedback and to seeing which points I could have missed that can still be improved 🙂
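For readers who want a single number for the table above, this small sketch computes the mean and range of the eight reported improvement percentages. The values are copied from the table; nothing else is measured here.

```javascript
// Improvement percentages copied from the benchmark table in this comment.
const improvements = [12.07, 8.13, 4.20, 10.10, 11.53, 8.37, 14.77, 12.14];

// Arithmetic mean of the reported improvements.
const mean = improvements.reduce((sum, value) => sum + value, 0) / improvements.length;
const min = Math.min(...improvements);
const max = Math.max(...improvements);

console.log(`mean=${mean.toFixed(2)}% min=${min}% max=${max}%`);
```

The mean lands at roughly 10%, with individual cases ranging from about 4% to about 15%.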
really good! good job @metcoder95
Hi guys! Is there a way to move nodejs/node#46607 forward? 🙂
Hmm. Didn't know this thread existed. Didn't think that fast-content-type-parse is not spec compliant. Can somebody point to a test suite for spec compliance?
thx
BTW, I believe that when referring to […]
LOL, I forgot about this issue and created #120. Dang. Here is my post from the other issue:
Maybe we should check how we can improve this overall? I think that MIMEType can be improved significantly. I just created a PR for lazily parsing the MimeParams. I could also improve toASCIILower with this snippet:
```js
// StringFromCharCode and StringPrototypeCharCodeAt are Node.js core primordials.
// Lookup table: upper-case ASCII code (65-90) -> lower-case character.
const ASCII_LOOKUP = new Array(127).fill(0).map((v, i) => {
  if (i >= 65 && i <= 90) {
    return StringFromCharCode(i + 32);
  }
  return '';
});
function toASCIILower(str) {
  let result = '';
  for (let i = 0; i < str.length; ++i) {
    const code = StringPrototypeCharCodeAt(str, i);
    // Anything outside A-Z is copied through unchanged.
    if (code > 90 || code < 65) {
      result += str[i];
    } else {
      result += ASCII_LOOKUP[code];
    }
  }
  return result;
}
```
Also, why do we use SafeStringPrototypeSearch in encode? Can't we just use RegExpPrototypeExec? Can't we manually inline the encode function into toString? Is there a faster way to iterate the SafeMap? Why do we use a SafeMap and not a NullObject (because of the generator functions)? Can we avoid the generator fns? Does it make sense to avoid the generator fns? Can we optimize parseTypeAndSubtype? Maybe also lazily parse the values? We need to throw errors in special cases, but other than that, we just have to store the string. Why do we call in MIMEType toString […] Questions over questions...
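The snippet above only runs inside Node.js core, where primordials are available. As a rough way to try the idea in userland, the sketch below swaps in String.fromCharCode and String.prototype.charCodeAt (an assumption made for this example) and compares the output with a simple regex-based reference, named toASCIILowerReference here purely for illustration.

```javascript
// Userland stand-in for the core snippet: plain String methods replace the primordials.
const ASCII_LOOKUP = new Array(127).fill(0).map((v, i) =>
  (i >= 65 && i <= 90 ? String.fromCharCode(i + 32) : ''));

function toASCIILower(str) {
  let result = '';
  for (let i = 0; i < str.length; ++i) {
    const code = str.charCodeAt(i);
    // Only A-Z (65-90) is mapped; every other code unit is copied unchanged.
    result += (code > 90 || code < 65) ? str[i] : ASCII_LOOKUP[code];
  }
  return result;
}

// Hypothetical reference implementation: lower-case A-Z only, via a regex.
function toASCIILowerReference(str) {
  return str.replace(/[A-Z]/g, (c) => c.toLowerCase());
}

// 'İstanbul' checks that non-ASCII letters are left alone.
const samples = ['Text/HTML', 'APPLICATION/JSON', 'charset=GBK', 'İstanbul', ''];
const allEqual = samples.every((s) => toASCIILower(s) === toASCIILowerReference(s));
console.log(allEqual);
```

Checking equality against a straightforward reference is one way to confirm that an optimized step still produces compliant output before benchmarking it.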
No worries, happy to see there is progress on this front. I really want to continue my work on nodejs/node#46607; maybe now that @Uzlopak is a member, we can move it forward with reviews and adjustments? 👀
I think it was mostly for convenience; I didn't find any counterarguments against using a plain object while working on the PR. It can be sealed with […]
Which generators?
I think we can defer it as much as possible, until the type and subtype have to be provided; but that won't help much, since it usually sits in the hot path, so the parsing cannot be delayed much.
Sealing the call I'd say?
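To make the lazy-parsing trade-off discussed here more concrete, this is a minimal sketch of the idea: the type and subtype are parsed eagerly, while the parameters are only parsed on first access. The class name LazyMimeSketch and its parseCount counter are hypothetical and exist only for this example; the parameter handling is simplified and not spec complete, and it does not reflect the actual PR.

```javascript
// Hypothetical sketch of lazy parameter parsing (simplified, NOT spec complete).
class LazyMimeSketch {
  #rest;          // raw, unparsed parameter section
  #params = null; // filled on first access to `params`

  constructor(input) {
    const semi = input.indexOf(';');
    const head = (semi === -1 ? input : input.slice(0, semi)).trim();
    const slash = head.indexOf('/');
    // Type and subtype are validated eagerly, so obviously invalid input still throws early.
    if (slash <= 0 || slash === head.length - 1) {
      throw new TypeError(`Invalid MIME type: ${input}`);
    }
    this.essence = head.toLowerCase();
    this.#rest = semi === -1 ? '' : input.slice(semi + 1);
    this.parseCount = 0; // how many times the parameters were actually parsed
  }

  get params() {
    if (this.#params === null) {
      this.parseCount += 1;
      this.#params = new Map();
      for (const segment of this.#rest.split(';')) {
        const eq = segment.indexOf('=');
        if (eq === -1) continue;
        const name = segment.slice(0, eq).trim().toLowerCase();
        const value = segment.slice(eq + 1).trim();
        if (name !== '' && value !== '' && !this.#params.has(name)) {
          this.#params.set(name, value);
        }
      }
    }
    return this.#params;
  }
}

const mime = new LazyMimeSketch('text/html; charset=gbk');
console.log(mime.essence, mime.parseCount);
console.log(mime.params.get('charset'), mime.parseCount);
```

Callers that only read the essence never pay for parameter parsing, but any error that parameter parsing could raise is also deferred to first access, which is the concern raised in the thread.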
nodejs/node#49889 got merged. @metcoder95
Sure thing, let me bring it up to date and run the agreed benchmarks. I haven't had the time yet to take another look; I'll try to do it this week 👍
I know that MIMEType was recently added and is under the Experimental flag, but I want to draw attention to its performance. It is not the slowest compared to the userland modules, but it can do better.
I am considering adopting MIMEType for the Content-Type parsing in fastify, but performance is the main drawback.
The main reason for using MIMEType is that the .essence property does a great job of unifying the Browser and Server in terms of Content-Type guessing / matching, and it prevents security issues similar to GHSA-3fjj-p79j-c9hh.
Refs fastify/fastify#4502
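As a short illustration of the .essence matching described above, the sketch below uses util.MIMEType from Node.js (experimental, available since v18.13.0 / v19.1.0). The hasEssence helper is a hypothetical name introduced for this example and is not part of fastify or Node.js.

```javascript
// util.MIMEType is experimental; Node.js may print an ExperimentalWarning.
const { MIMEType } = require('node:util');

// Hypothetical helper: true when a Content-Type value has the expected essence.
function hasEssence(contentType, expected) {
  try {
    return new MIMEType(contentType).essence === expected;
  } catch {
    // MIMEType throws on input it cannot parse; treat that as "no match".
    return false;
  }
}

const jsonMatch = hasEssence('Application/JSON; charset="utf-8"', 'application/json');
const htmlMatch = hasEssence('text/html ;charset=gbk', 'text/html');
const badMatch = hasEssence('not a mime type', 'text/html');
console.log(jsonMatch, htmlMatch, badMatch);
```

Comparing on the essence rather than on the raw header string avoids ad hoc handling of casing, whitespace, and parameters when matching a Content-Type.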