
Replace C HTTP parser with JS HTTP parser #1457

Closed
wants to merge 67 commits

Conversation

mscdex
Contributor

@mscdex mscdex commented Apr 18, 2015

Background

There had been brief mention during a TC meeting some time back about the
possibility of having a js HTTP parser for io.js core. I was looking for a new
and interesting project to work on in my free time, and I'm a protocol
enthusiast, so for the past week or so I've been working on writing a js HTTP
parser.

This parser uses regular expressions for most of the parsing. Others have
expressed concerns about using regular expressions due to the (negative)
performance impact they would most likely bring. However, I wanted something
that would be easy to get right (since the RFCs provide the appropriate
grammars), and even though the resulting regular expressions turn out to be
pretty large (for request-line parsing), they are broken up in the source code
to make them easier to follow. I was also curious to see just how much of a
negative performance hit there would be.

Right now all io.js tests pass, including the (relevant) tests I ported from
joyent/http-parser's tests.

Non-breaking Behavioral Changes

  • Only one "onHeaders" callback (no separate onHeaders and onHeadersComplete
    callbacks)

  • pause() and resume() are no-ops. The only place these methods are
    currently used is in the http server, and even there they are not used
    correctly (as far as I can tell). From what I have seen in the C parser
    implementation, pause() and resume() only change the current state and
    nothing more (e.g. no saving of the existing chunk or automatic
    re-parsing of the existing chunk on resume or anything like that). As far
    as I can tell it's up to the end user (the http server in this case) to
    re-execute() the same (previous) chunk after calling resume() in order
    for the rest of the original data to be processed.

    If any of this is incorrect, please let me know.

Backwards Incompatibilities

  • Currently there are no .code or .bytesParsed properties added to Error
    objects returned by the parser. This should be trivial to re-add though.

  • Because of the difference in parsing strategies, the js parser cannot
    terminate as quickly, because it buffers until a CRLF is seen (for
    request/response lines and header lines, for example). I've attempted to
    work around this somewhat when parsing the first bytes of
    request/response lines, since there is one test
    (test-https-connecting-to-http) where TLS handshaking is written to the
    parser. Basically I check that the first byte is a printable ASCII byte
    before continuing. This will keep out binary TLS data, but it is
    obviously not foolproof. I am open to suggestions to remedy this problem.

  • Folding whitespace behavior is conformant with RFC 7230:

    "A user agent that receives an obs-fold in a response message that is
    not within a message/http container MUST replace each received
    obs-fold with one or more SP octets prior to interpreting the field
    value."

    It should also be noted that RFC 7230 now deprecates line folding for HTTP
    parsing, FWIW. This parser replaces folds with a single SP octet.

  • Optional whitespace is removed before interpreting a header field value, as
    suggested by RFC 7230:

    "A field value might be preceded and/or followed by optional
    whitespace (OWS); a single SP preceding the field-value is preferred
    for consistent readability by humans. The field value does not
    include any leading or trailing whitespace: OWS occurring before the
    first non-whitespace octet of the field value or after the last
    non-whitespace octet of the field value ought to be excluded by
    parsers when extracting the field value from a header field."

    joyent/http-parser keeps trailing whitespace. This parser keeps neither
    preceding nor trailing whitespace.

  • Enforces CRLF for line endings instead of additionally allowing just LF.

  • Does not allow spaces (which are invalid) in header field names.

  • Smaller maximum chunk/content length (2^53-1 vs. 2^64-2). Obviously it's
    not impossible to handle a full 64-bit length, but it would mean adding
    some kind of "big integer" library for lengths > 2^53-1 (see the short
    example after this list).

  • No special handling for Proxy-Connection header. The reason for this is
    that Proxy-Connection was an experimental header for HTTP 1.0 user agents
    that ended up being a bad idea because of the confusion it can bring. You
    can read a bit more about this in
    RFC 7230 A.1.2
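
A quick illustration of why 2^53-1 is the practical ceiling for a plain JS
number (an aside for context, not part of the parser code): above
Number.MAX_SAFE_INTEGER, adjacent integers collapse to the same double, so a
full 64-bit Content-Length cannot be represented exactly.

console.log(Number.MAX_SAFE_INTEGER);                 // 9007199254740991, i.e. 2^53 - 1
console.log(Math.pow(2, 53) === Math.pow(2, 53) + 1); // true: precision is already lost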

Various Performance Improvement Techniques

  • Custom buffer.indexOf() that only looks for CRLFs and avoids trips to
    C++ land (a combined sketch of the techniques in this list appears right
    after the list).
  • Manual parsing of hex numbers instead of using parseInt(..., 16). This
    consists of using a lookup table and multiplying and adding. This technique
    is borrowed from joyent/http-parser.
    jsperf
  • Use faster case-insensitive exact string matching instead of
    /^foo$/i.test(val), val.toLowerCase().indexOf('foo') === 0, or something
    similar. This technique is again borrowed from joyent/http-parser, which
    does charcode | 0x20 to get the "lowercase" version of the character.
    Additionally, initial string length comparisons allow early terminations
    since we look for exact matches.
    jsperf
  • Use a faster string trim() that loops over the string twice: once to
    find the first non-whitespace character and once (from the end) to find
    the last non-whitespace character. Returning the str.slice() between
    these two indexes turns out to be much faster than str.trim(). The
    performance difference is much larger when the string has no leading or
    trailing whitespace.
    jsperf
  • Use val !== val instead of isNaN(val). NaN is the only value in
    JavaScript that does not equal itself or anything else.
    jsperf
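
The following is a minimal, hypothetical sketch of the techniques above; the
function names and details are illustrative and are not the actual parser
source.

// 1. Scan for CRLF with plain byte accesses instead of calling into C++.
function indexOfCRLF(buf, buflen, offset) {
  for (var i = offset; i < buflen - 1; ++i) {
    if (buf[i] === 13 && buf[i + 1] === 10) // '\r' followed by '\n'
      return i;
  }
  return -1;
}

// 2. Manual hex parsing with a lookup table (multiply and add).
var HEX = new Array(256).fill(-1);
for (var i = 0; i < 16; ++i) {
  HEX[i < 10 ? 48 + i : 87 + i] = i; // '0'-'9' and 'a'-'f'
  if (i >= 10) HEX[55 + i] = i;      // 'A'-'F'
}
function parseHex(str) {
  var val = 0;
  for (var i = 0; i < str.length; ++i) {
    var digit = HEX[str.charCodeAt(i)];
    if (digit === -1) return -1;
    val = val * 16 + digit;
  }
  return val;
}

// 3. Case-insensitive exact match via `charcode | 0x20`, with an early
//    length check. Assumes `lowered` contains only lowercase letters and
//    hyphens (e.g. 'keep-alive').
function equalsLower(str, lowered) {
  if (str.length !== lowered.length) return false;
  for (var i = 0; i < str.length; ++i) {
    if ((str.charCodeAt(i) | 0x20) !== lowered.charCodeAt(i)) return false;
  }
  return true;
}

// 4. trim() via two index scans and a single slice.
function fastTrim(str) {
  var start = 0;
  var end = str.length - 1;
  while (start <= end &&
         (str.charCodeAt(start) === 32 || str.charCodeAt(start) === 9)) ++start;
  while (end >= start &&
         (str.charCodeAt(end) === 32 || str.charCodeAt(end) === 9)) --end;
  return str.slice(start, end + 1);
}

// 5. NaN check without isNaN(): NaN is the only value not equal to itself.
function isNotANumber(val) {
  return val !== val;
}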

Benchmarks

Using the new (included) parser benchmark, here are the results I'm seeing:

(units are execute()s per second)

benchmark            js            C
small-req            837,811       613,590
small-res            16,910,279    3,703,975
medium-req           19,511,418    124,117
medium-res           19,477,258    128,464
medium-req-chunked   74,515,156    124,903
medium-res-chunked   74,412,044    126,669
large-req            9,696         8,851
large-res            784,058       776,047
large-req-chunked    90,463,242    DNF
large-res-chunked    90,894,841    DNF

DNF = terminated after waiting a few minutes to finish.

I was kind of suspicious about the large differences in some of the benchmarks,
so I attempted to explicitly call parser.reinitialize(...) after
execute()ing each full message for the C parser, thinking maybe that was needed to
get correct data. When I made that change I got these numbers:

benchmark            C
small-req            609,880
small-res            653,104
medium-req           292,589
medium-res           247,354
medium-req-chunked   2,769,117
medium-res-chunked   2,623,138
large-req            8,870
large-res            762,286
large-req-chunked    127,153
large-res-chunked    DNF

As you can see, that resulted in speedups for most benchmarks, but a slowdown
in the small-res benchmark. Either way, the numbers are still not close to
those for the js parser. I tried to use varied types of inputs and to make the
benchmarks as fair as I could.

If I am missing something or measuring wrong, please let me know.

Final Notes

  • HTTP/0.9 responses are not supported. This is barely worth mentioning since
    I think it's safe to say nobody is using HTTP/0.9 anywhere anymore. HTTP/0.9
    responses simply consist of a body (no response line or headers or anything),
    so there is no way to differentiate an HTTP/0.9 response from bad input.
  • While making necessary changes to the existing _http_*.js code, I found
    that the old binding API was still being used
    (e.g. .execute(data, 0, data.length)). I have now corrected that, which
    may bring some speed improvements by itself.
  • The request line regexp is pretty large and I'm almost certain it is
    possible to further improve upon it (performance-wise) while still being
    correct. For example, catastrophic backtracking could be minimized by
    emulating atomic groups using positive lookaheads with capturing groups
    (this means more data is saved because of the capture groups, but maybe
    that won't be as bad as it sounds): (?=(atomic pattern))\1 (see the
    sketch after this list).
  • Right now make test will time out for the ported durability tests, but I'm
    not sure how to handle that, especially since it would take even longer on
    some of the embedded devices we currently test on.
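
To illustrate the atomic-group emulation mentioned above (an illustrative
aside, not code from this PR): the positive lookahead captures the submatch,
and the backreference then consumes exactly that text, so a later failure
cannot backtrack into the group and re-expand it.

// Classic nested-quantifier pattern: catastrophic backtracking on inputs
// like 'aaaaaaaaaaaaaaaaaaaaaaaaaaaab'.
var naive = /^(a+)+$/;

// Emulated atomic group: (?=(a+)) captures greedily, \1 consumes the capture.
var atomicish = /^(?=(a+))\1$/;

atomicish.test('aaaa');                          // true
atomicish.test('aaaaaaaaaaaaaaaaaaaaaaaaaaaab'); // false, and it fails quickly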

@@ -0,0 +1,890 @@
/*
Contributor

Might be best to make this an internal module, a la freelist.

Contributor

Two things:

  1. This is deprecating process.binding('http_parser'), which means we're going to break some people, maybe too many to be able to remove this right away.
  2. If we don't expose this parser for use externally, then I think it should be its own library, vendored in like readable-stream is, so that people can require it from npm. If we do that, then I think it's reasonable to make it internal.

Member

this is deprecating process.binding('http_parser') which means we're gonna break some people, maybe too many to be able to remove this right away.

process.binding('http_parser'), like other bindings, is and always has been internal. If people are using it directly and removal breaks their code, the onus is on them.

Besides, they can probably substitute it with https://github.com/creationix/http-parser-js.

if we don't expose this parser for use externally then I think it should be it's own library and vendored in like readable-stream is so that people can require it on npm. if we do that then I think it's reasonable to make it internal.

I'm not sure I follow this logic. If someone feels strongly enough about it, they can always break out the code into a standalone module.

Contributor

should be it's own library and vendored in like readable-stream is

readable-stream still pulls from io.js itself, because the library has different goals than io.js (it favors broad compatibility where io.js favors performance). It seems to me like the JS parser is likely to make similar tradeoffs and thus run into the same problems with vendoring.

@mikeal
Contributor

mikeal commented Apr 18, 2015

Crazy excited about this.

@jbergstroem
Member

Test-parser-durability seems to fail on multiple platforms: https://jenkins-iojs.nodesource.com/job/iojs+any-pr+multi/nodes=freebsd101-32/543/console

@mscdex
Contributor Author

mscdex commented Apr 18, 2015

@jbergstroem Yes, I noted the timeout issue in my Final Notes section.

I'm not sure what to do about it...

@Qard
Member

Qard commented Apr 18, 2015

Nice! I'd love to see this ready for 2.0.0.


@mscdex mscdex added the http and semver-major labels Apr 18, 2015
@tellnes
Contributor

tellnes commented Apr 18, 2015

Great work. I will review this when I have some time.

+1 to moving this to its own repository like readable-stream and bundling it.

I should also mention that I've done a little experimentation on an http/2 implementation in pure js. It does somewhat work, but is nowhere near finished. My hpack implementation, however, should be complete and is available here.

I have not read your code yet, but since we are making a big change to the http implementation, we should make it ready for an http/2 addition later. I'll come back to you on that.

As @mikeal said, this is crazy exciting.

@jbergstroem
Member

@mscdex sorry, I missed that note. Suggesting we move test-parser-durability to pummel/ since that's where most "slow" tests belong.

@brendanashworth
Contributor

Oh man! This is really great. Can't wait for this to be in 👍

@yosuke-furukawa
Member

Great!! 👍

I benchmarked using benchmark/http.sh. This is the result (requests/sec).

run   iojs js-parser   iojs v1.x
1     17405.99         16658.43
2     18003.14         17295.41
3     17615.15         17422.55
4     17548.23         17303.44
5     18256.06         16768.77
6     18554.80         17416.20
7     18398.98         17422.73
8     18330.15         17505.53
9     18197.24         17439.78
10    18205.21         17467.43
11    18140.70         17466.35
12    18146.29         17342.32
13    18105.43         17468.26
14    18233.52         17552.08
15    18163.47         17516.12
16    18060.45         17384.31
17    18232.71         17454.08
18    18188.39         17399.44
19    18211.15         17493.88
20    18174.35         17313.13
avg   18108.57         17354.51

About 4% faster (a 1.04x ratio) on average!

machine spec:
CPU : 2.3 GHz Intel Core i7
Memory : 16GB
OSX Yosemite


var RE_CLOSE = /close/i;
var RE_KEEPALIVE = /keep\-alive/i;
var RE_UPGRADE = /upgrade/i;
Contributor

Should these be /^upgrade$/i instead? ( #828 )

Contributor Author

Technically the Connection header is a comma-separated list, so there could be more than one value. I'm not sure how often that occurs in practice though.

Contributor Author

I've just pushed a commit to improve multi-value matching.
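
For reference, a rough sketch of element-wise matching on a comma-separated
Connection value (a hypothetical helper, not the code from that commit):

// Check whether a comma-separated Connection header contains a token,
// element by element, instead of as a bare substring.
function connectionHas(value, token) {
  return value.split(',').some(function(part) {
    return part.trim().toLowerCase() === token;
  });
}

connectionHas('keep-alive, Upgrade', 'upgrade');      // true
connectionHas('keep-alive, x-no-upgrade', 'upgrade'); // false (a bare /upgrade/i would match here)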

@bnoordhuis
Member

Nice work, Brian. You mention that the parser buffers and rescans? How does it handle slowloris attacks?

Enforces CRLF for line endings instead of additionally allowing just LF.

Requests with only LF line endings happen in the wild. They must be supported.

No special handling for Proxy-Connection header.

Proxy-Connection is rare but it is used in the wild.

On the other hand, I believe support for it in the current parser is broken as well. I remember raising a couple of issues about it at joyent/node and I don't think they have been fixed.

HTTP/0.9 responses are not supported.

I remember there was one guy that raised a bug about some network appliance that only spoke HTTP/0.9. Probably an extreme fringe case, though.

@monsanto
Contributor

For what it is worth, RFC 7230 explicitly says there is no longer any obligation to support HTTP/0.9.

I think it is wise to not pretend to support it in io.js. Whatever code paths there are for HTTP/0.9 will likely only be exercised by the test suite, or by people looking for bugs to exploit. Seems like a magnet for code rot. One can always put nginx in front of io.js if this capability is needed.

@mscdex
Contributor Author

mscdex commented Apr 18, 2015

@bnoordhuis

Nice work, Brian. You mention that the parser buffers and rescans? How does it handle slowloris attacks?

The js parser currently only executes the regexp once a full line is received. I was not aware of the "slowloris" attack beforehand, but I imagine one mitigation might be to add max line lengths for start lines, header lines, chunk size lines, etc. in addition to the header bytes limit that covers both start line + header lines.

EDIT: I should add that having additional max line lengths will only help so much. It won't prevent someone from opening a ton of connections and trickling in valid requests to use up server resources though.
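
A rough sketch of that kind of limit (illustrative only; the constant and the
shape of the check are made up):

// Reject an over-long line up front instead of waiting indefinitely for a
// CRLF that may never arrive.
var MAX_START_LINE_BYTES = 8 * 1024; // hypothetical limit

function checkStartLineLength(bufferedBytes, incomingBytes) {
  if (bufferedBytes + incomingBytes > MAX_START_LINE_BYTES)
    throw new Error('Start line exceeds maximum allowed length');
}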

Requests with only LF line endings happens in the wild. It must be supported.

I've never seen this in reality. Do you have any modern examples?

@connor4312
Contributor

This is awesome, great work @mscdex!

* {Array}

A list of the HTTP methods that are supported by the parser.

Contributor

There is a decent amount of code that accesses this. Maybe we should deprecate in docs first?

There is a full replacement available [1], but if we are still using the list in core, is there a reason to not export it?

Contributor

Can we move to a model where we just support any method? I argued with ry back in the day about this, but people at the IETF add HTTP methods for crazy new protocols all the time. Ryan hated this and so he intentionally limited the number of methods supported but there's no real technical reason for it.

Contributor

For reference, here's one of my favorites: http://greenbytes.de/tech/webdav/draft-dusseault-caldav-05.html#METHOD_MKCALENDAR CalDAV's MKCALENDAR method, which "makes a calendar" and totally needs its own HTTP method.

Contributor Author

I think one of the reasons for limiting the supported verbs is to be able to fail earlier on bad input. That is just a guess though.

Contributor

As far as the spec is concerned, new verbs can be added at any time, and it's not the parser's job to fail on any specific verb except where semantics are defined specific to that verb and do not occur.

Contributor

In the C parser, verbs are hardcoded into the state machine, so we didn't really have a choice.

But the new parser should support any valid verb imho; it's not its job to enforce limitations like this.

PS: I want the BREW method, 'cause hey, it's an RFC :P

Contributor

Can we expose a way for user-land to maybe add verbs they explicitly want to support? For all verbs that aren't hard coded (i.e. common verbs) I'd want them to just fail as bad input. If someone has a specific need for an unsupported verb they can make a PR or if we have a way to add new verbs in user-land, use that. We (at Grooveshark) get crazy verb requests all the time (for instance thousands of LOAD requests per day) and drop them since they're not defined anywhere and we have no idea how we'd even respond.

Contributor

I don't see how a standard verb like MKCALENDAR is any different from a non-standard one like HELLOWORLD. Your application will likely respond the same way to both of them.

And since treating all uncommon methods as bad input is undesirable, we could just allow all methods.

Seriously, what is the benefit of enforcing any limits on that?

Contributor

I don't see how a standard verb like MKCALENDAR is any different from a non-standard one like HELLOWORLD.

One you need to make a calendar for, and the other you should probably respond to with a HELLO. If you're a CalDAV app you should respond to MKCALENDAR by making a calendar and respond to HELLOWORLD with a 405 Method Not Allowed. But the majority of people (if not all) using this will respond to MKCALENDAR by NOT making a calendar, and also probably not respond with a 405, therefore not doing what the client that made the request expected.

@vincentbernat

A few remarks:

  1. The original parser was memory-bound (no dynamic memory allocation was done outside the heap, and the stack depth was quite limited). I believe this was an essential characteristic, over speed. Of course, a user would have to allocate memory to be able to extract some information, but a large bogus header would be handled just fine by the parser (and could simply be ignored by the upper layer, for example). I also believe a side effect would be to make the V8 stop-the-world garbage collector trigger more often and degrade performance on concurrent queries.
  2. The bundled benchmark always uses the same request to be parsed. I believe this gives a large advantage to JS thanks to JIT compilation.

I'm not saying the JS parser is worse (or better); I just wanted to shed some light on those two aspects. Unfortunately, I don't have time to investigate them.

RE_AUTHORITY.source + '|\\*)');
var RE_REQUEST_LINE = new RegExp('^([!#$%\'*+\\-.^_`|~0-9A-Za-z]+) (' +
RE_REQUEST_TARGET.source +
')(?: HTTP\\/1\\.([01]))?$');
Contributor

Would using a template string be more sensible for some of these?

Contributor Author

Maybe, I'm not up on the ES6 stuff.
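
For illustration, a rough sketch of what the construction could look like with
an ES6 template literal (the RE_REQUEST_TARGET placeholder below just stands
in for the real pattern composed earlier in the file):

// Placeholder so the snippet stands alone; the real pattern is the one
// built from the RFC grammar in the diff above.
var RE_REQUEST_TARGET = /\S+/;

var RE_REQUEST_LINE = new RegExp(
  `^([!#$%'*+\\-.^_\`|~0-9A-Za-z]+) (${RE_REQUEST_TARGET.source})(?: HTTP\\/1\\.([01]))?$`
);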

@whizzter

  • The RegExp(s) need to be vetted for exponential behavior, since the V8
    regexp engine works by backtracking and this could turn the parser into a
    source of denial-of-service attacks. See this article for details:
    https://swtch.com/~rsc/regexp/regexp1.html

    As an example, on my computer the following regexp takes 23 seconds(!) to
    execute:

    /a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?aaaaaaaaaaaaaaaaaaaaaaaaaaaaa/.exec("aaaaaaaaaaaaaaaaaaaaaaaaaaaaa")

  • Have you tried doing a straight port of the C parser to JS working on
    Buffers? Plain array accesses are quite fast these days, and the main
    gain of the regexp parser compared to the C code is probably avoiding the
    penalty of crossing over from JIT code to a native binding. (AFAIK V8
    regexps are fast since they use the same JIT backend, so regular JS code
    should have a fighting chance of being as fast as well.)

@mscdex
Contributor Author

mscdex commented Apr 18, 2015

@vincentbernat

  1. Right, I recognized this difference in parsing strategy as the second note in my backwards incompatibilities. I am open to suggestions for any improvements of course.
  2. You may be right, I am open to adding more/better benchmarks.

@whizzter

  1. I already noted this in the 3rd bullet point in my final notes.
  2. I started to do a straight port about 1/2 to 3/4 of the way through my initial implementation to see how it would compare, but I did not get very far code-wise. It's quite a large task (larger than creating the one I wrote from scratch IMHO) and I'm not sure I will continue working on the port. If someone else is interested in porting it, it sure would be interesting to compare performance. One of the main goals with this parser is that I wanted to be able to make it both easy to follow and easy to get correct (by making regexps from the RFC grammars).

@rlidwka
Contributor

rlidwka commented Apr 18, 2015

Enforces CRLF for line endings instead of additionally allowing just LF.

That's a bad idea, because it does not allow simple testing of an http server with "netcat".

Does not allow spaces (which are invalid) in header field names.

I'm not sure why this test was included in joyent/http-parser, but we'd better make sure it doesn't break backward compatibility.

HTTPParser.RESPONSE = 1;
module.exports = HTTPParser;

function indexOfCRLF(buf, buflen, offset) {

Contributor Author

@dchusovitin As mentioned in my performance improvement techniques, the custom indexOfCRLF() is faster than buffer.indexOf().

Contributor

Sounds about right. Buffer#indexOf() is a naive implementation ATM.

@jbergstroem
Member

@dcousens keyword being a WG vs all WGs.

@trevnorris
Contributor

A reason I would like this to be held in a separate repo is for the utility. One example is that businesses have asked me how they can inject themselves into the data stream in between the TCP and HTTP parser. They need to do this for a variety of reasons. For example, decrypting cookies. Unfortunately this is impossible to do today without horribly hacking into what should be core internal functionality.

Most of the cases like the one above could be solved by giving devs the ability to use the parser similarly to a transform stream. Then it could just be plugged into the pipeline. Though at that point it seemed to me like it would live more appropriately in its own repo, and io.js would simply bring it in.

@mikeal
Contributor

mikeal commented Jul 8, 2015

+1 to the code living in its own repo.
+1 to publishing the parser as a module with its own semver.
-1 to not bundling and shipping with core; we should treat this similarly to how we treat readable-stream.

I don't think this will get its own WG yet; there aren't enough people actively engaged in working on it, for one thing. Perhaps if this module starts to take on HTTP2 support the contributor base would grow and it would be a good idea for it to get its own WG.

mscdex added a commit to mscdex/io.js that referenced this pull request Jul 9, 2015
See this[1] comment for more details and benchmark results.

Notable changes:
 * Removed all regexps. Only request URLs are not validated
currently.
 * Added globally configurable start line length limits
 * Chunk size parsing no longer line buffers
 * Lines can end in just LF too
 * Faster than the original regexp-based implementation

[1] nodejs#1457 (comment)
@ronkorving
Contributor

One concern: even if faster than the C version, what about GC? Will we see increased GC freezes because of this?

return false;
}

if ((flags & FLAG_CHUNKED) > 0 || this._contentLen !== null)
Contributor

The condition already yields a boolean, so you shouldn't need an if-statement (and potential branch misprediction). Micro-optimization, but I guess that's where you're at at this point :)

return (flags & FLAG_CHUNKED) > 0 || this._contentLen !== null;

@Fishrock123
Contributor

Among other things, @nathan7 noted to me at CascadiaJS that, because of how this uses strings, it would have left us much more vulnerable to the recent utf8 bug.

@mscdex is this reasonable to do with buffers in js?

@mscdex
Contributor Author

mscdex commented Jul 12, 2015

@Fishrock123 I'm not sure I understand how that issue could have been avoided in a js parser like this. At some point there has to be a Buffer-to-string conversion in order to easily work with things like headers, request urls, request methods, response reason text, etc. (not to mention for backwards compatibility).

@edef1c
Contributor

edef1c commented Jul 19, 2015

@mscdex Sure, but that should take place at the last moment, just before the surface API. Everything before the HTTP body in a legit request is pure 7-bit ASCII, so there's no real complication in handling this correctly. (no multi-byte codepoints, etc)
I haven't had the time to review this in full, but I bet there's a whole lot of allocation occurring that's unnecessary. The C version is zero-copy, and the JS bridge allocates strings minimally.

@edef1c
Contributor

edef1c commented Jul 19, 2015

There's also the fact that this retains HTTP_BOTH, which is used nowhere.
It serves only to complicate the code and obscure bugs.

@mscdex
Contributor Author

mscdex commented Jul 19, 2015

@nathan7 Actually request URLs are treated as UTF-8 in node/io.js, since node/io.js utilizes the less-strict parsing option of joyent/http-parser. Also technically HTTP header values are supposed to be latin1, but that's still a single byte encoding.

I'm not sure what you mean about HTTP_BOTH. This JS parser does not implement that because as you said, node/io.js does not use that feature of joyent/http-parser.

@edef1c
Contributor

edef1c commented Jul 19, 2015

Oh, whoops — I misread the diff apparently. Goodbye, HTTP_BOTH 😄

@brendanashworth
Contributor

My thoughts on a separate repo are:

  • we could get cool things like code coverage and automatic Travis CI (we
    don't have to worry about cross-platform compat and weighing down core
    CI; it's all js!)
  • we don't have to run the monolithic test suite for every change, and we
    can use a different testing framework (e.g. tape): faster for changing
    and testing code, easier to organize tests; otherwise we'll end up with
    another thousand-plus-line file
  • it is faster to iterate on changes; landing this parser would be less of
    a massive "one flick" change and instead be reviewed and fully unit
    tested before it lands in core
  • people that want to use it outside of core don't have to require
    internals - they have to deal with whatever design decisions the
    maintainers make (which are core-oriented), but it's supported
  • the parser can eventually get more strict and follow its own semver,
    which syncs with core semver when core feels ready to move forward
  • we don't have to put the parser into one single file for development
  • better benchmarks and profiling!

@Matt-Esch

Is the C parser actually slower or is the boundary the problem?

@mscdex
Contributor Author

mscdex commented Nov 17, 2015

@Matt-Esch I don't recall now. I haven't run any benchmarks since changes were made to the way the http parser receives incoming data from the socket (it reads directly from the socket in C++ land instead of going through JS). It's entirely possible these changes could have made a considerable improvement in the benchmarks.

@trevnorris
Contributor

@Matt-Esch The boundary is the problem. As you can see from my pending PR #3780, I was able to increase the performance of the header parser using some v8 cheats. There is more we can do there as well.

@benjamingr
Member

Status?

@jasnell jasnell added the stalled label Mar 22, 2016
@estliberitas estliberitas force-pushed the master branch 2 times, most recently from 7da4fd4 to c7066fb on April 26, 2016
@benjamingr
Member

I'm going to close this, as there has been no activity here for about half a year. Feel free to reopen.
