Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Undeprecate _stream_wrap #34296

Open
pimterry opened this issue Jul 10, 2020 · 25 comments
Open

Undeprecate _stream_wrap #34296

pimterry opened this issue Jul 10, 2020 · 25 comments
Labels
stream Issues and PRs related to the stream subsystem.

Comments

@pimterry
Copy link
Member

pimterry commented Jul 10, 2020

Is your feature request related to a problem? Please describe.

_stream_wrap was recently deprecated because it "doesn't have a reasonable use outside of node".

I think I have two reasonable use cases, and I'd really like this to be undeprecated. As far as I can tell it's going to exist internally anyway, since the underlying JSStreamSocket implementation is actively used by TLSSocket, it works perfectly, and it already supports my use cases, so this is purely a question of whether it should be accessible externally.

My use cases:

  1. I'm trying to sniff packets to detect HTTP/2 before the http2 server takes over, as described in http2: read from connection stream before session create #17132 (in my case: because I want to support HTTP/1, HTTP/2, HTTPS/1 and HTTPS/2 all on the same port). As described in that issue that's not normally possible, understandably, because sockets are tightly integrated with the stream internals in a way that disallows these kinds of interactions, and changing that has performance implications for the common case. However, this is possible if you provide some psuedo-socket, backed by a separate stream, such as _stream_wrap.

    There are other users with this use case, including the original author of that issue, and every user of httpolyglot, which does this for HTTP/1 (but which can't HTTP/2 using the native http2 module because of this issue). Issues like http2: cannot handle session destroy error emitted by Http2Session.onGoawayData #20999 exist because this isn't available, so people have to implement it externally for themselves.

  2. I'm trying to tunnel HTTP connections through a stream (in this case, an Http2Stream). I'm building a mitm proxy, which accepts h2 CONNECT requests and then locally handles them. That means after the CONNECT I get an h2 stream, which I need to treat as a fresh net.Socket connection. _stream_wrap does this perfectly.

    This is less common, afaik, but it's an example of wanting the networking APIs to be generally more composeable, so that socket-based code can fully interoperate with streams or stream transformations.

Describe the solution you'd like

Make _stream_wrap a public API again, removing the deprecations.

Maybe even remove the underscore too, and make it a 1st class API with docs etc?

Describe alternatives you've considered

The alternative I can see is to reimplement _stream_wrap from scratch externally.

That's not really straightforward, as implementation here requires native code (https://github.com/nodejs/node/blob/master/src/js_stream.cc), which I'd also need to duplicate, and my pipeline to usefully build & distribute all this ends up being pretty complicated, especially to get it working across node versions.

It would also be far less reliable (I'd much rather use the real & well-maintained version of this, especially since it touches lots of node internals), and it's annoying to have to do all this when a fully working implementation is shipping within node anyway.

@Trott Trott added the stream Issues and PRs related to the stream subsystem. label Jul 10, 2020
@Trott
Copy link
Member

Trott commented Jul 10, 2020

@nodejs/streams

@mcollina
Copy link
Member

This is really outside of the streams domain.

@Trott
Copy link
Member

Trott commented Jul 11, 2020

This is really outside of the streams domain.

OK, thanks. In that case, I guess maybe I'll try pinging the people who authored/approved the deprecation in the first place.

cc @sam-github @BridgeAR @mcollina @jasnell @addaleax @tniessen

@pimterry
Copy link
Member Author

Any thoughts on this, pinged people?

Right now I'm still shipping new code dependent on this API, because there's no easy alternative. It'd be really useful to get an idea of whether there's hope that it'll be undeprecated, or whether I seriously need to start looking at the practicality of extracting the existing built-in module as a native node addon.

@mcollina
Copy link
Member

I think it's possible to undeprecate it and make it a public module. However this is significant work: from API design, hardening against user input, docs and tests.

@pimterry Would you like to work on this?

@pimterry
Copy link
Member Author

Yes, I can do that.

I'm very happy to write docs & tests for the existing implementation as I understand it. I can start prepping an initial PR for that straight away, if there's appetite for it.

Is there specific hardening & API design work you have in mind? Happy to work on that if somebody can point me in the direction of outstanding issues there.

@jasnell
Copy link
Member

jasnell commented Jul 21, 2020

No particular opinions on this. If it's useful to userland then ok by me

@mcollina
Copy link
Member

@addaleax wdyt?

@addaleax
Copy link
Member

I don’t see any problem with this – I’d probably put it onto the net module instead of having it be its own public module.

That being said: I’m not sure whether this really helps covering the listed use cases – at least HTTP/2, HTTP/1 and TLS should all accept any kind of Duplex stream as the underlying connection, whether it’s wrapped into a net.Socket or not, because they use _stream_wrap internally to achieve that. Maybe I’m missing something here, though.

@pimterry
Copy link
Member Author

I’d probably put it onto the net module instead of having it be its own public module.

Absolutely, makes sense to me.

I’m not sure whether this really helps covering the listed use cases

For the CONNECT case, you're actually completely right @addaleax, I'm wrong - the stream wrapper isn't required with node's built-in servers. In my case at the time I was using node-spdy for HTTP/2 with Express compatibility. That builds on net.Server directly, and does make strong assumptions that connections are sockets, and does need a wrapper. I suspect that's true for other similar userspace networking libraries.

For the socket-peeking case, this code requires the stream wrapper:

const net = require('net');
const http = require('http');
const http2 = require('http2');
const Wrapper = require('_stream_wrap');

const rawConnListener = (socket) => {
    const data = socket.read(3);

    if (!data) { // Repeat until data is available
        socket.once('readable', () => rawConnListener(socket));
        return;
    }

    // Put the data back, so the real server can handle it:
    socket.unshift(data);

    if (data.toString('ascii') === 'PRI') { // Very dumb preface check
        console.log('emitting h2');
        h2Server.emit('connection', new Wrapper(socket)); // <-- here
    } else {
        console.log('emitting h1');
        h1Server.emit('connection', socket);
    }
}

const rawServer = net.createServer(rawConnListener);

const h1Server = http.createServer(() => {
    console.log('Got h1 request');
});

const h2Server = http2.createServer(async () => {
    console.log('Got h2 request');
});

rawServer.listen(8080);

This creates a net.Server, which peeks at the incoming data, and then directs the connection to a different server depending on the first byte. This is effectively what httpolyglot does, to support HTTP 1 & 2, in plain text & HTTPS, all on one port, and similar to the use case described by another user in #17132.

Testing this with the wrapper:

➜ curl http://localhost:8080 
# Server prints 'emitting h1', 'Got h1 request'

➜ curl --http2-prior-knowledge http://localhost:8080
# Server prints 'emitting h2', 'Got h2 request'

Without the wrapper:

➜ curl http://localhost:8080 
# Server prints 'emitting h1', 'Got h1 request'

➜ curl --http2-prior-knowledge http://localhost:8080
curl: (16) Error in the HTTP2 framing layer # <- curl fails
# Server prints 'emitting h2', never prints anything else

This case is not handled by the support in the built-in libraries because the stream is actually a socket, so the servers never wrap it. We need the wrapper here to force the server to treat it as a plain stream, because the user code wants to use it as a stream too.

@mcollina
Copy link
Member

@pimterry is that the only use case? If so it might be better to just expose that somehow. I think it'd be a pretty nice feature to have.

@addaleax
Copy link
Member

Okay, two things:

  • node-spdy is doing things that Node.js cannot reasonably be considered supportable by us, and, from my experience, breaks quite frequently after Node.js updates.
  • HTTP/2 should support h2Server.emit('connection', socket), just like HTTP/1 does. I think the main problem here is the unshifted data? That’s something that we can (and should) take care of, yes.

@addaleax
Copy link
Member

i.e.: I think this issue might be an X-Y problem, and all that might be needed here are bug fixes in order to account for the above use case.

@pimterry
Copy link
Member Author

node-spdy is doing things that Node.js cannot reasonably be considered supportable by us, and, from my experience, breaks quite frequently after Node.js updates.

I'm not using node-spdy (it is indeed quite broken in places right now), so that fine by me.

That said, I can imagine cases of other libraries that expect sockets, where as a user it'd be useful to be able to provide a fake socket from a stream instead, or where the library itself wants to support both easily.

For example, if you wanted to implement any other application protocol server on top of TCP in userspace, with similar behaviour to the built-in http/http2/tls servers, then that requires this stream wrapper, for the exact same reasons that the built-in implementations use it. It seems a shame to keep a working implementation of such a feature internal-only.

I'm personally not doing that though, so happy to ignore that case until somebody who specifically needs it appears, if that's preferable.

HTTP/2 should support h2Server.emit('connection', socket), just like HTTP/1 does. I think the main problem here is the unshifted data? That’s something that we can (and should) take care of, yes.

The conclusion of #17132 seemed to suggest that this is not something the core implementation would support directly. My impression is that emitting streams manually and similar tricks works, but isn't officially encouraged, to focus the servers on keeping common socket case simple & fast instead.

If that can be fully supported though then that's great! That'd be extremely useful from my pov. I do a lot of that, and fixing this issue specifically would indeed solve my problem.

@addaleax
Copy link
Member

My impression is that emitting streams manually and similar tricks works, but isn't officially encouraged, to focus the servers on keeping common socket case simple & fast instead.

I do consider it supported (and it’s tested). Maybe the documentation could be a bit clearer on this as well 👍

@pimterry
Copy link
Member Author

I do consider it supported (and it’s tested). Maybe the documentation could be a bit clearer on this as well 👍

Ok, I've opened a PR to improve this somewhat: #34531

And a separate bug for the issue with emitting sockets on an HTTP/2 server: #34532

Independently of those, I think there are still some arguments for supporting _stream_wrap, so I'll leave this issue open. I now don't have a specific case where I personally need it though other than working around bugs (both the new #34532 and existing #29902), so I'm happy for this to be closed if there's consensus that it isn't worthwhile.

@vinsonchuong
Copy link

Hi @addaleax! I've been trying to follow along with this issue. I noticed that you've worked on this in #34958 and #35678.

I've been trying to re-emit the socket (h2c.emit('connection', socket)) to a HTTP/2 cleartext server.

In v15.13.0, I'm currently getting:

Error [ERR_HTTP2_SESSION_ERROR]: Session closed with error code 2

So, I'm still forced to wrap the socket (h2c.emit('connection', new JSStreamSocket(socket))).

Am I mistaken that the above pull requests were intended to enable re-emitting the socket without needing to wrap it?

@addaleax
Copy link
Member

addaleax commented Apr 6, 2021

Am I mistaken that the above pull requests were intended to enable re-emitting the socket without needing to wrap it?

You are not mistaken, no. There’s also an explicit test that should verify that this actually works: https://github.com/nodejs/node/blob/e38d62a8c9e3494a464c0de6218178329f245440/test/parallel/test-http2-generic-streams.js

So maybe you could open a separate ticket for your specific example?

@pimterry
Copy link
Member Author

Am I mistaken that the above pull requests were intended to enable re-emitting the socket without needing to wrap it?

You are not mistaken [...] So maybe you could open a separate ticket for your specific example?

@addaleax @vinsonchuong There is still at least one bug with this that wasn't yet fixed, and there's already an open issue for it here: #34532

There's a full repro there, which still fails, but passes if a socket wrapper is manually added before emitting the connection. I've just tested and that still fails in the same way in v16.8.0.

The difference with the test you've referenced is that this example reads some data from the socket and then unshifts it, before emitting the connection. As discussed above though, that should be supported in theory, and it does work fine with HTTP/1 servers.

I've just run into this myself because I've hit some fun new internal assertion errors that get thrown when manually mixing stream wrappers with HTTP/2 from Node v16.7.0 onwards. I'll file a proper issue when I've traced that down (clues welcome!) but I'd love to see #34532 fixed somehow so that I don't need the wrapper in the first place. I'm not really familiar with the relevant internals here, but I'm happy to help too if you can point me in the right direction.

@pimterry
Copy link
Member Author

A quick update: this is still an ongoing issue, and some popular NPM packages like http2-wrapper (3.5 million installs a week) now ship with hacks that depend on internal Node.js fields to get access to the _stream_wrap constructor without triggering this deprecation warning.

They do that because it's the only way to support some HTTP/2 use cases right now given various HTTP/2 issues, most notably #34532, but there's a larger list that appears related in szmarczak/http2-wrapper#71 (@szmarczak do you know exactly which other issues are currently worked around by _stream_wrap?).

Is there anything we can do to get #34532 fixed? I'm very out of my depth in the HTTP/2 internals, but I'm happy to investigate if somebody can give me some pointers in the right direction to get started. Alternatively, undeprecating this module for a while longer so that essential workarounds for these issues don't trigger warnings would be very very helpful and would discourage packages from depending on more fragile Node.js internals instead.

@mcollina
Copy link
Member

I don't think we should or will undeprecate streamwrap. Instead, we should try to encourage people to stop using it. It's an internal API and it could/should change.

Unfortunately this comes down to somebody stepping up and develop that feature or funding somebody to do it for them.

The fact that no one has worked on this implies there is no one in the Node maintainers that are interested in this topic.

@pimterry
Copy link
Member Author

Unfortunately this comes down to somebody stepping up and develop that feature or funding somebody to do it for them.

The fact that no one has worked on this implies there is no one in the Node maintainers that are interested in this topic.

To be clear, I am very happy to do all the work necessary to undeprecate this (i.e. expose it somewhere sensible, write docs & tests). I offered to do so in the discussion further above, that's not the problem.

The issue is whether such a change would be accepted if I do that work, because it seems like the answer is no.

we should try to encourage people to stop using it

Right now, it's impossible to avoid using it for quite a few real-world HTTP/2 use cases. Those include custom connection sniffing for HTTP/2, and any use of some APIs like Docker BuildKit (details), because of current HTTP/2 API bugs/limitations.

I am sympathetic to the argument that we should fix those instead! That would be great. I agree that other than these issues, StreamWrap is not a strong candidate for a official node API. I think there are some non-bug-related real use cases - e.g. to let you use a duplex stream with any APIs (in Node itself or in any packages elsewhere) that specifically expect a net.Socket - but that's fairly niche and not super important. If it wasn't for the fact that Node already contains exactly the implementation required for this, so it's extremely easy to do, then there's no way it'd be a good candidate for inclusion.

Still, the current situation is causing real problems, and deprecating it right now is making this worse by pushing packages to tightly depend on node's internals instead.

For now, I'll dig into the HTTP/2 internals a bit tomorrow and see if I can work out what's causing #34532 - fixing that would make this much less of an issue, and might show a overall path towards resolving this properly once and for all. Any pointers on where to look would be very welcome!

@pimterry
Copy link
Member Author

It turns out #34532 was indeed fixed by @addaleax's PR (#34958), but that PR was never merged. #35678 was merged to fix this instead, but that actually only solves the issue for the HTTP/2 client case, not for servers.

I've combined the two to fix this for HTTP/2 servers as well in a new PR: #41185

I'm doing some further testing now to see if this covers other HTTP/2 use cases for wraps (#29902), but if this does resolve all the issues so that _stream_wrap is no longer necessary then of course I'm much happier with deprecating it!

@szmarczak if you've got a list of issues that require the stream wrapper in http2-wrapper and/or Got today which you can share, that would be really useful to confirm this 🙂.

@szmarczak
Copy link
Member

szmarczak commented Dec 20, 2021

The only use case I see here is to mimic a TCP stream (I do that in http2-wrapper) 🤔 However I agree with @mcollina that we should not expose it and wait for a better solution.

@pimterry
Copy link
Member Author

The only use case I see here is to mimic a TCP stream

@szmarczak yep, absolutely, but the idea is that you shouldn't ever need to do that. In theory, all the relevant node APIs that take sockets also already accept duplex streams and wrap them automatically.

If you completely stop using JS stream socket (here) in http2-wrapper today, what breaks?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stream Issues and PRs related to the stream subsystem.
Projects
None yet
Development

No branches or pull requests

7 participants