
migrate away from request #4655

Closed

chris48s opened this issue Feb 14, 2020 · 22 comments · Fixed by #7319
Labels
dependencies Related to dependency updates

Comments

@chris48s
Member

📋 Description

Request is no longer maintained:

https://github.com/request/request#deprecated

As of Feb 11th 2020, request is fully deprecated. No new changes are expected to land. In fact, none have landed for some time.

We need to switch to a different library. This is going to be an absolutely monstrous job.

@calebcartwright you mentioned in #4644 that there has been some discussion around this already. I couldn't find an issue anywhere. Do you have any kind of summary of where we've got to on this? I'm aware it's somewhat complex: our use of the library is quite involved, there are many other libraries we could potentially migrate to, and none are an obvious like-for-like replacement for request.

@chris48s chris48s added the dependencies Related to dependency updates label Feb 14, 2020
@calebcartwright
Member

@calebcartwright you mentioned in #4644 that there has been some discussion around this already. I couldn't find an issue anywhere.

I believe it was in one of the private channels on Discord; I'll try to dig it up. IIRC we didn't go into much detail, just that it'd need to be replaced and that we'd have to keep in mind self-hosting Shields server users, who currently benefit from request's ability to auto-detect and honor the standard proxy env vars (something I believe is missing from most request alternatives).

@paulmelnikow
Member

This is related to #3368, but yea, I remember @calebcartwright mentioning the limitations in proxy support in other HTTP clients. That was a while ago. I wonder if there is a solution for any of those now.

Should we merge these two issues?

@chris48s
Member Author

chris48s commented Apr 5, 2020

Should we merge these two issues?

Yeah sorry. They are probably the same thing.

@paulmelnikow
Member

It seems like we should open a new issue which combines the scope of both. Request being unmaintained is making this more urgent than it was before, and deserves high billing, although the first order of business is probably to test the effectiveness of that memory cache.

@chris48s
Member Author

We talked a bit the other day about some of the mechanics of this (e.g. an adaptor layer that allows our service code to use request-style params with another library), but a very major decision we need to make is which library we're going to migrate to. While it's on my mind, I'll post this link to discussion of possible alternatives as a starting point, but in making this choice we also need to try and work out what our requirements even are (what is the current subset of the request feature set we depend on?)

@calebcartwright
Member

In my experience, the main differentiator with request that has folks still using it to this day over alternatives is the auto-detection and honoring of the standard proxy env vars (http_proxy, https_proxy, no_proxy). The other client libs require explicit in-code proxy specification and/or additional libraries to get the same behavior.

Proxy support is an important requirement for self-hosting Shields, though admittedly not relevant for the Shields.io service.

FWIW at this point I expect most (if not all) of the prominent alternative client libs will have the core features we'd need, and I'm guessing it will likely boil down to ergonomics and ease of request-replacement in our codebase.

@chris48s
Member Author

chris48s commented Jan 8, 2021

We talked about this again in the ops meeting today.
One of the blockers to this is that we've failed to make a decision on which library we're going to use and basically the outstanding issue is proxying..

Just re-reading the thread, basically the deal is: request configures proxying based on the http_proxy/https_proxy/no_proxy env vars. If we go to another library (let's say got for the sake of argument) we have to explicitly configure proxying: https://github.com/sindresorhus/got#proxies

So it sounds like the options if we move to another library are:

  • We say "breaking change for self-hosting users: as of version X* you can't use http_proxy/https_proxy/no_proxy any more; you have to do [something else]" (TODO: what is [something else]?)
  • We read values from http_proxy/https_proxy/no_proxy ourselves and use them to optionally configure hpagent (or whatever) because got can't do it for us, i.e. we have to re-implement a feature that exists in request, but we could avoid a breaking change for users that rely on it (a rough sketch of this option follows below).

*let's abstract away the fact that we don't have a good method for doing this at the moment and assume it is covered under #5866
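For context, a rough sketch of what the second option could look like (illustrative only; the exact hpagent/got option names would need checking against their docs):

```js
// Sketch of option 2: read the standard env vars ourselves and wire up
// proxy agents for got, since got won't pick them up automatically.
const got = require('got')
const { HttpProxyAgent, HttpsProxyAgent } = require('hpagent')

const httpProxy = process.env.http_proxy || process.env.HTTP_PROXY
const httpsProxy = process.env.https_proxy || process.env.HTTPS_PROXY

const agent = {}
if (httpProxy) agent.http = new HttpProxyAgent({ proxy: httpProxy })
if (httpsProxy) agent.https = new HttpsProxyAgent({ proxy: httpsProxy })

// no_proxy handling would still have to be implemented by hand,
// e.g. by skipping the agent for hosts matched by the no_proxy list.
const client = got.extend({ agent })
```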

@calebcartwright do you have any input on either of those options? If you can flesh either of those things out a bit, great - that would help. If not, even confirming that is useful. We will at least know what the next things we need to research are..

@calebcartwright
Member

@calebcartwright do you have any input on either of those options?

I've become pretty partial to got personally, and given that we already make use of it within this project I'd lean towards trying to make that work as the replacement for request.

The only thing I feel strongly about is that we need to maintain the ability for self-hosted instances to work with proxies/network-restricted environments; otherwise a core use case for self-hosting is largely defeated. I don't recall specifics offhand, but there are cases of libs (and/or their internal deps which support proxies) that utilize env vars for configuration, just not the standard ones (e.g. a lib won't pick up http_proxy but instead looks for something like my_awesome_lib_http_proxy). I'd be fine with a posted breaking change that basically requires users to incorporate new env vars in their deployment. This of course depends on the selection of the client lib replacement.

If we end up using a replacement that doesn't have any auto detection, then yes I'd think either adding some new configuration options we support and/or auto mapping from the standard vars would be a viable option.

I'd suggest we select the request replacement based on what will work best for Shields, provided said replacement can be configured to support proxies, and then just take the work on the proxy piece forward based on the contexts and constraints of the replacement we're targeting.

@calebcartwright
Member

For example, got will work with global-agent, which can pick up custom env vars:

https://github.com/gajus/global-agent#environment-variables

Or, with one minor additional env var, it can be configured to pick up the standard vars:

https://github.com/gajus/global-agent#what-is-the-reason-global-agentbootstrap-does-not-use-http_proxy
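Concretely, that would look roughly like this (a sketch based on global-agent's README, not something tested against our setup):

```js
// Sketch: bootstrap global-agent before any HTTP requests are made.
// By default it reads GLOBAL_AGENT_HTTP_PROXY / GLOBAL_AGENT_NO_PROXY;
// setting GLOBAL_AGENT_ENVIRONMENT_VARIABLE_NAMESPACE='' in the deployment
// makes it honour the standard HTTP_PROXY / NO_PROXY vars instead.
require('global-agent/bootstrap')

const got = require('got')

// Requests made via node's http/https internals (which got uses under the
// hood) are now routed through the configured proxy automatically.
got('https://example.com').then(res => console.log(res.statusCode))
```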

@chris48s
Member Author

Good shout on global-agent. That sounds like it would resolve the one outstanding issue.

@chris48s
Member Author

chris48s commented Mar 9, 2021

We've now merged and deployed #6160 which is step one of the process (and probably now means we are making the majority of our http requests with got rather than request).

The next step is to migrate the rest of the places we are using request to use got as well before we remove the compatibility layer completely.

The other places we use request are:

Some of those aren't a big deal. Some of them will be harder as they also require moving from a callback-based approach to async/await. It should be possible to work on each of those in isolation.

As I say in #6160 (comment):
Where we are calling request within the services code, I think it is worth keeping the request options and running them through requestOptions2GotOptions() so all the services continue to use the same API during the migration, but where we're just making a "standalone" call, there's no value in that. For example, in influx-metrics.js we might as well just go straight to calling got directly.
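For illustration, a direct got call might look roughly like this (the function and parameters below are hypothetical, not the actual influx-metrics.js code):

```js
// Hypothetical sketch of a "standalone" call made with got directly,
// skipping the request-style options adapter entirely.
const got = require('got')

async function pushMetrics(url, payload, { username, password }) {
  // got handles basic auth and JSON serialisation natively,
  // so there is no need for requestOptions2GotOptions() here.
  await got.post(url, {
    username,
    password,
    json: payload,
    timeout: { request: 10000 },
  })
}
```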

@chris48s
Member Author

OK then. I've spent a bit of time thinking about this and playing with things. Aside from request and got I think there are really 3 other libs in the Node.js ecosystem worth looking at: axios, node-fetch and superagent.

Obviously one of the things we want to optimise for is performance, but it isn't the only thing that is important (if it was, we'd just stick with request forever). We also want a project which is actively maintained and some kind of reasonable expectation that this thing is still going to be around in the future. As we're going through this process it is clear that changing our HTTP client is a big job. If we pick something that is memory efficient but then end up going through this same process a year or two down the line, that is counter-productive.

I did a bit of testing making some calls with various HTTP clients and process.memoryUsage().heapUsed dotted about. It is hard to compare like-for-like. I'm not sure how scientific it was and I don't think we're really going to find out how each lib performs with a real production workload without putting a real production workload through it. Those caveats aside, based on those tests I think axios is going to perform similarly to got but node-fetch and superagent will probably be somewhere in the middle of request and got. So I've discounted axios. Node-fetch and superagent are probably the two worth thinking about more. I think switching to any other library will result in an increase in memory usage compared to request.
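A very rough sketch of the kind of heapUsed spot-check described above (not the actual benchmark; numbers from something this crude are indicative at best):

```js
// Make n requests with a given client and report the change in heap usage.
async function measure(label, makeRequest, n = 500) {
  if (global.gc) global.gc() // run node with --expose-gc for a cleaner baseline
  const before = process.memoryUsage().heapUsed
  for (let i = 0; i < n; i++) {
    await makeRequest()
  }
  const after = process.memoryUsage().heapUsed
  console.log(`${label}: heapUsed delta ${((after - before) / 1024 / 1024).toFixed(1)} MB`)
}

// e.g. comparing two clients against the same endpoint
const got = require('got')
const fetch = require('node-fetch')
measure('got', () => got('https://example.com'))
  .then(() => measure('node-fetch', () => fetch('https://example.com').then(r => r.text())))
  .catch(console.error)
```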

In terms of picking, there are a few considerations:

  • Node-fetch is more widely used than superagent (~20mil weekly downloads vs ~4mil weekly downloads)
  • Superagent's interface is an outlier: it uses a fluent interface with lots of chained function calls rather than a config object like got/request/node-fetch/axios, so it probably maps less well onto our plan to keep request-like options for now while swapping in a new library underneath. Not an insurmountable obstacle, but it's more of a gap to bridge than got was or node-fetch would be.
  • Node-fetch is a pretty lightweight library. In some ways this is an advantage given we are chasing performance. The flip-side is this also means it lacks some nice features that exist in other clients. To give a couple of examples: It doesn't provide an abstraction for HTTP basic auth - if you want to use that you do it by assembling an Authorization header yourself. It doesn't provide an abstraction for querystrings. You have to build your querystring with URLSearchParams and append it to the URL yourself. This stuff is probably all do-able but requires a bit more effort to implement than a more "full-featured" client. There is a risk there might be something we need that it doesn't do and we won't find out until we need it.
  • Node-fetch basically mimics the interface of the browser standard window.fetch with some extensions. This is all a bit speculative, but there are also some rumblings about implementing fetch() natively in node core. More details at Implement window.fetch into core nodejs/node#19393 but tl;dr is the maintainers are fine with adding it in principle and some progress is being made (but slowly). If an implementation of fetch() lands in node core I suspect that probably doesn't mean the end of all third-party HTTP clients like got/axios/superagent but I wouldn't be surprised if it was the end of node-fetch so there's a risk of ending up on another abandoned library there. That said, given they'd both be different implementations of the same interface I'd also hope migrating from node-fetch to er... node fetch() wouldn't be too hard, whereas if we wanted to migrate to a native node fetch() implementation from superagent that would be another huge migration job as the interfaces are so different. So if we envisage things going in that direction, node-fetch is probably a good intermediate step. I guess it could go either way..
  • One interesting thing about node-fetch in contrast to other clients is that it treats everything as a stream/promise, so to compare superagent and node-fetch:
```js
// superagent: the body has already been read by the time the request promise resolves
const superagent = require('superagent')
const data = await superagent.get('https://google.co.uk')
const body = data.text
```

```js
// node-fetch: the response resolves first; reading the body is a second promise
const fetch = require('node-fetch')
const data = await fetch('https://google.co.uk')
const body = await data.text() // note we have to await this
```

I suspect this doesn't really help us much performance-wise, because in 99% of cases we need to evaluate the body, so in practice it probably just means we have to make more code changes to try it. This might allow us to squeeze a tiny bit of extra performance when we are serving an error response (because we only care about the status code, I think) but we might need to make some changes to unlock that.
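A tiny sketch of that "skip the body on errors" idea (illustrative only, not code from our services):

```js
const fetch = require('node-fetch')

async function fetchJson(url) {
  const res = await fetch(url)
  if (!res.ok) {
    // For an error badge we may only need the status code,
    // so we can avoid reading and parsing the body entirely.
    throw new Error(`Request failed with status ${res.status}`)
  }
  return res.json()
}
```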

I think I am leaning slightly more towards trying #6160 again with node-fetch but if anyone else has any points to add, please do.. I'll probably give it a few days before diving into this again.

@calebcartwright
Member

Will be the one to ask the dreaded question.. any difference between those two in terms of proxy support? 😆

Would it even potentially be worthwhile to consider using the stdlib modules directly? I know they aren't exactly known for ease of use and great ergonomics, but I feel like the codebase is structured well in terms of sufficient encapsulation that I wouldn't rule the stdlib out out of hand, especially given the performance emphasis.

Between superagent and node-fetch specifically, I think I'd agree with you on node-fetch being the better candidate for a next test.

@chris48s
Member Author

Will be the one to ask the dreaded question.. any difference between those two in terms of proxy support?

When it comes to proxying, it is the same story for node-fetch and superagent really, which is in turn pretty much the same story as got.
Neither explicitly implements support for the proxy env vars, but we should be able to use global-agent.
The difference with superagent/node-fetch is that, although they should work with global-agent, global-agent doesn't explicitly test against those clients: https://github.com/gajus/global-agent#supported-libraries
so I guess we've just got to try it, but my plan for proxy implementation on our side will be identical to what I did for got.

Would it even potentially be worthwhile to consider using the stdlib modules directly?

I think probably not. I've not really evaluated this option but if we go down that route, we're going to end up writing some kind of abstraction (however thin) over the node core HTTP libs, at which point we're essentially making our own version of got/superagent/node-fetch (admittedly focussing on only a small subset of the features), only it will be worse and maintained by us. I think when it comes to this kind of low-level abstraction there is a huge benefit in using something off-the-shelf instead of home-brewing our own solution. (See ScoutCamp for more details.) Sure, we are trying to serve a lot of traffic on modest infrastructure, but I don't think we are enough of a special unique unicorn that we need to write our own HTTP abstraction for it, and I don't really think we'd make a better job of it than any of the big popular projects.

@calebcartwright
Member

but my plan for proxy implementation on our side will be identical to what I did for got.

👍 Happy to test this again, just lmk

I think probably not.

Fair enough. I'm not actively pushing for it, and am very much hoping that node-fetch will solve our needs. I know I wouldn't have sufficient bandwidth and motivation to do any kind of research/eval on using the stdlib approach anyway 😄.

From the sound of it, though, I'm probably a bit more open to that possibility. However, we're fully aligned on the plan to go after node-fetch, so it wouldn't make much sense for me to articulate why at this time; we can always circle back to it if we end up needing to.

@chris48s
Member Author

Not looked at this in a while, but it is still on my mind.

@paulmelnikow suggested looking at https://github.com/nodejs/undici - it is an interesting option, and it is possible the fetch implementation that eventually lands in core may end up looking like https://github.com/Ethan-Arrowood/undici-fetch (backed by https://github.com/nodejs/undici ). There's a few mins on this at the end of https://www.youtube.com/watch?v=fcj2Z9dL060

The thing with undici (having done a small amount of reading) is that it doesn't use node's (current) HTTP stack internally at all - it's a completely from-the-ground-up implementation. This means that nock doesn't work with it and neither will global-agent (nor any other packages that rely on your HTTP client being an abstraction over Node's HTTP internals). Given it doesn't (yet) have support for mocking (nodejs/undici#531), that's a blocker for us at the moment, plus we'd have to roll our own proxy configuration.

I think that is still leaning me towards node-fetch at the moment (again viewing it as a possible 'stepping stone' to a native fetch implementation) but it is worth keeping one eye on undici as it develops..

@ronag

ronag commented May 7, 2021

Coming from undici by following the ping. We are about to release undici@4 within a week or two, which does support mocking.
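For reference, undici's interception-based mocking looks roughly like this (a sketch based on the undici@4 docs; the exact API should be double-checked against a released version):

```js
const { MockAgent, setGlobalDispatcher, request } = require('undici')

const mockAgent = new MockAgent()
mockAgent.disableNetConnect() // fail fast on unmocked requests, much like nock
setGlobalDispatcher(mockAgent)

// Intercept a specific origin/path and reply with a canned response.
mockAgent
  .get('https://example.com')
  .intercept({ path: '/status', method: 'GET' })
  .reply(200, '{"ok":true}')

request('https://example.com/status').then(async ({ statusCode, body }) => {
  let data = ''
  for await (const chunk of body) data += chunk
  console.log(statusCode, data)
})
```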

@chris48s
Member Author

I've been having a slightly more detailed think about node-fetch and the featureset we need from it.

In terms of the request/got options we are currently using (a rough mapping sketch follows the list):

  • body: Compatible, but I think we're responsible for JSON.stringify()-ing it ourselves if the body is JSON.
  • form: There is no 'shortcut' for this. We have to assemble the body ourselves with new FormData().
  • headers: Compatible
  • method: Compatible
  • url: N/A - you can't pass it in the options object, but that doesn't matter
  • qs: We're responsible for building it with URLSearchParams and appending it to URL ourselves
  • gzip: Translate to compress
  • auth: We're responsible for building an Authorization header ourselves
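A rough sketch of what that mapping might look like with node-fetch (the function and option names below are illustrative, not the actual adapter):

```js
// Illustrative only: translating request-style options into a node-fetch call.
const fetch = require('node-fetch')

async function fetchWithRequestStyleOptions({ url, method, headers = {}, qs, body, auth, gzip }) {
  const target = new URL(url)
  if (qs) {
    // request's `qs` -> build the query string ourselves
    target.search = new URLSearchParams(qs).toString()
  }
  if (auth) {
    // request's `auth` -> assemble the Authorization header ourselves
    const token = Buffer.from(`${auth.user}:${auth.pass}`).toString('base64')
    headers.Authorization = `Basic ${token}`
  }
  return fetch(target.toString(), {
    method,
    headers,
    compress: gzip, // request's `gzip` -> node-fetch's `compress` (note the defaults differ)
    body: body && typeof body !== 'string' ? JSON.stringify(body) : body,
  })
}
```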

So all that is do-able (although some of those cases need a bit of work) but here's the thing it doesn't have:

  • strictSSL: Node-fetch doesn't have the ability to disable SSL checking. We do have some options here - see https://newbedev.com/node-fetch-disable-ssl-verification. We should not use process.env.NODE_TLS_REJECT_UNAUTHORIZED = "0" because it applies to the whole (single-threaded) node process, disabling verification for every request. Using a custom Agent should work, but I'm not sure this would be compatible with using global-agent (which also uses a custom agent) for proxying. I'm not sure if they will "stack". (A sketch of the custom-agent approach is at the end of this comment.)

There are currently only two services using this - coverity and jenkins - but it is a non-zero number.

I think the first thing to do on this is to research/test the combination of node-fetch + a custom agent with rejectUnauthorized: false + global-agent proxying before trying to implement the rest of it.
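A minimal sketch of the custom-agent approach (whether this composes with global-agent's proxy agent is exactly the open question above):

```js
// Sketch only: disabling certificate verification for a single request via a
// custom agent, rather than the process-wide NODE_TLS_REJECT_UNAUTHORIZED.
const https = require('https')
const fetch = require('node-fetch')

const insecureAgent = new https.Agent({ rejectUnauthorized: false })

async function fetchInsecure(url) {
  // Passing an agent here overrides whatever global agent is in place,
  // which is why it's unclear whether this stacks with global-agent's proxying.
  return fetch(url, { agent: insecureAgent })
}
```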

@chris48s
Member Author

I started looking at the strictSSL/proxy thing and submitted #6886 and #6887. Maybe we could actually get away with ignoring this issue completely and say strictSSL is a hard requirement moving forwards if we go with node-fetch..

@chris48s
Member Author

This has been dragging on for a bit, but having tried out both got and node-fetch with a production workload and evaluated both the performance and the library ergonomics, the conclusion is to go with got. Once #7175 is merged, I will continue to pick through the remaining instances of request, replacing them with got.

@chris48s
Member Author

chris48s commented Nov 2, 2021

Keeping track of the instances in #4655 (comment) and crossing them off as I do them. 2 more to go. As I'm working on this I realise there are quite a few more tasks. This is mainly for my own reference:

  • I think we can now probably completely delete a fair chunk of https://github.com/badges/shields/blob/master/core/base-service/legacy-request-handler.js 🤞
  • Docs: need to update
    4. At this point the situation gets gnarly and hard to follow. For the
    purpose of initialization, suffice it to say that `camp.route` invokes a
    callback with the four parameters `( queryParams, match, end, ask )` which
    is created in a legacy helper function in
    [`legacy-request-handler.js`][legacy-request-handler]. This callback
    delegates to a callback in `BaseService.register` with four different
    parameters `( queryParams, match, sendBadge, request )`, which
    then runs `BaseService.invoke`. `BaseService.invoke` instantiates the
    service and runs `BaseService#handle`.
    and
    2. Scoutcamp invokes a callback with the four parameters:
    `( queryParams, match, end, ask )`. This callback is defined in
    [`legacy-request-handler`][legacy-request-handler]. A timeout is set to
    handle unresponsive service code and the next callback is invoked: the
    legacy handler function.
    3. The legacy handler function receives
    `( queryParams, match, sendBadge, request )`. Its job is to extract data
    from the regex `match` and `queryParams`, invoke `request` to fetch
    whatever data it needs, and then invoke `sendBadge` with the result.
  • Final bits of cleanup once request is completely removed: PR Remove requestOptions2GotOptions compatibility layer #7270
    • Remove requestOptions2GotOptions and switch everything to assume it is talking to got options natively rather than through an adapter
    • Remove request as a dependency
    • Update all the places in the docs and docstrings where we are referring to request (e.g:
      * @param {object} [attrs.options={}] Options to pass to request. See
      * [documentation](https://github.com/request/request#requestoptions-callback)
      and
      - `_requestJson()` uses [request](https://github.com/request/request) to perform the HTTP request. Options can be passed to request, including method, query string, and headers. If headers are provided they will override the ones automatically set by `_requestJson()`. There is no need to specify json, as the JSON parsing is handled by `_requestJson()`. See the `request` docs for [supported options](https://github.com/request/request#requestoptions-callback).
      etc )
  • sendAndCacheRequest: fetcher, // TODO: rename sendAndCacheRequest
    PR rename sendAndCacheRequest #7277
  • Follow up on Continue request --> got migration; affects [travis wordpress myget jenkins-plugin node nuget] #7215 (comment)

@chris48s
Member Author

chris48s commented Feb 2, 2022

I suspect we are a long way from this feature reaching stability in an LTS node version, but the first commit adding an experimental undici-based global fetch() landed in node a couple of days ago: nodejs/node@6ec2253 😇
