Skip to content
This repository has been archived by the owner on Apr 22, 2023. It is now read-only.

dns.lookup shares a thread pool with disk operations #2868

Closed
ntoshev opened this issue Mar 5, 2012 · 19 comments
Closed

dns.lookup shares a thread pool with disk operations #2868

ntoshev opened this issue Mar 5, 2012 · 19 comments
Labels

Comments

@ntoshev
Copy link

ntoshev commented Mar 5, 2012

If you generate a lot of DNS requests, they get queued and become slow. Disk I/O gets in the same queue so it is slowed down as well.

@Southern
Copy link

When you say "disk I/O", are you referring to fs or the actual I/O of Node?

If you're referring to fs, the problem is that both fs and dns both use process.nextTick() callbacks. A simple solution to avoid this issue is to use child_process or cluster to create a couple of threads to handle the massive amounts of requests you require.

@thejh
Copy link

thejh commented Mar 11, 2012

Am Samstag, den 10.03.2012, 17:55 -0800 schrieb Colton Baker:

A simple solution to avoid this issue is to use child_process or cluster to create a couple of threads to handle the massive amounts of requests you require.

That's not "threads", it's processes. And, uhm, this really seems like a
bad response. It's like "sorry, yes, the cooling system has a small
leak, but if you pour in a few litres of water each hour, it doesn't
blow up".

@Southern
Copy link

@thejh Okay. Maybe using "threads" wasn't the best word to choose. Should have known someone would be a technical brat. Regardless of whether child_process and cluster are true threads, each has their own process variable. Thus the callbacks for each process.nextTick should function independently. This eliminates the need to "revive" code that isn't broken in the first place.

This is not a Node error. It is a user error. You act as if this wouldn't happen in another coding language where a lot of requests were made back-to-back, using the same queued callback for everything.

By the way, where was your suggestion on how to fix this? If you're here to TroLLz, can I play too?

@ntoshev
Copy link
Author

ntoshev commented Mar 12, 2012

I'm referring to fs. See this thread for context:

http://groups.google.com/group/nodejs/browse_thread/thread/6f564ac1dcd650bf/100bc657172b88e2

I solved my own problem using dns.resolve4 and then handing the IP to an http request, the performance difference is amazing. I also contributed a patch to the fetch module (https://github.com/andris9/fetch) doing that. I'm a little surprised no other http client library implements this. It would be the bottleneck of many network-bound projects.

@thejh
Copy link

thejh commented Mar 12, 2012

"Colton Baker" reply@reply.github.com schrieb:

@thejh Okay. Maybe using "threads" wasn't the best word to choose.
Should have known someone would be a technical brat. Regardless of
whether child_process and cluster are true threads, each has their
own process variable. Thus the callbacks for each process.nextTick
should function independently. This eliminates the need to "revive"
code that isn't broken in the first place.

This is not a Node error. It is a user error. You act as if this
wouldn't happen in another coding language where a lot of requests were
made back-to-back, using the same queued callback for everything.

By the way, where was your suggestion on how to fix this? If you're
here to TroLLz, can I play too?


Reply to this email directly or view it on GitHub:
#2868 (comment)

I don't mind you contributing... Uhm... half-constructive comments, too. :D

No, seriously - IMO, it'd be good to be able to reserve IO threads for specific node domains, but I don't have code for that as it's not related to any problems I have and I'm a bit lazy. Also, well, we don't have domains yet, so it won't work anyway.

@bnoordhuis
Copy link
Member

To clear up any misunderstandings, what happens here is that getaddrinfo() DNS lookups and async file I/O share the same thread pool in C land. Said thread pool has limited concurrency so if DNS lookups are slow because the nameserver is down (to name something), that also slows down file ops. It's a libuv/libeio issue.

@arnaud-lb
Copy link

This is a very serious issue, as queuing only 4 DNS queries will delay everything else using the thread pool.

Considering that a DNS query can take more than 20 seconds with default timeout/retry settings, if the thread pool queue is filled with DNS queries, everything can be delayed for minutes or even hours.

@jorangreef
Copy link

Are there any blockers to Node using dns.resolve internally for net.connect etc? And leaving dns.lookup for those that need it?

@bnoordhuis
Copy link
Member

I've responded to your question on the mailing list. For posterity:

Is it really necessary that Node's net.connect etc. still use dns.lookup
internally?

Yes, for several reasons. For example, c-ares doesn't speak mDNS and
probably never will, its notion of what the upstream DNS servers are
and in what order they should be tried can be wildly at odds with the
system resolver, and so on.

tl;dr net.connect() and friends won't be switching to dns.resolve() anytime soon.

The scalability of the thread pool is a subject of ongoing research; it's possible to speed up specific workloads by a factor of 10 - but not, it seems, without regressing other workloads by a factor of 5 or more. That won't do, of course.

If you want node.js to use more threads, run export UV_THREADPOOL_SIZE=<num-threads> before starting node.

@bnoordhuis
Copy link
Member

Let me add that a change that makes the DNS resolver configurable in the call to net.connect() might be acceptable. (Should target master, come with docs and regression tests, etc. See CONTRIBUTING.md for details.)

The API I have in mind is something like this:

net.connect({
  host: "nodejs.org",
  port: 80,
  resolver: function(host, type, callback) {
    return dns.resolve(host, type, callback);
  }
});

It's up for debate if the 'type' argument should be called 'family' instead and be a number, like how dns.lookup() works. I'm open to well-reasoned suggestions.

@jorangreef
Copy link

Thanks Ben, that would help.

I had a runtime environment flag in mind though, just to indicate to net.connect that it should use dns.resolve directly. This would save me from having to update calls to net.connect to pass in a custom resolver (although that might be great for other use cases).

@tjfontaine
Copy link

also for posterity there are things like http://npmjs.org/package/native-dns which implement dns in terms of node primitives such that you're not subject to the thread pool

@bminer
Copy link

bminer commented Dec 24, 2013

Can this issue be re-opened? I like the API that @bnoordhuis has proposed. :)

@bnoordhuis - Is it also possible to revise dns.lookup to accept either a family or rrtype? Instead of accepting 4 or 6, it could accept any of these: 4, 6, A or AAAA. This would allow one to use dns.lookup just like dns.resolve for resolving A or AAAA records.

Thoughts?

@indutny
Copy link
Member

indutny commented Dec 25, 2013

I'm ok with it, but have no time to work on it :(

@bminer
Copy link

bminer commented Dec 27, 2013

That's okay. I will try to take a shot at it over the next few days here. Just too busy at the moment.

At any rate, if the issue is re-opened, it might get a bit more attention.

Thanks!

@bminer
Copy link

bminer commented Dec 28, 2013

I think that there are a few issues here:

1.) The API for net.connect and for http.request should be changed to what @bnoordhuis proposed, allowing one to specify a resolver option to handle DNS resolution.
2.) dns.lookup should be revised to accept either a family or rrtype. Instead of accepting 4 or 6, it could accept any of these: 4, 6, A or AAAA. This would allow one to use dns.lookup just like dns.resolve for resolving A or AAAA records and make the API a bit more user-friendly.
3.) Most importantly: For most users who just want to use the default dns.lookup routine and utilize the operating system's resolver, DNS lookups should probably use a separate thread pool, or alternatively, they should not be allowed to fill up the entire shared thread pool. As @arnaud-lb mentioned, DNS queries can take more than a few seconds, which can significantly (and undesirably) slow other asynchronous I/O operations. This might be unacceptable in time-sensitive applications (i.e. reading from a hardware device such as a serial port).
4.) dns.resolve and friends read /etc/resolv.conf when Node starts to determine which nameservers to use. Unfortunately, C-Ares does not expose a ares_reinit() function to re-read this file if it changes. This can make using the other DNS functions a little less reliable than dns.lookup at times.

With all that being said... How should this issue be handled? Should a new issue be opened? Should this one be re-opened?

@trevnorris
Copy link

Going to reopen since we seem to be cool w/ the possibility of implementation. But yeah, core doesn't have time for this now. PR's will be welcome. :)

@trevnorris trevnorris reopened this Jan 7, 2014
@bminer
Copy link

bminer commented Jan 10, 2014

@trevnorris - Thanks for re-opening! If time permits, I will work on a PR.

@chrisdickinson
Copy link

Closing this in favor of the reduced issue #8475.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests