
"Too many open files" w/ Async Client #386

Closed

mmstick opened this issue Nov 14, 2018 · 22 comments

@mmstick commented Nov 14, 2018

I've been developing the tooling for an apt replacement with the async client, which may need to fetch a couple dozen files at the same time (depending on how many sources are active and what files those sources' release files point to).

The following error occurs when sharing a single async::Client across all connections:

[screenshot: peek 2018-11-13 16-51]

Is this something that could be handled better by reqwest? Perhaps return NotReady when there are too many files open.

@seanmonstar (Owner)

> Is this something that could be handled better by reqwest? Perhaps return NotReady when there are too many files open.

Hm, possibly. There'd need to be some way to notice that some file descriptors have been closed (and the amount is shared with the whole process, not just what reqwest uses).

In your example, it doesn't look like that many connections are being opened; can you increase the nofile limit?

@mmstick (Author) commented Nov 14, 2018

I'm a little confused myself, since this happens on a fresh start with just that application running.

@mmstick (Author) commented Nov 14, 2018

GET http://us.archive.ubuntu.com/ubuntu/dists/cosmic-updates/Release
GET http://us.archive.ubuntu.com/ubuntu/dists/cosmic-updates/Release
GET http://us.archive.ubuntu.com/ubuntu/dists/cosmic-security/Release
GET http://us.archive.ubuntu.com/ubuntu/dists/cosmic-security/Release
GET http://apt.pop-os.org/proprietary/dists/cosmic/Release
GET http://ppa.launchpad.net/mmstick76/ion-shell/ubuntu/dists/cosmic/Release
GET http://ppa.launchpad.net/system76/pop/ubuntu/dists/cosmic/Release
GET http://ppa.launchpad.net/system76/pop/ubuntu/dists/cosmic/Release
GET http://us.archive.ubuntu.com/ubuntu/dists/cosmic/Release
GET http://us.archive.ubuntu.com/ubuntu/dists/cosmic/Release
GET http://us.archive.ubuntu.com/ubuntu/dists/cosmic-backports/Release
GET http://us.archive.ubuntu.com/ubuntu/dists/cosmic-backports/Release
GET http://repository.spotify.com/dists/stable/Release
failed to upgrade dist files: failed to fetch apt sources: http/s request failed: http://us.archive.ubuntu.com/ubuntu/dists/cosmic-security/main/dep11/icons-64x64.tar: an error occurred trying to connect: Too many open files (os error 24)

Possibly wrong error message?

@seanmonstar (Owner)

The limit is per-process, so other applications wouldn't affect it. You can check it via ulimit -a.

@mmstick (Author) commented Nov 14, 2018

It claims the max open files is 1024, though I don't think reqwest is opening that many files.

fn get_request<'a>(client: &'a Client, url: String) -> impl Future<Item = Response, Error = DistUpdateError> + 'a {
    client.get(&url)
        .send()
        // Print the URL once the response future resolves.
        .inspect(move |_v| {
            eprintln!("GET {}", url);
        })
        // Treat 4xx/5xx statuses as errors.
        .and_then(|resp| resp.error_for_status())
        .map_err(|why| DistUpdateError::Request { why })
}

@mmstick (Author) commented Nov 14, 2018

According to strace, only 7 files were open at a given time.

@mmstick (Author) commented Nov 14, 2018

Ah, nevermind, I see that this occurred:

epoll_ctl(5, EPOLL_CTL_DEL, 1022, 0x7ffdce638198) = 0
close(1022)                             = 0
epoll_ctl(5, EPOLL_CTL_DEL, 1023, 0x7ffdce638198) = 0
close(1023)

@mmstick (Author) commented Nov 14, 2018

It seems to be continually creating new socket descriptors and connecting them without ever closing them.

@mmstick (Author) commented Nov 14, 2018

It calls this repeatedly until it eventually runs out of fds:

connect(131, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("91.189.91.23")}, 16) = -1 EINPROGRESS (Operation now in progress)
epoll_ctl(5, EPOLL_CTL_ADD, 131, {EPOLLIN|EPOLLPRI|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=520093819, u64=520093819}}) = 0
socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_IP) = 132
fcntl(132, F_GETFL)                     = 0x2 (flags O_RDWR)
fcntl(132, F_SETFL, O_RDWR|O_NONBLOCK)  = 0
connect(132, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("91.189.91.23")}, 16) = -1 EINPROGRESS (Operation now in progress)
epoll_ctl(5, EPOLL_CTL_ADD, 132, {EPOLLIN|EPOLLPRI|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=524288124, u64=524288124}}) = 0
socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_IP) = 133
fcntl(133, F_GETFL)                     = 0x2 (flags O_RDWR)
fcntl(133, F_SETFL, O_RDWR|O_NONBLOCK)  = 0
connect(133, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("91.189.91.23")}, 16) = -1 EINPROGRESS (Operation now in progress)

@mmstick (Author) commented Nov 14, 2018

This also occurs when using a hyper::Client, so I suppose the issue is either there or somewhere lower.

@seanmonstar (Owner)

Oh no, that's not good! Could you share what versions of reqwest and hyper are listed in your Cargo.lock?

If you don't mind, grabbing the logs from hyper::client=trace would also help debugging.

@seanmonstar added the C-bug (Category: bug. Something is wrong. This is bad!) and B-upstream (Blocked: upstream. Depends on a dependency to make a change first.) labels on Nov 14, 2018
@mmstick (Author) commented Nov 14, 2018

That would be hyper 0.12.14 and reqwest 0.9.4.

I've included two trace logs[0]: one with just the trace logging, and one with both that and strace output.

[0] logs.zip

@mmstick (Author) commented Nov 14, 2018

You may also want to know that this is being run in a tokio::runtime::current_thread::Runtime.
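
For context, here's a minimal sketch of that setup, assuming tokio 0.1; the placeholder future is only a stand-in for the combined request futures the tool actually drives.

extern crate futures;
extern crate tokio;

use futures::future;
use tokio::runtime::current_thread::Runtime;

fn main() {
    // Single-threaded runtime: every request future is polled on this one thread.
    let mut rt = Runtime::new().expect("failed to build current_thread runtime");

    // Placeholder for the combined set of GET futures driven by the tool.
    let work = future::ok::<(), ()>(());
    rt.block_on(work).expect("runtime error");
}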

@seanmonstar (Owner)

Could you try pinning hyper to 0.12.13? I don't have a cause yet, but there were some changes to the connection pool in the 0.12.14 release...

@mmstick (Author) commented Nov 14, 2018

Same issue with 0.12.13.

@seanmonstar (Owner)

Sorry for taking a little while. I've read through the logs, and it actually seems like the huge increase in connections happens at a specific point in the app, as if something were suddenly spawning >1000 requests on the client. The client doesn't currently put a limit on the number of outstanding requests or connections it can manage; that's left up to the user.

You can use something like Stream::buffered to limit how many futures can be in-flight at a time.
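
For reference, here's a minimal sketch of that approach, assuming the futures 0.1 and reqwest 0.9 async APIs used elsewhere in this thread; the fetch_all helper and the limit of 8 are made up for illustration, not taken from the original code.

extern crate futures;
extern crate reqwest;

use futures::{stream, Future, Stream};
// The async client lives in reqwest's `async` module in 0.9
// (written `r#async` on the 2018 edition).
use reqwest::r#async::{Client, Response};

// Hypothetical helper: fetch every URL, but keep at most 8 requests in flight.
fn fetch_all(
    client: &Client,
    urls: Vec<String>,
) -> impl Future<Item = Vec<Response>, Error = reqwest::Error> {
    let client = client.clone();
    stream::iter_ok(urls)
        // Turn each URL into a future that performs one GET.
        .map(move |url| {
            client
                .get(&url)
                .send()
                .and_then(|resp| resp.error_for_status())
        })
        // Poll at most 8 of those futures at once, so only a bounded
        // number of sockets is ever open at the same time.
        .buffered(8)
        .collect()
}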

@seanmonstar removed the C-bug (Category: bug. Something is wrong. This is bad!) and B-upstream (Blocked: upstream. Depends on a dependency to make a change first.) labels on Nov 22, 2018
@Matrix-Zhang
Same issue with reqwest 0.8.8 after a cargo update: the fd count grows continuously until the process crashes...

@seanmonstar (Owner)

As mentioned in my previous comment, you'll need to limit the number of requests you make at a time, or increase the FD limit in your OS.

@emk (Contributor) commented Jan 12, 2019

Thank you for a great library!

Just a heads-up, in case it helps anybody: we use reqwest in synchronous mode, and a recent cargo update has led to lots of servers crashing with "too many open files". Poking around with lsof suggests that the files are network connections and are possibly related to reqwest.

One user ran into a problem with tokio 0.1.13: https://users.rust-lang.org/t/hyper-client-tokio-pipe-not-closed/23327

We're going to try downgrading to tokio 0.1.11 to see if that helps with this issue. If it fixes it, we'll follow up here. Thank you!

@emk (Contributor) commented Jan 12, 2019

OK, as of this morning, it appears that reverting to tokio 0.1.11 may have fixed the file descriptor leaks in production.

We use reqwest only in synchronous mode. Here's the invocation that's hitting the "too many open files" error:

    /// Fetch an existing resource.
    pub fn fetch<R: Resource>(&self, resource: &Id<R>) -> Result<R> {
        let url = self.url(resource.as_str());
        // A new Client (with its own connection pool) is created on every call.
        let client = reqwest::Client::new();
        let res = client.get(url.clone())
            .send()
            .map_err(|e| Error::could_not_access_url(&url, e))?;
        self.handle_response(res)
            .map_err(|e| Error::could_not_access_url(&url, e))
    }

We're not directly using tokio, so Stream::buffered won't help us here. We tried raising the ulimit, but the program still crashed with 16,000 open files.

Is there something we can do to help troubleshoot this?

@seanmonstar (Owner)

@emk yikes! Are you actively making thousands of requests simultaneously? Or does it seem like the connections are just idling?

I noticed you're using reqwest 0.8, whose dependencies don't really see any more updates. I believe there were some bugs in hyper that would leave connections open, fixed in newer versions.

Also, if the host is often the same, you could reuse connections and get better performance by sharing a Client (cloning is fine) instead of making a new one per request.
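
For what it's worth, a minimal sketch of that suggestion, assuming the blocking reqwest 0.9 API; the Api type, base_url field, and fetch method are illustrative names, not taken from emk's code.

extern crate reqwest;

use reqwest::{Client, Response};

struct Api {
    // One Client for the whole application; clones of it share the same
    // connection pool, so sockets to the same host get reused.
    client: Client,
    base_url: String,
}

impl Api {
    fn new(base_url: String) -> Api {
        Api {
            client: Client::new(),
            base_url,
        }
    }

    fn fetch(&self, path: &str) -> Result<Response, reqwest::Error> {
        let url = format!("{}/{}", self.base_url, path);
        // Reuses a pooled connection instead of opening a fresh socket
        // (and building a fresh pool) on every call.
        self.client.get(&url).send()?.error_for_status()
    }
}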

@emk (Contributor) commented Jan 13, 2019

@seanmonstar We should have fewer than 10 std::thread threads making synchronous requests, I'd guess, not thousands.

I'll see if we can upgrade reqwest, at least in the crates I control, and see what happens.

Thank you for your help!

aguestuser added a commit to aguestuser/dl that referenced this issue Sep 13, 2021
* else, runaway async http requests flood the system's capacity for
  open sockets and get a "too many files open" error
  (see: seanmonstar/reqwest#386 (comment))
* as remedy: use a buffered stream (TODO: benchmark to find best
  concurrency limit)