Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

High CPU usage for apache + php when transferring large files / sys/poll() returns wrong value #1540

Closed
garci66 opened this issue Dec 26, 2016 · 15 comments

Comments

@garci66
Copy link

garci66 commented Dec 26, 2016

Description

When using apache + php to write a large buffer to the client, or using fpassthru to stream a large file, apache's serving thread will use 100% of a single core. The CPU usage is high regardless of whether the client is on localhost (from a native windows client like google chrome) or another client, on any platform (windows, android, etc). Slow clients simply exacerbate the problem since the file takes longer to download.

Expected results

The expected results is that apache uses low CPU when the amount of traffic being sent over the network is low.

Actual results

High CPU usage throughout the file transfer (each apache thread serving the client using 100% of its core, even if transferring only at 1Mbps)

Windows build:

Microsoft Windows [Version 10.0.14393]

Steps / All commands required to reproduce the error from a brand new installation

On a clean WSL console, install apache and PHP5:
apt install apache2 php5

Then copy the following file (garbage.php) to /var/www/html and make it executable (chmod +x garbage.php) - make sure to rename garbage.php.txt to garbage.php since github doesn't allow php files as in-line uploads.

garbage.php.txt

Now, open a web browser on the windows side (I used google chrome, with network throttling on the tab, using its developer tools as shown below)
image

Then, on the "throttled" tab, browse to localhost/garbage.php and a file called random.dat will start downloading immediately.

On WSL's bash console, executing top will show the apache thread at 100% cpu for the core as shown below:
image

While the download itself is proceeding at the rate-limit specified:
image

  • Strace of the command
    On the WSL host, it can be seen that apache is trying to do a write and failing most of the time due to the buffers being full. The write fails with EAGAIN and calls the poll function. Poll immediately returns with POLLOUT with almost no timeout, but the subsequent write fails again as the buffer is still full.
     0.000164 poll([{fd=12, events=POLLOUT}], 1, 300000) = 1 ([{fd=12, revents=POLLOUT}])
     0.000746 writev(12, [{"\r\n", 2}], 1) = -1 EAGAIN (Resource temporarily unavailable)
     0.000186 poll([{fd=12, events=POLLOUT}], 1, 300000) = 1 ([{fd=12, revents=POLLOUT}])
     0.000000 writev(12, [{"\r\n", 2}], 1) = -1 EAGAIN (Resource temporarily unavailable)

The exact same configuration / process running natively on a linux host runs as follows:

     0.251424 writev(12, [{"\256\34\31"..., 522751}], 1) = 65520
     0.000266 writev(12, [{"\33\2705\"..., 457231}], 1) = -1 EAGAIN (Resource temporarily unavailable)
     0.000230 poll([{fd=12, events=POLLOUT}], 1, 300000) = 1 ([{fd=12, revents=POLLOUT}])
     0.251215 writev(12, [{"\33\2705\"..., 457231}], 1) = 65520
     0.000267 writev(12, [{"'b\247KC\"..., 391711}], 1) = -1 EAGAIN (Resource temporarily unavailable)
     0.000175 poll([{fd=12, events=POLLOUT}], 1, 300000) = 1 ([{fd=12, revents=POLLOUT}])

(The writev's buffer was truncated to help on the screen capture here)

As it can be seen, On linux, there are two subsequent writev calls, the first one suceeding to write 65520 bytes to the TCP stack and the immediate next failing due to EAGAIN. Then poll blocks for 250msec, returns with POLLOUT and the next write suceeds.

On the WSL process, the writev fails, poll is called and returns IMMEDIATELY with POLLOUT, in theory indicating that new writes could be done, but the next write call fails again.

The only successfull writev calls on WSL are the following:

     0.000501 writev(12, [{"\r\n", 2}], 1) = -1 EAGAIN (Resource temporarily unavailable)
     0.000000 poll([{fd=12, events=POLLOUT}], 1, 300000) = 1 ([{fd=12, revents=POLLOUT}])
     0.000000 writev(12, [{"\r\n", 2}], 1) = -1 EAGAIN (Resource temporarily unavailable)
     0.000501 poll([{fd=12, events=POLLOUT}], 1, 300000) = 1 ([{fd=12, revents=POLLOUT}])
     0.006016 writev(12, [{"\r\n", 2}], 1) = 2
     0.000000 gettimeofday({1482786017, 198322}, NULL) = 0
     0.000503 write(2, "[Mon Dec 26 18:00:17.198322 2016] [:error] [pid 966] [client ::1:24086] after flush\n", 84) = 84     
     0.000000 gettimeofday({1482786017, 198825}, NULL) = 0
     0.000000 write(2, "[Mon Dec 26 18:00:17.198825 2016] [:error] [pid 966] [client ::1:24086] Before echo\n", 84) = 84     
     0.000500 writev(12, [{"ffffc\r\n", 7}, {"PH:\36\r;\244\321\324]3X<\313\234\364\255\v\\;\r\251\334\267\235;\317K(\265\247\rC+M\2S\206\353|\t\271\367\321\277\0002\320x8\215\276\30\266\357e0\251d\204m\177\21V\354\336\327\17\203x(\7q\32+\364x\216\247A\301Q\371\\\211%\207\207\0338\316[[\21{s\303\330\243"..., 1048572}], 2) = 1048579
     0.001281 gettimeofday({1482786017, 200851}, NULL) = 0
     0.000305 write(2, "[Mon Dec 26 18:00:17.200851 2016] [:error] [pid 966] [client ::1:24086] after fpassthru\n", 88) = 88
     0.000000 gettimeofday({1482786017, 201414}, NULL) = 0
     0.000503 write(2, "[Mon Dec 26 18:00:17.201414 2016] [:error] [pid 966] [client ::1:24086] Buffer at 0\n", 84) = 84     
     0.000000 gettimeofday({1482786017, 201414}, NULL) = 0
     0.000500 write(2, "[Mon Dec 26 18:00:17.201414 2016] [:error] [pid 966] [client ::1:24086] after obflush\n", 86) = 86
     0.000000 writev(12, [{"\r\n", 2}], 1) = -1 EAGAIN (Resource temporarily unavailable)
     0.000746 poll([{fd=12, events=POLLOUT}], 1, 300000) = 1 ([{fd=12, revents=POLLOUT}])
     0.000149 writev(12, [{"\r\n", 2}], 1) = -1 EAGAIN (Resource temporarily unavailable)
     0.000201 poll([{fd=12, events=POLLOUT}], 1, 300000) = 1 ([{fd=12, revents=POLLOUT}])
     0.000158 writev(12, [{"\r\n", 2}], 1) = -1 EAGAIN (Resource temporarily unavailable)

Where the whole 1Megabyte buffer is written out. The additional write / gettimeofday are for debug messages inside the garbage.php file.

The following section includes the same part of the calls for the linux host:

     0.000149 poll([{fd=12, events=POLLOUT}], 1, 300000) = 1 ([{fd=12, revents=POLLOUT}])
     0.245204 writev(12, [{"\26F\272\4!\364X\240\363\213\225T\212-\27\3\207\n\36\23\266\253\331R\323\216\315M\265\302WE@\200\346$i\37\n6\277\236c\"h\250\254;\2310\21\271u|\317s\371\205\0072P9\303\347l\27Pd\330\201\201\361\320\256tY\21\314\247S|`iP\354\250B\270#\203\317K\305\334\221r\364D\234"..., 129631}], 1) = 66780
     0.000277 writev(12, [{"!\323\253\356H\262]\216\225\224O\nm\32\305]\v\330\376r\352\206f\304\345\342\237`G\372.J,\\\36w\220\325f.R\244}\20gs\1V8s\231\2369\35\202\352\265\227\303\254K\262\366C+!\356;\314\314\315\24j\6\320\237\201\255,\204\2512U\207\244P\177\334\215q\313@\23}/\336\231\316\226"..., 62851}], 1) = -1 EAGAIN (Resource temporarily unavailable)
     0.000178 poll([{fd=12, events=POLLOUT}], 1, 300000) = 1 ([{fd=12, revents=POLLOUT}])
     0.252087 writev(12, [{"!\323\253\356H\262]\216\225\224O\nm\32\305]\v\330\376r\352\206f\304\345\342\237`G\372.J,\\\36w\220\325f.R\244}\20gs\1V8s\231\2369\35\202\352\265\227\303\254K\262\366C+!\356;\314\314\315\24j\6\320\237\201\255,\204\2512U\207\244P\177\334\215q\313@\23}/\336\231\316\226"..., 62851}], 1) = 62851
     0.000321 write(2, "[Mon Dec 26 15:57:20.901142 2016] [:error] [pid 21498] [client 192.168.1.100:24075] after fpassthru"..., 100) = 100
     0.000224 write(2, "[Mon Dec 26 15:57:20.901384 2016] [:error] [pid 21498] [client 192.168.1.100:24075] Buffer at 0\n", 96) = 96
     0.000189 write(2, "[Mon Dec 26 15:57:20.901573 2016] [:error] [pid 21498] [client 192.168.1.100:24075] after obflush\n", 98) = 98
     0.000179 writev(12, [{"\r\n", 2}], 1) = 2
     0.000177 write(2, "[Mon Dec 26 15:57:20.901930 2016] [:error] [pid 21498] [client 192.168.1.100:24075] after flush\n", 96) = 96
     0.000241 write(2, "[Mon Dec 26 15:57:20.902171 2016] [:error] [pid 21498] [client 192.168.1.100:24075] Before echo\n", 96) = 96
     0.000729 writev(12, [{"ffffc\r\n", 7}, {"\3312x\245D\3666a\26\303\247(k\230\223{\354\n\211\23 \237\301<@5\253\304u\213\216g9*=)?tG\346h\217Q\22`\255$\16\246\364&\204z_\22\267\206\342:\354\347\r\356\307\314N\0203\377\254\257UK\345\316j\322\206\33T\376%\277<\314 \22c\330\343\37\204S\302e!\256\302\265"..., 1048572}], 2) = 1407
     0.000232 writev(12, [{"\335\334\252\305\264\354p\316\302a\355\24\217\235{\230\r\365\352VM\246\355\340\353\213[y\355j\3\307\220\253[\265o\234\333E4\352\266\362\26n`6\266\1X\23-\335&\376\261\216a}\321\375fn\353GJ\330\360\35U\241\376\336\360\375\272\216\375\376\236mU}\371;\366\263\271\35;#\215Pu\355<\217w"..., 1047172}], 1) = -1 EAGAIN (Resource temporarily unavailable)
     0.000172 poll([{fd=12, events=POLLOUT}], 1, 300000) = 1 ([{fd=12, revents=POLLOUT}])

If additional files are needed, please let me know. The issue is extremely straightforward to reproduce and happens every single time.

So far, it seems that the call to poll is returning immediately indicating that the buffer is ready to be written (the POLLOUT value) while it should be blocking/timing out. On linux we see that the call to poll blocks for roughly 220msecs, which is the time needed for the ~64Kbytes to be transmitted to the client at the 2Mbps rate limit applied by chrome.

@aseering
Copy link
Contributor

Hi @garci66 -- thanks for posting this detailed bug report! For what it's worth, this sounds kind of similar to some other network issues that have been reported, some of which have been fixed in Insider builds and some of which have fixes pending. @sunilmut FYI.

@garci66
Copy link
Author

garci66 commented Dec 27, 2016

I saw similr reports but, so far, none that seemed related to poll() failing. Regardless, I'll launch a vm with an insider track and see if I can repro on it.

@garci66 garci66 changed the title High CPU usage for apache + php when transferring large files High CPU usage for apache + php when transferring large files / sys/poll() returns wrong value Dec 27, 2016
@garci66
Copy link
Author

garci66 commented Dec 27, 2016

Looking into more detail, indeed, this seems closely related to #616 / #610 / #1025
Seems like the fix is inbound but not yet on a available on an insiders image.

@benhillis
Copy link
Member

@garci66 - Thank you for the detailed and clear bug report, this is very helpful. I will give your repro a shot today and see if the epoll issue is resolved.

@garci66
Copy link
Author

garci66 commented Jan 4, 2017

Thanks! I asume that this is not on any of the insiders' builds yet no?

@benhillis
Copy link
Member

benhillis commented Jan 4, 2017

@garci66 - Correct, the fixes @aseering mentioned should be in an upcoming insiders build.

Could you provide me with the steps you took to launch the web server? When I navigate to localhost/garbage.php it's just displaying the php source and no file is downloading.

@garci66
Copy link
Author

garci66 commented Jan 4, 2017

Make sure that php5 is installed, not just apache. apt install php5.

Otherwise I don't have any special permissions on the php file:
image

I just tried with the php file marked as +x and also with regular persmissions and was fine.

Just be sure that php5 is installed and enabled as a module in apache:
image

@garci66
Copy link
Author

garci66 commented Jan 4, 2017

make sure the php file indeed has the php extension and not .php.txt. Also restart apache as follows:
sudo /etc/init.d/apache2 restart

once running ou can make sure the php5 module is loaded in apache as follows: sudo apache2ctl -M and then look for php5 in the output:

image

@benhillis
Copy link
Member

Ah I think I see what the issue was. Php5 isn't in the default packages in 16.04, once I followed these instructions I got things working: http://askubuntu.com/questions/756181/installing-php-5-6-on-xenial-16-04.

Unfortunately I'm still seeing the issue that you describe, poll returning that the socket file descriptor is writable but the write failing with EAGAIN because the buffer is full. I will sync up with @sunilmut who owns this area and see if he has any ideas for a fix.

@garci66
Copy link
Author

garci66 commented Jan 4, 2017

Thanks! Good to know... too bad it's not yet fixed though... :( But great to see the work you guys are doing with WSL!

@benhillis
Copy link
Member

Sunil and I took a closer look at this and this issue is fixed internally. The fix should be available in an upcoming Windows Insider build.

@sunilmut
Copy link
Member

sunilmut commented Jan 5, 2017

Thanks @benhillis for brining this to my notice. To be precise, this is a duplicate of #1025, a fix for which should be available in an upcoming (mostly, next) insider build. Thanks @garci66 for the post and the detailed repro steps. Very useful (made it very easy to repro)!

@sunilmut
Copy link
Member

sunilmut commented Jan 9, 2017

This should be fixed in 15002. Please validate and let us know if the issue persists.

@garci66
Copy link
Author

garci66 commented Jan 13, 2017 via email

@garci66
Copy link
Author

garci66 commented May 23, 2017

It's working great. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants