Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reads of ngx request body socket fails with (17: File exists) randomly on SPDY #252

Closed
pgaertig opened this issue Jun 27, 2013 · 11 comments
Closed

Comments

@pgaertig
Copy link

nginx 1.4.1, lua-module from master, luajit

Below is the snippet (https://github.com/pgaertig/nginx-big-upload/blob/master/file_storage_handler.lua) of code catching the problem:

ngx.log(ngx.ERR, self.left)
ngx.log(ngx.ERR, current_chunk_size) -- 4096

ngx.log(ngx.ERR, 'raw0')
local chunk, err =  self.socket:receive(current_chunk_size)
ngx.log(ngx.ERR, 'raw1')
if err then
    return nil, "socket:receive :"..err
end

Shortly, gigabytes of data is read with plain HTTP-SSL, but when SPDY is enabled usually the first or second socket read fails with epoll_ctl(1, 21) failed (17: File exists). Excerpt of log:

2013/06/27 18:48:38 [error] 10188#0: *1 [lua] request_processor.lua:35: raw_body_by_chunk(): left 28672, client: 127.0.0.1, server: foo.com, request: "POST /upload HTTP/1.1", host: "foo.com", referrer: "https://foo.com/"
2013/06/27 18:48:38 [error] 10188#0: *1 [lua] request_processor.lua:36: raw_body_by_chunk(): read size 4096, client: 127.0.0.1, server: foo.com, request: "POST /upload HTTP/1.1", host: "foo.com", referrer: "https://foo.com/"
2013/06/27 18:48:38 [error] 10188#0: *1 [lua] request_processor.lua:38: raw_body_by_chunk(): raw0, client: 127.0.0.1, server: foo.com, request: "POST /upload HTTP/1.1", host: "foo.com", referrer: "https://foo.com/"
2013/06/27 18:48:38 [alert] 10188#0: *1 epoll_ctl(1, 21) failed (17: File exists), client: 127.0.0.1, server: foo.com, request: "POST /upload HTTP/1.1", host: "foo.com", referrer: "https://foo.com/"
2013/06/27 18:48:38 [error] 10188#0: *1 [lua] request_processor.lua:40: raw_body_by_chunk(): raw1, client: 127.0.0.1, server: foo.com, request: "POST /upload HTTP/1.1", host: "foo.com", referrer: "https://foo.com/"
2013/06/27 18:48:38 [error] 10188#0: *1 [lua] big-upload.lua:41: socket:receive :error, client: 127.0.0.1, server: foo.com, request: "POST /upload HTTP/1.1", host: "foo.com", referrer: "https://foo.com/"

Full log http://pastebin.com/KP1cixpS .

@agentzh
Copy link
Member

agentzh commented Jun 27, 2013

Thank you for the report! ngx_lua is known to have various problems with nginx's new SPDY support. I haven't got a chance to really look at it. I'll try to find some time to debug such issues. Or will you have the tuits to trace it down and submit a patch? I'd love to accept patches :) Thanks!

@pgaertig
Copy link
Author

Tried to look into ngx_lua and spdy source, damn magic. [OT] I left C for garbage-collected languages decade ago. That's why I decided to write upload module in Lua instead of trying to take over existing one ;) [/OT]
But at least tempted me to look deeper. So here is also some additional stuff from strace, this is exactly receive part:

 read(13, "\27\3\1\0 \344\355H4*\365\217\347L\372\17\ti\273]\7P\32\235x\365\345\266\236\326f\326\334\303(\347g\27\3\1\v@\353\366W\255\355\272\267\30\241[\217mA\363\336_\357\10B\3203G\252\315\327\233\224\315\312\201\350\37\34#\217\206H\277"..., 33093) = 14700
 read(13, 0x1b52213, 33093)              = -1 EAGAIN (Resource temporarily unavailable)
 epoll_ctl(12, EPOLL_CTL_ADD, 13, {EPOLLIN|EPOLLET, {u32=27405440, u64=27405440}}) = -1 EEXIST (File exists)
 write(4, "2013/06/27 22:45:53 [alert] 19738#0: *7 epoll_ctl(1, 13) failed (17: File exists"..., 277) = 277

I switched to worker_processes 1; to get rid of noise in logs plus I reduced chunk size to 100 bytes and surprisingly it sometimes processes full upload but then unfortunately similar error happens in other code using ngx.location.capture and write.
To me looks like some problem with socket exchange and duplicate event registration?. Epoll is new to me.

@pgaertig
Copy link
Author

Ok more facts. This time I tried other event implementations:

2013/06/28 00:06:51 [error] 26293#0: *1 [lua] request_processor.lua:38: raw_body_by_chunk(): raw0,
2013/06/28 00:06:51 [alert] 26293#0: *1 select event fd:10 ev:1 is already set,

or

2013/06/28 00:18:06 [alert] 27751#0: *21 poll event fd:13 ev:1 is already set,

Both failing on the first retrieve.

@pgaertig
Copy link
Author

FYI this still occurs with nginx 1.5.4 and master. I've spent some time to debug it. Here is some dd info when I tried to upload 2KB file:

lua *** ngx_http_lua_chains_get_free_buf: buf start: 0x21203d0 at ngx_http_lua_util.c line 2735.
lua *** ngx_http_lua_socket_tcp_receive: tcp receive: buf_in: 0x2120328, bufs_in: 0x2120328 at ngx_http_lua_socket_tcp.c line 1135.
lua *** ngx_http_lua_socket_tcp_read: read event ready: 0 at ngx_http_lua_socket_tcp.c line 1517.
2013/09/11 21:55:39 [alert] 6699#0: *1 epoll_ctl(1, 4) failed (17: File exists), client: 127.0.0.1, server: foo.com, request: "POST /upload HTTP/1.1", host: "foo.com", referrer: "https://foo.com/"
lua *** ngx_http_lua_socket_tcp_receive: read failed: 1 at ngx_http_lua_socket_tcp.c line 1146.
lua *** ngx_http_lua_socket_tcp_receive_retval_handler: u->bufs_in: 0x2120328 at ngx_http_lua_socket_tcp.c line 1826.
lua *** ngx_http_lua_socket_push_input_data: bufs_in: 0x2120328, buf_in: 0x2120328 at ngx_http_lua_socket_tcp.c line 3765.
lua *** ngx_http_lua_socket_push_input_data: size: 0, nbufs: 1 at ngx_http_lua_socket_tcp.c line 3782.
lua *** ngx_http_lua_socket_push_input_data: resetting u->buffer pos & last at ngx_http_lua_socket_tcp.c line 3839.

If file is bigger sometimes several small chunks pass with "read event ready: 1" but once 0 occurs it stops.
However I can't get any clue where to start with the code - too much to get into nginx development at nights.

@agentzh
Copy link
Member

agentzh commented Sep 11, 2013

@pgaertig Thank you for the debugging! Could you please enable the nginx debugging logs? That way we can get much more information:

http://nginx.org/en/docs/debugging_log.html

Thanks for your time!

@pgaertig
Copy link
Author

Ah, I've enabled debug but forgot that server block overrides it locally, thanks for the link. I've rerun my tests, this time 177 bytes file upload. To minimize any parallel interference I've set worker_processes 1 and spdy_max_concurrent_streams 1. Here are results on fresh nginx process:

SSL SPDY (fails with 500): http://pastebin.com/195sw71Y
SSL without SPDY (successful): http://pastebin.com/puXb38pB

I also repeated it several times after first failed request or next to successful response to other non-Lua url. Results are the same.

Thanks!

@agentzh
Copy link
Member

agentzh commented Sep 13, 2013

@pgaertig Your detailed logs made me finally sit down and have a close look at the new SPDY implementation in the Nginx core for the problem you're seeing.

I'm very disappointed to see that Nginx essentially forks the code base specifically for the SPDY protocol. So if 3rd-party modules want to play with the downstream connections directly, they also have to fork their code base specifically for SPDY. This is very sad.

The epoll_ctl error that you're seeing is just Nginx's SPDY's read event handler conflicting with ngx_lua's existing read event handler for non-SPDY requests. They're not compatible at all. To make them work together, ngx_lua needs to fork a code path specifically for SPDY in the same way.

So this is no easy task to support SPDY in ngx_lua (or other Nginx modules doing clever things on the downstream connections). Because the Nginx company is also concerned about the SPDY compatibility of ngx_lua, I'll write to them for more suggestions and necessary changes in the Nginx core to make it easier for ngx_lua. I'll keep you updated.

Finally, I must thank you again for your continuing efforts on fixing this issue :)

@pgaertig
Copy link
Author

@agentzh Thanks for the action. I am very happy that my investigation helped. I was surprised that official SPDY module is still a patch file. There was similar problem with SSI module and SPDY and they fixed the problem within SPDY module ( http://trac.nginx.org/nginx/ticket/302 ).

I suppose the fork problem could be solved by #ifdef _NGX_HTTP_SPDY_H_INCLUDED_ or following the same way as nginx - patch file, which I don't like actually.

Lua-nginx deserves any effort on both sides to resolve issue because I think it is the best module which gives nginx users the ability to actually implement any server logic they can imagine.

The most important is that the problem cause is known now. If you need more tests from me just write here.

@agentzh
Copy link
Member

agentzh commented Sep 13, 2013

Hello!

On Fri, Sep 13, 2013 at 12:32 AM, P Gaertig notifications@github.com wrote:

@agentzh Thanks for the action. I am very happy that my investigation helped. I was surprised that official SPDY module is still a patch file.

The SPDY thing has already been merged into the mainline Nginx since
1.3.15. It's no longer a patch :)

I suppose the fork problem could be solved by #ifdef NGX_HTTP_SPDY_H_INCLUDED or following the same way as nginx - patch file, which I don't like actually.

For the forked implementation of ngx.req.socket that'll be
specifically for SPDY, I'll indeed need to guard my code with #if (NGX_HTTP_SPDY) and #endif, just as how the Nginx core forks the
implementation for its own request body reader ;)

Lua-nginx deserves any effort on both sides to resolve issue because I think it is the best module which gives nginx users the ability to actually implement any server logic they can imagine.

Thank you very much for your compliments :)

The most important is that the problem cause is known now. If you need more tests from me just write here.

Thank you in advance! I'll try to add support for SPDY to my test
scaffold (Test::Nginx::Socket) first :)

Best regards,
-agentzh

@pgaertig
Copy link
Author

You are right I was using https://launchpad.net/~chris-lea/+archive/nginx-devel 1.5.4 DEB package, which had patches left. They are not used by build anymore.

@agentzh
Copy link
Member

agentzh commented Feb 16, 2016

SPDY is dead. I'm closing this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants