WebSocket upstream/downstream proxying triggers nginx coredump #25


Closed
lilien1010 opened this issue Jul 14, 2017 · 2 comments


lilien1010 commented Jul 14, 2017

I've run into a case where OpenResty dumps core. The use case is a WebSocket service.

OpenResty is used as a WebSocket access layer in front of an existing TCP server:

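0: OpenResty version 1.11.2.3
1: Receive the client's WebSocket request and upgrade the protocol.
2: After authentication succeeds, connect to the backend TCP server; call this connection socket1.
3: Main loop A blocks in the ws_recv_loop function, continuously receiving data; each message is decoded with json_decode, converted to binary, and sent to socket1.
4: Right after the protocol upgrade, ngx.thread.spawn starts a new light thread B, which reads data from socket1 (recvFromIMServer blocks in the read call).
5: Thread B parses the protobuf data read from socket1 into JSON and pushes it down to the WebSocket client.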

The code looks roughly like this:

```lua
-- locals used below (not shown in the original excerpt)
local sub = string.sub
local ngx_thread_spawn = ngx.thread.spawn

local function im_recv_loop(userid, chatServer)
    ngx.log(ngx.CRIT, 'create thread ,userid: ', userid)
    while true do
        local pack, err = chatServer:recvFromIMServer(userid)
        if not pack then
            ngx.log(ngx.ERR, userid, ',recv bad pack skip : ', err)
            break
        end

        local bytes, err = chatServer:sendToClient(pack)
        if not bytes then
            ngx.log(ngx.ERR, "failed to send text: ", err)
            break
        end
    end
    ngx.log(ngx.ERR, userid, ',wb closing sock')
    chatServer:close()
    -- chatServer:close() internally closes the upstream TCP connection at the same time
end


local function ws_recv_loop(wb, userid, token)
    local chatServer = chatServer:new(userid, wb)

    local ret = chatServer:startLogin(token, userid)
    if not ret then
        chatServer:close()
        return
    end

    local push_thread = ngx_thread_spawn(im_recv_loop, userid, chatServer)

    -- event handling for frames received from the client
    local _type, _err = ''
    while true do
        local data, typ, err = wb:recv_frame()
        if wb.fatal then
            _type = 'fatal'
            ngx.log(ngx.ERR, userid, " failed to receive frame: ", err)
            break
        end

        local timeout = false
        if err and sub(err, -7) == 'timeout' then
            timeout = true
        end

        if not err or timeout == false then
            ngx.log(ngx.CRIT, typ, ',userid: ', userid, ",data: ", data, ',err: ', err)
        end
        _type = typ
        -- keep receiving: a read timeout is not fatal
        if timeout == true then
        end

        if typ == "close" then
            ngx.log(ngx.ERR, userid, ',', typ, ",data:", data, ',err:', err)
            break
        elseif typ == "text" then
            chatServer:recvFromClient(data)
        elseif typ == 'ping' then
            local bytes, err = wb:send_pong('')
        elseif typ == 'pong' then
        elseif typ == 'continuation' then
        elseif typ == 'binary' then
        end
    end

    ngx.log(ngx.CRIT, userid, ',typ:', _type, ',logout')
    chatServer:close()

    -- after chatServer:close(), wb_closed is marked internally, so
    -- chatServer:recvFromIMServer returns nil and the im_recv_loop thread exits.
    ngx.thread.wait(push_thread)
end
```
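In `ws_recv_loop` above, a read timeout from `wb:recv_frame()` is treated as non-fatal by checking whether the error string ends with the word "timeout". Pulled out on its own, that check looks like the sketch below. The helper name `is_timeout` is hypothetical, and it assumes, as the loop does, that timeout errors from lua-resty-websocket end in "timeout" (for example something like `failed to receive the first 2 bytes: timeout`; the exact wording depends on the library version).

```lua
-- Hypothetical helper mirroring the timeout check in ws_recv_loop:
-- returns true only when err is a string whose last 7 characters are "timeout".
local sub = string.sub

local function is_timeout(err)
    return type(err) == "string" and sub(err, -7) == "timeout"
end

print(is_timeout("failed to receive the first 2 bytes: timeout")) -- prints true
print(is_timeout(nil))                                            -- prints false
```

Checking `type(err)` first keeps the helper safe when `recv_frame` returns no error at all.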

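Several coredumps are shown below. The crash site differs every time, but it is mostly around `lj_alloc_free`, and sometimes it is triggered by `json_decode`. The situation is rather unusual.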

```
#0 0x00007f40cf42ce19 in lj_alloc_free (msp=0x40b71010, ptr=<value optimized out>) at lj_alloc.c:1404
#1 0x00007f40cf3e3c27 in gc_sweep (g=0x40b713b8, p=0x4036ddc0, lim=37) at lj_gc.c:406
#2 0x00007f40cf3e48d4 in gc_onestep (L=0x419275b8) at lj_gc.c:637
#3 0x00007f40cf3e4f28 in lj_gc_step (L=0x419275b8) at lj_gc.c:689
#4 0x00007f40cf3e91a3 in lj_meta_cat (L=0x419275b8, top=<value optimized out>, left=0) at lj_meta.c:304
#5 0x00007f40cf3e0bfc in lj_BC_CAT () from /usr/local/openresty/luajit/lib/libluajit-5.1.so.2
#6 0x00000000004c6ac0 in ngx_http_lua_run_thread (L=0x40b71378, r=0x13ce630, ctx=0x13cf9d0, nrets=0) at ../ngx_lua-0.10.8/src/ngx_http_lua_util.c:1005
#7 0x00000000004c84dc in ngx_http_lua_content_by_chunk (L=0x40b71378, r=0x13ce630) at ../ngx_lua-0.10.8/src/ngx_http_lua_contentby.c:120
#8 0x00000000004c8884 in ngx_http_lua_content_handler_file (r=0x13ce630) at ../ngx_lua-0.10.8/src/ngx_http_lua_contentby.c:284
#9 0x00000000004c898e in ngx_http_lua_content_handler (r=0x13ce630) at ../ngx_lua-0.10.8/src/ngx_http_lua_contentby.c:222
#10 0x000000000044fe40 in ngx_http_core_content_phase (r=0x13ce630, ph=<value optimized out>) at src/http/ngx_http_core_module.c:1379
#11 0x000000000044a12d in ngx_http_core_run_phases (r=0x13ce630) at src/http/ngx_http_core_module.c:856
#12 0x0000000000454d61 in ngx_http_process_request (r=0x13ce630) at src/http/ngx_http_request.c:1916
#13 0x0000000000455a1c in ngx_http_process_request_line (rev=0x13ef9f0) at src/http/ngx_http_request.c:1027
#14 0x0000000000438a02 in ngx_event_process_posted (cycle=<value optimized out>, posted=0x7603d0) at src/event/ngx_event_posted.c:33
#15 0x000000000043ed48 in ngx_worker_process_cycle (cycle=0x1326040, data=<value optimized out>) at src/os/unix/ngx_process_cycle.c:753
#16 0x000000000043d387 in ngx_spawn_process (cycle=0x1326040, proc=0x43ed10 <ngx_worker_process_cycle>, data=0xb, name=0x4fe415 "worker process", respawn=-4)
    at src/os/unix/ngx_process.c:198
#17 0x000000000043e24c in ngx_start_worker_processes (cycle=0x1326040, n=32, type=-4) at src/os/unix/ngx_process_cycle.c:358
#18 0x000000000043f5ac in ngx_master_process_cycle (cycle=0x1326040) at src/os/unix/ngx_process_cycle.c:243
#19 0x000000000041cb1f in main (argc=<value optimized out>, argv=<value optimized out>) at src/core/nginx.c:36
```



```
#0 gc_sweep (g=0x40b713b8, p=0x40c082a8, lim=37) at lj_gc.c:395
#1 0x00007f40cf3e48d4 in gc_onestep (L=0x407f50e0) at lj_gc.c:637
#2 0x00007f40cf3e4f28 in lj_gc_step (L=0x407f50e0) at lj_gc.c:689
#3 0x00007f40cf3f1aa3 in lua_pushlstring (L=0x407f50e0, str=<value optimized out>, len=<value optimized out>) at lj_api.c:577
#4 0x00007f40bd9d21a2 in json_parse_object_context (l=0x407f50e0, json=0x7fff16db2540, token=<value optimized out>) at lua_cjson.c:1208
#5 json_process_value (l=0x407f50e0, json=0x7fff16db2540, token=<value optimized out>) at lua_cjson.c:1288
#6 0x00007f40bd9d2374 in json_parse_array_context (l=0x407f50e0, json=0x7fff16db2540, token=<value optimized out>) at lua_cjson.c:1256
#7 json_process_value (l=0x407f50e0, json=0x7fff16db2540, token=<value optimized out>) at lua_cjson.c:1291
#8 0x00007f40bd9d220b in json_parse_object_context (l=0x407f50e0, json=0x7fff16db2540, token=<value optimized out>) at lua_cjson.c:1216
#9 json_process_value (l=0x407f50e0, json=0x7fff16db2540, token=<value optimized out>) at lua_cjson.c:1288
#10 0x00007f40bd9d2374 in json_parse_array_context (l=0x407f50e0, json=0x7fff16db2540, token=<value optimized out>) at lua_cjson.c:1256
#11 json_process_value (l=0x407f50e0, json=0x7fff16db2540, token=<value optimized out>) at lua_cjson.c:1291
#12 0x00007f40bd9d220b in json_parse_object_context (l=0x407f50e0, json=0x7fff16db2540, token=<value optimized out>) at lua_cjson.c:1216
#13 json_process_value (l=0x407f50e0, json=0x7fff16db2540, token=<value optimized out>) at lua_cjson.c:1288
#14 0x00007f40bd9d2530 in json_decode (l=0x407f50e0) at lua_cjson.c:1330
#15 0x00007f40cf3e1bba in lj_BC_FUNCC () from /usr/local/openresty/luajit/lib/libluajit-5.1.so.2
#16 0x00000000004c6ac0 in ngx_http_lua_run_thread (L=0x40b71378, r=0x146d980, ctx=0x146ed20, nrets=0) at ../ngx_lua-0.10.8/src/ngx_http_lua_util.c:1005
#17 0x00000000004c84dc in ngx_http_lua_content_by_chunk (L=0x40b71378, r=0x146d980) at ../ngx_lua-0.10.8/src/ngx_http_lua_contentby.c:120
#18 0x00000000004c8884 in ngx_http_lua_content_handler_file (r=0x146d980) at ../ngx_lua-0.10.8/src/ngx_http_lua_contentby.c:284
#19 0x00000000004c898e in ngx_http_lua_content_handler (r=0x146d980) at ../ngx_lua-0.10.8/src/ngx_http_lua_contentby.c:222
#20 0x000000000044fe40 in ngx_http_core_content_phase (r=0x146d980, ph=<value optimized out>) at src/http/ngx_http_core_module.c:1379
#21 0x000000000044a12d in ngx_http_core_run_phases (r=0x146d980) at src/http/ngx_http_core_module.c:856
#22 0x0000000000454d61 in ngx_http_process_request (r=0x146d980) at src/http/ngx_http_request.c:1916
#23 0x0000000000455a1c in ngx_http_process_request_line (rev=0x13ef8d0) at src/http/ngx_http_request.c:1027
#24 0x00000000004409e5 in ngx_epoll_process_events (cycle=<value optimized out>, timer=<value optimized out>, flags=<value optimized out>) at src/event/modules/ngx_epoll_module.c:900
#25 0x0000000000438765 in ngx_process_events_and_timers (cycle=0x1326040) at src/event/ngx_event.c:242
#26 0x000000000043ed48 in ngx_worker_process_cycle (cycle=0x1326040, data=<value optimized out>) at src/os/unix/ngx_process_cycle.c:753
#27 0x000000000043d387 in ngx_spawn_process (cycle=0x1326040, proc=0x43ed10 <ngx_worker_process_cycle>, data=0xa, name=0x4fe415 "worker process", respawn=-4)
    at src/os/unix/ngx_process.c:198
#28 0x000000000043e24c in ngx_start_worker_processes (cycle=0x1326040, n=32, type=-4) at src/os/unix/ngx_process_cycle.c:358
#29 0x000000000043f5ac in ngx_master_process_cycle (cycle=0x1326040) at src/os/unix/ngx_process_cycle.c:243
#30 0x000000000041cb1f in main (argc=<value optimized out>, argv=<value optimized out>) at src/core/nginx.c:367
```


```
#0 gc_sweep (g=0x40b713b8, p=0x40921ea8, lim=0) at lj_gc.c:395
#1 0x00007f40cf3e48d4 in gc_onestep (L=0x402ebe58) at lj_gc.c:637
#2 0x00007f40cf3e4f28 in lj_gc_step (L=0x402ebe58) at lj_gc.c:689
#3 0x00007f40cf3f210c in lua_newuserdata (L=0x402ebe58, size=<value optimized out>) at lj_api.c:684
#4 0x00000000004b35e2 in ngx_http_lua_var_get (L=0x402ebe58) at ../ngx_lua-0.10.8/src/ngx_http_lua_variable.c:113
#5 0x00007f40cf3e1bba in lj_BC_FUNCC () from /usr/local/openresty/luajit/lib/libluajit-5.1.so.2
#6 0x00000000004c6ac0 in ngx_http_lua_run_thread (L=0x40b71378, r=0x13e3410, ctx=0x13e47b0, nrets=0) at ../ngx_lua-0.10.8/src/ngx_http_lua_util.c:1005
#7 0x00000000004c84dc in ngx_http_lua_content_by_chunk (L=0x40b71378, r=0x13e3410) at ../ngx_lua-0.10.8/src/ngx_http_lua_contentby.c:120
#8 0x00000000004c8884 in ngx_http_lua_content_handler_file (r=0x13e3410) at ../ngx_lua-0.10.8/src/ngx_http_lua_contentby.c:284
#9 0x00000000004c898e in ngx_http_lua_content_handler (r=0x13e3410) at ../ngx_lua-0.10.8/src/ngx_http_lua_contentby.c:222
#10 0x000000000044fe40 in ngx_http_core_content_phase (r=0x13e3410, ph=<value optimized out>) at src/http/ngx_http_core_module.c:1379
#11 0x000000000044a12d in ngx_http_core_run_phases (r=0x13e3410) at src/http/ngx_http_core_module.c:856
#12 0x0000000000454d61 in ngx_http_process_request (r=0x13e3410) at src/http/ngx_http_request.c:1916
#13 0x0000000000455a1c in ngx_http_process_request_line (rev=0x13efe70) at src/http/ngx_http_request.c:1027
#14 0x0000000000438a02 in ngx_event_process_posted (cycle=<value optimized out>, posted=0x7603d0) at src/event/ngx_event_posted.c:33
#15 0x000000000043ed48 in ngx_worker_process_cycle (cycle=0x1326040, data=<value optimized out>) at src/os/unix/ngx_process_cycle.c:753
#16 0x000000000043d387 in ngx_spawn_process (cycle=0x1326040, proc=0x43ed10 <ngx_worker_process_cycle>, data=0x10, name=0x4fe415 "worker process", respawn=49)
    at src/os/unix/ngx_process.c:198
#17 0x000000000043f99a in ngx_reap_children (cycle=0x1326040) at src/os/unix/ngx_process_cycle.c:621
#18 ngx_master_process_cycle (cycle=0x1326040) at src/os/unix/ngx_process_cycle.c:174
#19 0x000000000041cb1f in main (argc=<value optimized out>, argv=<value optimized out>) at src/core/nginx.c:367
```

lilien1010 (Author) commented:

```
Program terminated with signal 11, Segmentation fault.
#0  0x00007f40cf3e3c23 in gc_sweep (g=0x41eb43b8, p=0x40a601e8, lim=33) at lj_gc.c:406
406           gc_freefunc[o->gch.gct - ~LJ_TSTR](g, o);
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.149.el6_6.9.x86_64 keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-37.el6_6.x86_64 libcom_err-1.41.12-21.el6.x86_64 libgcc-4.4.7-16.el6.x86_64 libmcrypt-2.5.8-9.el6.x86_64 libselinux-2.0.94-5.8.el6.x86_64 nss-softokn-freebl-3.14.3-9.el6.x86_64 openssl-1.0.1e-57.el6.x86_64 pcre-7.8-7.el6.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0  0x00007f40cf3e3c23 in gc_sweep (g=0x41eb43b8, p=0x40a601e8, lim=33) at lj_gc.c:406
#1  0x00007f40cf3e48d4 in gc_onestep (L=0x41886fd0) at lj_gc.c:637
#2  0x00007f40cf3e4f28 in lj_gc_step (L=0x41886fd0) at lj_gc.c:689
#3  0x00007f40cf3f1aa3 in lua_pushlstring (L=0x41886fd0, str=<value optimized out>, len=<value optimized out>) at lj_api.c:577
#4  0x00000000004d4689 in ngx_http_lua_socket_tcp_connect (L=0x41886fd0) at ../ngx_lua-0.10.8/src/ngx_http_lua_socket_tcp.c:520
#5  0x00007f40cf3e1bba in lj_BC_FUNCC () from /usr/local/openresty/luajit/lib/libluajit-5.1.so.2
#6  0x00000000004c6ac0 in ngx_http_lua_run_thread (L=0x41eb4378, r=0x13bea80, ctx=0x13bfe20, nrets=0) at ../ngx_lua-0.10.8/src/ngx_http_lua_util.c:1005
#7  0x00000000004c84dc in ngx_http_lua_content_by_chunk (L=0x41eb4378, r=0x13bea80) at ../ngx_lua-0.10.8/src/ngx_http_lua_contentby.c:120
#8  0x00000000004c8884 in ngx_http_lua_content_handler_file (r=0x13bea80) at ../ngx_lua-0.10.8/src/ngx_http_lua_contentby.c:284
#9  0x00000000004c898e in ngx_http_lua_content_handler (r=0x13bea80) at ../ngx_lua-0.10.8/src/ngx_http_lua_contentby.c:222
#10 0x000000000044fe40 in ngx_http_core_content_phase (r=0x13bea80, ph=<value optimized out>) at src/http/ngx_http_core_module.c:1379
#11 0x000000000044a12d in ngx_http_core_run_phases (r=0x13bea80) at src/http/ngx_http_core_module.c:856
#12 0x0000000000454d61 in ngx_http_process_request (r=0x13bea80) at src/http/ngx_http_request.c:1916
#13 0x0000000000455a1c in ngx_http_process_request_line (rev=0x14f9730) at src/http/ngx_http_request.c:1027
#14 0x00000000004409e5 in ngx_epoll_process_events (cycle=<value optimized out>, timer=<value optimized out>, flags=<value optimized out>) at src/event/modules/ngx_epoll_module.c:900
#15 0x0000000000438765 in ngx_process_events_and_timers (cycle=0x14954f0) at src/event/ngx_event.c:242
#16 0x000000000043ed48 in ngx_worker_process_cycle (cycle=0x14954f0, data=<value optimized out>) at src/os/unix/ngx_process_cycle.c:753
#17 0x000000000043d387 in ngx_spawn_process (cycle=0x14954f0, proc=0x43ed10 <ngx_worker_process_cycle>, data=0x0, name=0x4fe415 "worker process", respawn=-4)
    at src/os/unix/ngx_process.c:198
#18 0x000000000043e24c in ngx_start_worker_processes (cycle=0x14954f0, n=32, type=-4) at src/os/unix/ngx_process_cycle.c:358
#19 0x000000000043f5ac in ngx_master_process_cycle (cycle=0x14954f0) at src/os/unix/ngx_process_cycle.c:243
#20 0x000000000041cb1f in main (argc=<value optimized out>, argv=<value optimized out>) at src/core/nginx.c:367
```

agentzh (Member) commented Jul 14, 2017

@lilien1010 Please, do not use Chinese here. This place is considered English only. If you
really want to use Chinese, please join and post to the openresty (Chinese)
mailing list instead. Please see https://openresty.org/en/community.html Thanks for
your cooperation.

Regarding your issue, the backtrace of the crashing site is not helpful since it is not the culprit but just an ultimate consequence of earlier memory corruptions. My hunch is that maybe you are using some buggy 3rd-party Lua libraries or NGINX C modules that are not maintained by OpenResty. Without seeing a self-contained and minimal example that can easily reproduce the problem on our side, we cannot really help you out.

Generally, to debug such memory issues, you should use valgrind and the openresty-valgrind package to run your application and check if valgrind finds any memory errors. OpenResty provides pre-built openresty-valgrind packages for many common Linux distributions. See

If your Linux system is not supported in the prebuilt package repositories yet, you can build your own openresty-valgrind by following the steps in the openresty-valgrind's RPM spec file:

https://github.com/openresty/openresty-packaging/blob/master/rpm/SPECS/openresty-valgrind.spec#L58

Among other things, remember to configure the following lines in your nginx.conf's top level scope to ensure nginx runs as a single non-daemon process when being run by valgrind:

```nginx
daemon off;
master_process off;
worker_processes 1;
```

Good luck!

agentzh closed this as completed Jul 14, 2017