vegur_roundtrip_SUITE:large_chunked_request_response_interrupt has non-deterministic failures #118

Open
evanmcc opened this issue Dec 18, 2014 · 5 comments

evanmcc commented Dec 18, 2014

=== Ended at 2014-12-18 10:08:48
=== location [{vegur_roundtrip_SUITE,recv_until_close,2017},
              {vegur_roundtrip_SUITE,large_chunked_request_response_interrupt,1907},
              {test_server,ts_tc,1415},
              {test_server,run_test_case_eval1,1028},
              {test_server,run_test_case_eval,976}]
=== reason = timeout
  in function  vegur_roundtrip_SUITE:recv_until_close/1 (vegur_roundtrip_SUITE.erl, line 2017)
  in call from vegur_roundtrip_SUITE:large_chunked_request_response_interrupt/1 (vegur_roundtrip_SUITE.erl, line 1907)
  in call from test_server:ts_tc/3 (test_server.erl, line 1415)
  in call from test_server:run_test_case_eval1/6 (test_server.erl, line 1028)
  in call from test_server:run_test_case_eval/9 (test_server.erl, line 976)

Ignore the bogus line numbers and error reason; I have some debugging code in the test. The gen_tcp:recv/3 call eventually fails with Timeout set to 100, 300, and 10000 (I didn't try anything higher). I don't have good counts on how often this happens, but in every case a failure showed up in less than 5 minutes. Just run:

 while ct_run -dir test/ -logdir logs -pa ebin -pa deps/*/ebin; do :; done

and you'll get a failure before too long.
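
Since there are no good counts yet, a small wrapper along these lines could turn the loop into a rough failure rate. This is only a sketch: `count_failures` is a hypothetical helper, not part of vegur or Common Test, and it assumes the command under test exits non-zero when a test case fails.

```shell
#!/bin/sh
# count_failures RUNS CMD [ARGS...]
# Runs CMD the given number of times and prints "FAILURES/RUNS".
# Hypothetical helper for illustration; not part of vegur or Common Test.
count_failures() {
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard the command's output; only its exit status matters here.
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures/$runs"
}

# Example (assumes ct_run exits non-zero when a test case fails):
# count_failures 100 ct_run -dir test/ -logdir logs -pa ebin -pa deps/*/ebin
```

Unlike the stop-on-first-failure loop, this keeps going after a failure, so the printed ratio can be compared directly between runs with different timeouts or socket options.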

evanmcc added the bug label Dec 18, 2014

evanmcc commented Dec 19, 2014

Edited output of erlang:port_info(Port) on the port as it times out:

[{name,"tcp_inet"},
 {links,[<0.4081.2>]},
 {id,10729},
 {connected,<0.4081.2>},
 {input,0},
 {output,12000}, <------
 {os_pid,undefined}]

note that I also tried:

-    {ok, Client} = gen_tcp:connect(IP, Port, [{active,false},list],1000),
+    {ok, Client} = gen_tcp:connect(IP, Port, [{active,false},list,{sndbuf,100000},{recbuf,100000}],1000),

but got the same output when it failed.

evanmcc commented Dec 19, 2014

also saw an identical failure in: vegur_roundtrip_SUITE:large_close_request_response_interrupt/1

ferd commented Dec 19, 2014

I'm wondering if this isn't just bad TCP stacks falling into weird states here and there. Running the tests on localhost and on Travis sometimes yields entirely different ways of terminating connections.

evanmcc commented Dec 19, 2014

Could well be. I'd feel more comfortable if this failed less often, though, so mostly I'm looking for mitigations that bring the failure rate below 1% of runs, ideally much lower.

ferd commented May 6, 2015

I think this has been fixed while reworking the interruption detection and semantics. Marking as closed, will reopen if we see it happen again.

ferd closed this as completed May 6, 2015
ferd reopened this May 7, 2015