-
Notifications
You must be signed in to change notification settings - Fork 428
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wait longer for replaced c2s #2054
Conversation
This comment has been minimized.
This comment has been minimized.
Codecov Report
@@ Coverage Diff @@
## master #2054 +/- ##
==========================================
- Coverage 78.86% 78.85% -0.01%
==========================================
Files 333 333
Lines 28777 28788 +11
==========================================
+ Hits 22694 22701 +7
- Misses 6083 6087 +4
Continue to review full report at Codecov.
|
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Awesome! I think connection replacement and resumption is one of the pain points for client app developers - really good stuff! |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
doc/Advanced-configuration.md
Outdated
@@ -328,6 +328,11 @@ There are some additional options that influence all database connections in the | |||
* **Default:** `[mongoose_router_global, mongoose_router_localdomain, mongoose_router_external_localnode, mongoose_router_external, ejabberd_s2s]` | |||
* **Example:** `{routing_modules, [mongoose_router_global, mongoose_router_localdomain]}.` | |||
|
|||
* **replaced_wait_timeout** (local) | |||
* **Description:** When a user session is replaced (due to a full JID conflict) by a new one, this parameter specifies time that MongooseIM waits for old sessions to close. The default value is sufficient in most cases. If you observe `replaced_wait_timeout` warning in logs, then most probably old sessions are frozen for some reason and it should be investigated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Description: When a user session is replaced (due to a full JID conflict) by a new one, this parameter specifies the time MongooseIM waits for the old sessions to close. The default value is sufficient in most cases. If you observe
replaced_wait_timeout
warning in logs, then most probably rgw old sessions are frozen for some reason and it should be investigated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rgw
This comment has been minimized.
This comment has been minimized.
@@ -210,9 +212,13 @@ forget_persistent_rooms() -> | |||
rpc(mim(), mnesia, clear_table, [muc_registered]), | |||
ok. | |||
|
|||
-spec successful_rpc(atom(), atom(), list()) -> term(). | |||
-spec successful_rpc(M :: atom(), F :: atom(), A :: list()) -> term(). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does MFA
even need typespecs? :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are even builtin mfa/0
and module/0
types.
big_tests/tests/mongoose_helper.erl
Outdated
inject_module(Module) -> | ||
inject_module(Module, false). | ||
|
||
-spec inject_module(Module :: module(), CheckIfLoadedAlready :: boolean()) -> ok | already_loaded. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like DoOverwrite
would better explain what is its role
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, will pick a better name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Boolean params are evil. If you don't know the definition by heart, what does this call site tell you?
inject_module(Module, false)
Inject if true
, do not if false
- so do not really inject the module here? Inject, but false
what?
Instead: inject_module(Module, force_reload)
, or inject_module(Module, do_overwrite)
, or inject_module(Module, [force_reload])
...
ok; | ||
_ -> | ||
Timeout = get_replaced_wait_timeout(S), | ||
erlang:send_after(Timeout, self(), replaced_wait_timeout) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure whether it is crucial, but maybe it would be worth to use TRef
? In order to prevent c2s process from getting messages replaced_wait_timeout
from different sources than itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there some another source of such message? Not really :)
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
6258.1 / Erlang 19.3 / small_tests / 54ce9a8 6258.5 / Erlang 19.3 / ldap_mnesia / 54ce9a8 6258.2 / Erlang 19.3 / internal_mnesia / 54ce9a8 6258.6 / Erlang 19.3 / elasticsearch_and_cassandra_mnesia / 54ce9a8 6258.3 / Erlang 19.3 / mysql_redis / 54ce9a8 6258.4 / Erlang 19.3 / odbc_mssql_mnesia / 54ce9a8 6258.8 / Erlang 20.0 / pgsql_mnesia / 54ce9a8 6258.9 / Erlang 21.0 / riak_mnesia / 54ce9a8 mod_global_distrib_SUITE:mod_global_distrib:test_pm_with_ungraceful_reconnection_to_different_server{error,
{timeout_when_waiting_for_stanza,
[{escalus_client,wait_for_stanza,
[{client,<<"eve90.56542@localhost/res1">>,escalus_tcp,
<0.25627.1>,
[{event_manager,<0.25618.1>},
{server,<<"localhost">>},
{username,<<"eve90.56542">>},
{resource,<<"res1">>}],
[{event_client,
[{event_manager,<0.25618.1>},
{server,<<"localhost">>},
{username,<<"eve90.56542">>},
{resource,<<"res1">>}]},
{resource,<<"res1">>},
{username,<<"eve90.56542">>},
{server,<<"localhost">>},
{host,<<"localhost">>},
{port,5222},
{auth,{escalus_auth,auth_plain}},
{wspath,undefined},
{username,<<"eve90.56542">>},
{server,<<"localhost">>},
{password,<<"password">>},
{port,5222},
{stream_management,true},
{stream_id,<<"0629346B1ABA6CDE">>}]},
10000],
[{file,
"/home/travis/build/esl/MongooseIM/big_tests/_build/default/lib/escalus/src/escalus_client.erl"},
{line,138}]},
{mod_global_distrib_SUITE,
'-test_pm_with_ungraceful_reconnection_to_different_server/1-fun-0-',
4,
[{file,"mod_global_distrib_SUITE.erl"},{line,610}]},
{escalus_story,story,4,
[{file,
"/home/travis/build/esl/MongooseIM/big_tests/_build/default/lib/escalus/src/escalus_story.erl"},
{line,72}]},
{tes... 6258.1 / Erlang 19.3 / small_tests / 54ce9a8 6258.2 / Erlang 19.3 / internal_mnesia / 54ce9a8 pubsub_SUITE:dag+collection:notify_collection_leaf_and_item_test{error,
{timeout_when_waiting_for_stanza,
[{escalus_client,wait_for_stanza,
[{client,<<"bOb81.274969@localhost/res1">>,escalus_tcp,
<0.4318.1>,
[{event_manager,<0.4255.1>},
{server,<<"localhost">>},
{username,<<"bOb81.274969">>},
{resource,<<"res1">>}],
[{event_client,
[{event_manager,<0.4255.1>},
{server,<<"localhost">>},
{username,<<"bOb81.274969">>},
{resource,<<"res1">>}]},
{resource,<<"res1">>},
{username,<<"bOb81.274969">>},
{server,<<"localhost">>},
{host,<<"localhost">>},
{port,5222},
{auth,{escalus_auth,auth_plain}},
{wspath,undefined},
{username,<<"bOb81.274969">>},
{server,<<"localhost">>},
{password,<<"makrolika">>},
{stream_id,<<"6CA1E0ADC05383E1">>}]},
5000],
[{file,
"/home/travis/build/esl/MongooseIM/big_tests/_build/default/lib/escalus/src/escalus_client.erl"},
{line,138}]},
{pubsub_tools,receive_notification,3,
[{file,"pubsub_tools.erl"},{line,480}]},
{pubsub_tools,receive_node_creation_notification,3,
[{file,"pubsub_tools.erl"},{line,304}]},
{pubsub_SUITE,'-notify_collection_leaf_and_item_test/1-fun-0-',2,
[{file,"pubsub_SUITE.erl"},{line,1266}]},
{escalus_story,story,4,
[{file,
"/home/travis/build/esl/MongooseIM/big_tests/_bui... 6258.3 / Erlang 19.3 / mysql_redis / 54ce9a8 pubsub_SUITE:dag+collection:discover_child_nodes_test{error,
{timeout_when_waiting_for_stanza,
[{escalus_client,wait_for_stanza,
[{client,<<"alicE86.623382@localhost/res1">>,escalus_tcp,
<0.10178.3>,
[{event_manager,<0.10126.3>},
{server,<<"localhost">>},
{username,<<"alicE86.623382">>},
{resource,<<"res1">>}],
[{event_client,
[{event_manager,<0.10126.3>},
{server,<<"localhost">>},
{username,<<"alicE86.623382">>},
{resource,<<"res1">>}]},
{resource,<<"res1">>},
{username,<<"alicE86.623382">>},
{server,<<"localhost">>},
{host,<<"localhost">>},
{port,5222},
{auth,{escalus_auth,auth_plain}},
{wspath,undefined},
{username,<<"alicE86.623382">>},
{server,<<"localhost">>},
{password,<<"matygrysa">>},
{stream_id,<<"4181867406491172">>}]},
5000],
[{file,
"/home/travis/build/esl/MongooseIM/big_tests/_build/default/lib/escalus/src/escalus_client.erl"},
{line,138}]},
{pubsub_tools,receive_response,3,
[{file,"pubsub_tools.erl"},{line,456}]},
{pubsub_tools,receive_and_check_response,4,
[{file,"pubsub_tools.erl"},{line,447}]},
{pubsub_SUITE,'-discover_child_nodes_test/1-fun-0-',2,
[{file,"pubsub_SUITE.erl"},{line,1493}]},
{escalus_story,story,4,
[{file,
"/home/travis/build/esl/MongooseIM/big_tests/_build/default... 6258.6 / Erlang 19.3 / elasticsearch_and_cassandra_mnesia / 54ce9a8 6258.5 / Erlang 19.3 / ldap_mnesia / 54ce9a8 6258.4 / Erlang 19.3 / odbc_mssql_mnesia / 54ce9a8 6258.8 / Erlang 20.0 / pgsql_mnesia / 54ce9a8 6258.9 / Erlang 21.0 / riak_mnesia / 54ce9a8 |
Succeeded on Travis twice without restarts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR looks good 👍 I like the src/ejabberd_c2s.erl
refactor :)
,{group, just_tls} | ||
[ | ||
{group, session_replacement}, | ||
%% these groups must be last, as they really... complicate configuration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you elaborate?
UserSpec = [{resource, <<"conflict">>} | escalus_users:get_userspec(Config, alice)], | ||
{ok, Alice1, _} = escalus_connection:start(UserSpec), | ||
|
||
{ok, Alice2, _} = escalus_connection:start(UserSpec), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please rename
Alice1
-> Alice
and Alice2
-> Bellatrix
Alice1
-> Geralt
and Alice2
-> Ciri
or something like that?
;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a good idea. Different names are used for different users. Different resources of the same user are suffixed, usually by a number. This is the convention used across all big tests.
{ok, Alice2, _} = escalus_connection:start(UserSpec), | ||
|
||
% THEN a timeout warning is logged | ||
FilterFun = fun(_, Msg) -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider extracting this lambda to named private method because you use it in clean_close_of_replaced_session/1
and replaced_session_cannot_terminate/1
??
|
||
-import(mongoose_helper, [successful_rpc/3]). | ||
|
||
-type state() :: #{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this map contains one key only, why not directly use list (value for only key) as a state instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like:
-type state() :: [{pid(), lager:log_level()}].
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's less refactoring when you actually need to add one more field, which often happens with state
.
Awesome! |
This PR addresses #1880
When a node is slow, it may take more than 100ms for the replaced session to close or for
'DOWN'
message to reach the new c2s.TODO:
handle_info