Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wait longer for replaced c2s #2054

Merged
merged 16 commits into from
Mar 19, 2019
Merged

Wait longer for replaced c2s #2054

merged 16 commits into from
Mar 19, 2019

Conversation

fenek
Copy link
Member

@fenek fenek commented Aug 29, 2018

This PR addresses #1880

When a node is slow, it may take more than 100ms for the replaced session to close or for 'DOWN' message to reach the new c2s.

TODO:

  • Tests
  • C2S state could store replaced PIDs list in the state so they may be ignored in handle_info
  • Wait time for replaced sessions may be configurable
  • Docs

@mongoose-im

This comment has been minimized.

@codecov
Copy link

codecov bot commented Aug 29, 2018

Codecov Report

Merging #2054 into master will decrease coverage by <.01%.
The diff coverage is 94.73%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2054      +/-   ##
==========================================
- Coverage   78.86%   78.85%   -0.01%     
==========================================
  Files         333      333              
  Lines       28777    28788      +11     
==========================================
+ Hits        22694    22701       +7     
- Misses       6083     6087       +4
Impacted Files Coverage Δ
src/ejabberd_c2s.erl 87.14% <94.73%> (+0.26%) ⬆️
src/mam/mod_mam_rdbms_prefs.erl 92.52% <0%> (-3.74%) ⬇️
src/rdbms/mongoose_rdbms.erl 68.87% <0%> (-2.56%) ⬇️
src/mam/mod_mam_utils.erl 82.78% <0%> (-0.67%) ⬇️
src/mod_muc_log.erl 77.69% <0%> (-0.52%) ⬇️
src/pubsub/mod_pubsub.erl 71.7% <0%> (-0.49%) ⬇️
src/mod_bosh_socket.erl 82.17% <0%> (+0.33%) ⬆️
src/mod_muc_room.erl 77.5% <0%> (+0.46%) ⬆️
src/ejabberd_router.erl 74.21% <0%> (+1.25%) ⬆️
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9939861...a436d8d. Read the comment docs.

src/ejabberd_c2s.erl Outdated Show resolved Hide resolved
@mongoose-im

This comment has been minimized.

src/ejabberd_c2s.erl Outdated Show resolved Hide resolved
@mongoose-im

This comment has been minimized.

@erszcz
Copy link
Member

erszcz commented Mar 7, 2019

Awesome! I think connection replacement and resumption is one of the pain points for client app developers - really good stuff!

@mongoose-im

This comment has been minimized.

@mongoose-im

This comment has been minimized.

@@ -328,6 +328,11 @@ There are some additional options that influence all database connections in the
* **Default:** `[mongoose_router_global, mongoose_router_localdomain, mongoose_router_external_localnode, mongoose_router_external, ejabberd_s2s]`
* **Example:** `{routing_modules, [mongoose_router_global, mongoose_router_localdomain]}.`

* **replaced_wait_timeout** (local)
* **Description:** When a user session is replaced (due to a full JID conflict) by a new one, this parameter specifies time that MongooseIM waits for old sessions to close. The default value is sufficient in most cases. If you observe `replaced_wait_timeout` warning in logs, then most probably old sessions are frozen for some reason and it should be investigated.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Description: When a user session is replaced (due to a full JID conflict) by a new one, this parameter specifies the time MongooseIM waits for the old sessions to close. The default value is sufficient in most cases. If you observe replaced_wait_timeout warning in logs, then most probably rgw old sessions are frozen for some reason and it should be investigated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rgw

@mongoose-im

This comment has been minimized.

@kanes115 kanes115 self-requested a review March 8, 2019 14:00
@@ -210,9 +212,13 @@ forget_persistent_rooms() ->
rpc(mim(), mnesia, clear_table, [muc_registered]),
ok.

-spec successful_rpc(atom(), atom(), list()) -> term().
-spec successful_rpc(M :: atom(), F :: atom(), A :: list()) -> term().
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does MFA even need typespecs? :D

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inject_module(Module) ->
inject_module(Module, false).

-spec inject_module(Module :: module(), CheckIfLoadedAlready :: boolean()) -> ok | already_loaded.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like DoOverwrite would better explain what is its role

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, will pick a better name

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Boolean params are evil. If you don't know the definition by heart, what does this call site tell you?

inject_module(Module, false)

Inject if true, do not if false - so do not really inject the module here? Inject, but false what?
Instead: inject_module(Module, force_reload), or inject_module(Module, do_overwrite), or inject_module(Module, [force_reload])...

ok;
_ ->
Timeout = get_replaced_wait_timeout(S),
erlang:send_after(Timeout, self(), replaced_wait_timeout)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure whether it is crucial, but maybe it would be worth to use TRef? In order to prevent c2s process from getting messages replaced_wait_timeout from different sources than itself?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there some another source of such message? Not really :)

@fenek fenek mentioned this pull request Mar 12, 2019
5 tasks
@mongoose-im

This comment has been minimized.

@mongoose-im

This comment has been minimized.

@mongoose-im

This comment has been minimized.

@mongoose-im

This comment has been minimized.

@mongoose-im
Copy link
Collaborator

6257.1 / Erlang 19.3 / small_tests / 2c08eb3
Reports root / small

@mongoose-im
Copy link
Collaborator

mongoose-im commented Mar 18, 2019

6258.1 / Erlang 19.3 / small_tests / 54ce9a8
Reports root / small


6258.5 / Erlang 19.3 / ldap_mnesia / 54ce9a8
Reports root/ big
OK: 1157 / Failed: 0 / User-skipped: 100 / Auto-skipped: 0


6258.2 / Erlang 19.3 / internal_mnesia / 54ce9a8
Reports root/ big
OK: 1192 / Failed: 0 / User-skipped: 65 / Auto-skipped: 0


6258.6 / Erlang 19.3 / elasticsearch_and_cassandra_mnesia / 54ce9a8
Reports root/ big
OK: 469 / Failed: 0 / User-skipped: 8 / Auto-skipped: 0


6258.3 / Erlang 19.3 / mysql_redis / 54ce9a8
Reports root/ big
OK: 3063 / Failed: 0 / User-skipped: 232 / Auto-skipped: 0


6258.4 / Erlang 19.3 / odbc_mssql_mnesia / 54ce9a8
Reports root/ big
OK: 3065 / Failed: 0 / User-skipped: 230 / Auto-skipped: 0


6258.8 / Erlang 20.0 / pgsql_mnesia / 54ce9a8
Reports root/ big / small
OK: 3097 / Failed: 0 / User-skipped: 198 / Auto-skipped: 0


6258.9 / Erlang 21.0 / riak_mnesia / 54ce9a8
Reports root/ big / small
OK: 1440 / Failed: 1 / User-skipped: 63 / Auto-skipped: 0

mod_global_distrib_SUITE:mod_global_distrib:test_pm_with_ungraceful_reconnection_to_different_server
{error,
  {timeout_when_waiting_for_stanza,
    [{escalus_client,wait_for_stanza,
       [{client,<<"eve90.56542@localhost/res1">>,escalus_tcp,
          <0.25627.1>,
          [{event_manager,<0.25618.1>},
           {server,<<"localhost">>},
           {username,<<"eve90.56542">>},
           {resource,<<"res1">>}],
          [{event_client,
             [{event_manager,<0.25618.1>},
            {server,<<"localhost">>},
            {username,<<"eve90.56542">>},
            {resource,<<"res1">>}]},
           {resource,<<"res1">>},
           {username,<<"eve90.56542">>},
           {server,<<"localhost">>},
           {host,<<"localhost">>},
           {port,5222},
           {auth,{escalus_auth,auth_plain}},
           {wspath,undefined},
           {username,<<"eve90.56542">>},
           {server,<<"localhost">>},
           {password,<<"password">>},
           {port,5222},
           {stream_management,true},
           {stream_id,<<"0629346B1ABA6CDE">>}]},
        10000],
       [{file,
          "/home/travis/build/esl/MongooseIM/big_tests/_build/default/lib/escalus/src/escalus_client.erl"},
        {line,138}]},
     {mod_global_distrib_SUITE,
       '-test_pm_with_ungraceful_reconnection_to_different_server/1-fun-0-',
       4,
       [{file,"mod_global_distrib_SUITE.erl"},{line,610}]},
     {escalus_story,story,4,
       [{file,
          "/home/travis/build/esl/MongooseIM/big_tests/_build/default/lib/escalus/src/escalus_story.erl"},
        {line,72}]},
     {tes...

Report log


6258.1 / Erlang 19.3 / small_tests / 54ce9a8
Reports root / small


6258.2 / Erlang 19.3 / internal_mnesia / 54ce9a8
Reports root/ big
OK: 1205 / Failed: 1 / User-skipped: 65 / Auto-skipped: 0

pubsub_SUITE:dag+collection:notify_collection_leaf_and_item_test
{error,
  {timeout_when_waiting_for_stanza,
    [{escalus_client,wait_for_stanza,
       [{client,<<"bOb81.274969@localhost/res1">>,escalus_tcp,
          <0.4318.1>,
          [{event_manager,<0.4255.1>},
           {server,<<"localhost">>},
           {username,<<"bOb81.274969">>},
           {resource,<<"res1">>}],
          [{event_client,
             [{event_manager,<0.4255.1>},
            {server,<<"localhost">>},
            {username,<<"bOb81.274969">>},
            {resource,<<"res1">>}]},
           {resource,<<"res1">>},
           {username,<<"bOb81.274969">>},
           {server,<<"localhost">>},
           {host,<<"localhost">>},
           {port,5222},
           {auth,{escalus_auth,auth_plain}},
           {wspath,undefined},
           {username,<<"bOb81.274969">>},
           {server,<<"localhost">>},
           {password,<<"makrolika">>},
           {stream_id,<<"6CA1E0ADC05383E1">>}]},
        5000],
       [{file,
          "/home/travis/build/esl/MongooseIM/big_tests/_build/default/lib/escalus/src/escalus_client.erl"},
        {line,138}]},
     {pubsub_tools,receive_notification,3,
       [{file,"pubsub_tools.erl"},{line,480}]},
     {pubsub_tools,receive_node_creation_notification,3,
       [{file,"pubsub_tools.erl"},{line,304}]},
     {pubsub_SUITE,'-notify_collection_leaf_and_item_test/1-fun-0-',2,
       [{file,"pubsub_SUITE.erl"},{line,1266}]},
     {escalus_story,story,4,
       [{file,
          "/home/travis/build/esl/MongooseIM/big_tests/_bui...

Report log


6258.3 / Erlang 19.3 / mysql_redis / 54ce9a8
Reports root/ big
OK: 3076 / Failed: 1 / User-skipped: 232 / Auto-skipped: 0

pubsub_SUITE:dag+collection:discover_child_nodes_test
{error,
  {timeout_when_waiting_for_stanza,
    [{escalus_client,wait_for_stanza,
       [{client,<<"alicE86.623382@localhost/res1">>,escalus_tcp,
          <0.10178.3>,
          [{event_manager,<0.10126.3>},
           {server,<<"localhost">>},
           {username,<<"alicE86.623382">>},
           {resource,<<"res1">>}],
          [{event_client,
             [{event_manager,<0.10126.3>},
            {server,<<"localhost">>},
            {username,<<"alicE86.623382">>},
            {resource,<<"res1">>}]},
           {resource,<<"res1">>},
           {username,<<"alicE86.623382">>},
           {server,<<"localhost">>},
           {host,<<"localhost">>},
           {port,5222},
           {auth,{escalus_auth,auth_plain}},
           {wspath,undefined},
           {username,<<"alicE86.623382">>},
           {server,<<"localhost">>},
           {password,<<"matygrysa">>},
           {stream_id,<<"4181867406491172">>}]},
        5000],
       [{file,
          "/home/travis/build/esl/MongooseIM/big_tests/_build/default/lib/escalus/src/escalus_client.erl"},
        {line,138}]},
     {pubsub_tools,receive_response,3,
       [{file,"pubsub_tools.erl"},{line,456}]},
     {pubsub_tools,receive_and_check_response,4,
       [{file,"pubsub_tools.erl"},{line,447}]},
     {pubsub_SUITE,'-discover_child_nodes_test/1-fun-0-',2,
       [{file,"pubsub_SUITE.erl"},{line,1493}]},
     {escalus_story,story,4,
       [{file,
          "/home/travis/build/esl/MongooseIM/big_tests/_build/default...

Report log


6258.6 / Erlang 19.3 / elasticsearch_and_cassandra_mnesia / 54ce9a8
Reports root/ big
OK: 469 / Failed: 0 / User-skipped: 8 / Auto-skipped: 0


6258.5 / Erlang 19.3 / ldap_mnesia / 54ce9a8
Reports root/ big
OK: 1157 / Failed: 0 / User-skipped: 100 / Auto-skipped: 0


6258.4 / Erlang 19.3 / odbc_mssql_mnesia / 54ce9a8
Reports root/ big
OK: 3065 / Failed: 0 / User-skipped: 230 / Auto-skipped: 0


6258.8 / Erlang 20.0 / pgsql_mnesia / 54ce9a8
Reports root/ big / small
OK: 3097 / Failed: 0 / User-skipped: 198 / Auto-skipped: 0


6258.9 / Erlang 21.0 / riak_mnesia / 54ce9a8
Reports root/ big / small
OK: 1427 / Failed: 0 / User-skipped: 63 / Auto-skipped: 0

@fenek
Copy link
Member Author

fenek commented Mar 18, 2019

Succeeded on Travis twice without restarts.

Copy link
Contributor

@aleklisi aleklisi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR looks good 👍 I like the src/ejabberd_c2s.erl refactor :)

big_tests/tests/connect_SUITE.erl Show resolved Hide resolved
,{group, just_tls}
[
{group, session_replacement},
%% these groups must be last, as they really... complicate configuration
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate?

UserSpec = [{resource, <<"conflict">>} | escalus_users:get_userspec(Config, alice)],
{ok, Alice1, _} = escalus_connection:start(UserSpec),

{ok, Alice2, _} = escalus_connection:start(UserSpec),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please rename
Alice1 -> Alice and Alice2 -> Bellatrix
Alice1 -> Geralt and Alice2 -> Ciri
or something like that?
;)

Copy link
Member

@erszcz erszcz Mar 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a good idea. Different names are used for different users. Different resources of the same user are suffixed, usually by a number. This is the convention used across all big tests.

{ok, Alice2, _} = escalus_connection:start(UserSpec),

% THEN a timeout warning is logged
FilterFun = fun(_, Msg) ->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider extracting this lambda to named private method because you use it in clean_close_of_replaced_session/1 and replaced_session_cannot_terminate/1 ??


-import(mongoose_helper, [successful_rpc/3]).

-type state() :: #{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this map contains one key only, why not directly use list (value for only key) as a state instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like:

-type state() :: [{pid(), lager:log_level()}].

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's less refactoring when you actually need to add one more field, which often happens with state.

doc/Advanced-configuration.md Show resolved Hide resolved
@erszcz erszcz merged commit bc5e861 into master Mar 19, 2019
@erszcz erszcz deleted the fix-down-error branch March 19, 2019 11:37
@erszcz
Copy link
Member

erszcz commented Mar 19, 2019

Awesome!

@fenek fenek added this to the MongooseIM 3.3.0++ milestone Mar 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants