Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Async pools #3433

Merged
merged 9 commits into from
Dec 17, 2021
Merged

Async pools #3433

merged 9 commits into from
Dec 17, 2021

Conversation

NelsonVides
Copy link
Collaborator

@NelsonVides NelsonVides commented Dec 6, 2021

This PR learns from the MAM async workers, and tries to separate two different concepts into two different abstractions: one, is the concept of a pool of workers that accumulate tasks, and act on the tasks every time a batch size is reached, or a timeout since the first submitted task is triggered; the second, is how MAM acts on the batches.

I attempt here to implement the first concept in a very generic way, using the new modules mongoose_batch_worker and mongoose_async_pools, as an abstraction that takes the batch size, the flush timeout, and a callback to be run on the accumulated queues. This is then used by a new mam async module, but the goal is, that it can be used as well in for example a future inbox async module, or any other extension that would benefit from batching (I know of quite a few myself!).

Note that in the case of inbox, the batches would accumulate inbox changes for a user and then actually only submit the last one to the DB, as, note, unlike mam being an append-only operation, inbox is an update-only operation, so applying all inbox changes sequentially is equivalent to applying the last one.

NOTE: please do read the commit messages, as they contain really detailed information on the code submitted on each of them.

@mongoose-im

This comment has been minimized.

@codecov
Copy link

codecov bot commented Dec 6, 2021

Codecov Report

Merging #3433 (6479445) into master (38fcb5e) will increase coverage by 0.04%.
The diff coverage is 81.76%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3433      +/-   ##
==========================================
+ Coverage   80.78%   80.83%   +0.04%     
==========================================
  Files         415      415              
  Lines       32306    32266      -40     
==========================================
- Hits        26100    26083      -17     
+ Misses       6206     6183      -23     
Impacted Files Coverage Δ
src/ejabberd_sup.erl 85.71% <ø> (-0.50%) ⬇️
src/mam/mod_mam_muc.erl 74.01% <ø> (ø)
src/mongoose_batch_worker.erl 56.81% <56.81%> (ø)
src/mam/mod_mam_rdbms_arch_async.erl 86.90% <86.90%> (ø)
src/mongoose_async_pools.erl 96.87% <96.87%> (ø)
src/mam/mod_mam_meta.erl 93.47% <100.00%> (+1.08%) ⬆️
src/domain/mongoose_domain_loader.erl 89.28% <0.00%> (-0.90%) ⬇️
src/mod_muc_room.erl 76.97% <0.00%> (-0.29%) ⬇️
src/pubsub/mod_pubsub.erl 73.13% <0.00%> (-0.06%) ⬇️
src/pubsub/mod_pubsub_db_mnesia.erl 92.85% <0.00%> (+0.42%) ⬆️
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 38fcb5e...6479445. Read the comment docs.

@mongoose-im

This comment has been minimized.

@NelsonVides NelsonVides marked this pull request as ready for review December 7, 2021 12:17
@mongoose-im

This comment has been minimized.

@mongoose-im

This comment has been minimized.

@mongoose-im

This comment has been minimized.

Copy link
Member

@chrzaszcz chrzaszcz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good in general, my main concern is the low test coverage, the workers are tested only for storing messages one by one. A load test of this PR for the PM and MUC scenarios with MAM enabled should be enough here.

big_tests/tests/mam_SUITE.erl Outdated Show resolved Hide resolved
doc/migrations/5.0.0_x.md Outdated Show resolved Hide resolved
doc/modules/mod_mam.md Outdated Show resolved Hide resolved
src/mongoose_async_pools.erl Outdated Show resolved Hide resolved
Copy link
Contributor

@arcusfelis arcusfelis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks pretty nice!

src/mongoose_async_pools.erl Outdated Show resolved Hide resolved
src/mam/mod_mam_rdbms_arch_async.erl Outdated Show resolved Hide resolved
src/mam/mod_mam_rdbms_arch_async.erl Outdated Show resolved Hide resolved
InsertResult =
case MessageCount of
MaxSize ->
mongoose_rdbms:execute(HostType, insert_mam_messages, lists:append(Rows));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line could be covered with batch_size = 1

case InsertResult of
{updated, _Count} -> ok;
{error, Reason} ->
mongoose_metrics:update(HostType, modMamDropped, MessageCount),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we would need a drop test to test that the metric is incremented. Maybe not in this PR

src/mam/mod_mam_rdbms_arch_async.erl Outdated Show resolved Hide resolved
src/mongoose_batch_worker.erl Outdated Show resolved Hide resolved
flush_queue_len => length(State#state.flush_queue)}),
NewState = do_run_flush(State#state{flush_queue = lists:reverse(Queue),
flush_interval_tref = undefined}),
erlang:garbage_collect(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wondering if that would collect Queue (something says to me it will still keep Queue, because the caller is still referencing it).
There is another way to collect for gen_server: is to return {noreply,NewState,hibernate} after a flush, could be more efficient.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hibernate more efficient, not at all, because it will not only GC the heap, but also collapse the whole process memory to a PCB, which will then require a new heap when the process is woken up, which will be a very small heap because hibernation imposes a shrink as well, that will very likely need to be grown once the queue starts filling... no way.

But I'll make sure the queue is GC'ed here, hold on.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You were right, it wasn't collected. But got how to, see ee83696 😁

@NelsonVides NelsonVides force-pushed the async_pools branch 2 times, most recently from 617c016 to ee83696 Compare December 8, 2021 17:46
@mongoose-im

This comment has been minimized.

@mongoose-im

This comment has been minimized.

Copy link
Member

@chrzaszcz chrzaszcz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks ok to me, only the load test concerns remain.

@mongoose-im

This comment has been minimized.

src/mongoose_batch_worker.erl Outdated Show resolved Hide resolved
src/mongoose_batch_worker.erl Outdated Show resolved Hide resolved
src/mongoose_batch_worker.erl Outdated Show resolved Hide resolved
src/mongoose_batch_worker.erl Outdated Show resolved Hide resolved
src/mam/mod_mam_rdbms_arch_async.erl Outdated Show resolved Hide resolved

% Don't leak the tasks to logs, can contain private information
format_status(_Opt, [_PDict, State | _]) ->
[{data, [{"State", State#state{flush_queue = []}},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flush_queue = censored ;)
actually, setting it to a valid value here (i.e. []) could be pretty confusing, when people try to call for status.

could be = hidden or sanitized

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

censored returns a dialyzer error here, but can add a type for that in the typedef

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sad :(

Actually, just actually, why not to convert State into a map in format status? :)
It is much easier to read ;)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a super canonical example for format_status was actually converting state into a proplist :)
(it is what I think about when see that gen-server callback).

src/mam/mod_mam_rdbms_arch_async.erl Outdated Show resolved Hide resolved
src/mongoose_batch_worker.erl Outdated Show resolved Hide resolved
src/mongoose_batch_worker.erl Outdated Show resolved Hide resolved
src/mongoose_batch_worker.erl Outdated Show resolved Hide resolved
@mongoose-im

This comment has been minimized.

Copy link
Contributor

@arcusfelis arcusfelis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Brilliant. Merging after a load test done by Nelson :)

@mongoose-im

This comment has been minimized.

This is a gen_server, whose purpose is to be part of a worker pool. The
server accumulates tasks to take action to, until the number of tasks
reaches a configured threshold, or, a timeout is fired since the first
task that was requested in this batch (this is to ensure that no queue
awaits too long for action).

Note that we don't trigger timeouts unless the queue is not empty, to
avoid waking up an unused worker.

The worker also takes a callback, and a map with any metadata the
callback may need. When is time to flush a queue, it is this callback
that will be ran, taking the queue in the order in which it has been
submitted, and the optionally provided metadata.

Most of the code for this worker is learnt from MAM's async workers,
with some changes or learnt lessons:
* Note that this worker sets info logs on start and terminate.
* There is also a debug log on queue flushes.
* The terminate callback flushes the task queue if there's any task
  pending, to try to avoid losing tasks inadvertently.
* The format_status callback prevents printing the actual tasks on
  crashes or system state lookups, as the tasks might contain data
  private to clients under any GDPR-like legal constrains.
* Garbage collection is triggered after every flush, first to ensure
  saving memory, but second, to ensure the release of any binaries this
  worker might be holding.
This module implements a worker pool abstraction to handle batch
workers. It will start a transient supervisor under ejabberd_sup, that
will spawn a pool using the `worker_pool` library. This will then
supervise the three supervisors that `worker_pool` supervises
(time-manager, queue-manager, and worker-sup). One of these,
'worker-sup', will in turn spawn all the mongoose_batch_worker workers
with the correct configuration.
First of all, remove all the old modules implementing the mam rdbms
async pools.

Then, create a new module called mod_mam_rdbms_arch_async, that abstract
what the previous two where doing, by using the modules created in the
two previous commits. This module starts the pools with all the correct
parameters, and registers the hooks that will then call `worker_pool`
using the `hash_worker` strategy, to ensure MAM entries to a single
archive are all processed in parallel.

Note a few lessons here:
* Selecting the worker isn't sound based on the `rem` operator, because
  the distribution of archives over workers is not at all ensured to be
  uniform. `worker_pool` uses `erlang:phash2/2`, which ensures such
  uniform distribution.
* The callbacks for the batch workers abstract all the metrics.
* mod_mam_meta parsing has breaking changes: this is what was reported
  as broken in the previous implementation. Now there's an entire toml
  section called `async_writer`, that can be enabled, disabled and
  configured both globally for mam, or in a fine-grained manner within
  the `pm` and `muc` sections.
The purpose of this is for the supervisor to be able to dynamically
generate data that will be passed to the workers, an example being, the
supervisor might create ets tables. We keep this in the context of the
supervisor and its workers, to keep the abstraction isolated to this
supervision tree: an ets table may not make any sense outside of this
supervision tree, we might want the ets table not to even be named, so
that nobody outside of the tree will have access to it, only the workers
will get an opaque reference. We will also want the ets tables to be
cleaned when the supervision tree dies, as the data might not make sense
on a newly restarted tree anymore, or we will want the tree to die fast
if the ets table is lost. This init callback can also set global
configuration flags, or send messages, or log something, or define new
metrics, or many other things that for example the current MongooseIM
init callbacks for `mongoose_wpool` are already doing.
The whole operation to build lists and append them and generate an atom
in the end is quite expensive: not only it generates a lot of lists and
copies of lists that later need to be garbage collected, but also, the
`list_to_atom` operation is a concurrent bottleneck: the atom table is
not optimised for inserts, only for reads, this one needs to grab locks
and check if the atom already existed and so on. Instead, store the name
on a persistent_term record, and simply fetch that when needed, making
the hotter part of the code, finding the pool to submit a task to,
faster.
Note that running garbage collection within a function will not collect
the references this function has, so the queue will actually not be
cleared after being flushed as it was expected. Instead, we use the
option `async` for the GC. This will make this process end its
reductions, and once it has been preempted, it will be scheduled for
garbage collection, and thereafter, a message will be delivered
notifying him of so.
@mongoose-im
Copy link
Collaborator

mongoose-im commented Dec 17, 2021

small_tests_24 / small_tests / 6479445
Reports root / small


small_tests_23 / small_tests / 6479445
Reports root / small


dynamic_domains_pgsql_mnesia_23 / pgsql_mnesia / 6479445
Reports root/ big
OK: 2725 / Failed: 0 / User-skipped: 186 / Auto-skipped: 0


dynamic_domains_pgsql_mnesia_24 / pgsql_mnesia / 6479445
Reports root/ big
OK: 2725 / Failed: 0 / User-skipped: 186 / Auto-skipped: 0


dynamic_domains_mysql_redis_24 / mysql_redis / 6479445
Reports root/ big
OK: 2708 / Failed: 0 / User-skipped: 203 / Auto-skipped: 0


dynamic_domains_mssql_mnesia_24 / odbc_mssql_mnesia / 6479445
Reports root/ big
OK: 2724 / Failed: 1 / User-skipped: 186 / Auto-skipped: 0

mam_SUITE:rdbms_async_cache_mam_all:mam06:retract_message
{error,{{archive_size,1,[{times,200,0}]},
    [{mongoose_helper,do_wait_until,2,
              [{file,"/home/circleci/project/big_tests/tests/mongoose_helper.erl"},
               {line,377}]},
     {mam_SUITE,'-retract_message/1-fun-0-',4,
          [{file,"/home/circleci/project/big_tests/tests/mam_SUITE.erl"},
           {line,1927}]},
     {escalus_story,story,4,
            [{file,"/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_story.erl"},
             {line,72}]},
     {test_server,ts_tc,3,[{file,"test_server.erl"},{line,1783}]},
     {test_server,run_test_case_eval1,6,
            [{file,"test_server.erl"},{line,1292}]},
     {test_server,run_test_case_eval,9,
            [{file,"test_server.erl"},{line,1224}]}]}}

Report log


ldap_mnesia_24 / ldap_mnesia / 6479445
Reports root/ big
OK: 1500 / Failed: 0 / User-skipped: 386 / Auto-skipped: 0


ldap_mnesia_23 / ldap_mnesia / 6479445
Reports root/ big
OK: 1500 / Failed: 0 / User-skipped: 386 / Auto-skipped: 0


internal_mnesia_24 / internal_mnesia / 6479445
Reports root/ big
OK: 1585 / Failed: 0 / User-skipped: 301 / Auto-skipped: 0


elasticsearch_and_cassandra_24 / elasticsearch_and_cassandra_mnesia / 6479445
Reports root/ big
OK: 1878 / Failed: 0 / User-skipped: 313 / Auto-skipped: 0


pgsql_mnesia_24 / pgsql_mnesia / 6479445
Reports root/ big
OK: 3112 / Failed: 0 / User-skipped: 195 / Auto-skipped: 0


pgsql_mnesia_23 / pgsql_mnesia / 6479445
Reports root/ big
OK: 3112 / Failed: 0 / User-skipped: 195 / Auto-skipped: 0


mysql_redis_24 / mysql_redis / 6479445
Reports root/ big
OK: 3107 / Failed: 0 / User-skipped: 200 / Auto-skipped: 0


riak_mnesia_24 / riak_mnesia / 6479445
Reports root/ big
OK: 1724 / Failed: 0 / User-skipped: 314 / Auto-skipped: 0


dynamic_domains_mssql_mnesia_24 / odbc_mssql_mnesia / 6479445
Reports root/ big
OK: 2722 / Failed: 3 / User-skipped: 186 / Auto-skipped: 0

service_domain_db_SUITE:db:db_keeps_syncing_after_cluster_join
{error,{test_case_failed,{[<<"example1.com">>],
              [<<"example1.com">>,<<"example2.com">>]}}}

Report log

service_domain_db_SUITE:db:rest_with_auth:rest_delete_domain_cleans_data_from_mam
{error,
  {timeout_when_waiting_for_stanza,
    [{escalus_client,wait_for_stanza,
       [{client,
          <<"bob_rest_delete_domain_cleans_data_from_mam_1935@example.org/res1">>,
          escalus_tcp,<0.29327.1>,
          [{event_manager,<0.29321.1>},
           {server,<<"example.org">>},
           {username,
             <<"bob_rest_delete_domain_cleans_data_from_mam_1935">>},
           {resource,<<"res1">>}],
          [{event_client,
             [{event_manager,<0.29321.1>},
            {server,<<"example.org">>},
            {username,
              <<"bob_rest_delete_domain_cleans_data_from_mam_1935">>},
            {resource,<<"res1">>}]},
           {resource,<<"res1">>},
           {username,
             <<"bob_rest_delete_domain_cleans_data_from_mam_1935">>},
           {server,<<"example.org">>},
           {host,<<"localhost">>},
           {port,5232},
           {auth,{escalus_auth,auth_plain}},
           {wspath,undefined},
           {username,
             <<"bob_rest_delete_domain_cleans_data_from_mam_1935">>},
           {server,<<"example.org">>},
           {host,<<"localhost">>},
           {password,<<"makota3">>},
           {port,5232},
           {stream_id,<<"7a54345859a684a1">>}]},
        5000],
       [{file,
          "/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_client.erl"},
        {line,136}]},
     {service_domain_db_SUITE,
       '-rest_delete_domain_cleans_data_from_mam/1-fun-0-',5,
       [{file,
    ...

Report log

service_domain_db_SUITE:db:rest_without_auth:rest_delete_domain_cleans_data_from_mam
{error,
  {timeout_when_waiting_for_stanza,
    [{escalus_client,wait_for_stanza,
       [{client,
          <<"bob_rest_delete_domain_cleans_data_from_mam_1936@example.org/res1">>,
          escalus_tcp,<0.29923.1>,
          [{event_manager,<0.29917.1>},
           {server,<<"example.org">>},
           {username,
             <<"bob_rest_delete_domain_cleans_data_from_mam_1936">>},
           {resource,<<"res1">>}],
          [{event_client,
             [{event_manager,<0.29917.1>},
            {server,<<"example.org">>},
            {username,
              <<"bob_rest_delete_domain_cleans_data_from_mam_1936">>},
            {resource,<<"res1">>}]},
           {resource,<<"res1">>},
           {username,
             <<"bob_rest_delete_domain_cleans_data_from_mam_1936">>},
           {server,<<"example.org">>},
           {host,<<"localhost">>},
           {port,5232},
           {auth,{escalus_auth,auth_plain}},
           {wspath,undefined},
           {username,
             <<"bob_rest_delete_domain_cleans_data_from_mam_1936">>},
           {server,<<"example.org">>},
           {host,<<"localhost">>},
           {password,<<"makota3">>},
           {port,5232},
           {stream_id,<<"bc16d79a34247558">>}]},
        5000],
       [{file,
          "/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_client.erl"},
        {line,136}]},
     {service_domain_db_SUITE,
       '-rest_delete_domain_cleans_data_from_mam/1-fun-0-',5,
       [{file,
    ...

Report log


dynamic_domains_mssql_mnesia_24 / odbc_mssql_mnesia / 6479445
Reports root/ big
OK: 2739 / Failed: 3 / User-skipped: 186 / Auto-skipped: 0

mam_SUITE:rdbms_async_cache_mam_all:mam06:retract_wrong_message
{error,{{archive_size,1,[{times,200,0}]},
    [{mongoose_helper,do_wait_until,2,
              [{file,"/home/circleci/project/big_tests/tests/mongoose_helper.erl"},
               {line,377}]},
     {mam_SUITE,'-retract_message/1-fun-0-',4,
          [{file,"/home/circleci/project/big_tests/tests/mam_SUITE.erl"},
           {line,1927}]},
     {escalus_story,story,4,
            [{file,"/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_story.erl"},
             {line,72}]},
     {test_server,ts_tc,3,[{file,"test_server.erl"},{line,1783}]},
     {test_server,run_test_case_eval1,6,
            [{file,"test_server.erl"},{line,1292}]},
     {test_server,run_test_case_eval,9,
            [{file,"test_server.erl"},{line,1224}]}]}}

Report log

mam_SUITE:rdbms_async_cache_mam_all:mam06:querying_for_all_messages_with_jid
{error,{test_case_failed,"Respond size is 7, 17 is expected."}}

Report log

vcard_SUITE:ro_full:retrieve_own_card
{error,{test_case_failed,"Expected <<\"alice\">> got undefined\n"}}

Report log


dynamic_domains_mssql_mnesia_24 / odbc_mssql_mnesia / 6479445
Reports root/ big
OK: 2725 / Failed: 0 / User-skipped: 186 / Auto-skipped: 0

@arcusfelis arcusfelis merged commit 1e94f3f into master Dec 17, 2021
@arcusfelis arcusfelis deleted the async_pools branch December 17, 2021 19:13
@Premwoik Premwoik modified the milestone: 5.1.0 May 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants