
Performance and metrics touches on async workers #3502

Merged
merged 4 commits into from
Jan 17, 2022

Conversation

NelsonVides
Collaborator

@NelsonVides NelsonVides commented Jan 14, 2022

Right now, when using send_after, we're sending a single atom to the
process, which means that we then don't know how to match the flush
request when it arrives; therefore we need to act on all of them, or
explore the entire message queue to drop them when we do the selective
receive in `cancel_and_flush_timer`.

But that selective receive can potentially be expensive: when the batch
worker is heavily loaded and its queues are growing faster than it can
process them, this code becomes really critical. Even more so when these
workers are enabled with message_queue_data=off_heap, which means that a
selective receive will go outside of the process heap, hurting cache
locality, and copy all those heap fragments into its heap, which is all
the more expensive.

To solve this, we can use start_timer instead of send_after, which sends
a message containing the TimerRef we get back, so we know exactly which
timeout is the active one, were we to receive one. This way,
cancel_timer doesn't need to worry about whether the timer message
actually arrived or not: if it didn't, it's cancelled; if it did, we
won't match it next time, and it is simply ignored when the mailbox scan
reaches it.

One more tiny optimisation: we run cancel_timer asynchronously:
https://www.erlang.org/doc/man/erlang.html#cancel_timer-2

> The timer service that manages the timer can be co-located with
> another scheduler than the scheduler that the calling process is
> executing on. If so, communication with the timer service takes much
> longer time than if it is located locally. If the calling process is in
> critical path, and can do other things while waiting for the result of
> this operation, or is not interested in the result of the operation, you
> want to use option {async, true}. If using option {async, false}, the
> calling process blocks until the operation has been performed.

There's other code, see commits for details.
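The switch described above can be sketched roughly as follows. This is a minimal, hypothetical sketch: the state shape (a map here), `schedule_flush/2`, `cancel_flush/1` and the `do_flush/1` stub are illustrative names, not the actual mongoose_batch_worker code.

```erlang
%% Minimal sketch of the timer handling described above. All names
%% (state keys, do_flush/1) are illustrative, not the real worker code.

schedule_flush(Interval, State) ->
    %% start_timer/3 delivers {timeout, TimerRef, flush}, so the active
    %% timer can be matched by reference.
    TRef = erlang:start_timer(Interval, self(), flush),
    State#{flush_timer => TRef}.

cancel_flush(#{flush_timer := TRef} = State) when is_reference(TRef) ->
    %% {async, true}: don't block on a possibly remote timer service;
    %% {info, false}: no result message, we don't care if it already fired.
    ok = erlang:cancel_timer(TRef, [{async, true}, {info, false}]),
    State#{flush_timer => undefined}.

%% Only the currently active timer triggers a flush; a stale timeout
%% message is simply ignored when the mailbox scan reaches it.
handle_info({timeout, TRef, flush}, #{flush_timer := TRef} = State) ->
    {noreply, do_flush(State#{flush_timer => undefined})};
handle_info({timeout, _StaleRef, flush}, State) ->
    {noreply, State}.

do_flush(State) -> State.  %% stub for the sketch
```

Note how no selective receive is needed at all: cancellation is fire-and-forget, and staleness is decided by a cheap pattern match on the reference.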

Watch the timed_flushes metric: if it is high, it means that you don't
have enough traffic to complete batches within the flush_interval
period. In effect, you are writing your data record by record, only a
bit later. This suggests you have either too many workers, too big a
batch_size, or too low a flush_interval.
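As a back-of-the-envelope check for this tuning advice (function name and numbers are purely illustrative, not part of MongooseIM):

```erlang
%% Illustrative helper: does a single worker fill a batch within one
%% flush interval? If not, flushes will be timer-driven (timed_flushes
%% grows) and data is effectively written record by record.
batch_fills_in_time(MsgsPerSecond, PoolSize, BatchSize, FlushIntervalMs) ->
    PerWorkerRate = MsgsPerSecond / PoolSize,
    MsgsPerInterval = PerWorkerRate * FlushIntervalMs / 1000,
    MsgsPerInterval >= BatchSize.
```

For instance, 1000 msg/s over 10 workers with a 2000 ms interval gives 200 messages per worker per interval, comfortably filling a batch of 30; at 10 msg/s the same setup sees only 2 per interval, so every flush would be timer-driven.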

codecov bot commented Jan 14, 2022

Codecov Report

Merging #3502 (4572526) into master (1f73b71) will decrease coverage by 0.00%.
The diff coverage is 82.35%.


@@            Coverage Diff             @@
##           master    #3502      +/-   ##
==========================================
- Coverage   81.00%   80.99%   -0.01%     
==========================================
  Files         418      418              
  Lines       32333    32329       -4     
==========================================
- Hits        26190    26184       -6     
- Misses       6143     6145       +2     
Impacted Files Coverage Δ
src/mam/mod_mam_rdbms_arch_async.erl 97.77% <ø> (+0.96%) ⬆️
src/mongoose_batch_worker.erl 62.79% <75.00%> (+1.92%) ⬆️
src/mongoose_async_pools.erl 95.74% <100.00%> (+0.29%) ⬆️
src/elasticsearch/mongoose_elasticsearch.erl 76.92% <0.00%> (-7.70%) ⬇️
src/http_upload/mod_http_upload.erl 94.62% <0.00%> (-2.16%) ⬇️
...bal_distrib/mod_global_distrib_hosts_refresher.erl 71.69% <0.00%> (-1.89%) ⬇️
src/mam/mod_mam_elasticsearch_arch.erl 85.08% <0.00%> (-1.76%) ⬇️
src/logger/mongoose_log_filter.erl 78.08% <0.00%> (-1.37%) ⬇️
src/mod_muc_room.erl 76.80% <0.00%> (-0.47%) ⬇️
src/mod_muc_log.erl 78.11% <0.00%> (ø)
... and 7 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data


…a key

Adding this queue length hopefully helps with micro-performance: this is
data already calculated, and the handlers will very likely use this
information as well, so just keeping it will save some cycles that the
handler would otherwise spend calculating it again, which can matter
when buffers are actually big.
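A minimal sketch of what this commit describes, caching the queue length in the worker state instead of recomputing it (a map stands in for the real state record; all names are assumptions):

```erlang
%% Keep the length alongside the queue so handlers don't pay an O(n)
%% length/1 call on every check against large buffers.
enqueue(Task, #{queue := Q, queue_len := Len} = State) ->
    State#{queue := [Task | Q], queue_len := Len + 1}.

needs_flush(#{queue_len := Len, batch_size := Max}) ->
    Len >= Max.
```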
@NelsonVides NelsonVides changed the title Run an asynchronous cancel_timer for performance Performance and metrics touches on async workers Jan 14, 2022
@mongoose-im
Collaborator

mongoose-im commented Jan 14, 2022

small_tests_24 / small_tests / d6d522f
Reports root / small


small_tests_23 / small_tests / d6d522f
Reports root / small


dynamic_domains_mysql_redis_24 / mysql_redis / d6d522f
Reports root/ big
OK: 2673 / Failed: 0 / User-skipped: 248 / Auto-skipped: 0


dynamic_domains_pgsql_mnesia_24 / pgsql_mnesia / d6d522f
Reports root/ big
OK: 2690 / Failed: 0 / User-skipped: 231 / Auto-skipped: 0


dynamic_domains_pgsql_mnesia_23 / pgsql_mnesia / d6d522f
Reports root/ big
OK: 2690 / Failed: 0 / User-skipped: 231 / Auto-skipped: 0


dynamic_domains_mssql_mnesia_24 / odbc_mssql_mnesia / d6d522f
Reports root/ big
OK: 2690 / Failed: 0 / User-skipped: 231 / Auto-skipped: 0


ldap_mnesia_23 / ldap_mnesia / d6d522f
Reports root/ big
OK: 1500 / Failed: 0 / User-skipped: 390 / Auto-skipped: 0


ldap_mnesia_24 / ldap_mnesia / d6d522f
Reports root/ big
OK: 1500 / Failed: 0 / User-skipped: 390 / Auto-skipped: 0


internal_mnesia_24 / internal_mnesia / d6d522f
Reports root/ big
OK: 1541 / Failed: 0 / User-skipped: 349 / Auto-skipped: 0


pgsql_mnesia_24 / pgsql_mnesia / d6d522f
Reports root/ big
OK: 3077 / Failed: 0 / User-skipped: 240 / Auto-skipped: 0


elasticsearch_and_cassandra_24 / elasticsearch_and_cassandra_mnesia / d6d522f
Reports root/ big
OK: 1834 / Failed: 0 / User-skipped: 361 / Auto-skipped: 0


mysql_redis_24 / mysql_redis / d6d522f
Reports root/ big
OK: 3072 / Failed: 0 / User-skipped: 245 / Auto-skipped: 0


pgsql_mnesia_23 / pgsql_mnesia / d6d522f
Reports root/ big
OK: 3077 / Failed: 0 / User-skipped: 240 / Auto-skipped: 0


riak_mnesia_24 / riak_mnesia / d6d522f
Reports root/ big
OK: 1680 / Failed: 0 / User-skipped: 362 / Auto-skipped: 0


mssql_mnesia_24 / odbc_mssql_mnesia / d6d522f
Reports root/ big
OK: 3077 / Failed: 0 / User-skipped: 240 / Auto-skipped: 0
Contributor

@arcusfelis arcusfelis left a comment


looks very good, need a test for metrics

src/mongoose_async_pools.erl
@mongoose-im
Collaborator

mongoose-im commented Jan 17, 2022

small_tests_24 / small_tests / 0725fc2
Reports root / small


small_tests_23 / small_tests / 0725fc2
Reports root / small


dynamic_domains_pgsql_mnesia_23 / pgsql_mnesia / 0725fc2
Reports root/ big
OK: 2691 / Failed: 0 / User-skipped: 238 / Auto-skipped: 0


dynamic_domains_pgsql_mnesia_24 / pgsql_mnesia / 0725fc2
Reports root/ big
OK: 2691 / Failed: 0 / User-skipped: 238 / Auto-skipped: 0


dynamic_domains_mysql_redis_24 / mysql_redis / 0725fc2
Reports root/ big
OK: 2674 / Failed: 0 / User-skipped: 255 / Auto-skipped: 0


dynamic_domains_mssql_mnesia_24 / odbc_mssql_mnesia / 0725fc2
Reports root/ big
OK: 2691 / Failed: 0 / User-skipped: 238 / Auto-skipped: 0


ldap_mnesia_23 / ldap_mnesia / 0725fc2
Reports root/ big
OK: 1500 / Failed: 0 / User-skipped: 390 / Auto-skipped: 0


ldap_mnesia_24 / ldap_mnesia / 0725fc2
Reports root/ big
OK: 1500 / Failed: 0 / User-skipped: 390 / Auto-skipped: 0


internal_mnesia_24 / internal_mnesia / 0725fc2
Reports root/ big
OK: 1541 / Failed: 0 / User-skipped: 349 / Auto-skipped: 0


pgsql_mnesia_24 / pgsql_mnesia / 0725fc2
Reports root/ big
OK: 3078 / Failed: 0 / User-skipped: 247 / Auto-skipped: 0


pgsql_mnesia_23 / pgsql_mnesia / 0725fc2
Reports root/ big
OK: 3078 / Failed: 0 / User-skipped: 247 / Auto-skipped: 0


elasticsearch_and_cassandra_24 / elasticsearch_and_cassandra_mnesia / 0725fc2
Reports root/ big
OK: 1834 / Failed: 0 / User-skipped: 363 / Auto-skipped: 0


mssql_mnesia_24 / odbc_mssql_mnesia / 0725fc2
Reports root/ big
OK: 3078 / Failed: 0 / User-skipped: 247 / Auto-skipped: 0


mysql_redis_24 / mysql_redis / 0725fc2
Reports root/ big
OK: 3073 / Failed: 0 / User-skipped: 252 / Auto-skipped: 0


riak_mnesia_24 / riak_mnesia / 0725fc2
Reports root/ big
OK: 1680 / Failed: 0 / User-skipped: 363 / Auto-skipped: 0


```erlang
get_mongoose_async_metrics(HostTypePrefix, MetricName) ->
    Metric = [HostTypePrefix, mongoose_async_pools, pm_mam, MetricName],
    {ok, Value} = rpc(mim(), exometer, get_value, [Metric]),
```
Contributor


use mongoose_metrics:get_metric_value/1. Just in case we move away from exometer :)
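The suggested change would look roughly like this. Sketch only: `rpc/4` and `mim/0` are the big-test helpers the suite already uses, stubbed out below purely so the sketch is self-contained, and the `{ok, Value}` return shape is an assumption.

```erlang
%% Sketch of the suggested refactoring: go through mongoose_metrics
%% instead of calling exometer directly.
get_mongoose_async_metrics(HostTypePrefix, MetricName) ->
    Metric = [HostTypePrefix, mongoose_async_pools, pm_mam, MetricName],
    {ok, Value} = rpc(mim(), mongoose_metrics, get_metric_value, [Metric]),
    Value.

%% --- stubs standing in for the real test-suite helpers ---
mim() -> node().
rpc(_Node, _M, _F, [Metric]) -> {ok, {metric_stub, Metric}}.
```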

Collaborator Author


Done 👍🏽

Note that escalus_mongooseim doesn't currently support exometer
counters, which is an insane fail, but anyway, out of the scope of this
PR, so here I do the fetching manually.
@NelsonVides NelsonVides force-pushed the batch_worker_timeouts branch from 0725fc2 to 4572526 Compare January 17, 2022 18:02
@mongoose-im
Copy link
Collaborator

mongoose-im commented Jan 17, 2022

small_tests_24 / small_tests / 4572526
Reports root / small


small_tests_23 / small_tests / 4572526
Reports root / small


dynamic_domains_pgsql_mnesia_24 / pgsql_mnesia / 4572526
Reports root/ big
OK: 2691 / Failed: 0 / User-skipped: 238 / Auto-skipped: 0


dynamic_domains_pgsql_mnesia_23 / pgsql_mnesia / 4572526
Reports root/ big
OK: 2691 / Failed: 0 / User-skipped: 238 / Auto-skipped: 0


dynamic_domains_mysql_redis_24 / mysql_redis / 4572526
Reports root/ big
OK: 2674 / Failed: 0 / User-skipped: 255 / Auto-skipped: 0


dynamic_domains_mssql_mnesia_24 / odbc_mssql_mnesia / 4572526
Reports root/ big
OK: 2691 / Failed: 0 / User-skipped: 238 / Auto-skipped: 0


ldap_mnesia_24 / ldap_mnesia / 4572526
Reports root/ big
OK: 1500 / Failed: 0 / User-skipped: 390 / Auto-skipped: 0


ldap_mnesia_23 / ldap_mnesia / 4572526
Reports root/ big
OK: 1500 / Failed: 0 / User-skipped: 390 / Auto-skipped: 0


internal_mnesia_24 / internal_mnesia / 4572526
Reports root/ big
OK: 1541 / Failed: 0 / User-skipped: 349 / Auto-skipped: 0


pgsql_mnesia_24 / pgsql_mnesia / 4572526
Reports root/ big
OK: 3078 / Failed: 0 / User-skipped: 247 / Auto-skipped: 0


pgsql_mnesia_23 / pgsql_mnesia / 4572526
Reports root/ big
OK: 3078 / Failed: 0 / User-skipped: 247 / Auto-skipped: 0


elasticsearch_and_cassandra_24 / elasticsearch_and_cassandra_mnesia / 4572526
Reports root/ big
OK: 1834 / Failed: 0 / User-skipped: 363 / Auto-skipped: 0


mssql_mnesia_24 / odbc_mssql_mnesia / 4572526
Reports root/ big
OK: 3078 / Failed: 0 / User-skipped: 247 / Auto-skipped: 0


mysql_redis_24 / mysql_redis / 4572526
Reports root/ big
OK: 3073 / Failed: 0 / User-skipped: 252 / Auto-skipped: 0


riak_mnesia_24 / riak_mnesia / 4572526
Reports root/ big
OK: 1680 / Failed: 0 / User-skipped: 363 / Auto-skipped: 0

@arcusfelis arcusfelis merged commit a5e9d8e into master Jan 17, 2022
@arcusfelis arcusfelis deleted the batch_worker_timeouts branch January 17, 2022 19:43
@Premwoik Premwoik added this to the 5.1.0 milestone May 25, 2022