Async rdbms order #3611

Merged
merged 5 commits into from
Mar 30, 2022

NelsonVides
Collaborator

With asynchronous inserts for inbox or smart_markers, there's a risk that different nodes aggregate different data and then insert it in an undefined order, so an earlier event could be flushed by one node after a later event has been flushed by another node. This can happen, for example, in a conversation between Alice and Bob when they are connected to different nodes of a cluster.

So what we want here is for the DB to ignore an update unless its timestamp is strictly higher than the timestamp of the previous update.
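The intended behaviour can be tried out locally with a minimal sketch. It is not the code from this PR: it uses Python's bundled SQLite (3.24 or newer) as a stand-in, because SQLite accepts the same `ON CONFLICT ... DO UPDATE ... WHERE` form as Postgres. The table and column names (`inbox`, `owner`, `peer`, `content`, `ts`) are made up for the illustration.

```python
import sqlite3

def make_db():
    # In-memory database with a hypothetical, simplified inbox table.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE inbox ("
        " owner TEXT, peer TEXT, content TEXT, ts INTEGER,"
        " PRIMARY KEY (owner, peer))"
    )
    return conn

# The WHERE clause makes the update a no-op unless the incoming
# timestamp is strictly higher than the stored one.
UPSERT = """
INSERT INTO inbox (owner, peer, content, ts)
VALUES (?, ?, ?, ?)
ON CONFLICT (owner, peer) DO UPDATE
SET content = excluded.content, ts = excluded.ts
WHERE excluded.ts > inbox.ts
"""

def upsert(conn, owner, peer, content, ts):
    conn.execute(UPSERT, (owner, peer, content, ts))

def current(conn, owner, peer):
    return conn.execute(
        "SELECT content, ts FROM inbox WHERE owner = ? AND peer = ?",
        (owner, peer),
    ).fetchone()

conn = make_db()
upsert(conn, "alice", "bob", "second message", 200)  # flushed first by one node
upsert(conn, "alice", "bob", "first message", 100)   # older event flushed later
upsert(conn, "alice", "bob", "same timestamp", 200)  # not strictly higher, ignored
print(current(conn, "alice", "bob"))  # ('second message', 200)
```

Without the `WHERE` clause the second call would overwrite the row with the older message, which is the reordering problem described above.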

For this, we add one more parameter to query preparation, which is used to construct the appropriate query for Postgres and MySQL. This is currently not implemented for MSSQL.
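As a rough illustration of what such a parameter could drive, here is a hypothetical query builder in Python. It is not the `rdbms_queries` implementation: the function name, argument names and the `?` placeholders are assumptions made for the sketch. Only the SQL forms themselves (`ON CONFLICT ... DO UPDATE ... WHERE` with `EXCLUDED` for Postgres, `ON DUPLICATE KEY UPDATE` with `IF` and `VALUES()` for MySQL) are standard.

```python
def upsert_query(dialect, table, insert_fields, updates, unique_keys,
                 incremental_field=None):
    """Build an upsert statement; all names here are illustrative only."""
    columns = ", ".join(insert_fields)
    params = ", ".join("?" for _ in insert_fields)
    head = f"INSERT INTO {table} ({columns}) VALUES ({params})"
    inc = incremental_field
    if dialect == "pgsql":
        sets = ", ".join(f"{f} = EXCLUDED.{f}" for f in updates)
        sql = f"{head} ON CONFLICT ({', '.join(unique_keys)}) DO UPDATE SET {sets}"
        if inc is not None:
            # Postgres allows a WHERE clause on the update part.
            sql += f" WHERE {table}.{inc} < EXCLUDED.{inc}"
        return sql
    if dialect == "mysql":
        if inc is None:
            sets = ", ".join(f"{f} = VALUES({f})" for f in updates)
        else:
            # MySQL has no WHERE here, so every assignment is guarded by IF,
            # and the incremental field is moved last so that the earlier
            # guards still see its old value.
            ordered = ([f for f in updates if f != inc]
                       + [f for f in updates if f == inc])
            sets = ", ".join(
                f"{f} = IF({inc} < VALUES({inc}), VALUES({f}), {f})"
                for f in ordered)
        return f"{head} ON DUPLICATE KEY UPDATE {sets}"
    # MSSQL is deliberately left out, mirroring the note above.
    raise ValueError(f"conditional upsert not sketched for dialect {dialect!r}")

fields = ["owner", "peer", "content", "ts"]
print(upsert_query("pgsql", "inbox", fields, ["ts", "content"], ["owner", "peer"], "ts"))
print(upsert_query("mysql", "inbox", fields, ["ts", "content"], ["owner", "peer"], "ts"))
```

Passing `incremental_field=None` yields the plain unconditional upsert, so existing callers would keep their current behaviour in this sketch.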

With this, the asynchronous backends for inbox and markers are much closer to being production ready. The only thing missing is the remove events: they don't have a timestamp, so they won't be ordered correctly relative to updates. I plan to make removals tagged updates instead; in the case of inbox, for example, by moving entries to a hidden box.

See the commit message of f887636 for how the trick works for MySQL.

@codecov

codecov bot commented Mar 26, 2022

Codecov Report

Merging #3611 (7db782e) into master (7161179) will decrease coverage by 0.04%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #3611      +/-   ##
==========================================
- Coverage   80.90%   80.86%   -0.05%     
==========================================
  Files         425      425              
  Lines       32231    32237       +6     
==========================================
- Hits        26077    26068       -9     
- Misses       6154     6169      +15     
| Impacted Files | Coverage Δ |
|---|---|
| src/smart_markers/mod_smart_markers_rdbms.erl | 94.54% <ø> (ø) |
| ...rc/smart_markers/mod_smart_markers_rdbms_async.erl | 85.18% <ø> (-3.71%) ⬇️ |
| src/inbox/mod_inbox_rdbms.erl | 93.08% <100.00%> (-0.05%) ⬇️ |
| src/inbox/mod_inbox_rdbms_async.erl | 64.40% <100.00%> (-3.39%) ⬇️ |
| src/rdbms/rdbms_queries.erl | 91.66% <100.00%> (+2.55%) ⬆️ |
| src/event_pusher/mod_event_pusher.erl | 82.35% <0.00%> (-17.65%) ⬇️ |
| ...c/global_distrib/mod_global_distrib_server_mgr.erl | 77.71% <0.00%> (-3.43%) ⬇️ |
| src/rdbms/mongoose_rdbms.erl | 62.54% <0.00%> (-1.10%) ⬇️ |
| ... and 9 more | |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7161179...7db782e. Read the comment docs.

@mongoose-im

This comment was marked as outdated.

In the case of MySQL, the incremental field must be the last one
to be updated, so we need to reorder the fields here.

See https://thewebfellas.com/blog/conditional-duplicate-key-updates-with-mysql/

------------------------------------------------------------------------

> Simple conditional updates

Unfortunately there’s a catch: the app can process events out-of-order
meaning updates to the summary table won’t necessarily occur in the
order that the events are created. This means that the last_event_id and
last_event_created_at fields should only be updated if the event is
newer than the last one for the day. In SQL terms it should look like
this:

INSERT INTO daily_events
  (created_on, last_event_id, last_event_created_at)
VALUES
  ('2010-01-19', 23, '2010-01-19 10:23:11')
ON DUPLICATE KEY UPDATE
  last_event_id = VALUES(last_event_id),
  last_event_created_at = VALUES(last_event_created_at)
WHERE last_event_created_at < VALUES(last_event_created_at);

The bad news is that this won’t work as MySQL doesn’t allow WHERE
clauses in the update portion of the query. In a simple case like this
the easiest workaround is to use the IF function:

INSERT INTO daily_events
  (created_on, last_event_id, last_event_created_at)
VALUES
  ('2010-01-19', 23, '2010-01-19 10:23:11')
ON DUPLICATE KEY UPDATE
  last_event_id = IF(last_event_created_at < VALUES(last_event_created_at), VALUES(last_event_id), last_event_id),
  last_event_created_at = IF(last_event_created_at < VALUES(last_event_created_at), VALUES(last_event_created_at), last_event_created_at);

This works by checking if the last_event_created_at timestamp of the
event being updated is newer than the current timestamp, if it is then
the new value is assigned to the field in the update, otherwise the
current value is used.

An important thing to keep in mind when using this approach is that the
order in which you update your fields is very important. I was wrongly
under the impression that the updates took place in one mass-assignment
after the entire query had been interpreted by MySQL. But they’re not:
the assignments happen in the order they appear in the query. To give
you an example, this query won’t produce the expected result:

INSERT INTO daily_events
  (created_on, last_event_id, last_event_created_at)
VALUES
  ('2010-01-19', 23, '2010-01-19 10:23:11')
ON DUPLICATE KEY UPDATE
  last_event_created_at = IF(last_event_created_at < VALUES(last_event_created_at), VALUES(last_event_created_at), last_event_created_at),
  last_event_id = IF(last_event_created_at < VALUES(last_event_created_at), VALUES(last_event_id), last_event_id);

When the update is executed with a more recent event, the
last_event_created_at field will be updated, but the last_event_id field
won’t. This is because when the second IF is evaluated
last_event_created_at has already been updated so that
last_event_created_at is equal to VALUES(last_event_created_at). Crazy
huh?!

------------------------------------------------------------------------
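The ordering pitfall quoted above can be reproduced without a database. The following is a small Python simulation of left-to-right assignment, not MySQL itself; the stored row (event 22 at 09:00:00) is an invented starting state, and the helper names are made up for the sketch.

```python
def apply_in_order(row, new, assignments):
    """Apply (field, expression) pairs one by one; each expression sees
    the writes made by the assignments before it."""
    row = dict(row)
    for field, expr in assignments:
        row[field] = expr(row, new)
    return row

def keep_if_newer(field):
    # Mirrors: IF(last_event_created_at < VALUES(last_event_created_at),
    #             VALUES(field), field)
    def expr(row, new):
        if row["last_event_created_at"] < new["last_event_created_at"]:
            return new[field]
        return row[field]
    return expr

# Hypothetical stored row and a newer incoming event.
stored = {"last_event_id": 22, "last_event_created_at": "2010-01-19 09:00:00"}
incoming = {"last_event_id": 23, "last_event_created_at": "2010-01-19 10:23:11"}

# Timestamp assigned last: both fields are updated.
good = apply_in_order(stored, incoming, [
    ("last_event_id", keep_if_newer("last_event_id")),
    ("last_event_created_at", keep_if_newer("last_event_created_at")),
])

# Timestamp assigned first: the second guard compares equal timestamps
# and keeps the old id.
bad = apply_in_order(stored, incoming, [
    ("last_event_created_at", keep_if_newer("last_event_created_at")),
    ("last_event_id", keep_if_newer("last_event_id")),
])

print(good)  # {'last_event_id': 23, 'last_event_created_at': '2010-01-19 10:23:11'}
print(bad)   # {'last_event_id': 22, 'last_event_created_at': '2010-01-19 10:23:11'}
```

This is why the field used in the comparison has to come last in the update list: every guard before it still needs to see the old timestamp.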
@mongoose-im
Collaborator

mongoose-im commented Mar 26, 2022

small_tests_24 / small_tests / 32a7f94
Reports root / small


small_tests_23 / small_tests / 32a7f94
Reports root / small


dynamic_domains_pgsql_mnesia_23 / pgsql_mnesia / 32a7f94
Reports root/ big
OK: 2850 / Failed: 0 / User-skipped: 133 / Auto-skipped: 0


dynamic_domains_mysql_redis_24 / mysql_redis / 32a7f94
Reports root/ big
OK: 2833 / Failed: 0 / User-skipped: 150 / Auto-skipped: 0


dynamic_domains_pgsql_mnesia_24 / pgsql_mnesia / 32a7f94
Reports root/ big
OK: 2850 / Failed: 0 / User-skipped: 133 / Auto-skipped: 0


dynamic_domains_mssql_mnesia_24 / odbc_mssql_mnesia / 32a7f94
Reports root/ big
OK: 2850 / Failed: 0 / User-skipped: 133 / Auto-skipped: 0


ldap_mnesia_24 / ldap_mnesia / 32a7f94
Reports root/ big
OK: 1507 / Failed: 0 / User-skipped: 399 / Auto-skipped: 0


ldap_mnesia_23 / ldap_mnesia / 32a7f94
Reports root/ big
OK: 1507 / Failed: 0 / User-skipped: 399 / Auto-skipped: 0


internal_mnesia_24 / internal_mnesia / 32a7f94
Reports root/ big
OK: 1548 / Failed: 0 / User-skipped: 358 / Auto-skipped: 0


pgsql_mnesia_24 / pgsql_mnesia / 32a7f94
Reports root/ big
OK: 3224 / Failed: 0 / User-skipped: 142 / Auto-skipped: 0


mysql_redis_24 / mysql_redis / 32a7f94
Reports root/ big
OK: 3219 / Failed: 0 / User-skipped: 147 / Auto-skipped: 0


pgsql_mnesia_23 / pgsql_mnesia / 32a7f94
Reports root/ big
OK: 3224 / Failed: 0 / User-skipped: 142 / Auto-skipped: 0


elasticsearch_and_cassandra_24 / elasticsearch_and_cassandra_mnesia / 32a7f94
Reports root/ big
OK: 1855 / Failed: 0 / User-skipped: 366 / Auto-skipped: 0


mssql_mnesia_24 / odbc_mssql_mnesia / 32a7f94
Reports root/ big
OK: 3224 / Failed: 0 / User-skipped: 142 / Auto-skipped: 0


riak_mnesia_24 / riak_mnesia / 32a7f94
Reports root/ big
OK: 1698 / Failed: 0 / User-skipped: 365 / Auto-skipped: 0

@NelsonVides NelsonVides mentioned this pull request Mar 29, 2022
Member

@chrzaszcz chrzaszcz left a comment

The code looks good, I only have some general notes:

  • How do we check that it indeed works as intended? Maybe add a test in rdbms_SUITE?
  • Of course it would be good to add it for MS SQL. Do you know if this is even possible?

SQL = upsert_query(Host, Table, InsertFields, Updates, UniqueKeyFields),
prepare_upsert(Host, Name, Table, InsertFields, Updates, UniqueKeyFields, none).

-spec prepare_upsert(Host :: mongoose_rdbms:server(),
Member

Just a note: I think we should make this a map of arguments, especially that some of them are optional.

@NelsonVides
Collaborator Author

> * How do we check that it indeed works as intended? Maybe add a test in `rdbms_SUITE`?

Test added 👌🏽

> * Of course it would be good to add it for MS SQL. Do you know if this is even possible?

Probably, but honestly I have no idea how, and I'd like to unblock inbox now. Perhaps we can leave the MSSQL case as a ticket for the future, for when a client actually requests it?

> Just a note: I think we should make this a map of arguments, especially that some of them are optional.

Maybe, but there are plenty of modules using this function and I didn't want to introduce a big (albeit mechanical) diff in this PR. Dunno, but could do 🤔

@mongoose-im
Collaborator

mongoose-im commented Mar 30, 2022

small_tests_24 / small_tests / 7db782e
Reports root / small


small_tests_23 / small_tests / 7db782e
Reports root / small


dynamic_domains_pgsql_mnesia_23 / pgsql_mnesia / 7db782e
Reports root/ big
OK: 2851 / Failed: 0 / User-skipped: 133 / Auto-skipped: 0


dynamic_domains_mysql_redis_24 / mysql_redis / 7db782e
Reports root/ big
OK: 2846 / Failed: 1 / User-skipped: 150 / Auto-skipped: 0

muc_SUITE:hibernation:hibernated_room_can_be_queried_for_archive
{error,{{assertion_failed,assert,is_groupchat_message,
              [<<"Restorable message">>],
              undefined,"undefined"},
    [{escalus_new_assert,assert_true,2,
               [{file,"/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_new_assert.erl"},
                {line,84}]},
     {muc_SUITE,wait_for_mam_result,3,
          [{file,"/home/circleci/project/big_tests/tests/muc_SUITE.erl"},
           {line,4383}]},
     {muc_SUITE,'-hibernated_room_can_be_queried_for_archive/1-fun-0-',3,
          [{file,"/home/circleci/project/big_tests/tests/muc_SUITE.erl"},
           {line,4124}]},
     {escalus_story,story,4,
            [{file,"/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_story.erl"},
             {line,72}]},
     {muc_SUITE,hibernated_room_can_be_queried_for_archive,1,
          [{file,"/home/circleci/project/big_tests/tests/muc_SUITE.erl"},
           {line,4120}]},
     {test_server,ts_tc,3,[{file,"test_server.erl"},{line,1783}]},
     {test_server,run_test_case_eval1,6,
            [{file,"test_server.erl"},{line,1292}]},
     {test_server,run_test_case_eval,9,
            [{file,"test_server.erl"},{line,1224}]}]}}

Report log


ldap_mnesia_24 / ldap_mnesia / 7db782e
Reports root/ big
OK: 1507 / Failed: 0 / User-skipped: 400 / Auto-skipped: 0


dynamic_domains_mssql_mnesia_24 / odbc_mssql_mnesia / 7db782e
Reports root/ big
OK: 2851 / Failed: 0 / User-skipped: 133 / Auto-skipped: 0


dynamic_domains_pgsql_mnesia_24 / pgsql_mnesia / 7db782e
Reports root/ big
OK: 2851 / Failed: 0 / User-skipped: 133 / Auto-skipped: 0


internal_mnesia_24 / internal_mnesia / 7db782e
Reports root/ big
OK: 1548 / Failed: 0 / User-skipped: 359 / Auto-skipped: 0


ldap_mnesia_23 / ldap_mnesia / 7db782e
Reports root/ big
OK: 1507 / Failed: 0 / User-skipped: 400 / Auto-skipped: 0


mysql_redis_24 / mysql_redis / 7db782e
Reports root/ big
OK: 3220 / Failed: 0 / User-skipped: 147 / Auto-skipped: 0


pgsql_mnesia_24 / pgsql_mnesia / 7db782e
Reports root/ big
OK: 3225 / Failed: 0 / User-skipped: 142 / Auto-skipped: 0


elasticsearch_and_cassandra_24 / elasticsearch_and_cassandra_mnesia / 7db782e
Reports root/ big
OK: 1855 / Failed: 0 / User-skipped: 367 / Auto-skipped: 0


pgsql_mnesia_23 / pgsql_mnesia / 7db782e
Reports root/ big
OK: 3225 / Failed: 0 / User-skipped: 142 / Auto-skipped: 0


mssql_mnesia_24 / odbc_mssql_mnesia / 7db782e
Reports root/ big
OK: 3225 / Failed: 0 / User-skipped: 142 / Auto-skipped: 0


riak_mnesia_24 / riak_mnesia / 7db782e
Reports root/ big
OK: 1698 / Failed: 0 / User-skipped: 366 / Auto-skipped: 0

@chrzaszcz
Member

chrzaszcz commented Mar 30, 2022

> > * How do we check that it indeed works as intended? Maybe add a test in `rdbms_SUITE`?
>
> Test added 👌🏽

Nice!

> > * Of course it would be good to add it for MS SQL. Do you know if this is even possible?
>
> Probably, but honestly I have no idea how, and I'd like to unblock inbox now. Perhaps we can leave the MSSQL case as a ticket for the future, for when a client actually requests it?

Ok, let's do it like this.

> > Just a note: I think we should make this a map of arguments, especially that some of them are optional.
>
> Maybe, but there are plenty of modules using this function and I didn't want to introduce a big (albeit mechanical) diff in this PR. Dunno, but could do 🤔

Yes, let's do it separately.

Member

@chrzaszcz chrzaszcz left a comment

Looks good 👍

@chrzaszcz chrzaszcz merged commit 66a1652 into master Mar 30, 2022
@chrzaszcz chrzaszcz deleted the async_rdbms_order branch March 30, 2022 11:28
@Premwoik Premwoik added this to the 5.1.0 milestone May 25, 2022