-
Notifications
You must be signed in to change notification settings - Fork 429
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Inbox/async removes #3616
Inbox/async removes #3616
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3616 +/- ##
==========================================
- Coverage 80.96% 80.92% -0.04%
==========================================
Files 424 426 +2
Lines 32061 32126 +65
==========================================
+ Hits 25958 25998 +40
- Misses 6103 6128 +25
Continue to review full report at Codecov.
|
This comment was marked as outdated.
This comment was marked as outdated.
2bb7f5f
to
91add19
Compare
This comment was marked as outdated.
This comment was marked as outdated.
small_tests_24 / small_tests / b7b8b54 small_tests_23 / small_tests / b7b8b54 dynamic_domains_mysql_redis_24 / mysql_redis / b7b8b54 dynamic_domains_pgsql_mnesia_24 / pgsql_mnesia / b7b8b54 dynamic_domains_pgsql_mnesia_23 / pgsql_mnesia / b7b8b54 dynamic_domains_mssql_mnesia_24 / odbc_mssql_mnesia / b7b8b54 ldap_mnesia_24 / ldap_mnesia / b7b8b54 ldap_mnesia_23 / ldap_mnesia / b7b8b54 internal_mnesia_24 / internal_mnesia / b7b8b54 elasticsearch_and_cassandra_24 / elasticsearch_and_cassandra_mnesia / b7b8b54 mysql_redis_24 / mysql_redis / b7b8b54 pgsql_mnesia_23 / pgsql_mnesia / b7b8b54 pgsql_mnesia_24 / pgsql_mnesia / b7b8b54 mssql_mnesia_24 / odbc_mssql_mnesia / b7b8b54 riak_mnesia_24 / riak_mnesia / b7b8b54 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code looks good in general. I have a few general remarks:
- There are no docs explaining what
bin
is and e.g. if messages from bin will be returned by default or not. - It is counter-intuitive for me that enabling the async writer implicitly changes the removal semantics in quite a significant way. The user should definitely know that from the docs. It is quite unexpected that e.g. if I switch off the async writer, suddenly my deleted messages stop landing in the bin.
b7b8b54
to
42b6549
Compare
Huh, I think those points are answered precisely in the PR made on top of this one, here: #3619 so perhaps I should push them both into a single PR 🤔 |
This comment was marked as outdated.
This comment was marked as outdated.
e74e605
to
2dedbcb
Compare
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reviewed most of the code and it looks fine, but I have some doubts about the tight coupling between the bin and the async writer. IMO the user would benefit from using the bin independently of the async writer. I also think that the REST commands should be put in a separate module and decoupled from the specific backend.
Parallel to how the exml is parsed on backend code, we also encode the exml packet in the backend code, for several reasons: consistency, being able to modify the payload in custom code before is put on the db, and, performance, as, if the entry is aggregated, the payload didn't need to be encoded, and such operation can be skipped.
This fixes the fact that the asynchronous inbox is currently dangerously inconsistent. If on two nodes one writes new messages to a user's conversation with a room, and in another node that user is removed from the room, the order in which those events are flushed into the DB is now not ensured, and the risk of the removal being flushed before the insert, and therefore the insert recovering a conversation forever, is much higher than in the synchronous inbox. With the concept of boxes, the async backend can instead have a deleted box, called bin, so that removals aren't really removals, but just moves to another box, which are atomic and can interleave with modifications to that entry's other columns. Then asynchronously the deleted boxes can be collected, much like email clears the Bin after so many days.
Note that the handle_result callback in the API stack had to be modified. The HTTP standard allows to return a message body for a DELETE call under the status code 200-OK, so we want to allow for such possibility in the code. Two old delete operations needed to be updated though, as they were inconsistently registering they'd return a message that was then ignored.
Note specially the partial index for postgres, which allows us to keep very small indexes for all rows that jump into the bin box. Smaller index means faster performance and faster lookups for the bin box.
7857006
to
b7b6bd3
Compare
This comment was marked as outdated.
This comment was marked as outdated.
b7b6bd3
to
72a01cc
Compare
small_tests_24 / small_tests / 72a01cc small_tests_23 / small_tests / 72a01cc dynamic_domains_pgsql_mnesia_24 / pgsql_mnesia / 72a01cc dynamic_domains_pgsql_mnesia_23 / pgsql_mnesia / 72a01cc dynamic_domains_mysql_redis_24 / mysql_redis / 72a01cc muc_SUITE:hibernation:hibernated_room_can_be_queried_for_archive{error,{{assertion_failed,assert,is_groupchat_message,
[<<"Restorable message">>],
undefined,"undefined"},
[{escalus_new_assert,assert_true,2,
[{file,"/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_new_assert.erl"},
{line,84}]},
{muc_SUITE,wait_for_mam_result,3,
[{file,"/home/circleci/project/big_tests/tests/muc_SUITE.erl"},
{line,4383}]},
{muc_SUITE,'-hibernated_room_can_be_queried_for_archive/1-fun-0-',3,
[{file,"/home/circleci/project/big_tests/tests/muc_SUITE.erl"},
{line,4124}]},
{escalus_story,story,4,
[{file,"/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_story.erl"},
{line,72}]},
{muc_SUITE,hibernated_room_can_be_queried_for_archive,1,
[{file,"/home/circleci/project/big_tests/tests/muc_SUITE.erl"},
{line,4120}]},
{test_server,ts_tc,3,[{file,"test_server.erl"},{line,1783}]},
{test_server,run_test_case_eval1,6,
[{file,"test_server.erl"},{line,1292}]},
{test_server,run_test_case_eval,9,
[{file,"test_server.erl"},{line,1224}]}]}} ldap_mnesia_23 / ldap_mnesia / 72a01cc ldap_mnesia_24 / ldap_mnesia / 72a01cc dynamic_domains_mssql_mnesia_24 / odbc_mssql_mnesia / 72a01cc disco_and_caps_SUITE:disco_with_caps:user_can_query_friend_resources{error,{{assertion_failed,assert_many,false,[is_roster_set],[],[]},
[{escalus_new_assert,assert_true,2,
[{file,"/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_new_assert.erl"},
{line,84}]},
{escalus_story,'-make_all_clients_friends/1-fun-0-',2,
[{file,"/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_story.erl"},
{line,108}]},
{escalus_utils,'-each_with_index/3-fun-0-',3,
[{file,"/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_utils.erl"},
{line,87}]},
{lists,foldl,3,[{file,"lists.erl"},{line,1267}]},
{escalus_utils,'-each_with_index/3-fun-0-',3,
[{file,"/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_utils.erl"},
{line,87}]},
{lists,foldl,3,[{file,"lists.erl"},{line,1267}]},
{escalus_utils,distinct_pairs,2,
[{file,"/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_utils.erl"},
{line,60}]},
{escalus_story,make_all_clients_friends,1,
[{file,"/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_story.erl"},
{line,106}]}]}} internal_mnesia_24 / internal_mnesia / 72a01cc pgsql_mnesia_23 / pgsql_mnesia / 72a01cc elasticsearch_and_cassandra_24 / elasticsearch_and_cassandra_mnesia / 72a01cc mysql_redis_24 / mysql_redis / 72a01cc mssql_mnesia_24 / odbc_mssql_mnesia / 72a01cc riak_mnesia_24 / riak_mnesia / 72a01cc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks good 👍
Do you think we might enable using the bin with the synchronous writer in the future? Because I see it is unsupported now.
72a01cc
to
5c359e0
Compare
I think it is enabled already, it is added in the |
small_tests_24 / small_tests / 5c359e0 small_tests_23 / small_tests / 5c359e0 dynamic_domains_pgsql_mnesia_23 / pgsql_mnesia / 5c359e0 dynamic_domains_pgsql_mnesia_24 / pgsql_mnesia / 5c359e0 dynamic_domains_mysql_redis_24 / mysql_redis / 5c359e0 ldap_mnesia_23 / ldap_mnesia / 5c359e0 ldap_mnesia_24 / ldap_mnesia / 5c359e0 internal_mnesia_24 / internal_mnesia / 5c359e0 dynamic_domains_mssql_mnesia_24 / odbc_mssql_mnesia / 5c359e0 pgsql_mnesia_24 / pgsql_mnesia / 5c359e0 pgsql_mnesia_23 / pgsql_mnesia / 5c359e0 elasticsearch_and_cassandra_24 / elasticsearch_and_cassandra_mnesia / 5c359e0 mysql_redis_24 / mysql_redis / 5c359e0 muc_SUITE:hibernation:hibernated_room_can_be_queried_for_archive{error,{{assertion_failed,assert,is_groupchat_message,
[<<"Restorable message">>],
undefined,"undefined"},
[{escalus_new_assert,assert_true,2,
[{file,"/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_new_assert.erl"},
{line,84}]},
{muc_SUITE,wait_for_mam_result,3,
[{file,"/home/circleci/project/big_tests/tests/muc_SUITE.erl"},
{line,4383}]},
{muc_SUITE,'-hibernated_room_can_be_queried_for_archive/1-fun-0-',3,
[{file,"/home/circleci/project/big_tests/tests/muc_SUITE.erl"},
{line,4124}]},
{escalus_story,story,4,
[{file,"/home/circleci/project/big_tests/_build/default/lib/escalus/src/escalus_story.erl"},
{line,72}]},
{muc_SUITE,hibernated_room_can_be_queried_for_archive,1,
[{file,"/home/circleci/project/big_tests/tests/muc_SUITE.erl"},
{line,4120}]},
{test_server,ts_tc,3,[{file,"test_server.erl"},{line,1783}]},
{test_server,run_test_case_eval1,6,
[{file,"test_server.erl"},{line,1292}]},
{test_server,run_test_case_eval,9,
[{file,"test_server.erl"},{line,1224}]}]}} mssql_mnesia_24 / odbc_mssql_mnesia / 5c359e0 riak_mnesia_24 / riak_mnesia / 5c359e0 |
Building on top of #3608 and #3611
This PR fixes the fact that the asynchronous inbox is currently dangerously inconsistent. If on two nodes one writes new messages to a user's conversation with a room, and in another node that user is removed from the room, the order in which those events are flushed into the DB is now not ensured, and the risk of the removal being flushed before the insert, and therefore the insert recovering a conversation forever, is much higher than in the synchronous inbox.
With the concept of boxes, the async backend can instead have a deleted box, called
bin
, so that removals aren't really removals, but just moves to another box, which are atomic and can interleave with modifications to that entry's other columns. Then asynchronously the deleted boxes can be collected, much like email clears the Bin after so many days.Later on #3619 builds on top of this one.It also implements three different mechanisms to clean the inbox bin after the asynchronous backend: an XMPP endpoint for the user, two REST endpoints for per-user and global cleanups, and an automatic collector working with timeouts.