Gdpr retrieve data from mod_offline #2289

Merged (17 commits) on May 13, 2019

@ludwikbukowski (Contributor) commented Apr 29, 2019

This PR addresses https://erlangsolutions.atlassian.net/browse/MIM-306

TODO:

  • support for mnesia
  • support for rdbms
  • support for riak

Notes:

  • The timestamp is formatted as a UTC datetime, e.g.:
    2019-04-29T11:03:21Z
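
As an illustration of the timestamp format noted above, here is a minimal sketch that uses only the Erlang standard library. The module name `utc_timestamp_sketch` and the function `format_utc/1` are hypothetical and are not part of this PR, which relies on `jlib:timestamp_to_iso/2` instead.

```erlang
-module(utc_timestamp_sketch).
-export([format_utc/1]).

%% Hypothetical helper (an assumption, not MongooseIM code): converts an
%% erlang:timestamp() triple into a <<"YYYY-MM-DDThh:mm:ssZ">> binary.
-spec format_utc(erlang:timestamp()) -> binary().
format_utc(Timestamp) ->
    %% now_to_universal_time/1 drops the microseconds and returns UTC.
    {{Y, Mo, D}, {H, Mi, S}} = calendar:now_to_universal_time(Timestamp),
    iolist_to_binary(
      io_lib:format("~4..0B-~2..0B-~2..0BT~2..0B:~2..0B:~2..0BZ",
                    [Y, Mo, D, H, Mi, S])).
```

For example, `utc_timestamp_sketch:format_utc({1556, 535801, 0})` yields `<<"2019-04-29T11:03:21Z">>`.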

@ludwikbukowski changed the base branch from master to gdpr-retrieve-clean on April 29, 2019 12:03

codecov bot commented Apr 29, 2019

Codecov Report

Merging #2289 into gdpr-retrieve-clean will increase coverage by <.01%.
The diff coverage is 89.13%.

@@                   Coverage Diff                   @@
##           gdpr-retrieve-clean    #2289      +/-   ##
=======================================================
+ Coverage                78.93%   78.94%   +<.01%     
=======================================================
  Files                      334      334              
  Lines                    29110    29023      -87     
=======================================================
- Hits                     22977    22911      -66     
+ Misses                    6133     6112      -21

Impacted Files                                         Coverage Δ
src/rdbms/rdbms_queries.erl                            83.25% <100%> (+0.08%) ⬆️
src/mod_offline_mnesia.erl                             89.39% <85.71%> (-0.44%) ⬇️
src/mod_offline_riak.erl                               94.44% <87.5%> (-1.99%) ⬇️
src/mod_offline_rdbms.erl                              85.93% <90.9%> (+1.03%) ⬆️
src/mod_offline.erl                                    78.3% <90.9%> (+0.77%) ⬆️
src/global_distrib/mod_global_distrib_receiver.erl     85.52% <0%> (-3.95%) ⬇️
src/mam/mod_mam_rdbms_prefs.erl                        92.52% <0%> (-3.74%) ⬇️
src/rdbms/mongoose_rdbms.erl                           69.38% <0%> (-2.56%) ⬇️
src/mod_bosh.erl                                       92.85% <0%> (-2.15%) ⬇️
...c/global_distrib/mod_global_distrib_server_mgr.erl  83.09% <0%> (-2.12%) ⬇️
... and 8 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ec387d4...6861f96.

@@ -57,7 +57,10 @@ retrieve_all(Username, Domain, ResultFilePath) ->
-spec modules_with_personal_data() -> [module()].
modules_with_personal_data() ->
[
- mod_vcard
+ mod_vcard,
+ mod_offline_mnesia,

Member:

Should call mod_offline, not the backends directly.

Contributor Author:

yeah makes sense

NowUniversal = calendar:now_to_universal_time(Timestamp),
{UTCTime, UTCDiff} = jlib:timestamp_to_iso(NowUniversal, utc),
UTC = list_to_binary(UTCTime ++ UTCDiff),
{UTC, jid:to_binary(jid:to_bare(From)), jid:to_binary(jid:to_bare(To)), exml:to_binary(Packet)}.

Member:

Why are we converting these to bare JIDs?

@ludwikbukowski (Contributor Author), May 6, 2019:

I think I used bare JIDs because the test asserted on a bare JID, so I assumed that was the requirement.
Do we want to return full JIDs?

Member:

Yup, no need to convert them to bare. The test case was just a stub, written without examining what exactly is stored in the DB.

{ok, Obj} = mongoose_riak:get(bucket_type(LServer), Key),

PacketRaw = riakc_obj:get_value(Obj),
{ok, Packet} = exml:parse(PacketRaw),

Member:

Why do we parse it?

Contributor Author:

Oh, that doesn't make sense, since I'm converting it back to binary right after. Will remove :)

NowUniversal = calendar:now_to_universal_time(usec:to_now(Timestamp)),
{UTCTime, UTCDiff} = jlib:timestamp_to_iso(NowUniversal, utc),
UTC = list_to_binary(UTCTime ++ UTCDiff),
{UTC, jid:to_binary(jid:binary_to_bare(From)), To, exml:to_binary(Packet)}

Member:

Username is provided as To, not JID.

Contributor Author:

I think everything is fine; again, it is just the variable name that is confusing.

NowUniversal = calendar:now_to_universal_time(Timestamp),
{UTCTime, UTCDiff} = jlib:timestamp_to_iso(NowUniversal, utc),
UTC = list_to_binary(UTCTime ++ UTCDiff),
[UTC, jid:to_binary(jid:binary_to_bare(SFrom)), User, SPacket].

Member:

A JID is expected as the third element, not a username.

@ludwikbukowski force-pushed the gdpr-retrieve-offline branch 2 times, most recently from 4b04179 to 7911c0e, on May 6, 2019 16:43
@fenek (Member) left a comment:

Now the common part for all backends must be extracted to mod_offline. :)
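
The idea of keeping the common part in the generic module can be sketched roughly as follows. Everything here is an assumption made for illustration: the module `offline_gdpr_sketch`, the functions `personal_data/3` and `fetch_rows/2`, and the row shape are hypothetical and do not reflect the actual `mod_offline` API. Only the header names come from the test in this PR.

```erlang
-module(offline_gdpr_sketch).
-export([personal_data/3, fetch_rows/2]).

%% Hypothetical generic entry point (not the MongooseIM API): the backend
%% module only returns raw rows; the header and the timestamp conversion
%% are shared by every backend because they live here.
-spec personal_data(module(), binary(), binary()) ->
    {[string()], [{calendar:datetime(), binary(), binary(), binary()}]}.
personal_data(Backend, User, Server) ->
    Rows = Backend:fetch_rows(User, Server),
    {["timestamp", "from", "to", "packet"],
     [format_row(Row) || Row <- Rows]}.

%% Shared step: convert the stored erlang:timestamp() to a UTC datetime.
format_row({Timestamp, From, To, Packet}) ->
    {calendar:now_to_universal_time(Timestamp), From, To, Packet}.

%% Stub backend callback with made-up data, so that this module can be
%% passed as its own Backend when trying the sketch out.
fetch_rows(User, Server) ->
    To = <<User/binary, "@", Server/binary>>,
    [{{1556, 535801, 0}, <<"bob@localhost/res1">>, To, <<"<message/>">>}].
```

With this split, a new backend would only need to implement the raw fetch, and a caller would never reference a backend module directly.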

@@ -47,14 +47,15 @@ groups() ->
[
{retrieve_personal_data, [parallel], [
% per type
- retrieve_vcard,
+ %% retrieve_vcard,

Member:

May be reverted (after rebase) now as the base is fixed.

@aleklisi (Contributor) left a comment:

The PR looks good. I would consider extracting the changes to the common base for other tests into a separate PR.

ExpectedHeader = ["timestamp", "from", "to", "packet"],
ExpectedItems = [
#{ "packet" => [{contains, Body}], "from" => BobJid }
#{ "packet" => [{contains, Body1}],

Contributor:

Is there a way of generating those maps?

Contributor:

Would it be more readable to have the following?

Expected = [
    {Body1, BobJid, AliceJid},
    ...],
ExpectedItems = lists:map(
    fun({Body, From, To}) ->
         #{ "packet" => [{contains, Body}],
            "from" => binary_to_list(From),
            "to" => binary_to_list(To),
            "timestamp" => [{validate, fun validate_datetime/1}]}
    end, Expected)

@@ -304,19 +322,25 @@ csv_to_maps(ExpectedHeader, [ExpectedHeader | Rows]) ->
csv_row_to_map(Header, Row) ->
maps:from_list(lists:zip(Header, Row)).

- validate_personal_maps(_, []) -> ok;
- validate_personal_maps([Map | RMaps], [Checks | RChecks]) ->
+ validate_personal_maps(PersonalMaps, ExpectedItems) ->

Contributor:

This change affects all other PRs. Isn't it going to break all the other test cases?

@ludwikbukowski (Contributor Author), May 7, 2019:

Actually, it's fixing them, as the order of incoming stanzas can otherwise result in failed validations.
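
A minimal sketch of order-independent validation, assuming rows and expectations are plain maps compared by equality. The module `unordered_match_sketch` and its functions are hypothetical and far simpler than the `validate_personal_maps/2` helper changed in this PR (which also supports checks such as `contains` and `validate`). Matching is greedy, without backtracking.

```erlang
-module(unordered_match_sketch).
-export([validate_unordered/2]).

%% Returns ok when every expected item matches a distinct actual row,
%% regardless of the order in which the rows arrived.
-spec validate_unordered([map()], [map()]) ->
    ok | {error, {no_match, map()}}.
validate_unordered(_Actual, []) ->
    ok;
validate_unordered(Actual, [Expected | Rest]) ->
    case take_first(fun(Row) -> matches(Row, Expected) end, Actual, []) of
        {ok, Remaining} -> validate_unordered(Remaining, Rest);
        error -> {error, {no_match, Expected}}
    end.

%% A row matches when every key of Expected is present with an equal value.
matches(Row, Expected) ->
    maps:fold(fun(K, V, Acc) ->
                      Acc andalso maps:find(K, Row) =:= {ok, V}
              end, true, Expected).

%% Removes the first element satisfying Pred, keeping the others in order.
take_first(_Pred, [], _Seen) ->
    error;
take_first(Pred, [H | T], Seen) ->
    case Pred(H) of
        true -> {ok, lists:reverse(Seen, T)};
        false -> take_first(Pred, T, [H | Seen])
    end.
```

Each matched row is removed from the pool, so two identical expectations require two matching rows.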

%retrieve_private_xml,
%retrieve_inbox,
- retrieve_logs
+ %% retrieve_logs

Member:

To revert

Member:

May be done when rebasing on the base with "dynamic" PR merged.

@fenek (Member) left a comment:

  1. retrieve_offline test case is not enabled!
  2. mod_offline doesn't query all backends
  3. retrieve_offline test is not added to the "with disabled" group

@esl deleted a comment from mongoose-im on May 10, 2019

@ludwikbukowski (Contributor Author):

Oh sorry, something went wrong after the rebase; fixing.

@mongoose-im (Collaborator) commented May 10, 2019

6465.1 / Erlang 19.3 / small_tests / 70b598f
Reports root / small


6465.2 / Erlang 19.3 / internal_mnesia / 70b598f
Reports root/ big
OK: 1262 / Failed: 2 / User-skipped: 65 / Auto-skipped: 0

sm_SUITE:parallel:subscription_requests_are_buffered_properly
{error,{{badmatch,false},
    [{escalus_session,stream_management,2,
              [{file,"/home/travis/build/esl/MongooseIM/big_tests/_build/default/lib/escalus/src/escalus_session.erl"},
               {line,227}]},
     {escalus_connection,connection_step,2,
               [{file,"/home/travis/build/esl/MongooseIM/big_tests/_build/default/lib/escalus/src/escalus_connection.erl"},
                {line,134}]},
     {lists,foldl,3,[{file,"lists.erl"},{line,1263}]},
     {escalus_connection,start,2,
               [{file,"/home/travis/build/esl/MongooseIM/big_tests/_build/default/lib/escalus/src/escalus_connection.erl"},
                {line,118}]},
     {sm_SUITE,'-subscription_requests_are_buffered_properly/1-fun-3-',6,
           [{file,"sm_SUITE.erl"},{line,848}]},
     {escalus_story,story,4,
            [{file,"/home/travis/build/esl/MongooseIM/big_tests/_build/default/lib/escalus/src/escalus_story.erl"},
             {line,72}]},
     {test_server,ts_tc,3,[{file,"test_server.erl"},{line,1529}]},
     {test_server,run_test_case_eval1,6,
            [{file,"test_server.erl"},{line,1045}]}]}}

Report log

sm_SUITE:parallel:subscription_requests_are_buffered_properly
{error,{{badmatch,false},
    [{escalus_session,stream_management,2,
              [{file,"/home/travis/build/esl/MongooseIM/big_tests/_build/default/lib/escalus/src/escalus_session.erl"},
               {line,227}]},
     {escalus_connection,connection_step,2,
               [{file,"/home/travis/build/esl/MongooseIM/big_tests/_build/default/lib/escalus/src/escalus_connection.erl"},
                {line,134}]},
     {lists,foldl,3,[{file,"lists.erl"},{line,1263}]},
     {escalus_connection,start,2,
               [{file,"/home/travis/build/esl/MongooseIM/big_tests/_build/default/lib/escalus/src/escalus_connection.erl"},
                {line,118}]},
     {sm_SUITE,'-subscription_requests_are_buffered_properly/1-fun-3-',6,
           [{file,"sm_SUITE.erl"},{line,848}]},
     {escalus_story,story,4,
            [{file,"/home/travis/build/esl/MongooseIM/big_tests/_build/default/lib/escalus/src/escalus_story.erl"},
             {line,72}]},
     {test_server,ts_tc,3,[{file,"test_server.erl"},{line,1529}]},
     {test_server,run_test_case_eval1,6,
            [{file,"test_server.erl"},{line,1045}]}]}}

Report log


6465.3 / Erlang 19.3 / mysql_redis / 70b598f
Reports root/ big
OK: 3089 / Failed: 0 / User-skipped: 232 / Auto-skipped: 0


6465.4 / Erlang 19.3 / odbc_mssql_mnesia / 70b598f
Reports root/ big
OK: 3091 / Failed: 0 / User-skipped: 230 / Auto-skipped: 0


6465.6 / Erlang 19.3 / elasticsearch_and_cassandra_mnesia / 70b598f
Reports root/ big
OK: 469 / Failed: 0 / User-skipped: 8 / Auto-skipped: 0


6465.5 / Erlang 19.3 / ldap_mnesia / 70b598f
Reports root/ big
OK: 1181 / Failed: 0 / User-skipped: 102 / Auto-skipped: 0


6465.8 / Erlang 20.0 / pgsql_mnesia / 70b598f
Reports root/ big / small
OK: 3123 / Failed: 0 / User-skipped: 198 / Auto-skipped: 0


6465.9 / Erlang 21.0 / riak_mnesia / 70b598f
Reports root/ big / small
OK: 1453 / Failed: 0 / User-skipped: 63 / Auto-skipped: 0

@mongoose-im (Collaborator) commented May 13, 2019

6479.1 / Erlang 19.3 / small_tests / 5ee9077
Reports root / small


6479.3 / Erlang 19.3 / mysql_redis / 5ee9077
Reports root/ big
OK: 3096 / Failed: 0 / User-skipped: 232 / Auto-skipped: 0


6479.5 / Erlang 19.3 / ldap_mnesia / 5ee9077
Reports root/ big
OK: 1185 / Failed: 0 / User-skipped: 105 / Auto-skipped: 0


6479.6 / Erlang 19.3 / elasticsearch_and_cassandra_mnesia / 5ee9077
Reports root/ big
OK: 469 / Failed: 0 / User-skipped: 8 / Auto-skipped: 0


6479.2 / Erlang 19.3 / internal_mnesia / 5ee9077
Reports root/ big
OK: 1222 / Failed: 0 / User-skipped: 68 / Auto-skipped: 0


6479.4 / Erlang 19.3 / odbc_mssql_mnesia / 5ee9077
Reports root/ big
OK: 3098 / Failed: 0 / User-skipped: 230 / Auto-skipped: 0


6479.8 / Erlang 20.0 / pgsql_mnesia / 5ee9077
Reports root/ big / small
OK: 3130 / Failed: 0 / User-skipped: 198 / Auto-skipped: 0


6479.9 / Erlang 21.0 / riak_mnesia / 5ee9077
Reports root/ big / small
OK: 1457 / Failed: 0 / User-skipped: 66 / Auto-skipped: 0

@fenek merged commit 6e22173 into gdpr-retrieve-clean on May 13, 2019
@fenek deleted the gdpr-retrieve-offline branch on May 13, 2019 09:56
DenysGonchar pushed a commit that referenced this pull request May 27, 2019
* Improve mod offline retrieval test

* Retrieve offline data from mnesia

* Add support for RDBMS

* Change timestamp format in rdbms return

* Fetch personal data for offline in riak backend

* Riak support for mod offline

* don't do useless parsing, fix var names in mod_offline_riak

* Call generic mod_offline instead of backend implementations

* Move GDPR logic to generic mod_offline

* Retrieve full jid

* Shorten assertions in tests

* Remove unimplemented function

* fix fetching offline for riak

* fix upper/lower case of jid issue

* Enable mod_offline retrieve tests

* merge results from disabled backends as well

* test offline retrieve with unloading module
@fenek added this to the MongooseIM 3.3.0++ milestone on Jun 18, 2019
4 participants