Elasticsearch GDPR migration guide. #2345

Merged
merged 2 commits into from
Jun 25, 2019
Collaborator

@DenysGonchar DenysGonchar commented Jun 24, 2019

To test the schema update on our dev setup:

source tools/test-runner-complete.sh 
git checkout a41c96c42 ##some commit before elasticsearch schema update
test-runner.sh --db elasticsearch --preset elasticsearch_and_cassandra_mnesia --one-node --skip-small-tests --skip-cover  --skip-stop-nodes -- mam gdpr
tools/stop-nodes.sh 
curl -sX GET "localhost:9200/muc_messages/_mapping/"  | jq

##update mapping
curl -sX PUT "localhost:9200/muc_messages/_mapping/muc" -w ' {"status": "%{http_code}"}' -d'
{
  "properties": {
    "mam_id": {
      "type": "long"
    },
    "room": {
      "type": "keyword"
    },
    "from_jid" : {
      "type": "keyword"
    },
    "source_jid": {
      "type": "keyword"
    },
    "message": {
      "type": "text",
      "index": false
    },
    "body": {
      "type": "text",
      "analyzer": "english"
    }
  }
}' | jq
curl -sX GET "localhost:9200/muc_messages/_mapping/"  | jq


git checkout gdpr-elasticsearch-migration
git clean -ffxd --exclude tools/ssl/
test-runner.sh --db --preset elasticsearch_and_cassandra_mnesia --one-node --skip-small-tests --skip-cover  --skip-stop-nodes -- gdpr

#check the number of records w/o "from_jid" field
curl -sX GET 'localhost:9200/muc_messages/_count/?q=!_exists_:from_jid' | jq '."count"'

#check how many records have the "from_jid" field
curl -sX GET 'localhost:9200/muc_messages/_count/?q=_exists_:from_jid' | jq '."count"'

#see all the records that must be converted
curl -sX GET 'localhost:9200/muc_messages/_search/?size=1000&q=!_exists_:from_jid' | jq '.' | less
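
The two `_count` queries above can be combined into a quick progress check. This is a minimal sketch, not part of the MongooseIM tooling: `migration_progress` is a hypothetical helper name, and it assumes both counts have already been read into shell variables.

```shell
# Hypothetical helper (not shipped with MongooseIM): report how many
# documents already carry the "from_jid" field.
#   $1 = number of documents without "from_jid"
#   $2 = number of documents with "from_jid"
migration_progress() {
  missing=$1
  present=$2
  total=$(( missing + present ))
  if [ "$total" -eq 0 ]; then
    echo "no documents to migrate"
    return 0
  fi
  # Integer percentage, rounded down.
  echo "$(( present * 100 / total ))% migrated (${present}/${total})"
}

# Possible usage, assuming Elasticsearch is reachable on localhost:9200:
#   missing=$(curl -sX GET 'localhost:9200/muc_messages/_count/?q=!_exists_:from_jid' | jq '.count')
#   present=$(curl -sX GET 'localhost:9200/muc_messages/_count/?q=_exists_:from_jid' | jq '.count')
#   migration_progress "$missing" "$present"
```

The migration is complete when the first count reaches zero.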

Example of conversion:

  1. Get some messages for conversion
curl -sX GET 'localhost:9200/muc_messages/_search/?size=1000&q=!_exists_:from_jid' | jq '."hits"."hits"[] | {id : ."_id", message : ."_source"."message"}'
{
  "id": "alice24.824556room0-25633-1@muc.localhost$399703199958901761",
  "message": "<message xml:lang='en' type='groupchat' to='alicE24.824556room0-25633-1@muc.localhost'><body>Hi, Bob!</body><x xmlns='http://jabber.org/protocol/muc#user'><item affiliation='owner' jid='alicE24.824556@localhost/res1' role='moderator'/></x></message>"
}
{
  "id": "alice15.709225room0-24985-1@muc.localhost$399703197637795841",
  "message": "<message xml:lang='en' type='groupchat' to='alicE15.709225room0-24985-1@muc.localhost'><body>Hi, Bob!</body><x xmlns='http://jabber.org/protocol/muc#user'><item affiliation='owner' jid='alicE15.709225@localhost/res1' role='moderator'/></x></message>"
}
{
  "id": "alice24.898748room0-25659-1@muc.localhost$399703199978590977",
  "message": "<message xml:lang='en' type='groupchat' to='alicE24.898748room0-25659-1@muc.localhost'><body>Hi, Bob!</body><x xmlns='http://jabber.org/protocol/muc#user'><item affiliation='owner' jid='alicE24.898748@localhost/res1' role='moderator'/></x></message>"
}
  2. Convert one of them:
id='alice24.824556room0-25633-1@muc.localhost$399703199958901761'
msg="<message xml:lang='en' type='groupchat' to='alicE24.824556room0-25633-1@muc.localhost'><body>Hi, Bob\!</body><x xmlns='http://jabber.org/protocol/muc#user'><item affiliation='owner' jid='alicE24.824556@localhost/res1' role='moderator'/></x></message>"

msg_len="$(echo -n "$msg" | wc -c)"
from_jid="$(echo -e "${msg_len}\n${msg}" | tools/migration/sender-jid-from-mam-message.escript xml | tail -1)"

curl -sX POST "localhost:9200/muc_messages/muc/${id}/_update/" -w ' {"status": "%{http_code}"}' -d'
{
  "doc" : { "from_jid" : "'"${from_jid}"'" }
}' | jq
{
  "_index": "muc_messages",
  "_type": "muc",
  "_id": "alice24.824556room0-25633-1@muc.localhost$399703199958901761",
  "_version": 2,
  "result": "updated",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  }
}
{
  "status": "200"
}
  3. Check the remaining records without from_jid:
curl -sX GET 'localhost:9200/muc_messages/_search/?size=1000&q=!_exists_:from_jid' | jq '."hits"."hits"[] | {id : ."_id", message : ."_source"."message"}'
{
  "id": "alice15.709225room0-24985-1@muc.localhost$399703197637795841",
  "message": "<message xml:lang='en' type='groupchat' to='alicE15.709225room0-24985-1@muc.localhost'><body>Hi, Bob!</body><x xmlns='http://jabber.org/protocol/muc#user'><item affiliation='owner' jid='alicE15.709225@localhost/res1' role='moderator'/></x></message>"
}
{
  "id": "alice24.898748room0-25659-1@muc.localhost$399703199978590977",
  "message": "<message xml:lang='en' type='groupchat' to='alicE24.898748room0-25659-1@muc.localhost'><body>Hi, Bob!</body><x xmlns='http://jabber.org/protocol/muc#user'><item affiliation='owner' jid='alicE24.898748@localhost/res1' role='moderator'/></x></message>"
}
  4. Check that the record can be found by from_jid:
curl -sX POST 'localhost:9200/muc_messages/muc/_search/' -w ' {"status": "%{http_code}"}' -d'
{
  "query" : {"bool": {"filter" : {"term": {"from_jid" : "alice24.824556@localhost"}}}}
}' | jq '."hits"."hits", ."status" | values'
[
  {
    "_index": "muc_messages",
    "_type": "muc",
    "_id": "alice24.824556room0-25633-1@muc.localhost$399703199958901761",
    "_score": 0,
    "_source": {
      "body": "Hi, Bob!",
      "mam_id": 399703199958901760,
      "message": "<message xml:lang='en' type='groupchat' to='alicE24.824556room0-25633-1@muc.localhost'><body>Hi, Bob!</body><x xmlns='http://jabber.org/protocol/muc#user'><item affiliation='owner' jid='alicE24.824556@localhost/res1' role='moderator'/></x></message>",
      "room": "alice24.824556room0-25633-1@muc.localhost",
      "source_jid": "alice24.824556room0-25633-1@muc.localhost/alicE24.824556",
      "from_jid": "alice24.824556@localhost"
    }
  }
]
"200"
  5. Check manually that MongooseIM (MIM) can extract GDPR data:
(mongooseim@localhost)1> rp(mod_mam_muc:get_personal_data(<<"alicE24.824556">>,<<"localhost">>)).
[{mam_muc,["id","message"],
          [{<<"399703199958901761">>,
            <<"<message xml:lang='en' type='groupchat' to='alicE24.824556room0-25633-1@muc.localhost'><body>Hi, Bob!</body><x xmlns='http://jabber.org/protocol/muc#user'><item affiliation='owner' jid='alicE24.824556@localhost/res1' role='moderator'/></x></message>">>}]}]
ok
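
The conversion step above can be wrapped in two small helpers when it has to be repeated for many documents. This is only a sketch: `frame_message` and `update_body` are hypothetical names, the length-then-message input format is taken from the `echo -e "${msg_len}\n${msg}"` call in the example, and the JSON body assumes the extracted JID contains no characters that need JSON escaping.

```shell
# Hypothetical helpers (not part of tools/migration).

# Print the byte length of the message on one line, then the message itself,
# mirroring the input fed to the escript in the conversion example.
frame_message() {
  printf '%s\n%s\n' "$(printf '%s' "$1" | wc -c | tr -d ' ')" "$1"
}

# Build the partial-update body for the Elasticsearch `_update` endpoint.
# Assumption: the JID needs no JSON escaping.
update_body() {
  printf '{"doc": {"from_jid": "%s"}}\n' "$1"
}

# Possible usage, assuming $id and $msg are set as in the example:
#   from_jid="$(frame_message "$msg" | tools/migration/sender-jid-from-mam-message.escript xml | tail -1)"
#   curl -sX POST "localhost:9200/muc_messages/muc/${id}/_update/" -d "$(update_body "$from_jid")"
```

Unlike `echo -e`, `printf '%s'` passes backslashes in the message through unchanged, so the reported length always matches the bytes that are actually sent.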

@mongoose-im
Collaborator

mongoose-im commented Jun 24, 2019

6712.1 / Erlang 20.3 / small_tests / 2f7069d
Reports root / small


6712.2 / Erlang 20.3 / internal_mnesia / 2f7069d
Reports root/ big
OK: 1238 / Failed: 0 / User-skipped: 159 / Auto-skipped: 0


6712.3 / Erlang 20.3 / odbc_mssql_mnesia / 2f7069d
Reports root/ big
OK: 3141 / Failed: 0 / User-skipped: 272 / Auto-skipped: 0


6712.4 / Erlang 20.3 / ldap_mnesia / 2f7069d
Reports root/ big
OK: 1199 / Failed: 0 / User-skipped: 198 / Auto-skipped: 0


6712.5 / Erlang 20.3 / elasticsearch_and_cassandra_mnesia / 2f7069d
Reports root/ big
OK: 528 / Failed: 0 / User-skipped: 51 / Auto-skipped: 0

@codecov

codecov bot commented Jun 24, 2019

Codecov Report

Merging #2345 into master will decrease coverage by 10.58%.
The diff coverage is n/a.

@@             Coverage Diff             @@
##           master    #2345       +/-   ##
===========================================
- Coverage   78.33%   67.74%   -10.59%     
===========================================
  Files         335      335               
  Lines       29260    29260               
===========================================
- Hits        22920    19822     -3098     
- Misses       6340     9438     +3098
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/mod_auth_token_rdbms.erl | 0% <0%> (-100%) | ⬇️ |
| src/inbox/mod_inbox_rdbms_mysql.erl | 0% <0%> (-100%) | ⬇️ |
| src/mam/mam_message.erl | 0% <0%> (-100%) | ⬇️ |
| src/inbox/mod_inbox_muclight.erl | 0% <0%> (-100%) | ⬇️ |
| src/inbox/mod_inbox_rdbms_pgsql.erl | 0% <0%> (-100%) | ⬇️ |
| src/mam/mam_jid.erl | 0% <0%> (-100%) | ⬇️ |
| src/inbox/mod_inbox_one2one.erl | 0% <0%> (-100%) | ⬇️ |
| src/mam/mam_jid_rfc.erl | 0% <0%> (-100%) | ⬇️ |
| src/inbox/mod_inbox_rdbms_mssql.erl | 0% <0%> (-100%) | ⬇️ |
| src/event_pusher/mod_event_pusher_rabbit.erl | 0% <0%> (-97.02%) | ⬇️ |
| ... and 87 more | | |

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 0672304...79cbfdc.

@mongoose-im
Collaborator

mongoose-im commented Jun 24, 2019

6716.1 / Erlang 20.3 / small_tests / 35e6c4c
Reports root / small


6716.2 / Erlang 20.3 / internal_mnesia / 35e6c4c
Reports root/ big
OK: 1238 / Failed: 0 / User-skipped: 159 / Auto-skipped: 0


6716.3 / Erlang 20.3 / odbc_mssql_mnesia / 35e6c4c
Reports root/ big
OK: 3141 / Failed: 0 / User-skipped: 272 / Auto-skipped: 0


6716.4 / Erlang 20.3 / ldap_mnesia / 35e6c4c
Reports root/ big
OK: 1199 / Failed: 0 / User-skipped: 198 / Auto-skipped: 0


6716.5 / Erlang 20.3 / elasticsearch_and_cassandra_mnesia / 35e6c4c
Reports root/ big
OK: 528 / Failed: 0 / User-skipped: 51 / Auto-skipped: 0


6716.8 / Erlang 21.3 / mysql_redis / 35e6c4c
Reports root/ big
OK: 3139 / Failed: 0 / User-skipped: 274 / Auto-skipped: 0


6716.7 / Erlang 21.3 / pgsql_mnesia / 35e6c4c
Reports root/ big / small
OK: 3173 / Failed: 0 / User-skipped: 240 / Auto-skipped: 0


6716.9 / Erlang 21.3 / riak_mnesia / 35e6c4c
Reports root/ big / small
OK: 1487 / Failed: 0 / User-skipped: 143 / Auto-skipped: 0

@DenysGonchar DenysGonchar force-pushed the gdpr-elasticsearch-migration branch from 5cfeb8b to 745bd07 Compare June 25, 2019 06:59
@DenysGonchar DenysGonchar requested review from fenek and goddammit June 25, 2019 07:00
doc/migrations/3.3.0_3.3.0plus.md (outdated, resolved)
doc/migrations/3.3.0_3.3.0plus.md (outdated, resolved)
doc/migrations/3.3.0_3.3.0plus.md (outdated, resolved)
@DenysGonchar DenysGonchar force-pushed the gdpr-elasticsearch-migration branch from 1eafd3d to 79cbfdc Compare June 25, 2019 08:59
@mongoose-im
Collaborator

6719.1 / Erlang 20.3 / small_tests / d428df0
Reports root / small

@fenek fenek merged commit c020e01 into master Jun 25, 2019
@fenek fenek deleted the gdpr-elasticsearch-migration branch June 25, 2019 09:15
@fenek fenek added this to the 3.4.0 milestone Jun 26, 2019