
Riak base #378

Merged
merged 28 commits into master from riak-base
Aug 17, 2015
Conversation

michalwski
Contributor

This PR:

  • adds a simple connectivity layer for Riak (a rough sketch of this layer follows below)
    • allows specifying more than one pool of Riak workers
  • abstracts some of the Riak operations in the mongoose_riak module
  • adds an ejabberd_auth_riak module
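To give a concrete idea of the connectivity layer, here is a hedged sketch; the module and function names are illustrative and the real mongoose_riak API may differ. It wraps the official Riak Erlang client (riakc) and, unlike the PR, talks to a single registered worker instead of a pool.

    %% Illustrative sketch only -- not the PR's actual code.
    -module(mongoose_riak_sketch).
    -export([start/2, put/1, get/2]).

    %% Start one riakc connection and register it; the PR manages a pool
    %% of such workers instead of a single registered process.
    start(Host, Port) ->
        {ok, Pid} = riakc_pb_socket:start_link(Host, Port),
        true = register(mongoose_riak_worker, Pid),
        {ok, Pid}.

    %% Store a riakc_obj() built with riakc_obj:new/3.
    put(RiakObj) ->
        riakc_pb_socket:put(whereis(mongoose_riak_worker), RiakObj).

    %% Fetch a key; returns {ok, Obj} or {error, notfound}.
    get(Bucket, Key) ->
        riakc_pb_socket:get(whereis(mongoose_riak_worker), Bucket, Key).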

@mongoose-im
Collaborator

Travis is using the test branch riak-base from https://github.com/esl/ejabberd_tests/tree/riak-base

@michalwski
Contributor Author

This PR goes on top of #426 now

#scram{} = Scram ->
    scram:check_digest(Scram, Digest, DigestGen, Password);
PassRiak when is_binary(PassRiak) ->
    ejabberd_auth:check_digest(Digest, DigestGen, Password, PassRiak)
Contributor

The check_password/3 function has a _ clause. Do we need the same clause here?

Contributor Author

We don't, it's covered in do_get_password/2.
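For illustration only (this is not the PR's actual code; mongoose_riak:get/2, bucket/1 and the storage format are assumptions), the point is that do_get_password/2 can normalise whatever comes back from Riak, so the case expression above only ever sees a #scram{} record or a binary and needs no catch-all clause:

    %% Hypothetical sketch; assumes the #scram{} record definition is included.
    do_get_password(LUser, LServer) ->
        case mongoose_riak:get(bucket(LServer), LUser) of
            {ok, Obj} ->
                decode_password(riakc_obj:get_value(Obj));
            {error, _} ->
                false   % missing user handled here, not in the caller
        end.

    decode_password(Bin) when is_binary(Bin) ->
        case catch binary_to_term(Bin) of
            #scram{} = Scram -> Scram;  % password stored as a SCRAM record
            _                -> Bin     % otherwise a plain-text binary password
        end.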

@ppikula
Contributor

ppikula commented May 7, 2015

I don't see any changes in the doc directory ;)

@michalwski
Contributor Author

There are no docs yet because right now I'm focused on the implementation only :)

@michalwski michalwski mentioned this pull request Jun 29, 2015
@ppikula
Contributor

ppikula commented Jul 16, 2015

Any updates regarding docs? I think it is very important not to make the documentation worse than it is now, and adding undocumented features is a step towards that situation. I won't merge it without the docs :).

Pawel Pikula added 2 commits August 12, 2015 14:15
Instead of creating a pool of pools, with each pool connecting to one Riak node, we move the responsibility to the user. The user can put a load balancer in front of Riak and point MongooseIM at it. This simplifies reconfiguration of a running MongooseIM when a Riak node is added or removed. With this approach MiM has nothing to do; everything is handled by the load balancer.
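For example (the option names below are illustrative, not necessarily the PR's exact ones), the configuration can simply point a single pool at the load balancer's address:

    %% Illustrative config entry: one pool whose address is a load balancer
    %% (e.g. HAProxy) fronting the whole Riak cluster, not an individual node.
    {riak_server, [{pool_size, 20},
                   {address, "riak-lb.example.com"},
                   {port, 8087}]}.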
@ppikula
Contributor

ppikula commented Aug 13, 2015

RFC - auto-reconnecting pool?

There is one thing that bothers me (I'm not sure how to solve it), and I would like to hear your opinion, @esl/mongooseim-developers, because this is a base PR for other Riak features.

Right now, when Riak is down, the pool goes down after a couple of supervisor restarts and won't reconnect automatically.

The question is: what should we do?

Of course the answer is: it depends on the use case :)

Option 1: Use the auto-reconnect feature from riakc

This is consistent with the ODBC pool behaviour: the pool keeps trying to re-establish connections and is even capable of buffering some requests until the connection is back; it can be enabled via client_options.
I tested it, again manually, and it seems to work. There is one problem: the Riak library doesn't log the fact that the connection was lost and re-established. @michalwski has a workaround for this.
Of course we still have logs from failed writes/reads, but we don't see the exact moment when the outage started. Do we really need that?
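For reference, auto_reconnect and queue_if_disconnected are real options of the Riak Erlang client; a connection started like this keeps retrying after a connection loss and buffers requests while disconnected:

    %% auto_reconnect keeps the socket process alive and retrying;
    %% queue_if_disconnected buffers requests until the connection is back.
    {ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087,
                                           [{auto_reconnect, true},
                                            {queue_if_disconnected, true}]).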

Option 2: Take down the server

It depends on the use case, but sometimes it doesn't make sense to keep processing messages; it is better to crash early and give feedback to client applications. In that case we are not pretending that we saved messages in the archive only for them to turn out to be gone because our Riak cluster was down at the time. On the other hand, for an authentication backend it is fine to continue processing; we just won't be able to authenticate new sessions. Maybe separate pools per module?

Riak's selling point is its high availability, so if the whole Riak cluster is unavailable, something must be really wrong or there are network problems. Maybe it is a good idea to stop the node?

Option 3: Make it configurable? What & how?

We could put a strategy in the riak_server options (take down or retry). Does it make sense? Does it introduce confusion?
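Purely hypothetical (none of these option names exist in the PR), but a configurable strategy could look something like:

    {riak_server, [{address, "riak-lb.example.com"},
                   {port, 8087},
                   {on_failure, retry}]}.   % or take_down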

I would vote for option 1: we keep the configuration simple, and it is the same as in the ODBC case.

Summary of my recent changes:

  • added entries in the docs about configuring the Riak connection and the database itself, plus descriptions for the remaining DB backends
  • converted the pool of pools into one pool
  • made the pool reloadable via our config reload mechanism
  • changed Riak's configuration on Travis: I added the {delete_mode, immediate} option to ensure that tombstones are deleted immediately rather than after 3 seconds, otherwise some of our tests may fail - Escalus thinks it removed users, but in fact it needs to wait 3 seconds (more here). See the config snippet below.
    I'm not happy with this workaround, because I guess we are not testing the real-world case. On the other hand, waiting 3 seconds between every test case is pointless. I am not a Riak expert, so if someone has an opinion on that, please share.
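For reference, delete_mode is a riak_kv application setting; in Riak's app.config or advanced.config (depending on the Riak version) it looks like this:

    %% Reap tombstones immediately instead of after the default 3000 ms.
    [{riak_kv, [{delete_mode, immediate}]}].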

@michalwski
Contributor Author

I also vote for option 1. With some additional effort we can add logs to the riakc process (thanks to the sys:install function) informing us when the connection was closed and when reconnection attempts were made. From my experience it's always good to know exactly when communication with an external service was broken and re-established.
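A hedged sketch of what such logging could look like. sys:install/2 is a standard OTP function; which events riakc emits on disconnect and reconnect is an assumption here, so this version simply logs every system event of the connection process:

    %% Attach a debug function to a riakc_pb_socket process (a gen_server,
    %% so it supports the sys debug machinery).
    log_riak_events(ConnPid) ->
        LogFun = fun(FuncState, Event, ProcState) ->
                         error_logger:info_msg("riakc ~p system event: ~p~n",
                                               [ProcState, Event]),
                         FuncState
                 end,
        sys:install(ConnPid, {LogFun, no_state}).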

@pzel
Contributor

pzel commented Aug 17, 2015

I also vote for #1.

By the way, do we have a consistent method of raising alarms from MongooseIM? Now that most installations in the wild are actually instrumented with Wombat (and Wombat can send emails, khm khm), maybe we should start notifying the alarm registry that something has gone wrong?


@michalwski
Contributor Author

I'm not sure most installations are instrumented with Wombat. Did you count all the installations done without our knowledge? There are thousands of them, I'd say :D and they don't necessarily use Wombat (yet :D).
Anyway, an alarm registry sounds good to me, but we should not assume Wombat is installed.

@pzel
Contributor

pzel commented Aug 17, 2015

Sure -- my point was rather that alarms are an easy and extensible way to notify operators about serious failures, and they cost nothing in terms of performance. Plus, there's no chance that an alarm will get throttled, as sometimes happens either at the lager or syslog level. It's a shame when critical information gets killed by an overwhelmed lager.

Anyway, alarms are another topic that should be addressed separately, sorry for side-tracking the conversation!
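For what it's worth, the alarm registry mentioned here is OTP's SASL alarm_handler; a minimal sketch (the alarm identifier is made up) could look like this:

    %% Raise an alarm when the Riak pool gives up, clear it when it recovers.
    riak_pool_down(Reason) ->
        alarm_handler:set_alarm({riak_pool_down, Reason}).

    riak_pool_up() ->
        alarm_handler:clear_alarm(riak_pool_down).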

@erszcz
Member

erszcz commented Aug 17, 2015

👍 for option 1 because of consistency with ODBC pooling. Moreover, server shutdown is something that requires manual intervention, which is labour, therefore cost. I'd say that providing a limited service (e.g. no archive) is a much lesser impact/deficiency from the end-user's point of view. IMO option 3 is subject to YAGNI.

Also 👍 for alarms.

@ppikula ppikula removed the WIP 🚧 label Aug 17, 2015
@michalwski
Contributor Author

👍 from me for merge

ppikula added a commit that referenced this pull request Aug 17, 2015
@ppikula ppikula merged commit 4c9c418 into master Aug 17, 2015
@ppikula ppikula deleted the riak-base branch September 11, 2015 16:13
@michalwski michalwski mentioned this pull request Oct 15, 2015