
Riak base #378

Merged
merged 28 commits into master from riak-base
Aug 17, 2015
Conversation

michalwski
Contributor

This PR:

  • adds a simple connectivity layer for Riak (a rough sketch of this layer follows below)
    • allows specifying more than one pool of Riak workers
  • abstracts some of the Riak operations in the mongoose_riak module
  • adds an ejabberd_auth_riak module
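To give a concrete idea of the connectivity layer, here is a hedged sketch; the module and function names are illustrative and the real mongoose_riak API may differ. It wraps the official Riak Erlang client (riakc) and, unlike the PR, talks to a single registered worker instead of a pool.

    %% Illustrative sketch only -- not the PR's actual code.
    -module(mongoose_riak_sketch).
    -export([start/2, put/1, get/2]).

    %% Start one riakc connection and register it; the PR manages a pool
    %% of such workers instead of a single registered process.
    start(Host, Port) ->
        {ok, Pid} = riakc_pb_socket:start_link(Host, Port),
        true = register(mongoose_riak_worker, Pid),
        {ok, Pid}.

    %% Store a riakc_obj() built with riakc_obj:new/3.
    put(RiakObj) ->
        riakc_pb_socket:put(whereis(mongoose_riak_worker), RiakObj).

    %% Fetch a key; returns {ok, Obj} or {error, notfound}.
    get(Bucket, Key) ->
        riakc_pb_socket:get(whereis(mongoose_riak_worker), Bucket, Key).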

@mongoose-im
Collaborator

Travis is using the test branch riak-base from https://github.com/esl/ejabberd_tests/tree/riak-base

@michalwski
Contributor Author

This PR goes on top of #426 now

#scram{} = Scram ->
    scram:check_digest(Scram, Digest, DigestGen, Password);
PassRiak when is_binary(PassRiak) ->
    ejabberd_auth:check_digest(Digest, DigestGen, Password, PassRiak)
Contributor

The check_password/3 function has a _ clause. Do we need the same clause here?

Contributor Author

We don't, it's covered in do_get_password/2.
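For illustration only (this is not the PR's actual code; mongoose_riak:get/2, bucket/1 and the storage format are assumptions), the point is that do_get_password/2 can normalise whatever comes back from Riak, so the case expression above only ever sees a #scram{} record or a binary and needs no catch-all clause:

    %% Hypothetical sketch; assumes the #scram{} record definition is included.
    do_get_password(LUser, LServer) ->
        case mongoose_riak:get(bucket(LServer), LUser) of
            {ok, Obj} ->
                decode_password(riakc_obj:get_value(Obj));
            {error, _} ->
                false   % missing user handled here, not in the caller
        end.

    decode_password(Bin) when is_binary(Bin) ->
        case catch binary_to_term(Bin) of
            #scram{} = Scram -> Scram;  % password stored as a SCRAM record
            _                -> Bin     % otherwise a plain-text binary password
        end.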

@ppikula
Contributor

ppikula commented May 7, 2015

I don't see any changes in the doc directory ;)

@michalwski
Contributor Author

There are no docs yet because right now I'm focused on the implementation only :)

@michalwski michalwski mentioned this pull request Jun 29, 2015
@ppikula
Contributor

ppikula commented Jul 16, 2015

Any updates regarding docs? I think it is very important not to make the documentation worse than it is now, and adding undocumented features is a step towards that situation. I won't merge it without the docs :).

Pawel Pikula added 2 commits August 12, 2015 14:15
Instead of creating a pool of pools, with each pool connecting to one Riak node, we move the responsibility to the user. The user can put a load balancer in front of Riak and point MongooseIM at it. This simplifies reconfiguration of a running MongooseIM when a Riak node is added or removed. With this approach MiM has nothing to do; everything is handled by the load balancer.
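For example (the option names below are illustrative, not necessarily the PR's exact ones), the configuration can simply point a single pool at the load balancer's address:

    %% Illustrative config entry: one pool whose address is a load balancer
    %% (e.g. HAProxy) fronting the whole Riak cluster, not an individual node.
    {riak_server, [{pool_size, 20},
                   {address, "riak-lb.example.com"},
                   {port, 8087}]}.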
@ppikula
Contributor

ppikula commented Aug 13, 2015

RFC - auto-reconnecting pool?

There is one thing that bothers me (I'm not sure how to solve it), and I would like to hear your opinion, @esl/mongooseim-developers, because this is a base PR for other Riak features.

Right now, when Riak is down, the pool goes down after a couple of supervisor restarts and won't reconnect automatically.

The question is: what should we do?

Of course the answer is: it depends on the use case :)

Option 1: Use the auto-reconnect feature from riakc

This is consistent with the ODBC pool behaviour: the pool keeps trying to re-establish connections and is even capable of buffering some requests until the connection is back; it can be enabled via client_options.
I tested it, again manually, and it seems to work. There is one problem: the Riak library doesn't log the fact that the connection was lost and re-established. @michalwski has a workaround for this.
Of course we still have logs from failed writes/reads, but we don't see the exact moment when the outage started. Do we really need that?
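For reference, auto_reconnect and queue_if_disconnected are real options of the Riak Erlang client; a connection started like this keeps retrying after a connection loss and buffers requests while disconnected:

    %% auto_reconnect keeps the socket process alive and retrying;
    %% queue_if_disconnected buffers requests until the connection is back.
    {ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087,
                                           [{auto_reconnect, true},
                                            {queue_if_disconnected, true}]).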

Option 2: Take down the server

It depends on the use case, but sometimes it doesn't make sense to keep processing messages; it is better to crash early and give feedback to client applications. In that case we are not pretending that we saved messages in the archive only for them to turn out to be gone because our Riak cluster was down at the time. On the other hand, for an authentication backend it is fine to continue processing; we just won't be able to authenticate new sessions. Maybe separate pools per module?

Riak's selling point is its high availability, so if the whole Riak cluster is unavailable, something must be really wrong or there are network problems. Maybe it is a good idea to stop the node?

Option 3: Make it configurable? What & how?

We could put a strategy in the riak_server options (take down or retry). Does it make sense? Does it introduce confusion?
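Purely hypothetical (none of these option names exist in the PR), but a configurable strategy could look something like:

    {riak_server, [{address, "riak-lb.example.com"},
                   {port, 8087},
                   {on_failure, retry}]}.   % or take_down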

I would vote for option 1: we keep the configuration simple, and it is the same as in the ODBC case.

Summary of my recent changes:

  • added entries in the docs about configuring the Riak connection and the database itself, plus descriptions for the remaining DB backends
  • converted the pool of pools into one pool
  • made the pool reloadable via our config reload mechanism
  • changed Riak's configuration on Travis: I added the {delete_mode, immediate} option to ensure that tombstones are deleted immediately rather than after 3 seconds, otherwise some of our tests may fail - Escalus thinks it removed users, but in fact it needs to wait 3 seconds (more here). See the config snippet below.
    I'm not happy with this workaround, because I guess we are not testing the real-world case. On the other hand, waiting 3 seconds between every test case is pointless. I am not a Riak expert, so if someone has an opinion on that, please share.
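For reference, delete_mode is a riak_kv application setting; in Riak's app.config or advanced.config (depending on the Riak version) it looks like this:

    %% Reap tombstones immediately instead of after the default 3000 ms.
    [{riak_kv, [{delete_mode, immediate}]}].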

@michalwski
Contributor Author

I also vote for option 1. With some additional effort we can add logs to the riakc process (thanks to the sys:install function) informing us when the connection was closed and when reconnection attempts were made. From my experience it's always good to know exactly when communication with an external service was broken and re-established.
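A hedged sketch of what such logging could look like. sys:install/2 is a standard OTP function; which events riakc emits on disconnect and reconnect is an assumption here, so this version simply logs every system event of the connection process:

    %% Attach a debug function to a riakc_pb_socket process (a gen_server,
    %% so it supports the sys debug machinery).
    log_riak_events(ConnPid) ->
        LogFun = fun(FuncState, Event, ProcState) ->
                         error_logger:info_msg("riakc ~p system event: ~p~n",
                                               [ProcState, Event]),
                         FuncState
                 end,
        sys:install(ConnPid, {LogFun, no_state}).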

@pzel
Contributor

pzel commented Aug 17, 2015

I also vote for #1.

By the way, do we have a consistent method of raising alarms from MongooseIM? Now that most installations in the wild are actually instrumented with Wombat (and Wombat can send emails, khm khm), maybe we should start notifying the alarm registry that something has gone wrong?


@michalwski
Contributor Author

I'm not sure most installations are instrumented with Wombat. Did you count all the installations done without our knowledge? There are thousands of them, I'd say :D and they don't necessarily use Wombat (yet :D).
Anyway, an alarm registry sounds good to me, but we should not assume Wombat is installed.

@pzel
Contributor

pzel commented Aug 17, 2015

Sure -- my point was rather that alarms are an easy and extensible way to notify operators about serious failures, and they cost nothing in terms of performance. Plus, there's no chance that an alarm will get throttled, as sometimes happens either at the lager or syslog level. It's a shame when critical information gets killed by an overwhelmed lager.

Anyway, alarms are another topic that should be addressed separately, sorry for side-tracking the conversation!
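For what it's worth, the alarm registry mentioned here is OTP's SASL alarm_handler; a minimal sketch (the alarm identifier is made up) could look like this:

    %% Raise an alarm when the Riak pool gives up, clear it when it recovers.
    riak_pool_down(Reason) ->
        alarm_handler:set_alarm({riak_pool_down, Reason}).

    riak_pool_up() ->
        alarm_handler:clear_alarm(riak_pool_down).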

@erszcz
Member

erszcz commented Aug 17, 2015

👍 for option 1 because of consistency with ODBC pooling. Moreover, server shutdown is something that requires manual intervention, which is labour, therefore cost. I'd say that providing a limited service (e.g. no archive) is a much lesser impact/deficiency from the end-user's point of view. IMO option 3 is subject to YAGNI.

Also 👍 for alarms.

@ppikula ppikula removed the WIP 🚧 label Aug 17, 2015
@michalwski
Contributor Author

👍 from me for merge

ppikula added a commit that referenced this pull request Aug 17, 2015
@ppikula ppikula merged commit 4c9c418 into master Aug 17, 2015
@ppikula ppikula deleted the riak-base branch September 11, 2015 16:13
@michalwski michalwski mentioned this pull request Oct 15, 2015