Skip to content

Delivery guarantees

Mogens Heller Grabe edited this page May 24, 2017 · 2 revisions

As described on the transactions page, Rebus does not use DTC or any other kind of distributed transaction mechanism – instead, it will simply ensure that your handlers are invoked in the right way, thus making it possible to preserve a strict at least once-delivery guarantee throughout.

But what does that mean?

Which delivery guarantees are there?

When talking about which guarantees can be provided by messaging systems, they fall into three categories:

  1. Exactly Once(*)
  2. At Most Once
  3. At Least Once

Exactly once has a (*) next to it, because in the general case, it is simply not possible 😄

At Most Once

The At Most Once delivery guarantee covers the case when you are guaranteed to receive all messages either once, or maybe not at all.

This type of delivery guarantee can arise from your messaging system and your code performing its actions in the following order:

1. Remove message from queue
2. Start work transaction
3. Handle message (your code)
4. Success?
    Yes:
        1. Commit work transaction
    No: 
        1. Roll back work transaction
        2. Put message back into the queue

In the sunshine scenario, this is all well and good – your messages will be received, and work transactions will be committed, and you will be happy.

However, the sun does not always shine, and stuff tends to fail – especially if you do enough stuff. Consider e.g. what would happen if anything fails after having performed step (1), and then – when you try to execute step (4)/(2) (i.e. put the message back into the queue) – the network was temporarily unavailable, or the message broker restarted, or the host machine decided to reboot because it had installed an update.

With this protocol, it is easy to see you run the risk of losing messages when things fail.

This can be OK if it's what you want, but most things in Rebus revolve around the concept of DURABLE messages, i.e. messages whose contents is just as important as the data in your database.

At Least Once

This delivery guarantee covers the case when you are guaranteed to receive all messages either once, or maybe more times if something has failed.

It requires a slight change to the order we are executing our steps in, and it requires that the message queue system supports transactions, either in the form of the traditional begin-commit-rollback protocol (MSMQ does this), or in the form of a receive-ack-nack protocol (RabbitMQ, Azure Service Bus, etc. do this).

Check this out – if we do this:

1. Grab lease on message in queue
2. Start work transaction
3. Handle message (your code)
4. Success?
    Yes: 
        1. Commit work transaction
        2. Delete message from queue
    No: 
        1. Roll back work transaction
        2. Release lease on message

and the "lease" we grabbed on the message in step (1) is associated with an appropriate timeout, then we are guaranteed that no matter how wrong things go, we will only actually remove the message from the queue (i.e. execute step (4)/(2)) if we have successfully committed our "work transaction".

What is a "work transaction"?

It depends on what you're doing 😄 maybe it's a transaction in a relational database (which traditionally have pretty good support in this regard), maybe it's a transaction in a document database that happens to support transaction (like RavenDB or Postgres), or maybe it's a conceptual transaction in the form of whichever work you happen to carry out as a consequence of handling a message, e.g. update a bunch of documents in MongoDB, move some files around in the file system, or mutate some obscure in-mem data structure.

The fact that the "work transaction" is just a conceptual thing is what makes it impossible to support the aforementioned Exactly Once delivery guarantee – it's just not generally possible to commit or roll back a "work transaction" and a "queue transaction" (which is what we could call the protocol carried out with the message queue systems) atomically and consistently.


(*) Because of CAP

Clone this wiki locally