rodeos reliable amqp connection - develop #9336

heifner · 2020-07-23T16:25:13Z

Change Description

Pulls the AMQP work now used in rodeos over to develop.

===

As EOSIO gains more AMQP support, some use cases need to publish messages to AMQP exchanges reliably. reliable_amqp_publisher is a small class for publishing messages to an exchange that:

Reconnects to the AMQP server on connection failure
Reconnects to the AMQP server on channel failure (such as when publishing to a non-existent exchange)
Publishes messages in an AMQP transaction for positive confirmation the broker received them
Queues unconfirmed messages while not connected
Saves unconfirmed messages on disk on exit, and restores on relaunch
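The last item (save unconfirmed messages on exit, restore them on relaunch) can be illustrated with a minimal sketch that uses only the C++17 standard library. The `sketch` namespace, the function names, and the length-prefixed file format are assumptions for illustration; the PR does not describe the on-disk format, and the real class is built on AMQP-CPP and fc.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <utility>

// Hypothetical helpers, not the reliable_amqp_publisher API.
namespace sketch {

// Write every queued message to `path` as [uint64 length][bytes] records.
// Returns false if the file could not be written.
inline bool save_unconfirmed(const std::filesystem::path& path,
                             const std::deque<std::string>& queue) {
   std::ofstream out(path, std::ios::binary | std::ios::trunc);
   if (!out) return false;
   for (const std::string& msg : queue) {
      const std::uint64_t len = msg.size();
      out.write(reinterpret_cast<const char*>(&len), sizeof(len));
      out.write(msg.data(), static_cast<std::streamsize>(msg.size()));
   }
   out.flush();
   return static_cast<bool>(out);
}

// Read the records written by save_unconfirmed(), then delete the file.
// A missing file yields an empty queue; a truncated trailing record is dropped.
inline std::deque<std::string> restore_unconfirmed(const std::filesystem::path& path) {
   std::deque<std::string> queue;
   std::ifstream in(path, std::ios::binary);
   if (!in) return queue;
   std::uint64_t len = 0;
   while (in.read(reinterpret_cast<char*>(&len), sizeof(len))) {
      std::string msg(static_cast<std::size_t>(len), '\0');
      if (!in.read(msg.data(), static_cast<std::streamsize>(len))) break;
      queue.push_back(std::move(msg));
   }
   in.close();
   std::error_code ec;
   std::filesystem::remove(path, ec); // restore-then-delete, as discussed in the review
   return queue;
}

} // namespace sketch
```

Deleting the file right after loading means that if the process dies before the next save, the restored messages are gone; that exact trade-off comes up later in the review thread.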

reliable_amqp_publisher expects its configured exchange to already exist; it will not attempt to create it.

I expect, and fully accept, grief over the lack of integration tests. I would like to explore with the team how best to add integration tests for AMQP.

===

I was getting a bit tired of copy-pasting AMQP connection management code around in a few places. retrying_amqp_connection solves the typical use case of needing a single channel connected to an AMQP server that is retried on failure. Compared to the code previously in reliable_amqp_publisher, the functional differences are:

Doesn’t use AMQP::LibBoostAsioHandler; AMQP-CPP upstream is very clear that this isn’t supported
Doesn’t rely on AMQP’s TCP module; that would break a win32 build
Retries the connection/channel every second instead of backing off on consecutive failures
Doesn’t print the AMQP password to logs (whoops)
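For the last item, one way to keep credentials out of logs is to mask the password before the server URL is printed. The helper below is hypothetical (`redact_amqp_url` is not from the PR) and uses plain string operations on an `amqp://user:password@host:port/vhost` string rather than AMQP-CPP's `AMQP::Address`.

```cpp
#include <string>

// Hypothetical helper: return `url` with the password part of the userinfo
// replaced by asterisks. URLs without a password are returned unchanged.
inline std::string redact_amqp_url(const std::string& url) {
   const std::string::size_type scheme_end = url.find("://");
   if (scheme_end == std::string::npos) return url;
   const std::string::size_type auth_begin = scheme_end + 3;

   // The authority (userinfo@host:port) ends at the first '/' after the scheme.
   std::string::size_type auth_end = url.find('/', auth_begin);
   if (auth_end == std::string::npos) auth_end = url.size();

   // Userinfo ends at the last '@' inside the authority, so a password that
   // itself contains '@' is still masked completely.
   const std::string::size_type at = url.rfind('@', auth_end);
   if (at == std::string::npos || at < auth_begin) return url; // no credentials

   // The first ':' in the userinfo separates user from password.
   const std::string::size_type colon = url.find(':', auth_begin);
   if (colon == std::string::npos || colon > at) return url;   // user only

   return url.substr(0, colon + 1) + "********" + url.substr(at);
}
```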

This structure should also make future improvements easier to add, such as TLS support with peer authentication.

===

Add reliable_amqp_publisher to the rodeos streamer to provide reliable message publishing.
Add the ability to make per-block reliable publishing optional, so that messages are published immediately as the filter processes them.
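The difference between per-block publishing and the `stream-rabbits-immediately` mode can be sketched as a small buffer: messages are held while the filter processes a block and handed over together when the block ends, or passed on one at a time in immediate mode. The class and callback names are hypothetical; `publish_fn` merely stands in for a call into the publisher.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the streamer side of rodeos.
class block_stream {
public:
   using publish_fn = std::function<void(std::vector<std::string>)>;

   block_stream(bool immediately, publish_fn publish)
      : immediately_(immediately), publish_(std::move(publish)) {}

   // Called for each message the filter produces while processing a block.
   void on_message(std::string msg) {
      if (immediately_) {
         // Immediate mode: hand over every message as soon as it exists.
         std::vector<std::string> single;
         single.push_back(std::move(msg));
         publish_(std::move(single));
      } else {
         // Per-block mode: hold the message until the block is complete.
         pending_.push_back(std::move(msg));
      }
   }

   // Called once the filter has finished processing a block.
   void on_block_end() {
      if (pending_.empty()) return;
      std::vector<std::string> batch;
      batch.swap(pending_); // leaves pending_ empty for the next block
      publish_(std::move(batch));
   }

private:
   bool immediately_;
   publish_fn publish_;
   std::vector<std::string> pending_;
};
```

Immediate mode lowers latency, but per the option's description it gives up reliable message delivery.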

Documentation Additions

New options:
Command line option: --stream-delete-unsent
"Delete unsent AMQP stream data retained from previous connections"

Config option: stream-rabbits-immediately=true
"Stream to RabbitMQ immediately instead of batching per block. Disables reliable message delivery."

kimjh2005 · 2020-08-06T18:03:44Z

libraries/reliable_amqp_publisher/reliable_amqp_publisher.cpp

```diff
+      })
+      .onFinalize([this]() {
+         in_flight = 0;
+         //unfortuately we don't know if an error is due to something recoverable or if an error is due
```

#1. If commitTransaction() fails, we may want to log it. That would give us useful information such as how often it happens, the message_id, the number of messages in the envelope, the size of the envelope, etc.

#2. Can an AMQP::Envelope become too big for RabbitMQ if too many (or too large) messages share the same message_id?

#3. Can pump_queue() be called continually and consume all the CPU if commitTransaction() keeps failing without triggering channel_failed?

The default max message size in RabbitMQ is 128MB, the maximum configurable in RabbitMQ is 512MB, and AMQP's own maximum is 2GB. These limits are nowhere near what I originally expected this to be used for. @heifner, does rodeos push a lot of data through this?

I think any error is going to bubble up as a channel error or a connection error, both of which have a 1s retry timer. You can see this with actions like publishing to a non-existent exchange, or publishing to an exchange bound to a full queue that rejects publishes. Both result in a channel error and a 1s wait before the next attempt.
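The fixed 1s retry described in the reply can be modeled without a broker by injecting the timer. This is a hypothetical, simplified model, not the retrying_amqp_connection implementation: `attempt` stands in for opening the connection/channel, and `schedule` for an asio timer. In this model a new attempt is only ever made from the timer callback, so repeated failures wait one interval each instead of spinning.

```cpp
#include <chrono>
#include <functional>
#include <utility>

// Hypothetical fixed-interval retry policy (no back-off on consecutive failures).
class fixed_retry {
public:
   using attempt_fn  = std::function<bool()>; // true if connect/channel open succeeded
   using schedule_fn = std::function<void(std::chrono::milliseconds, std::function<void()>)>;

   fixed_retry(attempt_fn attempt, schedule_fn schedule,
               std::chrono::milliseconds delay = std::chrono::seconds(1))
      : attempt_(std::move(attempt)), schedule_(std::move(schedule)), delay_(delay) {}

   // The first attempt is made right away rather than after one interval.
   void start() { try_once(); }

   // Called when the connection or channel reports an error.
   // The caller must keep this object alive while a retry is pending.
   void on_error() {
      ++failures_;
      schedule_(delay_, [this] { try_once(); }); // same delay every time
   }

   int failures() const { return failures_; }

private:
   void try_once() {
      if (!attempt_()) on_error();
   }

   attempt_fn attempt_;
   schedule_fn schedule_;
   std::chrono::milliseconds delay_;
   int failures_ = 0;
};
```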

No way they will be anywhere near that size.

libraries/reliable_amqp_publisher/reliable_amqp_publisher.cpp

kimjh2005 · 2020-08-10T14:28:26Z

libraries/amqp/reliable_amqp_publisher.cpp

```diff
+      boost::filesystem::ofstream o(data_file_path);
+      FC_ASSERT(o.good(), "Failed to create unconfirmed AMQP message file at ${f}", ("f", (fc::path)data_file_path));
+   }
+   boost::filesystem::remove(data_file_path, ec);
```

If preserving unsent messages is important, instead of removing the file here, could it be renamed to a backup file or deleted later?

I believe our current plan is to always persist to disk as we go along to support hard-failure cases like kill -9

I'm thinking of a case where the file holds unconfirmed messages because of an earlier problem. nodeos restarts, reads the file, and then crashes or is killed with -9 before writing a new data file. Since we already deleted the data file, the next run will not have a valid data file.

The reason it works this way: IMO, if nodeos crashes you're going to have bigger problems. You'll have to start anew from a snapshot or replay/resync anyway, so you can pick a block to restart from that covers the affected range.

But as Kevin explains, there is a desire to harden usage of file further.
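A common way to address kimjh2005's concern, so that a complete data file exists at every moment, is to write new contents to a temporary file and rename it over the old one. This is a hypothetical sketch of that general technique, not what the PR implements; the function name and `.tmp` suffix are assumptions. On POSIX, `rename()` replaces the target atomically within one filesystem; the C++ standard library has no `fsync`, so this sketch alone does not protect against power loss.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

// Hypothetical helper: replace `target` with `contents` so that `target`
// always holds either the complete old data or the complete new data.
inline bool replace_file_atomically(const std::filesystem::path& target,
                                    const std::string& contents) {
   std::filesystem::path tmp = target;
   tmp += ".tmp"; // hypothetical naming scheme
   bool written = false;
   {
      std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
      out.write(contents.data(), static_cast<std::streamsize>(contents.size()));
      out.flush();
      written = static_cast<bool>(out);
   } // file closed here
   std::error_code ec;
   if (written) {
      std::filesystem::rename(tmp, target, ec); // atomic replace on POSIX
      if (!ec) return true;
   }
   std::filesystem::remove(tmp, ec); // clean up; the old target is untouched
   return false;
}
```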

kimjh2005

It looks good.

jeffreyssmith2nd

Hold for product review

heifner · 2021-03-30T19:53:05Z

Way out of date.

spoonincode and others added 30 commits July 22, 2020 15:38

a reliable AMQP publisher

c76d466

replace std::for_each_n with just for_each fixing gnu libc++7/8

3a46050

fix an out of date comment

71b3c9f

mark reliable amqp messages as persistent

208a012

check writeablity of file location

6046eec

allow setting the routing key

dad78c2

pump the queue on reconnection

7a3d153

add support for setting message-id on published messages

a59216a

stopping variable should be atomic

43afb6e

comment fixes

b225769

move some items to a strand for guaranteed fifo ordering

0950359

connect to amqp immediately on startup -- do not wait 1s

1561a22

rename serialized function to avoid accidental usage

9bd588f

move location where directory is created

57d2758

remove dispatch() for a little more control over dispatch semantics

83ded9c

refactor away reliable_amqp_publisher_callbacks

3051c27

don't stop the io_context on an escaped exception

ee0dfe8

it won't be possible to shutdown cleanly if it's stopped

make a bunch of parameters references

de47308

retrying_amqp_connection

c9b7ec8

refactor reliable_amqp_publisher to use retrying_amqp_connection

cb7d6e9

tweak retry_connection() ordering based on recent tests

b88e3dc

update single_channel_retrying_amqp_connection usage

c93a643

support for fc::fwd with 5 ctor args

a5b0612

use a strand for all operations

51a239d

& just support passing an io_context for now. Move most the impl to a .cpp since it's not templated any longer

prevent potential double stack of start_connection()

a640f37

comment update

4db2cc8

remove the paranoia around Channel's onError() (just don't do that)

c9fe568

just use a unique_ptr here instead of fc::fwd

875bfcd

Revert "support for fc::fwd with 5 ctor args"

e0d4093

This reverts commit 1c401e1.

Add explicit send feature

fd0e2af

heifner added 18 commits July 23, 2020 10:45

Add ability to just publish directly to amqp queue

5f64192

Take data by value

512e8ae

Add stream-rabbits-immediately option

3ab2f7b

Handle unsigned_int overflow

bec0f2b

Use message_queue size only for determination of dropping messages

a338f14

Remove redundent eosio::reliable_amqp_publisher create. Make exchange durable.

a18a61e

Do not try to publish unless connected

3b2cf17

shutdown cleanly on amqp connection issues

f3fa5ed

Merge retrying_amqp_connection reliable_amqp_publisher and amqp_handler.hpp into one amqp library

c21b7ba

Refactored to use amqp_handler

f7b8cf8

Move AMQP::Address to_variant to util

626755a

Add appbase

25e7bd1

on_error can be called from calling thread on time out

2640db3

Fix race condition

738cf78

fix thread-safety of stop()

c39eaf1

Document non-thread-safe. Use post instead of mutex for protecting on_error calls

4ea0137

revert to using mtx. Add missing calls to wait promise

652d22c

Fix merge issues

2d25526

heifner requested a review from b1bart July 23, 2020 16:25

jeffreyssmith2nd added enhancement needs review labels Aug 5, 2020

kimjh2005 self-requested a review August 6, 2020 13:06

kimjh2005 reviewed Aug 6, 2020

libraries/reliable_amqp_publisher/reliable_amqp_publisher.cpp (outdated, resolved)

kimjh2005 reviewed Aug 10, 2020

kimjh2005 approved these changes Aug 10, 2020

jeffreyssmith2nd suggested changes Aug 10, 2020

heifner closed this Mar 30, 2021

heifner deleted the retrying_amqp_connection-develop branch March 30, 2021 19:53
