Feat[MQB]: Enhance queue consumption monitor alarm log with additional details #420
Signed-off-by: Aleksandr Ivanov <aivanov71@bloomberg.net>
Looks good. A few questions:
- Maybe we can print `Storage::numMessages` and `Storage::numBytes` as well?
- Can we get rid of `QueueEngineUtil_AppState::head()` and `QueueConsumptionMonitor::SubStreamInfo::d_headCb` now?
- Is `QueueConsumptionMonitor::onTransitionToIdle` "level triggered" (vs. "edge triggered")? It can be noisy; how often do we want that log?
@chrisbeard please take a look at the output
Regarding
Aren't they the same as `Storage::numMessages()` and `Storage::numBytes()`? In my test they printed the same values. Are there any scenarios in which they would differ?
Regarding
Is Regarding
It may be possible to reduce the dependencies of `mqbblp_queueconsumptionmonitor.t`. Let's explore that.
@@ -503,19 +498,7 @@ TEST_F(Test, putAliveIdleWithConsumer)
    ASSERT_EQ(logObserver.records().size(), ++expectedLogRecords);
    ASSERT(mwctst::ScopedLogObserverUtil::recordMessageMatch(
        logObserver.records().back(),
        "ALARM \\[QUEUE_CONSUMER_MONITOR\\].*It currently has 2 consumers",
As I understand it, there is no "test consumer 1/2" log because it is now outside of the `QueueConsumptionMonitor`. That is fine.
Question: maybe we do not need `d_consumer1/2` now?
Maybe the `Test` can shrink now, since the state of the monitored objects is factored out by `Test::LoggingCb`?
If that is the case, we can remove `Test::d_queue`, `Test::d_queueState`, `Test::d_domain`, `Test::d_cluster`, and `Test::createClient`. Possibly `d_storage` as well, if `Test::putMessage()` changes the state of `Test` instead of `Test::d_storage` (is that `Test::d_advance`?).
I refactored the test by simplifying the logic and removing the consumers, but `Test::d_queueState` and its dependencies are still needed, because `d_monitor` requires it.
BSLS_ASSERT_SAFE(d_queueState_p->queue()->dispatcher()->inDispatcherThread(
    d_queueState_p->queue()));

// Construct AppId from appKey
You can replace the App lookup code with
Apps::const_iterator cItApp = d_apps.findByKey2(AppKeyCount(appKey, 0));
`AppKeyCount` is an artefact (hopefully not for too long) of our transition from the non-CSL to the CSL way of registering Apps. In short, only registered Apps are supposed to have storage and consumption monitoring, and registered Apps should have `0` as the "count".
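A minimal sketch of why a count of `0` selects exactly the registered Apps, assuming the container can be searched by an (app key, count) pair. `AppKeyCountSketch`, `AppInfoSketch`, `AppsSketch`, and `findRegisteredApp` are hypothetical stand-ins: a `std::map` plays the role of the real `Apps` container and its `findByKey2` lookup.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for AppKeyCount: (app key, registration "count").
// Registered Apps are expected to use a count of 0.
typedef std::pair<std::string, int> AppKeyCountSketch;

struct AppInfoSketch {
    std::string d_appId;
};

// Hypothetical container keyed by (appKey, count); not the real 'Apps' type.
typedef std::map<AppKeyCountSketch, AppInfoSketch> AppsSketch;

// Return the registered App for 'appKey', or 'nullptr' if there is none.
const AppInfoSketch* findRegisteredApp(const AppsSketch& apps,
                                       const std::string& appKey)
{
    AppsSketch::const_iterator cIt = apps.find(AppKeyCountSketch(appKey, 0));
    return cIt == apps.end() ? nullptr : &cIt->second;
}
```

An entry with a non-zero count (an App that is not registered) is simply not found, so it gets no consumption monitoring.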
Replaced as proposed.
bdlma::LocalSequentialAllocator<4096> localAllocator(d_allocator_p);

bmqt::Uri uri(&localAllocator);
uriBuilder.uri(&uri);
d_queueState_p->uri()
    << " consumers." << ss.str() << '\n';

// Log un-delivered messages info
mqbi::Storage* const storage = d_queueState_p->storage();
Use `storage` or `d_queueState_p->storage()` consistently everywhere?
                  QueueConsumptionMonitor::State::e_ALIVE);
    ASSERT_EQ(logObserver.records().size(), expectedLogRecords);
}

TEST_F(Test, logFormat)
This test originally checked the log format, but logging is now done by the callback, so this test no longer makes sense.
@@ -727,122 +497,6 @@ TEST_F(Test, putAliveIdleSendAliveTwoSubstreams)
    ASSERT_EQ(d_monitor.state(key2), QueueConsumptionMonitor::State::e_ALIVE);
}

TEST_F(Test, putAliveIdleSendAliveTwoSubstreamsTwoConsumers)
Since the consumers are removed, this test becomes identical to the one above, so it is removed.
Thank you
Some comments
/// `enableLog` is `true` it logs alarm data. Return `true` if there are
/// un-delivered messages and `false` otherwise.
bool logAlarmCb(const mqbu::StorageKey& appKey,
                const bool              enableLog) const;
                const bool enableLog) const;
                bool enableLog) const;
It is not necessary to qualify a by-value argument of a basic type as `const`: a top-level `const` on a parameter is not part of the function's signature.
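To illustrate the point in plain standard C++ (the function name `logAlarm` is hypothetical): a top-level `const` on a by-value parameter is dropped from the function type, so the two declarations below declare the same function, and `const` in a definition only restricts the body's local copy.

```cpp
#include <cassert>
#include <type_traits>

// These two lines declare the same function: top-level 'const' on a
// by-value parameter is not part of the function type.
bool logAlarm(bool enableLog);
bool logAlarm(const bool enableLog);

static_assert(std::is_same<bool(bool), bool(const bool)>::value,
              "top-level const is not part of the function type");

// In a definition, 'const' only prevents the body from modifying its own
// copy of the argument; callers cannot observe any difference.
bool logAlarm(const bool enableLog)
{
    return enableLog;
}
```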
@@ -229,6 +215,7 @@ void QueueConsumptionMonitor::onTimer(bsls::Types::Int64 currentTimer)
    // PRECONDITIONS
    BSLS_ASSERT_SAFE(d_queueState_p->queue()->dispatcher()->inDispatcherThread(
        d_queueState_p->queue()));
    BSLS_ASSERT_SAFE(d_loggingCb);
    BSLS_ASSERT_SAFE(d_loggingCb);
We might remove this precondition, since `d_loggingCb` is set in the only constructor and already checked there. There are no setters or other ways to change `d_loggingCb` to an invalid value.
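The reasoning here is the usual class-invariant argument: validate a member once, where it is established, and make sure nothing can change it afterwards. A minimal sketch, assuming a simplified, hypothetical `MonitorSketch` class, with standard `assert` standing in for `BSLS_ASSERT_SAFE` and an `int` standing in for the app key. Copying is disabled, which also disables moving, so no object with an empty, moved-from callback can exist.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical, simplified monitor: not the real QueueConsumptionMonitor.
class MonitorSketch {
  public:
    typedef std::function<bool(int)> LoggingCb;

    explicit MonitorSketch(LoggingCb loggingCb)
    : d_loggingCb(std::move(loggingCb))
    {
        // The only place 'd_loggingCb' is set, so the only place to check it.
        assert(d_loggingCb);
    }

    // Declaring a (deleted) copy constructor suppresses the implicit move
    // constructor, so the callback can never be left empty by a move.
    MonitorSketch(const MonitorSketch&)            = delete;
    MonitorSketch& operator=(const MonitorSketch&) = delete;

    // No precondition on 'd_loggingCb' is needed: the constructor's
    // invariant cannot be broken afterwards.
    bool onTimer(int appKey) const { return d_loggingCb(appKey); }

  private:
    LoggingCb d_loggingCb;  // never reassigned after construction
};
```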
@@ -210,8 +210,7 @@ class QueueConsumptionMonitor {
        static const char* toAscii(Transition::Enum value);
    };

    typedef bsl::function<bslma::ManagedPtr<mqbi::StorageIterator>(void)>
        HeadCb;
    typedef bsl::function<bool(const mqbu::StorageKey&, bool)> LoggingCb;
It might be good to explain here what the first and second arguments mean.
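One way to document the two arguments, reusing the semantics stated in the `logAlarmCb` comment: the first argument is the App key, the second (`enableLog`) requests that alarm data be logged, and the return value says whether there are un-delivered messages. `StorageKeySketch`, `LoggingCbSketch`, and `makeLoggingCb` are hypothetical names, and the set-based callback only simulates storage.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>

// Hypothetical stand-in for 'mqbu::StorageKey'.
typedef std::string StorageKeySketch;

/// Callback for checking, and optionally logging, the consumption state of
/// one App.  The first argument is the key of the App to inspect.  If the
/// second argument ('enableLog') is 'true', the callback also logs alarm
/// data for that App.  Return 'true' if the App has un-delivered messages,
/// and 'false' otherwise.
typedef std::function<bool(const StorageKeySketch&, bool)> LoggingCbSketch;

// Hypothetical factory: 'pending' simulates which Apps have un-delivered
// messages; 'logged' records every App for which alarm data was "logged".
LoggingCbSketch makeLoggingCb(const std::set<StorageKeySketch>& pending,
                              std::set<StorageKeySketch>*       logged)
{
    return [pending, logged](const StorageKeySketch& appKey, bool enableLog) {
        if (enableLog) {
            logged->insert(appKey);  // stands in for writing the alarm log
        }
        return pending.count(appKey) != 0;
    };
}
```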
@@ -220,8 +219,7 @@ class QueueConsumptionMonitor {
    struct SubStreamInfo {
        // CREATORS

        SubStreamInfo(const HeadCb& headCb);
        SubStreamInfo(const SubStreamInfo& other);
        SubStreamInfo();
Good simplification
/// Update the specified 'subStreamInfo', associated to the specified
/// 'appKey', and write log, upon transition to alive state.
/// Update the specified 'subStreamInfo', associated to the specified
/// 'appKey', and write log, upon transition to alive state.
/// Update the specified `subStreamInfo`, associated to the specified
/// `appKey`, and write log, upon transition to alive state.
int idx          = 1;
int numConsumers = 0;

QueueEngineUtil_AppState::Consumers& consumers = app->consumers();
QueueEngineUtil_AppState::Consumers& consumers = app->consumers();
const QueueEngineUtil_AppState::Consumers& consumers = app->consumers();
out << k_EXPR_NUM_LIMIT << " of "
    << " consumer subscription expressions: ";
out << k_EXPR_NUM_LIMIT << " of "
    << " consumer subscription expressions: ";
out << k_EXPR_NUM_LIMIT << " of "
    << "consumer subscription expressions: ";
There is a double space here. Also, from the log itself it is not clear that not all of the existing expressions were printed, due to the limit.
Added the word `First` at the beginning for clarity, e.g. `First 50 of consumer subscription expressions:`
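A minimal sketch of the resulting header, assuming a hypothetical `printExpressions` helper and one expression per line (the real layout may differ). The double space came from concatenating `" of "` with `" consumer ..."`; keeping the words in one literal avoids it, and the leading `First` signals that the list may be truncated at `limit`.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: print at most 'limit' expressions, announcing the
// limit with a leading "First" so readers know the list may be truncated.
void printExpressions(std::ostream&                   out,
                      const std::vector<std::string>& expressions,
                      std::size_t                     limit)
{
    // One literal for " of consumer ...", so only one space follows "of".
    out << "First " << limit << " of consumer subscription expressions:\n";
    for (std::size_t i = 0; i < expressions.size() && i < limit; ++i) {
        out << expressions[i] << '\n';
    }
}
```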
                                 app->putAsideList().first());
if (rc == mqbi::StorageResult::e_SUCCESS) {
    // Log timestamp
    out << "Oldest message in a 'Put aside' list:\n";
    out << "Oldest message in a 'Put aside' list:\n";
    out << "Oldest message in the 'Put aside' list:\n";
Use "the", since we are logging information about a concrete queue.
BALL_LOG_WARN << "Failed to streamIn MessageProperties, rc = "
              << rc;
BALL_LOG_WARN << "Failed to streamIn MessageProperties, rc = "
              << rc;
BALL_LOG_WARN << "Failed to streamIn MessageProperties, rc = "
              << rc;
out << "Message Properties: Failed to acquire [rc: " << rc << "]\n";
Do we want to print this in the alarm record itself?
BALL_LOG_WARN << "Failed to get storage iterator for GUID: "
              << app->putAsideList().first() << ", rc = " << rc;
BALL_LOG_WARN << "Failed to get storage iterator for GUID: "
              << app->putAsideList().first() << ", rc = " << rc;
BALL_LOG_WARN << "Failed to get storage iterator for GUID: "
              << app->putAsideList().first() << ", rc = " << rc;
out << "'Put aside' list: Failed to acquire [rc: " << rc << "]\n";
@678098 thank you, I applied your suggestions.
LGTM!
Signed-off-by: Christopher Beard <cbeard9@bloomberg.net>

fixing Solaris build (bloomberg#434)
Signed-off-by: dorjesinpo <129227380+dorjesinpo@users.noreply.github.com>

Remove `-DBMQ_ENABLE_MSG_GROUPID` from the build system
We do not ever want to build with this flag when releasing, and users often manage to enable this flag accidentally. Because message group IDs are not fully implemented, we remove this temporary definition. It can be added in later if we ever come back to this feature.
Signed-off-by: Patrick M. Niedzielski <patrick@pniedzielski.net>

Make unit tests pass without `BMQ_ENABLE_MSG_GROUPID`
The unit tests currently assume that message group IDs are enabled, and since we have updated our build system to no longer enable this feature, the unit tests now fail in CI. This patch guards the message group ID tests with preprocessor conditionals, disabling the parts of tests that try to set and check message group IDs. When `BMQ_ENABLE_MSG_GROUPID` is set, these parts of the unit tests run again.
Signed-off-by: Patrick M. Niedzielski <patrick@pniedzielski.net>

Fix mqbstat doc formatting (bloomberg#438)
Signed-off-by: Christopher Beard <cbeard9@bloomberg.net>

Fix[bmqeval]: limit expression length to avoid stack overflow (bloomberg#441)
Signed-off-by: Evgeny Malygin <emalygin@bloomberg.net>

Fix Solaris unit tests (bloomberg#440)
Signed-off-by: Anton Pryakhin <apryakhin1@bloomberg.net>

Docs[BMQ]: Use `.dox` files rather than `.md` files
Package group documentation in `libbmq` was converted to Markdown files named `README.md`, and which was tied to the directory containing the code for the package group using Doxygen `@dir` commands. However, when generating the documentation, this left several empty pages in the documentation named `README`, which we were not able to remove. The solution for this that this patch uses is to switch from `.md` files to `.dox` files, which contain a single Doxygen-style C++ comment containing the `@dir` command. Unlike `.md` files, these do not automatically create pages, so there is no empty `README` page created for each package group. The cost of this is that `.dox` files cannot be simple Markdown files, but instead need to be wrapped in a C++ comment.
Signed-off-by: Patrick M. Niedzielski <patrick@pniedzielski.net>

Docs[BMQ] bde -> doxygen conversion fixes (bloomberg#443)
* Doc[BMQT] minor bde -> doxygen docs
* Doc[BMQA] minor bde -> doxygen docs
* Doc[BMQA] re-wrap data member comments
* Doc[BMQT] re-wrap data member comments
* Apply suggestions from code review
---------
Signed-off-by: Christopher Beard <cbeard9@bloomberg.net>
Signed-off-by: Chris Beard <chrisbeard@users.noreply.github.com>
Co-authored-by: Evgeny Malygin <678098@protonmail.com>

Feat: track queue depth per appId (bloomberg#320)
Signed-off-by: Evgeny Malygin <emalygin@bloomberg.net>

configurator, bmqit: mode protos (bloomberg#447)
Signed-off-by: Jean-Louis Leroy <jleroy9@bloomberg.net>

Revert "configurator, bmqit: mode protos (bloomberg#447)" (bloomberg#449)
This reverts commit a4b20db.

Fix[mqbs_virtualstoragecatalog.cpp]: fix Solaris build (bloomberg#450)
Signed-off-by: Evgeny Malygin <emalygin@bloomberg.net>

fix: configurator: apply app ids (bloomberg#452)
Signed-off-by: Jean-Louis Leroy <jleroy9@bloomberg.net>

Fix [MQB]: mqbc::StorageMgr: Transition to available only when all primary active (bloomberg#416)
* mqbc::StorageMgr: Ban 'processPrimaryStatusAdvisory' in non-FSM mode
  Signed-off-by: Yuan Jing Vincent Yan <yyan82@bloomberg.net>
* mqbc::StorageMgr: Transition to available only when all primary active
  Signed-off-by: Yuan Jing Vincent Yan <yyan82@bloomberg.net>
* mqbc::StorageMgr: clang-format
  Signed-off-by: Yuan Jing Vincent Yan <yyan82@bloomberg.net>
* mqbc::StorageMgr: Healing replica buffers primary status advisories
  Signed-off-by: Yuan Jing Vincent Yan <yyan82@bloomberg.net>
* mqbs::FileStore: Rename setPrimary -> setActivePrimary
  Signed-off-by: Yuan Jing Vincent Yan <yyan82@bloomberg.net>
* mqbc::StorageMgr: Comment about check if all partitions available
  Signed-off-by: Yuan Jing Vincent Yan <yyan82@bloomberg.net>
---------
Signed-off-by: Yuan Jing Vincent Yan <yyan82@bloomberg.net>

Fix some compiler warnings in mqb (bloomberg#455)
* -Wunused-parameter
* -Wshadow
* -Wswitch-enum
Signed-off-by: Christopher Beard <cbeard9@bloomberg.net>

It: Include full path for admin stat it test failures (bloomberg#453)
* It: Include full path for admin stat it test failures
  This patch makes it a little easier to debug the metric & operation that causes an integration test for stats to fail.
  Signed-off-by: Christopher Beard <cbeard9@bloomberg.net>
* Update src/integration-tests/test_admin_client.py
  Co-authored-by: Evgeny Malygin <678098@protonmail.com>
  Signed-off-by: Chris Beard <chrisbeard@users.noreply.github.com>
---------
Signed-off-by: Christopher Beard <cbeard9@bloomberg.net>
Signed-off-by: Chris Beard <chrisbeard@users.noreply.github.com>
Co-authored-by: Evgeny Malygin <678098@protonmail.com>

Feat: Add queue history size metric (bloomberg#436)
* [WIP] Feat: Add queue history size metric
  This adds a new queue metric that counts the number of GUIDs in that queue's history. This is useful for identifying excessive memory utilization from history and potential history garbage collection issues (where history is filled up faster than it's cleaned up).
  Signed-off-by: Christopher Beard <cbeard9@bloomberg.net>
* It: Extend admin it for history size stat
  Signed-off-by: Christopher Beard <cbeard9@bloomberg.net>
---------
Signed-off-by: Christopher Beard <cbeard9@bloomberg.net>

Feat[plugins]: report queue depth per appId to prometheus (bloomberg#446)
Signed-off-by: Evgeny Malygin <emalygin@bloomberg.net>

[Fix] m_bmqstoragetool::FileManagerImpl: Asserts not have side effects (bloomberg#461)
Signed-off-by: Yuan Jing Vincent Yan <yyan82@bloomberg.net>

Feat[MQB]: Enhance queue consumption monitor alarm log with additional details (bloomberg#420)

Enhance filebackedstorage alarm log
Signed-off-by: Aleksandr Ivanov <aivanov71@bloomberg.net>

Cleanup
Signed-off-by: Aleksandr Ivanov <aivanov71@bloomberg.net>

Add test to mqbu_capacitymeter.t
Signed-off-by: Aleksandr Ivanov <aivanov71@bloomberg.net>

mqbc::StorageUtil, mqbi::StorageMgr: updateQueue -> updateQueuePrimary (bloomberg#466)
Signed-off-by: Yuan Jing Vincent Yan <yyan82@bloomberg.net>

Fix[MQB]: misc warnings (bloomberg#464)

Allow dots in subscription property names
Message properties allow arbitrary strings for property names, but our subscription expression language is more limited, requiring an initial alphabetic character followed by any number of alphanumeric characters and underscores. Some producers have begun using hierarchical message property names, separated by dots (“.”), and are unable to use subscriptions to filter or route according to these message properties. This patch extends the expression language grammar to enable matching on subscription property names with dots in them. This change is a pure extension: the language recognized by the subscription expression grammar after this patch is a strict superset of the language recognized by the subscription expression grammar before this patch. This patch also extends the unit test for the lexer to ensure this is a strict superset.
Signed-off-by: Patrick M. Niedzielski <patrick@pniedzielski.net>

fix: clean app subscriptions on reconfigure
Signed-off-by: dorjesinpo <129227380+dorjesinpo@users.noreply.github.com>

Fix[mqbstat_domainstats.cpp]: empty tier StringRef (bloomberg#431)
Signed-off-by: Evgeny Malygin <emalygin@bloomberg.net>

Fix Solaris build, it does not support ctor delegation
Signed-off-by: Aleksandr Ivanov <aivanov71@bloomberg.net>

Doc: Document app subscriptions (bloomberg#463)
* Docs upgrade jekyll -> 4.3.3
  Signed-off-by: Christopher Beard <cbeard9@bloomberg.net>
* Docs: Document app subscriptions
  Signed-off-by: Christopher Beard <cbeard9@bloomberg.net>
* Expand on difference in subscriptions
  Signed-off-by: Christopher Beard <cbeard9@bloomberg.net>
* Minor subscription doc clarifications
  Signed-off-by: Christopher Beard <cbeard9@bloomberg.net>
* Elaborate on subscription details
  Signed-off-by: Christopher Beard <cbeard9@bloomberg.net>
* Clarify consumer subscription on broker
  Signed-off-by: Christopher Beard <cbeard9@bloomberg.net>
---------
Signed-off-by: Christopher Beard <cbeard9@bloomberg.net>

fix: enhanced detection of duplicate PUSHes (bloomberg#472)
Signed-off-by: dorjesinpo <129227380+dorjesinpo@users.noreply.github.com>

Fix ntf-core version in build_darwin.sh
Signed-off-by: Aleksandr Ivanov <aivanov71@bloomberg.net>

Add logAppsSubscriptionInfoCb into InMemoryStorage
Signed-off-by: Aleksandr Ivanov <aivanov71@bloomberg.net>

Add IT for capacity meter enhanced log
Signed-off-by: Aleksandr Ivanov <aivanov71@bloomberg.net>

Fix comments
Signed-off-by: Aleksandr Ivanov <aivanov71@bloomberg.net>

Fix [CI] ntf-core version for macosx build (bloomberg#473)

Merge mwc into bmq
MWC or "MiddleWare Core" was a package group developed to support a myriad of applications at Bloomberg. It's been useful to share common middleware components between similar technologies, but doesn't make much sense to support as its own open source library. Moving forward we are merging it into the BMQ package group to better maintain it for the BlazingMQ project.
Signed-off-by: Taylor Foxhall <tfoxhall@bloomberg.net>

Fix conflict
Signed-off-by: Aleksandr Ivanov <aivanov71@bloomberg.net>

Fix conflict
Signed-off-by: Aleksandr Ivanov <aivanov71@bloomberg.net>

Fix mwctst
Signed-off-by: Aleksandr Ivanov <aivanov71@bloomberg.net>
When a queue starts to fill up, it is valuable to see which AppIds are impacted, along with information about the messages in the queue.
Especially in the case of subscriptions (which we are now enabling for everyone), messages that match no subscription expression will build up in the 'put aside' list.
To help make this situation clearer to operators and users (which apps are impacted, why messages are building up, how old the head of the queue is for each app, etc.), we can log more information when the watermark alarm is triggered:
`storage()->capacityMeter()->printShortSummary()`
This is to help debug why a message doesn't match a subscription.
Alarm log looks like this:
Implementation details:
- `QueueConsumptionMonitor` into `RootQueueEngine` class, where more data is available;
- `QueueConsumptionMonitor` and called in case of alarm to log alarm data;
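The split described here (logging moved out of `QueueConsumptionMonitor` into `RootQueueEngine`, with the monitor calling back into the engine on alarm) can be sketched as follows. This is a simplified, hypothetical model: `ConsumptionMonitorSketch`, `onIdleTimeout`, and `onDelivery` are illustrative names, app keys are plain strings, and idle-threshold bookkeeping is omitted. Logging only on the ALIVE to IDLE transition ("edge triggered", one of the options raised in review) is an assumption, not a description of the real class.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical model: the monitor only tracks ALIVE/IDLE per App; the
// engine that owns the queue supplies a callback that inspects storage and
// writes the detailed alarm.
class ConsumptionMonitorSketch {
  public:
    enum State { e_ALIVE, e_IDLE };

    // (appKey, enableLog) -> 'true' if the App has un-delivered messages.
    typedef std::function<bool(const std::string&, bool)> LoggingCb;

    explicit ConsumptionMonitorSketch(LoggingCb loggingCb)
    : d_loggingCb(std::move(loggingCb))
    {
        assert(d_loggingCb);
    }

    void registerApp(const std::string& appKey) { d_states[appKey] = e_ALIVE; }

    // Assumed to be called when an App made no progress within the idle
    // threshold.  An App with nothing to deliver is never reported idle, and
    // the alarm is logged only on the ALIVE -> IDLE transition.
    void onIdleTimeout(const std::string& appKey)
    {
        State& state = d_states.at(appKey);
        if (state == e_ALIVE && d_loggingCb(appKey, false)) {
            state = e_IDLE;
            d_loggingCb(appKey, true);  // engine logs the detailed alarm
        }
    }

    // Assumed to be called when a message is delivered to the App.
    void onDelivery(const std::string& appKey)
    {
        d_states.at(appKey) = e_ALIVE;
    }

    State state(const std::string& appKey) const
    {
        return d_states.at(appKey);
    }

  private:
    LoggingCb                    d_loggingCb;
    std::map<std::string, State> d_states;
};
```

With this design the monitor needs no access to queue, storage, or consumer state, which is what lets its unit test drop most of those dependencies.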