Bigger upstream merge batch #431
Gatherer: add a new --min-db-size-mb flag to ignore empty DBs
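A minimal sketch of how such a size threshold could be applied, assuming the flag value is in megabytes and that a value of 0 or less means the filter is disabled. The function name `skipBySize` is hypothetical and not taken from the gatherer code.

```go
package main

import "fmt"

// skipBySize is a hypothetical helper: it reports whether a DB should be
// ignored because it is smaller than the --min-db-size-mb threshold.
// Assumption: a threshold <= 0 means size limiting is not enabled.
func skipBySize(sizeBytes, minSizeMB int64) bool {
	if minSizeMB <= 0 {
		return false // flag not set, never skip
	}
	return sizeBytes < minSizeMB*1024*1024
}

func main() {
	fmt.Println(skipBySize(0, 8))               // empty DB, threshold 8 MB
	fmt.Println(skipBySize(100*1024*1024, 8))   // 100 MB DB, threshold 8 MB
	fmt.Println(skipBySize(0, 0))               // threshold disabled
}
```

Treating a non-positive threshold as "disabled" also allows skipping the size lookup entirely when the flag is not set.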
Some new views are more niche, so don't add them to the default configs though.
Expand only if the value starts with a "$". Otherwise a randomly generated password containing a $ in the middle can be turned into an invalid password. This still leaves a hole for randomly generated passwords that have a $ in the 1st position, but it is still better than the current behavior.
Go 1.17.2, Ubuntu 20.04
other values. Live queries on dozens of hosts can still cause scrape timeouts
Also add --max-parallel-connections-per-db to make max connections tunable
Also init connections from main loop only now and bundle setting of statement_timeout with the real query
…onfig Also don't check for DB size on scrape if size limiting flag not enabled.
And downscale max. conns for dormant DBs
via PW2_MAX_PARALLEL_CONNECTIONS_PER_DB
The main idea of the feature is to be able to quickly free monitored DBs / the network of any extra "monitoring effect" load. In highly automated K8s / IaC environments such a temporary change might involve pull requests, peer reviews, CI/CD etc., which can all take too long compared to "exec -it pgwatch2-pod -- touch /tmp/pgwatch2-emergency-pause". NB! After creating the file it can still take up to --servers-refresh-loop-seconds (2 min by default) for the change to take effect!
Currently the startup can freeze for long periods on incorrect host info or in case of real network problems / throttling
Skips metric definitions relying on helper functions. Makes working with managed instances a bit more fluid, with fewer errors in logs. Use the SU or superuser version of a metric immediately when available and not after the 1st failed call.
Even dormant DBs should be checked "live" in standard (non-async) Prom mode for "up" state
To put less stress on the monitoring system
This will help in detecting catalog bloat that can massively slow down session startup
Also correct min. version for the SQL definition
To identify the origin of queries one might see in pg_stat_statements
These will be anyway dropped on Prom scraper side with the message "Error on ingesting samples that are too old or are too far into the future"
…k_timeout of 5 seconds. This allows setting longer statement timeouts without worries
For YAML / Prom mode only
It seems that under some extreme workloads in connection pooling mode it was not guaranteed that "set stmt_timeout to X; select ... metrics;" was executed on a single connection, so timeouts from other sessions became effective for the metric query. Also reduce lock_timeout to 100ms.
For systems with a very slow FS we just use the approximate table size based on relpages. NB! Might be very out of date if no autovacuum / vacuum has run recently
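The approximation amounts to multiplying the page count from pg_class.relpages by the block size. A minimal sketch of that arithmetic, assuming the default PostgreSQL block size of 8192 bytes (the function name `approxTableSizeBytes` is hypothetical, and the metric itself presumably does this in SQL):

```go
package main

import "fmt"

// Default PostgreSQL block size in bytes. Assumption: the monitored
// instance was not compiled with a non-default block size.
const defaultBlockSize int64 = 8192

// approxTableSizeBytes is a hypothetical helper: it estimates table size
// from relpages without touching the file system. relpages is only
// refreshed by VACUUM / ANALYZE and a few DDL commands, so it can be stale.
func approxTableSizeBytes(relpages, blockSize int64) int64 {
	return relpages * blockSize
}

func main() {
	// 1000 pages * 8192 bytes = 8192000 bytes
	fmt.Println(approxTableSizeBytes(1000, defaultBlockSize))
}
```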
Worked only in sync mode and non-prom modes
Use approximation due to super slow FS access
To be used instead of 'db_size' on large Azure Single Server instances
No need for TX as sequential execution is guaranteed then
metric definition file also
…esult gave 0 result Currently we "lie" for up to 10min if no new rows are returned
Main use case is to reduce the annoying 'No such extension pg_stat_statements' errors that appear when the extension is activated but just not created. Also skipping superuser checks here deliberately, just try to create.
mode Removing ReadOnly flag from go_sql.TxOptions.
What is the purpose of |
Kudos for a huge amount of work! Were these changes tested anywhere in the field?
@pashagolub Yes, it's running at Cognite in production for 100+ DBs with no problems...at least so far :)
OK, my main concern was if the code base is tested in production, so I'm fine to merge it and work on small issues in separate branches to make life easier for everyone. Don't think we should bloat this PR with more commits.
Thanks for the hard work, @kmoppel-cognite! 🤞
Seems like a lot of stuff but should be no breaking changes. Main things added: