Log the tables sizes and statistics with db activity #22665
Conversation
Force-pushed from dc70e6b to 41babce
  output.warn("MIQ(#{name}.#{__method__}) Unable to log stats, '#{err.message}'")
end

def self.log_table_statistics(output = $log)
  stats = ApplicationRecord.connection.table_statistics.sort_by { |h| h['rows_live'] }.reverse!
Maybe we should just sort by table name and log in slices of 50 or so to avoid the log line limit.
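A rough sketch of that idea, reusing the table_statistics call and output argument from the diff above (the slice size of 50 is just the number floated here, not a settled value):

# Sketch only, not the PR's actual code: sort by table name and emit the stats
# 50 tables at a time so no single log line exceeds the logger's line limit.
def self.log_table_statistics(output = $log)
  stats = ApplicationRecord.connection.table_statistics.sort_by { |h| h['table_name'] }
  stats.each_slice(50) do |slice|
    output.info("MIQ(#{name}.#{__method__}) Table statistics: #{slice.inspect}")
  end
end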
So you mean log separate CSVs?
Also how often is this logging? I'm concerned we're going to flood the logs and just cause other problems.
I'm wondering if it's better to just extract a simple hash of table_name => rows_live and log that with .inspect as a one-liner
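As a sketch of that one-liner (the key names follow the stats columns listed later in this thread):

# Sketch only: reduce the stats to table_name => rows_live and log the hash
# with #inspect as a single line.
row_counts = ApplicationRecord.connection.table_statistics
                              .map { |h| [h['table_name'], h['rows_live']] }
                              .to_h
$log.info("Live row counts: #{row_counts.inspect}")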
yeah, I was considering dropping CSV too
This is run every 30 minutes... it's using the existing logging of pg stat activity
:db_diagnostics_interval: 30.minutes
I'm wondering if it's better to just extract a simple hash of table_name => rows_live and log that with .inspect as a one-liner
I can try pulling out just the fields we want but I guess we don't always know what we want.
table size
table_name,rows,pages,size,average_row_size
table stats
table_name,table_scans,sequential_rows_read,index_scans,index_rows_fetched,rows_inserted,rows_updated,rows_deleted,rows_hot_updated,rows_live,rows_dead,last_vacuum_date,last_autovacuum_date,last_analyze_date,last_autoanalyze_date
As it is now, we need rows and the dates for last auto vacuum/analyze. I guess the rest could be useful. If it's every 30 minutes, it seems worthwhile to capture.
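For example, a sketch of trimming the stats down to just those fields (column names taken from the lists above):

# Sketch: keep only the row counts and the last vacuum/analyze timestamps.
wanted  = %w[table_name rows_live rows_dead last_autovacuum_date last_autoanalyze_date]
trimmed = ApplicationRecord.connection.table_statistics.map { |h| h.slice(*wanted) }
$log.info("Table statistics (trimmed): #{trimmed.inspect}")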
Note, we only get approximate row counts by looking at the live/dead rows from autovacuum logging, which we might not capture due to log rotation in the postgresql logs, or for tables that haven't been vacuumed within the logging's time window.
This is run every 30 minutes... it's using the existing logging of pg stat activity
I really don't like this going into the main log due to the size and frequency, and it's very hard to grep -v it. Would prefer a separate log, or a dedicated file, or something.
We only have stdout on podified. Yeah, for grep vs. readability, I don't know. All I know is we're blind right now.
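For illustration only, a dedicated file logger might look like this (plain Ruby Logger here, not ManageIQ's actual logger setup; on podified it would still have to end up on stdout):

require 'logger'

# Illustration: write the table statistics to their own log file so the
# main log stays easy to grep.
db_stats_log = Logger.new(Rails.root.join("log", "db_statistics.log"))
db_stats_log.info("Table statistics: #{ApplicationRecord.connection.table_statistics.inspect}")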
Honestly, we could probably trim out the TOAST tables and indexes, but the mechanism to exclude them wasn't there, so I didn't try that yet.
  log_table_statistics(output)
end

def self.log_csv(keys, stats, label, output)
Probably should be
- def self.log_csv(keys, stats, label, output)
+ private_class_method def self.log_csv(keys, stats, label, output)
Do we even need csv? The only reason we had this was because it was easier for the old log scraper from THenn back in the ManageIQ Inc days. I always thought it was weird that the logs had CSV inside them.
I think it was done as a CSV to avoid having to print the same keys over and over again. If you do a hash, at least half of each row of data will be printing the keys, unless we abbreviate them.
@Fryguy I'm fine with changing formats. The reason CSV was used was because of the single header row instead of repeating the same header values in other structures such as keys in a hash or json. As it is though, the log line limit gets reached so we can't even log the CSV in a single line so we'll need to split up anyway.
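To make the trade-off concrete, a small sketch contrasting the two formats (the sample table and counts are made up for illustration):

require 'csv'

keys  = %w[table_name rows_live rows_dead]
stats = [{ 'table_name' => 'vms', 'rows_live' => 1_000, 'rows_dead' => 50 }] # made-up sample row

# CSV: header printed once, then one values-only row per table.
csv_lines = CSV.generate do |csv|
  csv << keys
  stats.each { |h| csv << h.values_at(*keys) }
end

# Hash: every key repeated on every row, but it stays a single greppable line.
hash_line = stats.map { |h| h.slice(*keys) }.inspect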
What about a threshold for printing? Like maybe only the top 10, or perhaps any table over X rows/tuples, or something?
Just to be clear - I'm 100% for this PR, but would prefer if we can get it into a format that's not onerous for diagnosing issues that don't need this information and also that doesn't increase the log sizes too dramatically.
Yeah, not sure what that threshold is. I need to see the vacuum/analyze information and it's not always based on row size. For example, with high churn on ext management systems, we'd never print vacuum/analyze or live/dead/row counts as it's not a high row table. This was an idea to start the discussion.
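A sketch of what a threshold could look like (the 10,000 cutoff is arbitrary, and as noted above it would hide high-churn, low-row tables):

# Sketch: keep tables above an arbitrary row-count cutoff, falling back to the
# top 10 by live rows if nothing clears it.
stats     = ApplicationRecord.connection.table_statistics
threshold = 10_000
top = stats.select { |h| h['rows_live'].to_i > threshold }
top = stats.sort_by { |h| -h['rows_live'].to_i }.first(10) if top.empty?
$log.info("Largest tables: #{top.inspect}")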
It's helpful to log the tables sorted by row counts and other information such as the autovacuum/analyze times. Even if we truncate the line to 8k (default for the logger), the largest row tables should be represented.
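A rough illustration of why the largest tables survive that truncation (the logger's 8 KB cap is simulated here just to show the effect):

# Illustration only: with stats sorted largest-first, a hard cut at 8 KB still
# keeps the biggest tables at the front of the line.
stats = ApplicationRecord.connection.table_statistics.sort_by { |h| -h['rows_live'].to_i }
line  = stats.inspect
$log.info(line[0, 8 * 1024]) # the logger would normally apply this limit itself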
Force-pushed from 41babce to 9bd5b13
Checked commits jrafanie/manageiq@ec6569b~...9bd5b13 with ruby 2.6.10, rubocop 1.28.2, haml-lint 0.35.0, and yamllint
spec/models/vmdb_database_connection_spec.rb
@@ -462,6 +462,7 @@ def table_size
FROM pg_class
WHERE reltuples > 1
AND relname NOT LIKE 'pg_%'
AND relkind = 'r'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made the above two changes to eliminate system tables and include only ordinary tables (not indexes, etc.).
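For reference, the resulting query shape looks roughly like this (the SELECT list is illustrative since only the WHERE clause appears in the diff; relkind 'r' is an ordinary table, 'i' an index, 't' a TOAST table):

sql = <<~SQL
  SELECT relname, reltuples
  FROM pg_class
  WHERE reltuples > 1
    AND relname NOT LIKE 'pg_%'
    AND relkind = 'r'
SQL
rows = ApplicationRecord.connection.select_all(sql).to_a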
May want reltuples > 0 to only display rows with a value.
Instead of the WHERE reltuples > 1 that's there? I guess I can. It's basically the same thing; the idea was to keep the original query and only include normal tables, not indexes.
sorry - didn't see that. keep what is there
Just saw that the example had a bunch of tables that seemingly should not display (as they have 0 rows).
This pull request has been automatically marked as stale because it has not been updated for at least 3 months. If these changes are still valid, please remove the stale label.
3 similar comments
Stale, closing.
It's helpful to log the tables sorted by row counts and other information such as the autovacuum/analyze times.
Even if we truncate the line to 8k (default for the logger), the largest row tables should be represented.
Note: this is run every 30 minutes and queued up for the server by the schedule worker. It will run on all servers, even if they all use the same database, so hopefully one server log or orchestrator log will provide the information on the high-level row counts, sizes, vacuum/analyze, etc.
Here is an example of what it looks like locally using the many short lived containers database: