Skip to content

Fixes average stats bug #436

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
May 12, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 13 additions & 2 deletions src/stats.rs
Original file line number Diff line number Diff line change
Expand Up @@ -107,8 +107,19 @@ impl Collector {
loop {
interval.tick().await;

for stats in SERVER_STATS.read().values() {
stats.address_stats().update_averages();
// Hold read lock for duration of update to retain all server stats
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused here. We only have one instance of the collector running, so there is only one task updating averages. How can we run update_averages() twice or more even with multiple server connections connected to the same address?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This issue here comes from all the servers having the same reference to the AddressStats. When we update address stat's old_total. The next server that we try to update averages for will see no difference between the current total and the old total cause it was updated in the previous iteration of the loop

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have an example of this in the description of the PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aaah okay. Confusing! Arcs are tricky. Ok, sounds good, I think this is fine for now. Would be good to investigate further and maybe remove the leaked reference?

let server_stats = SERVER_STATS.read();

for stats in server_stats.values() {
if !stats.check_address_stat_average_is_updated_status() {
stats.address_stats().update_averages();
stats.set_address_stat_average_is_updated_status(true);
}
}

// Reset to false for next update
for stats in server_stats.values() {
stats.set_address_stat_average_is_updated_status(false);
}
}
});
Expand Down
14 changes: 8 additions & 6 deletions src/stats/address.rs
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,9 @@ pub struct AddressStats {
pub avg_xact_time: Arc<AtomicU64>,
pub avg_xact_count: Arc<AtomicU64>,
pub avg_wait_time: Arc<AtomicU64>,

// Determines if the averages have been updated since the last time they were reported
pub averages_updated: Arc<AtomicBool>,
}

impl IntoIterator for AddressStats {
Expand Down Expand Up @@ -114,15 +117,14 @@ impl AddressStats {

pub fn update_averages(&self) {
let (totals, averages, old_totals) = self.fields_iterators();
for data in itertools::izip!(totals, averages, old_totals) {
let (total, average, old_total) = data;
let total = total.load(Ordering::Relaxed);
let old = old_total.load(Ordering::Relaxed);
for (total, average, old_total) in itertools::izip!(totals, averages, old_totals) {
let total_value = total.load(Ordering::Relaxed);
let old_total_value = old_total.load(Ordering::Relaxed);
average.store(
(total - old) / (crate::stats::STAT_PERIOD / 1_000),
(total_value - old_total_value) / (crate::stats::STAT_PERIOD / 1_000),
Ordering::Relaxed,
); // Avg / second
old_total.store(total, Ordering::Relaxed);
old_total.store(total_value, Ordering::Relaxed);
}
}

Expand Down
11 changes: 11 additions & 0 deletions src/stats/server.rs
Original file line number Diff line number Diff line change
Expand Up @@ -139,6 +139,17 @@ impl ServerStats {
self.address.stats.clone()
}

pub fn check_address_stat_average_is_updated_status(&self) -> bool {
self.address.stats.averages_updated.load(Ordering::Relaxed)
}

pub fn set_address_stat_average_is_updated_status(&self, is_checked: bool) {
self.address
.stats
.averages_updated
.store(is_checked, Ordering::Relaxed);
}

// Helper methods for show_servers
pub fn pool_name(&self) -> String {
self.pool_stats.database()
Expand Down
13 changes: 7 additions & 6 deletions tests/ruby/admin_spec.rb
Original file line number Diff line number Diff line change
Expand Up @@ -14,19 +14,20 @@
describe "SHOW STATS" do
context "clients connect and make one query" do
it "updates *_query_time and *_wait_time" do
connection = PG::connect("#{pgcat_conn_str}?application_name=one_query")
connection.async_exec("SELECT pg_sleep(0.25)")
connection.async_exec("SELECT pg_sleep(0.25)")
connection.async_exec("SELECT pg_sleep(0.25)")
connection.close
connections = Array.new(3) { PG::connect("#{pgcat_conn_str}?application_name=one_query") }
connections.each do |c|
Thread.new { c.async_exec("SELECT pg_sleep(0.25)") }
end
sleep(1)
connections.map(&:close)

# wait for averages to be calculated, we shouldn't do this too often
sleep(15.5)
admin_conn = PG::connect(processes.pgcat.admin_connection_string)
results = admin_conn.async_exec("SHOW STATS")[0]
admin_conn.close
expect(results["total_query_time"].to_i).to be_within(200).of(750)
expect(results["avg_query_time"].to_i).to_not eq(0)
expect(results["avg_query_time"].to_i).to be_within(20).of(50)

expect(results["total_wait_time"].to_i).to_not eq(0)
expect(results["avg_wait_time"].to_i).to_not eq(0)
Expand Down