Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VimPerformanceState do not bring back archived records #22589

Merged
merged 4 commits into from
Aug 1, 2023

Conversation

kbrock
Copy link
Member

@kbrock kbrock commented Jun 27, 2023

reference:

Some archived node references got into VimPerformanceState#state_data
When pulling back these records, this caused a lot of data to go over the wire

Before

VimPerformanceState#container_groups and containers brings back all nodes in state_data.

After

VimPerformanceState#container_groups and containers only brings back nodes in state_data that are active (deleted_at is not set) at the time of capture.

Numbers

#22579 made the query larger - lost some time there b/c there are so many records.
still concerned around this

  • last commit is after-fetch-time
  • before that commit is after-fetch
ms bytes objects query qry ms rows comments
1,945.5 41,916,080* 1,024,324 9 1,516.0 6,859 master-before-#22579
5,509.5 47,162,198* 1,115,818 9 5,146.4 6,859 master
1,699.1 19,618,021* 283,526 9 1,519.6 79 after-fetch
1,664.1 15,672,327* 225,826 5 1,550.8 75 after-fetch-time
14.4% >63% 78% 44% -2% 99% diff

Comments and explanation:

  • This example was chosen to highlight what happens when there are a lot of archived records in the list. Infra examples and providers with smaller churn will have smaller gains.
  • Do note the decrease in the number of rows brought back from the same number of queries. These are basically metrics and archived containers that were not brought back.
  • Only load performance states for the records of interest  #22579 introduces a query that contains the ids in question, so it is physically larger, and therefore increases the query time. Since after-fetch reduced the number of ids in question, the code no longer produced such a big query, and the timing went back down. But it is still bigger than if that PR had not been merged. (but potentially the number of rows brought back have been reduced in many cases)
  • The reduction in query count (after-fetch-time) is due to using the iso8601 format in our cache. We find the VimPerformanceState record on the parent object and do not need to fetch it multiple times.
  • The memory usage chart below says that on the master branch, it GC'd 280k objects, so the bytes displayed does not reflect those objects. The object count is correct. The memory used is probably 60M (you can get that number by percentage of objects or from the 283k objects number in after-fetch) and the memory savings was closer to 74%. Which again checks out from the object savings.
* Memory usage does not reflect 283,543 freed objects. => ["naq6czdxziv7ka4f1ifn"]
* Memory usage does not reflect 286,616 freed objects. => ["3vpqnqo79olcvjsk64x9"]
* Memory usage does not reflect 2 freed objects. => ["x5l6rlsllem3hltnrsok"]
* Memory usage does not reflect 2 freed objects. => ["b7w5q1426nhah0tkhwny"]

return nil unless self.respond_to?(:vim_performance_states)

@states_by_ts ||= {}
state = @states_by_ts[ts]
state = @states_by_ts[isots]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, just for my education, we were using Time or String objects as keys in this hash and now we're using Time exclusively. Is that right?

Copy link
Member Author

@kbrock kbrock Jun 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

before:

@states_by_ts[time.to_s] => VimPerformanceState
# sometimes that is iso8601, other times a different format

After

@states_by_ts[time.iso8601] => VimPerformanceState

To ensure it is iso8601, we are passing in a date and the method is converting it for us.
When I was running this, it got rid of 4 lookups. But in other places, it could show up as well.

def vim_performance_state_for_ts(ts)
ts = Time.parse(ts).utc if ts.kind_of?(String)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are any callers still passing string objects? It looks like you are now passing Time objects in this PR.

Copy link
Member Author

@kbrock kbrock Jun 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to make the switch, but callers of this model use a big mix of String and Time objects. Not all of them can be forced to Time objects. Some of the callers use other date methods that only accept a String, so I don't want to change this and introduce issues.

app/models/metric/rollup.rb Outdated Show resolved Hide resolved
Comment on lines 9 to 11
scope :active_for, (lambda do |timestamp|
unscope(:where => :deleted_on).where(arel_table[:deleted_on].eq(nil).or(arel_table[:deleted_on].gteq(timestamp)))
end)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This lambda in parens thing looks weird to be, when we typically use -> notation. Can we do the following?

Suggested change
scope :active_for, (lambda do |timestamp|
unscope(:where => :deleted_on).where(arel_table[:deleted_on].eq(nil).or(arel_table[:deleted_on].gteq(timestamp)))
end)
scope :active_for, ->(timestamp) {
unscope(:where => :deleted_on).where(arel_table[:deleted_on].eq(nil).or(arel_table[:deleted_on].gteq(timestamp)))
}

Copy link
Member Author

@kbrock kbrock Jun 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had -> and rubocop asked me to use lambda.

maybe drop the ()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried other formats, and that is what rubocop wants. The parens on the outside, the do/end, and the lambda

I'm updating to the format you suggested since that is what I originally had, and I agree, looks better.

maybe we'll update rubocop some day

Style/Lambda: Use the lambda method for multiline lambdas

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we should change rubocop haha - it's fine, then - we can worry about it later.

@@ -4,6 +4,11 @@ module ArchivedMixin
included do
scope :archived, -> { where.not(:deleted_on => nil) }
scope :active, -> { where(:deleted_on => nil) }

# records are active from the date where they were created until the date they are deleted
scope :active_for, (lambda do |timestamp|
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name is hard to wrap my head around. Probably, because active doesn't make sense to me for anything that has been deleted.

How about the defining the opposite:
deleted on older than timestamp. Maybe call it archived_older_than and just use a .not?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming is hard.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or archived_before and use .not.archived_before(timestamp)

Copy link
Member Author

@kbrock kbrock Jun 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the first scope. not sure how to get a scope and a not working together.
(It almost feels like we could use not.active for the archived.)

I can't figure out how to use a not with a scope. All of my attempts are invalid. I haven't even tried to do this with a parameter.

Container.where.not.active
Container.where.not(Container.active)

I was using but decided to drop:

    scope :archived_for, ->(timestamp) {
      unscope(:where => :deleted_on).where(arel_table[:deleted_on].lt(timestamp)))
    }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this new naming - the new name is complicated to me and the old one wasn't. Naming is hard™️ 😆

Let's discuss over voice - @jrafanie I'm having trouble understanding what was confusing about the original wording.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's discuss over voice - @jrafanie I'm having trouble understanding what was confusing about the original wording.

active_for including recently deleted entries along with not deleted is strange. Maybe recently_active_since would be fine if you prefer this logic.

expect(klass.active_for(2.years.ago)).to eq([archived_model])
end

it "detects already deleted records are not in the range" do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These its here make it obvious to me that the naming is tough. I think this becomes easier to "describe" if it's archived_before or similar as it's clear, we're only looking for objects that have been deleted and the deletion date is before some value.

Copy link
Member Author

@kbrock kbrock Jun 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so instead of active_for, we could use not_archived_before?

I liked active since that means it was both not archived before AND it was possibly not ever archived, but I pushed that and lets see how it goes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned in the other thread, let's discuss over voice. Switching to archived_before ignores the fact that created_on will play a part in the detection as well, it just happens to not play a part of the method right now. That is, objects created after will also be removed from the scope and that doesn't make sense with a method named archived_before. That being said, maybe it does make sense to make the naming match the current semantics, and then change it later? I'm not sure, hence why I want voice.

FWIW, I think the confusion lies in the word "deleted", because there are actually deleted record, and then there are archived records with a deleted_on value.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that we spoke about taking into account created_on. But that was when we were creating a VimPerformanceState in the past. If we create a record in the past, then new records will be created and we will need to make sure they do not get into the state_data.

Since we have agreed that for this current code, we will only run for Time.now, and we will never see future records. I wasn't able to come up with a use case where we would need to take into account created_on. So I felt ignoring that for now made sense.

If the logic to take into account were as simple as created_on, then that may make sense, but since we will not need that feature in the near future, adding that complexity did not make sense to me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ooh but yea, I liked the original name too.
active means not archived and therefore deleted_on is not set. But really, I guess that means active(Time.now).

active_on(date) / active_for(daterange) means that on a particular date, it hasn't been deleted (and as discussed, it has been created / ...)

ASIDE: Another reason why I got out of the timerange game is because get_time_interval seems to suggest that the timeframe ends on the VimPerformanceState#timestamp rather than starts there. that confused me. Since this is more strategic, punting seemed like a plan.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this resolved?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love the name, but I'm not going to let it hold up the PR, so 👍

@kbrock kbrock force-pushed the capture_vim_performance_fetch branch 2 times, most recently from e186f7f to 1676c21 Compare June 27, 2023 23:09
@kbrock
Copy link
Member Author

kbrock commented Jun 27, 2023

update:

  • changed lambda for scope
  • changed active_for to not_archived_before
  • Added spacing and apostrophes to tests.
  • changed tsiso to ts_iso
  • simplified test helper that generates deleted_ids

@kbrock kbrock force-pushed the capture_vim_performance_fetch branch from 1676c21 to ad3051c Compare June 27, 2023 23:15
@kbrock kbrock force-pushed the capture_vim_performance_fetch branch from ad3051c to 555ed3b Compare June 28, 2023 17:25
@kbrock
Copy link
Member Author

kbrock commented Jun 28, 2023

update:

  • comment changes only

@kbrock kbrock changed the title Capture vim performance fetch VimPerformanceState do not bring back archived records Jun 28, 2023
@jrafanie
Copy link
Member

I ran this on a DB with 63,000+ containers of which nearly 57,000 were deleted (short lived containers) and I'm seeing between 20 and 30% improvement in the best and worst case scenarios.

irb(main):002:0> Container.where(:deleted_on =>nil).count
=> 7449
irb(main):011:0> Container.where.not(:deleted_on => nil).count
=> 56866

Before this PR

Initial call to ContainerImage.find(340).perf_rollup(timestamp, 'hourly'):

Best and worst case:

Elapsed Time timestamp
5.1 2023-06-21T21:00:00Z
12.8 2023-06-25T21:00:00Z

After this PR

Initial call to ContainerImage.find(340).perf_rollup(timestamp, 'hourly'):

Best and worst case:

Elapsed Time timestamp
4.1 2023-06-21T21:00:00Z
9 2023-06-26T21:00:00Z

@jrafanie
Copy link
Member

Here's more context for what I ran...

With a breakdown of deleted >=...

irb(main):003:0> Container.where("deleted_on >= '2023-06-26T21:00:00Z'").count
=> 0
irb(main):004:0> Container.where("deleted_on >= '2023-06-25T21:00:00Z'").count
=> 140
irb(main):005:0> Container.where("deleted_on >= '2023-06-24T21:00:00Z'").count
=> 383
irb(main):006:0> Container.where("deleted_on >= '2023-06-23T21:00:00Z'").count
=> 661
irb(main):007:0> Container.where("deleted_on >= '2023-06-22T21:00:00Z'").count
=> 940
irb(main):008:0> Container.where("deleted_on >= '2023-06-21T21:00:00Z'").count
=> 42244
irb(main):009:0> Container.where("deleted_on >= '2023-06-20T21:00:00Z'").count
=> 56866

Here's the details of what I ran:

git checkout db9f115b9a51feae766d5b0669afe71adde690b3

bin/rails c --sandbox

irb(main):001:0> toggle_console_sql_logging; before = Time.now; ContainerImage.find(340).perf_rollup('2023-06-26T21:00:00Z', 'hourly'); puts (Time.now - before); toggle_console_sql_logging
11.931384
irb(main):002:0> toggle_console_sql_logging; before = Time.now; ContainerImage.find(340).perf_rollup('2023-06-26T21:00:00Z', 'hourly'); puts (Time.now - before); toggle_console_sql_logging
5.681721

irb(main):001:0> toggle_console_sql_logging; before = Time.now; ContainerImage.find(340).perf_rollup('2023-06-25T21:00:00Z', 'hourly'); puts (Time.now - before); toggle_console_sql_logging
12.834763
irb(main):002:0> toggle_console_sql_logging; before = Time.now; ContainerImage.find(340).perf_rollup('2023-06-25T21:00:00Z', 'hourly'); puts (Time.now - before); toggle_console_sql_logging
6.197368

irb(main):001:0> toggle_console_sql_logging; before = Time.now; ContainerImage.find(340).perf_rollup('2023-06-24T21:00:00Z', 'hourly'); puts (Time.now - before); toggle_console_sql_logging
12.583063
irb(main):002:0> toggle_console_sql_logging; before = Time.now; ContainerImage.find(340).perf_rollup('2023-06-24T21:00:00Z', 'hourly'); puts (Time.now - before); toggle_console_sql_logging
6.065942

irb(main):001:0> toggle_console_sql_logging; before = Time.now; ContainerImage.find(340).perf_rollup('2023-06-23T21:00:00Z', 'hourly'); puts (Time.now - before); toggle_console_sql_logging
12.051762
irb(main):002:0> toggle_console_sql_logging; before = Time.now; ContainerImage.find(340).perf_rollup('2023-06-23T21:00:00Z', 'hourly'); puts (Time.now - before); toggle_console_sql_logging
5.615315

irb(main):001:0> toggle_console_sql_logging; before = Time.now; ContainerImage.find(340).perf_rollup('2023-06-22T21:00:00Z', 'hourly'); puts (Time.now - before); toggle_console_sql_logging
7.841103
irb(main):002:0> toggle_console_sql_logging; before = Time.now; ContainerImage.find(340).perf_rollup('2023-06-22T21:00:00Z', 'hourly'); puts (Time.now - before); toggle_console_sql_logging
5.694821

irb(main):001:0> toggle_console_sql_logging; before = Time.now; ContainerImage.find(340).perf_rollup('2023-06-21T21:00:00Z', 'hourly'); puts (Time.now - before); toggle_console_sql_logging
5.069923
irb(main):002:0> toggle_console_sql_logging; before = Time.now; ContainerImage.find(340).perf_rollup('2023-06-21T21:00:00Z', 'hourly'); puts (Time.now - before); toggle_console_sql_logging
2.49756

irb(main):001:0> toggle_console_sql_logging; before = Time.now; ContainerImage.find(340).perf_rollup('2023-06-20T21:00:00Z', 'hourly'); puts (Time.now - before); toggle_console_sql_logging
11.887683
irb(main):002:0> toggle_console_sql_logging; before = Time.now; ContainerImage.find(340).perf_rollup('2023-06-20T21:00:00Z', 'hourly'); puts (Time.now - before); toggle_console_sql_logging
5.173446



git checkout capture_vim_performance_fetch

bin/rails c --sandbox

irb(main):001:0> toggle_console_sql_logging; before = Time.now; ContainerImage.find(340).perf_rollup('2023-06-26T21:00:00Z', 'hourly'); puts (Time.now - before); toggle_console_sql_logging
8.999135
irb(main):002:0> toggle_console_sql_logging; before = Time.now; ContainerImage.find(340).perf_rollup('2023-06-26T21:00:00Z', 'hourly'); puts (Time.now - before); toggle_console_sql_logging
2.413216

irb(main):001:0> toggle_console_sql_logging; before = Time.now; ContainerImage.find(340).perf_rollup('2023-06-25T21:00:00Z', 'hourly'); puts (Time.now - before); toggle_console_sql_logging
8.947557
irb(main):002:0> toggle_console_sql_logging; before = Time.now; ContainerImage.find(340).perf_rollup('2023-06-25T21:00:00Z', 'hourly'); puts (Time.now - before); toggle_console_sql_logging
2.700756

irb(main):001:0> toggle_console_sql_logging; before = Time.now; ContainerImage.find(340).perf_rollup('2023-06-24T21:00:00Z', 'hourly'); puts (Time.now - before); toggle_console_sql_logging
8.920945
irb(main):002:0> toggle_console_sql_logging; before = Time.now; ContainerImage.find(340).perf_rollup('2023-06-24T21:00:00Z', 'hourly'); puts (Time.now - before); toggle_console_sql_logging
2.726082

irb(main):001:0> toggle_console_sql_logging; before = Time.now; ContainerImage.find(340).perf_rollup('2023-06-23T21:00:00Z', 'hourly'); puts (Time.now - before); toggle_console_sql_logging
8.886253
irb(main):002:0> toggle_console_sql_logging; before = Time.now; ContainerImage.find(340).perf_rollup('2023-06-23T21:00:00Z', 'hourly'); puts (Time.now - before); toggle_console_sql_logging
2.630454

irb(main):001:0> toggle_console_sql_logging; before = Time.now; ContainerImage.find(340).perf_rollup('2023-06-22T21:00:00Z', 'hourly'); puts (Time.now - before); toggle_console_sql_logging
4.698924
irb(main):002:0> toggle_console_sql_logging; before = Time.now; ContainerImage.find(340).perf_rollup('2023-06-22T21:00:00Z', 'hourly'); puts (Time.now - before); toggle_console_sql_logging
2.535702

irb(main):001:0> toggle_console_sql_logging; before = Time.now; ContainerImage.find(340).perf_rollup('2023-06-21T21:00:00Z', 'hourly'); puts (Time.now - before); toggle_console_sql_logging
4.07685
irb(main):002:0> toggle_console_sql_logging; before = Time.now; ContainerImage.find(340).perf_rollup('2023-06-21T21:00:00Z', 'hourly'); puts (Time.now - before); toggle_console_sql_logging
1.373919

irb(main):001:0> toggle_console_sql_logging; before = Time.now; ContainerImage.find(340).perf_rollup('2023-06-20T21:00:00Z', 'hourly'); puts (Time.now - before); toggle_console_sql_logging
8.735797
irb(main):002:0> toggle_console_sql_logging; before = Time.now; ContainerImage.find(340).perf_rollup('2023-06-20T21:00:00Z', 'hourly'); puts (Time.now - before); toggle_console_sql_logging
2.546781

Copy link
Member

@jrafanie jrafanie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, 20-30% improvement in a db with high numbers of short lived containers seems like a nice win!

@jrafanie
Copy link
Member

LGTM, 20-30% improvement in a db with high numbers of short lived containers seems like a nice win!

EDIT:
I also measured memory... it seems to drop from around 925 MB BEFORE this PR and 775 MB after. After forced GC, it drops to around 675 MB before this PR and 560 MB after this PR.

@kbrock kbrock mentioned this pull request Jun 30, 2023
@kbrock kbrock force-pushed the capture_vim_performance_fetch branch from 555ed3b to 46877d0 Compare July 4, 2023 06:40
@kbrock
Copy link
Member Author

kbrock commented Jul 5, 2023

update: (sorry, should have commented 2 days ago)

@jrafanie
Copy link
Member

jrafanie commented Jul 5, 2023

update: (sorry, should have commented 2 days ago)

hmm, this links to this PR itself. Can you clarify? Is this ready to go?

@Fryguy
Copy link
Member

Fryguy commented Jul 5, 2023

Merged #22594 - I think it would help if this were rebased now.

@kbrock kbrock force-pushed the capture_vim_performance_fetch branch from 46877d0 to 410964f Compare July 7, 2023 20:15
This tells us if the record is active for a given date range
Don't return records that were captured, but are not active for this range

|       ms |       bytes |   objects |query |  qry ms |     rows |` comments`
|      ---:|         ---:|       ---:|  ---:|     ---:|      ---:|  ---
|  1,945.5 | 41,916,080* | 1,024,324 |    9 | 1,516.0 |    6,859 |`before-ManageIQ#22579`
|  5,509.5 | 47,162,198* | 1,115,818 |    9 | 5,146.4 |    6,859 |`before`
|  1,699.1 | 19,618,021* |   283,526 |    9 | 1,519.6 |       79 |`after`
@kbrock kbrock force-pushed the capture_vim_performance_fetch branch from 410964f to 4c51b13 Compare July 17, 2023 18:26
@kbrock
Copy link
Member Author

kbrock commented Jul 17, 2023

update:

  • removed already extracted/merged commits
  • converted all_container_groups into a scope

This is 3 commits and should be good to go

@miq-bot
Copy link
Member

miq-bot commented Jul 17, 2023

Checked commits kbrock/manageiq@80ca413~...4c51b13 with ruby 2.6.10, rubocop 1.28.2, haml-lint 0.35.0, and yamllint
4 files checked, 1 offense detected

app/models/mixins/archived_mixin.rb

@Fryguy Fryguy merged commit f71fa63 into ManageIQ:master Aug 1, 2023
@kbrock kbrock deleted the capture_vim_performance_fetch branch August 1, 2023 22:44
@Fryguy
Copy link
Member

Fryguy commented Aug 2, 2023

Backported to petrosian in commit 68bde37.

commit 68bde37b8b9a49822aeee6c019576a48046d004e
Author: Jason Frey <fryguy9@gmail.com>
Date:   Tue Aug 1 15:16:57 2023 -0400

    Merge pull request #22589 from kbrock/capture_vim_performance_fetch
    
    VimPerformanceState do not bring back archived records
    
    (cherry picked from commit f71fa636653c8b90758b5409bf627e79b7d054c1)

Fryguy added a commit that referenced this pull request Aug 2, 2023
VimPerformanceState do not bring back archived records

(cherry picked from commit f71fa63)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants