[WIP] Reconnect container images when seen again #14808
Conversation
found = reconnect_index.fetch(hash) if reconnect_index
if found
  found.update_attributes!(hash.except(:id, :type))
  deletes.delete(found) unless deletes.blank?
This one should not be in `deletes`, so we can remove the line.
I think so too, but wasn't sure if `deletes` may ever be larger than `association`...
`deletes` should equal `association` at the start; after processing, whatever is left in `deletes` gets disconnected. Already-disconnected records should not be present, since `association` and `reconnect_from` should be disjoint sets.
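A minimal sketch of the reconnect branch with that line dropped (same variables as the diff above; this is an illustration, not the final code):

```ruby
# Sketch only: reconnect a previously disconnected record instead of creating
# a duplicate. Because reconnect_from and association are disjoint sets, the
# found record can never be in deletes, so no deletes.delete(found) is needed.
found = reconnect_index.fetch(hash) if reconnect_index
if found
  found.update_attributes!(hash.except(:id, :type))
  association.push(found) # re-associate the record with the EMS
end
```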
@@ -50,10 +50,13 @@ def save_inventory_multi(association, hashes, deletes, find_key, child_keys = []
  remove_keys = Array.wrap(extra_keys) + child_keys

  record_index = TypedIndex.new(association, find_key)
  reconnect_index = TypedIndex.new(reconnect_from, find_key)
Hm, this line can add another huge memory peak, if we collect enough `disconnected_records`.
Right, forgot to mention it. This will be heavy, negating some/all of the win of not saving them :-(
The only other way I could think of was doing one-by-one queries for new records, which is also not good.
It seems refresh fundamentally requires an efficient DB upsert...
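For illustration only (not something this PR does): with PostgreSQL 9.5+ an upsert keyed on the image digest could update rows in place without loading candidates into RAM. The table and column names here, and the unique index on `digest`, are assumptions for the sketch:

```ruby
# Illustrative upsert, assuming a container_images table with a unique index
# on digest. EXCLUDED refers to the row we attempted to insert.
ActiveRecord::Base.connection.execute(<<-SQL)
  INSERT INTO container_images (ems_id, digest, name)
  VALUES (1, 'sha256:abc123', 'registry.example.com/app')
  ON CONFLICT (digest)
  DO UPDATE SET ems_id = EXCLUDED.ems_id, deleted_on = NULL
SQL
```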
Hm, I rewrote the saving code in the graph refresh to not do `Model.all`, but `Model.find_each`. And we could also limit `reconnect_from.where(:ems_ref => [:ems_refs_to_be_created])`.
But we can't do any of this unless we rewrite the old refresh saving code entirely, like in graph refresh. :-)
Well, theoretically we could use new_records and the ems_refs there to limit `reconnect_from` further; that should help a lot, since I expect we will be loading just dozens of 'to be reconnected' records, compared to a possible 100k+ disconnected records in total. A sketch of the idea follows below.
The initial refresh could also build a huge query, so it might be good to do it in batches.
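A rough sketch of that idea, assuming `reconnect_from` is an ActiveRecord scope of disconnected records and `hashes` are the incoming inventory hashes (names are illustrative, not this PR's code):

```ruby
# Sketch only: restrict reconnect candidates to the ems_refs we are about to
# create, and load them in bounded batches instead of all at once.
new_ems_refs = hashes.map { |h| h[:ems_ref] }.compact

reconnect_candidates = {}
new_ems_refs.each_slice(1_000) do |refs|                        # keep the IN clause bounded
  reconnect_from.where(:ems_ref => refs).find_each do |record|  # stream rows, avoid a memory peak
    reconnect_candidates[record.ems_ref] = record
  end
end
```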
This pull request is not mergeable. Please rebase and repush.
Force-pushed from 2017d24 to f2fd447
All potential `reconnect_from` records are loaded up front, may be heavy? Found matches are added back to the `association`.
Force-pushed from f2fd447 to 272e5f1
Note that if duplicate images have already been created, this will not merge them; it will just reuse an arbitrary one.
Force-pushed from 272e5f1 to 2f6fee0
Checked commits cben/manageiq@c58a2e1~...2f6fee0 with ruby 2.2.6, rubocop 0.47.1, and haml-lint 0.20.0
This pull request is not mergeable. Please rebase and repush.
cc @zeari
@simon3z If the only way we identify containers is by name, then they might also re-appear this way and need the same change.
I think containers belong to a pod, which has a GUID that will never reappear, no?
Can a pod's containers change during its lifetime?
I thought they can. If they're tightly coupled then there's no need.
Closing as impractical. |
Bug: if we disconnect a container image and then encounter it again, we create a duplicate record.

Scenarios where we may disconnect an image not currently used in any container:

- `get_container_images` (Add config option to skip container_images #14606)
- `store_unused_images` setting (if ems_refresh.openshift.store_unused_images setting #14662 lands)

If we then see the image again, we'll create a duplicate record, which can confuse reports.
This PR makes refresh reconnect (restore `ems_id` on) an existing image if it has a matching digest.
However, if duplicates have already been created it won't merge them; it will just reuse an arbitrary one of the copies.
EDIT: TODO: this retains `old_ems_id` and `deleted_on`. This is not OK. We need a more explicit API to reconnect than `association.push`. Maybe add a `reconnect_inv` method?

EDIT: as Ladas points out, loading the records back to RAM is heavy :-(
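For illustration, a hypothetical `reconnect_inv` helper along those lines (not part of this PR; apart from `old_ems_id`, `deleted_on`, and `association.push` mentioned above, the names are assumptions):

```ruby
# Hypothetical sketch of a more explicit reconnect API. It clears the
# disconnect bookkeeping rather than leaving old_ems_id/deleted_on set.
def reconnect_inv(association, record, hash)
  record.assign_attributes(hash.except(:id, :type))
  record.old_ems_id = nil
  record.deleted_on = nil
  record.save!
  association.push(record) unless association.include?(record)
end
```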
cc @Ladas @agrare @simon3z @enoodle
Steps for Testing/QA [Optional]
Refresh. Turn off get_container_images in Advanced Settings. Refresh. Turn on. Refresh.
TODO: a way to actually see the dups in reports?
@miq-bot add-label bug, providers/containers
https://bugzilla.redhat.com/show_bug.cgi?id=1488072