
[WIP] Reconnect container images when seen again #14808

Closed
cben wants to merge 3 commits into ManageIQ/manageiq from cben/manageiq:reconnect-container-images

Conversation

@cben (Contributor) commented Apr 19, 2017

Bug: if we disconnect a container image and then encounter it again, we create a duplicate record.
There are scenarios where we may disconnect an image that is not currently used in any container.

If we then see the image again, we'll create a duplicate record, which can confuse reports.

This PR makes refresh reconnect (re-set ems_id on) an existing image if it has a matching digest.
However, if duplicates have already been created, it won't merge them; it will just reuse an arbitrary one of the copies.

EDIT: TODO: this retains old_ems_id and deleted_on. This is not OK.
We need a more explicit API for reconnecting than association.push. Maybe add a reconnect_inv method (see the sketch below)?
EDIT: as Ladas points out, loading the records back to RAM is heavy :-(
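
A hedged sketch of what such an explicit reconnect API could look like; the reconnect_inv name comes from the TODO above, but the body and the column handling are assumptions for illustration, not code from this PR:

```ruby
# Hypothetical sketch only -- not part of this PR. An explicit reconnect
# method could restore the archival columns that association.push leaves
# untouched (old_ems_id, deleted_on), instead of only re-setting ems_id.
class ContainerImage < ApplicationRecord
  def reconnect_inv(ems)
    update_attributes!(
      :ems_id     => ems.id,
      :old_ems_id => nil,
      :deleted_on => nil
    )
  end
end
```
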
cc @Ladas @agrare @simon3z @enoodle

Steps for Testing/QA [Optional]

Refresh. Turn off get_container_images in Advanced Settings. Refresh. Turn on. Refresh.
TODO: a way to actually see the dups in reports?

@miq-bot add-label bug, providers/containers

https://bugzilla.redhat.com/show_bug.cgi?id=1488072

found = reconnect_index.fetch(hash) if reconnect_index
if found
  # Reconnect the previously disconnected record instead of creating a duplicate
  found.update_attributes!(hash.except(:id, :type))
  deletes.delete(found) unless deletes.blank?
Contributor:

This one should not be in deletes, so we can remove the line.

@cben (author):

I think so too, but wasn't sure if deletes may ever be larger than association...

Contributor:

The deletes should equal the association at the start; everything left in deletes after processing is then disconnected. So already-disconnected records should not be present, since the association and reconnect_from should be disjoint sets.

@@ -50,10 +50,13 @@ def save_inventory_multi(association, hashes, deletes, find_key, child_keys = []
   remove_keys = Array.wrap(extra_keys) + child_keys
 
   record_index = TypedIndex.new(association, find_key)
+  reconnect_index = TypedIndex.new(reconnect_from, find_key)
Contributor:

Hm, this line can add another huge memory peak if we collect enough disconnected records.

@cben (author):

Right, I forgot to mention it. This will be heavy, negating some or all of the win of not saving them :-(
The only other way I could think of was doing one-by-one queries for new records, which is also not good.
It seems refresh fundamentally requires an efficient DB upsert...
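
For reference, a rough sketch of what such an upsert could look like with PostgreSQL's INSERT ... ON CONFLICT. The table and column names, and especially the unique index on digest that ON CONFLICT requires, are assumptions for illustration, not something the current schema provides:

```ruby
# Illustrative only -- not proposed in this PR. One statement either inserts
# a new image or reconnects an archived duplicate, without loading the
# disconnected records into RAM. Assumes PostgreSQL >= 9.5 and a unique
# index on container_images.digest.
ContainerImage.connection.execute(<<-SQL)
  INSERT INTO container_images (ems_id, digest, name, created_on, updated_on)
  VALUES (42, 'sha256:abc123', 'registry.example.com/app/image', now(), now())
  ON CONFLICT (digest) DO UPDATE
    SET ems_id     = EXCLUDED.ems_id,
        old_ems_id = NULL,
        deleted_on = NULL,
        updated_on = now()
SQL
```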

Contributor:

Hm, I rewrote the saving code in the graph refresh to not do Model.all, but Model.find_each. We could also limit the lookup with reconnect_from.where(:ems_ref => [:ems_refs_to_be_created]).

But we can't do any of this unless we rewrite the old refresh saving code entirely, like in the graph refresh. :-)

Well, theoretically we could use new_records and their ems_refs to limit reconnect_from further; that should help a lot, since I expect we will be loading just dozens of 'to be reconnected' records, compared to the possible 100k+ disconnected records in total.

The initial refresh could also build a huge query, so it might be good to do it in batches.
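
A rough sketch of that narrowing idea, assuming reconnect_from is an ActiveRecord relation and find_key is a single column; the variable names are illustrative and this is not the code in this PR:

```ruby
# Sketch only -- restrict reconnect_from to the refs of records we are about
# to create, and stream the matches in batches instead of indexing every
# disconnected record at once.
new_refs = hashes.map { |h| h[find_key] }.compact

reconnect_candidates = {}
reconnect_from.where(find_key => new_refs).find_each(:batch_size => 1_000) do |record|
  # Only a handful of records should match, even with 100k+ disconnected rows.
  reconnect_candidates[record.public_send(find_key)] = record
end
```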

@miq-bot (Member) commented Apr 19, 2017

This pull request is not mergeable. Please rebase and repush.

All potential `reconnect_from` records are loaded up front, may be heavy?
Found matches are added back to the `association`.
@cben force-pushed the reconnect-container-images branch from f2fd447 to 272e5f1 on April 19, 2017 at 15:34
Note that if duplicate images have already been created,
this will not merge them; it will just reuse an arbitrary one.
@cben force-pushed the reconnect-container-images branch from 272e5f1 to 2f6fee0 on April 19, 2017 at 15:56
@miq-bot (Member) commented Apr 19, 2017

Checked commits cben/manageiq@c58a2e1~...2f6fee0 with ruby 2.2.6, rubocop 0.47.1, and haml-lint 0.20.0
3 files checked, 0 offenses detected
Everything looks good. 🏆

@miq-bot (Member) commented Apr 21, 2017

This pull request is not mergeable. Please rebase and repush.

@enoodle commented Apr 23, 2017

cc @zeari

@zeari commented Apr 23, 2017

@simon3z If the only way we identify containers is by name, then they might also re-appear this way and need the same change.

@cben (author) commented Apr 23, 2017 via email

@zeari commented Apr 23, 2017

> I think containers belong to a pod, which has a GUID that will never
> reappear, no?
> Can a pod's containers change during its lifetime?

I thought they can. If they're tightly coupled, then there's no need.

@cben (author) commented Sep 4, 2017

@cben closed this Sep 5, 2017