Update events that are used to determine VM session starts and stops #732

Merged
17 commits merged on Dec 21, 2018

Conversation

Contributor

@eiffel777 eiffel777 commented Nov 21, 2018

There are more VM session start and stop events than we previously realized. This PR adds new events: 3 start events and 2 end events. The start events added are unsuspend, unpause and power_on. The end events added are power_off and pause. Resume events in OpenStack are also mapped to the RESUME event.
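As a rough illustration of why the extra events matter, here is a minimal Python sketch that pairs start and end events into sessions for a single instance. It is not XDMoD's aggregation code: the helper `build_sessions`, the tuple layout, and the timestamps are hypothetical, and the event sets contain only the names listed in this description (the real lists are longer).

```python
# Event names taken from the PR description above; the real lists are longer.
START_EVENTS = {"unsuspend", "unpause", "power_on"}
END_EVENTS = {"power_off", "pause"}


def build_sessions(events):
    """Pair start and end events into (start_time, end_time) sessions.

    `events` is an iterable of (timestamp, event_type) tuples for ONE instance.
    Hypothetical helper for illustration only.
    """
    sessions = []
    open_start = None
    for timestamp, event_type in sorted(events):
        if event_type in START_EVENTS and open_start is None:
            # A start event opens a session if none is currently open.
            open_start = timestamp
        elif event_type in END_EVENTS and open_start is not None:
            # An end event closes the currently open session.
            sessions.append((open_start, timestamp))
            open_start = None
    if open_start is not None:
        # The instance is still running: no end time is known yet.
        sessions.append((open_start, None))
    return sessions


# Made-up timestamps: the pause/unpause pair splits one run into two sessions.
events = [
    (100, "power_on"),
    (200, "pause"),
    (300, "unpause"),
    (450, "power_off"),
]
sessions = build_sessions(events)
print(sessions)
```

If pause and unpause were not recognized as end and start events, the same instance would look like one long session from 100 to 450, overstating usage.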

During testing, a bug was also found in the aggregation: if an instance had more than one session, the details of the first session were added to the cloud_events_transient table, and any other sessions just updated the first row instead of adding a new one because of the keys on the cloud_events_transient table. To fix this, I am adding the start_time and end_time fields to the primary key on that table.
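The key problem can be reproduced in miniature. The sketch below uses Python's `sqlite3` module with a heavily simplified, assumed version of the table (four columns, made-up values), and uses SQLite's `INSERT OR REPLACE` as a stand-in for whatever upsert the real ETL action performs. The helper name `load_sessions` is hypothetical.

```python
import sqlite3


def load_sessions(key_columns, sessions):
    """Load sessions into a simplified table keyed on `key_columns`; return all rows."""
    conn = sqlite3.connect(":memory:")
    # Assumed, simplified schema; the real table has more columns.
    conn.execute(
        "CREATE TABLE cloud_events_transient ("
        " resource_id INTEGER, instance_id INTEGER,"
        " start_time INTEGER, end_time INTEGER,"
        f" PRIMARY KEY ({', '.join(key_columns)}))"
    )
    # Stand-in for the ETL upsert: a row with an existing key is overwritten.
    conn.executemany(
        "INSERT OR REPLACE INTO cloud_events_transient VALUES (?, ?, ?, ?)",
        sessions,
    )
    rows = conn.execute(
        "SELECT resource_id, instance_id, start_time, end_time"
        " FROM cloud_events_transient ORDER BY start_time"
    ).fetchall()
    conn.close()
    return rows


# Two sessions of the same instance (made-up values).
sessions = [(1, 42, 100, 200), (1, 42, 300, 450)]

old_key_rows = load_sessions(["resource_id", "instance_id"], sessions)
new_key_rows = load_sessions(
    ["resource_id", "instance_id", "start_time", "end_time"], sessions
)
print(len(old_key_rows), len(new_key_rows))
```

With the narrower key only one row survives for the instance; once the session times are part of the key, each session gets its own row.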

Motivation and Context

Adding the new stop and start events gives us a more accurate picture of how VMs are being used.

Tests performed

Updated regression tests and ran them. Also, manually checked to make sure the new aggregation was correct.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project as found in the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@eiffel777 eiffel777 added the Category:ETL (Extract Transform Load) and Category:Cloud (Cloud Realm) labels Nov 21, 2018
@eiffel777 eiffel777 added this to the 8.1.0 milestone Nov 21, 2018
@eiffel777 eiffel777 self-assigned this Nov 21, 2018
- "instance"
+ "instance",
+ "start_time",
+ "end_time"
Member

Why were these columns added?

Member

I.e. why were both start and end added? Surely the start_time is sufficient to generate uniqueness. If you include the end_time, does this cause a problem when the data are aggregated tomorrow (since the end time of an instance may have changed, but not the start time)?

Member

I note that the cloud transient action that populates this table uses "resource_id ASC, instance_id ASC, start_time ASC" as the order by clause.

Contributor Author

@eiffel777 eiffel777 Nov 21, 2018

After thinking about it, I think that just using start_time would be enough, but as it is I don't think there will be an issue when you run aggregation the next day. The cloud_events_transient table is truncated every time aggregation is run, so if a session has a different end time than the day before, you will only get one row for that session with the new end time, not two rows with the same start time but different end times.

I have no real strong feelings about this either way since I think both will work fine. If someone wants me to change it to just start_time I can do that.

Member

I thought the plan was to update the code to not truncate the tables every day. If you add the end_time in then it will just have to be removed again soon.
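To make the concern concrete, here is a small Python `sqlite3` sketch of two aggregation runs against a table that is not truncated in between. The schema is a simplified assumption, the values are made up, `INSERT OR REPLACE` stands in for the real upsert, and `rerun_without_truncate` is a hypothetical helper.

```python
import sqlite3


def rerun_without_truncate(key_columns):
    """Simulate two aggregation runs with no truncate in between; return all rows."""
    conn = sqlite3.connect(":memory:")
    # Assumed, simplified schema; the real table has more columns.
    conn.execute(
        "CREATE TABLE cloud_events_transient ("
        " resource_id INTEGER, instance_id INTEGER,"
        " start_time INTEGER, end_time INTEGER,"
        f" PRIMARY KEY ({', '.join(key_columns)}))"
    )
    insert = "INSERT OR REPLACE INTO cloud_events_transient VALUES (?, ?, ?, ?)"
    # Day 1: the session is still running, so its end time is provisional.
    conn.execute(insert, (1, 42, 100, 200))
    # Day 2: same session (same start_time), but the end time has moved on.
    conn.execute(insert, (1, 42, 100, 350))
    rows = conn.execute(
        "SELECT resource_id, instance_id, start_time, end_time"
        " FROM cloud_events_transient ORDER BY end_time"
    ).fetchall()
    conn.close()
    return rows


with_end_time = rerun_without_truncate(
    ["resource_id", "instance_id", "start_time", "end_time"]
)
start_time_only = rerun_without_truncate(
    ["resource_id", "instance_id", "start_time"]
)
print(with_end_time)
print(start_time_only)
```

With end_time in the key, the second run adds a second row for the same session; with the key ending at start_time, the second run simply replaces the row with the newer end time.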

Contributor Author

I know there was talk of making it so that you don't have to re-aggregate all the cloud data every time you run aggregation, which would mean that the cloudfact tables would no longer be truncated every day. I'm not sure what effect that would have on the cloud_events_transient table. If it would no longer be truncated every time aggregation is run, then the end_time would need to be removed.

Member

Of course the cloud_events_transient table would have to not be truncated - this table stores the facts. What design did you have in mind where the cloud_events_transient table could be truncated every day?

Contributor Author

I don't have a design in mind. I don't think I'm the one that is going to make this change, so I haven't really thought about it, which is why I said I wasn't sure how changes to the aggregation would affect the cloud_events_transient table. Since it seems that we are going to change how the aggregation happens, and that means the cloud_events_transient table will no longer be truncated, I can remove the end_time.

EOT
);
$runaggregation = $console->promptBool(
'Do you want to run cloud aggregation now?',
Contributor

We should do some basic timing to allow them to know how long this might take.
This also runs ingestion.
We should check if the cloud is enabled before we bother prompting (already talked about that this ability is in another PR, so this can get updated at that point for all the things...)

$cloudLogDirectoryEntryAttempts++;
}

$cloudResourceName = $console->prompt(
Contributor

Does this handle multiple cloud types?

Generic and openStack?

@@ -353,7 +353,7 @@ public function aggregate(
      *
      * @param string $realm The realm you are checking to see if exists
      */
-    private function realmEnabled($realm)
+    public function realmEnabled($realm)
Contributor

@plessbd plessbd Dec 21, 2018

nitpick future thing...
I want to rename this isRealmEnabled
reasoning:
isRealmEnabled to me means it is going to tell you true or false (I like to use the prefixes is and has)

vs realmEnabled makes me think of things like session_name which can be used to set things.

@eiffel777 eiffel777 merged commit cfed4d8 into ubccr:xdmod8.1 Dec 21, 2018
@eiffel777 eiffel777 added the enhancement (Enhancement of the functionality of an existing feature) label Mar 28, 2019

3 participants