Skip to content

Conversation

@Myztiq
Copy link
Contributor

@Myztiq Myztiq commented Aug 26, 2016

Added logic to cleanup any instance that is older than 7 days.

Anything that:
Is not a master pod.
Has a contextVersion created date older than 7 days.
Is not Isolated OR is the isolation group master.

log.info('CleanupInstances')

return Promise.using(mongodbHelper(), function (mongoClient) {
return mongoClient.fetchInstancesAsync({
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To whomever reviews this. PLEASE PLEASE PLEASE make sure you think through this query properly. I don't wanna delete the wrong things on prod.

'contextVersion.created': { $lt: moment().subtract(7, 'days').toDate() },
$or: [
{ isolated: { $exists: false } },
{ isIsolationGroupMaster: true }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does deleting isolation master delete the kids too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

})
.then(function (instances) {
log.info({
instanceCount: instances.length
Copy link
Member

@thejsj thejsj Aug 26, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we put some more data here? Instance id? context version i maybe? When I look at the logs for an instance getting deleted it would be nice to be able to find how the instance got deleted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, the problem, if I delete 500 instances this way how would you find your particular instance in that list? Do you want me to print a list of instance ID's?

})
}, 'Found instances to cleanup')
instances.forEach(function (instance) {
rabbitmq.publishTask('instance.delete', {
Copy link
Member

@podviaznikov podviaznikov Aug 27, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anandkumarpatel is this the way we are going? khronos will publish task that api will pickup. Before we were talking that services will emit events and not tasks to avoid coupling. What is your latest opinion here?

Shouldn't we emit event instance.expired here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, so we'd make a new event instance.expired and that'd get picked up by the API which would then emit instance.delete?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anton your right, this should emit some event. instance.expired sounds good.

reasoning: tasks should only be used inside of an application. When you emit a task from 1 service for another to handle that couples the services. To send data to a different application we should use events. Using events makes flows difficult to understand, but it greatly improves our resilience and helps us write fault tolerant / gracefull degradation code upfront. Events do not have a direct action like a task does. A task is saying do this now. when you emit an event you don't know what is going to use it, and what it is going to do it. It might seem counter-intuitive make things more complex, but this actually helps strengthen our system. Khronos just knows that this instance is expired, but it does not know what to update. say we need to update the network, add a pr comment and send an email.
If we used a task, there would be 3 tasks here, and the person writing khronos would have to know how each service works making it harder to just update khronos. However making this event, khornos does not need to know what is going to happen, it just knows hey I check this at this time and it is expired.

the latest ponos actually forces this behavior by not allowing tasks to be used across applications (by auto-name spacing them)

TLDR; make this an event, handle delete in API


function CleanupInstances () {
var log = logger.child({
tx: true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we dont need tx anymore, its already passed in logger

@anandkumarpatel
Copy link
Contributor

fix test, remove tx and this looks good to me

.then(function () {
sinon.assert.calledTwice(rabbitmq.publishTask)
sinon.assert.calledWith(
rabbitmq.publishTask,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

publishEvent

@Myztiq
Copy link
Contributor Author

Myztiq commented Aug 29, 2016

To test this I went ahead and did a dry-run to find a list of all the instances we were going to delete and tried to see if any of them were incorrect. To do this I ran the query, and looked at the resulting instances.

The goal is to find instances that have not seen their code touched in the past week. The biggest problem with that is since we're looking at contextVersions and not commit metadata we're really getting any container that's been migrated or re-built in the last week. This is actually okay since they took action on that container. Eventually we'll track "usage" which is deemed as viewed in navi/commit or viewed in runnable-angular.

@Myztiq
Copy link
Contributor Author

Myztiq commented Aug 29, 2016

When ran in production this will remove approximately 940 instances and spare 270 that have been built in the last week...

@Myztiq Myztiq merged commit 5d97db0 into master Aug 29, 2016
@Myztiq Myztiq deleted the SAN-4866-cleanup-old-instances branch August 29, 2016 21:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants