Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENT-11765: Changed cf-execd's sleep behavior so it attempts to wake up at the beginning of every minute #5498

Merged
merged 1 commit into from
May 10, 2024

Conversation

olehermanse
Copy link
Member

@olehermanse olehermanse commented May 6, 2024

This could fix issues if things are slow and agent runs end up getting skipped if cf-execd wakes up too late.

@olehermanse
Copy link
Member Author

@cf-bottom jenkins, please

@olehermanse
Copy link
Member Author

Backport: #5499

@olehermanse olehermanse added the WIP Work in Progress label May 6, 2024
@cfengine cfengine deleted a comment from cf-bottom May 6, 2024
@cf-bottom
Copy link

craigcomstock
craigcomstock previously approved these changes May 6, 2024
Copy link
Contributor

@craigcomstock craigcomstock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems reasonable. I do think that two logs might be helpful

  • verbose log about how long environment took to load
  • warning log about environment load taking more than one minute

Copy link
Contributor

@vpodzime vpodzime left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me otherwise.

cf-execd/cf-execd.c Show resolved Hide resolved
@@ -434,6 +434,17 @@ void ThisAgentInit(void)
umask(077);
}

// Return 2 to 62 seconds depending on how much time is left until the next
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why 2 to 62 and not the actual remaining number of seconds?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wondered why 2 to 62 as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it seemed safer.

  • If there's 0 seconds left, wait a couple of seconds before triggering another agent run.
  • If we are a little bit off on our sleeping, targeting second :02 should mean it's less likely we accidentally wake up at :59 in the previous minute and end up doing 2 runs in the same minute. I guess this could happen if NTP is slightly adjusting the time while we're sleeping, for example? Anyways, it's just based on what I felt was safer, and getting a test build ready for the customer quickly, no real data.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added more comments.

Copy link
Contributor

@vpodzime vpodzime May 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be best to use the monotonic clock on platforms that have it, NTP can easily jump multiple seconds forward or backwards.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true, it would give better guarantees about how long we actually sleep. But still it would be a bit awkward, since the time you wake up would maybe not match what is expected. (Same as now).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there are two separate things -- how often this code wakes up (monotonic clock) and when it happens (system time). We can only control the former, but the latter actually determines the defined classes and thus the agent run triggering. So... 🤷‍♂️

// accidentally skipping any runs if things are slow.
static inline time_t GetPulseTime()
{
const time_t current_second = (time(NULL) % 60);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@larsewi mentioned using CFPULSETIME here instead of 60 but if we want to start at the beginning of every minute I would say remove CFPULSETIME entirely and keep these hard-coded 60s here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, adjusted to use the constant.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change actually doesn't make sense to me. If we want to determine which second in a minute the clock currently is, % 60 is the right calculation. If CFPULSETIME is set to 120 or 180, its use in the calculation won't make sense.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought a bit about that, but it does make some sense. If you define a 120 second pulse, it will tell you how many seconds is left of the current pulse, even though that feels less intuitive to think about than the current setup.

…ginning of every minute

This could fix issues if things are slow and agent runs end up
getting skipped if cf-execd wakes up too late.

Changelog: Title
Ticket: ENT-11765
Signed-off-by: Ole Herman Schumacher Elgesem <ole@northern.tech>
Copy link
Contributor

@vpodzime vpodzime left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely an improvement.

@nickanderson
Copy link
Member

FWIW, word on the street is that this change has a positive impact.

@olehermanse olehermanse removed the WIP Work in Progress label May 10, 2024
@olehermanse olehermanse merged commit 4f5e8dd into cfengine:master May 10, 2024
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

6 participants