-
Notifications
You must be signed in to change notification settings - Fork 629
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(runners): add configurable eviction strategy to idle config #3375
Conversation
lambdas/functions/control-plane/src/scale-runners/scale-down.ts
Outdated
Show resolved
Hide resolved
2274baf
to
0019895
Compare
@npalm I rebased and fixed the prettier failures as well as one typecheck failure. I'm running into failures for tests because of missing type coverage. Will wait to make sure you approve of the approach before proceeding to write tests. |
Thanks for the PR. WIll check the approach ASAP. |
@maschwenk one of the test case is failing for this PR. Can you please fix that? |
lambdas/functions/control-plane/src/scale-runners/scale-down.ts
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The approach looks good to me. I understand the idea of keeping caches aroudn. But we see this as a risk as well. Supporting this eviction strategy is fine as long you esnure you add a test case.
@npalm 👍🏼 sounds good. Will fix. |
@npalm I've added a test to make sure the default config is extracted correctly @GuptaNavdeep1983 took on your suggestion as well 👍🏼 |
lambdas/functions/control-plane/src/scale-runners/scale-down-config.ts
Outdated
Show resolved
Hide resolved
@npalm Ahh I just realized it's failing due to the test coverage? I couldn't figure out why before. Happy to add a bit of coverage. |
ff3cbf0
to
b24497b
Compare
@npalm Added a test in |
@npalm 😅 the Code health thing did not like some duplication I had, cleaned that up. This should be ready for review! 🙏🏼 |
@npalm should be ready for another look 🙏🏼 sorry for the churn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks in gneral good, a few remarks. Will test asap
@npalm I think my comment above the eviction strategy was more confusing so I just moved it up into the paragraph above to give a more thoughtful explanation. Completely aside from the point of this PR, your note of:
The disk space clause makes sense to me, but I'd be curious how often your instances are carrying around "dead" RAM after jobs run on them? We definitely have RAM issues but because we don't run things containerized it often just toasts the box completely and we take it out of rotation versus leaking over time. Just curious about your experience there. Thanks for reviewing! |
@npalm were you able to take a look? |
Sorry not yet, seems also the coerage is dropped slightly. I wil have a look early next week |
Just ran the test suite locally, but see no coverage error. Will have a check later. |
Re-ran the build, seems no fine. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maschwenk thx for your contribution
🤖 I have created a release *beep* *boop* --- ## [4.1.0](v4.0.2...v4.1.0) (2023-08-08) ### Features * **runners:** add configurable eviction strategy to idle config ([#3375](#3375)) ([896f473](896f473)) ### Bug Fixes * **lambda:** bump the aws group in /lambdas with 5 updates ([#3413](#3413)) ([1acc8ba](1acc8ba)) * **runners:** retry aws metadata token download on Linux ([#3408](#3408)) ([ef46827](ef46827)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). Co-authored-by: forest-releaser[bot] <80285352+forest-releaser[bot]@users.noreply.github.com>
@npalm I'm seeing those failures you are getting on master, I think I have an idea of why they are happening. Will upstream a change if I figure it out. |
Do you mean the release build on main? For some reason the build is randomly failing. I have no clue why (yet). The documentation build is another problem. And a less a big issue. Do you have any clue why the release build is failing when building all lambda's? |
@npalm I don't yet, but my test was mutating |
- fix for randomly failing unit test after merging #3375 - add workspace config for vscode - increase coverage
I think I have solved the issue in #3418 |
Not yet, still failing |
@maschwenk I have tried earlier today to fix the test. I am not able to reproduce the failing test on my local. Assume it has something to do with the order. I added the SCALE_DOWN_CONFIG to other test to ensure the value is set right. At some moment got a error the coverage was not metting our settings. Fixed that as well. THought it was all good. But now the main is failing again. So ondering if you have still any ieda. |
@npalm still at a little bit of a loss. Is there any way we can get better logs out of the failing tests perhaps? Or maybe try forcing the Jest test to run in an order that will deterministically fail? |
Yes we can set a few things in jest (at least)
I will also dig in a bit more next week. |
This PR should fix the issue: #3432 |
@npalm ❤️ |
We do some on-instance caching so when we scale down we'd prefer to keep the older instances around instead of the new ones (because they will have a hotter cache). This adds a configurable setting to the idleConfig to pick a sorting strategy. Never contributed to this repo, so please tell me if I'm doing something wrong!