Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

che #14530 Changing 'che.workspace.pool.type' from 'fixed' to 'cached' by default #14631

Closed
wants to merge 1 commit into from

Conversation

ibuziuk
Copy link
Member

@ibuziuk ibuziuk commented Sep 23, 2019

What does this PR do?

Changing 'che.workspace.pool.type' from 'fixed' to 'cached' by default. This change will use CachedThreadPool:

Creates a thread pool that creates new threads as needed, but will reuse previously constructed threads when they are available. These pools will typically improve the performance of programs that execute many short-lived asynchronous tasks. Calls to execute will reuse previously constructed threads if available. If no existing thread is available, a new thread will be created and added to the pool. Threads that have not been used for sixty seconds are terminated and removed from the cache. Thus, a pool that remains idle for long enough will not consume any resources. Note that pools with similar properties but different details (for example, timeout parameters) may be created using ThreadPoolExecutor constructors

Originally implemented for 5.1.0 in #3701 the value has not been changed ever after. I suspect the default was applied mainly for local development and was overridden for codenvy.io / on-prem installations

What issues does this PR fix or reference?

#14614
Should also fix #14530

Docs PR

I'm not sure if the doc PR is needed since the configuration of the workspace pool type is suppose to be an internal detail. It might be interesting to document the recommended setup as part of Scalability, Performances and Working with high latency Roadmap item

… to 'cached' by default

Signed-off-by: Ilya Buziuk <ibuziuk@redhat.com>
@che-bot che-bot added status/code-review This issue has a pull request posted for it and is awaiting code review completion by the community. kind/bug Outline of a bug - must adhere to the bug report template. labels Sep 23, 2019
@ibuziuk ibuziuk requested a review from skabashnyuk September 23, 2019 10:26
@l0rd l0rd mentioned this pull request Sep 23, 2019
6 tasks
@skabashnyuk
Copy link
Contributor

Are there any facts that confirm that this will fix an issue #14530? Do you have any estimates on how it will affect thread/memory usage? Should we wait for #14601 and #14600 before merging this to get a better understanding of the problem and solution that you are going to provide?

@che-bot
Copy link
Contributor

che-bot commented Sep 23, 2019

E2E Happy path tests of Eclipse Che Single User on K8S (minikube v1.1.1) has been successful:

  • build details
  • "che-server" docker image: maxura/che-server:14631

@ibuziuk
Copy link
Member Author

ibuziuk commented Sep 23, 2019

Should we wait for #14601 and #14600 before merging this to get a better understanding of the problem and solution that you are going to provide?

@skabashnyuk +1 for having monitoring first. In the meantime, I will apply this config to prod-preview

Do you have any estimates on how it will affect thread/memory usage

No, but according to docs the should be no issues with threads / ram on the distance:

Threads that have not been used for sixty seconds are terminated and removed from the cache. Thus, a pool that remains idle for long enough will not consume any resources.

@che-bot
Copy link
Contributor

che-bot commented Sep 23, 2019

E2E tests of Eclipse Che Multiuser on OCP has failed:

@@ -58,7 +58,7 @@ che.workspace.auto_start=true
# operations that require asynchronous execution e.g. starting/stopping

# possible values are 'fixed', 'cached'
che.workspace.pool.type=fixed
che.workspace.pool.type=cached
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, we had cached thread pool for JSON RPC messages and if I'm not mistaken it caused OutOfMemory crashing...
If we set cached without any limit - I assume it can lead to OutOfMemory crashing as well.
With JSON RPC it was easier to reproduce, but still...

Maybe it makes sense to consider this value to be configured along with the memory that Che container gets.
Like in che.properties it's fixed and every deployment provides his own values according to provided container memory.
To be more safe we could instead of reusing default java cached pool, introduce our own cached pool with a maximum size. We could estimate some maximum size according to container memory.

I'm not a big expert in this area, and it's just thoughts. In reality, my comments may be not so critical or not relevant at all.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, this sounds relevant - having no ability of spesifying maximum size is potentially risky. but again, according to the docs:

Threads that have not been used for sixty seconds are terminated and removed from the cache. Thus, a pool that remains idle for long enough will not consume any resources

So, it looks like technically we should not care about maximum size (if we should than there is an issue on the other level e.g. RAM misconfig etc)

@ibuziuk
Copy link
Member Author

ibuziuk commented Sep 23, 2019

after discussion with @skabashnyuk and @skryzhny it looks like the situation with cached thread pool is not obvious:

  • codenvy.io is using fixed and does not seem to have any issues with asyncStop / asyncStart
  • using the cached could potentially be risky due to no upper limit of the thread number, even though docs are saying that Threads that have not been used for sixty seconds are terminated and removed from the cache. Thus, a pool that remains idle for long enough will not consume any resources. in reality the situation might be different and potentially result in OOM

@ibuziuk
Copy link
Member Author

ibuziuk commented Sep 23, 2019

Closing for further investigation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Outline of a bug - must adhere to the bug report template. status/code-review This issue has a pull request posted for it and is awaiting code review completion by the community.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Occasionally, workspace start / stop will hang for minutes before workspace async start / stop is scheduled
5 participants