[Doc][KubeRay]: Redis eviction suggestions when ENABLE_GCS_FT_REDIS_CLEANUP=false
#40949
Would you mind providing a YAML file in your PR description and sharing more details about the expected behavior?
* `maxmemory=<your_memory_limit>`
* `maxmemory-policy=allkeys-lru`

These two options instruct Redis to delete least recently used keys when it reaches the `maxmemory` limit.
I am not familiar with Redis. What's the definition of "used keys" in Redis? Will the timestamp be updated for a running RayCluster?
Yes, the access timestamp is updated whenever any Redis operation touches the key.
A RayCluster stores all of its metadata in one Redis key and keeps updating it. Therefore, setting `maxmemory-policy=allkeys-lru` makes a running RayCluster less likely to be evicted by Redis.
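The effect described in this reply can be illustrated with a toy model. The sketch below is not Redis itself: Redis uses an approximated, sampling-based LRU and measures real memory, while `ToyLruStore` and its byte accounting are hypothetical names invented for illustration. It only shows that a key refreshed on every update, as a running RayCluster's key would be, outlives idle keys once the limit is reached.

```python
from collections import OrderedDict


class ToyLruStore:
    """Simplified model of maxmemory + allkeys-lru (exact LRU, not Redis's sampling)."""

    def __init__(self, maxmemory):
        self.maxmemory = maxmemory  # byte budget, playing the role of the maxmemory option
        self.data = OrderedDict()   # ordered from least to most recently used

    def used(self):
        return sum(len(v) for v in self.data.values())

    def set(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)  # any write refreshes the access order
        # Evict least recently used keys until the store fits the budget again.
        while self.used() > self.maxmemory and len(self.data) > 1:
            self.data.popitem(last=False)

    def get(self, key):
        value = self.data.get(key)
        if value is not None:
            self.data.move_to_end(key)  # reads refresh the access order too
        return value


store = ToyLruStore(maxmemory=300)
store.set("running-cluster", b"x" * 100)
store.set("deleted-cluster-a", b"x" * 100)
store.set("deleted-cluster-b", b"x" * 100)
store.set("running-cluster", b"y" * 100)  # the running cluster keeps updating its key
store.set("new-cluster", b"z" * 100)      # exceeds the budget and triggers eviction
print(sorted(store.data))
```

In this model the oldest idle key is the one evicted, and the frequently updated key stays.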
To better demonstrate the expected behavior, I first wrote a small Ray program to figure out how to fill up GCS usage. Based on my observation from the Redis side, Ray GCS stores all information in one hash, and among all hash members, the …

I used the following program to verify the idea:

    import os

    import ray
    import redis


    def new_actor(n):
        # Each call defines a distinct actor class that carries n bytes of class data.
        @ray.remote(num_cpus=0)
        class MyActor:
            data = bytes(n)

        return MyActor


    if __name__ == "__main__":
        redis_address = os.getenv("RAY_REDIS_ADDRESS")  # ex. redis://localhost:6379
        redis_client = redis.from_url(redis_address)
        ray.init()
        # Memory used by the "default" key before any actor is registered.
        print(redis_client.memory_usage("default", 0))
        for _ in range(30):
            actor = new_actor(1024**2)  # a new actor definition of about 1 MB
            for _ in range(100):
                actor.remote()  # start one replica of this definition
                print(redis_client.memory_usage("default", 0))

This program defined 30 actors, each taking about 1 MB, and then started 100 replicas of each actor. It printed the Redis memory usage after each replica was registered with …

As you can see from the plot, the memory usage jumps by 1 MB whenever a new actor definition is registered. This result shows that users should take their actor definitions, not the number of replicas, into consideration when estimating how much Redis memory they need.

Expected behavior of
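As a rough back-of-the-envelope sketch of that conclusion, assume (per the observation in this comment) that Redis usage grows with the size of each distinct actor definition and that per-replica metadata is negligible. The function `estimate_gcs_redis_bytes` and its `overhead_bytes` parameter are hypothetical and not part of Ray or KubeRay.

```python
def estimate_gcs_redis_bytes(definition_sizes, replicas_per_definition, overhead_bytes=0):
    """Rough estimate: one copy of each actor definition, regardless of replica count."""
    # replicas_per_definition is accepted only to make explicit that this
    # simplified model ignores it; per-replica metadata is assumed negligible.
    del replicas_per_definition
    return sum(definition_sizes) + overhead_bytes


MIB = 1024**2
sizes = [MIB] * 30  # 30 actor definitions of about 1 MiB each, as in the program above

many_replicas = estimate_gcs_redis_bytes(sizes, replicas_per_definition=100)
one_replica = estimate_gcs_redis_bytes(sizes, replicas_per_definition=1)
print(many_replicas // MIB, one_replica // MIB)
```

Under this assumption, the estimate stays the same whether each definition has one replica or a hundred, so the `maxmemory` limit should be sized from the definitions.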
Great! Thank you for the detailed explanations!
Made some comments to improve clarity. Let me know if you have questions.
@@ -310,6 +310,20 @@ Refer to [this section](kuberay-external-storage-namespace-example) in the earli

* `ENABLE_GCS_FT_REDIS_CLEANUP`: The feature gate `ENABLE_GCS_FT_REDIS_CLEANUP` is true by default, and users can turn if off by setting the environment variable in [KubeRay operator's Helm chart](https://github.com/ray-project/kuberay/blob/master/helm-chart/kuberay-operator/values.yaml).

```{admonition} Setup Key Eviction on Redis | |||
If users turn `ENABLE_GCS_FT_REDIS_CLEANUP` off but still want GCS metadata to be removed automatically,
If users turn `ENABLE_GCS_FT_REDIS_CLEANUP` off but still want GCS metadata to be removed automatically,
If you disable `ENABLE_GCS_FT_REDIS_CLEANUP` but still want Redis to remove GCS metadata automatically,
Is this what you mean?
Yes, it is. Thank you for the clarification. I think removing the word “still” would be better.
@@ -310,6 +310,20 @@ Refer to [this section](kuberay-external-storage-namespace-example) in the earli

* `ENABLE_GCS_FT_REDIS_CLEANUP`: The feature gate `ENABLE_GCS_FT_REDIS_CLEANUP` is true by default, and users can turn if off by setting the environment variable in [KubeRay operator's Helm chart](https://github.com/ray-project/kuberay/blob/master/helm-chart/kuberay-operator/values.yaml).
* `ENABLE_GCS_FT_REDIS_CLEANUP`: The feature gate `ENABLE_GCS_FT_REDIS_CLEANUP` is true by default, and users can turn if off by setting the environment variable in [KubeRay operator's Helm chart](https://github.com/ray-project/kuberay/blob/master/helm-chart/kuberay-operator/values.yaml).
* `ENABLE_GCS_FT_REDIS_CLEANUP`: True by default. You can turn this feature off by setting the environment variable in the [KubeRay operator's Helm chart](https://github.com/ray-project/kuberay/blob/master/helm-chart/kuberay-operator/values.yaml).
What is the difference between disabling this feature and turning it off by setting the environment variable in the Helm chart?
Setting the environment variable in the Helm chart is the only way to disable the feature if you deploy KubeRay with Helm.
…CLEANUP=false` Signed-off-by: Rueian <rueiancsie@gmail.com>
a61d2dd to 7ab0a7a
All suggestions applied. Thank you @angelinalg!
docs/readthedocs.com:anyscale-ray due to a warning. The warning seems to be unrelated to this PR. @angelinalg do you have any idea? Thanks!
Looks good to me, I'm fine with merging this!
set these two options on Redis:

* `maxmemory=<your_memory_limit>`
* `maxmemory-policy=allkeys-lru`
Nit: Where exactly do you write these options? (Is it `redis.conf`?) It might be good to be totally explicit, for beginners.
- You can store the configurations in a ConfigMap and then start the Redis server using that config file (example).
- You can also directly specify the options in the container command. See #40949 (comment) as an example.
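As a minimal sketch of both approaches, the snippet below renders the two options as Redis configuration file lines and as `redis-server` command-line arguments. The helper names `render_redis_eviction_conf` and `render_redis_server_args` and the `100mb` limit are hypothetical placeholders; mounting the file through a ConfigMap and wiring the command into a container spec are left out.

```python
def render_redis_eviction_conf(memory_limit):
    """Return redis.conf lines that enable LRU eviction across all keys."""
    lines = [
        f"maxmemory {memory_limit}",
        "maxmemory-policy allkeys-lru",
    ]
    return "\n".join(lines) + "\n"


def render_redis_server_args(memory_limit):
    """Return the same options as redis-server command-line arguments."""
    return [
        "redis-server",
        "--maxmemory", str(memory_limit),
        "--maxmemory-policy", "allkeys-lru",
    ]


conf_text = render_redis_eviction_conf("100mb")  # placeholder limit
server_args = render_redis_server_args("100mb")
print(conf_text, end="")
print(" ".join(server_args))
```

Either form sets the same two options; pick the limit based on how much GCS metadata you expect Redis to hold.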
Maybe @rueian can add a link to #40949 (comment) in the PR?
Thanks for the info! I think this is important to include then. (I think a nontrivial fraction of users may get stuck/confused without this info)
I guess this is the warning you're talking about: https://buildkite.com/ray-project/premerge/builds/11949#018bd7f5-dc11-4198-8ba4-e9e9b175a5ad/6-102
I'm guessing it's a transient error, because
bc07b1f to 63723db
Signed-off-by: Rueian <rueiancsie@gmail.com>
63723db to 4fa56b9
Hi @architkulkarni, thank you for merging the master branch. Your suggestion is also applied.
…CLEANUP=false` (ray-project#40949)

As discussed with @kevin85421, it would be better if we could provide a guide as well as a warning in the documentation about using Redis native eviction instead of KubeRay's Redis cleanup.

---------

Signed-off-by: Rueian <rueiancsie@gmail.com>
Co-authored-by: Archit Kulkarni <architkulkarni@users.noreply.github.com>
Why are these changes needed?
As discussed with @kevin85421, it would be better if we could provide a guide as well as a warning in the documentation about using Redis native eviction instead of KubeRay's Redis cleanup.
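Checks
- I've signed off every commit (`git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.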