-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[core] Fix security issue for retry package #48767
[core] Fix security issue for retry package #48767
Conversation
Signed-off-by: dentiny <dentinyhao@gmail.com>
e99dbb1
to
b956372
Compare
python/ray/util/retry.py
Outdated
def retry( | ||
max_retry_count=_RAY_DEFAULT_MAX_RETRY_COUNT, | ||
init_delay_sec=_RAY_DEFAULT_INIT_DELAY_SEC, | ||
max_delay_sec=_RAY_DEFAULT_MAX_DELAY_SEC, | ||
backoff=_RAY_DEFAULT_BACKOFF, | ||
jitter_sec=_RAT_DEFAULT_JITTER_SEC, | ||
): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you add a unit test for this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know where should I place the unit test? I don't see test
folder nearby
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this retry
thing is only used in release
at that one place, so can you just leave it in the file that uses it and do not put it in ray
package?
the unit test for the cluster_manager
file is here:
https://github.com/ray-project/ray/blob/master/release/BUILD.bazel#L422
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this retry thing is only used in release at that one place
Yeah, that's true; but I somehow feel it could be used elsewhere in the future, considering it's so common to have something to retry.
I would create a new folder under util
, so in the future all unit test on utils go there.
Let me know if you don't like it :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unit tested added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is built as a stand-alone package, and does not depend on ray (it does not require pip install ray to work
https://buildkite.com/ray-project/premerge/builds/31058#019337be-867e-4fc4-b592-824e1f929966/186-553
[2024-11-17T01:43:23Z] ModuleNotFoundError: No module named 'ray'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we need to reuse it in the future, it is easy to make a move a make a copy when we actually want to reuse it.
I don't quite agree with your argument.
The least visibility is to make the retry
function as just a function inside of the release.py
file;
but the problem is, without putting it into a centralized place, people are less likely to know about it and reuse.
A common consequence is everybody makes his/her own wheels for retry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, if a function is independent of application logic, and the implementation/interface is general enough for reuse, it should be placed under a common folder for better visibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is built as a stand-alone package, and does not depend on ray (it does not require pip install ray to work
I'm quite confused on the argument. Does that mean, if I want to use the same retry logic in runtime env agent and release package, I have to write two identical functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to write two identical functions?
it is recommended for this case.
A little copying is better than a little dependency.
https://www.youtube.com/watch?v=PAAkCSZUG1c&t=568s
or if there is a retry
package that is secure, you can also import it I guess..
Signed-off-by: dentiny <dentinyhao@gmail.com>
Signed-off-by: dentiny <dentinyhao@gmail.com>
Signed-off-by: dentiny <dentinyhao@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
commented. adding retry.py
under ray/util
is the wrong place.
4eb389a
to
df54ed0
Compare
Signed-off-by: dentiny <dentinyhao@gmail.com>
df54ed0
to
f336c57
Compare
Thanks @aslonnie ! Discussed offline, release and ray core have two build systems, move the |
Signed-off-by: dentiny <dentinyhao@gmail.com>
Signed-off-by: dentiny <dentinyhao@gmail.com>
Signed-off-by: dentiny <dentinyhao@gmail.com>
Signed-off-by: dentiny <dentinyhao@gmail.com>
Signed-off-by: dentiny <dentinyhao@gmail.com>
Signed-off-by: dentiny <dentinyhao@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice! thanks
release/ray_release/retry.py
Outdated
_DEFAULT_MAX_DELAY_SEC = 30 | ||
_DEFAULT_BACKOFF = 2 | ||
_DEFAULT_JITTER_SEC = 1 | ||
_DEFAULT_EXCEPTIONS = (Exception,) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe not set (Exception,)
as default? maybe use (TimeoutError,)
or (IOError,)
? or force user to provide a non-empty list? or define an RetryError
in this file and use that as default?
also this is a tuple, so you cannot use this as the default arg value, you need to use None
, and:
if exceptions is None:
exceptions = ()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe not set (Exception,) as default? maybe use (TimeoutError,) or (IOError,) ? or force user to provide a non-empty list?
I think one thing we miss in our error handling is a centralized status code.
One example might be grpc status code: https://grpc.github.io/grpc/core/md_doc_statuscodes.html
which defines which codes are retriable and which are not, also used in tensorflow and abseil.
Here I'm using the Exception
by default, since I'm mimic retry.retry
: https://pypi.org/project/retry/, and expects a no-op change.
Let me know if you think it acceptable (at least we've specified exception to retry on the release package).
Signed-off-by: dentiny <dentinyhao@gmail.com>
Hi @aslonnie , if you think ok, could you please help merge the PR? Happy to discuss more about it if any :) Thank you! |
_DEFAULT_MAX_RETRY_COUNT: int = 10 | ||
_DEFAULT_INIT_DELAY_SEC: int = 1 | ||
_DEFAULT_MAX_DELAY_SEC: int = 30 | ||
_DEFAULT_BACKOFF: int = 2 | ||
_DEFAULT_JITTER_SEC: int = 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these assignment types are implied by the way.
I randomly find lancedb issue: lancedb/lancedb#1480 which discloses a high-severity CVE Considering as lancedb, ray only has one use case for `retry` package, I took the same approach as lancedb/lancedb#1749, which names all variables better with unit and default value. --------- Signed-off-by: dentiny <dentinyhao@gmail.com> Signed-off-by: hjiang <dentinyhao@gmail.com>
I randomly find lancedb issue: lancedb/lancedb#1480
which discloses a high-severity CVE
Considering as lancedb, ray only has one use case for
retry
package, I took the same approach as lancedb/lancedb#1749, which names all variables better with unit and default value.