Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core] Fix security issue for retry package #48767

Merged
merged 12 commits into from
Nov 17, 2024

Conversation

dentiny
Copy link
Contributor

@dentiny dentiny commented Nov 16, 2024

I randomly find lancedb issue: lancedb/lancedb#1480
which discloses a high-severity CVE

Considering as lancedb, ray only has one use case for retry package, I took the same approach as lancedb/lancedb#1749, which names all variables better with unit and default value.

@dentiny dentiny requested review from pcmoritz and jjyao November 16, 2024 11:02
@dentiny dentiny requested a review from a team as a code owner November 16, 2024 11:02
@dentiny dentiny added the go add ONLY when ready to merge, run all tests label Nov 16, 2024
Signed-off-by: dentiny <dentinyhao@gmail.com>
@dentiny dentiny force-pushed the hjiang/fix-retry-security branch from e99dbb1 to b956372 Compare November 16, 2024 11:18
release/requirements_buildkite.txt Show resolved Hide resolved
Comment on lines 13 to 19
def retry(
max_retry_count=_RAY_DEFAULT_MAX_RETRY_COUNT,
init_delay_sec=_RAY_DEFAULT_INIT_DELAY_SEC,
max_delay_sec=_RAY_DEFAULT_MAX_DELAY_SEC,
backoff=_RAY_DEFAULT_BACKOFF,
jitter_sec=_RAT_DEFAULT_JITTER_SEC,
):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you add a unit test for this function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know where should I place the unit test? I don't see test folder nearby

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this retry thing is only used in release at that one place, so can you just leave it in the file that uses it and do not put it in ray package?

the unit test for the cluster_manager file is here:
https://github.com/ray-project/ray/blob/master/release/BUILD.bazel#L422

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this retry thing is only used in release at that one place

Yeah, that's true; but I somehow feel it could be used elsewhere in the future, considering it's so common to have something to retry.

I would create a new folder under util, so in the future all unit test on utils go there.
Let me know if you don't like it :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unit tested added.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is built as a stand-alone package, and does not depend on ray (it does not require pip install ray to work

https://buildkite.com/ray-project/premerge/builds/31058#019337be-867e-4fc4-b592-824e1f929966/186-553

[2024-11-17T01:43:23Z] ModuleNotFoundError: No module named 'ray'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we need to reuse it in the future, it is easy to make a move a make a copy when we actually want to reuse it.

I don't quite agree with your argument.
The least visibility is to make the retry function as just a function inside of the release.py file;
but the problem is, without putting it into a centralized place, people are less likely to know about it and reuse.
A common consequence is everybody makes his/her own wheels for retry.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, if a function is independent of application logic, and the implementation/interface is general enough for reuse, it should be placed under a common folder for better visibility.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is built as a stand-alone package, and does not depend on ray (it does not require pip install ray to work

I'm quite confused on the argument. Does that mean, if I want to use the same retry logic in runtime env agent and release package, I have to write two identical functions?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to write two identical functions?

it is recommended for this case.

A little copying is better than a little dependency.

https://www.youtube.com/watch?v=PAAkCSZUG1c&t=568s

or if there is a retry package that is secure, you can also import it I guess..

Signed-off-by: dentiny <dentinyhao@gmail.com>
Signed-off-by: dentiny <dentinyhao@gmail.com>
Signed-off-by: dentiny <dentinyhao@gmail.com>
@dentiny dentiny requested a review from aslonnie November 17, 2024 01:16
Copy link
Collaborator

@aslonnie aslonnie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commented. adding retry.py under ray/util is the wrong place.

@dentiny dentiny force-pushed the hjiang/fix-retry-security branch from 4eb389a to df54ed0 Compare November 17, 2024 02:30
Signed-off-by: dentiny <dentinyhao@gmail.com>
@dentiny dentiny force-pushed the hjiang/fix-retry-security branch from df54ed0 to f336c57 Compare November 17, 2024 02:31
@dentiny
Copy link
Contributor Author

dentiny commented Nov 17, 2024

commented. adding retry.py under ray/util is the wrong place.

Thanks @aslonnie ! Discussed offline, release and ray core have two build systems, move the retry file to release folder.

@dentiny dentiny requested review from aslonnie November 17, 2024 02:44
release/ray_release/retry.py Outdated Show resolved Hide resolved
release/ray_release/cluster_manager/minimal.py Outdated Show resolved Hide resolved
release/ray_release/cluster_manager/minimal.py Outdated Show resolved Hide resolved
Signed-off-by: dentiny <dentinyhao@gmail.com>
Signed-off-by: dentiny <dentinyhao@gmail.com>
Signed-off-by: dentiny <dentinyhao@gmail.com>
Signed-off-by: dentiny <dentinyhao@gmail.com>
Signed-off-by: dentiny <dentinyhao@gmail.com>
Signed-off-by: dentiny <dentinyhao@gmail.com>
@dentiny dentiny requested a review from aslonnie November 17, 2024 03:56
Copy link
Collaborator

@aslonnie aslonnie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice! thanks

release/ray_release/retry.py Outdated Show resolved Hide resolved
_DEFAULT_MAX_DELAY_SEC = 30
_DEFAULT_BACKOFF = 2
_DEFAULT_JITTER_SEC = 1
_DEFAULT_EXCEPTIONS = (Exception,)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe not set (Exception,) as default? maybe use (TimeoutError,) or (IOError,) ? or force user to provide a non-empty list? or define an RetryError in this file and use that as default?

also this is a tuple, so you cannot use this as the default arg value, you need to use None, and:

if exceptions is None:
   exceptions = ()

Copy link
Contributor Author

@dentiny dentiny Nov 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe not set (Exception,) as default? maybe use (TimeoutError,) or (IOError,) ? or force user to provide a non-empty list?

I think one thing we miss in our error handling is a centralized status code.
One example might be grpc status code: https://grpc.github.io/grpc/core/md_doc_statuscodes.html
which defines which codes are retriable and which are not, also used in tensorflow and abseil.

Here I'm using the Exception by default, since I'm mimic retry.retry: https://pypi.org/project/retry/, and expects a no-op change.
Let me know if you think it acceptable (at least we've specified exception to retry on the release package).

Signed-off-by: dentiny <dentinyhao@gmail.com>
@dentiny
Copy link
Contributor Author

dentiny commented Nov 17, 2024

Hi @aslonnie , if you think ok, could you please help merge the PR? Happy to discuss more about it if any :) Thank you!

Comment on lines +8 to +12
_DEFAULT_MAX_RETRY_COUNT: int = 10
_DEFAULT_INIT_DELAY_SEC: int = 1
_DEFAULT_MAX_DELAY_SEC: int = 30
_DEFAULT_BACKOFF: int = 2
_DEFAULT_JITTER_SEC: int = 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these assignment types are implied by the way.

@aslonnie aslonnie merged commit ae0aae4 into ray-project:master Nov 17, 2024
5 checks passed
dentiny added a commit to dentiny/ray that referenced this pull request Dec 7, 2024
I randomly find lancedb issue:
lancedb/lancedb#1480
which discloses a high-severity CVE

Considering as lancedb, ray only has one use case for `retry` package, I
took the same approach as lancedb/lancedb#1749,
which names all variables better with unit and default value.

---------

Signed-off-by: dentiny <dentinyhao@gmail.com>
Signed-off-by: hjiang <dentinyhao@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
go add ONLY when ready to merge, run all tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants