Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Serve] Handle multiple changed objects per LongPollHost.listen_for_change RPC #48803

Conversation

JoshKarpel
Copy link
Contributor

@JoshKarpel JoshKarpel commented Nov 19, 2024

Why are these changes needed?

Currently, in the LongPollHost/LongPollClient, if multiple objects are updated that a listen_for_change request is waiting for before the async task in the host can run again, only one of those updated objects will be returned. This is inefficient because the LongPollClient will immediately do a listen_for_change RPC again, and that will see outdated snapshot IDs for the updates that weren't returned and get all of the missed updates.

This is because of an asymmetry between

# If there are any keys with outdated snapshot ids,
# return their updated values immediately.
updated_objects = {}
for key, client_snapshot_id in keys_to_snapshot_ids.items():
try:
existing_id = self.snapshot_ids[key]
except KeyError:
# The caller may ask for keys that we don't know about (yet),
# just ignore them.
# This can happen when, for example,
# a deployment handle is manually created for an app
# that hasn't been deployed yet (by bypassing the safety checks).
continue
if existing_id != client_snapshot_id:
updated_objects[key] = UpdatedObject(
self.object_snapshots[key], existing_id
)
if len(updated_objects) > 0:
self._count_send(updated_objects)
return updated_objects
, which looks for all outdated keys, and
updated_object_key: str = async_task_to_watched_keys[done.pop()]
, which only looks at a single complete Event, even if multiple events completed during the wait.

To prove that the wait can indeed see multiple completed Events, see this example:

from asyncio import wait, Event, run, create_task, FIRST_COMPLETED


async def main():
    a = Event()
    b = Event()

    wait_for_a = create_task(a.wait())
    wait_for_b = create_task(b.wait())

    a.set()
    b.set()

    done, pending = await wait([wait_for_a, wait_for_b], return_when=FIRST_COMPLETED)

    print(f"{len(done)=}")
    print(f"{len(pending)=}")

run(main())

# len(done)=2
# len(pending)=0

Generally this won't be a big issue because most listen_for_change requests in the current Serve setup are asking for a very small number of keys and are likely to only get one key update anyway. But, as I've been discussing with @edoakes and @zcin on Slack, I'd like to group up the DeploymentHandle listen_for_change RPCs under a single LongPollClient, which will be requesting many keys and is therefore more likely to hit this situation.

To complement this change, I also changed LongPollHost.notify_changed so that it takes multiple updates at the same time.

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Josh Karpel <josh.karpel@gmail.com>
@JoshKarpel JoshKarpel force-pushed the handle-multiple-updated-keys-in-long-poll-host branch from 2dcc0fb to 89cf841 Compare November 19, 2024 17:05
Signed-off-by: Josh Karpel <josh.karpel@gmail.com>
@JoshKarpel JoshKarpel marked this pull request as ready for review November 19, 2024 19:58
@edoakes edoakes added the go add ONLY when ready to merge, run all tests label Nov 19, 2024
Copy link
Contributor

@edoakes edoakes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@edoakes edoakes enabled auto-merge (squash) November 19, 2024 22:14
@edoakes edoakes merged commit c6d26fd into ray-project:master Nov 19, 2024
7 checks passed
@JoshKarpel JoshKarpel deleted the handle-multiple-updated-keys-in-long-poll-host branch November 20, 2024 02:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
go add ONLY when ready to merge, run all tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants