[core][onevent] implement event merge logic at export time #58070
Conversation
Code Review
This pull request introduces event merging logic at export time to optimize performance by reducing message size and network usage. Events are grouped by entity ID and type, and then merged. The implementation correctly maintains the order of events, addressing an issue from a previous attempt. I've added a couple of suggestions to further improve performance by reducing string copies during the event grouping process. The new test case for event merging is well-written and covers the new functionality appropriately.
```cpp
absl::flat_hash_map<std::pair<std::string, rpc::events::RayEvent::EventType>,
                    std::vector<std::unique_ptr<RayEventInterface>>>
    event_groups;
std::vector<std::pair<std::string, rpc::events::RayEvent::EventType>> group_order;

for (auto &event : buffer_) {
  rpc::events::RayEvent ray_event = std::move(*event).Serialize();
  auto key = std::make_pair(event->GetEntityId(), event->GetEventType());
  auto [it, inserted] = event_groups.try_emplace(key);
  if (inserted) {
    group_order.push_back(key);
  }
  it->second.push_back(std::move(event));
}
```
To improve performance and reduce string copies, you can change group_order to store pointers to the keys in event_groups instead of copying the keys. absl::flat_hash_map guarantees pointer stability on insertion, so this is safe. Additionally, using std::move with try_emplace is more efficient as it avoids copying the key if it's a new entry and does nothing if the key already exists.
```diff
 absl::flat_hash_map<std::pair<std::string, rpc::events::RayEvent::EventType>,
                     std::vector<std::unique_ptr<RayEventInterface>>>
     event_groups;
-std::vector<std::pair<std::string, rpc::events::RayEvent::EventType>> group_order;
+std::vector<const std::pair<std::string, rpc::events::RayEvent::EventType> *> group_order;
 for (auto &event : buffer_) {
-  rpc::events::RayEvent ray_event = std::move(*event).Serialize();
   auto key = std::make_pair(event->GetEntityId(), event->GetEventType());
-  auto [it, inserted] = event_groups.try_emplace(key);
+  auto [it, inserted] = event_groups.try_emplace(std::move(key));
   if (inserted) {
-    group_order.push_back(key);
+    group_order.push_back(&it->first);
   }
   it->second.push_back(std::move(event));
 }
```
No, I tried this and it segfaults; the iterators of absl::flat_hash_map are not stable.
what if we store the event order in a list (linked list) instead of a vector, and store iterators to the list items in the hash map?
iterators to items in a linked list are always stable; iterators to items in a flat_hash_map can change, which is probably why you encountered the segmentation fault.
and maybe we can do the merge in the first loop itself: if an event_key already exists in event_groups, we just merge into that item instead of deferring it to later?
something like:
```cpp
absl::flat_hash_map<RayEventKey, std::list<std::unique_ptr<RayEventInterface>>::iterator>
    event_groups;
std::list<std::unique_ptr<RayEventInterface>> grouped_events;
for (auto &event : buffer_) {
  auto event_key = RayEventKey(event->GetEntityId(), event->GetEventType());
  auto it = event_groups.find(event_key);
  if (it == event_groups.end()) {
    // First occurrence of the event key: append it and remember its slot in the list.
    grouped_events.emplace_back(std::move(event));
    event_groups.emplace(std::move(event_key), std::prev(grouped_events.end()));
  } else {
    // An entry already exists: merge into it.
    (*it->second)->Merge(std::move(*event));
  }
}
```
so grouped_events should have the merged events in the right order
oh that's neat, let me give it a try
```cpp
for (auto &key : group_order) {
  auto &events = event_groups[key];
```
To align with the change of group_order storing pointers, this loop should be updated to dereference the key pointers. Using at() is also slightly safer here than operator[] as it expresses the intent that the key must exist.
```diff
-for (auto &key : group_order) {
-  auto &events = event_groups[key];
+for (auto *key : group_order) {
+  auto &events = event_groups.at(*key);
```
```cpp
// events with the same resource ID into a single event.
// group the event in the buffer_ by their entity id and type; then for each group,
// merge the events into a single event. maintain the order of the events in the buffer.
absl::flat_hash_map<std::pair<std::string, rpc::events::RayEvent::EventType>,
```
nit: can we define a small struct instead of using a pair here, for better readability?
```cpp
struct RayEventKey {
  std::string entity_id;
  rpc::events::RayEvent::EventType event_type;
};
```
and then use that in
```cpp
absl::flat_hash_map<RayEventKey, std::vector<std::unique_ptr<RayEventInterface>>> event_groups;
```
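For RayEventKey to work as an absl::flat_hash_map key it would also need equality and hashing; a minimal sketch along those lines (field and type names are simply taken from the suggestion above):
```cpp
// Sketch only: RayEventKey needs operator== and an AbslHashValue overload
// before it can serve as an absl::flat_hash_map key.
struct RayEventKey {
  std::string entity_id;
  rpc::events::RayEvent::EventType event_type;

  bool operator==(const RayEventKey &other) const {
    return entity_id == other.entity_id && event_type == other.event_type;
  }

  template <typename H>
  friend H AbslHashValue(H h, const RayEventKey &key) {
    return H::combine(std::move(h), key.entity_id, key.event_type);
  }
};
```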
totally
Should we also consider merging on the aggregator side?
In this case, the aggregator ends up with one event for DEPENDENCIES_UNREADY and another merged event for the remaining states. To address this, we could consider performing merging on the aggregator side as well.
> Should we also consider merging on the aggregator side?
We could, yeah, though I think merging is only for perf (especially the part about converting protobuf to JSON on the aggregator side) and not a functional requirement, so it needs to be measured whether it's worth it.
Signed-off-by: Cuong Nguyen <can@anyscale.com>
force-pushed dea341e to 704bf13
@sampan-s-nayak's comments
sampan-s-nayak left a comment
Thank you! Changes LGTM!
RayEvent provides a special API, Merge, which allows multiple events to be combined into a single event. This reduces gRPC message size and network bandwidth usage, and is essential for scaling task event exports. This PR leverages that feature.
Specifically, it clusters events into groups based on (i) entity ID and (ii) event type. Each group is merged into a single event, which is then added to the gRPC message body. The EntityId is a user-defined function, implemented by the event class creator, that determines which events can be safely merged.
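A minimal, self-contained sketch of this export-time grouping and merging, in the single-pass shape discussed in the review; RayEventInterface, its methods, the event types, and MergeAtExport here are simplified stand-ins for the real Ray classes, not the actual implementation:

```cpp
// Sketch of export-time event merging: group by (entity id, event type) and
// merge each group into one event while preserving first-appearance order.
#include <list>
#include <memory>
#include <string>
#include <utility>
#include <vector>

#include "absl/container/flat_hash_map.h"

enum class EventType { kTaskDefinition, kTaskExecution };

// Hypothetical stand-in for RayEventInterface.
class RayEventInterface {
 public:
  RayEventInterface(std::string entity_id, EventType type)
      : entity_id_(std::move(entity_id)), type_(type) {}
  const std::string &GetEntityId() const { return entity_id_; }
  EventType GetEventType() const { return type_; }
  // Merge another event for the same (entity id, type) into this one.
  void Merge(RayEventInterface &&other) { merged_count_ += other.merged_count_ + 1; }

 private:
  std::string entity_id_;
  EventType type_;
  int merged_count_ = 0;
};

struct RayEventKey {
  std::string entity_id;
  EventType event_type;
  bool operator==(const RayEventKey &o) const {
    return entity_id == o.entity_id && event_type == o.event_type;
  }
  template <typename H>
  friend H AbslHashValue(H h, const RayEventKey &k) {
    return H::combine(std::move(h), k.entity_id, k.event_type);
  }
};

// Single pass over the buffer: the list keeps the first-seen order of each group,
// and the map points at the list slot so later events with the same key are merged
// in place instead of being appended.
std::list<std::unique_ptr<RayEventInterface>> MergeAtExport(
    std::vector<std::unique_ptr<RayEventInterface>> buffer) {
  std::list<std::unique_ptr<RayEventInterface>> grouped_events;
  absl::flat_hash_map<RayEventKey,
                      std::list<std::unique_ptr<RayEventInterface>>::iterator>
      event_groups;
  for (auto &event : buffer) {
    RayEventKey key{event->GetEntityId(), event->GetEventType()};
    auto it = event_groups.find(key);
    if (it == event_groups.end()) {
      grouped_events.push_back(std::move(event));
      event_groups.emplace(std::move(key), std::prev(grouped_events.end()));
    } else {
      (*it->second)->Merge(std::move(*event));
    }
  }
  return grouped_events;
}
```

With this shape, grouped_events preserves the first-appearance order of each (entity ID, event type) group, which is what keeps the exported order deterministic after merging.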
Note: this is a redo of ray-project#56558, which got reverted because it randomized the order in which events were exported, leading to flaky tests. This attempt maintains the order even after merging.

Test:
- CI