GRPC support for MultiJoin, improved support for RangeJoin #5153

Merged
lbooker42 merged 10 commits into deephaven:main from lab-mj-grpc on Feb 16, 2024

Conversation

Contributor

@lbooker42 lbooker42 commented Feb 14, 2024

This PR adds GRPC support for MultiJoin, allowing clients to efficiently join multiple tables. It also improves support for RangeJoin by exposing a string input for specifying the match criteria.

Note that this implementation returns a Table directly, unlike the server-side calls, which return MultiJoinTable objects. The more complex interface can be exposed to GRPC alongside this implementation once we add some additional desired functionality to MultiJoinTable.

Closes #4281

nbauernfeind
nbauernfeind previously approved these changes Feb 15, 2024

message MultiJoinTablesRequest {
  Ticket result_id = 1;
  // It is considered an error if both `source_ids` and `multi_join_inputs` are provided.
Member

As a comment, this will appear on the source_ids field in each language, rather than applying to the body of the message as a whole. Consider moving it to just before the message itself, or duplicating it on multi_join_inputs as well?

Member

...that said, this might be better expressed as a oneof? Reading the logic in the java server impl, if multi_join_inputs is provided, neither source_ids nor columns_to_match can be specified (i.e. "will be empty")?

I'm not above just making the runtime logic encode this instead of the message format, but there is something to be said for making the serialization api simply not support invalid states.

Contributor Author

@lbooker42 lbooker42 Feb 15, 2024

As @devinrsmith pointed out, it's really (source_ids + columns_to_match) ^ multi_join_inputs that needs to be enforced. A oneof doesn't seem to work here without adding another nesting layer, which I think would break backward compatibility.

Member

What do you mean "break backward compatibility" - this is ostensibly a brand new message, yeah?

Contributor Author

Good point. RangeJoin is published, MultiJoin is new.
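
The rule discussed in this thread, (source_ids + columns_to_match) ^ multi_join_inputs, can also be enforced at runtime rather than in the message format. The following is a minimal Python sketch of that check, not the Java server implementation from this PR. `MultiJoinRequest` and `validate` are hypothetical names, the dataclass only stands in for the generated protobuf class, and the requirement that the shorthand form include at least one `source_ids` entry is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MultiJoinRequest:
    """Hypothetical stand-in for MultiJoinTablesRequest (not the generated class)."""
    source_ids: List[str] = field(default_factory=list)
    columns_to_match: List[str] = field(default_factory=list)
    multi_join_inputs: List[dict] = field(default_factory=list)


def validate(request: MultiJoinRequest) -> str:
    """Return the form the request uses, or raise ValueError if it is invalid."""
    # The shorthand form is in play if either of its two fields is populated.
    uses_shorthand = bool(request.source_ids) or bool(request.columns_to_match)
    uses_inputs = bool(request.multi_join_inputs)

    # Exclusive-or: exactly one of the two forms must be present.
    if uses_shorthand and uses_inputs:
        raise ValueError(
            "provide either source_ids/columns_to_match or multi_join_inputs, not both"
        )
    if not uses_shorthand and not uses_inputs:
        raise ValueError("no input tables were provided")

    if uses_inputs:
        return "multi_join_inputs"

    # Assumption: the shorthand form needs at least one source table.
    if not request.source_ids:
        raise ValueError("columns_to_match was provided without source_ids")
    return "source_ids"
```

A oneof would reject the mixed case at serialization time instead, at the cost of wrapping `source_ids` and `columns_to_match` in an extra nested message.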

  // Specifies the range match parameters as a parseable string. Providing `range_match` in the GRPC call is the
  // alternative to detailed range match parameters provided in the `range_start_column`, `range_start_rule`,
  // `right_range_column`, `range_end_rule`, and `left_end_column` fields.
  string range_match = 11;
Member

Do we want to expose this to io.deephaven.qst.table.RangeJoinTable? (The answer may be no.)

Contributor Author

I don't think so. This adds a shortcut, an additional way to populate the RangeJoinMatch, but doesn't replace it.

  repeated string columns_to_add = 4;
}

message MultiJoinTablesRequest {
Member

Is now the time to plumb this through io.deephaven.qst / io.deephaven.api.TableOperations? If not, please create a ticket.

Contributor Author

Created #5158

devinrsmith
devinrsmith previously approved these changes Feb 16, 2024
Member

@niloc132 niloc132 left a comment

Approved of proto structure, documentation.

@lbooker42 lbooker42 enabled auto-merge (squash) February 16, 2024 20:32
@lbooker42 lbooker42 merged commit a957322 into deephaven:main Feb 16, 2024
19 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Feb 16, 2024
@lbooker42 lbooker42 deleted the lab-mj-grpc branch June 26, 2024 19:59
Successfully merging this pull request may close these issues.

GRPC support for MultiJoin feature