Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: User Defined Aggregate Functions #70

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

wangrunji0408
Copy link

No description provided.

Signed-off-by: Runji Wang <wangrunji0408@163.com>
Signed-off-by: Runji Wang <wangrunji0408@163.com>
Signed-off-by: Runji Wang <wangrunji0408@163.com>
Signed-off-by: Runji Wang <wangrunji0408@163.com>
rfcs/0000-user-defined-aggregate-functions.md Outdated Show resolved Hide resolved
rfcs/0000-user-defined-aggregate-functions.md Outdated Show resolved Hide resolved
rfcs/0000-user-defined-aggregate-functions.md Outdated Show resolved Hide resolved
rfcs/0000-user-defined-aggregate-functions.md Outdated Show resolved Hide resolved
rfcs/0000-user-defined-aggregate-functions.md Outdated Show resolved Hide resolved
rfcs/0000-user-defined-aggregate-functions.md Outdated Show resolved Hide resolved
Signed-off-by: Runji Wang <wangrunji0408@163.com>
Signed-off-by: Runji Wang <wangrunji0408@163.com>
@wangrunji0408 wangrunji0408 force-pushed the wrj/user-defined-aggregate-function branch from 8c48501 to 7a90897 Compare August 3, 2023 07:44

User defined functions are running in a separate process called UDF server. The RisingWave kernel communicates with the UDF server through Arrow Flight RPC to exchange data.

To avoid trouble in fault tolerance, the UDF server should be **stateless** even if the aggregate function is stateful. This means the aggregate state should be maintained by the kernel. However, exchanging the state in each RPC call (batch aggregation) is not efficient, especially when the state is large. On the other hand, kernel doesn't need to know the state or the aggregate result before a barrier arrives. Therefore, we **create a streaming RPC `doExchange` to call the aggregate function, and sync the state every time a barrier arrives**. When the connection to the UDF server is broken, the executor raises an error and may retry or pause the stream.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found a fatal problem that the current design doesn't take group aggregation into consideration. We still have to send intermediate states for each group.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants