Add external function #12
Signed-off-by: TennyZhuang <zty0826@gmail.com>
@skyzh Can you ignore the HackMD bot in the CLA checker?
fbb2b18 to ef77e42
- Can you also provide more context on why the interface should be async?
- How should we understand cross-chunk concurrency? Do you mean that if udf1 and udf2 both need chunk A, they share a single read of that chunk?
Take |
Do we allow external functions in places other than projection (e.g., filter)?
Co-authored-by: Bugen Zhao <i@bugenzhao.com>
Yes, I hope everywhere.
Before introducing external functions to the system, we should also consider determinism problems. Consider the following query:
Suppose we get the following events from the table:
If the UDF is not deterministic, suppose it returns 1 -> 11 for the first message and 1 -> 10 for the second message. Many things will go wrong. The filter will emit:
which ignores the second message, the delete. One feasible solution is to have a special kind of projection called MaterializeProjection, where we materialize data by pk, so that the UDF is computed only once, the first time the generated event enters the system.
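To make the failure mode concrete, here is a minimal Python sketch. The query shape (a filter keeping rows where `udf(v) > 10`), the event tuples, and the names `make_flaky_udf`, `project_then_filter`, and `MaterializeProjection.apply` are all assumptions for illustration and are not taken from the RFC; only the 1 -> 11 and 1 -> 10 outputs follow the numbers above.

```python
from typing import Callable, Dict, List, Tuple

# An event is (op, pk, value); op is "+" for insert and "-" for delete.
Event = Tuple[str, int, int]


def make_flaky_udf() -> Callable[[int], int]:
    """Hypothetical non-deterministic UDF: 1 -> 11 on the first call, 1 -> 10 afterwards."""
    calls = 0

    def udf(x: int) -> int:
        nonlocal calls
        calls += 1
        return x + 10 if calls == 1 else x + 9

    return udf


class MaterializeProjection:
    """Assumed sketch: store the UDF result per pk on insert, replay it on delete."""

    def __init__(self, udf: Callable[[int], int]) -> None:
        self._udf = udf
        self._results: Dict[int, int] = {}

    def apply(self, op: str, pk: int, value: int) -> int:
        if op == "+":
            # The UDF is evaluated exactly once, when the row first arrives.
            self._results[pk] = self._udf(value)
            return self._results[pk]
        # On delete, reuse the stored result and drop it from the state.
        return self._results.pop(pk)


def project_then_filter(
    events: List[Event], project: Callable[[str, int, int], int]
) -> List[Event]:
    """Project each event, then keep only rows whose projected value is > 10."""
    out: List[Event] = []
    for op, pk, value in events:
        projected = project(op, pk, value)
        if projected > 10:
            out.append((op, pk, projected))
    return out


events: List[Event] = [("+", 1, 1), ("-", 1, 1)]

# Naive projection: the UDF is re-evaluated for the delete and disagrees.
flaky = make_flaky_udf()
naive = project_then_filter(events, lambda op, pk, v: flaky(v))

# Materialized projection: the delete carries the same value as the insert.
materialized = project_then_filter(events, MaterializeProjection(make_flaky_udf()).apply)

print(naive)         # the delete is filtered out, so downstream never retracts the row
print(materialized)  # both the insert and the matching delete pass the filter
```

In this sketch the cost of correctness is state: the materialized variant has to keep one stored result per live pk.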
That is to say, |
Good catch! I have no idea how to resolve that. How about only allowing UDFs on append-only streams?
There may be RPC calls during the evaluation.
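As a rough illustration of why RPC-backed evaluation favors an async interface, the sketch below overlaps the calls for several chunks instead of waiting on them one by one. `call_udf_rpc` is a hypothetical stand-in that models network latency with a sleep, and reading "cross-chunk concurrency" as "several chunks in flight at once" is an assumption, not something the RFC text here confirms.

```python
import asyncio
from typing import List


async def call_udf_rpc(chunk: List[int]) -> List[int]:
    """Hypothetical remote UDF call; the sleep stands in for network latency."""
    await asyncio.sleep(0.01)
    return [x + 1 for x in chunk]


async def eval_chunks(chunks: List[List[int]]) -> List[List[int]]:
    # All calls are started together; gather returns results in input order.
    return await asyncio.gather(*(call_udf_rpc(chunk) for chunk in chunks))


results = asyncio.run(eval_chunks([[1, 2], [3, 4], [5]]))
print(results)
```

With a blocking interface the three calls would take roughly three latencies in sequence; here they overlap, so the total is close to one.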
An append-only stream should be fine. But this means that people can only apply UDFs before aggregation / TopN, which seems to be a common case in SQL. Not sure about the use case.
A basic idea: users must explicitly declare that their UDFs are deterministic; otherwise, the UDFs can only be applied to an append-only stream or must be materialized implicitly. I will add this to the RFC.
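A minimal sketch of how that rule could be checked at planning time. The names `UdfInfo`, `Placement`, and `plan_udf` are hypothetical and not part of the RFC; the default of `deterministic=False` reflects the requirement that users opt in explicitly.

```python
from dataclasses import dataclass
from enum import Enum


class Placement(Enum):
    INLINE = "inline"            # evaluate the UDF directly in the operator
    MATERIALIZE = "materialize"  # store results by pk so deletes replay them


@dataclass(frozen=True)
class UdfInfo:
    name: str
    deterministic: bool = False  # users must opt in explicitly


def plan_udf(udf: UdfInfo, input_append_only: bool) -> Placement:
    """Decide how a UDF may be evaluated on a given input stream."""
    if udf.deterministic or input_append_only:
        # Re-evaluation is either safe or never happens (no deletes/updates).
        return Placement.INLINE
    # Non-deterministic UDF on a stream with retractions: materialize implicitly.
    return Placement.MATERIALIZE


print(plan_udf(UdfInfo("f", deterministic=True), input_append_only=False))
print(plan_udf(UdfInfo("g"), input_append_only=True))
print(plan_udf(UdfInfo("g"), input_append_only=False))
```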
Can we merge this RFC, as most of its proposals have been implemented now?
You can give an approval.
Signed-off-by: TennyZhuang zty0826@gmail.com