Discussion: Force specifying watermark for sources #6750
Comments
The follow-up question is: do we still allow specifying the watermark in TVF format?
Agree with the Flink part.
I want to remove it eventually.
I'm OK with this change. So we will still emit watermarks even if there is no need for state cleaning, for example in a simple filter query? That's fine with me if it doesn't have much effect on performance.
By enforcing, do you mean CREATE SOURCE will fail if the user doesn't specify a watermark? What happens to a source without a timestamp column?
Yes. I want to prohibit that case.
Or, could we ask users to define a watermark on the I also thought maybe we can have default based on
Agree. In fact, Flink enforces this by only allowing the watermark column to be used in window functions.
This proposal can be divided into two parts
I think 1 is not controversial and we could start working on it now.
Updated. Processing time should be an alternative.
We had agreed on this in the last discussion. Let's do it later on @yuhao-su
Background
In our current design, a watermark is introduced with a TVF (table-valued function), e.g.
and then
See RFC#2: The WatermarkFilter and StreamSort operator for details.
Proposal
According to RFC#4: Unify the materialized source and table, in the future, we will only have 2 kinds of objects in databases: table and source. I think in most cases,
So, shall we enforce this by requiring users to define a watermark when creating a source? For example,
(Adapted from the Flink documentation.)
Then, users don't have to define the `Watermark` when writing queries.

Note that this proposal is all about syntax rather than implementation.

Alternatively, users can choose to use processing time as the time column. In this case, processing time has an implicit perfect watermark, i.e. events are perfectly ordered and the watermark latency is 0.
See also #7209.
Users must choose one of the two options above when creating a source; otherwise, an error is displayed to guide them.
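
To make the "choose one of the two" rule concrete, here is a minimal Python sketch of the semantics only. It is not RisingWave code: `SourceDef`, its fields, and the `watermark` helper are hypothetical names made up for illustration, and the fixed-delay event-time watermark is an assumed strategy (in the style of Flink's bounded out-of-orderness), not something this proposal specifies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceDef:
    """Hypothetical source definition; not RisingWave's actual catalog type."""

    name: str
    # Option 1 (event time): a timestamp column plus an allowed delay in seconds.
    watermark_column: Optional[str] = None
    watermark_delay_s: float = 0.0
    # Option 2 (processing time): the time column is the arrival time itself.
    use_processing_time: bool = False

    def __post_init__(self) -> None:
        has_event_time = self.watermark_column is not None
        # Reject both "neither option chosen" and "both options chosen".
        if has_event_time == self.use_processing_time:
            raise ValueError(
                f"source {self.name!r}: define exactly one of a watermark "
                "column or processing time"
            )


def watermark(src: SourceDef, max_time_seen: float) -> float:
    """Return the watermark given the largest time-column value seen so far."""
    if src.use_processing_time:
        # Implicit perfect watermark: events are ordered, so latency is 0.
        return max_time_seen
    # Event time (assumed strategy): tolerate events up to the delay late.
    return max_time_seen - src.watermark_delay_s
```

With this sketch, `SourceDef("s")` raises an error because neither option is chosen, while a source declared with `use_processing_time=True` gets a watermark equal to the latest time seen.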
Benefits
The first and foremost reason is that the original `watermark` syntax is not compulsory, so users are likely to ignore it. Enforcing it when defining sources makes writing queries easier for users: no more unfamiliar syntax.

Another benefit of avoiding the `watermark` syntax is that, strictly speaking, `watermark` is not a SQL function, because all SQL functions operate on the data itself, while `watermark` operates on some kind of metadata. These obscure semantics make it weird to execute `watermark` in batch queries.