[SPARK-18669][SS][DOCS] Update Apache docs for Structured Streaming regarding watermarking and status #16294
Test build #70182 has started for PR 16294 at commit
Test build #70218 has finished for PR 16294 at commit
| <div data-lang="scala" markdown="1"> | ||
|
|
||
| {% highlight scala %} | ||
| import spark.implicits._ |
Just curious, will there be a complete example in the examples folder? In documents like ML and SQL, the code is cited from the example files instead of being hard-coded in the document. Thanks!
I wasn't planning to add examples in this PR, to keep this just about docs. I am happy to see simple examples contributed as PRs from the community.
Hi, @tdas. Is it possible for you to update this statement for Apache Spark 2.1?

Will do. I am rewriting parts of this PR right now.

Test build #70500 has finished for PR 16294 at commit

Test build #70525 has finished for PR 16294 at commit

Test build #70524 has finished for PR 16294 at commit

^^ The last failure on build 70524 is for an older commit. I reverted the culprit change and build 70525 passed.
zsxwing
left a comment
Looks good overall. Just some nits.
| } | ||
| @Overrides void onQueryProgress(QueryProgressEvent queryProgress) { | ||
| System.out.println("Query made progress: " + queryProgress.queryStatus); | ||
| System.out.println("Query made progress: " + queryProgress.progress); |
nit: queryProgress.progress()
| } | ||
| @Overrides void onQueryTerminated(QueryTerminatedEvent queryTerminated) { | ||
| System.out.println("Query terminated: " + queryTerminated.queryStatus.name); | ||
| System.out.println("Query terminated: " + queryTerminated.id); |
nit: queryTerminated.id()
|
|
||
| @Overrides void onQueryStarted(QueryStartedEvent queryStarted) { | ||
| System.out.println("Query started: " + queryTerminated.queryStatus.name); | ||
| System.out.println("Query started: " + queryTerminated.id); |
nit: queryStarted.id()
|
|
||
| override def onQueryStarted(queryStarted: QueryStartedEvent): Unit = { | ||
| println("Query started: " + queryTerminated.queryStatus.name) | ||
| println("Query started: " + queryTerminated.id) |
nit: queryTerminated -> queryStarted
| {% highlight python %} | ||
| query = ... // a StreamingQuery | ||
| query = ... # a StreamingQuery | ||
| print(query.progress) |
nit: lastProgress
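For the Python snippet this nit refers to, a minimal sketch of reading both values could look like the following. It assumes PySpark 2.1 or later, where `query.status` and `query.lastProgress` are properties returning plain dicts and `lastProgress` is `None` until the first progress update. The helper name `summarize_query` and the keys it picks are hypothetical, chosen only for illustration.

```python
import json


def summarize_query(query):
    """Return a one-line JSON summary of a streaming query's current state.

    `query` is assumed to behave like a PySpark StreamingQuery: `status` is a
    dict, and `lastProgress` is a dict or None (no progress reported yet).
    """
    status = query.status
    progress = query.lastProgress
    summary = {
        "message": status.get("message"),
        "isDataAvailable": status.get("isDataAvailable"),
        # numInputRows is only known once a progress update exists.
        "numInputRows": None if progress is None else progress.get("numInputRows"),
    }
    return json.dumps(summary, sort_keys=True)
```

On a started query this would be called as `print(summarize_query(query))`. Using `.get` keeps the sketch tolerant of keys that may be absent in a given Spark version.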
| latency.getBatch.source: 20 | ||
| Sink status - MySink | ||
| Committed offsets: [1, -] | ||
| System.out.println(query.status); |
nit: query.status()
| StreamingQuery query = ... | ||
|
|
||
| System.out.println(query.status); | ||
| System.out.println(query.progress); |
nit: query.lastProgress()
| {% highlight scala %} | ||
| val query: StreamingQuery = ... | ||
|
|
||
| println(query.progress) |
nit: query.lastProgress
| preserves all data in the Result Table. | ||
| </td> | ||
| </tr> | ||
| <tr> |
empty row?
same reason as the other place.
| </tr> | ||
| </tr> | ||
| <tr> | ||
| <td></td> |
empty row?
Empty row to make sure that there is a line after the table when rendered. Otherwise it looks odd.
| } | ||
| @Overrides void onQueryProgress(QueryProgressEvent queryProgress) { | ||
| System.out.println("Query made progress: " + queryProgress.progress); | ||
| System.out.println("Query made progress: " + queryProgress.lastProgress()); |
nit: the method is QueryProgressEvent.progress
yep. replace fail.
Test build #70529 has finished for PR 16294 at commit

LGTM pending tests

Test build #70530 has finished for PR 16294 at commit
…egarding watermarking and status

## What changes were proposed in this pull request?

- Extended the Window operation section with code snippet and explanation of watermarking
- Extended the Output Mode section with a table showing the compatibility between query type and output mode
- Rewrote the Monitoring section with updated jsons generated by StreamingQuery.progress/status
- Updated API changes in the StreamingQueryListener example

TODO
- [x] Figure showing the watermarking

## How was this patch tested?

N/A

## Screenshots

### Section: Windowed Aggregation with Event Time

<img width="927" alt="screen shot 2016-12-15 at 3 33 10 pm" src="https://cloud.githubusercontent.com/assets/663212/21246197/0e02cb1a-c2dc-11e6-8816-0cd28d8201d7.png">

<img width="929" alt="screen shot 2016-12-15 at 3 33 46 pm" src="https://cloud.githubusercontent.com/assets/663212/21246202/1652cefa-c2dc-11e6-8c64-3c05977fb3fc.png">

----------------------------

### Section: Output Modes

----------------------------

### Section: Monitoring

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #16294 from tdas/SPARK-18669.

(cherry picked from commit 092c672)
Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
This version is built from the docs source code generated by applying apache/spark#16294 to v2.1.0 (so, other changes in branch 2.1 will not affect the doc).
What changes were proposed in this pull request?
TODO
How was this patch tested?
N/A
Screenshots
Section: Windowed Aggregation with Event Time
Section: Output Modes
Section: Monitoring