Description
Apache Iceberg version
main (development)
Query engine
Spark
Please describe the bug 🐞
In the rate limiting for Structured Streaming PR (#4479), the unit test at https://github.com/apache/iceberg/pull/4479/files#diff-26782bf5c27f69e5cc9cd4a9363f601a97d1c9f97fe0c1a7fb927da7c60c014fR169 states that the stream gets stuck if the number of records exceeds SparkReadOptions.STREAMING_MAX_ROWS_PER_MICRO_BATCH. This is a major blocker to using this feature: once the stream is stuck, it never advances, even when new snapshots arrive.
E.g.:
STREAMING_MAX_ROWS_PER_MICRO_BATCH = 2
Snapshot1 - 2 records, 1 data file - Read fully in micro-batch 1
Snapshot2 - 3 records, 1 data file - Can never be read, because 3 records > STREAMING_MAX_ROWS_PER_MICRO_BATCH (stuck forever)
Snapshot3 - 3 records - Never read, because the stream cannot advance past Snapshot2
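To make the scenario above concrete, here is a minimal Python model of it. It assumes the planner admits whole data files in snapshot order and stops before any file that would push the batch total over the row limit, never splitting a file. `DataFile`, `plan_micro_batch`, and `run_stream` are hypothetical names for illustration only; this is not Iceberg's actual Spark micro-batch code.

```python
from dataclasses import dataclass
from typing import List, Tuple


# Hypothetical model: each snapshot adds exactly one data file with this many records.
@dataclass(frozen=True)
class DataFile:
    snapshot_id: int
    record_count: int


def plan_micro_batch(files: List[DataFile], start: int, max_rows: int) -> Tuple[List[DataFile], int]:
    """Admit whole files in order while the running row total stays within max_rows.

    Returns the admitted files and the new offset (index of the next unread file).
    Files are never split, so a file with more than max_rows records is never admitted.
    """
    admitted: List[DataFile] = []
    rows = 0
    offset = start
    while offset < len(files) and rows + files[offset].record_count <= max_rows:
        rows += files[offset].record_count
        admitted.append(files[offset])
        offset += 1
    return admitted, offset


def run_stream(files: List[DataFile], max_rows: int, num_triggers: int) -> Tuple[List[List[int]], int]:
    """Run several triggers and record which snapshot ids each micro-batch read."""
    offset = 0
    batches: List[List[int]] = []
    for _ in range(num_triggers):
        batch, offset = plan_micro_batch(files, offset, max_rows)
        batches.append([f.snapshot_id for f in batch])
    return batches, offset


# Snapshot1 = 2 records, Snapshot2 = 3 records, Snapshot3 = 3 records; limit = 2 rows.
files = [DataFile(1, 2), DataFile(2, 3), DataFile(3, 3)]
batches, offset = run_stream(files, max_rows=2, num_triggers=4)
print(batches)  # [[1], [], [], []]
print(offset)   # 1
```

After the first micro-batch reads Snapshot1, every later trigger plans an empty batch and the offset stays at Snapshot2, so Snapshot3 is never reached either.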
Please let me know whether this is intended behavior or whether it is expected to change.