
Rate limiting feature for structured streaming #7885

@namrathamyske

Description


Apache Iceberg version

main (development)

Query engine

Spark

Please describe the bug 🐞

This is about the rate limiting for structured streaming PR #4479.
According to the unit test at https://github.com/apache/iceberg/pull/4479/files#diff-26782bf5c27f69e5cc9cd4a9363f601a97d1c9f97fe0c1a7fb927da7c60c014fR169, the stream gets stuck if the number of records exceeds SparkReadOptions.STREAMING_MAX_ROWS_PER_MICRO_BATCH. This is a major blocker to consuming this feature: once the stream is stuck, it never advances again, even when new snapshots arrive.

E.g., with STREAMING_MAX_ROWS_PER_MICRO_BATCH = 2:
Snapshot1 (2 records, 1 data file): read fully in Microbatch-1
Snapshot2 (3 records, 1 data file): can never be read, since 3 records > STREAMING_MAX_ROWS_PER_MICRO_BATCH, so the stream is stuck forever
Snapshot3 (3 records): never read, because the stream never advances past Snapshot2
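
For context, a minimal sketch of the setup that hits this (assuming a SparkSession with an Iceberg catalog already configured; the table name db.events and the checkpoint path are hypothetical placeholders):

```java
import org.apache.iceberg.spark.SparkReadOptions;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;

public class RateLimitRepro {
  public static void main(String[] args) throws Exception {
    SparkSession spark = SparkSession.builder()
        .appName("iceberg-rate-limit-repro")
        .getOrCreate();

    // Cap each micro-batch at 2 rows via the rate limiting option from PR #4479.
    Dataset<Row> stream = spark.readStream()
        .format("iceberg")
        .option(SparkReadOptions.STREAMING_MAX_ROWS_PER_MICRO_BATCH, "2")
        .load("db.events");  // hypothetical table

    StreamingQuery query = stream.writeStream()
        .format("console")
        .option("checkpointLocation", "/tmp/iceberg-rate-limit-chk")  // placeholder path
        .start();

    query.awaitTermination();
  }
}
```

With the limit set to 2, appending a snapshot like Snapshot2 above (3 records in a single data file) is enough to reproduce the stall: the query keeps running but never emits another micro-batch.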
Please let me know if this is intended behavior or whether it is expected to change.

@singhpk234 @jackye1995 @RussellSpitzer @rdblue
