[Bug]: BigtableSource "Desired bundle size 0 bytes must be greater than 0" #28793
Comments
Thanks for reporting the issue. Interesting edge case! Since it has been root-caused, would you mind opening a PR with the fix? Thanks for the contribution!
The scenario that makes this edge case pop up for us is running pipelines in CI against the Bigtable emulator, the key of course being tiny tables. I'll open a PR!
The question is whether it should be fixed in:
- `BoundedReadEvaluatorFactory`, or
- `BigtableSource`.
Based on this understanding, I would vote for fixing `BigtableSource` for now.
Fixes apache#28793 in the way suggested in apache#28793 (comment):
- `BoundedReadEvaluatorFactory#getInitialInputs` may still calculate a `bytesPerBundle` of `0`; but
- `BigtableSource#split` will interpret it as `1` in order not to violate the `checkArgument()` in https://github.com/apache/beam/blob/71c8459633ec86e576eca080a26be9f42474ecb2/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIO.java#L1623-L1626
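A minimal sketch of the clamping this PR describes, assuming the fix amounts to replacing a requested size of `0` with `1` before the key range is split. The class and method names are hypothetical; this is not the merged Beam code.

```java
// Hypothetical sketch of the fix described above; not the merged Beam code.
final class ZeroBundleSizeFix {
    private ZeroBundleSizeFix() {}

    // Interpret a requested bundle size of 0 (or less) as 1, so that the
    // downstream "must be greater than 0" precondition can no longer fail.
    static long effectiveBundleSizeBytes(long desiredBundleSizeBytes) {
        return Math.max(1L, desiredBundleSizeBytes);
    }

    public static void main(String[] args) {
        long requested = 10L / 32L; // 0, the value that can be requested for a tiny table
        System.out.println(effectiveBundleSizeBytes(requested)); // 1
        System.out.println(effectiveBundleSizeBytes(4096L));     // 4096, unchanged
    }
}
```

Clamping inside `BigtableSource` leaves `BoundedReadEvaluatorFactory` untouched, which matches the option the thread votes for.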
In short, if `targetParallelism` > `BigtableSource#getEstimatedSizeBytes`, then `desiredBundleSizeBytes` is set to `0`, which makes `BigtableSource#splitKeyRangeIntoBundleSizedSubranges` angry.
What happened?
Imagine a case where, in beam/runners/direct-java/src/main/java/org/apache/beam/runners/direct/BoundedReadEvaluatorFactory.java (lines 215 to 217 in 282d027), `targetParallelism` is `32` and `source.getEstimatedSizeBytes(options)` is `10`. Then `bytesPerBundle` will be `0`, so line 217 of the same file (in 282d027) will be called with the values:
`source.split(0L, options)`
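The arithmetic behind this can be sketched in isolation, assuming `bytesPerBundle` is the truncating `long` division `estimatedBytes / targetParallelism`, which is consistent with the 32-and-10 example above. `BundleSizeMath` and `bytesPerBundle` are hypothetical names, not Beam's code.

```java
// Hypothetical sketch of the bundle-size arithmetic; not the actual Beam source.
final class BundleSizeMath {
    private BundleSizeMath() {}

    // Java long division truncates toward zero, so any size estimate smaller
    // than the target parallelism yields a desired bundle size of 0 bytes.
    static long bytesPerBundle(long estimatedBytes, long targetParallelism) {
        return estimatedBytes / targetParallelism;
    }

    public static void main(String[] args) {
        System.out.println(bytesPerBundle(10, 32)); // 0: the case from this issue
        System.out.println(bytesPerBundle(32, 32)); // 1: equal values still give 1
        System.out.println(bytesPerBundle(64, 32)); // 2
    }
}
```

Under this assumption, the 0-byte request occurs exactly when the estimate is strictly smaller than `targetParallelism`.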
In `OffsetBasedSource#split`, this desired-0-sized split is handled: beam/sdks/java/core/src/main/java/org/apache/beam/sdk/io/OffsetBasedSource.java, lines 115 to 116 in 282d027.
But `BigtableSource#split` does not seem to handle the desired-0-sized split (beam/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIO.java, lines 1328 to 1333 in 282d027), so a few frames down the road from `BigtableSource#split` you'll end up violating this `checkArgument` in `BigtableSource#splitKeyRangeIntoBundleSizedSubranges`: beam/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIO.java, lines 1623 to 1626 in 71c8459.
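The failure can be reproduced without Bigtable using a stand-in for that precondition. This sketch assumes the linked `checkArgument` is Guava's `Preconditions.checkArgument`, which throws `IllegalArgumentException`, and reuses the message from this issue's title; the class and method names are hypothetical.

```java
// Hypothetical stand-in for the precondition in
// BigtableSource#splitKeyRangeIntoBundleSizedSubranges; inlined so it runs without Guava.
final class BundleSizePrecondition {
    private BundleSizePrecondition() {}

    static void checkDesiredBundleSize(long desiredBundleSizeBytes) {
        if (desiredBundleSizeBytes <= 0) {
            // Same exception type that Guava's checkArgument throws.
            throw new IllegalArgumentException(String.format(
                "Desired bundle size %s bytes must be greater than 0", desiredBundleSizeBytes));
        }
    }

    public static void main(String[] args) {
        checkDesiredBundleSize(1); // passes silently
        try {
            checkDesiredBundleSize(0); // the value requested for a tiny table
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Desired bundle size 0 bytes must be greater than 0
        }
    }
}
```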
Issue Priority
Priority: 2 (default / most bugs should be filed as P2)