Skip to content

Commit

Permalink
Force minimum value of desiredBundleSizeBytes to be 1
Browse files Browse the repository at this point in the history
Fixes apache#28793 in the way suggested in apache#28793 (comment):

- `BoundedReadEvaluatorFactory#getInitialInputs` may still calculate a
  `bytesPerBundle` of `0`; but
- `BigtableSource#split` will interpret it as `1` in order to not
  violate the `checkArgument()` in
  https://github.com/apache/beam/blob/71c8459633ec86e576eca080a26be9f42474ecb2/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIO.java#L1623-L1626
  • Loading branch information
joar committed Oct 4, 2023
1 parent 282d027 commit 7ed57ef
Show file tree
Hide file tree
Showing 2 changed files with 6 additions and 1 deletion.
1 change: 1 addition & 0 deletions CHANGES.md
Original file line number Diff line number Diff line change
Expand Up @@ -86,6 +86,7 @@
* Fixed exception chaining issue in GCS connector (Python) ([#26769](https://github.com/apache/beam/issues/26769#issuecomment-1700422615)).
* Fixed streaming inserts exception handling, GoogleAPICallErrors are now retried according to retry strategy and routed to failed rows where appropriate rather than causing a pipeline error (Python) ([#21080](https://github.com/apache/beam/issues/21080)).
* Fixed a bug in Python SDK's cross-language Bigtable sink that mishandled records that don't have an explicit timestamp set: [#28632](https://github.com/apache/beam/issues/28632).
* Fixed "Desired bundle size 0 bytes must be greater than 0" in BigtableSource when you have more cores than bytes to read [#28793](https://github.com/apache/beam/issues/28793).


## Security Fixes
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -1326,7 +1326,11 @@ public List<BigtableSource> split(long desiredBundleSizeBytes, PipelineOptions o
long maximumNumberOfSplits = 4000;
long sizeEstimate = getEstimatedSizeBytes(options);
desiredBundleSizeBytes =
Math.max(sizeEstimate / maximumNumberOfSplits, desiredBundleSizeBytes);
Math.max(
sizeEstimate / maximumNumberOfSplits,
// BoundedReadEvaluatorFactory may provide us with a desiredBundleSizeBytes of 0
// https://github.com/apache/beam/issues/28793
Math.max(1, desiredBundleSizeBytes));

// Delegate to testable helper.
List<BigtableSource> splits =
Expand Down

0 comments on commit 7ed57ef

Please sign in to comment.