Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request]: expose SolaceIO watermark policy and parameters #32107

Closed
2 of 17 tasks
ppawel opened this issue Aug 8, 2024 · 7 comments
Closed
2 of 17 tasks

[Feature Request]: expose SolaceIO watermark policy and parameters #32107

ppawel opened this issue Aug 8, 2024 · 7 comments

Comments

@ppawel
Copy link

ppawel commented Aug 8, 2024

What would you like to happen?

Currently, there is no public API in SolaceIO to configure watermark behavior - classes like WatermarkPolicy and WatermarkParameters are package-private.

In effect, the user is required to use the default/hardcoded watermark idle threshold of 30 seconds (see org.apache.beam.sdk.io.solace.read.WatermarkParameters#STANDARD_WATERMARK_IDLE_DURATION_THRESHOLD). This is too long for my use case as a stuck watermark causes downstream windows to be stuck.

Example: I have a 10-second session window right after the SolaceIO.Read transform which needs the watermark to be moving more often than 30 seconds, otherwise the elements are not moving through the whole pipeline as expected if there are no incoming Solace messages.

CC @bzablocki

Issue Priority

Priority: 2 (default / most feature requests should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
@bzablocki
Copy link
Contributor

Thanks @ppawel for reporting! I can work on that

@bzablocki
Copy link
Contributor

.take-issue

@bzablocki
Copy link
Contributor

@ppawel Is the 'idleDurationThreshold' the only parameter you'd like to have control over or do you want to be able to override the entire WatermarkPolicy object?
I can either

  1. expose the withWatermarkIdleDurationThreshold() configuration to the SolaceIO.read().withWatermarkIdleDurationThreshold() or
  2. expose the WatermarkPolicy, where you would have to override the getWatermark() and update() methods if you'd opt in for your own implementation. You would set it then via SolaceIO.read().withWatermarkPolicy(...).

I'm leaning towards ad 1, where the user would have one simple setting to tweak, which seems consistent with other connectors.

@ppawel
Copy link
Author

ppawel commented Aug 8, 2024

@bzablocki For me personally, either one is fine, though option 1 seems cleaner and should be enough for my needs 👍

@bzablocki
Copy link
Contributor

Waiting for a review in #32109

@ppawel
Copy link
Author

ppawel commented Sep 30, 2024

@bzablocki I guess this can be closed now? We are using this new parameter and seems to be working fine.

@bzablocki
Copy link
Contributor

Yes, feel free to close this. Thank you!

@ppawel ppawel closed this as completed Sep 30, 2024
@github-actions github-actions bot added this to the 2.60.0 Release milestone Sep 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants