KAFKA-7080: replace numSegments with segmentInterval#5257
KAFKA-7080: replace numSegments with segmentInterval#5257guozhangwang merged 4 commits intoapache:trunkfrom vvcephei:KAFKA-7080-fix-window-store-builder
Conversation
There was a problem hiding this comment.
fixes a lurking potential bug: using a mutable field in equals/hashcode.
There was a problem hiding this comment.
This test didn't actually test... anything.
There was a problem hiding this comment.
This test duplicates an test a little earlier in this file.
guozhangwang
left a comment
There was a problem hiding this comment.
LGTM overall except a question about 60_000L default value.
There was a problem hiding this comment.
Could we use a const for 60_000L than putting the magic number scattered in multiple places?
There was a problem hiding this comment.
Yes, we certainly can. Similar to my comment on the maintain-ms default, I think it's valuable for each decoupled component to have an independent constant. But in this case, I think we would actually benefit from having the constant, as it would have more than one use.
There was a problem hiding this comment.
@guozhangwang While thinking about this, I had a more fundamental question... Why is 1 minute the minimum segment interval?
Through some archaeology, I've determined that the old segment file naming scheme encoded the segment as a date, with one minute granularity. So I think that the restriction was originally put into place for this reason.
Nowadays, though, I don't think there is any such restriction. Perhaps instead of encoding the magic value, we can just banish it instead, by lifting this restriction?
There was a problem hiding this comment.
Honestly I cannot remember the exact reasons why we use 1minute as the magic number...
Just using retentionPeriod / num.segments (enforcing num.segments > 2) should be fine.
There was a problem hiding this comment.
If it's ok with you, I'd like to tackle this issue in a follow-on PR.
I've got the code to loosen the restriction and edited the tests to prove that everything still works with other segmentIntervals, but the diff is quite large and orthogonal to KIP-319.
Can we just leave it as a magic number for now?
There was a problem hiding this comment.
Why do we want to not expose Windows.DEFAULT_MAINTAIN_DURATION_MS any more?
There was a problem hiding this comment.
In this particular usage, it's simply that SessionWindows, which is otherwise unrelated to Windows, doesn't need the coupling. If we want to expose the default values as constants, I'd want to put one on Windows and another on SessionWindows. The premise is that if components don't need to be coupled, they shouldn't be.
The second order question is whether we should expose the default constants at all. I'd ask the inverse, why should we? Looking at the diff, you can see that that DEFAULT_MAINTAIN_DURATION_MS was only used three places, all tests. One of the tests was invalid, and the other two work perfectly fine without it. I removed it on the premise that if some state doesn't need to be exposed, it shouldn't be.
There was a problem hiding this comment.
This is a meta comment: currently we are leaking some segmented bytes store specific information into the general interface, such as segmentInterval, retainDuplicates. Suppose a user customized window bytes store, it does not necessarily need to be segmented at all. In addition, the windowSize and retentionPeriod are used today only because library's specific caching / logging layer relies on it, which is not generally applicable.
Ideally all of these parameters should not need to be instantiated by users, we could consider refactoring them a bit for better custom store support.
There was a problem hiding this comment.
Yes, I agree. It seems that working to encapsulate these parameters would make the interfaces more general and easier to plug in new implementations for.
I also wonder whether the cache really needs to be segmented at all.
We have recently created https://issues.apache.org/jira/browse/KAFKA-7106 to further clean up the Window definition. Would you like a more general ticket to tidy up these interfaces, or would you rather just bear them in mind?
There was a problem hiding this comment.
7106 should be good enough. This is just a self-reminder than an actual comment :)
There was a problem hiding this comment.
nit: I just realized that in SessionBytesStoreSupplier we have segmentIntervalMs, maybe we should make them consistent? Such nit renaming does not need to recast the vote.
There was a problem hiding this comment.
Oops! Thanks for the catch.
|
LGTM! cc @bbejeck for another view. |
| false), | ||
| INTEGER_SERDE, BYTE_SERDE); | ||
| false, | ||
| 60_000L |
There was a problem hiding this comment.
This is the arithmetic solution of the previous formula max((retention / (numSegments - 1)), 60,000) = max((1000 * 3 / (3 - 1)), 60,000) = max(3,000/2, 60,000) = 60,000. I point this out because it's important not to change any parameters on the benchmark.
|
|
||
| builder.addStateStore( | ||
| Stores.windowStoreBuilder( | ||
| Stores.persistentWindowStore("store", 30_000L, 10_000L, false), |
There was a problem hiding this comment.
in cases like this where we didn't actually care how many segments or what segment interval, I substituted in the new persistentWindowStore that doesn't take any of the segment parameters.
| numSegments, | ||
| windowSize, | ||
| retainDuplicates), | ||
| Stores.persistentWindowStore(windowName, retentionPeriod, windowSize, retainDuplicates, segmentInterval), |
There was a problem hiding this comment.
Oops! I found a couple of spots where we were using numSegments when we meant segmentInterval. Clearly, the segment size doesn't matter for these tests (because it was wrong), so maybe we should use the non-parameterized version....
There was a problem hiding this comment.
Actually, I re-read this section. It was indeed incorrect before, but it still worked because the segmentInterval we specified (3ms) divides the one we tested for (60_000ms). Nevertheless, the tests do depend on a specified segmentInterval, so I think we should specify it here.
|
@guozhangwang , I think this PR is ready for a final review. |
|
Merged to trunk. @vvcephei thanks! Please feel free to update KIP-319's adoption version on the wiki. |
See also KIP-319. Replace number-of-segments parameters with segment-interval-ms parameters in various places. The latter was always the parameter that several components needed, and we accidentally supplied the former because it was the one available. Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
See also KIP-319.
Replace number-of-segments parameters with segment-interval-ms parameters in various places. The latter was always the parameter that several components needed, and we accidentally supplied the former because it was the one available.
Committer Checklist (excluded from commit message)