Skip to content

KAFKA-7080: replace numSegments with segmentInterval#5257

Merged
guozhangwang merged 4 commits intoapache:trunkfrom
vvcephei:KAFKA-7080-fix-window-store-builder
Jul 2, 2018
Merged

KAFKA-7080: replace numSegments with segmentInterval#5257
guozhangwang merged 4 commits intoapache:trunkfrom
vvcephei:KAFKA-7080-fix-window-store-builder

Conversation

@vvcephei
Copy link
Contributor

@vvcephei vvcephei commented Jun 20, 2018

See also KIP-319.

Replace number-of-segments parameters with segment-interval-ms parameters in various places. The latter was always the parameter that several components needed, and we accidentally supplied the former because it was the one available.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixes a lurking potential bug: using a mutable field in equals/hashcode.

Copy link
Contributor Author

@vvcephei vvcephei Jun 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test didn't actually test... anything.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test duplicates an test a little earlier in this file.

Copy link
Contributor

@guozhangwang guozhangwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall except a question about 60_000L default value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use a const for 60_000L than putting the magic number scattered in multiple places?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we certainly can. Similar to my comment on the maintain-ms default, I think it's valuable for each decoupled component to have an independent constant. But in this case, I think we would actually benefit from having the constant, as it would have more than one use.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@guozhangwang While thinking about this, I had a more fundamental question... Why is 1 minute the minimum segment interval?

Through some archaeology, I've determined that the old segment file naming scheme encoded the segment as a date, with one minute granularity. So I think that the restriction was originally put into place for this reason.

Nowadays, though, I don't think there is any such restriction. Perhaps instead of encoding the magic value, we can just banish it instead, by lifting this restriction?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly I cannot remember the exact reasons why we use 1minute as the magic number...

Just using retentionPeriod / num.segments (enforcing num.segments > 2) should be fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's ok with you, I'd like to tackle this issue in a follow-on PR.

I've got the code to loosen the restriction and edited the tests to prove that everything still works with other segmentIntervals, but the diff is quite large and orthogonal to KIP-319.

Can we just leave it as a magic number for now?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we want to not expose Windows.DEFAULT_MAINTAIN_DURATION_MS any more?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this particular usage, it's simply that SessionWindows, which is otherwise unrelated to Windows, doesn't need the coupling. If we want to expose the default values as constants, I'd want to put one on Windows and another on SessionWindows. The premise is that if components don't need to be coupled, they shouldn't be.

The second order question is whether we should expose the default constants at all. I'd ask the inverse, why should we? Looking at the diff, you can see that that DEFAULT_MAINTAIN_DURATION_MS was only used three places, all tests. One of the tests was invalid, and the other two work perfectly fine without it. I removed it on the premise that if some state doesn't need to be exposed, it shouldn't be.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a meta comment: currently we are leaking some segmented bytes store specific information into the general interface, such as segmentInterval, retainDuplicates. Suppose a user customized window bytes store, it does not necessarily need to be segmented at all. In addition, the windowSize and retentionPeriod are used today only because library's specific caching / logging layer relies on it, which is not generally applicable.

Ideally all of these parameters should not need to be instantiated by users, we could consider refactoring them a bit for better custom store support.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree. It seems that working to encapsulate these parameters would make the interfaces more general and easier to plug in new implementations for.

I also wonder whether the cache really needs to be segmented at all.

We have recently created https://issues.apache.org/jira/browse/KAFKA-7106 to further clean up the Window definition. Would you like a more general ticket to tidy up these interfaces, or would you rather just bear them in mind?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

7106 should be good enough. This is just a self-reminder than an actual comment :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I just realized that in SessionBytesStoreSupplier we have segmentIntervalMs, maybe we should make them consistent? Such nit renaming does not need to recast the vote.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops! Thanks for the catch.

@guozhangwang
Copy link
Contributor

LGTM! cc @bbejeck for another view.

false),
INTEGER_SERDE, BYTE_SERDE);
false,
60_000L
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the arithmetic solution of the previous formula max((retention / (numSegments - 1)), 60,000) = max((1000 * 3 / (3 - 1)), 60,000) = max(3,000/2, 60,000) = 60,000. I point this out because it's important not to change any parameters on the benchmark.


builder.addStateStore(
Stores.windowStoreBuilder(
Stores.persistentWindowStore("store", 30_000L, 10_000L, false),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in cases like this where we didn't actually care how many segments or what segment interval, I substituted in the new persistentWindowStore that doesn't take any of the segment parameters.

numSegments,
windowSize,
retainDuplicates),
Stores.persistentWindowStore(windowName, retentionPeriod, windowSize, retainDuplicates, segmentInterval),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops! I found a couple of spots where we were using numSegments when we meant segmentInterval. Clearly, the segment size doesn't matter for these tests (because it was wrong), so maybe we should use the non-parameterized version....

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I re-read this section. It was indeed incorrect before, but it still worked because the segmentInterval we specified (3ms) divides the one we tested for (60_000ms). Nevertheless, the tests do depend on a specified segmentInterval, so I think we should specify it here.

@vvcephei vvcephei changed the title KAFKA-7080: DRAFT: fix WindowStoreBuilder KAFKA-7080: replace numSegments with segmentInterval Jun 28, 2018
Copy link
Member

@bbejeck bbejeck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for the PR @vvcephei

@mjsax mjsax added the streams label Jun 29, 2018
@vvcephei
Copy link
Contributor Author

vvcephei commented Jul 2, 2018

@guozhangwang , I think this PR is ready for a final review.

@guozhangwang guozhangwang merged commit 64fff8b into apache:trunk Jul 2, 2018
@guozhangwang
Copy link
Contributor

Merged to trunk. @vvcephei thanks!

Please feel free to update KIP-319's adoption version on the wiki.

@vvcephei vvcephei deleted the KAFKA-7080-fix-window-store-builder branch July 2, 2018 23:22
ying-zheng pushed a commit to ying-zheng/kafka that referenced this pull request Jul 6, 2018
See also KIP-319.

Replace number-of-segments parameters with segment-interval-ms parameters in various places. The latter was always the parameter that several components needed, and we accidentally supplied the former because it was the one available.

Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
@mjsax mjsax added the kip Requires or implements a KIP label Jun 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kip Requires or implements a KIP streams

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants