[improve][broker] Optimize seeking by timestamp #22152
Conversation
Good work, @dao-jun. I think we need test cases to cover the various corner cases with this solution.
Let's avoid cherry-picking this change so that we don't introduce regressions for current maintenance branches.
I agree with you, but before I complete the tests, I need to know if this change is reasonable.
The general approach looks good to me.
@dao-jun one challenge is that the ledger's timestamp comes from the broker's clock, but the seek uses the message publish time, which comes from the client's (publisher's) clock. There might be additional corner cases because of this.
@lhotari Even though users could use Perhaps as you said, we need to add a new configuration to determine use But, how about using
I think it's better if the broker admin can decide this. A possible mitigation would be to track the publish timestamps of the first and last messages, and possibly also the min and max, and store them in the ledger metadata. If this metadata is present, it could be used in the initial ledger selection.
it might be a problem, 'cause we don't know the timestamp of
I would assume that in the current interface, it's always the publish timestamp.
@lhotari
I don't think that this should be exposed to clients at all. It should be a broker-level config. The alternative is to add the publish-time min, max, first, and last to the ledger metadata, which would always provide the correct answer.
Message publishTime is a broker-level concept; passing it to ML doesn't make sense. Adding a new config is better.
I disagree that "passing it to ML doesn't make sense". It's already passed in and used to evaluate the seek. The ledger metadata would need to contain the min, max, first, and last publishTime to reliably optimize the current solution. It most likely needs a PIP, but making a PIP is not a problem.
@lhotari So I believe adding a new configuration to brokerConf is a better way. What do you think?
@lhotari I've updated the implementation, PTAL
What is the status of this PR? I'm interested in it. I can help implement it too.
Replaced by #22792
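For readers following the thread, the ledger-metadata idea raised above (storing the min and max publish time per ledger and using them for the initial ledger selection) could look roughly like the sketch below. It is only an illustration under stated assumptions: `LedgerPublishRange`, its fields, and `firstCandidateLedger` are hypothetical names, not part of Pulsar's or BookKeeper's actual metadata or API, and this is not the implementation from this PR. Because publish times come from client clocks, the sketch does not assume they increase from one ledger to the next.

```java
import java.util.List;
import java.util.OptionalLong;

// Sketch only: LedgerPublishRange and its fields are hypothetical and are not
// part of Pulsar's or BookKeeper's actual ledger metadata.
class LedgerSeekSketch {

    static final class LedgerPublishRange {
        final long ledgerId;
        final long minPublishTime; // smallest client publish time seen in the ledger
        final long maxPublishTime; // largest client publish time seen in the ledger

        LedgerPublishRange(long ledgerId, long minPublishTime, long maxPublishTime) {
            this.ledgerId = ledgerId;
            this.minPublishTime = minPublishTime;
            this.maxPublishTime = maxPublishTime;
        }
    }

    /**
     * Returns the first ledger (in list order) that can contain a message
     * with publishTime >= timestamp, or empty if no ledger can.
     * A linear scan is used because publish times come from client clocks,
     * so maxPublishTime is not guaranteed to increase from ledger to ledger.
     */
    static OptionalLong firstCandidateLedger(List<LedgerPublishRange> ledgers, long timestamp) {
        for (LedgerPublishRange ledger : ledgers) {
            if (ledger.maxPublishTime >= timestamp) {
                return OptionalLong.of(ledger.ledgerId);
            }
        }
        return OptionalLong.empty();
    }
}
```

The entry-level search inside the selected ledger would still be needed afterwards; the sketch only covers narrowing down the starting ledger.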
Fixes #22129
Motivation
Optimize seeking by timestamp
Modifications
Add a new method to ManagedCursor.
Documentation
doc
doc-required
doc-not-needed
doc-complete
Matching PR in forked repository
PR in forked repository: dao-jun#8