Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement an optimization for get and exist to skip holders by their timestamp #702

Closed
ikopylov opened this issue Dec 24, 2022 · 0 comments · Fixed by #740
Closed

Implement an optimization for get and exist to skip holders by their timestamp #702

ikopylov opened this issue Dec 24, 2022 · 0 comments · Fixed by #740

Comments

@ikopylov
Copy link
Member

The current implementation reads data from all holders, which can be slow when there are many records for the desired key. We can implement an optimization that skips holders whose maximum record timestamp in that holder is less than the record timestamp already read. The proper implementation requires tracking max and min record timestamp within holder and tracked by issue #694. But we can implement it faster in simplified and unsafe way.

The problem is that now the holder's end_timestamp is calculated as start_timestamp + self.settings.timestamp_period_as_secs(). If suddenly the value of timestamp_period in the config changes, the calculation will be incorrect.

With proposing unsafe optimization, we can try to estimate the maximum period of creation of existed holders, multiply it by some safe coefficient, and also adjust it by the value of the new config parameter.

First, we should introduce new config parameter skip_holders_by_timestamp_step_when_reading. By default it should be None (means that optimization is turned off), but it can be configured with the human readable time period (e.g. 1d).

Procedure of calculating of safe timestamp step:

  1. If skip_holders_by_timestamp_step_when_reading is None then return None
  2. Find maximum period from existed holders:
    2.1. Copy start_timestamp from holders collection to a vector
    2.2. Order vector by ascending
    2.3. Add current timestamp to an end
    2.4. Find maximum difference between 2 timestamps in that vector
  3. Calculate safe step by formula max(settings.skip_holders_by_timestamp_step_when_reading, max(2 * max_period_from_existed_holders, 5 * settings.timestamp_period)) + 1h

Then within get and exist functions we can implement smart skipping by compare timestamp of already read record and end_timestamp of current holder with addition of safe timestamp step (https://github.com/qoollo/bob/blob/master/bob-backend/src/pearl/group.rs#L264):

if max_timestamp == None || safe_timestamp_step == None || holder.end_timestamp() + safe_timestamp_step.unwrap() >= max_timestamp {
  // Process this holder
}

This issue should be done after #621 will be merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants