
RFC: ManagedRangeCache's Last-access-time Datastructure - RangeLru #27

Closed
wants to merge 2 commits

Conversation

jon-chuang

No description provided.

@jon-chuang jon-chuang changed the title The Design of ManagedRangeCache's Last-access-time Oracle - RangeLru RFC: ManagedRangeCache's Last-access-time Oracle - RangeLru Dec 1, 2022
@jon-chuang jon-chuang changed the title RFC: ManagedRangeCache's Last-access-time Oracle - RangeLru RFC: ManagedRangeCache's Last-access-time Datastructure - RangeLru Dec 1, 2022

At any point in time, the `RangeCache` owns a contiguous range.

In line with our non-OOM and memory-efficiency requirements, we require a way to evict ranges that have not been requested by the operator for a long time. We choose to do so in a fashion similar to `ManagedLru`: based on the last-accessed time of a given subrange (the subranges being a partition of the total range owned by the `RangeCache`).

Member

How are the subranges divided? Considering that the rows kept in DynamicFilter can be inserted/deleted continuously, it may be non-trivial 🤔

Member

Btw, why not maintain the epoch per row?

Author

> Btw, why not maintain the epoch per row?

This would violate the integrity of the range cache, since it guarantees to cache all rows within a certain range.

Author

> How are the subranges divided?

It's quite straightforward. Every time some range is accessed, it is considered a new subrange with a given last-accessed time.

Author

> Btw, why not maintain the epoch per row?

Actually, on further thought, perhaps a `ManagedLru` cache could work, since it would update all the rows in a range with a new timestamp on every read. It would achieve the same thing, but less efficiently, since it has to track each row rather than the access of a logical range.

# The Design of `ManagedRangeCache`'s Last-access-time Oracle - `RangeLru`

## Summary
The design of a mechanism for evicting ranges last accessed before a certain epoch, for the `DynamicFilter` operator.

Member

Does the "access" include both writes and reads?

Note that for DynamicFilter the writes are from the outer side and the reads are from the inner side (i.e. the 1-row side)

Member

After thinking for a while, I suppose that the "access" should only include reads from the inner side (moving the "row pointer" up or down), because the cache is designed to be used for reads only. The outer side is basically identical to a MaterializeExecutor.

Author

Yes, access only indicates reads.

```rust
fn evict_before(&self, epoch: u64) -> Vec<ScalarRange>
```

1. The main data structure (main index) is a `BTreeMap { epoch -> Vec<ScalarRange> }` to facilitate the `evict_before` operation.

Member

Since there should be only one access for each epoch (right?), the `Vec` is not needed here


Notes:
1. `evict_before` needs to delete all ranges below the watermark from both the main index and the secondary index
2. `access` needs to overwrite any range that overlaps with the accessed range in both the main and secondary indexes. In the former, it will modify the endpoints of the ranges that it partially overlaps with, and delete ranges that it fully overlaps with. In the latter, it will delete the left bounds that it fully overlaps with.
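The two operations described in the notes above can be sketched as follows. This is a hypothetical, simplified model, not the actual implementation: half-open `(i64, i64)` pairs stand in for `ScalarRange`, and a linear scan over the main index stands in for the secondary index; only the main index's `BTreeMap { epoch -> Vec<ScalarRange> }` shape follows the RFC.

```rust
use std::collections::BTreeMap;

/// Half-open range [start, end), standing in for `ScalarRange`.
type R = (i64, i64);

/// Hypothetical, simplified sketch of the `RangeLru` main index.
#[derive(Default)]
struct RangeLru {
    /// Main index: epoch -> subranges last accessed at that epoch.
    by_epoch: BTreeMap<u64, Vec<R>>,
}

impl RangeLru {
    /// Record a read of `(start, end)` at `epoch`. Parts of older subranges
    /// that overlap the accessed range are trimmed away or removed, and the
    /// accessed range becomes a single new subrange at `epoch`.
    fn access(&mut self, epoch: u64, (start, end): R) {
        for ranges in self.by_epoch.values_mut() {
            let mut kept = Vec::new();
            for &(s, e) in ranges.iter() {
                if e <= start || s >= end {
                    kept.push((s, e)); // no overlap: keep as-is
                } else {
                    // overlap: keep only the non-overlapping fragments
                    if s < start {
                        kept.push((s, start));
                    }
                    if e > end {
                        kept.push((end, e));
                    }
                }
            }
            *ranges = kept;
        }
        self.by_epoch.retain(|_, v| !v.is_empty());
        self.by_epoch.entry(epoch).or_default().push((start, end));
    }

    /// Evict and return all subranges last accessed strictly before `epoch`.
    fn evict_before(&mut self, epoch: u64) -> Vec<R> {
        // `split_off` keeps the keys >= `epoch`; everything below is evicted.
        let keep = self.by_epoch.split_off(&epoch);
        std::mem::replace(&mut self.by_epoch, keep)
            .into_values()
            .flatten()
            .collect()
    }
}
```

Running the example raised later in this thread (access 10~30 at epoch 1, 20~30 at epoch 2, 20~25 at epoch 3, then `evict_before(3)`) against this sketch evicts `(10, 20)` and `(25, 30)`: the evicted set need not be one contiguous range, while the remaining range (`20~25`) stays contiguous.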

Member

What will happen if the last-access time of subranges are not ordered? For example

1.  0 ~ 10  last access epoch = 100
2. 10 ~ 20  last access epoch = 200
3. 20 ~ 30  last access epoch = 150
4. 30 ~ 40  last access epoch = 300

We want to `evict_before(160)`, and 1 & 3 will be evicted, but 2 & 4 are not a continuous subrange.

Author

Every contiguous range of epochs maps onto a contiguous range in the cache. So this scenario will never happen.

In the impl, we include many sanity checks regarding this continuity property.

Member
@fuyufjh Dec 7, 2022

> Every contiguous range of epochs maps onto a contiguous range in the cache

Wow, I suddenly understand what you mean!

For DynamicFilter, the ranges accessed must always be contiguous - the end of the range at epoch n must be the start of the range at epoch n+1. Your algorithm is based on this fact.

@yuhao-su Dec 7, 2022

I'm having another case:

1. 10~30  epoch = 1
2. 20~30  epoch = 2
3. 20~25  epoch = 3

If we evict epochs 1 and 2, then only (20~25) will be kept and (10~20, 25~30) will be evicted. (Not sure if I got this correctly)

Author
@jon-chuang Dec 7, 2022

> I'm having another case

Oh, indeed. The range that is evicted is not a contiguous range in this case, so I made an error.

So `evict_before` can indeed return two contiguous ranges (when the remaining range splits up the old range).

However, the property that the remaining range is always contiguous is still correct.

@fuyufjh
Member

fuyufjh commented Dec 5, 2022

Please allow me also to propose my idea here. It's too simple to be written into an RFC.

First of all, caching is helpful but not critical to DynamicFilter. I'll show this by discussing 2 cases:

  1. Simple aggregation as the inner side. As we know, the aggregation will only emit the result on every barrier.
  2. NowExecutor as the inner side. Similarly, the current time will only be updated when barriers come.

In summary, for the major cases of DynamicFilter, changes only come on barriers, so they are not as frequent as in other operators like HashAgg or HashJoin. Obviously, a cache miss per barrier has much less impact on performance.

Second, based on that conclusion, I would prefer either no caching or a very simple caching implementation.

(lot of deleted words)

After thinking for a while, I felt that even fixed-size caching is not trivial to implement, so I would like to propose no caching at all.

@jon-chuang
Author

jon-chuang commented Dec 7, 2022

> so I would like to propose no caching at all.

Indeed, it may be ok to rely on the block cache.

An operator-side cache incurs memory (plus code and runtime) overhead that can only be justified in a random-read scenario, whereas ranges would probably display good block-level cache locality.

(Although I'm not sure if the range interface to storage incurs a network latency cost to scan S3 even if the relevant blocks are in cache.)

Furthermore, the caching mechanism here is designed for reads caused by a pointer that moves repeatedly up and down. It is completely ineffective for a TemporalFilter-like/monotonically increasing scenario where data is read back exactly once.
