-
Notifications
You must be signed in to change notification settings - Fork 591
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
container: introduce chunked vector #16257
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty cool.
I think it's worth adding reserve()
, and perhaps capacity()
.
This is a great idea, but I would strongly encourage you to share as much implementation and testing as possible with Proposal: you could still have the newly named Any new methods or improvements could be in frag vector, perhaps with a few exceptions (e.g., maybe |
Force push: Fold test rewrite and benchmark into existing fragmented_vector. Note to reviewers: please hold off, I'm still working on this. Will re-request reviews in a bit |
I took a stab at what it looks like to not use the inner Anyways I've chose not to do that here, but instead have introduced chunked_vector and kept the inner vectors. |
cee46cc
to
88b4196
Compare
Force push: rebase against dev because of wierd build failure about unused variables in code that hasn't been touched in months. |
Force push: remove unused fields the build was complaining about |
For some reason my PR started picking up on these fields as unused which causes build failures. So remove them. Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
Force push: add GTEST arg to new test |
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/44319#018d42e7-acac-4772-aa56-92bdfe17a52c ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/44319#018d430e-db3a-451e-afe7-7ede2be65abf ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/44319#018d430e-db2f-45d3-a577-764938e18765 ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/44336#018d4419-201a-48fa-9902-2e5e03d0a670 ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/44336#018d4419-2017-4646-bdb0-47be76a2856b ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/44434#018d5620-6516-4b52-83a4-7356fbca5c6c ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/44434#018d5631-d0b2-4fef-93ad-2ed5fa437b70 ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/44628#018d6856-b93e-43d5-85aa-55fb20ad2798 |
Currently shrink_to_fit doesn't amend _capacity, so the inconsistency checker breaks when this is used in an expanded test suite. We can't simply adjust that because `maybe_add_capacity` doesn't know how to fix the reservation of the last fragment. Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
Expand the test suite as well - this is what caught the previous "bug" although I think that only causes extra allocations and isn't actually breaking anything. Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
Initial results in markdown format: | test | iterations | median | mad | min | max | allocs | tasks | inst | | - | - | - | - | - | - | - | - | - | | VectorBenchTest_std_vector_int64_t_64.Sort | 621861 | 858.023ns | 0.249ns | 857.774ns | 858.403ns | 0.000 | 0.000 | 4197.2 | | VectorBenchTest_std_vector_int64_t_64.Fifo | 1831959 | 307.819ns | 0.369ns | 306.906ns | 315.131ns | 7.000 | 0.000 | 2353.0 | | VectorBenchTest_std_vector_int64_t_64.Lifo | 1857517 | 296.914ns | 0.049ns | 296.592ns | 296.963ns | 7.000 | 0.000 | 2094.0 | | VectorBenchTest_std_vector_int64_t_64.Fill | 1846128 | 296.901ns | 0.259ns | 296.153ns | 297.159ns | 7.000 | 0.000 | 2003.0 | | VectorBenchTest_std_vector_int64_t_64.RandomAccess | 186400 | 439.242ns | 0.101ns | 439.050ns | 439.616ns | 0.000 | 0.000 | 5266.0 | | VectorBenchTest_fragmented_vector_int64_t_64.Sort | 341678 | 2.162us | 2.248ns | 2.156us | 2.164us | 0.000 | 0.000 | 35443.6 | | VectorBenchTest_fragmented_vector_int64_t_64.Fifo | 1664893 | 340.922ns | 3.395ns | 336.754ns | 347.513ns | 2.000 | 0.000 | 2541.0 | | VectorBenchTest_fragmented_vector_int64_t_64.Lifo | 1704456 | 345.840ns | 0.128ns | 345.409ns | 345.968ns | 2.000 | 0.000 | 2009.0 | | VectorBenchTest_fragmented_vector_int64_t_64.Fill | 1773568 | 306.196ns | 0.519ns | 305.677ns | 307.849ns | 2.000 | 0.000 | 1378.0 | | VectorBenchTest_fragmented_vector_int64_t_64.RandomAccess | 144045 | 808.571ns | 0.340ns | 806.897ns | 810.017ns | 0.000 | 0.000 | 20279.0 | | VectorBenchTest_std_vector_sstring_64.Sort | 92992 | 3.375us | 4.492ns | 3.370us | 3.381us | 0.000 | 0.000 | 17911.8 | | VectorBenchTest_std_vector_sstring_64.Fifo | 1014603 | 504.909ns | 0.132ns | 504.737ns | 505.041ns | 56.252 | 0.000 | 6478.3 | | VectorBenchTest_std_vector_sstring_64.Lifo | 1017873 | 635.929ns | 0.672ns | 634.737ns | 636.691ns | 56.247 | 0.000 | 9099.8 | | VectorBenchTest_std_vector_sstring_64.Fill | 1021960 | 499.435ns | 0.102ns | 499.333ns | 499.955ns | 56.235 | 0.000 | 6213.1 | | VectorBenchTest_std_vector_sstring_64.RandomAccess | 82688 | 489.810ns | 0.549ns | 488.052ns | 490.747ns | 0.000 | 0.000 | 5262.0 | | VectorBenchTest_fragmented_vector_sstring_64.Sort | 79471 | 5.108us | 0.780ns | 5.107us | 5.117us | 0.000 | 0.000 | 53124.9 | | VectorBenchTest_fragmented_vector_sstring_64.Fifo | 1034088 | 484.560ns | 0.256ns | 484.055ns | 485.536ns | 51.242 | 0.000 | 6550.0 | | VectorBenchTest_fragmented_vector_sstring_64.Lifo | 1026044 | 612.407ns | 0.409ns | 611.551ns | 613.201ns | 51.254 | 0.000 | 8261.0 | | VectorBenchTest_fragmented_vector_sstring_64.Fill | 1096024 | 440.699ns | 0.575ns | 439.896ns | 441.366ns | 51.226 | 0.000 | 4875.9 | | VectorBenchTest_fragmented_vector_sstring_64.RandomAccess | 73778 | 865.995ns | 0.376ns | 865.584ns | 866.425ns | 0.000 | 0.000 | 20274.0 | | VectorBenchTest_std_vector_large_struct_64.Sort | 28093 | 3.290us | 3.995ns | 3.278us | 3.294us | 0.000 | 0.000 | 24874.6 | | VectorBenchTest_std_vector_large_struct_64.Fifo | 381820 | 1.336us | 8.468ns | 1.328us | 1.348us | 172.450 | 0.000 | 20307.6 | | VectorBenchTest_std_vector_large_struct_64.Lifo | 389338 | 1.780us | 1.125ns | 1.779us | 1.784us | 172.455 | 0.000 | 29956.8 | | VectorBenchTest_std_vector_large_struct_64.Fill | 382291 | 1.345us | 1.399ns | 1.338us | 1.348us | 172.416 | 0.000 | 20117.4 | | VectorBenchTest_std_vector_large_struct_64.RandomAccess | 26071 | 511.621ns | 0.415ns | 510.995ns | 512.432ns | 0.000 | 0.000 | 5266.0 | | VectorBenchTest_fragmented_vector_large_struct_64.Sort | 26862 | 5.190us | 2.528ns | 5.184us | 5.192us | 0.000 | 0.000 | 60374.6 | | VectorBenchTest_fragmented_vector_large_struct_64.Fifo | 417009 | 1.125us | 2.022ns | 1.123us | 1.130us | 167.340 | 0.000 | 18262.8 | | VectorBenchTest_fragmented_vector_large_struct_64.Lifo | 430846 | 1.554us | 1.574ns | 1.548us | 1.560us | 167.417 | 0.000 | 26969.9 | | VectorBenchTest_fragmented_vector_large_struct_64.Fill | 426661 | 1.082us | 2.236ns | 1.070us | 1.084us | 167.390 | 0.000 | 16661.7 | | VectorBenchTest_fragmented_vector_large_struct_64.RandomAccess | 26098 | 1.109us | 0.298ns | 1.109us | 1.110us | 0.000 | 0.000 | 26271.0 | | VectorBenchTest_std_vector_int64_t_10000.Sort | 3125 | 240.627us | 219.086ns | 240.359us | 240.846us | 0.000 | 0.000 | 1140047.9 | | VectorBenchTest_std_vector_int64_t_10000.Fifo | 127488 | 7.555us | 1.547ns | 7.549us | 7.562us | 15.000 | 0.000 | 152371.0 | | VectorBenchTest_std_vector_int64_t_10000.Lifo | 166602 | 5.710us | 1.791ns | 5.706us | 5.712us | 15.000 | 0.000 | 112368.0 | | VectorBenchTest_std_vector_int64_t_10000.Fill | 166161 | 5.719us | 4.030ns | 5.714us | 5.723us | 15.000 | 0.000 | 102309.0 | | VectorBenchTest_std_vector_int64_t_10000.RandomAccess | 11265 | 578.126ns | 2.257ns | 575.869ns | 585.891ns | 0.000 | 0.000 | 5266.0 | | VectorBenchTest_fragmented_vector_int64_t_10000.Sort | 1076 | 845.409us | 936.043ns | 844.362us | 846.644us | 0.000 | 0.000 | 13666281.1 | | VectorBenchTest_fragmented_vector_int64_t_10000.Fifo | 82221 | 11.848us | 7.116ns | 11.840us | 11.855us | 15.000 | 0.000 | 331858.1 | | VectorBenchTest_fragmented_vector_int64_t_10000.Lifo | 90748 | 10.748us | 0.408ns | 10.748us | 10.749us | 15.000 | 0.000 | 242348.1 | | VectorBenchTest_fragmented_vector_int64_t_10000.Fill | 138294 | 6.929us | 1.677ns | 6.927us | 6.931us | 15.000 | 0.000 | 151842.0 | | VectorBenchTest_fragmented_vector_int64_t_10000.RandomAccess | 11211 | 871.327ns | 0.253ns | 870.767ns | 872.816ns | 0.000 | 0.000 | 20279.0 | | VectorBenchTest_std_vector_sstring_10000.Sort | 455 | 1.065ms | 98.121ns | 1.065ms | 1.065ms | 0.000 | 0.000 | 6184226.4 | | VectorBenchTest_std_vector_sstring_10000.Fifo | 12242 | 49.417us | 183.526ns | 49.194us | 49.693us | 7693.974 | 0.000 | 988937.3 | | VectorBenchTest_std_vector_sstring_10000.Lifo | 11948 | 83.540us | 201.201ns | 83.294us | 84.008us | 7695.449 | 0.000 | 1628478.9 | | VectorBenchTest_std_vector_sstring_10000.Fill | 12664 | 46.303us | 111.620ns | 46.086us | 46.476us | 7718.569 | 0.000 | 950916.7 | | VectorBenchTest_std_vector_sstring_10000.RandomAccess | 871 | 729.175ns | 4.036ns | 712.643ns | 733.210ns | 0.000 | 0.000 | 5262.0 | | VectorBenchTest_fragmented_vector_sstring_10000.Sort | 346 | 1.739ms | 525.595ns | 1.738ms | 1.740ms | 0.000 | 0.000 | 19590219.1 | | VectorBenchTest_fragmented_vector_sstring_10000.Fifo | 13055 | 44.043us | 29.928ns | 43.891us | 44.218us | 7696.624 | 0.000 | 1022046.7 | | VectorBenchTest_fragmented_vector_sstring_10000.Lifo | 13340 | 74.668us | 70.483ns | 74.584us | 75.043us | 7687.319 | 0.000 | 1510438.4 | | VectorBenchTest_fragmented_vector_sstring_10000.Fill | 14191 | 37.701us | 134.520ns | 37.566us | 37.934us | 7738.071 | 0.000 | 765164.5 | | VectorBenchTest_fragmented_vector_sstring_10000.RandomAccess | 874 | 983.747ns | 1.429ns | 981.854ns | 985.834ns | 0.000 | 0.000 | 20274.1 | | VectorBenchTest_std_vector_large_struct_10000.Sort | 158 | 1.223ms | 753.449ns | 1.222ms | 1.224ms | 0.000 | 0.000 | 9232974.5 | | VectorBenchTest_std_vector_large_struct_10000.Fifo | 3287 | 187.146us | 117.684ns | 186.935us | 187.860us | 25786.828 | 0.000 | 3426736.0 | | VectorBenchTest_std_vector_large_struct_10000.Lifo | 3267 | 302.477us | 577.335ns | 300.767us | 304.888us | 25903.583 | 0.000 | 5573287.9 | | VectorBenchTest_std_vector_large_struct_10000.Fill | 3346 | 184.322us | 471.959ns | 183.659us | 184.839us | 25939.088 | 0.000 | 3409226.3 | | VectorBenchTest_std_vector_large_struct_10000.RandomAccess | 197 | 769.574ns | 8.680ns | 760.081ns | 794.858ns | 0.000 | 0.000 | 5266.0 | | VectorBenchTest_fragmented_vector_large_struct_10000.Sort | 142 | 1.963ms | 241.887ns | 1.962ms | 1.964ms | 0.000 | 0.000 | 23384934.2 | | VectorBenchTest_fragmented_vector_large_struct_10000.Fifo | 3904 | 136.533us | 736.243ns | 135.797us | 137.703us | 26011.799 | 0.000 | 2975962.3 | | VectorBenchTest_fragmented_vector_large_struct_10000.Lifo | 4054 | 248.171us | 1.129us | 247.042us | 251.449us | 26030.332 | 0.000 | 4970862.6 | | VectorBenchTest_fragmented_vector_large_struct_10000.Fill | 4017 | 129.728us | 490.590ns | 129.238us | 130.384us | 26028.086 | 0.000 | 2727463.1 | | VectorBenchTest_fragmented_vector_large_struct_10000.RandomAccess | 199 | 1.403us | 6.101ns | 1.397us | 1.416us | 0.000 | 0.000 | 26271.0 | | VectorBenchTest_std_vector_int64_t_1048576.Sort | 22 | 38.664ms | 31.955us | 38.619ms | 38.719ms | 0.000 | 0.000 | 169800697.5 | | VectorBenchTest_std_vector_int64_t_1048576.Fifo | 1389 | 719.860us | 79.396ns | 719.665us | 720.042us | 21.000 | 0.000 | 14424256.9 | | VectorBenchTest_std_vector_int64_t_1048576.Lifo | 1907 | 523.657us | 68.929ns | 523.588us | 523.885us | 21.000 | 0.000 | 10229949.4 | | VectorBenchTest_std_vector_int64_t_1048576.Fill | 1901 | 525.549us | 349.382ns | 525.174us | 525.945us | 21.000 | 0.000 | 9181290.4 | | VectorBenchTest_std_vector_int64_t_1048576.RandomAccess | 122 | 1.108us | 7.385ns | 1.099us | 1.153us | 0.000 | 0.000 | 5266.0 | | VectorBenchTest_fragmented_vector_int64_t_1048576.Sort | 7 | 140.207ms | 155.607us | 140.045ms | 140.824ms | 0.000 | 0.000 | 2212428804.2 | | VectorBenchTest_fragmented_vector_int64_t_1048576.Fifo | 817 | 1.216ms | 272.930ns | 1.215ms | 1.220ms | 1035.000 | 0.000 | 34750586.1 | | VectorBenchTest_fragmented_vector_int64_t_1048576.Lifo | 902 | 1.108ms | 244.687ns | 1.108ms | 1.109ms | 1035.000 | 0.000 | 25422362.1 | | VectorBenchTest_fragmented_vector_int64_t_1048576.Fill | 1405 | 705.132us | 1.262us | 702.095us | 706.412us | 1035.000 | 0.000 | 15875192.7 | | VectorBenchTest_fragmented_vector_int64_t_1048576.RandomAccess | 124 | 1.739us | 17.758ns | 1.699us | 1.804us | 0.000 | 0.000 | 20279.0 | Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
Add iterator invalidation checks to fragmented_vector, this is similar to absl::btree, whenever there is a mutation, we increment a generation and iterators can only be used in the same generation that they were created. These checks are only enabled in debug mode to prevent the overhead during releases. ``` [ RUN ] Vector.GenerationChecks ERROR 2024-01-24 12:06:40,763 [shard 0:main] assert - Assert failure: (/home/rockwood/Workspace/redpanda2/src/v/container/include/container/fragmented_vector.h:338) '_vec->_generation == _my_generation' Attempting to use an invalidated iterator. The corresponding fragmented_vector container has been mutated since this iterator was constructed. ``` Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
Force push: fix a bug in |
I am happy to do this, but would like to do it async since I may be OOO any day now. So if it's okay with you I would like to merge this as is, then start the thread with the team to see what people think of switching everything over to this. As for shrink to fit, there is only once usage of that method in the leadership balancer, I don't suspect it to be a big deal at the moment. There are ways to implement shrink to fit.
I think so, except for the situation where the performance is needed (you can't accept the indirection) and you know you will never have an oversized allocation. I think there are probably few if any use cases like this, but I don't know all the dark corners of Redpanda. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
I expect more than one passing review before merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
I expect more than one passing review before merge.
Thanks, I know @travisdowns is planning on looking too |
BOOST_REQUIRE(v._capacity < std::numeric_limits<size_t>::max() / 2); | ||
|
||
static AssertionResult validate(const fragmented_vector<T, S>& v) { | ||
if (v._size > v._next_fragment_needed_size) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a request for change, but does gtest not have handy assertions which can check > and then output a good error message which also includes the values that failed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really in "helper" validation, otherwise if you're in a TEST
macro you can directly use EXPECT_LT
, etc, but for building helpers you're on the hook for all the logic, which is a little annoying, but at least AssertionResult has good support for usage in EXPECT_TRUE(...)
.
// } | ||
} | ||
|
||
void reserve(size_t v) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you document this public method?
On vector reserve() has at least two purposes; (a) performance and (b) iterator and pointer stability: if you reserve space for N elements it is guaranteed that you can push back N elements without breaking pointer stability.
If have the same or different or no guarantee around that we should document it.
We should document what it does on the performance size too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think to make things easier conceptually (especially for new folks) we should make the exact same guarantee as std::vector
. To me this class should be "std::vector" but under the hood doesn't have oversized allocs.
void reserve(size_t v) { | ||
// For fixed size fragments we noop, as we already reserve the full size | ||
// of vector | ||
if constexpr (is_chunked_vector) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we just make this method available only on the fixed-size vector if it does nothing in the dynamic variant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean to reverse that? How would propose to do this? Make a subclass for the dynamic variant? I'm honestly hoping that the non-dynamic version goes away eventually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I meant to reverse that.
You could use SFINAE/enable_if but the easiest is probably just static_assert
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(see suggestion above)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm honestly hoping that the non-dynamic version goes away eventually.
hard agree
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a static assert for now.
@@ -320,15 +426,24 @@ class fragmented_vector { | |||
|
|||
private: | |||
void maybe_add_capacity() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So for chunked vector this relies on the first fragment going up to exactly the right amount without our explicit control?
Can't we just manage memory like we did before, with _capacity
but with an additional condition when there is 1 frag && is_chunked_vector?
This would "solve" the problem mentioned in shrink_to_fit, and the over-allocation in reserve()
which both can waste a fair amount of memory: having reserve
is a really nice feature of chunked_vector.
This would also solve the problem on relying on the exactly behavior of std::vector
to avoid overshoot: I'm not sure how possible that is especially as we add methods: sure it may double in general when growing, but with other types of mutation it seems hard to predict exactly what the vector will do (plus implementation may due some rounding/adjusting as we do ).
I'm thinking something along these lines:
void add_capacity() {
if constexpr (is_chunked_vector) {
if (
_frags.size() == 1 && _frags.back().capacity() < elems_per_frag) {
auto new_cap = std::min(
elems_per_frag, _frags.back().capacity() * 2);
_capacity += new_cap - _frags.back().capacity();
_frags.back().reserve(new_cap);
return;
} else if (_frags.empty()) {
_capacity = 10;
_frags.emplace_back().reserve(_capacity);
return;
}
}
_frags.emplace_back().reserve(elems_per_frag);
_capacity += elems_per_frag;
}
[[gnu::always_inline]] void maybe_add_capacity() {
if (unlikely(_size == _capacity)) {
add_capacity();
}
}
So yeah the downside is extra complexity in this method (but less elsewhere, maybe).
10 should be replaced by a constexpr
initial capacity which is like how many elements fit in 32 bytes or something like that since doing many allocations for the first elements, when they are small is quite wasteful since the memory savings is slow as there is already a significant fixed overhead in the structure itself.
This benchmarks about the same as the existing one (using Lifo and Fifo tests only), though some of that may be related to other changes I made (emplace_back()
over push_back(std::move(frag))
does save time).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So for chunked vector this relies on the first fragment going up to exactly the right amount without our explicit control?
Yupp. There are tests that the "right thing" happens, but agreed anything seems fragile that isn't re-creating the inner container to not be std::vector but something custom. That feels like a risker change that would probably take more care and time than I can devote at the moment.
but with other types of mutation it seems hard to predict exactly what the vector will do (plus implementation may due some rounding/adjusting as we do ).
Yeah in general there is not garentees by the stdlib, but practically we can test (or just not use the stdlib).
I've adapted this method, I think it is less error prone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would "solve" the problem mentioned in shrink_to_fit
Only when calling shrink_to_fit on _frags.size() == 1
, otherwise if you have 5 fragments allocated, call shrink to fit on the last and then push back, you could get an overalloc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rockwotj wrote:
Yupp. There are tests that the "right thing" happens, but agreed anything seems fragile that isn't re-creating the inner container to not be std::vector but something custom.
Right, I don't want you to re-create the inner vector, but I think you can handle it by managing capacity along the lines I suggested. Current push looks good on this aspect modulo comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments, the main thing is about the way we maintain the capacity of the first vector.
Introduce a fragmented_vector that doubles in size for the first chunk, then after that allocates full chunks. The chunk size is always fixed to our maximum recommended allocation size (128KB). This implmentation relies on the underlying vector using a 2x (elements) growth rate for it's underlying capacity, as to ensure we never overshoot our max alloc size. Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
This gives us a nice 2x speed up. I originally had kept the bounds checks in debug mode, but ASAN should be catching bad memory accesses in debug mode, so this seems simpler. Benchmarks show this brings fragmented_vector much closer to std::vector in terms of performance for use cases like sorting which does a lot of memory accesses. | test | iterations | median | mad | min | max | allocs | tasks | inst | | - | - | - | - | - | - | - | - | - | | VectorBenchTest_std_vector_int64_t_64.Sort | 619721 | 858.159ns | 1.013ns | 856.948ns | 862.491ns | 0.000 | 0.000 | 4197.4 | | VectorBenchTest_std_vector_int64_t_64.Fifo | 1885505 | 296.765ns | 0.041ns | 296.658ns | 296.954ns | 7.000 | 0.000 | 2353.0 | | VectorBenchTest_std_vector_int64_t_64.Lifo | 1900511 | 288.742ns | 0.237ns | 288.453ns | 289.003ns | 7.000 | 0.000 | 2094.0 | | VectorBenchTest_std_vector_int64_t_64.Fill | 1921621 | 285.968ns | 0.203ns | 285.764ns | 286.674ns | 7.000 | 0.000 | 2003.0 | | VectorBenchTest_std_vector_int64_t_64.RandomAccess | 186419 | 421.798ns | 0.119ns | 421.678ns | 422.006ns | 0.000 | 0.000 | 5266.0 | | VectorBenchTest_fragmented_vector_int64_t_64.Sort | 495515 | 1.263us | 1.857ns | 1.258us | 1.265us | 0.000 | 0.000 | 12735.3 | | VectorBenchTest_fragmented_vector_int64_t_64.Fifo | 1814534 | 312.678ns | 0.184ns | 312.092ns | 312.862ns | 2.000 | 0.000 | 2088.0 | | VectorBenchTest_fragmented_vector_int64_t_64.Lifo | 1809837 | 316.770ns | 0.253ns | 316.261ns | 317.568ns | 2.000 | 0.000 | 2009.0 | | VectorBenchTest_fragmented_vector_int64_t_64.Fill | 1903485 | 286.634ns | 0.076ns | 286.042ns | 286.765ns | 2.000 | 0.000 | 1378.0 | | VectorBenchTest_fragmented_vector_int64_t_64.RandomAccess | 152349 | 571.928ns | 0.104ns | 571.699ns | 572.054ns | 0.000 | 0.000 | 10271.0 | | VectorBenchTest_chunked_vector_int64_t_64.Sort | 490492 | 1.264us | 0.166ns | 1.263us | 1.266us | 0.000 | 0.000 | 12734.8 | | VectorBenchTest_chunked_vector_int64_t_64.Fifo | 1668653 | 359.417ns | 0.290ns | 358.853ns | 359.707ns | 8.000 | 0.000 | 3210.0 | | VectorBenchTest_chunked_vector_int64_t_64.Lifo | 1658706 | 367.894ns | 0.506ns | 367.374ns | 368.564ns | 8.000 | 0.000 | 3133.0 | | VectorBenchTest_chunked_vector_int64_t_64.Fill | 1709065 | 343.436ns | 0.115ns | 343.321ns | 343.971ns | 8.000 | 0.000 | 2500.0 | | VectorBenchTest_chunked_vector_int64_t_64.RandomAccess | 148856 | 578.044ns | 0.604ns | 575.469ns | 578.648ns | 0.000 | 0.000 | 10271.0 | | VectorBenchTest_std_vector_sstring_64.Sort | 95732 | 3.290us | 3.222ns | 3.285us | 3.294us | 0.000 | 0.000 | 17910.5 | | VectorBenchTest_std_vector_sstring_64.Fifo | 1036811 | 497.699ns | 0.235ns | 497.464ns | 499.942ns | 56.229 | 0.000 | 6476.7 | | VectorBenchTest_std_vector_sstring_64.Lifo | 1045502 | 614.899ns | 0.241ns | 613.723ns | 615.140ns | 56.242 | 0.000 | 9099.2 | | VectorBenchTest_std_vector_sstring_64.Fill | 1042004 | 492.807ns | 0.280ns | 492.325ns | 493.204ns | 56.238 | 0.000 | 6213.3 | | VectorBenchTest_std_vector_sstring_64.RandomAccess | 84856 | 481.779ns | 0.209ns | 481.570ns | 482.213ns | 0.000 | 0.000 | 5262.0 | | VectorBenchTest_fragmented_vector_sstring_64.Sort | 91172 | 3.871us | 1.650ns | 3.864us | 3.877us | 0.000 | 0.000 | 24548.4 | | VectorBenchTest_fragmented_vector_sstring_64.Fifo | 1066499 | 463.996ns | 0.204ns | 463.711ns | 464.439ns | 51.250 | 0.000 | 5587.5 | | VectorBenchTest_fragmented_vector_sstring_64.Lifo | 1056011 | 591.878ns | 0.246ns | 591.633ns | 592.681ns | 51.228 | 0.000 | 8258.0 | | VectorBenchTest_fragmented_vector_sstring_64.Fill | 1096842 | 441.743ns | 0.319ns | 441.182ns | 442.062ns | 51.225 | 0.000 | 4875.8 | | VectorBenchTest_fragmented_vector_sstring_64.RandomAccess | 77476 | 581.115ns | 0.578ns | 580.537ns | 1.008us | 0.000 | 0.000 | 10268.0 | | VectorBenchTest_chunked_vector_sstring_64.Sort | 90034 | 3.860us | 1.299ns | 3.858us | 3.864us | 0.000 | 0.000 | 24544.3 | | VectorBenchTest_chunked_vector_sstring_64.Fifo | 971086 | 553.555ns | 0.391ns | 552.819ns | 554.117ns | 57.215 | 0.000 | 7541.8 | | VectorBenchTest_chunked_vector_sstring_64.Lifo | 957865 | 691.826ns | 0.149ns | 691.193ns | 692.049ns | 57.227 | 0.000 | 10211.4 | | VectorBenchTest_chunked_vector_sstring_64.Fill | 1002811 | 525.656ns | 0.331ns | 525.299ns | 526.156ns | 57.243 | 0.000 | 6715.7 | | VectorBenchTest_chunked_vector_sstring_64.RandomAccess | 76972 | 570.100ns | 3.326ns | 566.773ns | 581.070ns | 0.000 | 0.000 | 10268.0 | | VectorBenchTest_std_vector_large_struct_64.Sort | 29219 | 3.273us | 1.440ns | 3.265us | 3.274us | 0.000 | 0.000 | 24874.6 | | VectorBenchTest_std_vector_large_struct_64.Fifo | 382700 | 1.354us | 0.382ns | 1.353us | 1.354us | 172.371 | 0.000 | 20348.4 | | VectorBenchTest_std_vector_large_struct_64.Lifo | 387417 | 1.811us | 0.572ns | 1.809us | 1.814us | 172.378 | 0.000 | 29969.9 | | VectorBenchTest_std_vector_large_struct_64.Fill | 382464 | 1.338us | 0.325ns | 1.336us | 1.339us | 172.386 | 0.000 | 20163.7 | | VectorBenchTest_std_vector_large_struct_64.RandomAccess | 27106 | 468.947ns | 0.211ns | 462.696ns | 469.159ns | 0.000 | 0.000 | 5266.0 | | VectorBenchTest_fragmented_vector_large_struct_64.Sort | 28921 | 3.902us | 0.474ns | 3.901us | 3.914us | 0.000 | 0.000 | 32531.2 | | VectorBenchTest_fragmented_vector_large_struct_64.Fifo | 426410 | 1.085us | 0.908ns | 1.083us | 1.088us | 167.426 | 0.000 | 17422.4 | | VectorBenchTest_fragmented_vector_large_struct_64.Lifo | 438059 | 1.527us | 1.565ns | 1.523us | 1.529us | 167.343 | 0.000 | 26984.5 | | VectorBenchTest_fragmented_vector_large_struct_64.Fill | 437154 | 1.034us | 1.030ns | 1.033us | 1.037us | 167.401 | 0.000 | 16710.4 | | VectorBenchTest_fragmented_vector_large_struct_64.RandomAccess | 27493 | 672.081ns | 0.348ns | 671.147ns | 672.773ns | 0.000 | 0.000 | 11271.0 | | VectorBenchTest_chunked_vector_large_struct_64.Sort | 28539 | 3.887us | 1.106ns | 3.885us | 3.890us | 0.000 | 0.000 | 32522.5 | | VectorBenchTest_chunked_vector_large_struct_64.Fifo | 366534 | 1.443us | 1.211ns | 1.441us | 1.446us | 173.427 | 0.000 | 21269.1 | | VectorBenchTest_chunked_vector_large_struct_64.Lifo | 383031 | 1.852us | 1.293ns | 1.850us | 1.853us | 173.419 | 0.000 | 30819.1 | | VectorBenchTest_chunked_vector_large_struct_64.Fill | 378747 | 1.370us | 2.722ns | 1.368us | 1.373us | 173.451 | 0.000 | 20561.1 | | VectorBenchTest_chunked_vector_large_struct_64.RandomAccess | 27171 | 683.280ns | 0.134ns | 682.664ns | 683.465ns | 0.000 | 0.000 | 11271.0 | | VectorBenchTest_std_vector_int64_t_10000.Sort | 3130 | 239.962us | 317.335ns | 239.604us | 240.447us | 0.000 | 0.000 | 1140047.5 | | VectorBenchTest_std_vector_int64_t_10000.Fifo | 128187 | 7.531us | 0.366ns | 7.531us | 7.532us | 15.000 | 0.000 | 152371.0 | | VectorBenchTest_std_vector_int64_t_10000.Lifo | 168081 | 5.681us | 0.461ns | 5.680us | 5.682us | 15.000 | 0.000 | 112368.0 | | VectorBenchTest_std_vector_int64_t_10000.Fill | 167748 | 5.690us | 0.736ns | 5.690us | 5.692us | 15.000 | 0.000 | 102309.0 | | VectorBenchTest_std_vector_int64_t_10000.RandomAccess | 11234 | 575.352ns | 0.385ns | 574.280ns | 575.921ns | 0.000 | 0.000 | 5266.0 | | VectorBenchTest_fragmented_vector_int64_t_10000.Sort | 2109 | 394.940us | 202.817ns | 394.671us | 395.142us | 0.000 | 0.000 | 3664438.3 | | VectorBenchTest_fragmented_vector_int64_t_10000.Fifo | 98692 | 9.837us | 2.170ns | 9.835us | 9.850us | 15.000 | 0.000 | 261844.0 | | VectorBenchTest_fragmented_vector_int64_t_10000.Lifo | 91114 | 10.706us | 0.342ns | 10.706us | 10.707us | 15.000 | 0.000 | 242348.1 | | VectorBenchTest_fragmented_vector_int64_t_10000.Fill | 138925 | 6.909us | 0.564ns | 6.906us | 6.911us | 15.000 | 0.000 | 151842.0 | | VectorBenchTest_fragmented_vector_int64_t_10000.RandomAccess | 11242 | 643.697ns | 0.752ns | 642.945ns | 645.730ns | 0.000 | 0.000 | 10271.0 | | VectorBenchTest_chunked_vector_int64_t_10000.Sort | 2106 | 393.988us | 5.436ns | 393.982us | 394.504us | 0.000 | 0.000 | 3664458.7 | | VectorBenchTest_chunked_vector_int64_t_10000.Fifo | 84284 | 11.591us | 1.012ns | 11.589us | 11.592us | 16.000 | 0.000 | 291948.1 | | VectorBenchTest_chunked_vector_int64_t_10000.Lifo | 79283 | 12.342us | 0.802ns | 12.340us | 12.344us | 16.000 | 0.000 | 272106.1 | | VectorBenchTest_chunked_vector_int64_t_10000.Fill | 113322 | 8.566us | 4.252ns | 8.561us | 8.570us | 16.000 | 0.000 | 181942.0 | | VectorBenchTest_chunked_vector_int64_t_10000.RandomAccess | 11072 | 662.164ns | 1.176ns | 660.987ns | 672.008ns | 0.000 | 0.000 | 10271.0 | | VectorBenchTest_std_vector_sstring_10000.Sort | 468 | 1.038ms | 653.985ns | 1.035ms | 1.039ms | 0.000 | 0.000 | 6183699.5 | | VectorBenchTest_std_vector_sstring_10000.Fifo | 12562 | 46.618us | 243.543ns | 46.147us | 46.889us | 7693.554 | 0.000 | 989071.9 | | VectorBenchTest_std_vector_sstring_10000.Lifo | 12527 | 79.748us | 83.624ns | 79.426us | 79.946us | 7697.765 | 0.000 | 1629339.0 | | VectorBenchTest_std_vector_sstring_10000.Fill | 12727 | 46.292us | 84.832ns | 45.903us | 57.945us | 7736.223 | 0.000 | 952464.1 | | VectorBenchTest_std_vector_sstring_10000.RandomAccess | 909 | 752.008ns | 6.272ns | 745.736ns | 761.407ns | 0.000 | 0.000 | 5262.0 | | VectorBenchTest_fragmented_vector_sstring_10000.Sort | 426 | 1.252ms | 491.293ns | 1.249ms | 1.252ms | 0.000 | 0.000 | 8534702.6 | | VectorBenchTest_fragmented_vector_sstring_10000.Fifo | 14012 | 38.489us | 66.605ns | 38.333us | 38.555us | 7716.408 | 0.000 | 873580.7 | | VectorBenchTest_fragmented_vector_sstring_10000.Lifo | 13401 | 74.579us | 90.594ns | 73.654us | 74.854us | 7734.678 | 0.000 | 1518865.6 | | VectorBenchTest_fragmented_vector_sstring_10000.Fill | 14210 | 37.138us | 195.759ns | 36.796us | 37.489us | 7724.522 | 0.000 | 764263.9 | | VectorBenchTest_fragmented_vector_sstring_10000.RandomAccess | 914 | 790.419ns | 0.744ns | 789.675ns | 792.869ns | 0.000 | 0.000 | 10268.0 | | VectorBenchTest_chunked_vector_sstring_10000.Sort | 426 | 1.244ms | 1.474us | 1.240ms | 1.246ms | 0.000 | 0.000 | 8541119.6 | | VectorBenchTest_chunked_vector_sstring_10000.Fifo | 12124 | 49.219us | 143.253ns | 48.958us | 49.363us | 7705.552 | 0.000 | 1016646.8 | | VectorBenchTest_chunked_vector_sstring_10000.Lifo | 11977 | 82.392us | 67.740ns | 82.072us | 82.459us | 7679.186 | 0.000 | 1653990.9 | | VectorBenchTest_chunked_vector_sstring_10000.Fill | 13630 | 41.786us | 521.551ns | 40.338us | 42.308us | 7704.601 | 0.000 | 886772.1 | | VectorBenchTest_chunked_vector_sstring_10000.RandomAccess | 909 | 783.381ns | 3.446ns | 779.935ns | 789.689ns | 0.000 | 0.000 | 10268.0 | | VectorBenchTest_std_vector_large_struct_10000.Sort | 163 | 1.221ms | 558.037ns | 1.220ms | 1.222ms | 0.000 | 0.000 | 9232913.6 | | VectorBenchTest_std_vector_large_struct_10000.Fifo | 3297 | 185.445us | 435.807ns | 185.009us | 186.338us | 25748.091 | 0.000 | 3423035.8 | | VectorBenchTest_std_vector_large_struct_10000.Lifo | 3324 | 300.556us | 1.159us | 298.931us | 304.622us | 25951.823 | 0.000 | 5584114.3 | | VectorBenchTest_std_vector_large_struct_10000.Fill | 3288 | 189.389us | 99.941ns | 187.857us | 189.769us | 25870.839 | 0.000 | 3403211.8 | | VectorBenchTest_std_vector_large_struct_10000.RandomAccess | 204 | 758.971ns | 5.123ns | 753.848ns | 774.863ns | 0.000 | 0.000 | 5266.0 | | VectorBenchTest_fragmented_vector_large_struct_10000.Sort | 159 | 1.468ms | 1.424us | 1.465ms | 1.470ms | 0.000 | 0.000 | 12256841.7 | | VectorBenchTest_fragmented_vector_large_struct_10000.Fifo | 3942 | 131.888us | 317.022ns | 130.926us | 132.409us | 26036.117 | 0.000 | 2837780.9 | | VectorBenchTest_fragmented_vector_large_struct_10000.Lifo | 3990 | 252.455us | 525.213ns | 249.613us | 252.980us | 26051.714 | 0.000 | 4976153.5 | | VectorBenchTest_fragmented_vector_large_struct_10000.Fill | 4048 | 128.603us | 545.802ns | 127.828us | 129.148us | 26114.617 | 0.000 | 2734305.9 | | VectorBenchTest_fragmented_vector_large_struct_10000.RandomAccess | 207 | 1.025us | 13.585ns | 1.010us | 1.043us | 0.000 | 0.000 | 11271.0 | | VectorBenchTest_chunked_vector_large_struct_10000.Sort | 158 | 1.456ms | 498.209ns | 1.454ms | 1.458ms | 0.000 | 0.000 | 12254418.1 | | VectorBenchTest_chunked_vector_large_struct_10000.Fifo | 4025 | 133.432us | 149.170ns | 132.473us | 134.499us | 25875.435 | 0.000 | 2867653.3 | | VectorBenchTest_chunked_vector_large_struct_10000.Lifo | 3982 | 249.761us | 1.624us | 246.874us | 254.130us | 25873.820 | 0.000 | 4990941.0 | | VectorBenchTest_chunked_vector_large_struct_10000.Fill | 3967 | 132.043us | 288.577ns | 131.755us | 133.324us | 25914.085 | 0.000 | 2760626.2 | | VectorBenchTest_chunked_vector_large_struct_10000.RandomAccess | 206 | 970.743ns | 0.791ns | 969.951ns | 1.014us | 0.000 | 0.000 | 11271.0 | | VectorBenchTest_std_vector_int64_t_1048576.Sort | 22 | 38.668ms | 4.663us | 38.621ms | 38.683ms | 0.000 | 0.000 | 169825784.2 | | VectorBenchTest_std_vector_int64_t_1048576.Fifo | 1386 | 721.083us | 107.830ns | 720.975us | 722.157us | 21.000 | 0.000 | 14424265.9 | | VectorBenchTest_std_vector_int64_t_1048576.Lifo | 1902 | 525.095us | 746.150ns | 524.348us | 530.522us | 21.000 | 0.000 | 10229958.4 | | VectorBenchTest_std_vector_int64_t_1048576.Fill | 1899 | 526.696us | 66.866ns | 525.806us | 526.763us | 21.000 | 0.000 | 9181299.4 | | VectorBenchTest_std_vector_int64_t_1048576.RandomAccess | 122 | 1.115us | 14.066ns | 1.101us | 1.164us | 0.000 | 0.000 | 5266.0 | | VectorBenchTest_fragmented_vector_int64_t_1048576.Sort | 14 | 63.517ms | 91.118us | 63.403ms | 63.824ms | 0.000 | 0.000 | 549816788.1 | | VectorBenchTest_fragmented_vector_int64_t_1048576.Fifo | 979 | 1.017ms | 344.919ns | 1.016ms | 1.017ms | 1035.000 | 0.000 | 27409585.6 | | VectorBenchTest_fragmented_vector_int64_t_1048576.Lifo | 903 | 1.110ms | 178.002ns | 1.110ms | 1.111ms | 1035.000 | 0.000 | 2542249.4 | | VectorBenchTest_fragmented_vector_int64_t_1048576.Fill | 1404 | 707.385us | 468.632ns | 706.122us | 707.854us | 1035.000 | 0.000 | 15875252.8 | | VectorBenchTest_fragmented_vector_int64_t_1048576.RandomAccess | 124 | 1.446us | 6.992ns | 1.434us | 1.460us | 0.000 | 0.000 | 10271.0 | | VectorBenchTest_chunked_vector_int64_t_1048576.Sort | 14 | 63.276ms | 25.314us | 63.239ms | 63.320ms | 0.000 | 0.000 | 549948277.3 | | VectorBenchTest_chunked_vector_int64_t_1048576.Fifo | 986 | 1.014ms | 319.595ns | 1.013ms | 1.014ms | 85.000 | 0.000 | 27312181.6 | | VectorBenchTest_chunked_vector_int64_t_1048576.Lifo | 916 | 1.092ms | 210.484ns | 1.090ms | 1.092ms | 85.000 | 0.000 | 25225868.8 | | VectorBenchTest_chunked_vector_int64_t_1048576.Fill | 1436 | 695.407us | 222.157ns | 695.155us | 696.050us | 85.000 | 0.000 | 15777838.8 | | VectorBenchTest_chunked_vector_int64_t_1048576.RandomAccess | 124 | 1.190us | 26.806ns | 1.163us | 1.247us | 0.000 | 0.000 | 10271.0 | Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
Force push: Respond to Travis' comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
There's the oustanding discussion about whether to "remove" reserve()
for !is_chunked_vector
but I think it's a nit so add the static assert or not, up to you.
Excited about this change! |
I removed it. Thanks for the review! I'm excited too, and hopefully we can encourage usage of this everywhere we've been using chunked_fifo or the static fragmented_vector. |
@travisdowns I had to revert the static_assert due to it breaking serde and some template magic that looks for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Introduce chunked_vector: like a fragmented_vector but with a better allocation profile for small vectors.
Additionally improve the implementation by making indexing faster (removing bounds checks) and adding a debug mode that checks for invalid iterator usage.
Backports Required
Release Notes