fix multivalue ff indexing regression #1543
Fixes the multivalue ff regression by avoiding `get_val`. `Line::train` repeatedly calls `get_val`, but the `get_val` implementation on Column for multivalues is very slow. The fix is to use the iterator instead. The long-term fix should be to remove `get_val` access in serialization.

Old Code

test fastfield::bench::bench_multi_value_ff_merge_few_segments ... bench: 46,103,960 ns/iter (+/- 2,066,083)
test fastfield::bench::bench_multi_value_ff_merge_many_segments ... bench: 83,073,036 ns/iter (+/- 4,373,615)
test fastfield::bench::bench_multi_value_ff_merge_many_segments_log_merge ... bench: 64,178,576 ns/iter (+/- 1,466,700)

Current

running 3 tests
test fastfield::multivalued::bench::bench_multi_value_ff_merge_few_segments ... bench: 57,379,523 ns/iter (+/- 3,220,787)
test fastfield::multivalued::bench::bench_multi_value_ff_merge_many_segments ... bench: 90,831,688 ns/iter (+/- 1,445,486)
test fastfield::multivalued::bench::bench_multi_value_ff_merge_many_segments_log_merge ... bench: 158,313,264 ns/iter (+/- 28,823,250)

With Fix

running 3 tests
test fastfield::multivalued::bench::bench_multi_value_ff_merge_few_segments ... bench: 57,635,671 ns/iter (+/- 2,707,361)
test fastfield::multivalued::bench::bench_multi_value_ff_merge_many_segments ... bench: 91,468,712 ns/iter (+/- 11,393,581)
test fastfield::multivalued::bench::bench_multi_value_ff_merge_many_segments_log_merge ... bench: 73,909,138 ns/iter (+/- 15,846,097)
sample_positions
    .iter()
    .cloned()
    .map(|pos| (pos, ys.get_val(pos))),
It would be much more efficient to remove the `get_val` call and use the iterator here.
let step = /* .. */;
let index = (0..n).step_by(step);
let vals = column.step_by(step);
let positions_and_values = index.zip(vals);
(That requires removing `Line::estimate` and calling `Line::train_from` directly in the codec.)
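A minimal sketch of the suggestion above, showing that zipping stepped positions with a stepped value iterator produces the same `(position, value)` pairs as repeated `get_val` calls. `VecColumn` and both `sample_*` functions are hypothetical stand-ins for illustration; they are not tantivy's actual `Column` trait or `Line` API.

```rust
/// Hypothetical stand-in for a column of values (NOT tantivy's `Column`).
struct VecColumn {
    vals: Vec<u64>,
}

impl VecColumn {
    /// Random access; in the real multivalue case this is the slow path.
    fn get_val(&self, idx: usize) -> u64 {
        self.vals[idx]
    }

    /// Sequential access over all values.
    fn iter(&self) -> impl Iterator<Item = u64> + '_ {
        self.vals.iter().copied()
    }

    fn num_vals(&self) -> usize {
        self.vals.len()
    }
}

/// Samples every `step`-th value through random access.
/// `step` must be non-zero, because `step_by(0)` panics.
fn sample_with_get_val(col: &VecColumn, step: usize) -> Vec<(usize, u64)> {
    (0..col.num_vals())
        .step_by(step)
        .map(|pos| (pos, col.get_val(pos)))
        .collect()
}

/// Samples every `step`-th value by stepping through the iterator
/// and zipping with the matching positions.
fn sample_with_iter(col: &VecColumn, step: usize) -> Vec<(usize, u64)> {
    let index = (0..col.num_vals()).step_by(step);
    let vals = col.iter().step_by(step);
    index.zip(vals).collect()
}
```

Both functions return the same pairs; which one is faster depends on how expensive `get_val` is compared with advancing the iterator.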
These are only 20 samples, but they span the whole Column, so I'm not sure this is more efficient. It depends on whether the iterator implementation can skip efficiently.
The default impl calls `next()` (https://doc.rust-lang.org/src/core/iter/traits/iterator.rs.html#284-289); that's probably much slower for large indexes.
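To make that concern concrete, here is a small sketch that counts `next()` calls while sampling with `step_by`. It assumes the default in question is the standard library's fallback for skipping (`Iterator::nth` and friends), which advances one element at a time unless the iterator overrides it. `CountingIter` and `next_calls_for_sampling` are hypothetical helpers written for this illustration.

```rust
use std::cell::Cell;

/// Hypothetical iterator that counts how often `next()` is called.
/// It deliberately does NOT override `nth`, so skipping falls back
/// to the standard library's default, which calls `next()` repeatedly.
struct CountingIter<'a> {
    inner: std::slice::Iter<'a, u64>,
    next_calls: &'a Cell<usize>,
}

impl<'a> Iterator for CountingIter<'a> {
    type Item = u64;

    fn next(&mut self) -> Option<u64> {
        self.next_calls.set(self.next_calls.get() + 1);
        self.inner.next().copied()
    }
}

/// Returns the sampled values and the number of `next()` calls needed.
/// `step` must be non-zero, because `step_by(0)` panics.
fn next_calls_for_sampling(vals: &[u64], step: usize) -> (Vec<u64>, usize) {
    let calls = Cell::new(0);
    let iter = CountingIter {
        inner: vals.iter(),
        next_calls: &calls,
    };
    let samples: Vec<u64> = iter.step_by(step).collect();
    (samples, calls.get())
}
```

With 1,000 values and a step of 100, only 10 samples come back, yet the counter ends up on the order of the column length rather than the sample count. An iterator that overrides `nth` with a real skip (as slice iterators do) avoids that cost, which is why the outcome depends on the iterator implementation.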