data: perform downsampling in multiplexer provider #3272
Summary:
We add a `sampling_hints` attribute to the `TBContext` magic container, which is populated with the parsed form of the `--samples_per_plugin` flag. Existing plugins’ generic data modes are updated to read from this map instead of using hard-coded thresholds.
Test Plan:
This change is not actually observable as is, because the multiplexer data provider ignores its downsampling argument. But after patching in a change to make the data provider respect the downsampling argument, this change has the effect that increasing the `--samples_per_plugin` over the default (e.g., `images=20`) now properly increases the number of samples shown in generic data mode, whereas previously it had no effect.
wchargin-branch: data-downsampling-flag
wchargin-source: 50998be15abd790a0915458bac76091c79823f0f
Summary:
The `MultiplexerDataProvider` now respects its `downsample` parameter,
even though the backing `PluginEventMultiplexer` already performs its
own sampling. This serves two purposes:
- It enforces that clients are always specifying the `downsample`
argument, which is required.
- It enables us to test plugins’ downsampling parameters to verify
that they will behave correctly with other data providers.
Test Plan:
Unit tests included. Note that changing the `_DEFAULT_DOWNSAMPLING`
constant in (e.g.) the scalars plugin to a small number (like `5`) now
actually causes charts in the frontend to be downsampled.
wchargin-branch: data-mux-downsample
wchargin-source: 116ce4c206613e25e09fec31102b90ad80282496
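As a rough illustration of what respecting a `downsample` argument could look like, here is a minimal sketch of order-preserving downsampling. The name `downsample_in_order` and its `seed` parameter are hypothetical and chosen for this example; this is not the `_downsample` helper from the PR, whose exact behavior is not shown in this conversation.

```python
import random


def downsample_in_order(xs, k, seed=0):
    """Return at most `k` elements of `xs`, keeping their original order.

    Hypothetical sketch: picks `k` distinct indices at random (with a
    fixed seed so results are reproducible), then sorts the indices so
    the surviving elements appear in the same order as in `xs`.
    """
    if k < 0:
        raise ValueError("k must be non-negative, got %r" % (k,))
    if k >= len(xs):
        # Nothing to drop: return a copy of the whole sequence.
        return list(xs)
    rng = random.Random(seed)
    indices = sorted(rng.sample(range(len(xs)), k))
    return [xs[i] for i in indices]


sample = downsample_in_order(list(range(10000)), k=100)
```

Sorting the chosen indices, rather than the chosen values, is what keeps the output in input order even when the values themselves are not monotonic.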
(Reassigning just to spread load.)
stephanwlee
left a comment
> It enforces that clients are always specifying the `downsample`
> argument, which is required.

Is this saying all `data_provider`, in the future, will require `downsample` argument? Or is it only true for `MultiplexerDataProvider`?
    def test_inorder(self):
        xs = list(range(10000))
        actual = data_provider._downsample(xs, k=100)
No action item. Simply noting: there is a non-zero chance that this passes even if it behaved more like `random.sample`. :\
Yep, but that chance is 1 / (100!), which is somewhat small.
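To put a number on that estimate, the short calculation below evaluates 1 / (100!). It assumes the 100 sampled elements are distinct and that every ordering of them is equally likely, which is the idealized model behind the figure quoted above; `chance_sorted_by_luck` is a name introduced only for this sketch.

```python
import math
from fractions import Fraction

# Assumption: 100 distinct elements, all 100! orderings equally likely.
# Exactly one of those orderings is sorted.
orderings = math.factorial(100)
chance_sorted_by_luck = Fraction(1, orderings)

# Convert to a float only for display; the Fraction itself is exact.
print("1 / 100! is roughly %.3e" % float(chance_sorted_by_luck))
```

The result is on the order of 1e-158, so an order-checking test passing by accident is not a practical concern.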
wchargin-branch: data-mux-downsample wchargin-source: c6edc8bfebc0d4c040c346e8ca9729e399512f25
wchargin
left a comment
Thanks for the quick reviews!
> Is this saying all `data_provider`, in the future, will require
> `downsample` argument? Or is it only true for `MultiplexerDataProvider`?

It’s required by the data provider interface (e.g., `read_scalars`),
so any particular data provider implementation is permitted to not
require it, but code that is written against arbitrary data providers
must provide it.
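The distinction between what the interface requires and what an implementation may relax can be sketched as follows. All class and function names here (`ScalarProviderSketch`, `PresampledProvider`, `StrictProvider`, `load_chart`) are hypothetical, and the signatures are simplified for illustration; they are not the actual TensorBoard data provider API.

```python
import abc


class ScalarProviderSketch(abc.ABC):
    """Hypothetical interface: `downsample` is a required keyword."""

    @abc.abstractmethod
    def read_scalars(self, *, downsample):
        """Return at most `downsample` points."""


class PresampledProvider(ScalarProviderSketch):
    """An implementation that chooses not to require the argument."""

    def __init__(self, points):
        self._points = list(points)

    def read_scalars(self, *, downsample=None):
        # Data is assumed to be sampled already; the argument is ignored.
        return list(self._points)


class StrictProvider(ScalarProviderSketch):
    """An implementation that requires and honors the argument."""

    def __init__(self, points):
        self._points = list(points)

    def read_scalars(self, *, downsample):
        # Placeholder policy for the sketch: keep the first `downsample` points.
        return self._points[:downsample]


def load_chart(provider, downsample):
    # Code written against arbitrary providers always passes `downsample`.
    return provider.read_scalars(downsample=downsample)
```

A caller such as `load_chart` works with either implementation because it always supplies the argument, while calling `StrictProvider.read_scalars()` without it raises a `TypeError`.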
Summary:
As the comment says, the histograms plugin used to manually downsample because the multiplexer data provider didn’t. As of #3272, it does, so this code is vestigial.
Test Plan:
Note that changing the default histogram sampling constants (in the same file) to very small numbers causes the frontend to show less data. Unit tests for the histograms plugin also cover this: removing downsampling from the multiplexer data provider causes the histogram tests to fail.
wchargin-branch: histogram-rm-downsample
wchargin-source: 33928fd1feb314d955ed06f25df5067c39d7f779