Challenges visualizing pool that contains keyless records #2516

philrz · 2022-09-02T20:45:43Z

Repro is with Brim commit ef351e6.

The issues described here are not new. However, they are relevant again now that the Zeek bar chart has been restored (#2472), we'll likely need to consider them for other types of data (#1541, #1540), and we have more powerful Zed concepts to apply to the problem space.

Recall our old friend the Zeek loaded_scripts log. These records reflect which Zeek scripts were in effect at the time packets were processed by Zeek to generate parsed protocol logs. By their nature they lack individual ts timestamp values. The Brimcap-bundled Zeek that's shipped with the Brim/Zui app actually filters these out before importing logs to the Zed lake precisely because they cause the problems covered here. However, as they're a common part of Zeek log output, non-pcap users that bring their own Zeek logs are likely to sometimes import these, either intentionally or accidentally. Also, at a higher level, such "keyless data" is going to be something we'll keep encountering as users bring increasingly diverse data sets to the tools. (See brimdata/super#3803 for some Zed-level thoughts on keyless data.)

Given all that, consider the experience in the video below, taken in current Brim. The ZNG data associated with a small pcap is imported to form a pool, and we see the populated bar chart we expect. But when a loaded_scripts.log is then added to the pool and we hit Enter to re-execute the default "whole pool" query, the bar chart effectively plots these records as points with a ts value in the year 1970 (i.e., the start of the epoch), which has the side effect of dwarfing the bar for the "real" data that has recent timestamps. A user who knew what was going on could set their time range to avoid the keyless data, but this is not a good UX.

Repro.mp4

If we look at the Zed layer, we can see in more detail what's going on. Here's a manual run of the query that's used to get the data to populate the bar chart.

$ zed query 'from "ifconfig." | count() by every(30d), _path'
{ts:2021-04-03T00:00:00Z,_path:"capture_loss",count:1(uint64)}
{ts:2021-04-03T00:00:00Z,_path:"stats",count:2(uint64)}
{ts:2021-04-03T00:00:00Z,_path:error("missing"),count:1(uint64)}
{ts:2021-04-03T00:00:00Z,_path:"files",count:1(uint64)}
{ts:2021-04-03T00:00:00Z,_path:"http",count:1(uint64)}
{ts:2021-04-03T00:00:00Z,_path:"conn",count:1(uint64)}
{ts:error("every: time arg required"),_path:"loaded_scripts",count:465(uint64)}

That bottom record explains it: the ts of the aggregation result ends up as an error value. It would be pretty easy to filter these out by appending | not is_error(ts) to the query that's used to gather data for the bar chart:

https://github.com/brimdata/brim/blob/ef351e63046be5b2bb4ee4343cb26d3eb4a085d2/src/js/state/Histogram/build-query.ts#L35

But what if the user actually has interest in knowing the counts of this keyless data relative to their "real" data? Since it doesn't always have an obvious home at any one spot on the X axis, perhaps a bar for keyless data should be prominently displayed "off to the side" in an obvious way.
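If the app wanted to keep the keyless counts rather than drop them with not is_error(ts), the aggregation result could be split client-side before plotting. The following is a minimal TypeScript sketch, not Brim's actual histogram code: the names HistogramRow and partitionHistogram are hypothetical, and it assumes each result row has already been decoded with an error-valued ts represented as { error: string } and _path as a plain string (the _path:error("missing") row above would need its own handling).

```typescript
// Hypothetical decoded form of one row returned by the bar chart query
// `count() by every(30d), _path`. Modeling a Zed error value as { error: string }
// is an assumption of this sketch, not how the app actually decodes results.
type ZedError = { error: string };

interface HistogramRow {
  ts: Date | ZedError; // e.g. error("every: time arg required") for keyless buckets
  _path: string;
  count: number;
}

interface PartitionedHistogram {
  keyed: HistogramRow[]; // plotted on the time axis as usual
  keylessByPath: Map<string, number>; // counts to show "off to the side"
  keylessTotal: number;
}

// Separates buckets whose ts is an error (keyless data) from real time buckets,
// so keyless counts are never drawn at the 1970 epoch.
function partitionHistogram(rows: HistogramRow[]): PartitionedHistogram {
  const keyed: HistogramRow[] = [];
  const keylessByPath = new Map<string, number>();
  let keylessTotal = 0;
  for (const row of rows) {
    if (row.ts instanceof Date) {
      keyed.push(row);
    } else {
      const prev = keylessByPath.get(row._path) ?? 0;
      keylessByPath.set(row._path, prev + row.count);
      keylessTotal += row.count;
    }
  }
  return { keyed, keylessByPath, keylessTotal };
}

// Usage with two of the rows from the output above.
const example = partitionHistogram([
  { ts: new Date("2021-04-03T00:00:00Z"), _path: "conn", count: 1 },
  { ts: { error: "every: time arg required" }, _path: "loaded_scripts", count: 465 },
]);
console.log(example.keyed.length, example.keylessTotal); // 1 465
```

The keyed rows feed the existing time-axis chart unchanged, while keylessByPath and keylessTotal give the app what it would need to draw a separate "keyless" bar beside it.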

This keyless data also affects the display of records in the main events window. As the video showed, in this case the keyless records landed at the bottom of the events window. This might be totally desirable in some cases, e.g., if the keyless records are effectively "noise" in the eyes of the user and hence worth keeping out of sight unless explicitly targeted in a query. OTOH, for other use cases, keyless records may be a sign of a problem (e.g., a failure of shaping logic) and the user might want them more prominently displayed, either at the top of the events window or maybe in some wholly separate panel.

In thinking more about how that might be addressed, I'm reminded that the sort operator in Zed already has a [-nulls first|last] option that lets the user control where null values appear relative to other values, and SQL tools offer a similar NULLS FIRST/LAST option. Perhaps the Zed layer could offer something comparable that lets the user pick where keyless records appear when outputting a pool's records in the default key-sorted order. This could then be paired with options in the app's key range picker (i.e., currently the "Time Range pin") to let the user choose whether keyless records should be displayed at the top, at the bottom, or excluded entirely.
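To make the proposed top/bottom/exclude choice concrete, here is a small TypeScript sketch of the ordering semantics. KeylessPlacement and orderByPoolKey are hypothetical names modeled on sort -nulls first|last; this is not an existing Zed or Brim option. It assumes a record is "keyless" when its ts field is absent, and a real implementation would more likely live in the Zed layer as suggested above.

```typescript
// Hypothetical placement option for records that lack the pool key, modeled
// on Zed's `sort -nulls first|last`. Not an existing Zed or Brim option.
type KeylessPlacement = "first" | "last" | "exclude";

// Assumed record shape: `ts` is the pool key and is absent on keyless records
// such as those from loaded_scripts.log.
interface EventRecord {
  ts?: Date;
  [field: string]: unknown;
}

// Orders keyed records by ts, then positions keyless records per `placement`.
// Keyless records keep their original relative order; the input is not mutated.
function orderByPoolKey(
  records: EventRecord[],
  descending: boolean,
  placement: KeylessPlacement,
): EventRecord[] {
  const keyed = records.filter((r) => r.ts instanceof Date);
  const keyless = records.filter((r) => !(r.ts instanceof Date));
  keyed.sort((a, b) => {
    const diff = (a.ts as Date).getTime() - (b.ts as Date).getTime();
    return descending ? -diff : diff;
  });
  if (placement === "first") return [...keyless, ...keyed];
  if (placement === "last") return [...keyed, ...keyless];
  return keyed; // "exclude"
}
```

Calling orderByPoolKey(records, true, "last") puts keyless records at the bottom, as in the video, while "first" would surface them at the top and "exclude" would drop them, mirroring the three choices a Time Range pin option might offer.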


philrz · 2022-09-02T22:16:07Z

In the time since I filed this issue, I've re-found an existing, relevant issue: brimdata/super#2088.
