On Windows, Space is incorrectly given 1970 start time #906

philrz · 2020-06-13T20:07:46Z

I'm pretty sure I've seen this issue in the past, but I've only now come up with a formula for reproducing it with some reliability. This repro was in GA Brim tagged v0.11.0 talking to zqd tagged v0.15.0. I've only been able to reproduce it on Windows. However, because it doesn't repro 100% of the time, it makes me uncertain if it's Windows-specific in a code-centric way or if it's just down to timing, i.e. somehow a side-effect of things running a bit slower on Windows.

I've found I can reliably reproduce this with the (uncompressed) pcap at https://archive.wrccdc.org/pcaps/2018/wrccdc.2018-03-23.010014000000000.pcap.gz. The trick seems to be dragging the pcap into the first tab, then quickly dragging another pcap into one or more other tabs before bouncing back to the first tab. I've previously repro'ed it by dragging the wrccdc pcap into the first tab and a different pcap into a second tab, but since it repros with the same wrccdc pcap used over & over, I've stuck with that here for simplicity.

As you can see in the attached video, what happens is that the start time for the created Space incorrectly shows a start time of 1970. We've seen this happen when time-free Zeek events have been generated that get assigned a timestamp of ts=0, but that's not what's going on here, evidenced by the fact that:

If you let the pcap load on its own in a single tab without disturbing it (not shown in the video, but you can do this on your own), you never get any such events.
Even with the 1970 start times shown on all four tabs as in the video, once they were all done loading, I re-dragged one of their all.zng files into yet another tab and the Start time in the picker did not show 1970, indicating no such ts=0 events were found.

In the video I show an error that seems to pop up in Dev Tools. I also show in the video that the info.json created by zqd shows a min_time of 0, so this seems to absolve the Brim app from any wrongdoing.

Once again, I'll caution that this is not reproducible 100% of the time, so don't be surprised if you get it wrong on the first try. Since it took me a while to come up with this recipe and I respect that others may struggle to get it right, I'm game to do more repros to help with debug if people tell me what they need done.

Repro.mp4.zip

The text was updated successfully, but these errors were encountered:

mikesbrown · 2020-06-16T20:28:23Z

This can happen in CI conditions too, e.g., https://github.com/brimsec/brim/runs/777954531 . It manifests itself with the histogram test failing because it's not rendered properly. Screen shot shows a histogram starting in 1970.

I unpacked the artifacts and found the Brim directory and space directory for the offending space (the histogram test). You can put its contents into the brim.git/run directory and run npm run electron to see the problem. Also,

$ jq . run/data/spaces/sp_1dPrjZaiZdAaduXZauBV5rKBoyn/info.json
{
  "min_time": {
    "sec": 0,
    "ns": 0
  },
  "max_time": {
    "sec": 1582646597,
    "ns": 838527001
  }
}

906.zip if you want it.

philrz · 2020-06-16T20:37:57Z

Thanks @mikesbrown... that's surely good info to know. As long as we're piling on evidence, I'll share something else CI-related I mentioned to @mattnibs and @alfred-landrum offline. At one point when I trying lots of variations to see how reliably I could reproduce the "1970" symptom, I managed to trigger an "Access is Denied" message similar to the one described in #613. It only happened that one time, and maybe it was a coincidence and has a separate root cause. Or... maybe we're lucky & they have the same root cause. I know @alfred-landrum has been unable to repro #613 when he's added extra logging in branches. So maybe we'll fix this issue and find the other will go away as well. 🤞

philrz · 2020-06-22T21:19:55Z

Now that #613 has a fix, I created a current Windows artifact to see if this issue was chased away, but alas, it's still with us. A note-to-self or others who may try to repro in the future: It seems like deleting the appState.json each time I close & re-launch Brim (i.e. simulating a user coming to the app "fresh" for the very first time) is helpful in getting a more reliable repro.

mikesbrown · 2020-06-24T20:33:02Z

Happened here https://github.com/brimsec/brim/runs/804911944?check_suite_focus=true

alfred-landrum · 2020-06-25T00:22:53Z

The root cause here is:

when a pcap is ingested, we first set the time range for the space using pcap index data, before we actually ingest any records
when we begin feeding the pcap data to zeek, there can be a brief moment where there are zeek log files (dns.log, etc) present, but empty
when we read the empty files, we then tried to update the space time range by joining it with the existing time (from the pcap) with a zero time (because no records were read)

@alfred-landrum

… no records" by alfred-landrum This is an auto-generated commit with a zq dependency update. The zq PR brimdata/super#938, authored by @alfred-landrum, has been merged. dont update the filestore span from a rewrite with no records Closes brimdata/super#906

philrz · 2020-06-25T22:37:12Z

Verified on Brim commit 26332a9 talking to zqd commit d04c83e.

With this class of bug that has an unreliable repro, it's always a little tough to know for sure that we've chased it away. However, the approach of deleting appState.json before each launch as I described in my last comment seemed to give a reliable repro, so here with a newly-created Windows artifact I did five repro attempts in a row using that approach and never saw the symptom again. So between that and having faith in the analysis that led to the fix, I think we're good. 😄

Thanks @alfred-landrum!

philrz added the bug Something isn't working label Jun 13, 2020

philrz added this to the Brim v0.12.0 milestone Jun 15, 2020

mikesbrown mentioned this issue Jun 16, 2020

zq update through "Groupby: remove deterministic output order" by henridf brimdata/zui#877

Merged

alfred-landrum self-assigned this Jun 24, 2020

alfred-landrum mentioned this issue Jun 24, 2020

dont update the filestore span from a rewrite with no records #938

Merged

alfred-landrum closed this as completed in #938 Jun 25, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

On Windows, Space is incorrectly given 1970 start time #906

On Windows, Space is incorrectly given 1970 start time #906

philrz commented Jun 13, 2020

mikesbrown commented Jun 16, 2020

philrz commented Jun 16, 2020

philrz commented Jun 22, 2020

mikesbrown commented Jun 24, 2020

alfred-landrum commented Jun 25, 2020

philrz commented Jun 25, 2020

On Windows, Space is incorrectly given 1970 start time #906

On Windows, Space is incorrectly given 1970 start time #906

Comments

philrz commented Jun 13, 2020

mikesbrown commented Jun 16, 2020

philrz commented Jun 16, 2020

philrz commented Jun 22, 2020

mikesbrown commented Jun 24, 2020

alfred-landrum commented Jun 25, 2020

philrz commented Jun 25, 2020