-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cherry-pick #16849 to 7.x: [Winlogbeat] Add experimental event log reader with increased performance #17022
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Pinging @elastic/siem (Team:SIEM) |
leehinman
approved these changes
Mar 18, 2020
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
One of the files I was using for testing also needed a backport so I opened: #17090 |
…#16849) This PR adds a new event log reader implementation that's behind a feature flag for now. It achieves higher event throughput than the current reader by not using XML and by heavily caching static metadata about events. To enable it add `api` to each event log reader. ``` winlogbeat.event_logs: - name: Security api: wineventlog-experimental ``` The existing reader requests each event as XML and then must unmarshal the XML document. EvtFormatMessage is used to get the XML document from Windows. Then the Go stdlib encoder/xml package is used to parse it. Both of these operations are relatively slow (see golang/go#21823 about encoding/xml). This new reader utilizes the publisher metadata APIs to fetch and cache metadata about all event IDs associated with a provider. It does this the first time it encounters a provider ID while reading events. __Risk: Caching this info could lead to having stale information in memory if metadata changes via software update (see Edge Cases).__ It caches the names of the event data parameters and a templatized version of the message string. To get the data for an event this reader receives EVT_VARIANT structs containing the parameters rather than receiving and parsing XML. This is more efficient because there are fewer and smaller memory allocations and no XML encoding or decoding. To get the message for an event it utilizes the cached text/template it has for the event ID and passes it the list of parameter values. Edge Cases There is no provider metadata installed on the host. Could happen for forwarded events or reading from .evtx files. - Mitigate by falling back to getting parameter names by the event XML and rendering the message with EvtFormatMessage for each event. Software is updated and an event ID changes it's event data parameters. Saw this between Sysmon versions 9 and 10 with event ID 5. - Mitigate by fingerprinting the number of event data parameters and their types. - If the fingerprint changes, fetch the XML for the event and store the parameter names. Benchmark Comparison Comparing batch_size 500, that's a 1396% increase in events/sec, a -81% reduction in bytes allocated per event, and -86% decrease in the number of allocations. PS C:\Gopath\src\github.com\elastic\beats\winlogbeat\eventlog> go test -run TestBenchmarkRead -benchmem -benchtime 10s -benchtest -v . --- PASS: TestBenchmarkRead (231.68s) --- PASS: TestBenchmarkRead/api=wineventlog (53.57s) --- PASS: TestBenchmarkRead/api=wineventlog/batch_size=10 (12.19s) bench_test.go:128: 2067.28 events/sec 18283 B/event 182836 B/batch 251 allocs/event 2516 allocs/batch --- PASS: TestBenchmarkRead/api=wineventlog/batch_size=100 (16.73s) bench_test.go:128: 2144.50 events/sec 17959 B/event 1795989 B/batch 250 allocs/event 25020 allocs/batch --- PASS: TestBenchmarkRead/api=wineventlog/batch_size=500 (13.48s) bench_test.go:128: 1888.40 events/sec 17648 B/event 8824455 B/batch 250 allocs/event 125018 allocs/batch --- PASS: TestBenchmarkRead/api=wineventlog/batch_size=1000 (11.18s) bench_test.go:128: 2064.14 events/sec 17650 B/event 17650459 B/batch 250 allocs/event 250012 allocs/batch --- PASS: TestBenchmarkRead/api=wineventlog-experimental (98.28s) --- PASS: TestBenchmarkRead/api=wineventlog-experimental/batch_size=10 (18.72s) bench_test.go:128: 16813.52 events/sec 3974 B/event 39744 B/batch 34 allocs/event 344 allocs/batch --- PASS: TestBenchmarkRead/api=wineventlog-experimental/batch_size=100 (25.39s) bench_test.go:128: 28300.30 events/sec 3634 B/event 363498 B/batch 33 allocs/event 3324 allocs/batch --- PASS: TestBenchmarkRead/api=wineventlog-experimental/batch_size=500 (26.40s) bench_test.go:128: 28266.73 events/sec 3332 B/event 1666041 B/batch 33 allocs/event 16597 allocs/batch --- PASS: TestBenchmarkRead/api=wineventlog-experimental/batch_size=1000 (27.77s) bench_test.go:128: 28387.74 events/sec 3330 B/event 3330690 B/batch 33 allocs/event 33127 allocs/batch --- PASS: TestBenchmarkRead/api=eventlogging (13.29s) bench_test.go:128: 56243.80 events/sec 8043 B/event 6513053 B/batch 31 allocs/event 25151 allocs/batch PASS ok github.com/elastic/beats/v7/winlogbeat/eventlog 231.932s (cherry picked from commit d81ef73)
andrewkroh
force-pushed
the
backport_16849_7.x
branch
from
March 18, 2020 23:55
232c6f8
to
b328f3f
Compare
jenkins, test this |
2 similar comments
jenkins, test this |
jenkins, test this |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Cherry-pick of PR #16849 to 7.x branch. Original message:
What does this PR do?
This PR adds a new event log reader implementation that's behind a
feature flag for now. It achieves higher event throughput than the current
reader by not using XML and by heavily caching static metadata about events.
To enable it add
api
to each event log reader.The existing reader requests each event as XML and then must unmarshal the XML
document. EvtFormatMessage is used to get the XML document from Windows. Then the
Go stdlib encoder/xml package is used to parse it. Both of these operations are
relatively slow (see golang/go#21823 about encoding/xml).
This new reader utilizes the publisher metadata APIs to fetch and cache metadata
about all event IDs associated with a provider. It does this the first time it
encounters a provider ID while reading events. Risk: Caching this info could
lead to having stale information in memory if metadata changes via software
update (see Edge Cases). It caches the names of the event data parameters
and a templatized version of the message string.
To get the data for an event this reader receives EVT_VARIANT structs containing
the parameters rather than receiving and parsing XML. This is more efficient because
there are fewer and smaller memory allocations and no XML encoding or decoding.
To get the message for an event it utilizes the cached text/template it has
for the event ID and passes it the list of parameter values.
Edge Cases
There is no provider metadata installed on the host. Could happen for forwarded
events or reading from .evtx files.
the message with EvtFormatMessage for each event.
Software is updated and an event ID changes it's event data parameters. Saw this
between Sysmon versions 9 and 10 with event ID 5.
names.
Benchmark Comparison
Comparing batch_size 500, that's a 1396% increase in events/sec, a -81% reduction in bytes allocated per event, and -86% decrease in the number of allocations.
Why is it important?
Some users were experiencing a bottleneck with Winlogbeat reading events. Most often this was with the Windows Event Collector architecture where all events were configured to flow into a single machine and single channel. Eventually this could lead to a data loss if the rates were sustained for so long that Winlogbeat could not catch up and log rotation occurred.
Checklist
I have made corresponding change to the default configuration filesN/AAuthor's Checklist
include_xml
. I documented this as not supported.Author's Other Notes
How to test this PR locally
Related issues
Screenshots
CPU Flame Graphs
Before
After
Mem Flame Graphs
Before
After