The testing data I used came from a production environment and is not suitable for sharing publicly.
Depending on your use-case you may prefer to obtain sample data in a variety of ways; one thing to consider is the effect on IP locality and the liklihood of a cache hit (although the cache is not enabled by default, as performance testing indicated it does not have a payoff). Behaviour will likely be quite different when querying client IPs from a public website compared to syslog traffic inside a datacentre where most things will have either a private address range or appear in a small range of public addresses.
If you don't already have parsed logs (eg. using grok on existing log files), then you could simply just grep for any IPs that appear in a log file.
Here's one example of that; the regular expression doesn't need to be terribly complex, depending on the log file used. In this example, I have a very large daily file containing both public IPs and private IPs.
Because the log file is so large, I only need a suitable portion to get a representative sample; I want this for performance analysis, so better to take too much rather than too less, so I'm taking the first million lines. I'll grep out each instance of what looks like an IPv4 address; there may be multiple per line; and then take the first million of those.
We will emit metrics every 100,000 rows processed.
head -n1000000 big-log-file \
| grep -Eo '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' \
| head -n1000000 > sample-ips.txt
You'll end up with one IP per line, which is perfect for doing a simple stdin input with a 'plain' codec.
First build the container image:
PS> docker build -t logstash-filter-mmdb:qa -f .\Dockerfile.qa .
Then you can run it either using the internal 'demo' data, which is only useful as a brief smoke-test of the container itself:
Or you provide the input data (IPs and MMDB) as mapped into the container:
(Note the below is in Powershell, and ` is like \ in Bash)
PS> docker run --name qa --rm -it `
-v ${PWD}/../private-test-data:/qa/inputs `
logstash-filter-mmdb:qa `
/qa/inputs/ip-sampling.log /qa/inputs/my-addressing.mmdb
If you don't have any demo logs or database, a simple one will be provided:
docker run --name qa --rm -it logstash-filter-mmdb:qa
The key lines of output will the ones that look like the following:
Processed 99375 events in 70ms: 0.7μs / event
Note that performance will get better as more records are processed due to the JVM Hotspot compilation etc.
You'll also notice a spot-check record being emitted every 20k rows.
If you're running docker with a TTY (ie. -it
), then these lines
will be dimmed on your terminal so they don't get in the way of the
primary lines of interest.