Provides a grok functionality for reading files from Swift: Instead of receiving the raw files, when grokked the file will be sent as a series of JSONs, one JSON per grokked line.
Text blobs are assumed.
This is mostly useful for semantially reading log files that are stored on Swift.
-
Clone this repo (
git clone https://github.com/synhershko/swift-middleware-grok.git
) -
Run
sudo python setup.py install
-
Alter your proxy-server.conf pipeline to have the grok middleware: add the
[filter:grok]
section and the grok filter itself as close to the proxy-server as possible
[filter:grok]
use = egg:grok#grok
[pipeline:main]
pipeline = catch_errors healthcheck proxy-logging cache container_sync bulk tempurl tempauth slo dlo staticweb proxy-logging grok proxy-server
The relevant config files are in /etc/swift/proxy-server/proxy-server.conf.d/20_settings.conf
or /etc/swift/proxy-server.conf
, depending on your installation.
-
Restart proxy-server by running
sudo swift-init proxy-server restart
-
Verify installation using
curl -XGET 'http://saio:8080/info' | grep grok
You can experiment with Swift locally pretty easily using vagrant. Follow the instructions at https://github.com/swiftstack/vagrant-swift-all-in-one. After you have cloned and ran vagrant up
, do vagrant ssh
follow the installation intructions above.
vagrant@saio:~/$ echo "awesome" > test
vagrant@saio:~/$ swift upload test test
vagrant@saio:~/$ swift download test test -o -
awesome
vagrant@saio:~/$ swift download test test --header "grok-pattern":"%{WORD:word}" -o -
{"word": "awesome"}
Error downloading test: md5sum != etag, a2b0bd1d6a929129dd67d67f6ebd414f != 70c1db56f301c9e337b0099bd4174b28
The md5sum error is expected, as the file downloaded is a projection of the file being stored on Swift.
A typical line from a Swift log looks something like this:
Dec 12 23:35:48 vagrant-ubuntu-trusty-64 proxy-server: 127.0.0.1 127.0.0.1 12/Dec/2015/23/35/48 GET /v1/AUTH_test/sample-log/sample-log HTTP/1.0 200 - python-swiftclient-2.6.1.dev26 AUTH_tk262ff273b... - 71 - tx6398b237474f4a69a9a37-00566caf54 - 0.0057 - - 1449963348.351684093 1449963348.357352018 0
The following grok pattern is going to extract the important bits:
%{SYSLOGTIMESTAMP:date} %{HOSTNAME:client} %{SYSLOGPROG:program} %{HOSTNAME} %{HOSTNAME} %{NOTSPACE} %{WORD:verb} %{NOTSPACE:request} (?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest}) %{NUMBER:response} - %{QS:agent} %{NOTSPACE} - %{NUMBER:client_etag} - %{NOTSPACE:transaction_id} - {NUMBER} - - {NUMBER:request_start_time} {NUMBER:request_end_time} {NUMBER:policy_index}