Guide
Sybil is a command line program that reads JSON records from stdin (one per line) and saves them on disk in a column-based format for fast querying. Using column storage lets the query engine reduce the amount of data it needs to read off disk to run aggregations. For table scans that do not touch all the fields in a dataset, this leads to an improvement in query time over traditional row- or document-based DBs. Unlike many other DBs, Sybil also runs its full table scans in parallel (using as many CPUs as GOMAXPROCS allows) in order to speed up query time.
# check go path
echo $GOPATH
# if not set, you can set it with 'export'
# (also put the below in your .bashrc to set it when you log in again)
export GOPATH=~/go
# create the GOPATH directory if it doesn't exist
mkdir -p "$GOPATH"
# install sybil
go get github.com/logv/sybil
Sybil uses ordinary JSON notation to ingest records. An example sample looks like:
{
// time is in seconds since the epoch and is the only required field
time: 1461765374,
// sybil supports ints (up to int64)
age: 28,
// sybil supports string columns
country: "USA",
state: "NY",
favorite_food: "ice cream",
gym_membership: "no",
// and sybil supports sets
favorite_bands: [ "the doors", "talking heads" ]
}
Sybil doesn't require that table schemas be defined beforehand, but once a column has been ingested with a given type, it prefers that the type does not change. It's very important to notice that "0" and 0 are not the same in JSON! The first is a string, while the second is an integer.
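The annotated sample above uses comments and unquoted keys for readability, which strict JSON parsers generally reject. A minimal sketch of an input file in strict JSON, one object per line, is shown below; the second record and its values are made up for illustration, and the file name json_samples.json is only an assumption chosen to match the ingest example.

```shell
# Write two sample records, one JSON object per line.
# Keys are quoted and there are no comments, so each line is strict JSON.
cat > json_samples.json <<'EOF'
{"time": 1461765374, "age": 28, "country": "USA", "state": "NY", "favorite_bands": ["the doors", "talking heads"]}
{"time": 1461765434, "age": 35, "country": "USA", "state": "CA", "favorite_bands": ["talking heads"]}
EOF

# Sanity check: the line count should equal the number of records.
wc -l < json_samples.json
```

Note that age is written as 28 and not "28" in every record, so the column keeps a single type.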
The sybil binary is split into multiple subcommands. To import data, use the 'ingest' command and supply JSON samples on stdin, one per line. This will create a new dir, 'db/my_first_table':
sybil ingest -table my_first_table < json_samples.json
Import from a mongo DB:
mongoexport -collection my_collection | sybil ingest -table my_table
Import from a CSV file. This requires that the first line be comma-separated headers:
sybil ingest -csv -table my_csv_table < some_csv.csv
Examine the db file structure:
ls -R db/
# look at disk space usage
du -ch db/
Sybil supports several query types: rollup (aka Table), time series, distributions and raw samples.
Rollup queries are the default query in sybil. The fields to group by, fields to aggregate and filters are supplied via the command line and sybil prints either a formatted table or JSON output to stdout. A simple query would be:
sybil query -table my_table -group col1,col2,col3 -int col4,col5,col6
which would output a formatted table of data.
You can use the -json flag to have sybil print the output in JSON. By default, sybil is pretty verbose on stderr. Redirect stderr to quiet sybil down and see just the results.
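As a sketch of the stderr redirection described above, a small shell function can wrap the query so that only the JSON result reaches the terminal. The name quiet_query is a hypothetical helper, not part of sybil; it only relies on the -json flag and on the diagnostics going to stderr.

```shell
# Hypothetical wrapper: run a sybil query with JSON output and discard
# the diagnostics that sybil writes to stderr.
quiet_query() {
  sybil query -json "$@" 2>/dev/null
}

# Example call (commented out so that defining the helper has no side effects):
# quiet_query -table my_table -group col1 -int col4
```

Discarding stderr also hides real error messages, so it may be worth dropping the redirect while debugging a query.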
To see a list of tables or table info is pretty easy:
sybil query -tables
sybil query -table my_table -info
Retrieving samples is similar:
sybil query -table my_table -samples -limit 5
Most queries support filters - filters are tested against each record before the aggregation and used to determine whether a record should be included. Filters are supplied as command line arguments to sybil. The format for a filter string is: -*-filter col:op:val,col:op:val
where filter is one of
- -str-filter - supports string regexes using re and nre
- -int-filter - supports eq, neq, gt and lt
- -set-filter - supports in and nin
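Since several conditions for one filter flag are joined with commas, a small helper can keep long filter lists readable. The following is only a sketch: join_filters is a hypothetical helper name, not part of sybil, and the age column is made up.

```shell
# Join the arguments (one col:op:val condition each) with commas.
# The function body runs in a subshell so the IFS change does not leak.
join_filters() (
  IFS=,
  printf '%s\n' "$*"
)

join_filters age:gt:21 age:lt:65   # prints age:gt:21,age:lt:65

# Possible use in a query (commented out; requires sybil and an existing table):
# sybil query -table my_table -int age -int-filter "$(join_filters age:gt:21 age:lt:65)"
```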
An easy and common filter trick is to use the date command:
# specify a filter on time greater than 1 hour ago
-int-filter time_col:gt:`date --date="-1 hour" +%s`
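The date trick can be wrapped in a function so the column and the look-back window become parameters. This is a sketch under the assumption that GNU date is available (the --date option differs on BSD and macOS); time_filter is a hypothetical helper name.

```shell
# Print an -int-filter value that selects rows newer than the given age.
# Usage: time_filter <column> <age>, e.g. time_filter time "1 hour"
time_filter() {
  # GNU date: "-1 hour" means one hour before now; +%s prints epoch seconds.
  printf '%s:gt:%s\n' "$1" "$(date --date="-$2" +%s)"
}

time_filter time "1 hour"

# Possible use in a query (commented out; requires sybil and an existing table):
# sybil query -table my_table -int age -int-filter "$(time_filter time "1 hour")"
```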
To run a time series query in sybil, specify the -time, -time-col <FIELD> and optionally -time-bucket <SECONDS> flags to sybil. Adding an int filter on the time range is useful, because it lets sybil only look at blocks relevant to your query.
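Putting the time series flags and the time range filter together could look like the sketch below. The helper name last_day_series and the one-day window are assumptions made for illustration; only the -time, -time-col, -time-bucket and -int-filter flags come from the text above, and the cutoff again relies on GNU date.

```shell
# Hypothetical helper: time series query restricted to the last 24 hours.
# Usage: last_day_series <table> <time column> <bucket size in seconds>
# The body runs in a subshell so its variables stay local.
last_day_series() (
  table=$1
  time_col=$2
  bucket=$3
  # Epoch seconds 24 hours ago (GNU date syntax).
  cutoff=$(date --date="-1 day" +%s)
  sybil query -table "$table" \
    -time -time-col "$time_col" -time-bucket "$bucket" \
    -int-filter "${time_col}:gt:${cutoff}"
)

# Example call (commented out; requires sybil and an existing table):
# last_day_series my_table time 3600
```

Filtering on the same column that is passed to -time-col is what lets sybil skip blocks outside the requested range.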
Sybil also supports histogram queries by supplying the '-op hist' flag. Supplying the -hist flag tells sybil to create a histogram for each row in the group by result.
There are more examples and information around the rest of this wiki. Please get in touch if you have any questions, comments or feedback, or want any more information. Thanks!