Bobcat is a data generation tool that allows you to generate production-like data using a simple DSL. Define concepts (i.e. objects) found in your software system in our input file format, and the tool will generate JSON objects that can be inserted into a variety of datastores.
Current features include:
- Concise syntax for modeling domain objects.
- Flexible expression composition to generate a variety of data.
- Over 30 built-in dictionaries plus support for any custom dictionary to provide more realistic data values.
- Distributions to determine the shape of the generated data.
- Variable assignment for easy reference in the input file to previously generated entities.
- Ability to denote a field as the primary key to allow for easy insertion into a SQL database.
- File imports for better organization of input file(s).
There are no prerequisites. The executable is a static binary. For more information on using the executable, pass the `--help` flag.
- Download the latest release.
- Run the executable corresponding to your operating system on the sample input file:
  - Linux: `./bobcat-linux examples/example.lang`
  - macOS: `./bobcat-darwin examples/example.lang`
  - Windows: `.\bobcat-windows examples\example.lang`
- Modify the sample file, or create one from scratch, to generate your own custom entities.
To build from source instead:
- Check out the code: `git clone https://github.com/ThoughtWorksStudios/bobcat.git`
- Set up, build, and test: `make local`
The input file is made of three main concepts:
- defining entities (the objects or concepts found in your software system)
- fields on those entities (properties that an entity possesses)
- generate statements to produce the desired number of entities in the resulting JSON output
The following is an example of an input file.
```
# import another input file
import "examples/users.lang"

# override default $id primary key
pk("ID", $incr)

# define entity
entity Profile {
  # define fields on entity
  firstName: $dict("first_names"),
  lastName: $dict("last_names"),
  email: firstName + "." + lastName + "@fastmail.com",
  addresses: $dict("full_address")<0..3>,
  gender: $dict("genders"),
  dob: $date(1970-01-01, 1999-12-31, "%Y-%m-%d"),
  emailConfirmed: $bool(),
}

# declare and assign variables
let bestSelling = "Skinny"
let jeanStyles = ["Classic", "Fitted", "Relaxed", bestSelling]

entity CatalogItem {
  title: $dict("words"),
  style: $enum(jeanStyles),
  sku: $str(10),
  price: $float(1.0, 30.00)
}

# generate statement to create corresponding JSON output
let Products = generate(10, CatalogItem)

entity CartItem {
  product: $enum(Products),
  quantity: $int(1, 3),
}

entity Cart {
  pk("Cart_Id", $incr)
  items: CartItem<0..10>,
}

# define entity that extends an existing entity
entity Customer << User {
  last_login: $date(2010-01-01, NOW),
  profile: Profile,
  cart: Cart
}

# supports anonymous/inlined extensions as well
generate (10, Customer << {cart: null}) # new users don't have a cart yet
generate (90, Customer)
```
Type | Example |
---|---|
string | "hello world!" |
integer | 1234 |
float | 5.2 |
bool | true |
null | null |
date | 2017-07-04 |
date with time | 2017-07-04T12:30:28 |
date with time (UTC) | 2017-07-04T12:30:28Z |
date with time and zone offset | 2017-07-04T12:30:28Z-0800 |
collection (heterogeneous) | ["a", "b", "c", 1, 2, 3] |
If you need to customize the JSON representation of a literal date, you have two options:
- Use `$date(min, max, format)` where `min == max`, and provide a `strftime` format, e.g. `$date(2017-01-01, 2017-01-01, "%b %d, %Y")`
- Use a literal string that looks like a date, as JSON serializes all dates as strings anyway
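To see what the first option produces, here is a minimal Python sketch of a `$date`-like generator. `random_date` is a hypothetical helper, not Bobcat's code: it picks an instant between `lo` and `hi` (uniformly, which is an assumption) and renders it with Python's `strftime`, which understands the same `%b %d, %Y` directives used above. When `lo == hi` the range collapses to a single instant, so only the format string affects the result.

```python
import random
from datetime import datetime, timedelta


def random_date(lo: datetime, hi: datetime, fmt: str, rng: random.Random) -> str:
    """Hypothetical $date stand-in: pick an instant in [lo, hi], format with strftime."""
    span = (hi - lo).total_seconds()
    # Assumption: uniform distribution over the range.
    moment = lo + timedelta(seconds=rng.uniform(0, span))
    return moment.strftime(fmt)


rng = random.Random(0)
jan_first = datetime(2017, 1, 1)

# With lo == hi the span is zero, so the output is fixed by the format alone.
print(random_date(jan_first, jan_first, "%b %d, %Y", rng))  # Jan 01, 2017
```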
Declare variables with the `let` keyword followed by an identifier:
```
let max_value = 100
```
One does not need to initialize a declaration:
```
# simply declares, but does not assign a value
let foo
```
Assignment syntax should be familiar. This assigns a new value to a previous declaration:
```
let max_value = 10
# assigns a new value to max_value
max_value = 1000
```
One can only assign values to variables that have been declared (i.e. implicit declarations are not supported):
```
baz = "hello" # throws error because baz was not previously declared
```
An identifier starts with a letter or underscore, followed by any number of letters, numbers, and underscores. Other symbols are not allowed. This applies to all identifiers, not just variables.
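The declaration rules above can be modeled in a few lines of Python. This is a hypothetical sketch (`Scope`, `UndeclaredVariableError`, and `IDENTIFIER` are illustrative names, not part of Bobcat): `let` records a name with or without a value, plain assignment fails unless the name was declared first, and the regular expression encodes the identifier rule.

```python
import re

# Identifier rule: a letter or underscore, then letters, digits, or underscores.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


class UndeclaredVariableError(Exception):
    pass


class Scope:
    """Hypothetical model of `let` declarations and plain assignment."""

    def __init__(self) -> None:
        self._values = {}

    def let(self, name: str, value: object = None) -> None:
        # `let foo` declares without assigning; None stands in for "no value yet".
        if not IDENTIFIER.fullmatch(name):
            raise ValueError(f"invalid identifier: {name!r}")
        self._values[name] = value

    def assign(self, name: str, value: object) -> None:
        # No implicit declarations: the name must already exist.
        if name not in self._values:
            raise UndeclaredVariableError(name)
        self._values[name] = value

    def get(self, name: str) -> object:
        return self._values[name]


scope = Scope()
scope.let("max_value", 10)
scope.assign("max_value", 1000)
print(scope.get("max_value"))  # 1000
```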
The following variables may be used without declaration:
Name | Value |
---|---|
UNIX_EPOCH | DateTime representing Jan 01, 1970 00:00:00 UTC |
NOW | Current DateTime at the start of the process |
Functions are declared using the `lambda` keyword followed by an identifier, a list of input arguments, and the function body surrounded by curly braces `{}`. Note that the result of the last expression in the function body will be the return value of the function:
```
# declaring perc function
lambda perc(amount, rate) {
  amount * rate
}

lambda calcTax(amount) {
  perc(amount, 0.085)
}

entity Invoice {
  price: $float(10, 30),
  # calling calcTax function on price
  tax: calcTax(price),
  total: price + tax
}
```
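For comparison with a general-purpose language, this is a rough Python equivalent of the example above (the names are hypothetical; Bobcat does not generate Python). It makes two points explicit: the DSL's implicit return of the last expression becomes an explicit `return`, and `tax` can use `price` because `price` has already been generated for the same record. Using `random.uniform` for `$float` is an assumption about the distribution.

```python
import random


def perc(amount: float, rate: float) -> float:
    # The DSL returns the last expression implicitly; Python needs `return`.
    return amount * rate


def calc_tax(amount: float) -> float:
    return perc(amount, 0.085)


def make_invoice(rng: random.Random) -> dict:
    price = rng.uniform(10, 30)  # plays the role of $float(10, 30)
    tax = calc_tax(price)        # a field computed from an earlier field
    return {"price": price, "tax": tax, "total": price + tax}


print(make_invoice(random.Random(1)))
```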
You can also create anonymous functions by omitting the identifier in the declaration:
```
let taxRate = 0.085

entity Invoice {
  price: $float(10, 30),
  # defining anonymous function and calling it on price
  tax: (lambda (amount) { amount * taxRate })(price),
  total: price + tax
}
```
The following functions are built into Bobcat to allow easy generation of random values. Their names are prefixed with `$` to indicate that they are native, and they cannot be overridden.
Function | Returns | Arguments | Defaults when omitted |
---|---|---|---|
`$str(length)` | a random string of specified length | length is integer | `length=5` |
`$float(min, max)` | a random floating point within a given range | min and max are numeric | `min=1.0, max=10.0` |
`$int(min, max)` | a random integer within a given range | min and max are integers | `min=1, max=10` |
`$uniqint()` | an unsigned unique integer | none | none |
`$bool()` | true or false | none | none |
`$incr(offset)` | an auto-incrementing integer from offset | offset is a non-negative integer | `offset=0` |
`$uid()` | a 20-character unique id (MongoID compatible) | none | none |
`$date(min, max, format)` | a random datetime within a given range | min and max are datetimes, format is a strftime string | `min=UNIX_EPOCH, max=NOW, format="%Y-%m-%dT%H:%M:%S%z"` |
`$dict(dictionary_name)` | an entry from a specified dictionary (see Dictionary Basics and Custom Dictionaries for more details) | dictionary_name is a string | none |
`$enum(collection)` | a random value from the given collection | collection is a collection | none |
Note that a key difference between native functions and user-defined functions is that native functions may have optional arguments with default values.
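A few of these built-ins can be approximated in Python to make the defaults concrete. This is a sketch, not Bobcat's implementation: the trailing-underscore function names are hypothetical, and the character set for `$str`, the inclusive upper bound for `$int`, and the uniform distributions are assumptions. A seeded `random.Random` keeps runs reproducible.

```python
import itertools
import random
import string

rng = random.Random(42)  # fixed seed so the output is reproducible


def str_(length: int = 5) -> str:
    # Assumption: letters and digits; the table only specifies the length.
    alphabet = string.ascii_letters + string.digits
    return "".join(rng.choice(alphabet) for _ in range(length))


def int_(lo: int = 1, hi: int = 10) -> int:
    # Assumption: both bounds inclusive (randint includes hi).
    return rng.randint(lo, hi)


def float_(lo: float = 1.0, hi: float = 10.0) -> float:
    return rng.uniform(lo, hi)


def bool_() -> bool:
    return rng.random() < 0.5


def enum_(collection):
    return rng.choice(collection)


def incr(offset: int = 0):
    # Returns an iterator; successive next() calls yield offset, offset + 1, ...
    return itertools.count(offset)


print(str_(), int_(), float_(), bool_(), enum_(["enabled", "disabled", "pending"]))
```

Python's default arguments mirror the "Defaults when omitted" column, which user-defined `lambda` functions in the DSL do not have.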
Entities are declared using the `entity` keyword followed by a name (identifier) and a list of field declarations surrounded by curly braces `{}`. The entity name will be emitted as the `$type` property when the entity is serialized to JSON.
A field declaration is simply an identifier, followed by a colon `:`, an expression, and an optional count range. Multiple field declarations are delimited by commas `,`. Example:
```
entity User {
  # randomly selects a value from the 'email_address' dictionary
  login: $dict("email_address"),
  # creates a random 16-character string
  password: $str(16),
  # chooses one of the values in the collection
  status: $enum(["enabled", "disabled", "pending"])
}
```
The expressions used when defining fields can be made up of any combination of functions, literals, or references to other variables (including other fields). Currently, the arithmetic operators `+`, `-`, `*`, and `/` are supported; as the examples show, `+` also concatenates strings.
```
lambda userId(fn, ln) {
  fn + "." + ln + "_" + $uniqint()
}

entity User {
  first_name: $dict("first_names"),
  last_name: $dict("last_names"),
  # compose guaranteed unique email
  email: userId(first_name, last_name) + "@" + $dict("companies") + ".com"
}
```
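The evaluation model implied by these examples, where a field may use fields declared before it and the entity name becomes `$type`, can be sketched in Python. `generate_entity`, the field lambdas, and the small name lists (standing in for the `first_names` and `last_names` dictionaries) are all hypothetical, and evaluating fields strictly in declaration order is an assumption.

```python
import json
import random
from typing import Any, Callable, Dict

Record = Dict[str, Any]
Field = Callable[[Record, random.Random], Any]

# Hypothetical stand-ins for the first_names / last_names dictionaries.
FIRST_NAMES = ["ada", "alan", "grace"]
LAST_NAMES = ["lovelace", "turing", "hopper"]


def generate_entity(type_name: str, fields: Dict[str, Field], rng: random.Random) -> Record:
    record: Record = {}
    # Assumption: fields are evaluated in declaration order, so each one
    # can read the values already produced for the same record.
    for name, make in fields.items():
        record[name] = make(record, rng)
    record["$type"] = type_name  # the entity name is emitted as $type
    return record


user_fields: Dict[str, Field] = {
    "first_name": lambda rec, rng: rng.choice(FIRST_NAMES),
    "last_name": lambda rec, rng: rng.choice(LAST_NAMES),
    "email": lambda rec, rng: rec["first_name"] + "." + rec["last_name"] + "@example.com",
}

user = generate_entity("User", user_fields, random.Random(7))
print(json.dumps(user))
```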
To control the probability distribution of values for a specific field, you can use distributions.
Anonymous entities can be defined by omitting the identifier:
```
entity {
  login: $dict("email_address"),
  status: $enum(["enabled", "disabled", "pending"])
}
```
One can also assign an anonymous entity to a variable. This allows one to reference the entity, but does not set `$type` to the variable name.
```
let User = entity {
  login: $dict("email_address"),
  status: $enum(["enabled", "disabled", "pending"])
}
```
The following entity expressions are subtly different:
```
# anonymous entity literal, with assignment
let Foo = entity { name: "foo" }

# formal declaration will set the entity name, as reported in the output as the `$type` property
entity Foo { name: "foo" }
```
The following extends the `User` entity with a `superuser` field (always set to `true`) into a new entity called `Admin`, whose `$type` is set to `Admin`. The original `User` entity is not modified:
```
entity Admin << User {
  superuser: true
}
```
As with defining other entities, extensions can be anonymous. The original `User` definition is not modified, and the resultant entity from the anonymous extension still reports its `$type` as `User` (i.e. the parent):
```
User << {
  superuser: true
}

# anonymous extension assigned to a variable
let Admin = User << {
  superuser: true
}
```
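Extension semantics can be summarized as a copy-and-override of the parent's fields. The Python sketch below is hypothetical (`EntityDef` and `extend` are illustrative names, and fields hold plain values for brevity): it builds a new field table so the parent is never mutated, lets the child's fields replace same-named parent fields (as `Customer << {cart: null}` does in the sample input file), and keeps the parent's `$type` unless a new name is given.

```python
from typing import Any, Dict, Optional


class EntityDef:
    def __init__(self, type_name: str, fields: Dict[str, Any]) -> None:
        self.type_name = type_name
        self.fields = dict(fields)  # private copy of the field table


def extend(parent: EntityDef, overrides: Dict[str, Any], name: Optional[str] = None) -> EntityDef:
    # Child fields win over same-named parent fields; the parent is untouched.
    merged = {**parent.fields, **overrides}
    # `entity Admin << User {...}` supplies a name; `User << {...}` does not,
    # so the anonymous extension keeps the parent's $type.
    return EntityDef(name if name is not None else parent.type_name, merged)


user = EntityDef("User", {"login": "frank@example.com", "status": "enabled"})
admin = extend(user, {"superuser": True}, name="Admin")
anonymous = extend(user, {"superuser": True})
print(admin.type_name, admin.fields)
print(anonymous.type_name, anonymous.fields)
```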
Field values can also be other entities:
```
entity Kitten {
  says: "meow"
}

entity Person {
  name: "frank frankleton",
  pet: Kitten
}
```
And of course any of the variations on entity expressions or declarations can be inlined here as well (see section below for more detail):
```
entity Kitten {
  says: "meow"
}

entity Person {
  name: "frank frankleton",
  # anonymous entity
  some_animal: entity { says: "oink" },
  # extended entity
  big_cat: entity Tiger << Kitten { says: "roar!" },
  # anonymous extended entity, $type is still "Kitten"
  pet: Kitten << { says: "woof?" },
}
```
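Fields can also hold multiple values: append a count range to a field's expression, e.g. `addresses: $dict("full_address")<0..3>` or `items: CartItem<0..10>`, to generate a varying number of values for that field.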
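A minimal Python sketch of how a count range might expand into multiple values. `multi_value` is a hypothetical helper; treating both ends of `<min..max>` as inclusive, and always returning a list, are assumptions rather than documented behavior.

```python
import random
from typing import Callable, List, TypeVar

T = TypeVar("T")


def multi_value(make: Callable[[], T], lo: int, hi: int, rng: random.Random) -> List[T]:
    """Produce between lo and hi values (both inclusive, by assumption)."""
    count = rng.randint(lo, hi)
    return [make() for _ in range(count)]


rng = random.Random(5)
# Hypothetical address list standing in for the full_address dictionary.
addresses = multi_value(lambda: rng.choice(["1 Main St", "2 Oak Ave"]), 0, 3, rng)
print(addresses)
```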
Ultimately one would want to generate JSON output based on the entities defined in the input file.
Generating entities is achieved with `generate(count, <entity-expression>)` statements. The default output file for the resulting JSON objects is `entities.json`. The entity passed in as the second argument may be defined beforehand, or inlined. `generate()` expressions return a collection of `$id` values from each generated entity result.

Generating 10 `User` entities:
```
generate(10, User) # returns a collection of the 10 `$id`s from the User entities generated
```
With anonymous entities:
```
generate(10, entity {
  login: $dict("email_address"),
  password: $str(10)
})
```
Or with an inlined extension:
```
generate(10, User << {
  superuser: true
})
```
Or with formally declared entities:
```
generate(10, entity Admin << User {
  group: "admins",
  superuser: true
})
```
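The contract of `generate()` described above (emit the records, return their `$id`s) is what makes patterns like `$enum(Products)` in the sample input file work. The Python sketch below illustrates only that contract; `generate`, the counter used for `$id`, and the in-memory `OUTPUT` list standing in for `entities.json` are hypothetical, and real `$id` values may look different.

```python
import itertools
import json
import random
from typing import Any, Callable, Dict, List

rng = random.Random(0)
_ids = itertools.count(1)          # hypothetical $id source
OUTPUT: List[Dict[str, Any]] = []  # stands in for entities.json


def generate(count: int, make: Callable[[], Dict[str, Any]]) -> List[Any]:
    """Create `count` records, queue them for output, and return their ids."""
    ids = []
    for _ in range(count):
        record = make()
        record["$id"] = next(_ids)
        OUTPUT.append(record)
        ids.append(record["$id"])
    return ids


# Mirrors `let Products = generate(10, CatalogItem)` followed by `$enum(Products)`.
products = generate(10, lambda: {"$type": "CatalogItem", "price": round(rng.uniform(1.0, 30.0), 2)})
cart_items = generate(3, lambda: {"$type": "CartItem", "product": rng.choice(products), "quantity": rng.randint(1, 3)})

print(json.dumps(OUTPUT, indent=2))
```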
It's useful to organize your code into separate files for complex projects. To import other `*.lang` files, just use an import statement. Paths can be absolute, or relative to the current file:
```
import "path/to/file.lang"
```
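The path rule can be expressed with Python's `os.path`. `resolve_import` is a hypothetical helper that follows the rule exactly as stated here (absolute paths are used as-is, relative ones are resolved against the importing file's directory); it is not taken from Bobcat's source, and it assumes POSIX-style paths.

```python
import os


def resolve_import(importing_file: str, import_path: str) -> str:
    # Absolute import paths are used unchanged (just normalized).
    if os.path.isabs(import_path):
        return os.path.normpath(import_path)
    # Relative paths are resolved against the importing file's directory.
    base_dir = os.path.dirname(os.path.abspath(importing_file))
    return os.path.normpath(os.path.join(base_dir, import_path))


# On POSIX systems this prints /project/specs/path/to/file.lang
print(resolve_import("/project/specs/main.lang", "path/to/file.lang"))
```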