stuarthalloway edited this page Dec 11, 2012 · 1 revision

(design notes, wip)

Capturing Results

Results include the state of the database(s) under test, logs, metrics, and any other side effects you find worth capturing. Capturing the database under test is easy -- it is a database. Correlating that with everything else is the fun part.

Results Can Be Big

A glib way to put it: "The log of the work is bigger than the work." Imagine a simulation of a system that is read-dominated and scales horizontally. You might want to keep some data about every read for later study. At a minimum this would include the action entity that triggered the read and how long the read took. You want to write all this information into a database for analysis, but you can't do that in real time without perturbing (possibly to the point of failure) the system you are testing. So you need to capture the data as quickly as possible, and write it into a database later, as needed.
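As a rough illustration of "capture quickly, load later", here is a minimal Python sketch. It is not part of the design: the names `LocalResultLog`, `timed_read`, and `load_results`, and the newline-delimited JSON format, are all assumptions made for the example.

```python
import json
import time


class LocalResultLog:
    """Append-only, newline-delimited JSON log on the local file system.

    Hypothetical helper: the class name and record layout are assumptions.
    """

    def __init__(self, path):
        self.path = path
        # Buffered appends keep the per-read cost low while the sim runs.
        self._fh = open(path, "a", encoding="utf-8")

    def record_read(self, action_id, elapsed_ms):
        # One line per read: the triggering action and how long the read took.
        record = {"action": action_id, "elapsed_ms": elapsed_ms}
        self._fh.write(json.dumps(record) + "\n")

    def close(self):
        self._fh.close()


def timed_read(log, action_id, read_fn):
    """Run read_fn, log its duration against action_id, and return its value."""
    start = time.perf_counter()
    value = read_fn()
    log.record_read(action_id, (time.perf_counter() - start) * 1000.0)
    return value


def load_results(path):
    """Read the captured records back, e.g. to load them into a database later."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

During the sim only `timed_read` is on the hot path; `load_results` would run afterwards, away from the system under test.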

Tentative Plan

My current idea is to let you configure a storage service (likely S3 at first) to dump this data into. Each sim process can then cache results on its local file system during the sim and push them to S3 at the end. The S3 keys of the uploaded objects are then stored in the database.
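The end-of-sim push might look something like the Python sketch below. Everything here is hypothetical: `push_results`, the `sim_id/process_id/filename` key layout, and the `put_object` callable, which stands in for whatever storage client is configured (an S3 upload in the plan above) so that the sketch carries no dependency on a particular SDK.

```python
import os


def push_results(local_dir, sim_id, process_id, put_object):
    """Upload every cached result file and return the storage keys used.

    put_object(key, data) is an assumed callable wrapping the real store.
    The returned keys are what would then be recorded in the database.
    """
    keys = []
    for name in sorted(os.listdir(local_dir)):
        path = os.path.join(local_dir, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories; only flat result files are pushed
        # Assumed key layout: one prefix per sim, one sub-prefix per process.
        key = f"{sim_id}/{process_id}/{name}"
        with open(path, "rb") as fh:
            put_object(key, fh.read())
        keys.append(key)
    return keys
```

For a dry run, an in-memory dict can play the store: `store = {}` followed by `push_results(cache_dir, "sim-1", "proc-1", store.__setitem__)`.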

Should there be rules for what goes in the results buckets? It would make things uniformly easy for the sim if this information had to be shaped like transaction data. But if people have other data formats, that requirement puts the conversion job in the sim process at the tail end of the simulation run, rather than letting it be a separate task that might be done later.

Different Path for Small Results?

If the results are not big, it might be fine to write them to the database during the sim. E.g. a metrics payload of a few dozen datoms once a minute would be fine.
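If both paths exist, the choice between them could be as simple as a size threshold. The Python sketch below is only illustrative: the function name and the `inline_limit` default of 100 datoms per minute are assumptions, not numbers from these notes.

```python
def choose_result_path(datoms_per_minute, inline_limit=100):
    """Pick where a result stream goes during the sim.

    inline_limit is an assumed cutoff; a real value would need measuring
    against how much extra write load the system under test tolerates.
    """
    if datoms_per_minute <= inline_limit:
        return "database"      # small: write directly during the sim
    return "local-cache"       # big: cache locally, push to storage at the end
```

A metrics payload of a few dozen datoms a minute would take the `"database"` path, while per-read timing data would take `"local-cache"`.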
