Releases: jepsen-io/jepsen
0.1.15
This is mostly an ergonomics & bugfix release, with a few new minor checkers. In particular, you may want to try the stats and unhandled-exceptions checkers, which can help you avoid issues where your test passes because every operation failed! We also track exceptions thrown by client operations, and can summarize them to tell you what kinds of exceptions you aren't catching. This can help make your tests more robust without endless scrolling through logs. We've added some retries for flaky SCP downloads, and made logged exceptions more useful in some places. Plus more!
Special thanks to Vojtech Juranek for JDK12 compatibility, and to everyone else who contributed patches and feedback. :)
New Features
- jepsen.checker/stats, jepsen.checker/unhandled-exceptions: some basic statistics and error reporting which can be applied to almost any test.
- jepsen.tests.cycle.append can now detect internal consistency violations within transactions.
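To make the "test passes because every operation failed" problem concrete, here is a minimal sketch of the kind of summary a stats-style checker can compute. It is plain Clojure and not Jepsen's implementation: the names outcome-counts and stats-sketch are hypothetical, and the history is assumed to be a sequence of client op maps with :type and :f keys.

```clojure
;; Hypothetical helper, not part of Jepsen: counts completed operations by
;; outcome for one group of ops.
(defn outcome-counts
  [ops]
  (let [freqs (frequencies (map :type ops))]
    {:ok-count   (get freqs :ok 0)
     :fail-count (get freqs :fail 0)
     :info-count (get freqs :info 0)}))

;; Hypothetical sketch: summarizes a history overall and per :f. A group is
;; treated as valid only if at least one of its ops completed with :ok.
(defn stats-sketch
  [history]
  (let [completions (remove #(= :invoke (:type %)) history)
        summarize   (fn [ops]
                      (let [counts (outcome-counts ops)]
                        (assoc counts :valid? (pos? (:ok-count counts)))))
        by-f        (into {}
                          (map (fn [[f ops]] [f (summarize ops)]))
                          (group-by :f completions))]
    ;; The overall verdict is valid only if every :f had some success.
    (assoc (summarize completions)
           :by-f   by-f
           :valid? (every? :valid? (vals by-f)))))

(def sample-history
  [{:type :invoke, :f :read,  :process 0}
   {:type :ok,     :f :read,  :process 0}
   {:type :invoke, :f :write, :process 1}
   {:type :fail,   :f :write, :process 1}])

(stats-sketch sample-history)
```

For this sample history the overall :valid? is false, because no :write ever succeeded even though reads did.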
API Changes
- jepsen.tests.cycle.append now has a generator and a default test map which makes it easier to build append tests.
- When exceptions are thrown by Client/invoke!, we attach the exception (as Clojure data) to the generated :info op under the :exception key.
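As a rough illustration of how exception data attached to :info ops could be summarized, the sketch below groups crashed operations by their outermost exception type. It is plain Clojure, not Jepsen code: exception-classes is a hypothetical helper, and the shape of the :exception value is an assumption (a map like the one clojure.core/Throwable->map returns, with a :via vector whose entries carry a :type).

```clojure
;; Hypothetical helper, not part of Jepsen. Assumes each crashed op carries
;; an :exception map shaped like the output of clojure.core/Throwable->map.
(defn exception-classes
  "Returns a map of outermost exception type to the number of :info ops
  in the history that carry it."
  [history]
  (->> history
       (filter #(and (= :info (:type %)) (:exception %)))
       (map #(get-in % [:exception :via 0 :type]))
       frequencies))

;; Sample ops built to match the assumption above.
(def sample-crashes
  [{:type :invoke, :f :read, :process 0}
   {:type :info,   :f :read, :process 0
    :exception (Throwable->map (IllegalStateException. "connection closed"))}
   {:type :info,   :f :write, :process 1
    :exception (Throwable->map (IllegalStateException. "connection closed"))}])

(exception-classes sample-crashes)
```

Here both crashed ops share one exception type, so the result has a single entry with a count of 2.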
Minor Changes
- jepsen.control/download now retries some SCP failures. These have been traditionally flaky.
- Tests now log much less garbage to the console.
- jepsen.nemesis.time now stops the ntp service, in addition to ntpd, during setup.
- We're now compatible with JDK12.
- jepsen.tests.cycle.append doesn't generate empty transactions as a part of its workload.
- jepsen.reconnect now logs the full exception when a reconnectable error occurs.
Bugfixes
- Exceptions thrown in (e.g.) OS and DB setup could be propagated incorrectly as BrokenBarrierExceptions, which, while technically correct, didn't provide much useful information about what went wrong. We now make a special effort to provide useful exceptions.
- jepsen.control.util/grepkill! now properly catches "no such process" errors, which could happen when racing to kill a process.
0.1.14
This is a big release! We've got a bunch of bugfixes--most importantly, an issue which allowed tests.long-fork to fail to find long fork anomalies, and a long-standing bug which wrote invalid .fressian files. Plot rendering has been totally re-worked, which standardizes behavior between several types of plot that used to do their own thing, and adds colorizable nemeses to all the usual plots. There's also a new, somewhat experimental set of tests for cycle detection in jepsen.tests.cycle. Finally, we've got several quality-of-life improvements around debuggability and error handling: better log messages for jepsen.control errors, crashes in analyses, and more careful choices about which exceptions to throw when more than one occurs concurrently, or sequentially, during a test.
Special thanks to Kit Patella and Peter Alvaro for their work and discussion around plotting and cycle detection, and to Craig Pastro, who made several documentation fixes and improvements to Docker support.
New Features
- Totally re-worked plots. Latency, rate, clock, and bank plots now have a unified system for rendering nemesis operations. Nemesis ops have nice colors that match their legends. Fixed a whole bunch of edge cases with rendering outside plot ranges. Fixed issues with autoscaling. Fixed issues with multiple start operations followed by a single stop. Fixed several issues with crashes when plotting short or empty histories. Fixed issues with nemeses which extended to the end of the test. You can now include a :plot map to tests, specifying how to classify, label, and colorize nemeses in plots.
- jepsen.tests.cycle: New tests based on cycle detection between operations. It's functional, finds bugs, and the bugs it finds all check out so far, but expect some API changes and re-organization as we refine it for use in additional databases.
API Changes
- jepsen.tests.linearizable-register: now uses process-limit, rather than limit, by default. This should make register tests better at finding bugs, and less likely to become incredibly expensive to analyze.
Minor Changes
- jepsen.txn 0.1.1: adds some additional support functions for transactional histories
- dom-top 1.0.5
- jepsen.core: when threads race to abort, try to throw more meaningful exceptions, rather than broken barriers/interrupts.
- Fixed a bug in internal integration tests for teardown
- jepsen.control.util/wget now retries NXDOMAIN wget errors. I know, this shouldn't happen. EC2's DNS is apparently awful?
- jepsen.faketime: add an installer for our custom build of faketime supporting CLOCK_MONOTONIC_COARSE
- jepsen.control/exec exceptions now include a human & logger-friendly error message as well as data
- jepsen.db/cycle no longer swallows exceptions that occur during teardown!. This created hard-to-debug situations.
- jepsen.control now uses real-pmap from dom-top.
- jepsen.core: log a special message when :valid? is :unknown
- jepsen.control.util/start-daemon! now logs the escaped command line used to start the daemon, which is really helpful for debugging startup issues.
- jepsen.core/run! now throws the original exception if a test aborts and, during cleanup, an error occurs while snarfing logs. The log-snarfing message is still logged, but it's probably not as useful as the original error that interrupted the test!
- jepsen.txn/reduce-mops: a helper for writing reductions over every micro-op in a history
- Docker support now uses networks instead of links, plus some other assorted updates for Debian Stretch.
- jepsen.control.util/signal! sends a signal to a process by name
- jepsen.os.debian now installs dirmngr by default.
Bugfixes
- jepsen.tests.long-fork often failed to find long forks. This was a serious issue which could have allowed tests to pass when they should have failed. Now fixed.
- jepsen.store: fixed a longstanding bug with writing invalid .fressian files for tests containing sets
- jepsen.faketime/wrap!: if a test crashes during wrap!, don't get stuck with wrapper scripts without mode +x
- jepsen.control.util/grepkill! no longer throws when no processes match
- jepsen.tests.sequential: fixed a misleading namespace docstring which mischaracterized the invariants the test looked for. The test itself was OK; the docs were just wrong.
- Don't copy temporary files into the control docker image, which could cause errors when copying symlinks referring to directories created during previous runs.
0.1.13
This is a small release to provide support for Debian Stretch. Debian Jessie mirrors were shut down recently; I thought that as amd64 users we'd be supported via LTS, but this was not the case.
Minor Changes
- Tests now log a message when test relative time begins
- Support for Debian Stretch
Bugfixes
- Fixed a bug causing the smartos namespace to fail to compile
0.1.12
New Features
- When tests crash, Jepsen will write the stacktrace to that test's jepsen.log file for you
- Named locks: a concurrency primitive for locking a dynamic pool of resources by some identifier
- Knossos now supports timeouts to help bound search time
- jepsen.checker/set-full reads can now use vectors, not just sets, and will flag duplicate values
- Tests now take a :logging map, which can override package log levels. Helpful for tracing, or noisy clients
- Latency and rate graphs now render different kinds of nemesis operations in separate tracks, like gantt charts, with colors and customizable legends.
- jepsen.generator/process-limit: bounds number of processes, rather than number of operations. Helpful for linearizable tests, where process concurrency is the dominant factor in complexity.
- jepsen.generator/seq-all: like seq, but emits every element of each generator it's given. Useful for constructing an infinite series of finite generators.
- New type of test: jepsen.tests.causal-reverse, which looks for incompatible read orders in serializable systems.
- jepsen.os.ubuntu: supports running tests on Ubuntu.
- Docker scripts can also set up Ubuntu nodes with --ubuntu
API Changes
- Keyword nodes (:n1) are no longer supported. We've all been using string node names for a few years now.
- Tests no longer take a :model key. Only a few checkers used them; you provide models to checkers on construction now.
- jepsen.control.util now throws ex-info exception maps using Slingshot, which means you can pattern-match return codes in exception handlers for shell commands using slingshot/try+. Exceptions from exec also have their stdout, stderr, and command neatly separated into different fields, which should cut down on regex tomfoolery.
Minor Changes
- Additional test coverage
- Knossos 0.3.4
- jepsen.checker/counter is now more precise: it filters out failed operations
- TravisCI integration means we'll be more rigorous about tests and PRs
- Performance improvements to jepsen.checker/set-full
Bugfixes
- Clock skew plots no longer explode on empty histories
- Bank plots no longer explode on empty histories
- Fixed a NullPointerException serializing empty multisets
- Documentation fixes
- Typo fixes in the tutorial
Notes
- jepsen.generator/time-limit continues to be bad; it contains at least two race conditions around nested time limits that can mostly work, but can also ruin your life. We need to fundamentally redesign generators.
0.1.11
New Features
- jepsen.checker/set-full: A new set checker which supports reads throughout the lifecycle, as well as linearizable and eventually consistent checker modes. This checker provides quantitative bounds on stale and lost reads, including latency quantiles for visibility. This is still somewhat experimental--in particular, it may, for systems with stale reads, report spurious lost records at the end of history--records which would have appeared if given some time to cool off before reading. Check the output and history carefully.
- Jepsen now maintains a current symlink for the test which is presently running; latest refers to the last completed test. Helpful for looking through logfiles as tests are running, and debugging tests which crashed.
- checker/clock-plot plots relative clock offsets, as recorded by nemesis.time ops
- generator/map: applies a function to each operation generated by some other generator
- nemesis/timeout: wrap any nemesis and force its operations to time out. Helpful when nemeses can get stuck doing something to a database.
API Changes
- jepsen.adya renamed to jepsen.tests.adya. We're moving reusable workload-specific support namespaces under jepsen.tests.
Minor Changes
- Upgrade to tools.cli 0.4.1
- Upgrade to tools.logging 0.4.1
- Upgrade to tea-time 1.0.1
- Upgrade to dom-top 1.0.4
- Debian now installs apt-transport-https by default
- nemesis/partitioner can now take grudges as values, allowing generators to control partition topologies
- nemesis/node-start-stopper targeting fns can optionally take a test map
- Use Fipp, a faster pretty-printer, for writing EDN output like histories and analysis results. Significant speedups!
- jepsen.util/map-keys: transforms keys in a map by applying a function to each
- jepsen.store: can now serialize java.util.Instants
- jepsen.web logs parse errors when reading results
- jepsen.store/load-results can now read back defrecords. May come to regret this...
Bugfixes
- Jepsen now catches all Exceptions when reopening clients, not just RuntimeException.
- debian/install! now tells debian that the frontend is noninteractive, which fixes occasional dpkg-preconfigure errors
- Fix several race conditions in jepsen.core which allowed workers to deadlock or move on to new phases, like a nemesis running operations while clients were still setting up the test.
- jepsen.nemesis.time no longer crashes horribly when setting up on nodes where NTP is not yet installed
- jepsen.generator/f-map now passes through nils instead of calling f on them, since nil represents the end of a generator
0.1.10
0.1.9
New Features
- A new namespace, jepsen.tests.bank, provides support for running snapshot-isolated bank tests, including visualizations!
- A new namespace, jepsen.tests.long-fork, looks for long forks, an anomaly possible under parallel snapshot isolation
- For flaky databases, you can now throw a particular type of exception to trigger automatic retries in DB setup
- jepsen.control/daemon-running? checks to see if pidfiles are alive
- Jepsen now downloads logs automatically if the JVM is interrupted; e.g. if a test crashes or if you ^C
- Generators now provide a somewhat more helpful prn/pprint representation
- jepsen.generator/time-limit now interrupts threads when the time limit expires, rather than waiting up to dt seconds
- jepsen.cli/single-test-cmd now also provides an analyze command for re-running an analysis on the last test. This is a gross hack, but really helpful.
- CLI tests can now pass --nodes n1,n2,n3 instead of passing --node n1 --node n2, ...
API Changes
- jepsen.checker/set and queue now emit absolute counts, rather than ambiguous fractions.
Minor Changes
- jepsen.checker/set now accepts any type of collection, not just sets. We're laying the groundwork for multiset checkers
- The web interface now shows test times with punctuation
- jepsen.nemesis default nemeses now use the Nemesis protocol
- jepsen.generator/delay and stagger now validate their arguments are non-negative integers
Bugfixes
- jepsen.checker.perf/qs->colors works with more than six quantiles now.
- jepsen.generator/phases and concat no longer make you wade through every phase in order to get ops from later phases. This fixes a common issue where operations in a test with a waiting phase had to wait on every final operation.
0.1.8
This is a minor release for performance and usability. Stuff that should have made it into 0.1.7, but I only remembered while re-writing the tutorial. The tutorial is also updated for 0.1.8, and includes two totally new chapters on command line parameters and breaking up tests into composable workloads.
Improvements
- Network partitions are now significantly faster, thanks to an optional protocol for making all network changes at once, instead of via separate commands.
- control.net/ip resolution is now memoized; also speeds up partitions.
- Checking, writing histories, and writing results is now parallelized to take full advantage of multi-core systems. On my 48-way Xeon, this cuts analysis phases from 10+ minutes to ~20 seconds.
- checker/concurrency-limit can limit parallelization if you have an expensive checker to run, or if you're running in a CI context where turnaround time is less important.
- The web interface now serves common files (like logs) with utf-8 content types, fixing some encoding bugs. Emoji table flips look correct now!
0.1.7
Major thanks to Kit Patella (@mkcp) for all her hard work on this release!
API changes
- Clients and Nemeses have fundamentally different lifecycles and operations, and have been split into two protocols accordingly. Client is for clients, and Nemesis is for nemeses.
- The Client protocol conflated two things: connecting to a database, and setting up initial state. We now offer explicit steps: open! creates a new client with a connection to a DB, and close! closes its connection. At the start and end of a test, we call setup! and teardown! to perform any db/table setup and cleanup. This is all backwards compatible--but you will see deprecation warnings urging you to migrate to the new Client protocol, becauuuse...
- Clients are now closed and abandoned when invoke! returns an info result, rather than continuing to use the same client. This is important because a background thread (say, spawned by a timeout operation for a previous call to invoke!) might still be trying to make calls against that client, or the client could have some transactional state buried inside it which gets reused by the next invocation with a new logical client. This led to improperly mixed transactions, especially in jdbc, and required all kinds of complicated reconnection logic. The new approach causes connection churn on info, but makes it much simpler to write correct tests. While old clients are backwards compatible, this re-opening behavior could lead to clients running setup code in the middle of a test, and if proper idempotent locking was not employed, this could clobber state. Split your Client functions into separate open! and setup! phases to fix this.
- Nemesis and client setup is now concurrent, rather than taking place in separate phases. This should not affect most users.
- Jepsen.model is dead; models turned out to be specific to their checkers. Use knossos.model instead.
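The split between connecting and setting up state can be sketched as follows. This is a toy in plain Clojure, not Jepsen's code: LifecycleClient is a stand-in protocol defined here only so the example runs on its own (a real test would implement jepsen.client/Client instead), and AtomClient, run-lifecycle, and the :log key in the test map are all hypothetical names used to record the order of calls.

```clojure
;; A stand-in protocol for illustration only; real tests would implement
;; jepsen.client/Client rather than define their own.
(defprotocol LifecycleClient
  (open!     [client test node] "Returns a client connected to node.")
  (setup!    [client test]      "One-time state setup.")
  (invoke!   [client test op]   "Applies op and returns its completion.")
  (teardown! [client test]      "One-time cleanup.")
  (close!    [client test]      "Releases the connection."))

;; A toy client over an atom standing in for a database. The :log atom in
;; the test map is hypothetical and only records the order of calls.
(defrecord AtomClient [db node]
  LifecycleClient
  (open! [this test node]
    (swap! (:log test) conj [:open node])
    (assoc this :node node))
  (setup! [this test]
    (swap! (:log test) conj [:setup node])
    (reset! db 0)
    this)
  (invoke! [this test op]
    (swap! (:log test) conj [:invoke node])
    (case (:f op)
      :read (assoc op :type :ok, :value @db)
      :inc  (do (swap! db inc) (assoc op :type :ok))))
  (teardown! [this test]
    (swap! (:log test) conj [:teardown node]))
  (close! [this test]
    (swap! (:log test) conj [:close node])))

;; Walks one client through the whole lifecycle and returns the final read
;; together with the recorded call order.
(defn run-lifecycle
  []
  (let [test   {:log (atom [])}
        client (open! (->AtomClient (atom nil) nil) test "n1")]
    (setup! client test)
    (invoke! client test {:type :invoke, :f :inc})
    (let [result (invoke! client test {:type :invoke, :f :read})]
      (teardown! client test)
      (close! client test)
      {:result result, :calls (mapv first @(:log test))})))

(run-lifecycle)
```

Keeping setup! separate from open! means that a client re-opened in the middle of a test only reconnects; in this toy, a second open! would not reset the counter.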
New features
- Jepsen histories now assign a unique :index to every operation.
- HTML plots have the full operation data and wall-clock time in tooltips for each operation, and a unique link target for every op, making it possible to link people to a particular part of the history.
- Knossos' SVG plots link operations to their corresponding view in the timeline, so you can jump immediately to the surrounding context of a fault.
Minor changes
- jepsen.nemesis.time rounds millisecond times to integers, rather than using floats.
- jepsen.client has significantly expanded documentation.
- jepsen.control/upload can take java.io.Files, not just filenames.
- SSH Exceptions now log additional debugging context which can help diagnose mysterious connection errors.
- When gnuplot is not installed, Jepsen emits a friendlier error message.
Bugfixes
- Client changes allowed us to fix a longstanding deadlock in jepsen's core, where exceptions could crash worker threads and hang the whole test. Jepsen is now much more robust to exceptions in clients, nemeses, and generators. We also ensure all workers connect, set up, work, tear down, and close in separate phases.
- jepsen.generator/mix now accepts empty vectors, and emits nil always.
- jepsen.checker.perf now renders nemesis events as vertical bars, rather than ranges. This is less helpful for start/stop pairs, but more helpful in the general case, when operations use other names and don't have clear start/stop semantics.
- jepsen.cli/tarball-opt is properly backwards-compatible