Major overhaul of the internals of riak_test #710

Merged

kellymclaughlin

See commit message for details: 1076cf4

* Upgrade versions have moved to being a list not just a single
  version. `upgrade_version` is removed in favor of
  `upgrade_path`. `upgrade_path` is a comma-separated list
  representing an upgrade sequence *e.g.* `1.3.4,1.4.10,2.0.0`
* Unification of node deployment code: common code and work from the
  individual harness modules has been brought into the framework where
  possible.
* Further decouple and distinguish between framework setup and prep,
  test setup, and test execution.
* Streamline the configuration for versions and root path. Move away
  from the convention of *current*, *previous*, and *legacy*: it is too
  restrictive, offers little benefit, and *current* in particular is
  ambiguous. Config must specify `root_path` and all versions under
  root path are represented as release directories. A subdirectory of
  `2.0.0` means that *2.0.0* can now be used as a version the same as
  *current*, *previous*, or *legacy* could previously, but the upside
  is that it requires no extra configuration.
* Upgrade transitions can be specified as a list with no bound so if
  we want to test upgrading to each major release from `1.0.0` to
  `2.0.0` then that is possible.
* Node deployment and teardown for a particular test execution is more
  isolated. The framework attempts to stop **all** nodes during its
  setup phase, but doing so after each test execution is
  unnecessary. Node deployment is now a matter of requesting the
  number of required nodes and the versions involved in the test. The
  framework determines whether enough nodes are available to cover the
  requirements of the test and returns success or failure based on
  that criterion. As the last statement implies, concurrent
  test execution is now attempted where possible. If there are 8 nodes
  available for the *2.0.0* version and 4 tests queued to be run that
  require only 2 nodes each and do not require upgrade testing then
  there is no reason to block on serial execution.
* Responsibility for management of test execution is more clearly
  delineated. Previously it was hard to account for responsibility of
  tasks between `riak_test_escript`, `riak_test_runner`, and
  `rt`. `riak_test_escript` has been heavily refactored and the work
  it does has been minimized to command line argument parsing and
  spawning the workhorse processes to execute the tests. It also made
  sense to make use of OTP behaviors for the implementation of some of
  the execution helpers. The `riak_test_executor`, a `gen_fsm`, is
  introduced to manage the scheduling of tests and handle reporting
  results. It is a named process and only one runs at a
  time. Individual test execution is managed by a `riak_test_runner`
  process. These are spawned by the `riak_test_executor` and there is
  one for each test that executes. `riak_test_runner` is also
  implemented as a `gen_fsm`. Finally there is the `node_manager`
  process. It is a `gen_server` that handles all node manipulation and
  manages access to the nodes for testing. The `riak_test_executor`
  asks the node manager to reserve `N` nodes and, if the reservation
  can be fulfilled, the `node_manager` responds with
  the list of nodes for the requesting test to use. If the requested
  number of nodes is not available the execution of the test is
  deferred. The `node_manager` is aware if the current series of test
  executions involves upgrades and deploys nodes initially using the
  correct version based on that information. Thus when a test receives
  a list of nodes there is no need to take any action and test
  execution can begin immediately.
* Maximize resource efficiency and minimize execution duration by
  seeking to avoid unnecessary node starts or cycles. *e.g.* The
  `node_manager` is initialized with a list of nodes and the versions
  involved in test execution, but no nodes are deployed until the first
  call to `node_manager:reserve_nodes` that requires those nodes. If
  only one test is slated for execution and it only requires 3 nodes
  there is no reason to start or stop more than 3 nodes.
* Facilitate replication testing setup and eliminate crufty setup code
  duplicated across replication tests with new properties `cluster_count`
  (defaults to 1) and `cluster_weights`, a list of weights that
  determine distribution of available nodes among requested clusters
  (defaults to `undefined`). Setting up multiple clusters for testing
  replication should not require any kludgy steps; it should have full
  support in the framework.
* Update backend setup so that backend configuration can be done for
  selected nodes only instead of all nodes.
* Change setup scripts to use only version numbers as directory names.
* Change setup scripts to avoid the unnecessary `dev` directory when
  installing devrel releases. *e.g.* Instead of `~/rt/riak/2.0.0/dev/`
  being the path to the `dev*` releases, the path is just
  `~/rt/riak/2.0.0/`. This just removes an unnecessary subdirectory
  and removes the need for some complications in node deployment.
* Refactor properties to distinguish between node name and id: The
  helper functions used by the framework have different input
  requirements.  Some that use rpc require the actual node name which
  may be different depending on the harness used. Others use the node
  identifier to form strings representing shell commands to
  execute. The `node` property has been replaced by a `node_id`
  property and a `node_map` structure that maps a node identifier to a
  full node name.
* Add new rtdev-install script that is intended to replace most of the
  other setup scripts.
* Allow distinction between console and file logging levels: Move from
  a single `lager_level` configuration option to `lager_console_level`
  and `lager_file_level` in order to be able to control these
  independently. The default level for the console output is `notice`
  to minimize the output display during a run. The default level for
  the file output is `info` in order to capture all of the logging that
  previously has been output to the console.
* Use `riak_cli` table generation for display of test result details
  when using the `-v` option.
* Update rebar to 2.5.1
* Update Makefile to use tools.mk
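
To illustrate the `upgrade_path` item above, here is a minimal sketch of how a comma-separated path might be turned into an ordered list of upgrade steps. It is written in Python purely as a model; it is not the Erlang code from this PR, and the function names `parse_upgrade_path` and `upgrade_steps` are hypothetical.

```python
def parse_upgrade_path(value):
    """Split a comma-separated upgrade_path string into an ordered version list."""
    return [v.strip() for v in value.split(",") if v.strip()]


def upgrade_steps(versions):
    """Pair each version with its successor: one (from, to) tuple per upgrade."""
    return list(zip(versions, versions[1:]))


# The example path from the description above.
path = parse_upgrade_path("1.3.4,1.4.10,2.0.0")
steps = upgrade_steps(path)
print(path)
print(steps)
```

Because the list has no fixed length, a longer path simply yields more `(from, to)` pairs.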
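
The node reservation behavior described above (reserve `N` nodes, defer the test if they are not available, and deploy nothing until it is first needed) can be modeled roughly as follows. This is a Python sketch under assumptions, not the `node_manager` `gen_server` itself: the class, the `return_nodes` method, and the `dev1`..`dev8` identifiers are all hypothetical, and only the name `reserve_nodes` comes from the description.

```python
class NodeManager:
    """Hypothetical stand-in for the node_manager process."""

    def __init__(self, node_ids):
        self.available = list(node_ids)  # nodes free to hand out
        self.deployed = set()            # nodes that have been started
        self.reserved = {}               # test name -> nodes it holds

    def reserve_nodes(self, test, count):
        """Return `count` nodes for `test`, or None so the caller can defer it."""
        if count > len(self.available):
            return None
        # Prefer nodes that are already running to avoid extra starts.
        self.available.sort(key=lambda n: n not in self.deployed)
        nodes = self.available[:count]
        self.available = self.available[count:]
        self.deployed.update(nodes)      # deploy lazily, on first reservation
        self.reserved[test] = nodes
        return nodes

    def return_nodes(self, test):
        """Give a finished test's nodes back to the pool."""
        self.available.extend(self.reserved.pop(test))


manager = NodeManager([f"dev{i}" for i in range(1, 9)])
first = manager.reserve_nodes("test_a", 2)
second = manager.reserve_nodes("test_b", 2)
deferred = manager.reserve_nodes("test_c", 5)  # only 4 nodes left, so None
```

In this model two 2-node tests hold their nodes at the same time, the 5-node request is deferred until nodes are returned, and only the four nodes actually handed out are ever marked as deployed.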
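
For the `cluster_count` and `cluster_weights` properties above, the description does not spell out how the weights are applied. One plausible reading, sketched here in Python, is an even split when `cluster_weights` is undefined and a proportional split otherwise. The function `split_nodes`, the use of integer weights, and the rule for leftover nodes are assumptions made for illustration only.

```python
def split_nodes(nodes, cluster_count=1, cluster_weights=None):
    """Divide `nodes` among `cluster_count` clusters, proportionally to the weights."""
    if cluster_weights is None:
        # Assumed meaning of the `undefined` default: every cluster weighs the same.
        cluster_weights = [1] * cluster_count
    if len(cluster_weights) != cluster_count:
        raise ValueError("need exactly one weight per cluster")
    total = sum(cluster_weights)
    # Start from the floor of each cluster's proportional share.
    sizes = [len(nodes) * w // total for w in cluster_weights]
    # Hand any leftover nodes to the earliest clusters, one each.
    for i in range(len(nodes) - sum(sizes)):
        sizes[i] += 1
    clusters, start = [], 0
    for size in sizes:
        clusters.append(nodes[start:start + size])
        start += size
    return clusters


nodes = ["dev1", "dev2", "dev3", "dev4", "dev5", "dev6"]
even = split_nodes(nodes, cluster_count=2)
weighted = split_nodes(nodes, cluster_count=2, cluster_weights=[2, 1])
```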
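
The split between `node_id` and `node_map` described above might look like this in use: rpc-style helpers look up the full node name, while shell helpers build a path from the identifier. Again this is a Python model rather than the framework's code; the helper names, the `dev1@127.0.0.1` node names, and the `bin/riak` command layout are assumptions, with only the `~/rt/riak/2.0.0/` layout taken from the description.

```python
# Maps a node identifier to the full node name used for rpc calls.
node_map = {
    "dev1": "dev1@127.0.0.1",
    "dev2": "dev2@127.0.0.1",
}


def rpc_target(node_id):
    """Full node name for helpers that talk to the node over rpc."""
    return node_map[node_id]


def shell_command(root_path, version, node_id, action):
    """Command string for helpers that act on the node through the shell."""
    return f"{root_path}/{version}/{node_id}/bin/riak {action}"


target = rpc_target("dev1")
command = shell_command("~/rt/riak", "2.0.0", "dev1", "start")
```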
@kellymclaughlin kellymclaughlin force-pushed the refactor/deploy-api-wip branch 2 times, most recently from 4c75889 to e11f183 Compare December 15, 2014 21:38
kellymclaughlin added a commit that referenced this pull request Dec 15, 2014
Major overhaul of the internals of riak_test
@kellymclaughlin kellymclaughlin merged commit 66ab243 into feature/decouple-cluster-setup Dec 15, 2014
@kellymclaughlin kellymclaughlin deleted the refactor/deploy-api-wip branch December 15, 2014 21:39
@kellymclaughlin

This is part of the larger RFC effort of #667
