Major overhaul of the internals of riak_test #710

kellymclaughlin · 2014-12-12T22:23:09Z

See commit message for details: 1076cf4

* Upgrade versions are now a list rather than a single version. `upgrade_version` is removed in favor of `upgrade_path`, a comma-separated list representing an upgrade sequence, *e.g.* `1.3.4,1.4.10,2.0.0`.
* Unification of node deployment code: common code and work from the individual harness modules has been brought into the framework where possible.
* Further decouple and distinguish between framework setup and prep, test setup, and test execution.
* Streamline the configuration for versions and root path. Move away from the convention of *current*, *previous*, and *legacy*, which is too restrictive for too little benefit; *current* especially is ambiguous. Config must specify `root_path`, and all versions under the root path are represented as release directories. A subdirectory of `2.0.0` means that *2.0.0* can now be used as a version the same way *current*, *previous*, or *legacy* could previously, with the upside that it requires no extra configuration.
* Upgrade transitions can be specified as a list with no bound, so testing an upgrade to each major release from `1.0.0` to `2.0.0` is now possible.
* Node deployment and teardown for a particular test execution is more isolated. The framework attempts to stop **all** nodes during its setup phase, but doing so after each test execution is unnecessary. Node deployment is now a matter of requesting the number of required nodes and the versions involved in the test. The framework determines whether enough nodes are available to cover the requirements of the test and returns success or failure on that criterion. As the last statement implies, concurrent test execution is now attempted where possible. If there are 8 nodes available for the *2.0.0* version and 4 tests queued to be run that require only 2 nodes each and do not require upgrade testing, then there is no reason to block on serial execution.
* Responsibility for management of test execution is more clearly delineated. Previously it was hard to account for responsibility of tasks between `riak_test_escript`, `riak_test_runner`, and `rt`.
    * `riak_test_escript` has been heavily refactored, and the work it does has been minimized to command line argument parsing and spawning the workhorse processes to execute the tests. It also made sense to make use of OTP behaviors for the implementation of some of the execution helpers.
    * The `riak_test_executor`, a `gen_fsm`, is introduced to manage the scheduling of tests and handle reporting results. It is a named process and only one runs at a time.
    * Individual test execution is managed by a `riak_test_runner` process. These are spawned by the `riak_test_executor` and there is one for each test that executes. `riak_test_runner` is also implemented as a `gen_fsm`.
    * Finally there is the `node_manager` process. It is a `gen_server` that handles all node manipulation and manages access to the nodes for testing. The `riak_test_executor` requests to reserve `N` nodes from the `node_manager`, and if the reservation can be fulfilled the `node_manager` responds with the list of nodes for the requesting test to use. If the requested number of nodes is not available, the execution of the test is deferred. The `node_manager` is aware if the current series of test executions involves upgrades and deploys nodes initially using the correct version based on that information. Thus when a test receives a list of nodes there is no need to take any action and test execution can begin immediately.
* Maximize resource efficiency and minimize execution duration by avoiding unnecessary node starts or cycles. *e.g.* The `node_manager` is initialized with a list of nodes and the versions involved in test execution, but no nodes are deployed until the first call to `node_manager:reserve_nodes` that requires those nodes. If only one test is slated for execution and it only requires 3 nodes, there is no reason to start or stop more than 3 nodes.
* Facilitate replication testing setup and eliminate crufty setup code duplicated in replication tests with new properties `cluster_count` (defaults to 1) and `cluster_weights`, a list of weights that determine distribution of available nodes among requested clusters (defaults to `undefined`). Setting up multiple clusters for testing replication should not require any kludgy steps; it should have full support in the framework.
* Update backend setup so that backend configuration can be done for selected nodes only instead of all nodes.
* Change setup scripts to use only version numbers as directory names.
* Change setup scripts to avoid the unnecessary `dev` directory when installing devrel releases. *e.g.* Instead of `~/rt/riak/2.0.0/dev/` being the path to the `dev*` releases, the path is just `~/rt/riak/2.0.0/`. This removes an unnecessary subdirectory and some complications in node deployment.
* Refactor properties to distinguish between node name and id: the helper functions used by the framework have different input requirements. Some that use rpc require the actual node name, which may differ depending on the harness used. Others use the node identifier to form strings representing shell commands to execute. The `node` property has been replaced by a `node_id` property and a `node_map` structure that maps a node identifier to a full node name.
* Add a new `rtdev-install` script that is intended to replace most of the other setup scripts.
* Allow distinction between console and file logging levels: move from a single `lager_level` configuration option to `lager_console_level` and `lager_file_level` so that the two can be controlled independently. The default level for console output is `notice`, to minimize the output displayed during a run. The default level for file output is `info`, in order to capture all of the logging that was previously output to the console.
* Use `riak_cli` table generation for display of test result details when using the `-v` option.
* Update rebar to 2.5.1.
* Update Makefile to use tools.mk.
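
The reserve-or-defer behaviour described above can be modelled in a few lines. The following is only an illustrative Python sketch, not the Erlang implementation in this PR: `NodePool`, `schedule`, and the test names are hypothetical, and the deferral policy (try every queued test once, keep the ones that do not fit) is an assumption, since the description does not say how the `riak_test_executor` orders deferred tests.

```python
from collections import deque


class NodePool:
    """Toy stand-in for the node manager: tracks which node ids are free."""

    def __init__(self, node_ids):
        self.free = list(node_ids)

    def reserve(self, count):
        """Return `count` node ids, or None if the request cannot be met."""
        if count > len(self.free):
            return None
        reserved, self.free = self.free[:count], self.free[count:]
        return reserved

    def release(self, node_ids):
        """Hand nodes back to the pool once a test has finished."""
        self.free.extend(node_ids)


def schedule(pool, queued):
    """Start every queued test whose reservation succeeds; defer the rest.

    `queued` is a deque of (test_name, nodes_required) pairs.
    Returns (running, deferred), where `running` maps each started
    test name to the node ids reserved for it.
    """
    running, deferred = {}, deque()
    while queued:
        name, needed = queued.popleft()
        nodes = pool.reserve(needed)
        if nodes is None:
            deferred.append((name, needed))  # not enough free nodes yet
        else:
            running[name] = nodes
    return running, deferred


# The scenario from the description: 8 nodes, 4 tests needing 2 nodes each.
pool = NodePool(range(1, 9))
tests = deque([("test_a", 2), ("test_b", 2), ("test_c", 2), ("test_d", 2)])
running, deferred = schedule(pool, tests)
```

With 8 nodes and four 2-node tests, every reservation succeeds and nothing is deferred, which is the case where serial execution would be wasted time. A test that asks for more nodes than are free lands in `deferred` and can be retried after `release` is called.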

kellymclaughlin · 2014-12-15T21:43:22Z

This is part of the larger RFC effort of #667

kellymclaughlin mentioned this pull request Dec 12, 2014

RFC Refactoring #667

kellymclaughlin force-pushed the refactor/deploy-api-wip branch from 62b7efc to 1076cf4 on December 12, 2014 23:16

kellymclaughlin force-pushed the refactor/deploy-api-wip branch 2 times, most recently from 4c75889 to e11f183 on December 15, 2014 21:38

Port Jon Anderson's work on test groups to avoid merge pain

e11f183

kellymclaughlin added a commit that referenced this pull request Dec 15, 2014

Merge pull request #710 from basho/refactor/deploy-api-wip

66ab243

Major overhaul of the internals of riak_test

kellymclaughlin merged commit 66ab243 into feature/decouple-cluster-setup Dec 15, 2014

kellymclaughlin deleted the refactor/deploy-api-wip branch December 15, 2014 21:39
