
Add developer documentation for benchmarking #11122

Merged (12 commits) on Aug 5, 2022
288 changes: 288 additions & 0 deletions docs/cudf/source/developer_guide/benchmarking.md
@@ -0,0 +1,288 @@
# Benchmarking cuDF

The goal of the benchmarks in this repository is to measure the performance of various cuDF APIs.
Benchmarks in cuDF are written using the
[`pytest-benchmark`](https://pytest-benchmark.readthedocs.io/en/latest/index.html) plugin to the
[`pytest`](https://docs.pytest.org/en/latest/) Python testing framework.
Using `pytest-benchmark` provides a seamless experience for developers familiar with `pytest`.
We include benchmarks of both public APIs and internal functions.
The former give us a macro view of our performance, especially vis-à-vis pandas.
The latter help us quantify and minimize the overhead of our Python bindings.

```{note}
Our current benchmarks focus entirely on measuring run time.
However, minimizing memory footprint can be just as important for some cases.
In the future, we may update our benchmarks to also include memory usage measurements.
```

## Benchmark organization

At the top level, benchmarks are divided into `internal` and `API` directories.
API benchmarks are for public features that we expect users to consume.
Internal benchmarks capture the performance of cuDF internals that have no stability guarantees.

Within each directory, benchmarks are organized based on the type of function.
Functions in cuDF generally fall into two groups:

1. Methods of classes like `DataFrame` or `Series`.
2. Free functions operating on the above classes like `cudf.merge`.

The former should be organized into files named `bench_class.py`.
For example, benchmarks of `DataFrame.eval` belong in `API/bench_dataframe.py`.
Benchmarks should be written at the highest level of generality possible with respect to the class hierarchy.
For instance, all classes support the `take` method, so those benchmarks belong in `API/bench_frame_or_index.py`.
If a method has a slightly different API for different classes, benchmarks should use a minimal common API,
_unless_ developers expect certain arguments to trigger code paths with very different performance characteristics.
One example is `DataFrame.where`, which supports a wide range of inputs (such as other `DataFrame`s) that other classes do not.
Therefore, we have separate `where` benchmarks for `DataFrame` in addition to the general benchmarks for all `Frame` and `Index` classes.
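
As a hedged illustration, a `take` benchmark in `API/bench_frame_or_index.py` might look like the following sketch.
The `frame_or_index` class name, the fixture parameter name, and the import path are assumptions; the `benchmark_with_object` decorator itself is described in detail below.

```python
# A hypothetical sketch of a benchmark in API/bench_frame_or_index.py.
# The "frame_or_index" class name and the utils import path are assumptions.
from ..common.utils import benchmark_with_object


@benchmark_with_object(cls="frame_or_index", dtype="int")
def bench_take(benchmark, frame_or_index):
    # Gather every other element; these indices are valid for any row count.
    indices = list(range(0, len(frame_or_index), 2))
    benchmark(frame_or_index.take, indices)
```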

```{note}
`pytest` does not support having two benchmark files with the same name, even if they are in separate directories.
Therefore, benchmarks of internal methods of _public_ classes go in files suffixed with `_internal`.
Benchmarks of `DataFrame._apply_boolean_mask`, for instance, belong in `internal/bench_dataframe_internal.py`.
```

Free functions have more flexibility.
Broadly speaking, they should be grouped into benchmark files containing similar functionality.
For example, I/O benchmarks can all live in `bench_io.py`.
For now those groupings are left to the discretion of developers.

## Running benchmarks

The main distinction between tests and benchmarks is the configured prefix:
benchmark files and functions are prefixed with `bench_` rather than `test_`.
`pytest` is already configured to collect benchmarks using this prefix,
so running benchmarks is as simple as installing `pytest-benchmark` and then running `pytest`.

When benchmarks are run, the default behavior is to output the results in a table to the terminal.
A common requirement is to then compare the performance of benchmarks before and after a change.
We can generate these comparisons by saving the output using the `--benchmark-autosave` option to pytest.
When using this option, after the benchmarks are run the output will contain a line:
```
Saved benchmark data in: /path/to/XXXX_*.json
```

The `XXXX` is a four-digit number identifying the benchmark.
If preferred, a user may also use the `--benchmark-save=NAME` option,
which allows more control over the resulting filename.
Given two benchmark runs `XXXX` and `YYYY`, benchmarks can then be compared using
```
pytest-benchmark compare XXXX YYYY
```
Note that the comparison uses the `pytest-benchmark` command rather than the `pytest` command.
`pytest-benchmark` has a number of additional options that can be used to customize the output.
The following command shows one useful example, but developers should experiment to find the output that best suits their needs:
```
pytest-benchmark compare XXXX YYYY --sort="name" --columns=Mean --name=short --group-by=param
```

## Benchmark contents

### Benchmark configuration

Our benchmarks aim to support [comparing to pandas](pandascompare) or [running tests](testing) out of the box.
In order for the former to work, _all benchmarks must import `cudf` and `cupy` from the `config` module_.
In other words:
```python
from ..common.config import cudf  # This is good
import cudf  # This is bad
```
Testing is usually transparently supported except when users define custom fixtures or cases.
In those instances, as a general rule developers should avoid hardcoding data sizes in benchmarks.
Instead, data sizes should be dependent on variables stored in `config.py`.
This requirement is discussed in more depth below.

### Writing benchmarks

Just as benchmarks should be written in terms of the highest level classes in the hierarchy,
they should also assume as little as possible about the nature of the data.
For instance, unless there are meaningful functional differences,
benchmarks should not care about the dtype or nullability of the data.
Objects that differ in these ways should be interchangeable for most benchmarks.
The goal of writing benchmarks in this way is to then automatically benchmark objects with different properties.
We support this use case with the `benchmark_with_object` decorator.

The use of this decorator is best demonstrated by example:

```python
@benchmark_with_object(cls="dataframe", dtype="int", cols=6)
def bench_foo(benchmark, dataframe):
    benchmark(dataframe.foo)
```

In the example above, `bench_foo` will be run for `DataFrame`s containing six columns of integer data.
The decorator allows automatically parametrizing the following object properties:

- `cls`: Objects of a specific class, e.g. `DataFrame`.
- `dtype`: Objects of a specific dtype.
- `nulls`: Objects with and without null entries.
- `cols`: Objects with a certain number of columns.
- `rows`: Objects with a certain number of rows.

In the example, since we did not specify the number of rows or nullability,
it will be run once for each valid number of rows and for both nullable and non-nullable data.
The valid set of all parameters (e.g. the numbers of rows) is stored in the `common/config.py` file.
This decorator allows a developer to write a generic benchmark that works for many types of objects,
then have that benchmark automatically run for all objects of interest.
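
As another hedged sketch, arguments for the benchmarked call can simply be forwarded through `benchmark`.
The `where` arguments below are purely illustrative, and the decorator parameters are assumed to be valid combinations.

```python
# A hypothetical DataFrame.where benchmark; since the number of rows is not
# specified, it runs once for each configured row count.
@benchmark_with_object(cls="dataframe", dtype="int", nulls=False, cols=6)
def bench_where(benchmark, dataframe):
    # Arguments after the callable are forwarded by pytest-benchmark.
    benchmark(dataframe.where, dataframe > 0, 0)
```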

### Custom fixtures

Developers may define custom fixtures if necessary, but this should be done with care.
The `benchmark_with_object` decorator covers most use cases and automatically guarantees a baseline of benchmark coverage.
When writing fixtures, developers should make the data sizes dependent on the benchmark configuration.
The `benchmarks/common/config.py` file defines standard data sizes to be used in benchmarks.
These data sizes can be tweaked for debugging purposes (see {ref}`testing` below).
Fixture sizes should be relative to the `NUM_ROWS` and/or `NUM_COLS` variables defined in the config module.
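
For illustration, a minimal sketch of such a fixture is shown below.
The fixture name and contents are hypothetical, and it assumes `NUM_ROWS` is a sequence of row counts; the point is only that the sizes come from the config module.

```python
# A hypothetical custom fixture whose sizes follow the configured row counts.
import pytest

from ..common.config import NUM_ROWS, cudf, cupy


@pytest.fixture(params=NUM_ROWS)
def sorted_int_series(request):
    # Already-sorted integer data, one instance per configured row count.
    return cudf.Series(cupy.arange(request.param))
```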

### Parametrization vs. fixtures

Our benchmarks make use of the [`pytest_cases`](https://smarie.github.io/python-pytest-cases/) `pytest` plugin.
This plugin allows us to handle parametrization much more cleanly than pytest does out of the box.
Specifically, it provides some syntactic sugar around

```python
@pytest.mark.parametrize(
    "num", [1, 2, 3]
)
def bench_foo(benchmark, num):
    # pytest-benchmark's `benchmark` fixture expects a callable.
    benchmark(lambda: num * 2)
```

for when the parameters are nontrivial and require complex initialization.
This is common for benchmarks of functions accepting cuDF objects, such as `cudf.concat`.
With `pytest_cases`, the different cases are instead placed into separate functions and automatically made available.

```python
# bench_foo_cases.py
def case_1():
    return 1

def case_2():
    return 2

def case_3():
    return 3

# bench_foo.py
import pytest_cases

@pytest_cases.parametrize_with_cases("num")
def bench_foo(benchmark, num):
    benchmark(lambda: num * 2)
```

This approach is strongly encouraged within cuDF benchmarks.
It forces developers to put complex initialization into named, documented functions.
That becomes especially valuable when benchmarking APIs whose performance can vary drastically based on parameters.
Additionally, cases, like fixtures, are lazily evaluated.
Initializing complex objects inside a `pytest.mark.parametrize` can dramatically slow down test collection,
or even lead to out of memory issues if too many complex cases are collected.
Using lazy case functions ensures that the associated objects are only created on an as-needed basis.

When writing cases, just as when writing custom fixtures, developers should make use of the config variables.
Cases should import the `NUM_ROWS` and/or `NUM_COLS` variables from the config module and use them to define data sizes.
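
For example, a case file for a hypothetical `cudf.concat` benchmark might look like the sketch below.
The case name and frame layout are illustrative, and it assumes `NUM_ROWS` is a sequence of row counts.

```python
# bench_concat_cases.py (hypothetical)
from ..common.config import NUM_ROWS, cudf


def case_contiguous_frames():
    """Two frames whose indexes line up end to end."""
    nr = NUM_ROWS[-1]  # assumption: NUM_ROWS is a sequence of row counts
    return [
        cudf.DataFrame({"a": range(nr)}),
        cudf.DataFrame({"a": range(nr)}, index=range(nr, 2 * nr)),
    ]
```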

The observant reader may recognize that cases seem quite similar to fixtures.
While the implementations are in fact quite similar, for our purposes it is important to keep the two distinct.
From a conceptual standpoint, fixtures are homogeneous and generic while cases are heterogeneous and specific.
Fixtures should be created sparingly and used broadly.
Cases can be precisely tailored for a single test.

(pandascompare)=

## Comparing to pandas

An important aspect of benchmarking cuDF is comparing it to pandas.
We often want to generate quantitative comparisons, so we need to make that as easy as possible.
Our benchmarks support this by setting the environment variable `CUDF_BENCHMARKS_USE_PANDAS`.
When this variable is detected, all benchmarks will automatically be run using pandas instead of cuDF.
Therefore, comparisons can be generated by simply running the benchmarks twice,
once with the variable set and once without.

```{note}
`CUDF_BENCHMARKS_USE_PANDAS` effectively remaps `cudf` to `pandas` and `cupy` to `numpy`.
It does so by aliasing these modules in `common/config.py`.
This aliasing is why it is critical for developers to import these packages from `config.py`.
```
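
A simplified sketch of what that aliasing might look like is shown below; this is an assumption about the mechanism, not the actual contents of `config.py`.

```python
# A hypothetical sketch of the module aliasing in common/config.py.
import os

if "CUDF_BENCHMARKS_USE_PANDAS" in os.environ:
    import numpy as cupy
    import pandas as cudf
else:
    import cupy
    import cudf
```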

(testing)=

## Testing

Benchmarks need to be kept up to date with API changes in cuDF.
However, we cannot simply run benchmarks in CI.
Doing so would consume too many resources, and it would significantly slow down the development cycle.

To balance these issues, our benchmarks also support running in "testing" mode.
To do so, developers can set the `CUDF_BENCHMARKS_DEBUG_ONLY` environment variable.
When benchmarks are run with this variable set, all data sizes are set to a minimum and the number of sizes is reduced.
Our CI testing takes advantage of this to ensure that benchmarks remain valid code.

```{note}
The objects provided by `benchmark_with_object` respect the `NUM_ROWS` and `NUM_COLS` defined in `common/config.py`.
`CUDF_BENCHMARKS_DEBUG_ONLY` works by conditionally redefining these values.
This is why it is crucial for developers to use these variables when defining custom fixtures or cases.
```
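
A simplified sketch of how that redefinition might look is shown below; the sizes are illustrative, not the real defaults.

```python
# A hypothetical sketch of the size definitions in common/config.py.
import os

if "CUDF_BENCHMARKS_DEBUG_ONLY" in os.environ:
    NUM_ROWS = [10]
    NUM_COLS = [1]
else:
    NUM_ROWS = [100, 10_000, 1_000_000]
    NUM_COLS = [1, 6]
```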

## Profiling

Although not strictly part of our benchmarking suite, profiling is a common need, so we provide some guidelines here.
There are two easy ways to profile benchmarks:
1. The [`pytest-profiling`](https://github.com/man-group/pytest-plugins/tree/master/pytest-profiling) plugin.
2. The [`py-spy`](https://github.com/benfred/py-spy) package.

Using the former is as simple as adding the `--profile` (or `--profile-svg`) flag to your `pytest` invocation.
The latter instead requires invoking `pytest` from `py-spy`, like so:
```
py-spy record -- pytest bench_foo.py
```
Depending on exactly what information you need, your mileage may vary with each one.
Developers should try both and see what works for their workflows.

(advancedtopics)=

## Advanced Topics

This section discusses some underlying details of how cuDF benchmarks work.
They are not usually necessary for typical developers or benchmark writers.
This information is primarily for developers looking to extend the types of objects that can be easily benchmarked.

### Understanding `benchmark_with_object`

Under the hood, `benchmark_with_object` is made up of two critical pieces: fixture unions and some decorator magic.

#### Fixture unions

Fixture unions are a feature of [`pytest_cases`](https://smarie.github.io/python-pytest-cases/).
A fixture union is a fixture that, when used as a test function parameter,
will trigger the test to run once for each fixture contained in the union.
Since most cuDF benchmarks can be run with the same relatively small set of objects,
our benchmarks generate the Cartesian product of possible fixtures and then create all possible unions.
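
A minimal sketch of a fixture union is shown below; the fixture names and data are illustrative and are not the fixtures our benchmarks actually generate.

```python
# A hypothetical fixture union: bench_foo runs once per member fixture.
import pytest_cases

from ..common.config import cudf


@pytest_cases.fixture
def series_nulls_false():
    return cudf.Series([1, 2, 3])


@pytest_cases.fixture
def series_nulls_true():
    return cudf.Series([1, None, 3])


# Requesting "series" parametrizes the benchmark over both member fixtures.
series = pytest_cases.fixture_union(
    "series", [series_nulls_false, series_nulls_true]
)


def bench_foo(benchmark, series):
    benchmark(series.sum)
```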

This feature is critical to the design of our benchmarks.
For each of the relevant parameter combinations (size, nullability, etc.), we programmatically generate a new fixture.
The resulting fixtures are unambiguously named according to the following scheme:
`{classname}_dtype_{dtype}[_nulls_{true|false}][[_cols_{num_cols}]_rows_{num_rows}]`.
If a fixture name does not contain a particular component, it represents a union of all values of that component.
As an example, consider the fixture `dataframe_dtype_int_rows_100`.
This fixture is a union of both nullable and non-nullable `DataFrame`s with different numbers of columns.

#### The `benchmark_with_object` decorator

The long names of the above unions are cumbersome to use when writing benchmarks.
Moreover, because the parameters are embedded in the fixture name, changing them requires replacing that name throughout the benchmark.
The `benchmark_with_object` decorator is the solution to this problem.
When used on a test function, it essentially replaces the function parameter name with the true fixture.
Our original example from above

```python
@benchmark_with_object(cls="dataframe", dtype="int", cols=6)
def bench_foo(benchmark, dataframe):
    benchmark(dataframe.foo)
```

is functionally equivalent to

```python
def bench_foo(benchmark, dataframe_dtype_int_cols_6):
    benchmark(dataframe_dtype_int_cols_6.foo)
```
1 change: 1 addition & 0 deletions docs/cudf/source/developer_guide/index.md
@@ -5,3 +5,4 @@

library_design
documentation
benchmarking