Add transform proposal (#113)
* Introduce transform concept

The proposed `Transform` class is a thin, possibly stateful class mapping
records to other records.

It is generally assumed to be invertible, which is expressed by the
addition of an `iapply()` method (name pending), which can be used to
(approximately) revert records to their old form IF the given transform
supports it.

The added script showcases both proposals, with a barebones example of
why it is useful: To serialize a given model into a set of digestible,
unique identifiers, and doing the reverse in the inverse method.

This way, even custom model classes can be written to JSON, where they
are (very simply) saved via their chosen transform parametrization.

* Add `nnbench.io.transforms` module containing transform stub classes

Contains the basic definitions for 1->1, N->1 and N->N record transforms.

Right now, this comes without guarantees or enforcement of attributes
(invertibility/length preservation), but this can be added in the future.

Users have to mark transform capabilities on their own.

* Change transform example to use a `OneToOneTransform`

Thanks to the new base class, this is a no-brainer.

* Add transform doc

Covers the basic ideas, includes a usage example, and gives tips on what
to consider when designing transforms.

* Add full example appendix

Trim a leftover comment, adjusting the snippet line numbers in the process.
nicholasjng authored Mar 15, 2024
1 parent fa05c66 commit cdf4884
Showing 4 changed files with 293 additions and 0 deletions.
61 changes: 61 additions & 0 deletions docs/guides/transforms.md
@@ -0,0 +1,61 @@
# Using transforms to manipulate benchmark records

After a successful benchmark run, you end up with your metrics, context, and parameters in a single benchmark record struct.
In general, this data is a best-effort representation of the environment and configuration the benchmarks were run in.

However, in some situations, manual editing and transformation of these records is required.
nnbench exposes the `nnbench.io.transforms` module to facilitate these transforms.

## Types of transforms: 1->1 vs. N->1 vs. N->N

In nnbench, transforms are grouped by the functional relationship between inputs and outputs.
The easiest case is a 1->1 (one-to-one) transform, which takes a record and produces another.

In the N->1 (N-to-one) case, the transform takes a collection of records and produces a single output record.
This case is very common when computing statistics on records, like mean and variance of target metrics.
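
As a rough illustration of such a reduction, the following sketch computes the mean and variance of a metric across several runs. It does not use nnbench itself: `Record` is a hypothetical stand-in for `BenchmarkRecord`, and the `"name"` and `"value"` keys of each benchmark entry are assumptions made for this example.

```python
from dataclasses import dataclass, field
from statistics import fmean, pvariance
from typing import Any, Sequence


@dataclass
class Record:
    """Hypothetical stand-in for nnbench's BenchmarkRecord."""

    context: dict[str, Any] = field(default_factory=dict)
    benchmarks: list[dict[str, Any]] = field(default_factory=list)


class MeanVariance:
    """Many-to-one sketch: reduce several runs to per-benchmark statistics."""

    # The individual runs cannot be recovered from their mean and variance.
    invertible = False

    def apply(self, records: Sequence[Record]) -> Record:
        # Group the metric values of all runs by benchmark name.
        values: dict[str, list[float]] = {}
        for record in records:
            for b in record.benchmarks:
                values.setdefault(b["name"], []).append(b["value"])
        reduced = [
            {"name": name, "mean": fmean(vals), "variance": pvariance(vals)}
            for name, vals in values.items()
        ]
        # Reuse the context of the first run instead of inventing new metadata.
        context = dict(records[0].context) if records else {}
        return Record(context=context, benchmarks=reduced)


runs = [Record(benchmarks=[{"name": "accuracy", "value": v}]) for v in (0.8, 0.9, 1.0)]
summary = MeanVariance().apply(runs)
```

In a real project, the class would presumably derive from `ManyToOneTransform` and operate on actual `BenchmarkRecord` objects.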

In the N->N (N-to-N) case, the transform maps the input record collection to an output collection, generally assumed to be of the same length.
This case is common when mapping records to an equivalent but more easily digestible record format.
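
A length-preserving N->N transform could look like the sketch below, which flattens nested benchmark parameters into dotted keys. `Record` is again a hypothetical stand-in for `BenchmarkRecord`, and `flatten()` and `FlattenParameters` are names invented for this example. The sketch assumes that parameter names contain no dots.

```python
from dataclasses import dataclass, field
from typing import Any, Sequence


@dataclass
class Record:
    """Hypothetical stand-in for nnbench's BenchmarkRecord."""

    context: dict[str, Any] = field(default_factory=dict)
    benchmarks: list[dict[str, Any]] = field(default_factory=list)


def flatten(d: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Turn {"a": {"b": 1}} into {"a.b": 1}; assumes keys contain no dots."""
    flat: dict[str, Any] = {}
    for key, value in d.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


class FlattenParameters:
    """Many-to-many sketch: produces exactly one output record per input record."""

    invertible = False  # kept simple; an inverse would have to rebuild the nesting
    length_invariant = True

    def apply(self, records: Sequence[Record]) -> list[Record]:
        out = []
        for record in records:
            # Build new benchmark dicts so the input records stay untouched.
            benchmarks = [
                {**b, "parameters": flatten(b.get("parameters", {}))}
                for b in record.benchmarks
            ]
            out.append(Record(context=dict(record.context), benchmarks=benchmarks))
        return out


records = [
    Record(benchmarks=[{"name": "accuracy", "parameters": {"model": {"checksum": "12345"}, "n": 10}}]),
    Record(benchmarks=[{"name": "accuracy", "parameters": {"model": {"checksum": "67890"}, "n": 20}}]),
]
flat = FlattenParameters().apply(records)
```

Returning new records rather than mutating the inputs is a design choice of this sketch, not something the transform base classes require.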

The following is an example of a 1->1 transform, which maps the benchmark parameters to representations that are JSON-serializable.

```python
--8<-- "examples/transforms/transforms.py:33:54"
```

In the `MyTransform.apply()` method, the NumPy array is serialized as a list by calling `array.tolist()`, while the model is saved by its checksum only.
In real applications, parametrizing the model with basic Python values will likely take more effort, but this is a first example of how to do it.

The transform is then applied to the resulting record, which allows writing it to JSON without the serialization errors that would otherwise occur.

```python
--8<-- "examples/transforms/transforms.py:68:75"
```
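
To see why the transform is needed at all, the following standard-library sketch tries to serialize a parameter dictionary containing a custom object. `Model` is a hypothetical class modeled loosely on `MyModel` from the example. No nnbench code is involved.

```python
import json


class Model:
    """Hypothetical model class, identified by a checksum only."""

    def __init__(self, checksum: str):
        self.checksum = checksum

    def to_json(self) -> dict[str, str]:
        return {"checksum": self.checksum}


parameters = {"model": Model(checksum="12345"), "data": [0.1, 0.2]}

try:
    json.dumps(parameters)
    raw_ok = True
except TypeError:
    # The json module cannot encode arbitrary Python objects.
    raw_ok = False

# After replacing the model with its JSON-ready form, serialization succeeds.
serializable = {**parameters, "model": parameters["model"].to_json()}
encoded = json.dumps(serializable)
```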

## Invertible transforms

Borrowing from the same concept in linear algebra, an nnbench `Transform` is said to be **invertible** if there is a function that restores the original record when applied to the transformed record.
For simplicity, the inverse of a transform can be defined directly in the class with the `Transform.iapply()` method.

In general, when designing an invertible transform, it should hold that for any benchmark record `r`, `T.iapply(T.apply(r)) == r`.
A transform signals that it is invertible by setting its `invertible` attribute to `True`.

!!! Tip
    While this framework is useful for designing and thinking about transforms, it is not actually enforced by nnbench.
    nnbench will not take any steps to ensure invertibility of transforms, so any transforms should be tested against expected benchmark record data.
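
One way to test the round-trip property is sketched below. `Record`, `Model`, `ModelTransform`, and `roundtrips()` are hypothetical names used only for this example. Because `apply()` in the example above modifies the record in place, the check works on deep copies so that the original stays available for comparison.

```python
import copy
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Record:
    """Hypothetical stand-in for nnbench's BenchmarkRecord."""

    context: dict[str, Any] = field(default_factory=dict)
    benchmarks: list[dict[str, Any]] = field(default_factory=list)


@dataclass(frozen=True)
class Model:
    """Hypothetical model; the dataclass provides value-based equality."""

    checksum: str


class ModelTransform:
    invertible = True

    def apply(self, record: Record) -> Record:
        for b in record.benchmarks:
            b["parameters"] = {"model": {"checksum": b["parameters"]["model"].checksum}}
        return record

    def iapply(self, record: Record) -> Record:
        for b in record.benchmarks:
            b["parameters"] = {"model": Model(**b["parameters"]["model"])}
        return record


def roundtrips(transform: Any, record: Record) -> bool:
    """Check T.iapply(T.apply(r)) == r without modifying r itself."""
    original = copy.deepcopy(record)
    restored = transform.iapply(transform.apply(copy.deepcopy(record)))
    return restored == original


record = Record(
    benchmarks=[{"name": "accuracy", "value": 0.9, "parameters": {"model": Model("12345")}}]
)
ok = roundtrips(ModelTransform(), record)
```

A check like `roundtrips()` fits naturally into a unit test that runs over a handful of representative records.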

## General considerations for writing and using transforms

A few points are useful to keep in mind while writing transforms:

* It is in general not advised to inject arbitrary metadata into records via a transform. If you find yourself needing to supply more metadata, consider using a `ContextProvider` instead.
* When serializing Python values (like the benchmark parameters), be careful to choose a unique representation. Otherwise, you might not be able to reconstruct model and data versions from written records in a reproducible manner.
* When designing a transform that is not invertible, consider raising a `NotImplementedError` in the `iapply()` method to prevent accidental calls to the (ill-defined) inverse.
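
The last two points can be combined in a short sketch: a transform that replaces raw data with a content-based SHA-256 digest, and which therefore refuses to be inverted. `Record`, `digest()`, and `DigestData` are hypothetical names, and hashing the JSON encoding of a list is only one of several possible ways to obtain a unique representation.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Record:
    """Hypothetical stand-in for nnbench's BenchmarkRecord."""

    context: dict[str, Any] = field(default_factory=dict)
    benchmarks: list[dict[str, Any]] = field(default_factory=list)


def digest(values: list[float]) -> str:
    """Content-based identifier: equal data always yields the same digest."""
    payload = json.dumps(values).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class DigestData:
    invertible = False  # the data cannot be rebuilt from its digest

    def apply(self, record: Record) -> Record:
        for b in record.benchmarks:
            params = dict(b["parameters"])
            params["data"] = digest(params["data"])
            b["parameters"] = params
        return record

    def iapply(self, record: Record) -> Record:
        # Fail loudly instead of returning a record that only looks restored.
        raise NotImplementedError("DigestData is lossy and has no inverse")


record = Record(benchmarks=[{"name": "accuracy", "parameters": {"data": [0.1, 0.2, 0.3]}}])
hashed = DigestData().apply(record)
```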

## Appendix: The full example code

Here is the full example on how to use transforms for record-to-file serialization:

```python
--8<-- "examples/transforms/transforms.py"
```
78 changes: 78 additions & 0 deletions examples/transforms/transforms.py
@@ -0,0 +1,78 @@
from dataclasses import dataclass
from typing import Any

import numpy as np

import nnbench
from nnbench.io.transforms import OneToOneTransform
from nnbench.reporter.file import FileIO
from nnbench.types import BenchmarkRecord


class MyModel:
    def __init__(self, checksum: str):
        self.checksum = checksum

    def apply(self, data: np.ndarray) -> float:
        return data.mean()

    def to_json(self) -> dict[str, Any]:
        return {"checksum": self.checksum}

    @classmethod
    def from_json(cls, obj: dict[str, Any]) -> "MyModel":
        # intentionally fail if no checksum is given.
        return cls(checksum=obj["checksum"])


@nnbench.benchmark
def accuracy(model: MyModel, data: np.ndarray) -> float:
    return model.apply(data)


class MyTransform(OneToOneTransform):
    def apply(self, record: BenchmarkRecord) -> BenchmarkRecord:
        """Apply this transform on a record."""
        for b in record.benchmarks:
            params: dict[str, Any] = b["parameters"]
            b["parameters"] = {
                "model": params["model"].to_json(),
                "data": params["data"].tolist(),
            }
        return record

    def iapply(self, record: BenchmarkRecord) -> BenchmarkRecord:
        """Apply the inverse of this transform."""
        for b in record.benchmarks:
            params: dict[str, Any] = b["parameters"]
            b["parameters"] = {
                "model": MyModel.from_json(params["model"]),
                "data": np.asarray(params["data"]),
            }
        return record


def main():
    @dataclass(frozen=True)
    class MyParams(nnbench.Parameters):
        model: MyModel
        data: np.ndarray

    runner = nnbench.BenchmarkRunner()

    m = MyModel(checksum="12345")
    data = np.random.random_sample((10,))
    params = MyParams(m, data)
    record = runner.run(__name__, params=params)

    transform = MyTransform()
    trecord = transform.apply(record)
    f = FileIO()
    f.write(trecord, "record.json")

    record2 = f.read("record.json")
    new_record = transform.iapply(record2)


if __name__ == "__main__":
    main()
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -23,6 +23,7 @@ nav:
- guides/organization.md
- guides/runners.md
- guides/artifacts.md
- guides/transforms.md
- Examples:
- tutorials/index.md
- tutorials/artifact_benchmarking.md
153 changes: 153 additions & 0 deletions src/nnbench/io/transforms.py
@@ -0,0 +1,153 @@
"""Base classes for defining transforms acting on benchmark records."""

from typing import Sequence

from nnbench.types import BenchmarkRecord


class Transform:
    """The basic transform which every transform has to inherit from."""

    pass


class OneToOneTransform(Transform):
    invertible: bool = True
    """
    Whether this transform is invertible,
    i.e. records can be converted back and forth with no changes or data loss.
    """

    def apply(self, record: BenchmarkRecord) -> BenchmarkRecord:
        """
        Apply this transform to a benchmark record.

        Parameters
        ----------
        record: BenchmarkRecord
            Benchmark record to apply the transform on.

        Returns
        -------
        BenchmarkRecord
            The transformed benchmark record.
        """

    def iapply(self, record: BenchmarkRecord) -> BenchmarkRecord:
        """
        Apply the inverse of this transform.

        In general, applying the inverse on a record not previously transformed
        may yield unexpected results.

        Parameters
        ----------
        record: BenchmarkRecord
            Benchmark record to apply the inverse transform on.

        Returns
        -------
        BenchmarkRecord
            The inversely transformed benchmark record.
        """


class ManyToOneTransform(Transform):
    """
    A many-to-one transform reducing a collection of records to a single record.

    This is useful for computing statistics on a collection of runs.
    """

    invertible: bool = True
    """
    Whether this transform is invertible,
    i.e. records can be converted back and forth with no changes or data loss.
    """

    def apply(self, record: Sequence[BenchmarkRecord]) -> BenchmarkRecord:
        """
        Apply this transform to a sequence of benchmark records.

        Parameters
        ----------
        record: Sequence[BenchmarkRecord]
            A sequence of benchmark records to apply the transform on,
            yielding a single resulting record.

        Returns
        -------
        BenchmarkRecord
            The transformed (reduced) benchmark record.
        """

    def iapply(self, record: BenchmarkRecord) -> Sequence[BenchmarkRecord]:
        """
        Apply the inverse of this transform.

        In general, applying the inverse on a record not previously transformed
        may yield unexpected results.

        Parameters
        ----------
        record: BenchmarkRecord
            Benchmark record to apply the inverse transform on.

        Returns
        -------
        Sequence[BenchmarkRecord]
            The inversely transformed benchmark record sequence.
        """
        # TODO: Does this even make sense? Can't hurt to allow it on paper, though.


class ManyToManyTransform(Transform):
    """
    A many-to-many transform mapping an input record collection to an output collection.

    Use this to programmatically wrangle metadata or types in records, or to
    convert parameters into database-ready representations.
    """

    invertible: bool = True
    """
    Whether this transform is invertible,
    i.e. records can be converted back and forth with no changes or data loss.
    """
    length_invariant: bool = True
    """
    Whether this transform preserves the number of records, i.e. no records are dropped.
    """

    def apply(self, record: Sequence[BenchmarkRecord]) -> Sequence[BenchmarkRecord]:
        """
        Apply this transform to a sequence of benchmark records.

        Parameters
        ----------
        record: Sequence[BenchmarkRecord]
            A sequence of benchmark records to apply the transform on.

        Returns
        -------
        Sequence[BenchmarkRecord]
            The transformed benchmark record sequence.
        """

    def iapply(self, record: Sequence[BenchmarkRecord]) -> Sequence[BenchmarkRecord]:
        """
        Apply the inverse of this transform.

        In general, applying the inverse on a record not previously transformed
        may yield unexpected results.

        Parameters
        ----------
        record: Sequence[BenchmarkRecord]
            A sequence of benchmark records to apply the inverse transform on.

        Returns
        -------
        Sequence[BenchmarkRecord]
            The inversely transformed benchmark record sequence.
        """
