50 changes: 50 additions & 0 deletions tensorboard/data/provider.py
@@ -33,6 +33,56 @@ class DataProvider(object):
providers may be asked to implement more filtering mechanisms, such as
downsampling strategies or domain restriction by step or wall time.

The data provider interface specifies three *data classes*: scalars,
tensors, and blob sequences. All data is stored in *time series* for
one of these data classes. A time series is identified by run name and
tag name (each a non-empty text string), as well as an experiment ID
and plugin name (see below). Points in a time series are uniquely
indexed by *step*, an arbitrary non-negative integer. Each point in a
time series also has an associated wall time, plus its actual value,
which is drawn from the corresponding data class.
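
To make the indexing concrete, here is a minimal sketch of how one
point of a scalar time series might be represented; the `ScalarPoint`
name and the sample values are illustrative only, not part of this
interface:

    import collections

    # Illustrative only: a point pairs a step with a wall time and a value.
    ScalarPoint = collections.namedtuple(
        "ScalarPoint", ("step", "wall_time", "value"))

    # A time series maps each step to at most one point; the series itself
    # is addressed by experiment ID, plugin name, run name, and tag name.
    series = {
        0: ScalarPoint(step=0, wall_time=1571184000.0, value=0.25),
        5: ScalarPoint(step=5, wall_time=1571184060.0, value=0.50),
    }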

Each point in a scalar time series contains a single scalar value, as
a 64-bit floating point number. Scalars are "privileged" rather than
being subsumed under tensors because there are useful operations on
scalars that don't make sense in the general tensor case: e.g., "list
all scalar time series with tag name `accuracy` whose exponentially
weighted moving average is at least 0.999".
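
As a sketch of such a scalar-only query (the `ewma` helper and the
sample values below are illustrative, not part of this interface):

    def ewma(values, alpha=0.5):
        """Exponentially weighted moving average of a scalar series."""
        average = None
        for value in values:
            average = value if average is None else (
                alpha * value + (1 - alpha) * average)
        return average

    accuracy = [0.998, 0.9991, 0.9994, 0.9996]
    selected = ewma(accuracy) >= 0.999  # True for this series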

Each point in a tensor time series contains a tensor of arbitrary
dtype (including byte strings and text strings) and shape (including
rank-0 tensors, a.k.a. scalars). Each tensor is expected to be
"reasonably small" to accommodate common database cell size limits.
For instance, a histogram with a bounded number of buckets (say, 30)
occupies about 500 bytes, and a PR curve with a bounded number of
thresholds (say, 201) occupies about 5000 bytes. These are both well
within typical database tolerances (Google Cloud Spanner: 10 MiB;
MySQL: 64 KiB), and would be appropriate to store as tensors. By
contrast, image, audio, or model graph data may easily be multiple
megabytes in size, and so should be stored as blobs instead. The
tensors at each step in a time series need not have the same dtype or
shape.
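
A rough back-of-the-envelope check, assuming illustrative shapes for
the histogram and PR-curve tensors and using the stricter 64 KiB limit
mentioned above as the cutoff:

    import numpy as np

    histogram = np.zeros((30, 3), dtype=np.float64)  # ~30 buckets, a few numbers each
    pr_curve = np.zeros((6, 201), dtype=np.float32)  # a few series over ~201 thresholds
    png_image = b"\x89PNG" + b"\x00" * (3 * 2 ** 20)  # a multi-megabyte encoded image

    for name, size in [
        ("histogram", histogram.nbytes),
        ("pr_curve", pr_curve.nbytes),
        ("png_image", len(png_image)),
    ]:
        kind = "tensor" if size < 64 * 1024 else "blob sequence"
        print("%s: %d bytes -> store as %s" % (name, size, kind))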

Each point in a blob sequence time series contains an ordered sequence
of zero or more blobs, which are arbitrary data with no tensor
structure. These might represent, for example, PNG-encoded image data,
protobuf wire encodings of TensorFlow graphs, or PLY-format 3D mesh
data. This data class provides blob *sequences* rather than just
blobs because it's common to want to take multiple homogeneous samples
of a given time series: say, "show me the bounding box classifications
for 3 random inputs from this batch". A single blob can of course be
represented as a blob sequence that always has exactly one element.
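
For instance (the byte strings below are placeholders, not real
encodings):

    import random

    # One step of a blob sequence time series: an ordered list of byte
    # strings, e.g. PNG-encoded renderings of inputs from a single batch.
    blob_sequence = [
        b"<png bytes for input 0>",
        b"<png bytes for input 1>",
        b"<png bytes for input 2>",
        b"<png bytes for input 3>",
    ]
    samples = random.sample(blob_sequence, k=3)  # "3 random inputs from this batch"

    # A single blob (e.g. a serialized model graph) is just the
    # degenerate one-element sequence.
    graph_sequence = [b"<serialized GraphDef bytes>"]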

Every time series belongs to a specific experiment and is owned by a
specific plugin. (Thus, the "primary key" for a time series has four
components: experiment, plugin, run, tag.) The experiment ID is an
arbitrary URL-safe non-empty text string, whose interpretation is at
the discretion of the data provider. As a special case, the empty
string as an experiment ID denotes that no experiment was given. Data
providers may or may not fully support an empty experiment ID. The
plugin name should correspond to the `plugin_data.plugin_name` field
of the `SummaryMetadata` proto passed to `tf.summary.write`.
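
A hypothetical sketch of such a four-part key; the `TimeSeriesKey` name
and the field values are illustrative only:

    import collections

    TimeSeriesKey = collections.namedtuple(
        "TimeSeriesKey", ("experiment_id", "plugin_name", "run", "tag"))

    key = TimeSeriesKey(
        experiment_id="123abc",  # URL-safe; "" means no experiment was given
        plugin_name="scalars",   # matches SummaryMetadata.plugin_data.plugin_name
        run="train",
        tag="accuracy",
    )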

Unless otherwise noted, any methods on this class may raise errors
defined in `tensorboard.errors`, like `tensorboard.errors.NotFoundError`.
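
For example (a minimal sketch of the error contract; raising the error
directly here just stands in for a failed provider call):

    from tensorboard import errors

    try:
        raise errors.NotFoundError("no such time series")
    except errors.NotFoundError as e:
        print(e)
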
"""