diff --git a/content/docs/command-reference/metrics/index.md b/content/docs/command-reference/metrics/index.md
index 25b9170426..0ffa120024 100644
--- a/content/docs/command-reference/metrics/index.md
+++ b/content/docs/command-reference/metrics/index.md
@@ -15,16 +15,6 @@ positional arguments:
     diff         Show changes in metrics between commits.
 ```
 
-## Types of metrics
-
-DVC has two concepts for metrics, that represent different results of machine
-learning training or data processing:
-
-1. `dvc metrics` represent **scalar numbers** such as AUC, _true positive rate_,
-   etc.
-2. `dvc plots` can be used to visualize **data series** such as AUC curves, loss
-   functions, confusion matrices, etc.
-
 ## Description
 
 In order to follow the performance of machine learning experiments, DVC has the
@@ -32,9 +22,9 @@ ability to mark a certain stage outputs as metrics. These metrics are
 project-specific floating-point or integer values e.g. AUC, ROC, false
 positives, etc.
 
-This type of metrics files are typically generated by user data processing code,
-and are tracked using the `-m` (`--metrics`) and `-M` (`--metrics-no-cache`)
-options of `dvc stage add`.
+Metrics files are typically generated by user data processing code, and are
+tracked using the `-m` (`--metrics`) and `-M` (`--metrics-no-cache`) options of
+`dvc stage add`.
 
 In contrast to `dvc plots`, these metrics should be stored in hierarchical
 files. Unlike its `dvc plots` counterpart, `dvc metrics diff` can report the
diff --git a/content/docs/command-reference/plots/diff.md b/content/docs/command-reference/plots/diff.md
index 856168e214..a1b4845d8f 100644
--- a/content/docs/command-reference/plots/diff.md
+++ b/content/docs/command-reference/plots/diff.md
@@ -1,7 +1,7 @@
 # plots diff
 
-Show multiple versions of [plot metrics](/doc/command-reference/plots) by
-overlaying them in a single image. This allows to compare them easily.
+Show multiple versions of [plots](/doc/command-reference/plots) by overlaying
+them in a single image. This makes them easy to compare.
 
 ## Synopsis
 
@@ -123,11 +123,11 @@ file:///Users/usr/src/dvc_plots/index.html
 Compare two specific versions (commit hashes, tags, or branches):
 
 ```cli
-$ dvc plots diff HEAD 0135527 --targets logs.csv
+$ dvc plots diff HEAD^ 0135527 --targets logs.csv
 file:///Users/usr/src/dvc_plots/index.html
 ```
 
-![](/img/plots_diff.svg)
+![](/img/plots_diff_two_revs.svg)
 
 ## Example: Confusion matrix
 
diff --git a/content/docs/command-reference/plots/index.md b/content/docs/command-reference/plots/index.md
index 7b20023f6b..f724d873ef 100644
--- a/content/docs/command-reference/plots/index.md
+++ b/content/docs/command-reference/plots/index.md
@@ -1,7 +1,7 @@
 # plots
 
-A set of commands to visualize and compare _plot metrics_:
-[show](/doc/command-reference/plots/show),
+A set of commands to visualize and compare data series or images from ML
+projects: [show](/doc/command-reference/plots/show),
 [diff](/doc/command-reference/plots/diff),
 [modify](/doc/command-reference/plots/modify) and
 [templates](/doc/command-reference/plots/templates).
@@ -13,31 +13,23 @@ usage: dvc plots [-h] [-q | -v] {show,diff,modify,templates} ...
 
 positional arguments:
   COMMAND
-    show         Generate plot from a metrics file.
-    diff         Plot differences in metrics between commits.
-    modify       Modify display properties of data-series plots (has no effect on image-type plots).
-    templates    Write built-in plots templates to a directory (.dvc/plots by default).
+    show         Generate plots from target files or from `plots`
+                 definitions in `dvc.yaml`.
+    diff         Show multiple versions of a plot by overlaying them
+                 in a single image.
+    modify       Modify display properties of data-series plots
+                 defined in stages (has no effect on image plots).
+    templates    Write built-in plots templates to a directory
+                 (.dvc/plots by default).
 ```
 
-## Types of metrics
-
-DVC has two concepts for metrics, that represent different results of machine
-learning training or data processing:
-
-1. `dvc metrics` represent **scalar numbers** such as AUC, _true positive rate_,
-   etc.
-2. `dvc plots` can be used to visualize **data series** such as AUC curves, loss
-   functions, confusion matrices, etc.
-
 ## Description
 
-DVC provides a set of commands to visualize certain metrics of machine learning
-experiments as plots. Usual plot examples are AUC curves, loss functions,
-confusion matrices, among others.
-
-This type of metrics files are created by users, or generated by user data
-processing code, and can be defined in `dvc.yaml` (`plots` field) for tracking
-(optional).
+DVC provides a set of commands to visualize data produced by machine learning
+projects. Usual plots include AUC curves, loss functions, or confusion matrices,
+for example. Plots are a great alternative to `dvc metrics` when working with
+multi-dimensional performance data. They also help you present and compare
+[experiments] effectively.
 
 DVC can work with two types of plots files:
 
@@ -50,17 +42,18 @@ DVC plots from the [VS Code Extension], which includes a special
 [Plots Dashboard] that corresponds to the features in the `dvc plots` commands.
 
 Data-series plots utilize [Vega-Lite](https://vega.github.io/vega-lite/) for
-rendering (declarative JSON grammar for defining graphics). Image-type plots are
-rendered using `<img>` tags directly.
+rendering (declarative JSON grammar for defining graphics). Images are rendered
+using `<img>` tags directly.
 
 [vs code extension]:
   https://marketplace.visualstudio.com/items?itemName=Iterative.dvc
 [plots dashboard]:
   https://github.com/iterative/vscode-dvc/blob/main/extension/resources/walkthrough/plots.md
+[experiments]: /doc/user-guide/experiment-management/experiments-overview
 
-## Supported file formats
+### Supported file formats
 
-Image-type plots are included in HTML as-is, without additional processing.
+Images are included in HTML as-is, without additional processing.
 
 > We recommend to track these source image files with DVC instead of Git, to
 > prevent the repository from bloating.
@@ -105,7 +98,144 @@ names in the `train` array below:
 }
 ```
 
-## Plot templates (data series only)
+## Defining plots
+
+In order to create visualizations, users need to provide the data and
+(optionally) a configuration that helps customize the plot. DVC provides two
+ways to configure visualizations: users can mark specific stage
+outputs as plots or define top-level `plots` in `dvc.yaml`.
+
+### Stage plots
+
+When using `dvc stage add`, particular outputs can be marked with
+`--plots/--plots-no-cache` instead of `--outs/--outs-no-cache`. This tells DVC
+that they are intended for visualizations.
+
+Upon running `dvc plots show/diff`, DVC will collect stage plots alongside the
+[top-level plots](#top-level-plots) and display them according to their
+configuration. Note that if there are stage plots in the project and they are
+also used in some top-level definitions, DVC will create separate renderings for
+the stage plots and for all definitions using them.
+
+This special type of output can come in handy if users want to visually
+compare experiment results with other experiment versions without
+writing top-level plot definitions in `dvc.yaml`.
+
+### Top-level plots
+
+Plots can also be defined in a top-level `plots` key in `dvc.yaml`. Unlike
+[stage plots](#stage-plots), these definitions let you overlay plots from
+different data sources, for example training vs. test results (on the current
+project version). Conversely, you can create multiple plots from a single source
+file. You can also use any plot file in the project, regardless of whether it's
+a stage output. This creates a separation between visualization and outputs.
+
+To define a plot, users need to provide data and an optional
+configuration for the plot. Plots should be defined in the `dvc.yaml` file under
+the `plots` key.
+
+```yaml
+# dvc.yaml
+stages: ...
+
+plots: ...
+```
+
+Every plot has to have its own ID. Configuration, if provided, should be a
+dictionary.
+
+In the simplest use case, a user can provide the file path as the plot ID and
+not provide configuration at all:
+
+```yaml
+# dvc.yaml
+---
+plots:
+  logs.csv:
+```
+
+In that case the default behavior will be applied. DVC will take data from the
+`logs.csv` file and apply the `linear` plot
+[template](/doc/command-reference/plots#plot-templates) to the last found column
+(CSV, TSV files) or field (JSON, YAML).
+
+We can customize the plot by adding appropriate fields to the configuration:
+
+```yaml
+# dvc.yaml
+---
+plots:
+  confusion_matrix:
+    y:
+      confusion_matrix_data.csv: predicted_class
+    x: actual_class
+    template: confusion
+```
+
+In this case we provided `confusion_matrix` as the plot ID. It will be displayed
+in the plot as a title, unless we override it with the `title` field. Here we
+provided the data source in the `y` axis definition: data will be sourced from
+`confusion_matrix_data.csv`, with the `predicted_class` field on the `y` axis
+and the `actual_class` field on the `x` axis. Note that DVC will assume that
+`actual_class` is inside `confusion_matrix_data.csv`.
+
+We can provide multiple columns/fields from the same file:
+
+```yaml
+# dvc.yaml
+---
+plots:
+  multiple_series:
+    y:
+      logs.csv: [accuracy, loss]
+    x: epoch
+```
+
+In this case, we will take the `accuracy` and `loss` fields and display them
+against the `epoch` column, all coming from the `logs.csv` file.
+
+We can source the data from multiple files too:
+
+```yaml
+# dvc.yaml
+---
+plots:
+  multiple_files:
+    y:
+      train_logs.csv: accuracy
+      test_logs.csv: accuracy
+    x: epoch
+```
+
+In this case we will plot the `accuracy` field from both `train_logs.csv` and
+`test_logs.csv` against `epoch`. Note that both files must have an `epoch`
+field.
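The section above also mentions that a single source file can feed several plots, a case the examples do not show. A minimal sketch of it, following the dictionary-style syntax of the preceding examples: the plot IDs `accuracy_plot` and `loss_plot` and both `title` values are hypothetical, and `logs.csv` is assumed to contain `epoch`, `accuracy` and `loss` columns.

```yaml
# dvc.yaml
# Illustrative only: plot IDs and titles are made-up names, and the syntax
# is assumed to match the top-level `plots` examples in this document.
---
plots:
  accuracy_plot:
    y:
      logs.csv: accuracy
    x: epoch
    title: Accuracy per epoch # overrides the plot ID as the displayed title
  loss_plot:
    y:
      logs.csv: loss
    x: epoch
    title: Loss per epoch
```

Compared with the `multiple_series` definition, which draws both fields in one plot, this variant should render each series as its own plot, which may be easier to read when the value ranges differ widely.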
+
+### Available configuration fields
+
+- `x` - field name from which the X axis data comes. An auto-generated
+  _step_ field is used by default. It has to be a string.
+
+- `y` - field name from which the Y axis data comes.
+  - Top-level plots: It can be a string, list or dictionary. If it's a string or
+    list, it is assumed that the plot ID will be the path to the data source.
+    The string or list elements will be the names of data columns or fields
+    within the source file. If this field is a dictionary, it is assumed that
+    its keys are paths to data sources. The values have to be either strings or
+    lists, and are treated as column(s)/field(s) within the respective files.
+  - Plot outputs: It is a field name from which the Y axis data comes.
+- `x_label` - X axis label. The X field name is the default.
+- `y_label` - Y axis label. If all provided Y entries have the same field name,
+  this name will be the default; otherwise the string `y` is used.
+- `title` - Plot title. Defaults:
+  - Top-level plots: `path/to/dvc.yaml::plot_id`
+  - Plot outputs: Path to the file.
+
+Refer to the [`show` command] documentation for examples.
+
+[`show` command]: /doc/command-reference/plots/show#example-top-level-plots
+
+## Plot templates (data-series only)
 
 DVC uses [Vega-Lite](https://vega.github.io/vega-lite/) JSON specifications to
 create plots from user data. A set of built-in _plot templates_ are included.
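As a sketch of how a built-in template can be selected together with the other configuration fields, the definition below names the `linear` template explicitly. It follows the dictionary-style syntax used in the top-level plot examples. The plot ID, labels and title are hypothetical values, and both CSV files are assumed to share an `epoch` column.

```yaml
# dvc.yaml
# Illustrative only: plot ID, labels and title are made-up values, and the
# syntax is assumed to match the top-level `plots` examples in this document.
---
plots:
  train_vs_test_accuracy:
    template: linear
    y:
      train_logs.csv: accuracy
      test_logs.csv: accuracy
    x: epoch
    x_label: Epoch
    y_label: Accuracy
    title: Train vs. test accuracy
```

Because `linear` is described as the template applied by default, omitting the `template` line should give the same rendering. Spelling it out mainly documents the intent.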
@@ -133,7 +263,7 @@ DVC has the following built-in plot templates:
 - `confusion` - confusion matrix, see
   [example](/doc/command-reference/plots#example-confusion-matrix)
 
-[custom template]: https://dvc.org/doc/command-reference/plots/templates
+[custom templates]: https://dvc.org/doc/command-reference/plots/templates
 
 - `confusion_normalized` - confusion matrix with values normalized to <0, 1>
   range
@@ -187,7 +317,7 @@ important fields that DVC adds to the plot data:
 Refer to [`templates`](/doc/command-reference/plots/templates) command for more
 information on how to prepare your own template from pre-defined ones.
 
-## HTML templates
+## Custom HTML templates
 
 It's possible to supply an HTML file to `dvc plots show` and `dvc plots diff` by
 using the the `--html-template` option. This allows you to customize the
@@ -209,54 +339,60 @@ this feature to render DVC plots without an Internet connection, below.
 
 - `-v`, `--verbose` - displays detailed tracing information.
 
-## Example: Tabular data
-
-We'll use tabular metrics file `logs.csv` for this example:
+## Example: Offline HTML Template
 
-```
-epoch,accuracy,loss,val_accuracy,val_loss
-0,0.9418667,0.19958884770199656,0.9679,0.10217399864746257
-1,0.9763333,0.07896138601688048,0.9768,0.07310650711813942
-2,0.98375,0.05241111190887168,0.9788,0.06665669009438716
-3,0.98801666,0.03681169906261687,0.9781,0.06697812260198989
-4,0.99111664,0.027362171787042946,0.978,0.07385754839298315
-5,0.9932333,0.02069501801203781,0.9771,0.08009233058886166
-6,0.9945,0.017702101902437668,0.9803,0.07830339228538505
-7,0.9954,0.01396906608727198,0.9802,0.07247738889862157
-```
+The plots generated by `dvc plots` use Vega-Lite JavaScript libraries, and by
+default these load [online resources](https://vega.github.io/vega/usage/#embed).
+There may be times when you need to produce plots without Internet access, or
+want to customize the plots output to add extra content, like banners or
+extra text. DVC lets you replace the HTML file that contains the final plots.
 
-Let's plot the last column (default behavior):
+Download the Vega-Lite libraries into the directory where you'll produce the
+`dvc plots`:
 
 ```dvc
-$ dvc plots show logs.csv
-file:///Users/usr/src/dvc_plots/index.html
+$ wget https://cdn.jsdelivr.net/npm/vega@5.20.2 -O my_vega.js
+$ wget https://cdn.jsdelivr.net/npm/vega-lite@5.1.0 -O my_vega_lite.js
+$ wget https://cdn.jsdelivr.net/npm/vega-embed@6.18.2 -O my_vega_embed.js
 ```
 
-![](/img/plots_show.svg)
+Create the following HTML file and save it in `.dvc/plots/mypage.html`:
 
-Difference in this metric between the current project version and the previous
-commit:
+```html
+<html>
+  <head>
+    <script src="../path/to/my_vega.js" type="text/javascript"></script>
+    <script src="../path/to/my_vega_lite.js" type="text/javascript"></script>
+    <script src="../path/to/my_vega_embed.js" type="text/javascript"></script>
+  </head>
+  <body>
+    {plot_divs}
+  </body>
+</html>
+```
+
+Note that this is a standard HTML file with only `{plot_divs}` as a placeholder
+for DVC to inject plots.