[docs] Add docs explaining metrics. (#3498)
* Add docs explaining metrics.

* Slightly change title
stephenroller authored Mar 8, 2021
1 parent c110f73 commit 7224486
Showing 1 changed file with 43 additions and 1 deletion.
44 changes: 43 additions & 1 deletion docs/source/tutorial_metrics.md
@@ -1,9 +1,13 @@
# Understanding and adding new metrics
# Understanding and adding metrics

Author: Stephen Roller

## Introduction and Standard Metrics

:::{tip} List of metrics
If you're not sure what a metric means, refer to our [List of metrics](#list-of-metrics).
:::

ParlAI contains a number of built-in metrics that are automatically computed when
we train and evaluate models. Some of these metrics are _text generation_ metrics,
which are computed any time we generate text: these include F1, BLEU and Accuracy.
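
As a rough illustration of what these text generation metrics measure, here is a minimal sketch that assumes the `ExactMatchMetric` and `F1Metric` helpers in `parlai.core.metrics` (treat the exact signatures as approximate):

```python
from parlai.core.metrics import ExactMatchMetric, F1Metric

guess = 'hello world'
answers = ['hello there', 'hi world']

# Accuracy is an exact string match against any gold answer;
# F1 is unigram overlap with the best-matching gold answer.
print(ExactMatchMetric.compute(guess, answers).value())  # 0.0: no exact match
print(F1Metric.compute(guess, answers).value())          # 0.5: partial unigram overlap
```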
@@ -53,6 +57,7 @@ One nice thing about metrics is that they are automatically logged to the
statements into your code.



### Agent-specific metrics

Some agents include their own metrics that are computed for them. For example,
@@ -402,3 +407,40 @@ __Under the hood__: Local metrics work by including a "metrics" field in the
return message. This is a dictionary which maps field name to a metric value.
When the teacher receives the response from the model, it utilizes the metrics
field to update counters on its side.
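
As a rough sketch (not the exact internal code; `my_metric` is just a placeholder name), such a response message might look like this:

```python
from parlai.core.message import Message
from parlai.core.metrics import AverageMetric

# A model response carrying a local metric back to the teacher. The teacher
# reads the 'metrics' dict and adds each value into its own counters.
response = Message(
    {
        'text': 'Hello there!',
        'metrics': {'my_metric': AverageMetric(0.75)},  # placeholder metric
    }
)
```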

## List of Metrics

Below is a list of metrics and a brief explanation of each.

:::{note} List of metrics
If you find a metric not listed here,
please [file an issue on GitHub](https://github.com/facebookresearch/ParlAI/issues/new?assignees=&labels=Docs,Metrics&template=other.md).
:::

| Metric | Explanation |
| ----------------------- | ------------ |
| `accuracy` | Exact match text accuracy |
| `bleu-4` | BLEU-4 of the generation, under a standardized (model-independent) tokenizer |
| `clip` | Fraction of batches with clipped gradients |
| `ctpb` | Context tokens per batch |
| `ctps` | Context tokens per second |
| `exps` | Examples per second |
| `exs` | Number of examples processed since last print |
| `f1` | Unigram F1 overlap, under a standardized (model-independent) tokenizer |
| `gnorm` | Gradient norm |
| `gpu_mem` | Fraction of GPU memory used. May slightly underestimate true value. |
| `hits@1`, `hits@5`, ... | Fraction of correct choices in K guesses. (Similar to recall@K) |
| `interdistinct-1`, `interdistinct-2` | Fraction of n-grams unique across _all_ generations |
| `intradistinct-1`, `intradistinct-2` | Fraction of n-grams unique _within_ each utterance |
| `jga` | Joint Goal Accuracy |
| `loss` | Loss |
| `lr` | The most recent learning rate applied |
| `ltpb` | Label tokens per batch |
| `ltps` | Label tokens per second |
| `rouge-1`, `rouge-2`, `rouge-L` | ROUGE metrics |
| `token_acc` | Token-wise accuracy (generative only) |
| `token_em` | Utterance-level token accuracy. Roughly corresponds to perfection under greedy search (generative only) |
| `total_train_updates` | Number of SGD steps taken across all batches |
| `tpb` | Total tokens (context + label) per batch |
| `tps` | Total tokens (context + label) per second |
| `ups` | Updates per second (approximate) |
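
Most of these metrics appear in the report produced during evaluation. For example (a minimal sketch; the task, model, and example count here are arbitrary illustrative choices), an evaluation run like the following reports many of the metrics above:

```python
from parlai.scripts.eval_model import EvalModel

# Evaluate a trivial baseline on a handful of examples; the logged report
# includes metrics from the table above, such as accuracy, f1, and exs.
EvalModel.main(task='dailydialog', model='repeat_label', num_examples=10)
```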
