facebookresearch · moyapchen · Dec 23, 2021 · Nov 15, 2021 · Nov 16, 2021 · Nov 16, 2021
diff --git a/conftest.py b/conftest.py
@@ -67,6 +67,7 @@ def filter_tests_with_circleci(test_list):
     ('datatests/', 'data'),
     ('parlai/tasks/', 'teacher'),
     ('tasks/', 'tasks'),
+    ('tod/', 'tod'),
 ]
 
 

diff --git a/parlai/core/teachers.py b/parlai/core/teachers.py
@@ -726,6 +726,8 @@ def num_episodes(self) -> int:
         """
         Return the number of episodes in the data.
         """
+        if hasattr(self, "_num_episodes_cache"):
+            return self._num_episodes_cache
         try:
             return self.data.num_episodes()
         except AttributeError:
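
Review note on the `num_episodes` hunk above: the added `hasattr` guard lets a subclass short-circuit the lookup by setting `_num_episodes_cache` on itself. The sketch below mirrors only that lookup order. `FakeData` and `SketchTeacher` are hypothetical stand-ins, not ParlAI classes, and the `except AttributeError` fallback is left out because its body is not shown in this hunk.

```python
class FakeData:
    """Hypothetical stand-in for a teacher's data object."""

    def __init__(self, n):
        self._n = n
        self.calls = 0  # counts how often the underlying data is queried

    def num_episodes(self):
        self.calls += 1
        return self._n


class SketchTeacher:
    """Hypothetical stand-in that mirrors only the lookup order in the hunk."""

    def __init__(self, data):
        self.data = data

    def num_episodes(self):
        # 1. A value cached on the instance wins.
        if hasattr(self, "_num_episodes_cache"):
            return self._num_episodes_cache
        # 2. Otherwise defer to the data object.
        return self.data.num_episodes()


teacher = SketchTeacher(FakeData(3))
uncached = teacher.num_episodes()   # answered by FakeData
teacher._num_episodes_cache = 10    # e.g. set once by a parser subclass
cached = teacher.num_episodes()     # answered from the cache
```

Because the guard uses `hasattr`, a teacher that never sets the attribute behaves exactly as before.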

diff --git a/parlai/core/tod/README.md b/parlai/core/tod/README.md
@@ -0,0 +1,79 @@
+# Task-Oriented Dialog (TOD) Core README
+
+To get started with the TOD classes quickly, read the "Teachers + Agents Usage" section below (to understand how to set up agents so that they work with new datasets) and `parlai/scripts/tod_world_script.py` (to understand how to run simulations in the TOD conversation format).
+
+See `projects/tod_simulator/README` for a higher-level usage-focused README. This document also describes the structure of the contents of this directory. 
+
+By convention, files referenced from outside this directory are prefixed with `tod`, whereas files referenced only by other files within this directory are not.
+
+---
+
+# Teachers + Agents Usage
+
+See `tod_agents.py` for the classes.  
+
+For a given dataset, extend `TodStructuredDataParser` and implement `generate_episodes()` and `get_id_task_prefix()`. `generate_episodes()` is expected to do the data processing that converts the dataset to `List[TodStructuredEpisode]`. From here, multiple inheritance can be used to define Agents and Teachers that use the data.
+
+For example, given a `class XX_DataParser(TodStructuredDataParser)`, the declaration `class XX_UserSimulatorTeacher(XX_DataParser, TodUserSimulatorTeacher)` defines a teacher that generates training data for a User Simulator model.
+
+Once the relevant agents have been created (or relevant models have been fine-tuned), see `parlai.scripts.tod_world_script` for generating the simulations themselves.
+
+## Why we do this
+These files keep Teachers and Agents consistent for simulation. Rather than having to align multiple different agents on assumptions about data formatting, tokens, spacing, etc., we do this once (by converting everything to `TodStructuredEpisode`) and let the code handle the rest.
+
+# Description of Agents + Teachers useful for Simulation
+## Teachers for training (generative) models
+    * TodSystemTeacher
+    * TodUserSimulatorTeacher
+
+## Agents for Grounding
+For goal grounding for the User for simulation:
+    * TodGoalAgent
+        * Dumps goals as is from the dataset, possibly multiple per episode
+    * TodSingleGoalAgent
+        * Flattens goals such that a single one is used to seed a conversation. For datasets that include multiple goals per conversation, each individual goal is used as a seed.
+
+For (optional) API schema grounding for the System:
+    * TodApiSchemaAgent (must be used with `TodGoalAgent` only)
+    * TodSingleApiSchemaAgent (must be used with `TodSingleGoalAgent` only)
+    * EmptyApiSchemaAgent
+        * Used for simulations where the expectation is `no schema`, i.e., evaluation simulations.
+
+## Agents for mocking APIs:
+    * StandaloneApiAgent
+         * Expects a `.pickle` file 'trained' by `TodStandaloneApiTeacher`. (See the in-line comments on the classes for an example train command.)
+
+# Agents for dumping data from a ground truth dataset
+The following are for extracting TOD World metrics from a ground truth dataset. These are generally used sparingly and only for calculating baselines.
+    * TodApiCallAndSysUttAgent
+    * TodApiResponseAgent
+    * TodUserUttAgent
+
+For this metrics extraction, `TodGoalAgent` and `TodApiSchemaAgent` should be used.
+
+# Other agents
+There is an `EmptyGoalAgent` for use in human-human conversations where a goal is unnecessary.
+
+---
+
+# Directory contents
+
+This directory is split into 3 main components: files to support agents + teachers, files to support the simulation world, and files to store functionality common to both of these. We describe the common functionality first, then the other two.
+
+Tests for all files in this directory are stored in `tests/tod`.
+
+## Files for common functionality 
+`tod_core.py` defines consts and enums used across TOD agents, teachers, and world. It also defines dataclasses for storing the intermediate data format used when parsing a dataset to the TOD structure, as well as a `SerializationHelper` for going from machine-structured data (e.g., API calls) to the flattened versions used by the models.
+
+
+## Files for agents and teachers
+Usage of `tod_agents.py` is described above. It references `teacher_metrics.py`, which stores Metrics objects.
+
+## Files for simulation world
+Usage of the simulation world is primarily documented in the script that runs the world, `parlai/scripts/tod_world_script.py`. The script is responsible for running multiple episodes of simulation and saving simulation output data.
+
+The world itself is stored in `tod_world.py`. The world follows the same intermediate data formats for episodes as described in `tod_core.py` and handles calling the different agents to support this. It is generally recommended that this file not be touched.
+
+A general class for collecting metrics out of `TODWorld` is stored in `world_metrics.py`; the individual 'metric handlers', each responsible for calculating a given metric, are stored in `world_metric_handlers.py`.
+
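
Review note on the "Teachers + Agents Usage" section of the README above: the multiple-inheritance pattern can be sketched without ParlAI installed. Everything below is a hypothetical stand-in. `StubDataParser` and `StubUserSimulatorTeacher` only imitate the roles of `TodStructuredDataParser` and `TodUserSimulatorTeacher`, episodes are plain dicts rather than `TodStructuredEpisode`, and the `id` format is invented for illustration.

```python
from typing import Dict, List


class StubDataParser:
    """Stand-in for the parser role: subclasses supply the dataset."""

    def generate_episodes(self) -> List[Dict[str, str]]:
        raise NotImplementedError

    def get_id_task_prefix(self) -> str:
        raise NotImplementedError


class StubUserSimulatorTeacher(StubDataParser):
    """Stand-in for the teacher role: consumes whatever the parser yields."""

    def examples(self) -> List[Dict[str, str]]:
        prefix = self.get_id_task_prefix()
        return [
            {
                "id": f"{prefix}_UserSimulatorTeacher",  # invented id format
                "text": episode["sys"],
                "label": episode["user"],
            }
            for episode in self.generate_episodes()
        ]


class XX_DataParser(StubDataParser):
    """Dataset-specific part: written once per dataset."""

    def generate_episodes(self) -> List[Dict[str, str]]:
        return [{"sys": "How can I help?", "user": "Book a table for two."}]

    def get_id_task_prefix(self) -> str:
        return "XX"


class XX_UserSimulatorTeacher(XX_DataParser, StubUserSimulatorTeacher):
    """No body needed: the data comes from the left base, the logic from the right."""


examples = XX_UserSimulatorTeacher().examples()
```

Listing the dataset parser first in the bases matters: Python's method resolution order then finds `XX_DataParser.generate_episodes()` before the abstract version.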
+
diff --git a/parlai/core/tod/teacher_metrics.py b/parlai/core/tod/teacher_metrics.py
@@ -0,0 +1,148 @@
+#!/usr/bin/env python3
+
+# Copyright (c) Facebook, Inc. and its affiliates.
+# This source code is licensed under the MIT license found in the
+# LICENSE file in the root directory of this source tree.
+
+"""
+Task Oriented Dialogue (TOD) teacher metrics.
+"""
+from typing import Optional, List, Dict, Any
+from parlai.core.metrics import AverageMetric, BleuMetric, F1Metric, Metric, Metrics
+
+
+class SlotMetrics(Metrics):
+    """
+    Helper container which encapsulates standard slot metrics in task oriented learning
+    (jga, slot_p, slot_r, etc).
+
+    Due to differences in dialogue representations between tasks, the input is pre-
+    parsed ground truth and predicted slot dictionaries.
+    """
+
+    def __init__(
+        self,
+        teacher_slots: Dict[str, str],
+        predicted_slots: Dict[str, str],
+        prefixes: Optional[List] = None,
+        shared: Optional[Dict[str, Any]] = None,
+    ) -> None:
+        super().__init__(shared=shared)
+        self.prefixes = prefixes if prefixes else []
+        # jga: joint goal accuracy, i.e. exact match of the full slot dictionaries
+        self.add_with_prefixes("jga", AverageMetric(teacher_slots == predicted_slots))
+        if len(teacher_slots) > 0:
+            self.add_with_prefixes(
+                "jga_noempty", AverageMetric(teacher_slots == predicted_slots)
+            )
+        else:
+            self.add_with_prefixes(
+                "jga_empty", AverageMetric(teacher_slots == predicted_slots)
+            )
+
+        # precision
+        for pred_slot_name, pred_value in predicted_slots.items():
+            slot_p = AverageMetric(teacher_slots.get(pred_slot_name) == pred_value)
+            self.add_with_prefixes("slot_p", slot_p)
+            self.add_with_prefixes("slot_f1", SlotF1Metric(slot_p=slot_p))
+        # recall
+        for teacher_slot_name, teacher_value in teacher_slots.items():
+            slot_r = AverageMetric(
+                predicted_slots.get(teacher_slot_name) == teacher_value
+            )
+            self.add_with_prefixes("slot_r", slot_r)
+            self.add_with_prefixes("slot_f1", SlotF1Metric(slot_r=slot_r))
+
+    def add_with_prefixes(self, name, value):
+        self.add(name, value)
+        for prefix in self.prefixes:
+            self.add(f"{prefix}/{name}", value)
+
+
+class NlgMetrics(Metrics):
+    """
+    Helper container for generation version of standard metrics (F1, BLEU, ..).
+    """
+
+    def __init__(
+        self,
+        guess: str,
+        labels: Optional[List[str]],
+        prefixes: Optional[List[str]] = None,
+        shared: Optional[Dict[str, Any]] = None,
+        avg_jga_nlg_bleu: bool = False,
+    ) -> None:
+        super().__init__(shared=shared)
+        self.prefixes = prefixes if prefixes else []
+        bleu = BleuMetric.compute(guess, labels)
+        f1 = F1Metric.compute(guess, labels)
+        self.add_with_prefixes("nlg_bleu", bleu)
+        self.add_with_prefixes("nlg_f1", f1)
+
+    def add_with_prefixes(self, name, value):
+        self.add(name, value)
+        for prefix in self.prefixes:
+            self.add(f"{prefix}/{name}", value)
+
+
+AverageType = Optional[AverageMetric]
+
+
+def _average_type_sum_helper(first: AverageType, second: AverageType) -> AverageType:
+    """
+    Helper to deal with Nones.
+
+    We are "clever" in how we aggregate SlotF1Metrics (See SlotMetrics `__init__`) in
+    that we add precision and recall values separately, but this means we need to handle
+    None.
+    """
+    if first is None:
+        return second
+    if second is None:
+        return first
+    return first + second
+
+
+class SlotF1Metric(Metric):
+    """
+    Metric to keep track of slot F1.
+
+    Keeps track of slot precision and slot recall as running metrics.
+    """
+
+    __slots__ = ("_slot_p", "_slot_r")
+
+    @property
+    def macro_average(self) -> bool:
+        """
+        Indicates whether this metric should be macro-averaged when globally reported.
+        """
+        return True
+
+    def __init__(self, slot_p: AverageType = None, slot_r: AverageType = None):
+        if not isinstance(slot_p, AverageMetric) and slot_p is not None:
+            slot_p = AverageMetric(slot_p)
+        if not isinstance(slot_r, AverageMetric) and slot_r is not None:
+            slot_r = AverageMetric(slot_r)
+        self._slot_p = slot_p
+        self._slot_r = slot_r
+
+    def __add__(self, other: Optional["SlotF1Metric"]) -> "SlotF1Metric":
+        # NOTE: hinting can be cleaned up with "from __future__ import annotations" when
+        # we drop Python 3.6
+        if other is None:
+            return self
+        slot_p = _average_type_sum_helper(self._slot_p, other._slot_p)
+        slot_r = _average_type_sum_helper(self._slot_r, other._slot_r)
+        return type(self)(slot_p=slot_p, slot_r=slot_r)
+
+    def value(self) -> float:
+        if self._slot_p is None or self._slot_r is None:
+            return float("nan")
+        else:
+            slot_p = self._slot_p.value()
+            slot_r = self._slot_r.value()
+            if slot_p == 0.0 and slot_r == 0.0:
+                return float("nan")
+            else:
+                return 2 * (slot_p * slot_r) / (slot_p + slot_r)
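
Review note on `SlotMetrics` and `SlotF1Metric` above: a worked example for a single turn may make the arithmetic easier to check. The sketch below uses plain dicts and floats instead of ParlAI `Metric` objects, and `slot_scores` is a hypothetical helper name, not part of this PR. It follows the same rules as the diff: precision iterates over predicted slots, recall iterates over teacher slots, and F1 is NaN when either side is missing or both are zero.

```python
import math
from typing import Dict, Optional, Tuple


def slot_scores(
    teacher: Dict[str, str], predicted: Dict[str, str]
) -> Tuple[float, Optional[float], Optional[float], float]:
    """Return (jga, precision, recall, f1) for one pair of slot dicts."""
    # Joint goal accuracy: the two dictionaries must match exactly.
    jga = float(teacher == predicted)

    # Precision: share of predicted slots whose value matches the teacher.
    p_hits = [teacher.get(name) == value for name, value in predicted.items()]
    # Recall: share of teacher slots whose value was predicted.
    r_hits = [predicted.get(name) == value for name, value in teacher.items()]

    # None plays the role of "no observations", as in SlotF1Metric.
    precision = sum(p_hits) / len(p_hits) if p_hits else None
    recall = sum(r_hits) / len(r_hits) if r_hits else None

    if precision is None or recall is None:
        f1 = float("nan")
    elif precision == 0.0 and recall == 0.0:
        f1 = float("nan")
    else:
        f1 = 2 * (precision * recall) / (precision + recall)
    return jga, precision, recall, f1


teacher_slots = {"city": "Paris", "date": "Friday"}
predicted_slots = {"city": "Paris", "date": "Monday", "time": "7pm"}
jga, precision, recall, f1 = slot_scores(teacher_slots, predicted_slots)
# precision = 1/3 (one of three predicted slots is right)
# recall    = 1/2 (one of two teacher slots is recovered)
# f1        = 2 * (1/3 * 1/2) / (1/3 + 1/2) = 0.4
```

In the PR itself, precision and recall are accumulated as running `AverageMetric` values across examples, and F1 is computed from those aggregates rather than per turn.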