Update RISE docs #255

Merged
merged 3 commits into from
Jun 2, 2021
2 changes: 2 additions & 0 deletions README.md
@@ -178,6 +178,8 @@ CVAT annotations ---> Publication, statistics etc.
- Model integration
- Inference (OpenVINO, Caffe, PyTorch, TensorFlow, MxNet, etc.)
- Explainable AI ([RISE algorithm](https://arxiv.org/abs/1806.07421))
- RISE for classification
- RISE for object detection

> Check [the design document](docs/design.md) for a full list of features.
> Check [the user manual](docs/user_manual.md) for usage instructions.
29 changes: 27 additions & 2 deletions datumaro/cli/commands/explain.py
@@ -17,13 +17,38 @@

def build_parser(parser_ctor=argparse.ArgumentParser):
parser = parser_ctor(help="Run Explainable AI algorithm",
description="Runs an explainable AI algorithm for a model.")
description="""
Runs an explainable AI algorithm for a model.|n
|n
This tool is meant to help an AI developer debug
a model and a dataset. It executes inference and
tries to find problems in the trained model, such as determining
decision boundaries and belief intervals for the classifier.|n
|n
Currently, the only available algorithm is
RISE (https://arxiv.org/pdf/1806.07421.pdf), which runs
inference and then re-runs a model multiple times
on each image to produce a heatmap of activations for
each output of the first inference. As a result, we obtain
several heatmaps, which show how image pixels affected
the inference result. This algorithm doesn't require any special
information about the model, but it requires the model to
return all the outputs and confidences. Check the User Manual
for usage examples.|n
Supported scenarios:|n
- RISE for classification|n
- RISE for object detection|n
|n
Examples:|n
- Run RISE on an image, display results:|n
|s|s%(prog)s -t path/to/image.jpg -m mymodel rise --max-samples 50
""", formatter_class=MultilineFormatter)

parser.add_argument('-m', '--model', required=True,
help="Model to use for inference")
parser.add_argument('-t', '--target', default=None,
help="Inference target - image, source, project "
"(default: current dir)")
"(default: current project)")
parser.add_argument('-o', '--output-dir', dest='save_dir', default=None,
help="Directory to save output (default: display only)")

91 changes: 85 additions & 6 deletions docs/user_manual.md
@@ -1093,7 +1093,8 @@ datum model add \
```

Interpretation script for an OpenVINO detection model (`convert.py`):
You can find OpenVINO™ model interpreter samples in datumaro/plugins/openvino/samples. [Instruction](datumaro/plugins/openvino/README.md)
You can find OpenVINO model interpreter samples in
`datumaro/plugins/openvino/samples` ([instruction](datumaro/plugins/openvino/README.md)).

``` python
from datumaro.components.extractor import *
@@ -1182,6 +1183,25 @@ datum diff inference -o diff

### Explain inference

Runs an explainable AI algorithm for a model.

This tool is meant to help an AI developer debug a model and a dataset.
It executes inference and tries to find problems in the trained model,
such as determining decision boundaries and belief intervals for the classifier.

Currently, the only available algorithm is RISE ([article](https://arxiv.org/pdf/1806.07421.pdf)),
which runs inference and then re-runs a model multiple times on each
image to produce a heatmap of activations for each output of the
first inference. As a result, we obtain several heatmaps, which
show how image pixels affected the inference result. This algorithm doesn't
require any special information about the model, but it requires the model to
return all the outputs and confidences. The algorithm only supports
classification and detection models.
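The masking idea behind RISE can be sketched in a few lines: the image is multiplied by random binary masks, each masked copy is scored by the model, and the masks are summed weighted by the resulting confidences. The sketch below is illustrative only; the `model` callable and the mask parameters are hypothetical, not Datumaro API, and the real algorithm additionally smooths and shifts the masks:

``` python
import numpy as np

def rise_saliency(image, model, n_masks=1000, cell_size=8, p=0.5):
    # image: (H, W, C) float array; model: callable returning a confidence in [0, 1]
    h, w = image.shape[:2]
    saliency = np.zeros((h, w))
    for _ in range(n_masks):
        # coarse random occlusion grid, upsampled to the image size
        grid = (np.random.rand(h // cell_size + 1, w // cell_size + 1) < p).astype(float)
        mask = np.kron(grid, np.ones((cell_size, cell_size)))[:h, :w]
        score = model(image * mask[..., None])  # confidence for the masked image
        saliency += score * mask                # weight the mask by that confidence
    return saliency / n_masks
```

Pixels that keep the confidence high when visible accumulate weight, so the returned map highlights the regions the model relied on.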

The following use cases are available:
- RISE for classification
- RISE for object detection

Usage:

``` bash
@@ -1200,11 +1220,70 @@ Example: run inference explanation on a single image with visualization
``` bash
datum create <...>
datum model add mymodel <...>
datum explain \
-m mymodel \
-t 'image.png' \
rise \
-s 1000 --progressive
datum explain -t image.png -m mymodel \
rise --max-samples 1000 --progressive
```

> Note: this algorithm requires the model to return
> *all* (or a _reasonable_ number of) the outputs and confidences unfiltered,
> i.e. all the `Label` annotations for classification models and
> all the `Bbox`es for detection models.
> You can find examples of the expected model outputs in [`tests/test_RISE.py`](../tests/test_RISE.py).

For OpenVINO models the output processing script would look like this:

Classification scenario:

``` python
from datumaro.components.extractor import *
from datumaro.util.annotation_util import softmax

def process_outputs(inputs, outputs):
    # inputs = model input, array of images, shape = (N, C, H, W)
    # outputs = model output, logits, shape = (N, n_classes)
    # results = conversion result, [ [ Annotation, ... ], ... ]
    results = []
    for input, output in zip(inputs, outputs):
        image_results = []
        confs = softmax(output)
        for label, conf in enumerate(confs):
            image_results.append(Label(int(label),
                attributes={'score': float(conf)}))
        results.append(image_results)

    return results
```
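The `softmax` helper used above is assumed to behave like the standard normalized exponential; for reference, a minimal equivalent sketch (not the actual Datumaro implementation):

``` python
import numpy as np

def softmax(logits):
    # shift by the max logit for numerical stability, then normalize
    exps = np.exp(logits - np.max(logits))
    return exps / np.sum(exps)
```

With this definition, `softmax(np.array([1.0, 2.0, 3.0]))` sums to 1 and peaks at the last class, which is what the score attributes above rely on.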


Object Detection scenario:

``` python
from datumaro.components.extractor import *

# return a significant number of output boxes to make multiple runs
# statistically correct and meaningful
max_det = 1000

def process_outputs(inputs, outputs):
    # inputs = model input, array of images, shape = (N, C, H, W)
    # outputs = model output, shape = (N, 1, K, 7)
    # results = conversion result, [ [ Annotation, ... ], ... ]
    results = []
    for input, output in zip(inputs, outputs):
        # per-image input is (C, H, W), so take the last two dimensions
        input_height, input_width = input.shape[-2:]
        detections = output[0]
        image_results = []
        for det in detections:
            label = int(det[1])
            conf = float(det[2])
            x = max(int(det[3] * input_width), 0)
            y = max(int(det[4] * input_height), 0)
            w = min(int(det[5] * input_width - x), input_width)
            h = min(int(det[6] * input_height - y), input_height)
            image_results.append(Bbox(x, y, w, h,
                label=label, attributes={'score': conf}))

        results.append(image_results[:max_det])

    return results
```
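To make the coordinate decoding concrete, here is a worked example on a single hypothetical detection row, assuming the 7-element layout `[image_id, label, conf, x_min, y_min, x_max, y_max]` with normalized corner coordinates that the code above implies:

``` python
# hypothetical detection row and input size, for illustration only
det = [0, 1, 0.9, 0.1, 0.2, 0.5, 0.6]
input_width, input_height = 320, 240

x = max(int(det[3] * input_width), 0)                   # 0.1 * 320 = 32
y = max(int(det[4] * input_height), 0)                  # 0.2 * 240 = 48
w = min(int(det[5] * input_width - x), input_width)     # 160 - 32 = 128
h = min(int(det[6] * input_height - y), input_height)   # 144 - 48 = 96

print(x, y, w, h)  # 32 48 128 96
```

Note that the width and height come from subtracting the top-left pixel position from the scaled bottom-right corner, so the resulting `Bbox` uses the (x, y, w, h) convention.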

### Transform Project