New object detector resnext50 sample implementation (mlcommons#1157)
* New object detector resnext50 sample implementation

* Remove resnext50 model source code
Remove model source code and adapt scripts to work with torchscript file
Publish torchscript and onnx models in zenodo and add them to README

* Integrate new detection model with openimages

* Add script and instructions to export resnext model

* Add calibration list and scripts to download them

Co-authored-by: rameshchukka <rnaidu02@yahoo.com>
pgmpablo157321 and rnaidu02 authored Jun 13, 2022
1 parent 77b22df commit 26951d7
Showing 13 changed files with 933 additions and 10 deletions.
500 changes: 500 additions & 0 deletions calibration/openimages/openimages_cal_images_list.txt

Large diffs are not rendered by default.
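
The calibration list itself is a plain-text file with one OpenImages image name per line. As a quick sanity check, a minimal sketch (assuming the repository-relative path shown in the file header above):

```python
# Minimal sketch: read the calibration image list added in this commit.
# The path is repository-relative, per the diff header above.
with open("calibration/openimages/openimages_cal_images_list.txt") as f:
    cal_images = [line.strip() for line in f if line.strip()]

print(len(cal_images))  # expected: 500, matching the "500 additions" above
```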

5 changes: 4 additions & 1 deletion vision/classification_and_detection/README.md
@@ -28,7 +28,8 @@ You can find a short tutorial how to use this benchmark [here](https://github.co
| ssd-resnet34 1200x1200 (deprecated since mlperf-v2.1) | pytorch | mAP 0.20 | coco resized to 1200x1200 | [from zenodo](https://zenodo.org/record/3236545/files/resnet34-ssd1200.pytorch) | [from mlperf](https://github.com/mlperf/inference/tree/master/others/cloud/single_stage_detector/pytorch) | fp32 | NCHW |
| ssd-resnet34 1200x1200 (deprecated since mlperf-v2.1) | onnx | mAP 0.20 | coco resized to 1200x1200 | from zenodo [opset-8](https://zenodo.org/record/3228411/files/resnet34-ssd1200.onnx) | [from mlperf](https://github.com/mlperf/inference/tree/master/others/cloud/single_stage_detector) converted using these [instructions](https://github.com/BowenBao/inference/tree/master/cloud/single_stage_detector/pytorch#6-onnx) | fp32 | Converted from pytorch model. |
| ssd-resnet34 1200x1200 (deprecated since mlperf-v2.1) | onnx | mAP 0.20 | coco resized to 1200x1200 | from zenodo [opset-11](https://zenodo.org/record/4735664/files/ssd_resnet34_mAP_20.2.onnx) | [from zenodo](https://zenodo.org/record/3345892/files/tf_ssd_resnet34_22.1.zip) converted using [this script](https://github.com/mlcommons/inference/blob/master/vision/classification_and_detection/tools/convert-to-onnx.sh) | fp32 | Converted from the tensorflow model and uses the same interface as the tensorflow model. |
-| retinanet-resnext50 | pytorch | mAP 0.3755 | openimages | [from zenodo]([https://zenodo.org/record/3236545/files/resnet34-ssd1200.pytorch](https://zenodo.org/record/6605272#.YphVSKjMI2w)) | [from mlperf](???) | fp32 | ??? |
+| retinanet-resnext50 800x800 | pytorch | mAP 0.375 | OpenImages mlperf validation set resized to 800x800 | [from zenodo](https://zenodo.org/record/6617981/files/resnext50_32x4d_fpn.pth) | from mlperf. [Source Code](https://github.com/mlcommons/training/tree/master/single_stage_detector/ssd/model) and [Weights](https://zenodo.org/record/6605272) | fp32 | NCHW |
+| retinanet-resnext50 800x800 | onnx | mAP 0.375 | OpenImages mlperf validation set resized to 800x800 | [from zenodo](https://zenodo.org/record/6617879/files/resnext50_32x4d_fpn.onnx) | from mlperf converted from the pytorch model. [Source Code](https://github.com/mlcommons/training/tree/master/single_stage_detector/ssd/model) and [Weights](https://zenodo.org/record/6605272) | fp32 | NCHW |

## Disclaimer
This benchmark app is a reference implementation that is not meant to be the fastest implementation possible.
@@ -70,6 +71,8 @@ python tools/accuracy-coco.py --mlperf-accuracy-file mlperf_log_accuracy.json --
| imagenet2012 (validation) | http://image-net.org/challenges/LSVRC/2012/ |
| coco (validation) | http://images.cocodataset.org/zips/val2017.zip |
| coco (annotations) | http://images.cocodataset.org/annotations/annotations_trainval2017.zip |
| openimages | We provide a [script](tools/openimages_mlperf.sh) to download the openimages mlperf validation set. You can download the dataset by going into the tools folder and running `./openimages_mlperf.sh -d <DOWNLOAD_PATH>` |
| openimages (calibration) | We also provide a [script](tools/openimages_calibration_mlperf.sh) to download the openimages mlperf calibration set. You can download it by going into the tools folder and running `./openimages_calibration_mlperf.sh -d <DOWNLOAD_PATH>`. This requires the [calibration list](../../calibration/openimages/openimages_cal_images_list.txt) |

### Using Collective Knowledge (CK)

Empty file modified vision/classification_and_detection/python/backend_pytorch.py
100755 → 100644
Empty file.
vision/classification_and_detection/python/backend_pytorch_native.py
@@ -3,16 +3,17 @@
"""
# pylint: disable=unused-argument,missing-docstring
import torch # currently supports pytorch1.0
+import torchvision
import backend



class BackendPytorchNative(backend.Backend):
def __init__(self):
super(BackendPytorchNative, self).__init__()
self.sess = None
self.model = None
self.device = "cuda:0" if torch.cuda.is_available() else "cpu"

def version(self):
return torch.__version__

@@ -23,7 +24,7 @@ def image_format(self):
return "NCHW"

def load(self, model_path, inputs=None, outputs=None):
-        self.model = torch.load(model_path,map_location=lambda storage, loc: storage)
+        self.model = torch.load(model_path, map_location=lambda storage, loc: storage)
self.model.eval()
# find inputs from the model if not passed in by config
if inputs:
@@ -48,10 +49,9 @@ def load(self, model_path, inputs=None, outputs=None):
self.model = self.model.to(self.device)
return self


def predict(self, feed):
-        key=[key for key in feed.keys()][0]
+        key = [key for key in feed.keys()][0]
feed[key] = torch.tensor(feed[key]).float().to(self.device)
with torch.no_grad():
            output = self.model(feed[key])
return output
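
For orientation, a minimal usage sketch of the backend above. The module name, model path, and input name here are assumptions inferred from this diff and run_common.sh, not verbatim from the repo:

```python
# Hypothetical sketch: loading the published torchscript checkpoint and
# running one dummy 800x800 image through BackendPytorchNative.
import numpy as np
from backend_pytorch_native import BackendPytorchNative  # assumed module name

backend = BackendPytorchNative()
backend.load("resnext50_32x4d_fpn.pth",                  # path per run_common.sh
             inputs=["image"], outputs=["boxes", "labels", "scores"])
# predict() converts the array to a float tensor on backend.device and
# runs the model under torch.no_grad().
output = backend.predict({"image": np.zeros((1, 3, 800, 800), np.float32)})
```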
1 change: 1 addition & 0 deletions vision/classification_and_detection/python/coco.py
@@ -341,3 +341,4 @@ def __call__(self, results, ids, expected=None, result_dict=None):
float(detection_class)])
self.total += 1
return processed_results

9 changes: 9 additions & 0 deletions vision/classification_and_detection/python/dataset.py
@@ -271,3 +271,12 @@ def pre_process_coco_resnet34_tf(img, dims=None, need_transpose=False):
img = img.transpose([2, 0, 1])

return img


def pre_process_openimages_resnext50(img, dims=None, need_transpose=False):
img = maybe_resize(img, dims)
img /= 255.
# transpose if needed
if need_transpose:
img = img.transpose([2, 0, 1])
return img
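
The new preprocessing only rescales pixel values and optionally reorders axes; a small self-contained sketch of its effect on a dummy image (`maybe_resize`, defined earlier in dataset.py, is skipped here):

```python
# Sketch of pre_process_openimages_resnext50 on a dummy HWC uint8 image.
import numpy as np

img = np.random.randint(0, 256, (800, 800, 3)).astype(np.float32)
img /= 255.                          # scale pixel values to [0, 1]
img = img.transpose([2, 0, 1])       # HWC -> CHW, i.e. need_transpose=True
print(img.shape, float(img.max()))   # (3, 800, 800) and a value <= 1.0
```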
32 changes: 31 additions & 1 deletion vision/classification_and_detection/python/main.py
@@ -23,6 +23,7 @@
import dataset
import imagenet
import coco
import openimages

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("main")
@@ -48,7 +49,19 @@
{"image_size": [300, 300, 3]}),
"coco-300-pt":
(coco.Coco, dataset.pre_process_coco_pt_mobilenet, coco.PostProcessCocoPt(False,0.3),
         {"image_size": [300, 300, 3]}),
"openimages-300-resnext":
(openimages.OpenImages, dataset.pre_process_openimages_resnext50, openimages.PostProcessOpenImagesResnext(False,0.05,300,300),
{"image_size": [300, 300, 3]}),
"openimages-800-resnext":
(openimages.OpenImages, dataset.pre_process_openimages_resnext50, openimages.PostProcessOpenImagesResnext(False,0.05,800,800),
{"image_size": [800, 800, 3]}),
"openimages-1200-resnext":
(openimages.OpenImages, dataset.pre_process_openimages_resnext50, openimages.PostProcessOpenImagesResnext(False,0.05,1200,1200),
{"image_size": [1200, 1200, 3]}),
"openimages-800-resnext-onnx":
(openimages.OpenImages, dataset.pre_process_openimages_resnext50, openimages.PostProcessOpenImagesResnext(False,0.05,800,800,False),
{"image_size": [800, 800, 3]}),
"coco-1200":
(coco.Coco, dataset.pre_process_coco_resnet34, coco.PostProcessCoco(),
{"image_size": [1200, 1200, 3]}),
@@ -167,6 +180,23 @@
"data-format": "NHWC",
"model-name": "ssd-resnet34",
},

# ssd-resnext50
"ssd-resnext50-pytorch": {
"inputs": "image",
"outputs": "boxes,labels,scores",
"dataset": "openimages-800-resnext",
"backend": "pytorch-native",
"model-name": "ssd-resnext50",
},
"ssd-resnext50-onnxruntime": {
"inputs": "images",
"outputs": "boxes,labels,scores",
"dataset": "openimages-800-resnext-onnx",
"backend": "onnxruntime",
"model-name": "ssd-resnext50",
"max-batchsize": 1
},
}

SCENARIO_MAP = {
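
Each profile above is a bundle of argument defaults keyed by `--profile`. A hedged sketch of how main.py resolves one; the "defaults" key and the exact merge logic are assumptions, see get_args() in main.py for the real flow:

```python
# Hedged sketch: expanding --profile ssd-resnext50-pytorch into settings.
args = dict(SUPPORTED_PROFILES["defaults"])            # assumed global defaults
args.update(SUPPORTED_PROFILES["ssd-resnext50-pytorch"])
print(args["dataset"])   # "openimages-800-resnext"
print(args["backend"])   # "pytorch-native"
```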
177 changes: 176 additions & 1 deletion vision/classification_and_detection/python/openimages.py
@@ -116,4 +116,179 @@ def get_item(self, nr):

def get_item_loc(self, nr):
src = os.path.join(self.data_path, self.image_list[nr])
        return src


class PostProcessOpenImages:
"""
Post processing for open images dataset. Annotations should
be exported into coco format.
"""
def __init__(self):
self.results = []
self.good = 0
self.total = 0
self.content_ids = []
self.use_inv_map = False

def add_results(self, results):
self.results.extend(results)

def __call__(self, results, ids, expected=None, result_dict=None, ):
# results come as:
# tensorflow, ssd-mobilenet: num_detections,detection_boxes,detection_scores,detection_classes
processed_results = []
# batch size
bs = len(results[0])
for idx in range(0, bs):
# keep the content_id from loadgen to handle content_id's without results
self.content_ids.append(ids[idx])
processed_results.append([])
detection_num = int(results[0][idx])
detection_boxes = results[1][idx]
detection_classes = results[3][idx]
expected_classes = expected[idx][0]
for detection in range(0, detection_num):
detection_class = int(detection_classes[detection])
if detection_class in expected_classes:
self.good += 1
box = detection_boxes[detection]
processed_results[idx].append([float(ids[idx]),
box[0], box[1], box[2], box[3],
results[2][idx][detection],
float(detection_class)])
self.total += 1
return processed_results

def start(self):
self.results = []
self.good = 0
self.total = 0

def finalize(self, result_dict, ds=None, output_dir=None):
result_dict["good"] += self.good
result_dict["total"] += self.total

if self.use_inv_map:
# for pytorch
label_map = {}
with open(ds.annotation_file) as fin:
annotations = json.load(fin)
for cnt, cat in enumerate(annotations["categories"]):
label_map[cat["id"]] = cnt + 1
inv_map = {v:k for k,v in label_map.items()}

detections = []
image_indices = []
for batch in range(0, len(self.results)):
image_indices.append(self.content_ids[batch])
for idx in range(0, len(self.results[batch])):
detection = self.results[batch][idx]
# this is the index of the coco image
image_idx = int(detection[0])
if image_idx != self.content_ids[batch]:
# working with the coco index/id is error prone - extra check to make sure it is consistent
log.error("image_idx missmatch, lg={} / result={}".format(image_idx, self.content_ids[batch]))
# map the index to the coco image id
detection[0] = ds.image_ids[image_idx]
height, width = ds.image_sizes[image_idx]
# box comes from model as: ymin, xmin, ymax, xmax
ymin = detection[1] * height
xmin = detection[2] * width
ymax = detection[3] * height
xmax = detection[4] * width
# pycoco wants {imageID,x1,y1,w,h,score,class}
detection[1] = xmin
detection[2] = ymin
detection[3] = xmax - xmin
detection[4] = ymax - ymin
if self.use_inv_map:
cat_id = inv_map.get(int(detection[6]), -1)
if cat_id == -1:
# FIXME:
log.info("finalize can't map category {}".format(int(detection[6])))
detection[6] = cat_id
detections.append(np.array(detection))

# map indices to coco image id's
image_ids = [ds.image_ids[i] for i in image_indices]
self.results = []
cocoGt = pycoco.COCO(ds.annotation_file)
cocoDt = cocoGt.loadRes(np.array(detections))
cocoEval = COCOeval(cocoGt, cocoDt, iouType='bbox')
cocoEval.params.imgIds = image_ids
cocoEval.evaluate()
cocoEval.accumulate()
cocoEval.summarize()
result_dict["mAP"] = cocoEval.stats[0]


class PostProcessOpenImagesResnext(PostProcessOpenImages):
"""
Post processing required by ssd-resnext50 / pytorch & onnx
"""
def __init__(self, use_inv_map, score_threshold, height, width, dict_format=True):
"""
        Args:
            use_inv_map (bool): Whether to remap detection classes through
                the inverse label map built from the annotations (used for
                the pytorch model).
            score_threshold (float): Minimum score below which remaining
                detections are skipped.
height (int): Height of the input image
width (int): Width of the input image
dict_format (bool): True if the model outputs a dictionary.
False otherwise. Defaults to True.
"""
super().__init__()
self.use_inv_map = use_inv_map
self.score_threshold = score_threshold
self.height = height
self.width = width
self.dict_format = dict_format

def __call__(self, results, ids, expected=None, result_dict=None):
if self.dict_format:
            # The output of the model is in dictionary format; this is
            # the case for the model ssd-resnext50-pytorch
bboxes_ = [e['boxes'] for e in results]
labels_ = [e['labels'] for e in results]
scores_ = [e['scores'] for e in results]
results = [bboxes_, labels_, scores_]
else:
bboxes_ = [results[0]]
labels_ = [results[1]]
scores_ = [results[2]]
results = [bboxes_, labels_, scores_]

processed_results = []
content_ids = []
# batch size
bs = len(results[0])
for idx in range(0, bs):
content_ids.append(ids[idx])
processed_results.append([])
detection_boxes = results[0][idx]
detection_classes = results[1][idx]
expected_classes = expected[idx][0]
scores = results[2][idx]
for detection in range(0, len(scores)):
if scores[detection] < self.score_threshold:
break
detection_class = int(detection_classes[detection])
if detection_class in expected_classes:
self.good += 1
box = detection_boxes[detection]
                    # box comes from the model as: xmin, ymin, xmax, ymax,
                    # with coordinates in the range [0, width] and
                    # [0, height] respectively. It is necessary to scale
                    # them to the range [0, 1].
processed_results[idx].append(
[
float(ids[idx]),
box[1] / self.height,
box[0] / self.width,
box[3] / self.height,
box[2] / self.width,
scores[detection],
float(detection_class),
]
)
self.total += 1
self.content_ids.extend(content_ids)
return processed_results
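
The coordinate handling above is the subtle part: the model emits [xmin, ymin, xmax, ymax] in input-pixel units, while the base class's finalize() expects [ymin, xmin, ymax, xmax] normalized to [0, 1]. A worked example for one box at the 800x800 input size:

```python
# Worked example of the box normalization performed in __call__ above.
height = width = 800
box = [160.0, 80.0, 640.0, 400.0]   # model output: xmin, ymin, xmax, ymax
normalized = [box[1] / height,      # ymin -> 0.1
              box[0] / width,       # xmin -> 0.2
              box[3] / height,      # ymax -> 0.5
              box[2] / width]       # xmax -> 0.8
print(normalized)                   # [0.1, 0.2, 0.5, 0.8]
```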
12 changes: 10 additions & 2 deletions vision/classification_and_detection/run_common.sh
@@ -1,7 +1,7 @@
#!/bin/bash

if [ $# -lt 1 ]; then
-    echo "usage: $0 tf|onnxruntime|pytorch|tflite [resnet50|mobilenet|ssd-mobilenet|ssd-resnet34] [cpu|gpu]"
+    echo "usage: $0 tf|onnxruntime|pytorch|tflite [resnet50|mobilenet|ssd-mobilenet|ssd-resnet34|ssd-resnext50] [cpu|gpu]"
exit 1
fi
if [ "x$DATA_DIR" == "x" ]; then
@@ -21,7 +21,7 @@ for i in $* ; do
tf|onnxruntime|tflite|pytorch) backend=$i; shift;;
cpu|gpu) device=$i; shift;;
gpu) device=gpu; shift;;
-        resnet50|mobilenet|ssd-mobilenet|ssd-resnet34|ssd-resnet34-tf) model=$i; shift;;
+        resnet50|mobilenet|ssd-mobilenet|ssd-resnet34|ssd-resnet34-tf|ssd-resnext50) model=$i; shift;;
esac
done

@@ -77,6 +77,10 @@ if [ $name == "ssd-resnet34-tf-onnxruntime" ] ; then
model_path="$MODEL_DIR/ssd_resnet34_mAP_20.2.onnx"
profile=ssd-resnet34-onnxruntime-tf
fi
if [ $name == "ssd-resnext50-onnxruntime" ] ; then
model_path="$MODEL_DIR/resnext50_32x4d_fpn.onnx"
profile=ssd-resnext50-onnxruntime
fi

#
# pytorch
@@ -95,6 +99,10 @@ if [ $name == "ssd-resnet34-pytorch" ] ; then
model_path="$MODEL_DIR/resnet34-ssd1200.pytorch"
profile=ssd-resnet34-pytorch
fi
if [ $name == "ssd-resnext50-pytorch" ] ; then
model_path="$MODEL_DIR/resnext50_32x4d_fpn.pth"
profile=ssd-resnext50-pytorch
fi


#