
Reading PASCAL VOC Format

Dmitri Soshnikov edited this page Dec 5, 2018 · 1 revision

Pascal VOC is a format for object detection data, i.e. images annotated with bounding boxes. A dataset in this format contains two directories: Annotations, which holds one XML file per image, and JPEGImages, which holds the images themselves.
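To make the link between the two directories concrete, here is a minimal standard-library sketch that pairs each annotation file with its image. It assumes that each annotation names its image in the `<filename>` tag, as standard Pascal VOC annotations do. The helper `pair_annotations` is hypothetical and not part of mPyPl.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def pair_annotations(root):
    """Yield (annotation_path, image_path) pairs for a Pascal VOC-style dataset.

    Assumes root/Annotations holds one XML file per image and that each file
    names its image in the <filename> tag; images live in root/JPEGImages.
    """
    root = Path(root)
    for xml_path in sorted((root / "Annotations").glob("*.xml")):
        filename = ET.parse(xml_path).getroot().findtext("filename")
        yield xml_path, root / "JPEGImages" / filename


# Usage: build a tiny throwaway dataset with a single annotation file
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "Annotations").mkdir()
    (Path(tmp) / "JPEGImages").mkdir()
    (Path(tmp) / "Annotations" / "mov_012_063018.xml").write_text(
        "<annotation><filename>mov_012_063018.jpeg</filename></annotation>"
    )
    pairs = list(pair_annotations(tmp))

print([(a.name, i.name) for a, i in pairs])
# [('mov_012_063018.xml', 'mov_012_063018.jpeg')]
```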

Since the Annotations directory contains XML files that define the bounding boxes, we can read them using the mPyPl XML reader with appropriate parameters:

import mPyPl as mp

# annotation_dir is the path to the Annotations directory
data = mp.get_xmlstream_from_dir(annotation_dir,
                                 list_fields=['object'],
                                 flatten_fields=['bndbox', 'size'],
                                 skip_fields=['pose', 'source', 'path'])
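The effect of list_fields, flatten_fields and skip_fields can be approximated with the standard library alone. The sketch below is a hypothetical re-implementation (element_to_dict is our own name, not mPyPl's code): it collects repeated object elements into a list, merges the children of bndbox and size into the parent with a prefix, drops the skipped tags, and keeps all leaf values as strings.

```python
import xml.etree.ElementTree as ET


def element_to_dict(elem, list_fields=(), flatten_fields=(), skip_fields=()):
    """Convert an XML element into a dict, roughly mimicking the parameters above.

    - children named in list_fields are collected into a list
    - children named in flatten_fields are merged in with a "<name>_" prefix
    - children named in skip_fields are dropped (at any depth)
    Leaf values stay strings.
    """
    result = {}
    for child in elem:
        tag = child.tag
        if tag in skip_fields:
            continue
        if len(child) == 0:
            value = (child.text or "").strip()
        else:
            value = element_to_dict(child, list_fields, flatten_fields, skip_fields)
        if tag in flatten_fields and isinstance(value, dict):
            for key, sub in value.items():
                result[f"{tag}_{key}"] = sub
        elif tag in list_fields:
            result.setdefault(tag, []).append(value)
        else:
            result[tag] = value
    return result


xml_text = """<annotation>
  <filename>mov_012_063018.jpeg</filename>
  <source><database>HollywoodHeads 2015 Database</database></source>
  <size><width>548</width><height>226</height><depth>3</depth></size>
  <object>
    <name>head</name>
    <pose>Unspecified</pose>
    <bndbox><xmin>340</xmin><ymin>20</ymin><xmax>397</xmax><ymax>81</ymax></bndbox>
    <difficult>0</difficult>
  </object>
</annotation>"""

record = element_to_dict(ET.fromstring(xml_text),
                         list_fields=["object"],
                         flatten_fields=["bndbox", "size"],
                         skip_fields=["pose", "source", "path"])
print(record["size_width"], record["object"][0]["bndbox_xmax"])  # 548 397
```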

Newer versions of mPyPl also provide a dedicated function that reads Pascal VOC annotations directly:

data = mp.get_pascal_annotations('HollywoodHeadsSmall')

This gives us a data stream of mdict objects with the following structure:

{'folder': 'HollywoodHeads',
 'filename': 'mov_012_063018.jpeg',
 'source': {'database': 'HollywoodHeads 2015 Database',
  'annotation': 'HollywoodHeads 2015',
  'image': 'WILLOW'},
 'size_width': '548',
 'size_height': '226',
 'size_depth': '3',
 'segmented': '0',
 'object': [{'name': 'head',
   'bndbox_xmin': '340',
   'bndbox_ymin': '20',
   'bndbox_xmax': '397',
   'bndbox_ymax': '81',
   'difficult': '0'},
  {'name': 'head',
   'bndbox_xmin': '80',
   'bndbox_ymin': '63',
   'bndbox_xmax': '119',
   'bndbox_ymax': '112',
   'difficult': '0'}]}

Note that all bounding boxes are stored in a single object field, which holds a list of nested objects, and that all values, including bounding box coordinates, are strings, so they should be accessed with the as_int or as_float modifiers.
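As a minimal illustration of why the conversion matters, the sketch below works on a plain dict shaped like the record above and uses int() where an mdict would use as_int. The helper boxes_as_ints is hypothetical; on the raw strings, arithmetic such as `'397' - '340'` would raise a TypeError.

```python
def boxes_as_ints(record):
    """Return (name, xmin, ymin, xmax, ymax) tuples with integer coordinates.

    Works on a plain dict shaped like the record above; int() plays the role
    of mdict's as_int accessor.
    """
    return [
        (obj["name"],
         int(obj["bndbox_xmin"]), int(obj["bndbox_ymin"]),
         int(obj["bndbox_xmax"]), int(obj["bndbox_ymax"]))
        for obj in record["object"]
    ]


record = {
    "filename": "mov_012_063018.jpeg",
    "object": [
        {"name": "head", "bndbox_xmin": "340", "bndbox_ymin": "20",
         "bndbox_xmax": "397", "bndbox_ymax": "81", "difficult": "0"},
        {"name": "head", "bndbox_xmin": "80", "bndbox_ymin": "63",
         "bndbox_xmax": "119", "bndbox_ymax": "112", "difficult": "0"},
    ],
}

boxes = boxes_as_ints(record)
sizes = [(x2 - x1, y2 - y1) for _, x1, y1, x2, y2 in boxes]  # (width, height)
print(boxes)  # [('head', 340, 20, 397, 81), ('head', 80, 63, 119, 112)]
print(sizes)  # [(57, 61), (39, 49)]
```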

Here is an example that draws the bounding boxes on the first 5 images in the dataset and displays them:

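import os
import cv2
import mPyPl as mp
from pipe import *                    # provides take
from mPyPl.utils.pipeutils import *   # provides pexec

# Draws bounding boxes onto an image in place.
# Expects arg[0] to be the image and arg[1] a list of objects with bndbox_* fields.
def imprint(arg):
    for x in arg[1]:
        # Coordinates are stored as strings, so convert them with as_int
        cv2.rectangle(arg[0],
                      (x.as_int('bndbox_xmin'), x.as_int('bndbox_ymin')),
                      (x.as_int('bndbox_xmax'), x.as_int('bndbox_ymax')),
                      (255, 0, 255), 3)

# images_dir (path to JPEGImages) and show_images are assumed to be defined elsewhere
(data
  | take(5)
  | mp.apply('filename', 'img', lambda x: cv2.imread(os.path.join(images_dir, x)))
  | mp.sapply('img', lambda x: cv2.cvtColor(x, cv2.COLOR_BGR2RGB))  # OpenCV loads images as BGR
  | mp.apply(['img', 'object'], None, imprint)
  | mp.select_field('img')
  | pexec(show_images)
)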

A complete notebook example can be found here.
