usage #2

ahundt · 2017-04-04T19:48:48Z

It seems the coco script requires files that don't exist in the repository and for which there are no generators?

in load_data():

    img_txt = os.path.join(data_dir, data_type, 'images.txt')
    lbl_txt = os.path.join(data_dir, data_type, 'labels.txt')

Also, note that if the class values are serialized into a single image, data will be lost. Categorical classes are more appropriate, since a single image can contain multiple classes.


PavlosMelissinos · 2017-04-04T23:40:01Z

You're right. Some parts are missing. This is definitely work in progress; I mean there isn't even a proper readme! (not for long though)

This code used to be part of another, private project, from which I had to transfer each piece separately. Apparently I forgot one function but I just pushed a new commit so it should be fine now. You can pull from the updated master branch if you want.

Data preparation

I'm using the following directory structure:

├── data
│   └── mscoco
│       ├── annotations -> /home/pmelissi/Data/MS-COCO/annotations
│       │   ├── instances_train2014.json
│       │   └── instances_val2014.json
│       ├── train2014
│       │   ├── images -> /home/pmelissi/Data/MS-COCO/train2014
│       │   ├── images.txt
│       │   ├── labels
│       │   └── labels.txt
│       └── val2014
│           ├── images -> /home/pmelissi/Data/MS-COCO/val2014
│           ├── images.txt
│           ├── labels
│           └── labels.txt

In order to be able to train/evaluate on mscoco, you have to copy the coco structure (annotations, train2014 and val2014 folders) to data/mscoco. train2014/val2014 should contain two directories, one called "images" and another called "labels". Then you need to run coco_extract_labels.py like this:

python src/data/coco_extract_labels.py data/mscoco train2014
python src/data/coco_extract_labels.py data/mscoco val2014
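
Before running the two commands above, it may help to confirm that the layout matches the tree shown earlier. The following is only a sketch: `EXPECTED` and `missing_paths` are hypothetical names that are not part of the repository, and the list of paths is simply read off the directory structure in this comment.

```python
from pathlib import Path

# Paths (relative to data/mscoco) taken from the directory tree above.
# This list is an assumption; adjust it if your layout differs.
EXPECTED = [
    "annotations/instances_train2014.json",
    "annotations/instances_val2014.json",
    "train2014/images",
    "train2014/labels",
    "val2014/images",
    "val2014/labels",
]


def missing_paths(root):
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]
```

Calling `missing_paths("data/mscoco")` returns an empty list when everything is in place, and otherwise names the entries that still need to be copied or linked.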

After the images and labels are in their respective directories, you will have to create the txt files. These are just lists of paths to the actual jpg and png files (labels need to be png because jpeg is lossy). Anyway, you can just run the following to create the files (assuming you have some flavor of unix as an OS):

find images -name '*.jpg' > images.txt
find labels -name '*.png' > labels.txt
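
For anyone not on unix, roughly the same listing can be produced from Python. This is a minimal sketch, not code from the repository: `write_listing` is a hypothetical helper, and sorting the paths is my own choice (find does not sort its output).

```python
from pathlib import Path


def write_listing(split_dir, subdir, pattern, out_name):
    """List files under split_dir/subdir matching pattern, one path per line.

    Paths are written relative to split_dir (e.g. "images/foo.jpg").
    Returns the list of paths that was written.
    """
    split_dir = Path(split_dir)
    paths = sorted(
        p.relative_to(split_dir).as_posix()
        for p in (split_dir / subdir).rglob(pattern)
        if p.is_file()
    )
    (split_dir / out_name).write_text("".join(f"{p}\n" for p in paths))
    return paths
```

For the train split this would be called as `write_listing("data/mscoco/train2014", "images", "*.jpg", "images.txt")` and `write_listing("data/mscoco/train2014", "labels", "*.png", "labels.txt")`, and likewise for val2014.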

Alternatively, you can modify the data_loader to directly accept the output of yield_image (this should be more time-consuming in the long term).

I will try to update the documentation tomorrow.

The full story

In the original code I used a separate preprocessing script to convert the annotations in instances_train2014.json and instances_val2014.json to rgb labels.

When I noticed that a single pixel can belong to multiple classes, I decided to simplify the problem (and to be able to visualize the labels quickly), so I converted the one-hot representations into an h x w image in which each pixel contains a single class id, knowing that this would lead to some information loss. It was a temporary solution that stuck: at the time I needed results quickly, and I haven't thought of a better one since.
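
To make the information loss concrete, here is a toy illustration with numpy. It is not the preprocessing script itself: the 2x2 mask array is made up, channel 0 is assumed to be background, and `argmax` is just one possible way to pick a single class per pixel (the actual script may resolve overlaps differently).

```python
import numpy as np

# masks[y, x, c] == 1 means pixel (y, x) belongs to class c.
# Assumption for this sketch: channel 0 is the background class.
masks = np.zeros((2, 2, 3), dtype=np.uint8)
masks[0, 0, 1] = 1  # pixel (0, 0): class 1 only
masks[0, 1, 1] = 1  # pixel (0, 1): classes 1 and 2 overlap
masks[0, 1, 2] = 1
masks[1, 0, 2] = 1  # pixel (1, 0): class 2 only
masks[1, 1, 0] = 1  # pixel (1, 1): background

# Collapse to an h x w map of class ids. On ties, argmax keeps the
# lowest class id, so the overlapping pixel silently drops class 2.
class_ids = masks.argmax(axis=-1)

# Expanding the id map back to one-hot does not recover the original.
rebuilt = np.eye(3, dtype=np.uint8)[class_ids]
lost = (rebuilt != masks).any(axis=-1)  # True where information was lost
```

Only the overlapping pixel ends up flagged in `lost`; pixels with a single class survive the round trip unchanged.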

PavlosMelissinos · 2017-04-08T01:16:28Z

> I will try to update the documentation tomorrow.

The README page has been updated! Some content is still missing which I will add next (e.g. the preparation of the dataset needs to be streamlined).

I also created a new branch to work on the opencv removal thingy that we discussed (see #3 for more).

ahundt · 2017-04-08T19:09:44Z

Cool! I've used some of the code in my own coco pipeline. Sorry that I copy-pasted it out, but I've updated it to generate a one-hot encoding and save npy files that can be read by my branch of keras-fcn. That branch has somewhat more complete functionality than your pipeline (aside from coco) and already doesn't use opencv.
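
For readers following along, storing a per-class label array as npy could look roughly like the sketch below. It is not the code from my branch: `save_label` and `load_label` are hypothetical names, and the (h, w, num_classes) uint8 layout is an assumption.

```python
import numpy as np


def save_label(path, class_masks):
    """Save an (h, w, num_classes) array of 0/1 values as a .npy file.

    Unlike a single-channel png of class ids, this keeps every class
    that a pixel belongs to.
    """
    np.save(path, np.asarray(class_masks, dtype=np.uint8))


def load_label(path):
    """Load a label array written by save_label."""
    return np.load(path)
```

The obvious trade-off is disk space, since every pixel now stores one value per class instead of a single id.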

With a little cleanup, perhaps like the class you suggested, and after some more testing what I've put in there might be good for submitting to keras-contrib.

PavlosMelissinos · 2017-04-09T16:28:59Z

Sure, that sounds great.

PavlosMelissinos · 2017-10-11T12:18:48Z

I've detached the part of the code that loads the images and labels from disk, so this issue no longer applies to the current codebase. Loading from disk is not fully solved yet, but the code is much better organized.

You might want to check out the updated classes in datasets.py.

ahundt mentioned this issue Apr 4, 2017

MIT license & submit to Keras? #1

Open

PavlosMelissinos closed this as completed Oct 11, 2017
