Added image loading capabilities (as 3D uint8 Tensors) #18
@josevalim, writing JPEG encoders and decoders (here we only need a decoder) is tough and convoluted, and the original compression and decompression algorithms are all written in C. So I have opted to use `libjpeg` in C in our NIFs to ease the job of loading an image as a tensor. `libjpeg` is present in most Linux distributions and (I think) Mac OS X as well (otherwise it can be installed with brew: `brew install libjpeg`), so it doesn't require pre-compilation.

With this PR, we can finally do image classification using TensorFlow models in Elixir. I have checked that Inception V3 works. A demonstration is shown below.
The Inception V3 model can be downloaded here: http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz
After unzipping, see that it contains the graphdef .pb file (`classify_image_graph_def.pb`), which holds our graph definition, a test JPEG image that should classify as a panda (`cropped_panda.jpg`), and a few other files I will detail later.

Now, to run this in Tensorflex, first the graph is loaded:
Then the cropped_panda image is loaded using the new `load_image_as_tensor` function:

Then create the output tensor, which will hold our output vector values. For the Inception model, the output is received as a 1008x1 tensor, as there are 1008 classes in the model:
Then the output results are read into a list called `results`. Also, the input operation in the Inception model is `DecodeJpeg` and the output operation is `softmax`:

Finally, we need to find which class has the maximum probability and identify its label. Since `results` is a list of lists, it's better to flatten the nested list first. Then we need to find the index of the element in the new list which has the maximum value. Therefore:
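The flatten-and-find-index step can be sketched in plain Elixir as follows. This is a minimal sketch, not the exact session from this PR: the `ArgmaxSketch` module and the three-element sample list are hypothetical stand-ins, and the real `results` would hold 1008 probabilities.

```elixir
defmodule ArgmaxSketch do
  # Hypothetical helper (not part of Tensorflex): returns {index, value}
  # for the largest element of a possibly nested list of numbers.
  def argmax(nested) do
    {value, index} =
      nested
      |> List.flatten()                      # [[a, b], [c]] -> [a, b, c]
      |> Enum.with_index()                   # pair each value with its position
      |> Enum.max_by(fn {v, _i} -> v end)    # keep the pair with the largest value

    {index, value}
  end
end

# Made-up stand-in for `results`.
sample = [[0.05, 0.90, 0.05]]
{index, prob} = ArgmaxSketch.argmax(sample)
IO.puts("class #{index} with probability #{prob}")
```

Applied to the real `results`, the same two steps (flatten, then locate the maximum) give the class index and probability discussed next.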
We can thus see that the class with the maximum predicted probability (0.8849328756332397) for the image is 169. We will now find what the 169 label corresponds to. For this we can look back into the unzipped Inception folder, where there is a file called `imagenet_2012_challenge_label_map_proto.pbtxt`. On opening this file, we can find the string class identifier for the `169` class index. This is `n02510455` and is present on line 1556 in the file. Finally, we need to match this string identifier to a set of human-readable labels by referring to the file `imagenet_synset_to_human_label_map.txt`. Here we can see that, corresponding to the string class `n02510455`, the human labels are `giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca` (line 3691 in the file).

Thus, we have correctly identified the animal in the image as a panda using Tensorflex!
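The manual two-file lookup described above could also be scripted. The sketch below is illustrative only: the `LabelLookupSketch` module is hypothetical, and it assumes the `.pbtxt` file pairs each `target_class` with a `target_class_string`, and that the human label map holds one tab-separated `synset<TAB>labels` pair per line. Check both assumptions against the actual files before relying on it.

```elixir
defmodule LabelLookupSketch do
  # Hypothetical helper: class index -> synset id. Assumes each entry has
  # `target_class: N` immediately followed by `target_class_string: "n..."`.
  def synset_for(pbtxt, class_index) do
    ~r/target_class:\s*(\d+)\s+target_class_string:\s*"(n\d+)"/
    |> Regex.scan(pbtxt)
    |> Enum.find_value(fn [_all, idx, synset] ->
      if String.to_integer(idx) == class_index, do: synset
    end)
  end

  # Hypothetical helper: synset id -> human labels. Assumes one
  # tab-separated `synset<TAB>labels` pair per line.
  def labels_for(label_map, synset) do
    label_map
    |> String.split("\n", trim: true)
    |> Enum.find_value(fn line ->
      case String.split(line, "\t", parts: 2) do
        [^synset, labels] -> String.trim(labels)
        _ -> nil
      end
    end)
  end
end

# Inline samples in the assumed layouts; real use would read the two files
# with File.read!/1 instead.
pbtxt = """
entry {
  target_class: 169
  target_class_string: "n02510455"
}
"""

label_map =
  "n02510455\tgiant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca\n"

synset = LabelLookupSketch.synset_for(pbtxt, 169)
IO.puts(LabelLookupSketch.labels_for(label_map, synset))
```

Both helpers return `nil` when nothing matches, so a missing class index or synset id is easy to detect.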