Added image loading capabilities (as 3D uint8 Tensors) #18

anshuman23 · 2018-07-01T19:54:25Z

@josevalim, writing JPEG encoders and decoders (here we just need decoders though) is very tough and convoluted (and the original compression and decompression algorithms are all written in C). So I have opted for using libjpeg in C in our NIFs to ease our job of loading the image as a tensor. libjpeg is present with most Linux distributions and (I think) Mac OSX as well (otherwise can be installed using brew-- brew install libjpeg) so doesn't require pre-compilation.

With this PR, we can finally do image classification using Tensorflow models in Elixir. I have checked and seen that Inception V3 works. A demonstration is shown below.

The Inception V3 model can be downloaded here: http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz

After unzipping, see that it contains the graphdef .pb file (classify_image_graphdef.pb) which contains our graph definition, a test jpeg image that should identify/classify as a panda (cropped_panda.pb) and a few other files I will detail later.

Now for running this in Tensorflex first the graph is loaded:

iex(1)> {:ok, graph} = Tensorflex.read_graph("classify_image_graph_def.pb")
2018-07-02 00:27:40.532461: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA 2018-07-02 00:27:40.641913: W tensorflow/core/framework/op_def_util.cc:334] OpBatchNormWithGlobalNormalization is deprecated. It will cease to work in GraphDef version 9. Use tf.nn.batch_normalization().
{:ok, #Reference<0.101807160.3449159683.146711>}

Then the cropped_panda image is loaded using the new load_image_as_tensor function:

iex(2)> {:ok, input_tensor} = Tensorflex.load_image_as_tensor("cropped_panda.jpg")
{:ok, #Reference<0.101807160.3449159683.146733>}

Then create the output tensor which will hold out output vector values. For the inception model, the output is received as a 1008x1 tensor, as there are 1008 classes in the model:

iex(3)> out_dims = Tensorflex.create_matrix(1,2,[[1008,1]])
#Reference<0.101807160.3449159683.146741>

iex(4)> {:ok, output_tensor} = Tensorflex.float32_tensor_alloc(out_dims)
{:ok, #Reference<0.101807160.3449159683.146748>}

Then the output results are read into a list called results. Also, the input operation in the Inception model is DecodeJpeg and the output operation is softmax:

iex(5)> results = Tensorflex.run_session(graph, input_tensor, output_tensor, "DecodeJpeg", "softmax")
[
  [1.059142014128156e-4, 2.8240500250831246e-4, 8.30648496048525e-5,
   1.2982363114133477e-4, 7.32232874725014e-5, 8.014426566660404e-5,
   6.63459359202534e-5, 0.003170756157487631, 7.931600703159347e-5,
   3.707312498590909e-5, 3.0997329304227605e-5, 1.4232713147066534e-4,
   1.0381334868725389e-4, 1.1057958181481808e-4, 1.4321311027742922e-4,
   1.203602587338537e-4, 1.3130248407833278e-4, 5.850398520124145e-5,
   2.641105093061924e-4, 3.1629020668333396e-5, 3.906813799403608e-5,
   2.8646905775531195e-5, 2.2863158665131778e-4, 1.2222197256051004e-4,
   5.956588938715868e-5, 5.421260357252322e-5, 5.996063555357978e-5,
   4.867801326327026e-4, 1.1005574924638495e-4, 2.3433618480339646e-4,
   1.3062104699201882e-4, 1.317620772169903e-4, 9.388553007738665e-5,
   7.076268957462162e-5, 4.281177825760096e-5, 1.6863139171618968e-4,
   9.093972039408982e-5, 2.611844101920724e-4, 2.7584232157096267e-4,
   5.157176201464608e-5, 2.144951868103817e-4, 1.3628098531626165e-4,
   8.007588621694595e-5, 1.7929042223840952e-4, 2.2831936075817794e-4,
   6.216531619429588e-5, 3.736453436431475e-5, 6.782123091397807e-5,
   1.1538144462974742e-4, ...]
]

Finally, we need to find which class has the maximum probability and identify it's label. Since results is a List of Lists, it's better to read in the nested list. Then we need to find the index of the element in the new list which as the maximum value. Therefore:

iex(6)> actual_results = Enum.max(results)
  [1.059142014128156e-4, 2.8240500250831246e-4, 8.30648496048525e-5,
   1.2982363114133477e-4, 7.32232874725014e-5, 8.014426566660404e-5,
   6.63459359202534e-5, 0.003170756157487631, 7.931600703159347e-5,
   3.707312498590909e-5, 3.0997329304227605e-5, 1.4232713147066534e-4,
   1.0381334868725389e-4, 1.1057958181481808e-4, 1.4321311027742922e-4,
   1.203602587338537e-4, 1.3130248407833278e-4, 5.850398520124145e-5,
   2.641105093061924e-4, 3.1629020668333396e-5, 3.906813799403608e-5,
   2.8646905775531195e-5, 2.2863158665131778e-4, 1.2222197256051004e-4,
   5.956588938715868e-5, 5.421260357252322e-5, 5.996063555357978e-5,
   4.867801326327026e-4, 1.1005574924638495e-4, 2.3433618480339646e-4,
   1.3062104699201882e-4, 1.317620772169903e-4, 9.388553007738665e-5,
   7.076268957462162e-5, 4.281177825760096e-5, 1.6863139171618968e-4,
   9.093972039408982e-5, 2.611844101920724e-4, 2.7584232157096267e-4,
   5.157176201464608e-5, 2.144951868103817e-4, 1.3628098531626165e-4,
   8.007588621694595e-5, 1.7929042223840952e-4, 2.2831936075817794e-4,
   6.216531619429588e-5, 3.736453436431475e-5, 6.782123091397807e-5,
   1.1538144462974742e-4, ...]

iex(7)> maxval_class = Enum.max(results) |> Enum.max()
0.8849328756332397

iex(8)> Enum.find_index(actual_results, fn(x) -> x == maxval_class end)
169

We can thus see that the class with the maximum probability predicted (0.8849328756332397) for the image is 169. We will now find what the 169 label corresponds to. For this we can look back into the unzipped Inception folder, where there is a file called imagenet_2012_challenge_label_map_proto.pbtxt. On opening this file, we can find the string class identifier for the 169 class index. This is n02510455 and is present on Line 1556 in the file. Finally, we need to match this string identifier to a set of identification labels by referring to the file imagenet_synset_to_human_label_map.txt file. Here we can see that corresponding to the string class n02510455 the human labels are giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca (Line 3691 in the file).

Thus, we have correctly identified the animal in the image as a panda using Tensorflex!

josevalim · 2018-07-02T22:59:41Z

c_src/Tensorflex.c

+
+  const char *dot = strrchr(file, '.');
+  if(!dot || dot == file) return enif_make_badarg(env);
+  if(!((strcmp((dot + 1),"jpg") == 0) || (strcmp((dot + 1),"jpeg") == 0))) return enif_make_badarg(env);


If you create Tensorflex.NIF (like we talked about elsewhere), we could move a bunch of this error validation to the Elixir code, which should make the code simpler and more idiomatic. :)

Yes, I completely agree. I'm working on doing that next, and should hopefully be done soon. Thanks for the input! :)

josevalim · 2018-07-02T22:59:55Z

This looks amazing, great job!

Added image loading capabilities

2eb03f1

josevalim reviewed Jul 2, 2018

View reviewed changes

anshuman23 merged commit 15e6193 into master Jul 3, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added image loading capabilities (as 3D uint8 Tensors) #18

Added image loading capabilities (as 3D uint8 Tensors) #18

anshuman23 commented Jul 1, 2018 •

edited

Loading

josevalim Jul 2, 2018

anshuman23 Jul 3, 2018

josevalim commented Jul 2, 2018

Added image loading capabilities (as 3D uint8 Tensors) #18

Added image loading capabilities (as 3D uint8 Tensors) #18

Conversation

anshuman23 commented Jul 1, 2018 • edited Loading

josevalim Jul 2, 2018

Choose a reason for hiding this comment

anshuman23 Jul 3, 2018

Choose a reason for hiding this comment

josevalim commented Jul 2, 2018

anshuman23 commented Jul 1, 2018 •

edited

Loading