Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added image loading capabilities (as 3D uint8 Tensors) #18

Merged
merged 1 commit into from
Jul 3, 2018
Merged

Conversation

anshuman23
Copy link
Owner

@anshuman23 anshuman23 commented Jul 1, 2018

@josevalim, writing JPEG encoders and decoders (here we just need decoders though) is very tough and convoluted (and the original compression and decompression algorithms are all written in C). So I have opted for using libjpeg in C in our NIFs to ease our job of loading the image as a tensor. libjpeg is present with most Linux distributions and (I think) Mac OSX as well (otherwise can be installed using brew-- brew install libjpeg) so doesn't require pre-compilation.

With this PR, we can finally do image classification using Tensorflow models in Elixir. I have checked and seen that Inception V3 works. A demonstration is shown below.

The Inception V3 model can be downloaded here: http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz

After unzipping, see that it contains the graphdef .pb file (classify_image_graphdef.pb) which contains our graph definition, a test jpeg image that should identify/classify as a panda (cropped_panda.pb) and a few other files I will detail later.

Now for running this in Tensorflex first the graph is loaded:

iex(1)> {:ok, graph} = Tensorflex.read_graph("classify_image_graph_def.pb")
2018-07-02 00:27:40.532461: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA 2018-07-02 00:27:40.641913: W tensorflow/core/framework/op_def_util.cc:334] OpBatchNormWithGlobalNormalization is deprecated. It will cease to work in GraphDef version 9. Use tf.nn.batch_normalization().
{:ok, #Reference<0.101807160.3449159683.146711>}

Then the cropped_panda image is loaded using the new load_image_as_tensor function:

iex(2)> {:ok, input_tensor} = Tensorflex.load_image_as_tensor("cropped_panda.jpg")
{:ok, #Reference<0.101807160.3449159683.146733>}

Then create the output tensor which will hold out output vector values. For the inception model, the output is received as a 1008x1 tensor, as there are 1008 classes in the model:

iex(3)> out_dims = Tensorflex.create_matrix(1,2,[[1008,1]])
#Reference<0.101807160.3449159683.146741>

iex(4)> {:ok, output_tensor} = Tensorflex.float32_tensor_alloc(out_dims)
{:ok, #Reference<0.101807160.3449159683.146748>}

Then the output results are read into a list called results. Also, the input operation in the Inception model is DecodeJpeg and the output operation is softmax:

iex(5)> results = Tensorflex.run_session(graph, input_tensor, output_tensor, "DecodeJpeg", "softmax")
[
  [1.059142014128156e-4, 2.8240500250831246e-4, 8.30648496048525e-5,
   1.2982363114133477e-4, 7.32232874725014e-5, 8.014426566660404e-5,
   6.63459359202534e-5, 0.003170756157487631, 7.931600703159347e-5,
   3.707312498590909e-5, 3.0997329304227605e-5, 1.4232713147066534e-4,
   1.0381334868725389e-4, 1.1057958181481808e-4, 1.4321311027742922e-4,
   1.203602587338537e-4, 1.3130248407833278e-4, 5.850398520124145e-5,
   2.641105093061924e-4, 3.1629020668333396e-5, 3.906813799403608e-5,
   2.8646905775531195e-5, 2.2863158665131778e-4, 1.2222197256051004e-4,
   5.956588938715868e-5, 5.421260357252322e-5, 5.996063555357978e-5,
   4.867801326327026e-4, 1.1005574924638495e-4, 2.3433618480339646e-4,
   1.3062104699201882e-4, 1.317620772169903e-4, 9.388553007738665e-5,
   7.076268957462162e-5, 4.281177825760096e-5, 1.6863139171618968e-4,
   9.093972039408982e-5, 2.611844101920724e-4, 2.7584232157096267e-4,
   5.157176201464608e-5, 2.144951868103817e-4, 1.3628098531626165e-4,
   8.007588621694595e-5, 1.7929042223840952e-4, 2.2831936075817794e-4,
   6.216531619429588e-5, 3.736453436431475e-5, 6.782123091397807e-5,
   1.1538144462974742e-4, ...]
]

Finally, we need to find which class has the maximum probability and identify it's label. Since results is a List of Lists, it's better to read in the nested list. Then we need to find the index of the element in the new list which as the maximum value. Therefore:

iex(6)> actual_results = Enum.max(results)
  [1.059142014128156e-4, 2.8240500250831246e-4, 8.30648496048525e-5,
   1.2982363114133477e-4, 7.32232874725014e-5, 8.014426566660404e-5,
   6.63459359202534e-5, 0.003170756157487631, 7.931600703159347e-5,
   3.707312498590909e-5, 3.0997329304227605e-5, 1.4232713147066534e-4,
   1.0381334868725389e-4, 1.1057958181481808e-4, 1.4321311027742922e-4,
   1.203602587338537e-4, 1.3130248407833278e-4, 5.850398520124145e-5,
   2.641105093061924e-4, 3.1629020668333396e-5, 3.906813799403608e-5,
   2.8646905775531195e-5, 2.2863158665131778e-4, 1.2222197256051004e-4,
   5.956588938715868e-5, 5.421260357252322e-5, 5.996063555357978e-5,
   4.867801326327026e-4, 1.1005574924638495e-4, 2.3433618480339646e-4,
   1.3062104699201882e-4, 1.317620772169903e-4, 9.388553007738665e-5,
   7.076268957462162e-5, 4.281177825760096e-5, 1.6863139171618968e-4,
   9.093972039408982e-5, 2.611844101920724e-4, 2.7584232157096267e-4,
   5.157176201464608e-5, 2.144951868103817e-4, 1.3628098531626165e-4,
   8.007588621694595e-5, 1.7929042223840952e-4, 2.2831936075817794e-4,
   6.216531619429588e-5, 3.736453436431475e-5, 6.782123091397807e-5,
   1.1538144462974742e-4, ...]

iex(7)> maxval_class = Enum.max(results) |> Enum.max()
0.8849328756332397

iex(8)> Enum.find_index(actual_results, fn(x) -> x == maxval_class end)
169

We can thus see that the class with the maximum probability predicted (0.8849328756332397) for the image is 169. We will now find what the 169 label corresponds to. For this we can look back into the unzipped Inception folder, where there is a file called imagenet_2012_challenge_label_map_proto.pbtxt. On opening this file, we can find the string class identifier for the 169 class index. This is n02510455 and is present on Line 1556 in the file. Finally, we need to match this string identifier to a set of identification labels by referring to the file imagenet_synset_to_human_label_map.txt file. Here we can see that corresponding to the string class n02510455 the human labels are giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca (Line 3691 in the file).

Thus, we have correctly identified the animal in the image as a panda using Tensorflex!


const char *dot = strrchr(file, '.');
if(!dot || dot == file) return enif_make_badarg(env);
if(!((strcmp((dot + 1),"jpg") == 0) || (strcmp((dot + 1),"jpeg") == 0))) return enif_make_badarg(env);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you create Tensorflex.NIF (like we talked about elsewhere), we could move a bunch of this error validation to the Elixir code, which should make the code simpler and more idiomatic. :)

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I completely agree. I'm working on doing that next, and should hopefully be done soon. Thanks for the input! :)

@josevalim
Copy link
Contributor

This looks amazing, great job!

@anshuman23 anshuman23 merged commit 15e6193 into master Jul 3, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants