
Speed up the inference of the saved_model(s). Fixes #5847 (#5848)

Merged
michaelgsharp merged 3 commits into dotnet:main from fix/speed-up-tf-inference on Jun 23, 2021

Conversation

darth-vader-lg
Contributor

This small commit fixes issue #5847, concerning the inference speed of TensorFlow models.
All the details are explained in issue #5847.

Signed-off-by: darth-vader-lg <luigi.generale@gmail.com>
@darth-vader-lg darth-vader-lg changed the title from "Speed up of the inference of saved_model(s)." to "Speed up the inference of the saved_model(s)." on Jun 17, 2021
- Fixed the exception thrown while fitting data with more than one input tensor. Followed the OnnxTransformer schema for creating the data view getters.

Signed-off-by: darth-vader-lg <luigi.generale@gmail.com>
@codecov

codecov bot commented Jun 18, 2021

Codecov Report

Merging #5848 (e2e5ae6) into main (ff01708) will decrease coverage by 0.01%.
The diff coverage is 93.75%.

@@            Coverage Diff             @@
##             main    #5848      +/-   ##
==========================================
- Coverage   68.35%   68.33%   -0.02%     
==========================================
  Files        1134     1134              
  Lines      241910   241932      +22     
  Branches    25289    25293       +4     
==========================================
- Hits       165347   165330      -17     
- Misses      69919    69954      +35     
- Partials     6644     6648       +4     
Flag Coverage Δ
Debug 68.33% <93.75%> (-0.02%) ⬇️
production 62.91% <93.75%> (-0.02%) ⬇️
test 89.27% <ø> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
src/Microsoft.ML.TensorFlow/TensorflowTransform.cs 84.92% <93.75%> (+0.21%) ⬆️
...c/Microsoft.ML.FastTree/Utils/ThreadTaskManager.cs 79.48% <0.00%> (-20.52%) ⬇️
...crosoft.ML.AutoML/Experiment/Runners/RunnerUtil.cs 60.00% <0.00%> (-16.00%) ⬇️
...oML/Experiment/MetricsAgents/BinaryMetricsAgent.cs 74.35% <0.00%> (-7.70%) ⬇️
...AutoML/Experiment/Runners/CrossValSummaryRunner.cs 68.53% <0.00%> (-3.50%) ⬇️
test/Microsoft.ML.AutoML.Tests/AutoFitTests.cs 84.55% <0.00%> (-2.53%) ⬇️
src/Microsoft.ML.Core/Data/IHostEnvironment.cs 95.12% <0.00%> (-2.44%) ⬇️
src/Microsoft.ML.Data/DataView/CacheDataView.cs 83.96% <0.00%> (-0.68%) ⬇️
src/Microsoft.ML.Core/Utilities/Contracts.cs 45.27% <0.00%> (-0.21%) ⬇️
...ML.Transforms/Text/StopWordsRemovingTransformer.cs 86.23% <0.00%> (-0.15%) ⬇️

@darth-vader-lg
Contributor Author

Unfortunately it wasn't as easy as expected, and it couldn't be solved completely with just the first small changes.
The first commit didn't pass the tests for models with more than one input tensor.
So, in the second commit, I followed the fully working logic of the OnnxTransformer to create the cache for the inferences.

protected override Delegate MakeGetter(DataViewRow input, int iinfo, Func<int, bool> activeOutput, out Action disposer)
    => throw new NotImplementedException("This should never be called!");

private Delegate CreateGetter(DataViewRow input, int iinfo, Func<int, bool> activeOutput, OnnxRuntimeOutputCacher outputCacher)
{
    Host.AssertValue(input);
    var activeOutputColNames = _parent.Outputs.Where((x, i) => activeOutput(i)).ToArray();

    if (_parent.Model.ModelInfo.OutputsInfo[_parent.MapDataViewColumnToOnnxOutputTensor(iinfo)].DataViewType is VectorDataViewType vectorType)
    {
        var elemRawType = vectorType.ItemType.RawType;
        var srcNamedValueGetters = GetNamedOnnxValueGetters(input, _inputColIndices, _inputOnnxTypes, _inputTensorShapes);
        if (vectorType.ItemType is TextDataViewType)
            return MakeStringTensorGetter(input, iinfo, srcNamedValueGetters, activeOutputColNames, outputCacher);
        else
            return Utils.MarshalInvoke(MakeTensorGetter<int>, elemRawType, input, iinfo, srcNamedValueGetters, activeOutputColNames, outputCacher);
    }
    else
    {
        var type = _parent.Model.ModelInfo.OutputsInfo[_parent.MapDataViewColumnToOnnxOutputTensor(iinfo)].DataViewType.RawType;
        var srcNamedValueGetters = GetNamedOnnxValueGetters(input, _inputColIndices, _inputOnnxTypes, _inputTensorShapes);
        return Utils.MarshalInvoke(MakeObjectGetter<int>, type, input, iinfo, srcNamedValueGetters, activeOutputColNames, outputCacher);
    }
}

public override Delegate[] CreateGetters(DataViewRow input, Func<int, bool> activeOutput, out Action disposer)
{
    Contracts.Assert(input.Schema == InputSchema);

    OnnxRuntimeOutputCacher outputCacher = new OnnxRuntimeOutputCacher();

    int n = OutputColumns.Value.Length;
    var result = new Delegate[n];
    for (int i = 0; i < n; i++)
    {
        if (!activeOutput(i))
            continue;
        result[i] = CreateGetter(input, i, activeOutput, outputCacher);
    }
    disposer = () =>
    {
        outputCacher.Dispose();
    };
    return result;
}
Now everything works and it's drastically faster than before 👍.
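
To illustrate how the same output-cache idea maps onto the TensorFlow side, here is a minimal sketch (the class and member names are illustrative only, not the exact ones used in TensorflowTransform.cs): all output-column getters created for a cursor share one cacher, the model is evaluated only when the cursor moves to a new row, and the cached tensors are released by the getters' disposer.

using System;
using Microsoft.ML;

// Sketch only - hypothetical names, not the actual implementation.
// One instance is shared by all output getters of a cursor, so the model
// runs once per row even when several output columns are requested.
internal sealed class OutputCacheSketch<TTensor> : IDisposable
    where TTensor : class, IDisposable
{
    private long _position = -1;   // row position the cached outputs belong to
    private TTensor[] _outputs;    // tensors produced by the last model evaluation

    public TTensor GetOutput(DataViewRow input, int outputIndex, Func<TTensor[]> runModel)
    {
        // Evaluate the model only when the cursor has advanced to a new row.
        if (_position != input.Position)
        {
            DisposeOutputs();
            _outputs = runModel();
            _position = input.Position;
        }
        return _outputs[outputIndex];
    }

    private void DisposeOutputs()
    {
        if (_outputs == null)
            return;
        foreach (var tensor in _outputs)
            tensor?.Dispose();
        _outputs = null;
    }

    // Called by the getters' disposer when the cursor is done.
    public void Dispose() => DisposeOutputs();
}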

- The cached tensors are disposed at the end of inference operations.

Signed-off-by: darth-vader-lg <luigi.generale@gmail.com>
@darth-vader-lg
Contributor Author

As mentioned in issue #5847, these changes improve the TensorFlow inference speed significantly, especially for object detection.
The total improvement can be ~400% when combined with PR #5857 / issue #5856.
I tested everything with the intensive loop shown in the code below.

[TensorFlowFact]
public void TensorFlowTransformObjectDetectionTest()
{
    // Saved model
    var modelLocation = @"D:\ObjectDetection\carp\TensorFlow\exported-model-SSD-MobileNET-v2-320x320\saved_model";
    // Create the estimators pipe
    var pipe = 
        _mlContext.Transforms.LoadImages(
            inputColumnName: "ImagePath",
            outputColumnName: "Image",
            imageFolder: "")
        .Append(_mlContext.Transforms.ResizeImages(
            inputColumnName: "Image",
            outputColumnName: "ResizedImage",
            imageWidth: 300,
            imageHeight: 300,
            resizing: ImageResizingEstimator.ResizingKind.Fill))
        .Append(_mlContext.Transforms.ExtractPixels(
            inputColumnName: "ResizedImage",
            outputColumnName: "serving_default_input_tensor:0",
            interleavePixelColors: true,
            outputAsFloatArray: false))
        .Append(_mlContext.Model.LoadTensorFlowModel(modelLocation).ScoreTensorFlowModel(
            inputColumnNames: new[] { "serving_default_input_tensor:0" },
            outputColumnNames: new[]
            {
                "StatefulPartitionedCall:1" /* detection_boxes */,
                "StatefulPartitionedCall:2" /* detection_classes */,
                "StatefulPartitionedCall:4" /* detection_scores */
            }));

    // Collect all the path of the images in the test directory
    var imagesLocation = @"D:\ObjectDetection\carp\TensorFlow\images\test";
    var images =
        Directory.GetFiles(imagesLocation).Where(file => new[] { ".jpg", ".jfif" }
        .Any(ext => Path.GetExtension(file).ToLower() == ext))
        .Select(file => new { ImagePath = file })
        .ToArray();

    // Create the transformer
    var data = _mlContext.Data.LoadFromEnumerable(images.Take(0));
    var model = pipe.Fit(data);

    // Test n times the inference on the collected images
    for (int i = 0, nImage = 0; i < 1000; i++, nImage = (nImage + 1) % images.Length)
        model.Transform(_mlContext.Data.LoadFromEnumerable(new[] { images[nImage] })).Preview();
}
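
To turn the loop into a rough benchmark, it can be wrapped in a Stopwatch; a sketch, reusing the model and images variables from the test above:

    // Rough timing of 1000 single-image inferences (same model/images as above).
    var stopwatch = System.Diagnostics.Stopwatch.StartNew();
    for (int i = 0, nImage = 0; i < 1000; i++, nImage = (nImage + 1) % images.Length)
        model.Transform(_mlContext.Data.LoadFromEnumerable(new[] { images[nImage] })).Preview();
    stopwatch.Stop();
    Console.WriteLine($"{stopwatch.Elapsed.TotalSeconds:F1} s total, " +
        $"{stopwatch.Elapsed.TotalMilliseconds / 1000.0:F1} ms per inference on average");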

Here are the results of the tests:

Without optimizations (current):
[image: WithoutOptimization]

With only the TF cache optimization in Microsoft.ML.TensorFlow/TensorflowTransform.cs (issue #5847 / PR #5848):
[image: TensorFlowCacheOptimization]

With both the TensorFlow cache and the raw image-access optimizations (issue #5847 / PR #5848 and issue #5856 / PR #5857):
[image: FullOptimization]

@michaelgsharp
Member

@darth-vader-lg thanks for submitting this! That is a pretty drastic speedup which is awesome.

I really appreciate you making the PR and the issue really detailed. It makes understanding all the changes so much easier.

Let me run a couple of tests on it but it looks good to me.

@darth-vader-lg darth-vader-lg changed the title from "Speed up the inference of the saved_model(s)." to "Speed up the inference of the saved_model(s). Fixes #5847" on Jun 23, 2021
@darth-vader-lg
Contributor Author

> @darth-vader-lg thanks for submitting this! That is a pretty drastic speedup which is awesome.
>
> I really appreciate you making the PR and the issue really detailed. It makes understanding all the changes so much easier.
>
> Let me run a couple of tests on it but it looks good to me.

@michaelgsharp, I was just glad to give my little contribution to this awesome project and repay the big help it has given me in my work. 👍
Also, having worked for many years on microcontrollers, sometimes with limited power and resources, code optimization has unfortunately become one of my occupational habits...

Anyway, have a good merge.

P.S. (🔔 promotional spot 🔔) If the Microsoft and .NET teams ever need services from my company in the future, I'm always here. 😉

Kind Regards

Member

@michaelgsharp michaelgsharp left a comment


LGTM

@michaelgsharp michaelgsharp merged commit 0fac0ba into dotnet:main Jun 23, 2021
@darth-vader-lg darth-vader-lg deleted the fix/speed-up-tf-inference branch June 26, 2021 08:29
darth-vader-lg added a commit to darth-vader-lg/ML-NET that referenced this pull request Jun 26, 2021
* remotes/official/main:
  Update lgbm to v2.3.1 (dotnet#5851)
  Speed-up bitmap operations on images. Fixes dotnet#5856 (dotnet#5857)
  Onnx recursion limit (dotnet#5840)
  Speed up the inference of the saved_model(s). Fixes dotnet#5847 (dotnet#5848)

Signed-off-by: darth-vader-lg <luigi.generale@gmail.com>
@ghost ghost locked as resolved and limited conversation to collaborators Mar 17, 2022
Development

Successfully merging this pull request may close these issues.

Speed-up TensorFlow models inference
2 participants