
Speed up the inference of the saved_model(s). Fixes #5847 (#5848)

Merged
michaelgsharp merged 3 commits into dotnet:main from fix/speed-up-tf-inference on Jun 23, 2021

Conversation

darth-vader-lg
Contributor

This small commit fixes issue #5847, concerning the inference speed of TensorFlow models.
All the details are explained in issue #5847.

Signed-off-by: darth-vader-lg <luigi.generale@gmail.com>
@darth-vader-lg darth-vader-lg changed the title from "Speed up of the inference of saved_model(s)." to "Speed up the inference of the saved_model(s)." on Jun 17, 2021
- Fixed the exception thrown while fitting data with more than one input tensor. Followed the OnnxTransformer schema for creating the data view getters.

Signed-off-by: darth-vader-lg <luigi.generale@gmail.com>
@codecov

codecov bot commented Jun 18, 2021

Codecov Report

Merging #5848 (e2e5ae6) into main (ff01708) will decrease coverage by 0.01%.
The diff coverage is 93.75%.

@@            Coverage Diff             @@
##             main    #5848      +/-   ##
==========================================
- Coverage   68.35%   68.33%   -0.02%     
==========================================
  Files        1134     1134              
  Lines      241910   241932      +22     
  Branches    25289    25293       +4     
==========================================
- Hits       165347   165330      -17     
- Misses      69919    69954      +35     
- Partials     6644     6648       +4     
Flag Coverage Δ
Debug 68.33% <93.75%> (-0.02%) ⬇️
production 62.91% <93.75%> (-0.02%) ⬇️
test 89.27% <ø> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
src/Microsoft.ML.TensorFlow/TensorflowTransform.cs 84.92% <93.75%> (+0.21%) ⬆️
...c/Microsoft.ML.FastTree/Utils/ThreadTaskManager.cs 79.48% <0.00%> (-20.52%) ⬇️
...crosoft.ML.AutoML/Experiment/Runners/RunnerUtil.cs 60.00% <0.00%> (-16.00%) ⬇️
...oML/Experiment/MetricsAgents/BinaryMetricsAgent.cs 74.35% <0.00%> (-7.70%) ⬇️
...AutoML/Experiment/Runners/CrossValSummaryRunner.cs 68.53% <0.00%> (-3.50%) ⬇️
test/Microsoft.ML.AutoML.Tests/AutoFitTests.cs 84.55% <0.00%> (-2.53%) ⬇️
src/Microsoft.ML.Core/Data/IHostEnvironment.cs 95.12% <0.00%> (-2.44%) ⬇️
src/Microsoft.ML.Data/DataView/CacheDataView.cs 83.96% <0.00%> (-0.68%) ⬇️
src/Microsoft.ML.Core/Utilities/Contracts.cs 45.27% <0.00%> (-0.21%) ⬇️
...ML.Transforms/Text/StopWordsRemovingTransformer.cs 86.23% <0.00%> (-0.15%) ⬇️

@darth-vader-lg
Contributor Author

Unfortunately it wasn't as easy as expected, and it couldn't be solved completely with just the first small changes.
The first commit didn't pass the tests for models with more than one input tensor.
So, in the second commit, I followed the fully working logic of the OnnxTransformer to create the cache for the inferences.

protected override Delegate MakeGetter(DataViewRow input, int iinfo, Func<int, bool> activeOutput, out Action disposer)
    => throw new NotImplementedException("This should never be called!");

private Delegate CreateGetter(DataViewRow input, int iinfo, Func<int, bool> activeOutput, OnnxRuntimeOutputCacher outputCacher)
{
    Host.AssertValue(input);
    var activeOutputColNames = _parent.Outputs.Where((x, i) => activeOutput(i)).ToArray();

    if (_parent.Model.ModelInfo.OutputsInfo[_parent.MapDataViewColumnToOnnxOutputTensor(iinfo)].DataViewType is VectorDataViewType vectorType)
    {
        var elemRawType = vectorType.ItemType.RawType;
        var srcNamedValueGetters = GetNamedOnnxValueGetters(input, _inputColIndices, _inputOnnxTypes, _inputTensorShapes);
        if (vectorType.ItemType is TextDataViewType)
            return MakeStringTensorGetter(input, iinfo, srcNamedValueGetters, activeOutputColNames, outputCacher);
        else
            return Utils.MarshalInvoke(MakeTensorGetter<int>, elemRawType, input, iinfo, srcNamedValueGetters, activeOutputColNames, outputCacher);
    }
    else
    {
        var type = _parent.Model.ModelInfo.OutputsInfo[_parent.MapDataViewColumnToOnnxOutputTensor(iinfo)].DataViewType.RawType;
        var srcNamedValueGetters = GetNamedOnnxValueGetters(input, _inputColIndices, _inputOnnxTypes, _inputTensorShapes);
        return Utils.MarshalInvoke(MakeObjectGetter<int>, type, input, iinfo, srcNamedValueGetters, activeOutputColNames, outputCacher);
    }
}

public override Delegate[] CreateGetters(DataViewRow input, Func<int, bool> activeOutput, out Action disposer)
{
    Contracts.Assert(input.Schema == InputSchema);

    OnnxRuntimeOutputCacher outputCacher = new OnnxRuntimeOutputCacher();

    int n = OutputColumns.Value.Length;
    var result = new Delegate[n];
    for (int i = 0; i < n; i++)
    {
        if (!activeOutput(i))
            continue;
        result[i] = CreateGetter(input, i, activeOutput, outputCacher);
    }
    disposer = () =>
    {
        outputCacher.Dispose();
    };
    return result;
}
Now everything works and it's drastically faster than before 👍.
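
To illustrate how the same output-cache idea maps onto the TensorFlow side, here is a minimal sketch (the class and member names are illustrative only, not the exact ones used in TensorflowTransform.cs): all output-column getters created for a cursor share one cacher, the model is evaluated only when the cursor moves to a new row, and the cached tensors are released by the getters' disposer.

using System;
using Microsoft.ML;

// Sketch only - hypothetical names, not the actual implementation.
// One instance is shared by all output getters of a cursor, so the model
// runs once per row even when several output columns are requested.
internal sealed class OutputCacheSketch<TTensor> : IDisposable
    where TTensor : class, IDisposable
{
    private long _position = -1;   // row position the cached outputs belong to
    private TTensor[] _outputs;    // tensors produced by the last model evaluation

    public TTensor GetOutput(DataViewRow input, int outputIndex, Func<TTensor[]> runModel)
    {
        // Evaluate the model only when the cursor has advanced to a new row.
        if (_position != input.Position)
        {
            DisposeOutputs();
            _outputs = runModel();
            _position = input.Position;
        }
        return _outputs[outputIndex];
    }

    private void DisposeOutputs()
    {
        if (_outputs == null)
            return;
        foreach (var tensor in _outputs)
            tensor?.Dispose();
        _outputs = null;
    }

    // Called by the getters' disposer when the cursor is done.
    public void Dispose() => DisposeOutputs();
}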

- The cached tensors are disposed at the end of inference operations.

Signed-off-by: darth-vader-lg <luigi.generale@gmail.com>
@darth-vader-lg
Contributor Author

As mentioned in issue #5847, these changes improve the TensorFlow inference speed significantly, especially for object detection.
The total improvement can be ~400% when combined with PR #5857 / issue #5856.
I tested everything with the intensive loop shown in the code below.

[TensorFlowFact]
public void TensorFlowTransformObjectDetectionTest()
{
    // Saved model
    var modelLocation = @"D:\ObjectDetection\carp\TensorFlow\exported-model-SSD-MobileNET-v2-320x320\saved_model";
    // Create the estimators pipe
    var pipe = 
        _mlContext.Transforms.LoadImages(
            inputColumnName: "ImagePath",
            outputColumnName: "Image",
            imageFolder: "")
        .Append(_mlContext.Transforms.ResizeImages(
            inputColumnName: "Image",
            outputColumnName: "ResizedImage",
            imageWidth: 300,
            imageHeight: 300,
            resizing: ImageResizingEstimator.ResizingKind.Fill))
        .Append(_mlContext.Transforms.ExtractPixels(
            inputColumnName: "ResizedImage",
            outputColumnName: "serving_default_input_tensor:0",
            interleavePixelColors: true,
            outputAsFloatArray: false))
        .Append(_mlContext.Model.LoadTensorFlowModel(modelLocation).ScoreTensorFlowModel(
            inputColumnNames: new[] { "serving_default_input_tensor:0" },
            outputColumnNames: new[]
            {
                "StatefulPartitionedCall:1" /* detection_boxes */,
                "StatefulPartitionedCall:2" /* detection_classes */,
                "StatefulPartitionedCall:4" /* detection_scores */
            }));

    // Collect all the path of the images in the test directory
    var imagesLocation = @"D:\ObjectDetection\carp\TensorFlow\images\test";
    var images =
        Directory.GetFiles(imagesLocation).Where(file => new[] { ".jpg", ".jfif" }
        .Any(ext => Path.GetExtension(file).ToLower() == ext))
        .Select(file => new { ImagePath = file })
        .ToArray();

    // Create the transformer
    var data = _mlContext.Data.LoadFromEnumerable(images.Take(0));
    var model = pipe.Fit(data);

    // Test n times the inference on the collected images
    for (int i = 0, nImage = 0; i < 1000; i++, nImage = (nImage + 1) % images.Length)
        model.Transform(_mlContext.Data.LoadFromEnumerable(new[] { images[nImage] })).Preview();
}
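
To turn the loop into a rough benchmark, it can be wrapped in a Stopwatch; a sketch, reusing the model and images variables from the test above:

    // Rough timing of 1000 single-image inferences (same model/images as above).
    var stopwatch = System.Diagnostics.Stopwatch.StartNew();
    for (int i = 0, nImage = 0; i < 1000; i++, nImage = (nImage + 1) % images.Length)
        model.Transform(_mlContext.Data.LoadFromEnumerable(new[] { images[nImage] })).Preview();
    stopwatch.Stop();
    Console.WriteLine($"{stopwatch.Elapsed.TotalSeconds:F1} s total, " +
        $"{stopwatch.Elapsed.TotalMilliseconds / 1000.0:F1} ms per inference on average");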

Here are the results of the tests:

Without optimizations (current):
[image: WithoutOptimization]

With only the TF cache optimization in Microsoft.ML.TensorFlow/TensorflowTransform.cs (issue #5847 / PR #5848):
[image: TensorFlowCacheOptimization]

With both the TensorFlow cache and the raw image-access optimizations (issue #5847 / PR #5848 and issue #5856 / PR #5857):
[image: FullOptimization]

@michaelgsharp
Member

@darth-vader-lg thanks for submitting this! That is a pretty drastic speedup which is awesome.

I really appreciate you making the PR and the issue really detailed. It makes understanding all the changes so much easier.

Let me run a couple of tests on it but it looks good to me.

@darth-vader-lg darth-vader-lg changed the title from "Speed up the inference of the saved_model(s)." to "Speed up the inference of the saved_model(s). Fixes #5847" on Jun 23, 2021
@darth-vader-lg
Contributor Author

> @darth-vader-lg thanks for submitting this! That is a pretty drastic speedup which is awesome.
>
> I really appreciate you making the PR and the issue really detailed. It makes understanding all the changes so much easier.
>
> Let me run a couple of tests on it but it looks good to me.

@michaelgsharp, I was just glad to give my little contribution to this awesome project and repay the big help it has given me in my work. 👍
Also, having worked for many years on microcontrollers, sometimes with limited power and resources, code optimization has unfortunately become one of my occupational habits...

Anyway, have a good merge.

P.S. (🔔 promotional spot 🔔) If the Microsoft and .NET teams ever need services from my company in the future, I'm always here. 😉

Kind Regards

Member

@michaelgsharp michaelgsharp left a comment


LGTM

@michaelgsharp michaelgsharp merged commit 0fac0ba into dotnet:main Jun 23, 2021
@darth-vader-lg darth-vader-lg deleted the fix/speed-up-tf-inference branch June 26, 2021 08:29
darth-vader-lg added a commit to darth-vader-lg/ML-NET that referenced this pull request Jun 26, 2021
* remotes/official/main:
  Update lgbm to v2.3.1 (dotnet#5851)
  Speed-up bitmap operations on images. Fixes dotnet#5856 (dotnet#5857)
  Onnx recursion limit (dotnet#5840)
  Speed up the inference of the saved_model(s). Fixes dotnet#5847 (dotnet#5848)

Signed-off-by: darth-vader-lg <luigi.generale@gmail.com>
@ghost ghost locked as resolved and limited conversation to collaborators Mar 17, 2022
Development

Successfully merging this pull request may close these issues.

Speed-up TensorFlow models inference
2 participants