
Lce benchmark and interpreter flags #717

Merged
merged 26 commits into larq:main on Mar 10, 2022

Conversation

@simonmaurer (Contributor) commented Mar 7, 2022

In general, this PR allows registering the different kernel implementations of LceBconv2D (lce_ops_register.h) via additional parameters in LceInterpreter and flags in the lce_benchmark_model binary.

What do these changes do?

lce_benchmark_model

  • added two command-line flags, use_reference_bconv and use_indirect_bgemm, backed by global variables, to register the respective kernel implementations
  • instead of manually parsing the flags in lce_benchmark_main.cc, the built-in Flags of TF's BenchmarkModel are used: LceBenchmarkTfLiteModel is added as a child class of BenchmarkTfLiteModel, the global variables are passed in by reference upon object instantiation and are set after calling the overridden Run() method (see the sketch below)
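
A minimal sketch of what this subclassing can look like, assuming the usual shape of TFLite's benchmark tool API (BenchmarkTfLiteModel, the protected params_ member, CreateFlag, and the virtual GetFlags()/Run()); the constructor shape and flag descriptions are illustrative, not the exact PR code:

```cpp
#include "tensorflow/lite/tools/benchmark/benchmark_tflite_model.h"

using tflite::benchmark::BenchmarkParam;
using tflite::benchmark::BenchmarkTfLiteModel;

class LceBenchmarkTfLiteModel : public BenchmarkTfLiteModel {
 public:
  // The globals from lce_benchmark_main.cc are passed in by reference.
  LceBenchmarkTfLiteModel(bool& use_reference_bconv, bool& use_indirect_bgemm)
      : use_reference_bconv_(use_reference_bconv),
        use_indirect_bgemm_(use_indirect_bgemm) {
    // Register the two LCE parameters with defaults of false.
    params_.AddParam("use_reference_bconv", BenchmarkParam::Create<bool>(false));
    params_.AddParam("use_indirect_bgemm", BenchmarkParam::Create<bool>(false));
  }

  // Hook the two LCE parameters into the built-in flag parsing.
  std::vector<tflite::Flag> GetFlags() override {
    std::vector<tflite::Flag> flags = BenchmarkTfLiteModel::GetFlags();
    flags.push_back(tflite::benchmark::CreateFlag<bool>(
        "use_reference_bconv", &params_, "use the reference bconv kernel"));
    flags.push_back(tflite::benchmark::CreateFlag<bool>(
        "use_indirect_bgemm", &params_, "use the indirect BGEMM bconv kernel"));
    return flags;
  }

  // After flag parsing, copy the values back into the caller's globals.
  TfLiteStatus Run() override {
    use_reference_bconv_ = params_.Get<bool>("use_reference_bconv");
    use_indirect_bgemm_ = params_.Get<bool>("use_indirect_bgemm");
    return BenchmarkTfLiteModel::Run();
  }

 private:
  bool& use_reference_bconv_;
  bool& use_indirect_bgemm_;
};
```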

LceInterpreter

  • added parameter use_indirect_bgemm to register the optimized indirect BGEMM kernel via Register_BCONV_2D_OPT_INDIRECT_BGEMM() in lce_ops_register.h
  • added parameter use_xnnpack in interpreter_wrapper_lite.cc to apply BuiltinOpResolver when true, and BuiltinOpResolverWithoutDefaultDelegates otherwise (see the sketch below)
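
A sketch of how the wrapper can select the resolver and kernel. Register_BCONV_2D_OPT_INDIRECT_BGEMM() and the two resolver classes are the ones named in this PR; the header path, the MakeLceResolver helper, the "LceBconv2d" custom-op name, and the fallback to Register_BCONV_2D() are assumptions for illustration:

```cpp
#include <memory>

#include "larq_compute_engine/tflite/kernels/lce_ops_register.h"
#include "tensorflow/lite/kernels/register.h"

using tflite::ops::builtin::BuiltinOpResolver;
using tflite::ops::builtin::BuiltinOpResolverWithoutDefaultDelegates;

std::unique_ptr<BuiltinOpResolver> MakeLceResolver(bool use_xnnpack,
                                                   bool use_indirect_bgemm) {
  std::unique_ptr<BuiltinOpResolver> resolver;
  if (use_xnnpack) {
    // The default resolver silently applies the XNNPACK delegate.
    resolver = std::make_unique<BuiltinOpResolver>();
  } else {
    resolver = std::make_unique<BuiltinOpResolverWithoutDefaultDelegates>();
  }
  // Register the requested LceBconv2D implementation under its custom-op name.
  resolver->AddCustom(
      "LceBconv2d",
      use_indirect_bgemm
          ? compute_engine::tflite::Register_BCONV_2D_OPT_INDIRECT_BGEMM()
          : compute_engine::tflite::Register_BCONV_2D());
  return resolver;
}
```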

Related issue number

#711, #713

- added two command-line flags use_reference_bconv/use_indirect_bgemm as global variables in lce_benchmark_main.cc to register the respective kernels
- implemented LceBenchmarkTfLiteModel as a child class of BenchmarkTfLiteModel to use built-in flags instead of manually parsing them in lce_benchmark_main.cc
- modified lce_benchmark_main.cc to use LceBenchmarkTfLiteModel; the global flags are set upon calling the overridden Run() method, having been passed in as internal references
- added build options for lce_benchmark_tflite_model.h, same as in TFLite's benchmark_tflite_model.h
@Tombana (Collaborator) left a comment

Looks great, thanks a lot!

@simonmaurer (Contributor, Author) commented Mar 8, 2022

@Tombana my pleasure. I was also thinking about overriding ValidateParams() to throw an error/warning when both use_reference_bconv and use_indirect_bgemm are set to true (same as BenchmarkTfLiteModel/BenchmarkModel do), and similarly in the LceInterpreter. What do you think? We could do it either in BenchmarkTfLiteModel/LceInterpreter or when registering in lce_ops_register.h.

On the other hand, there's a platform check that silently registers the regular bconv kernel if the optimized BGEMM kernel is not available. It might also be valuable to throw a warning for the caller there.

@lgeiger (Member) left a comment

This looks good to me. But can you explain your use case a bit? I'm curious why you need to turn off XNNPack in the interpreter.

```diff
@@ -40,11 +42,17 @@ def __init__(
     flatbuffer_model: bytes,
     num_threads: int = 1,
     use_reference_bconv: bool = False,
+    use_indirect_bgemm: bool = False,
+    use_xnnpack: bool = False,
```
@lgeiger (Member)
I think this changes the default, or am I missing something? I think that's fine, but can you explain the reasoning behind it?

@simonmaurer (Contributor, Author) commented Mar 8, 2022

Hi @lgeiger, you're correct. Following the discussion on #713, I've implemented it like this.
A) One reason to include this flag in the LceInterpreter is to make it reflect the behavior of the benchmark binary more closely.
B) The second reason is dynamic allocation, i.e. models with allocation_type == kTfLiteDynamic.
We had models like this, and when we set use_xnnpack=true in benchmark_model, an error is thrown. This would be true for the LceInterpreter as well, since under the hood XNNPACK is "silently" applied (see the illustration below).
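
For illustration only (not code from this PR): this is the kind of check that reveals such models; output_tensor() and kTfLiteDynamic are standard TFLite API, while the helper name is made up:

```cpp
#include "tensorflow/lite/interpreter.h"

// Dynamic-shape models end up with tensors whose allocation_type is
// kTfLiteDynamic; the XNNPACK delegate cannot handle these, so running
// them with use_xnnpack=true raises an error.
bool HasDynamicOutput(const tflite::Interpreter& interpreter) {
  const TfLiteTensor* output = interpreter.output_tensor(0);
  return output->allocation_type == kTfLiteDynamic;
}
```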

@Tombana (Collaborator) commented Mar 8, 2022

> @Tombana my pleasure. I was also thinking about overriding ValidateParams() to throw an error/warning when both use_reference_bconv and use_indirect_bgemm are set to true (same as BenchmarkTfLiteModel/BenchmarkModel do), and similarly in the LceInterpreter. What do you think? We could do it either in BenchmarkTfLiteModel/LceInterpreter or when registering in lce_ops_register.h.

That sounds good, yes. It's probably easiest to do it in lce_ops_register.h, then all uses are covered. As for the output mechanism, it seems that by default the Interpreter uses DefaultErrorReporter() (from #include "tensorflow/lite/stderr_reporter.h"), so it's probably cleanest to use that instead of calling printf directly.

> On the other hand, there's a platform check that silently registers the regular bconv kernel if the optimized BGEMM kernel is not available. It might also be valuable to throw a warning for the caller there.

It seems that the TFLite conv kernels don't have any warnings there either, so I think we can leave those as they are.

@simonmaurer (Contributor, Author) commented Mar 9, 2022

> > @Tombana my pleasure. I was also thinking about overriding ValidateParams() to throw an error/warning when both use_reference_bconv and use_indirect_bgemm are set to true (same as BenchmarkTfLiteModel/BenchmarkModel do), and similarly in the LceInterpreter. What do you think? We could do it either in BenchmarkTfLiteModel/LceInterpreter or when registering in lce_ops_register.h.
>
> That sounds good, yes. It's probably easiest to do it in lce_ops_register.h, then all uses are covered. As for the output mechanism, it seems that by default the Interpreter uses DefaultErrorReporter() (from #include "tensorflow/lite/stderr_reporter.h"), so it's probably cleanest to use that instead of calling printf directly.
>
> > On the other hand, there's a platform check that silently registers the regular bconv kernel if the optimized BGEMM kernel is not available. It might also be valuable to throw a warning for the caller there.
>
> It seems that the TFLite conv kernels don't have any warnings there either, so I think we can leave those as they are.

Perfect. I first added this as you suggested with DefaultErrorReporter(), which returns a static pointer, to report an error for this use case. Since it should be a warning, we're now using TFLITE_LOG(WARN) instead, if y'all are okay with that (see the sketch below).
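
A sketch of what that guard can look like; TFLITE_LOG(WARN) is the stream-style macro from tensorflow/lite/tools/logging.h, while the RegisterLCECustomOps signature and warning text here are illustrative assumptions rather than the exact PR code:

```cpp
#include "tensorflow/lite/mutable_op_resolver.h"
#include "tensorflow/lite/tools/logging.h"

// Illustrative shape of the guard in lce_ops_register.h.
inline void RegisterLCECustomOps(tflite::MutableOpResolver* resolver,
                                 bool use_reference_bconv = false,
                                 bool use_indirect_bgemm = false) {
  if (use_reference_bconv && use_indirect_bgemm) {
    // A warning rather than a hard error, emitted via the benchmark tool's logger.
    TFLITE_LOG(WARN) << "Both use_reference_bconv and use_indirect_bgemm are "
                        "set; these options are mutually exclusive.";
  }
  // ... kernel registration as described above ...
}
```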

@simonmaurer (Contributor, Author) commented Mar 9, 2022

@Tombana @lgeiger any hints on why including "@org_tensorflow//tensorflow/lite/tools/logging" as a dependency only works for tf_cc_binary() in the BUILD of lce_benchmark_model, but not for interpreter_wrapper_lite in pybind_extension()?

@Tombana (Collaborator) commented Mar 9, 2022

> @Tombana @lgeiger any hints on why including "@org_tensorflow//tensorflow/lite/tools/logging" as a dependency only works for tf_cc_binary() in the BUILD of lce_benchmark_model, but not for interpreter_wrapper_lite in pybind_extension()?

I'm not sure; I think you might need ..../tools:logging instead of ..../tools/logging.

@Tombana (Collaborator) commented Mar 9, 2022

I think the PR is finished, right @simonmaurer? Then it can be merged.

@simonmaurer (Contributor, Author) commented
yessir 💪

@Tombana Tombana merged commit 4cb8e72 into larq:main Mar 10, 2022
@CNugteren CNugteren added the feature New feature or request label Apr 20, 2022