[Caffe Frontend] supporting group > 1 cases for Deconv op #8125

Closed
wants to merge 141 commits into from

Conversation

zotanika
Contributor

  • Handling group > 1 cases, assuming group == output channels
  • Decomposed into Relay split, conv2d_transpose, and multi-level concatenate ops.
  • Added some test cases.
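
The decomposition listed above can be illustrated without TVM. The following is a minimal pure-Python sketch, not the code from this PR: it uses a 1-D transposed convolution with no padding to keep the loops short, and the helper names `deconv1d` and `grouped_deconv1d` are hypothetical. The weight layout `[C_in][C_out / groups][K]` is an assumption modelled on Caffe's deconvolution weights.

```python
def deconv1d(x, w, stride=1):
    """Plain transposed 1-D convolution (no padding).

    x: [C_in][L] input, w: [C_in][C_out][K] weights.
    Returns [C_out][(L - 1) * stride + K].
    """
    c_in, length = len(x), len(x[0])
    c_out, k = len(w[0]), len(w[0][0])
    out = [[0.0] * ((length - 1) * stride + k) for _ in range(c_out)]
    for ci in range(c_in):
        for co in range(c_out):
            for i in range(length):
                for j in range(k):
                    # Each input element scatters a scaled copy of the kernel.
                    out[co][i * stride + j] += x[ci][i] * w[ci][co][j]
    return out


def grouped_deconv1d(x, w, groups, stride=1):
    """Grouped deconv as split -> per-group deconv -> concatenate."""
    per_group = len(x) // groups
    out = []
    for g in range(groups):
        lo, hi = g * per_group, (g + 1) * per_group
        # "split": take this group's input channels and weights.
        group_out = deconv1d(x[lo:hi], w[lo:hi], stride)
        # "concatenate": append the group's outputs along the channel axis.
        out.extend(group_out)
    return out


# Two input channels, two groups, one output channel per group.
x = [[1.0, 2.0], [3.0, 4.0]]
w = [[[1.0, 1.0]], [[2.0, 0.0]]]
result = grouped_deconv1d(x, w, groups=2, stride=2)
```

With `groups=1` the function reduces to a single `deconv1d` call, which is a quick sanity check that the split and concatenate steps add nothing of their own.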

zotanika and others added 18 commits May 11, 2021 16:50
- adding more test cases; handling '0 < axis < num_axes - 1' case to give the result equivalent to Caffe framework
- skipping Relay multiplication if coeff is 1

Signed-off-by: zotanika <zotanika@gmail.com>
* Handling group > 1 cases, assuming group == output channels
* Decomposed into Relay split, transposed conv, and multi-leveled concatenation.
* Added some test cases.

Signed-off-by: zotanika <zotanika@gmail.com>
* [TVMC] Add support for the MLF to 'compile' command

Add support for the Model Library Format (MLF) to 'tvmc' so users can
output compilation artifacts to a MLF archive by passing the new flag
'--output-format mlf'. For instance:

$ python3 -m tvm.driver.tvmc compile ./sine_model.tflite --target="c" --output sine.tar --output-format mlf

will generate a sine.tar archive that is serialized according to the
MLF.
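
To check what ended up in such an archive, a small standard-library helper is enough. This sketch assumes only that the output is a regular tar file; it makes no claim about which members the MLF contains, and `list_archive` is a hypothetical name.

```python
import tarfile


def list_archive(path):
    """Return the sorted member names of a tar archive."""
    with tarfile.open(path) as tar:
        return sorted(tar.getnames())
```

For example, `list_archive("sine.tar")` would return the names of the files inside the archive produced by the command above.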

Since the MLF is currently meant to be used only on micro targets, an
error is generated if one tries to run a MLF outside a micro context.

The micro context does not exist yet but will be later introduced as
part of the [RFC] "TVMC: Add support for µTVM".

This commit also adds 3 pytest tests to test tvmc + MLF.

Finally, it also fixes some missing periods in the 'compile' command
help sections and renames export_format to output_format so there is
no confusion with flag '--dump-code', which contains "formats to export"
in its help section.

Signed-off-by: Gustavo Romero <gustavo.romero@linaro.org>

* Fix missing importorskip in the import_package test

Fix missing importorskip() in the import_package test allowing the
test in question to be skipped when 'tflite' is not installed in the
test environment, otherwise the test will fail with:

[...]
>       archive_path = exported_tvmc_package.package_path
E       AttributeError: 'str' object has no attribute 'package_path'
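
For readers unfamiliar with the pattern, the behaviour of `pytest.importorskip` can be approximated with the standard library. The sketch below is an analogue, not pytest's implementation, and `import_or_skip` is a hypothetical name: it returns the module when the import works and raises a skip exception otherwise, so the test is reported as skipped instead of failing later with an unrelated error.

```python
import importlib
import unittest


def import_or_skip(name):
    """Import an optional dependency or skip the calling test."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        # unittest (and pytest) report SkipTest as a skipped test, not a failure.
        raise unittest.SkipTest(f"optional dependency {name!r} is not installed") from exc
```

A test would call `import_or_skip("tflite")` as its first statement, before touching any fixture that depends on the package.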
Added handling of CallNode objects created via packed
functions invocation + test cases.

Change-Id: I5374abc59a3b0f79f27364c45f1a5789536df940
This PR is part of the TensorIR upstreaming effort (apache#7527), stage M2a.

In this PR, we implemented ScheduleError, an error reporting mechanism for schedule primitives to report user-facing error messages, with the functionality of rendering the TIR out in the TVM script syntax.

This set of APIs allows future improvement of error location rendering, e.g. more colorful rendering mechanisms like synr does.

Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn>
Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com>
Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com>
Co-authored-by: Hongyi Jin <3231950289@qq.com>
Co-authored-by: Wuwei Lin <wuwei@apache.org>
Co-authored-by: Tristan Konolige <tristan.konolige@gmail.com>

* Fix typos and format in comments

Fix typos and format in comments about the registry manager of
packed functions.

Signed-off-by: Gustavo Romero <gustavo.romero@linaro.org>

* Fix lint

No more than 100 characters per line is allowed.
Fix typo in a comment about AOT executor.

Signed-off-by: Gustavo Romero <gustavo.romero@linaro.org>
* [Vulkan] Enable instance/device extensions

- Vulkan requires that extensions be explicitly enabled if used.
  Explicitly list out which extensions are required (currently none)
  and which are optional.

* [Vulkan] Extract device information from vulkan API.

- Based on vkGetPhysicalDeviceProperties and
  vkGetPhysicalDeviceFeatures, determine which Vulkan capabilities are
  supported, pack into a Target.

* [Vulkan] Query instance-supported apiVersion before creating instance

- Previously, vkCreateInstance was called to initialize Vulkan 1.0.

* [Vulkan] Moved options for dedicated allocation and push descriptors to environment variables

- Query support for dedicated allocation and push descriptors along
  with the rest of the device support.  Move the options to disable
  their use from compile-time variables to environment variables
  `TVM_VULKAN_DISABLE_PUSH_DESCRIPTOR` and
  `TVM_VULKAN_DISABLE_DEDICATED_ALLOCATION`.

* [Vulkan] Move option for vulkan validation layers to environment variable

- Moved to enable faster use as a debug tool.  If
  `TVM_VULKAN_ENABLE_VALIDATION_LAYERS` is a non-empty string,
  validation layers will be enabled.

* [Vulkan] Explicitly enable vulkan features in device creation

- Vulkan requires that features be explicitly enabled before use.  For
  each feature that the device supports and a shader might use,
  declare it in the call to `vkCreateDevice`.

* [Vulkan] Avoid repeated queries for device attributes.

- Implement `VulkanDeviceAPI::GetAttr` based on the per-device values
  stored in the Target.  This pulls all logic for querying device
  parameters into a single location.

* [Vulkan] Implement "from_device" flag for the vulkan target.

- With the number of device capabilities that may or may not be
  supported by a vulkan driver, it can be tedious to input them.
  Specifying "-from_device=0" now indicates that any unspecified values
  should be read from the device.

* [Vulkan][Codegen] Read vulkan device capabilities/limits from Target

- Previously, the codegen assumed that all device features were
  present.  Now, the codegen reads device capabilities from the
  Target, and throws an error if codegen would require use of an
  unsupported feature.
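
The "non-empty string" rule stated above for `TVM_VULKAN_ENABLE_VALIDATION_LAYERS` has one consequence worth spelling out: any value, including `0`, turns the option on. The sketch below mirrors that rule in Python for illustration only; the real check lives in the Vulkan runtime, and `env_flag_set` is a hypothetical helper.

```python
import os


def env_flag_set(name, environ=None):
    """True if the variable is present and non-empty (its value is not parsed)."""
    env = os.environ if environ is None else environ
    return bool(env.get(name, ""))


# The value "0" is still a non-empty string, so it counts as enabled.
enabled_by_zero = env_flag_set("TVM_VULKAN_ENABLE_VALIDATION_LAYERS",
                               {"TVM_VULKAN_ENABLE_VALIDATION_LAYERS": "0"})
```

Under this rule, the way to turn the option off is to unset the variable or set it to an empty string.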

Co-authored-by: Eric Lunderberg <elunderberg@octoml.ai>
This flag causes cuBLAS to use tensor cores on all operations. With
f32 or f64 operations, this leads to a loss of accuracy.
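
The commit message does not say how the accuracy is lost. One plausible mechanism, assumed here purely for illustration, is that operands are rounded to a lower-precision format such as half precision before being multiplied. The sketch below emulates that rounding with the standard library; `to_half` and `dot_half_inputs` are hypothetical helpers and no cuBLAS call is involved.

```python
import struct


def to_half(value):
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def dot(xs, ys):
    return sum(x * y for x, y in zip(xs, ys))


def dot_half_inputs(xs, ys):
    """Dot product whose inputs were first rounded to half precision."""
    return dot([to_half(x) for x in xs], [to_half(y) for y in ys])


xs = [0.1] * 10
full = dot(xs, xs)
reduced = dot_half_inputs(xs, xs)
error = abs(full - reduced)
```

Values that are exactly representable in half precision, such as 1.0, pass through unchanged, so the error depends on the data as well as on the operation.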
* Add fast_softmax support in fast_math pass

* Lintfix

* Update
@tqchen
Member

tqchen commented May 26, 2021

@FrozenGene Please help to manage this PR

wyc-ruiker and others added 4 commits May 26, 2021 16:00
Co-authored-by: wangyucheng <wangyucheng@sensetime.com>
Change-Id: I927b43df95a8db8b042bc3cf2a1f23739d102b9d
)

Currently, on Linux platforms, the CUDA install directory is only
looked for in /usr/local/cuda/include.  The `nvidia-cuda-dev` package of Ubuntu
20.04 installs at /usr/include, so it would be good to check that
location as well.

Co-authored-by: Eric Lunderberg <elunderberg@octoml.ai>
@FrozenGene
Member

@zotanika Do you mind splitting this into two PRs? One for Deconv, another for reduction.

mehrdadh and others added 5 commits May 28, 2021 10:13
* initial

* remove compare

* temp fix

* debugging

* hack

* hack for testing

* both test pass

* cleanup

* fix tests and tutorials

* restructure

* cleanup

* cleanup

* fix check files

* fixed for physical devices

* address comments

* reduce nrf stack size

* update sample url

* format
This commit pins the black version to provide stability.
It is expected that the pinned version will be moved forward periodically.

Change-Id: Ied866bff85a1a832959bc1d4673a7fdec68128a7
* [IR][Pass][Instrument] Pass instrument framework

This commit provides utilities to instrument passes:
  1. Add a new namespace tvm.instrument
  2. Introduce PassInstrument and PassInstrumentor to PassContext

     Example
     -------
    passes_mem = #... Impl of memory instrument
    passes_time = tvm.instrument.PassesTimeInstrument()

    with tvm.transform.PassContext(
        pass_instrumentor=PassInstrumentor([passes_mem, passes_time])):

        tvm.relay.build(mod, 'llvm')

        passes_mem.render()
        passes_time.render()

  3. Integrate existing PassContext::Trace() and timing profile
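
The idea behind the list above can be sketched in plain Python without TVM. This is not the `tvm.instrument` API: `TimingInstrument`, `run_passes`, and the hook names `run_before_pass` / `run_after_pass` are hypothetical, chosen only to show how instruments wrap each pass.

```python
import time


class TimingInstrument:
    """Hypothetical instrument that records wall-clock time per pass."""

    def __init__(self):
        self.durations = {}
        self._start = {}

    def run_before_pass(self, name):
        self._start[name] = time.perf_counter()

    def run_after_pass(self, name):
        self.durations[name] = time.perf_counter() - self._start.pop(name)

    def render(self):
        return "\n".join(f"{n}: {d:.6f}s" for n, d in self.durations.items())


def run_passes(module, passes, instruments=()):
    """Apply each pass in order, notifying every instrument around it."""
    for pass_func in passes:
        name = pass_func.__name__
        for inst in instruments:
            inst.run_before_pass(name)
        module = pass_func(module)
        for inst in instruments:
            inst.run_after_pass(name)
    return module


# Toy "passes" operating on an integer instead of an IR module.
def double(module):
    return module * 2


def add_one(module):
    return module + 1


timer = TimingInstrument()
final_module = run_passes(3, [double, add_one], [timer])
report = timer.render()
```

Keeping the instruments outside the passes means a new measurement (memory, IR dumps) can be added without touching any pass.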

* [IR][Pass][Instrument] Fix python test_pass_manager.py

* Fix comment

* Fix lint

* Fix test_pass_annotation

* Fix test_pass_annotation.py

* Fix lint

* Fix test_pass_annotation.py

* Fix test_pass_annotation.py

* Fix review comments

* Fix tutorial use_pass_infra.py

* Fix review comments

* Fix review comments

* Fix typo

* Fix review comments

* Fix review comments

* Fix unittest error: test_cow_pass

* Fix unittest error

* Add more test cases for exceptions

* Fix nit

* Doc override_instruments()

* Fix review comments

* Fix lint

* Fix EnterContext exception behavior
…nality. (apache#8157)

This is in preparation for additional refactoring.  Functions are
organized to group similar functionality together, to
minimize the amount of file-to-file transfers needed later.  The main
divisions are between VulkanDeviceAPI,
VulkanModuleNode/VulkanWrappedFunc, VulkanThreadEntry, and
VulkanContext.

Other than minimal renaming of private functions and addition of some
comments, this commit should have zero changes to the function
definitions themselves, only to their arrangement within the
src/runtime/vulkan directory.

Co-authored-by: Eric Lunderberg <elunderberg@octoml.ai>
zotanika and others added 27 commits June 10, 2021 16:11
- adding more test cases; handling '0 < axis < num_axes - 1' case to give the result equivalent to Caffe framework
- skipping Relay multiplication if coeff is 1

Signed-off-by: zotanika <zotanika@gmail.com>
- Generate valid LLVM IR.
- Set proper alignment on the constant variables.
This helps in debugging, as the reported function name, arguments, and
docstring are those of the function in the source code rather than of
the wrapper function (e.g.
`<function tvm.topi.cuda.dense.dense_small_batch(cfg, data, weight, bias=None, out_dtype=None)>`
instead of
`<function tvm.autotvm.task.topi_integration.register_topi_compute.<locals>._decorate.<locals>.wrapper(*args, **kwargs)>`.)

Co-authored-by: Eric Lunderberg <elunderberg@octoml.ai>
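
The effect described in the wrapper-function commit above can be reproduced with `functools.wraps` from the standard library. Whether the commit itself uses `functools.wraps` is an assumption; the decorators and the `dense_*` functions below are hypothetical stand-ins.

```python
import functools
import inspect


def register_plain(func):
    """Decorator that hides the wrapped function's metadata."""
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


def register_wrapped(func):
    """Decorator that copies name, docstring and signature information."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


@register_plain
def dense_plain(data, weight):
    """Dense layer."""
    return data * weight


@register_wrapped
def dense_wrapped(data, weight):
    """Dense layer."""
    return data * weight


plain_name = dense_plain.__name__
wrapped_name = dense_wrapped.__name__
wrapped_signature = str(inspect.signature(dense_wrapped))
```

`inspect.signature` follows the `__wrapped__` attribute that `functools.wraps` sets, which is why the original argument list is reported in place of `(*args, **kwargs)`.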
Reduce the default number of threads in reduction kernels for Metal.
By default, code generation produced a thread block of size 32x32x1,
so the number of threads per threadgroup was 1024 (32 * 32 * 1).
Sometimes the device doesn't have enough resources, and in that case
we get an exception saying that the block size is greater than the
value of maxTotalThreadsPerThreadgroup.
To prevent this, we decrease the default number of threads. With this
fix every model should work with the default codegen, and auto-tuning
or auto-scheduling will select the optimal number of threads.
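
The arithmetic behind this failure is easy to check. In the sketch below, the helper names are hypothetical, and the limit of 512 and the 16x16x1 block are made-up example values, not the new default chosen by the commit.

```python
def threads_per_threadgroup(block):
    """Total threads in a block given as (x, y, z)."""
    total = 1
    for dim in block:
        total *= dim
    return total


def fits(block, max_total_threads):
    """True if the block does not exceed the device limit."""
    return threads_per_threadgroup(block) <= max_total_threads


old_default = (32, 32, 1)
old_total = threads_per_threadgroup(old_default)
ok_on_small_device = fits(old_default, 512)
ok_with_smaller_block = fits((16, 16, 1), 512)
```

A block of 32x32x1 only fits when the limit reported for the kernel is at least 1024, which is why a smaller default is safer across devices.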
…pache#8230)

Currently board-specific config files (boards/*.conf) are not
copied from Zephyr project dir to the destination build dir, so
as a consequence the per board configs are not used when building
the runtime libraries, like libcommon. Hence, for instance, it's
currently not possible to set CONFIG_FPU per board since it only
takes effect when it's set in the generic 'prj.conf' config file.

This commit fixes it by copying to the build dir (to each lib
dir) the proper .conf for the selected target board. For example,
if target 'qemu_x86' is selected 'qemu_x86.conf' is copied to
the boards/ dir inside the lib dirs, so Zephyr build system can
find it and combine it with configs found in the generic 'prj.conf'.
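
The copy step can be sketched as follows. This is an illustration of the idea, not the code from the commit: the function name `copy_board_config` and the exact directory arguments are assumptions, and only the `boards/<board>.conf` naming comes from the description above.

```python
import shutil
from pathlib import Path


def copy_board_config(project_dir, lib_dirs, board):
    """Copy boards/<board>.conf from the project into each library dir.

    Returns the list of files written; empty if the board has no
    board-specific config.
    """
    src = Path(project_dir) / "boards" / f"{board}.conf"
    if not src.is_file():
        return []
    copied = []
    for lib_dir in lib_dirs:
        dst_dir = Path(lib_dir) / "boards"
        dst_dir.mkdir(parents=True, exist_ok=True)
        copied.append(Path(shutil.copy2(src, dst_dir / src.name)))
    return copied
```

Returning an empty list for boards without a `.conf` file keeps the generic `prj.conf` as the only configuration in that case.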

Signed-off-by: Gustavo Romero <gustavo.romero@linaro.org>
* Fixed the destruction order of tflite::Interpreter and EdgeTPUContext

* Fixed include omission

* Formatted
…DeviceAPI (apache#8196)

* [Vulkan][Refactor] Moved VulkanStream ownership from VulkanThreadEntry to VulkanDevice

- Implemented ThreadMap, a container for per-thread objects.  Unlike
  dmlc::ThreadLocalStore, ThreadMap is intended for use as a
  non-static thread-specific lookup.

- Added ThreadMap<VulkanStream> as a member to VulkanDevice, updated
  all uses.

* [Vulkan][Refactor] Pulled VulkanBuffer allocation/deallocation into constructor/destructor.

- VulkanBuffer owns the VkBuffer and VkDeviceMemory that it allocates,
  and deallocates on destruction.

- VulkanHostVisibleBuffer owns a VulkanBuffer, and additionally calls
  vkUnmapMemory on destruction.

* [Vulkan][Refactor] Move the VulkanStagingBuffer to be owned by the VulkanDevice

- Previously, was owned by VulkanThreadEntry, so any use required
  looking up both the thread entry and the device.  Now,
  thread-specific lookup is handled in the VulkanDevice class.

* [Vulkan][Refactor] Move ownership of per-thread uniform buffer to VulkanDevice

- Previously, VulkanUniformBuffer was owned by VulkanThreadEntry, so
  any use required looking up both the thread entry and the device.
  Now, thread-specific lookup is handled in the VulkanDevice class.

* [Vulkan][Refactor] Moved ownership of per-thread workspace pool to VulkanDeviceAPI

- Previously, the WorkspacePool was owned by VulkanThreadEntry, and
  required a lookup from VulkanDeviceAPI::AllocWorkspace.  As a
  result, non-global VulkanDeviceAPI would interact with each other.

* [Vulkan][Refactor] Moved ownership of per-thread active device id to VulkanDeviceAPI

- Previously, the active device was owned by VulkanThreadEntry, so
  lookups to multiple global variables were required.  Now, everything
  goes through the VulkanDeviceAPI.

- Removed VulkanThreadEntry, as all functionality has been moved to
  either VulkanDevice or VulkanDeviceAPI.
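
The ThreadMap idea mentioned earlier in this message, a per-thread lookup that belongs to an object instead of being a static thread-local, can be sketched in Python. This is an analogue of the concept, not a port of the C++ class; the method names `get` and `get_or_make` are assumptions.

```python
import threading


class ThreadMap:
    """Per-instance container holding one value per calling thread."""

    def __init__(self):
        self._lock = threading.Lock()
        self._values = {}

    def get(self):
        """Return the calling thread's value, or None if it has none."""
        with self._lock:
            return self._values.get(threading.get_ident())

    def get_or_make(self, factory):
        """Return the calling thread's value, creating it on first use."""
        key = threading.get_ident()
        with self._lock:
            if key not in self._values:
                self._values[key] = factory()
            return self._values[key]


streams = ThreadMap()
main_stream = streams.get_or_make(list)

seen_in_worker = []
worker = threading.Thread(
    target=lambda: seen_in_worker.append(streams.get_or_make(list))
)
worker.start()
worker.join()
```

Because the map is an ordinary member, two owners (for example two device objects) each get their own per-thread values, which a single static thread-local store cannot express. One caveat of this Python sketch: thread identifiers can be reused after a thread exits, so entries of finished threads would need to be cleaned up in long-running programs.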

Co-authored-by: Eric Lunderberg <elunderberg@octoml.ai>
* [BYOC][ACL] Prevent dilated pooling

Added a check that prevents avg_pool2d and max_pool2d from being
scheduled for execution via the ACL* runtime if a dilation other
than (1, 1) is provided, as ACL does not currently support the
dilation attribute in pooling layers.

*ACL stands for "Compute Library for the Arm® Architecture"
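
The rule described above amounts to a simple predicate. The sketch below is a hypothetical illustration, not the check added by this commit: `acl_pool2d_supported` and `partition_pools` are made-up names, and ops are modelled as plain `(name, dilation)` tuples.

```python
POOLING_OPS = ("avg_pool2d", "max_pool2d")


def acl_pool2d_supported(op_name, dilation):
    """True if a pooling op may be offloaded under the dilation rule."""
    if op_name not in POOLING_OPS:
        raise ValueError(f"not a pooling op covered by this sketch: {op_name}")
    return tuple(dilation) == (1, 1)


def partition_pools(ops):
    """Split (name, dilation) pairs into offloaded and fallback lists."""
    offloaded, fallback = [], []
    for name, dilation in ops:
        target = offloaded if acl_pool2d_supported(name, dilation) else fallback
        target.append((name, dilation))
    return offloaded, fallback


offloaded, fallback = partition_pools(
    [("max_pool2d", (1, 1)), ("avg_pool2d", (2, 2))]
)
```

Ops that land in the fallback list would be compiled by the regular TVM code path instead of being handed to the external runtime.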

Change-Id: If8f65d3a154e09f880bec73dd756d9f985a20ff2

* linter

Change-Id: If91809350786e69f59596301e0cbd3def6815cd0
…he#7858)

- Replaced capabilities header file with api calls introduced by the 20.11 ethosn driver stack release.
  - Removed 20.08 driver stack support and updated all affected code.
* num of cores

* add target list

* extension

* qemu

* fix

* comments

* add qemu to setup build

* fix

* add mps2 test

* merge fix

* add commit option

* add log

* fix

* fix zephyr init

* rename

* fix zephyr init

* uncomment

* fixed qemu install

* cleanup

* version

* add commit option

* fixed qemu install

* add docker import

* cleanup

* fix

* cleanup

* fix

* fix zephyr path

* fix

* fix

* address comments

* fix test

* fix

* add wait

* comments

* changed test to script

* add checks

* fix zephyr

* Revert "add wait"

This reverts commit 70f3c7d.

* address comments
…adcast (apache#8250)

* Allow cblas batch_matmul implicit bcast

* Add cblas batch_matmul bcast when batch_a=1
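
Implicit broadcasting over the batch dimension can be shown with nested lists. This is a sketch of the semantics only, not the cblas-backed implementation: it uses plain row-by-column products, and the operand layout (`[batch][rows][cols]` for both inputs, with no transpose) is an assumption made to keep the example short.

```python
def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def batch_matmul(a, b):
    """Batched matmul where a batch of size 1 is reused for every item."""
    batch_a, batch_b = len(a), len(b)
    if batch_a != batch_b and 1 not in (batch_a, batch_b):
        raise ValueError("batch sizes must match or one of them must be 1")
    batch = max(batch_a, batch_b)
    return [
        matmul(a[i if batch_a > 1 else 0], b[i if batch_b > 1 else 0])
        for i in range(batch)
    ]


a = [[[1, 2]]]                          # batch_a = 1, shape 1 x 2
b = [[[1], [0]], [[0], [1]]]            # batch_b = 2, shape 2 x 1
product = batch_matmul(a, b)
```

The single matrix in `a` is paired with each matrix in `b`, so no explicit copy of `a` along the batch axis is needed.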
The micro TVM page was moved during a recent docs update. This
patch moved the top level index to the former location.
…che#8245)

* [CI] [ComputeLibrary] Use pre-built binaries instead of compiled

Pre-built Compute Library binaries are now downloaded (credits to @leandorn)
instead of on-site compilation.

Change-Id: I9fd66ce02141813f02382b95351a382ccf775584

* Added Apache 2.0 License

Change-Id: I3c2af1a86984f81c4ee9408925af9c51510a978f
@zotanika zotanika closed this Jun 15, 2021
@zotanika zotanika deleted the frontend-caffe-deconv branch June 15, 2021 05:34
@zotanika
Contributor Author

Reopened as #8260 on a clean branch.
