[Caffe Frontend] supporting group > 1 cases for Deconv op #8125
Closed
Conversation
- adding more test cases; handling the '0 < axis < num_axes - 1' case to give results equivalent to the Caffe framework - skipping the Relay multiplication if coeff is 1 Signed-off-by: zotanika <zotanika@gmail.com>
* Handling group > 1 cases, assuming group == output channels * Decomposed into Relay split, transposed conv, and multi-level concatenation. * Added some test cases. Signed-off-by: zotanika <zotanika@gmail.com>
* [TVMC] Add support for the MLF to 'compile' command Add support for the Model Library Format (MLF) to 'tvmc' so users can output compilation artifacts to an MLF archive by passing the new flag '--output-format mlf'. For instance: $ python3 -m tvm.driver.tvmc compile ./sine_model.tflite --target="c" --output sine.tar --output-format mlf will generate a sine.tar archive serialized according to the MLF. Since the MLF is currently meant to be used only on micro targets, an error is generated if one tries to run an MLF outside a micro context. The micro context does not exist yet but will be introduced later as part of the [RFC] "TVMC: Add support for µTVM". This commit also adds 3 pytest tests for tvmc + MLF. Finally, it also fixes some missing periods in the 'compile' command help sections and renames export_format to output_format so there is no confusion with the flag '--dump-code', which contains "formats to export" in its help section. Signed-off-by: Gustavo Romero <gustavo.romero@linaro.org> * Fix missing importorskip in the import_package test Fix missing importorskip() in the import_package test, allowing the test in question to be skipped when 'tflite' is not installed in the test environment; otherwise the test will fail with: [...] > archive_path = exported_tvmc_package.package_path E AttributeError: 'str' object has no attribute 'package_path'
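A minimal, hedged sketch of the `importorskip` fix described above; the test and fixture names echo the error message, but the assertion and exact test body are illustrative, not the actual tvmc test code:

```python
import pytest

def test_tvmc_import_package(exported_tvmc_package):
    # Skip cleanly when the optional 'tflite' dependency is not installed,
    # instead of failing later with an AttributeError on the fixture value.
    pytest.importorskip("tflite")
    archive_path = exported_tvmc_package.package_path
    assert archive_path.endswith(".tar")  # illustrative check only
```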
Added handling of CallNode objects created via packed function invocation + test cases. Change-Id: I5374abc59a3b0f79f27364c45f1a5789536df940
This PR is part of the TensorIR upstreaming effort (apache#7527), stage M2a. In this PR, we implemented ScheduleError, an error reporting mechanism for schedule primitives to report user-facing error messages, with the functionality of rendering the TIR in TVM script syntax. This set of APIs allows future improvement of error location rendering, e.g. more colorful rendering mechanisms like synr does. Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn> Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com> Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com> Co-authored-by: Hongyi Jin <3231950289@qq.com> Co-authored-by: Wuwei Lin <wuwei@apache.org> Co-authored-by: Tristan Konolige <tristan.konolige@gmail.com>
* Fix typos and format in comments Fix typos and format in comments about the registry manager of packed functions. Signed-off-by: Gustavo Romero <gustavo.romero@linaro.org> * Fix lint No more than 100 characters per line is allowed.
Fix typo in a comment about AOT executor. Signed-off-by: Gustavo Romero <gustavo.romero@linaro.org>
* [Vulkan] Enable instance/device extensions - Vulkan requires that extensions be explicitly enabled if used. Explicitly list out which extensions are required (currently none) and which are optional. * [Vulkan] Extract device information from vulkan API. - Based on vkGetPhysicalDeviceProperties and vkGetPhysicalDeviceFeatures, determine which Vulkan capabilities are supported, pack into a Target. * [Vulkan] Query instance-supported apiVersion before creating instance - Previously, vkCreateInstance was called to initialize Vulkan 1.0. * [Vulkan] Moved options for dedicated allocation and push descriptors to environment variables - Query support for dedicated allocation and push descriptors along with the rest of the device support. Move the options to disable their use from compile-time variables to environment variables `TVM_VULKAN_DISABLE_PUSH_DESCRIPTOR` and `TVM_VULKAN_DISABLE_DEDICATED_ALLOCATION`. * [Vulkan] Move option for vulkan validation layers to environment variable - Moved to enable faster use as a debug tool. If `TVM_VULKAN_ENABLE_VALIDATION_LAYERS` is a non-empty string, validation layers will be enabled. * [Vulkan] Explicitly enable vulkan features in device creation - Vulkan requires that features be explicitly enabled before use. For each feature that the device supports and a shader might use, declare it in the call to `vkCreateDevice`. * [Vulkan] Avoid repeated queries for device attributes. - Implement `VulkanDeviceAPI::GetAttr` based on the per-device values stored in the Target. This pulls all logic for querying device parameters into a single location. * [Vulkan] Implement "from_device" flag for the vulkan target. - With the number of device capabilities that may or may not be supported by a vulkan driver, it can be tedious to input them. Specifying "-from_device=0" now indicates that any unspecified values should be read from the device. * [Vulkan][Codegen] Read vulkan device capabilities/limits from Target - Previously, the codegen assumed that all device features were present. Now, the codegen reads device capabilities from the Target, and throws an error if codegen would require use of an unsupported feature. Co-authored-by: Eric Lunderberg <elunderberg@octoml.ai>
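A hedged sketch of how the knobs named in this change might be used; only the environment variables and the `-from_device` flag listed above are shown, and the rest of the build flow is omitted:

```python
import os

# Opt-in/opt-out knobs described above (read by the Vulkan runtime at startup).
os.environ["TVM_VULKAN_ENABLE_VALIDATION_LAYERS"] = "1"      # any non-empty string enables them
os.environ["TVM_VULKAN_DISABLE_PUSH_DESCRIPTOR"] = "1"       # disable use of push descriptors
os.environ["TVM_VULKAN_DISABLE_DEDICATED_ALLOCATION"] = "1"  # disable dedicated allocations

import tvm

# Ask codegen to read any unspecified device capabilities/limits from the attached device.
target = tvm.target.Target("vulkan -from_device=0")
```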
This flag causes CUBLAS to use tensor cores on all operations. With f32 or f64 operations, this leads to a loss of accuracy.
* Add fast_softmax support in fast_math pass * Lintfix * Update
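A minimal sketch, assuming `mod` is an existing Relay module that contains a `softmax`, which the FastMath pass can rewrite to `fast_softmax` after this change:

```python
from tvm import relay

# Apply the fast-math rewrites directly to the module (hedged: in a normal
# relay.build flow the pass would instead be picked up from the pass pipeline).
mod = relay.transform.FastMath()(mod)
```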
@FrozenGene Please help to manage this PR
Co-authored-by: wangyucheng <wangyucheng@sensetime.com>
Change-Id: I927b43df95a8db8b042bc3cf2a1f23739d102b9d
@zotanika Do you mind splitting this into two PRs? One for Deconv, another for reduction.
* initial * remove compare * temp fix * debugging * hack * hack for testing * both test pass * cleanup * fix tests and tutorials * restructure * cleanup * cleanup * fix check files * fixed for physical devices * address comments * reduce nrf stack size * update sample url * format
This commit pins the black version to provide stability. It is expected that the pinned version will be moved forward periodically. Change-Id: Ied866bff85a1a832959bc1d4673a7fdec68128a7
* [IR][Pass][Instrument] Pass instrument framework This commit provides utilities to instrument passes: 1. Add a new namespace tvm.instrument 2. Introduce PassInstrument and PassInstrumentor to PassContext Example ------- passes_mem = #... Impl of memory instrument passes_time = tvm.instrument.PassesTimeInstrument() with tvm.transform.PassContext( pass_instrumentor=PassInstrumentor([passes_mem, passes_time])): tvm.relay.build(mod, 'llvm') passes_mem.render() passes_time.render() 3. Integrate existing PassContext::Trace() and timing profile * [IR][Pass][Instrument] Fix python test_pass_manager.py * Fix comment * Fix lint * Fix test_pass_annotation * Fix test_pass_annotation.py * Fix lint * Fix test_pass_annotation.py * Fix test_pass_annotation.py * Fix review comments * Fix tutorial use_pass_infra.py * Fix review comments * Fix review comments * Fix typo * Fix review comments * Fix review comments * Fix unittest error: test_cow_pass * Fix unittest error * Add more test cases for exceptions * Fix nit * Doc override_instruments() * Fix review comments * Fix lint * Fix EnterContext exception behavior
…nality. (apache#8157) This is in preparation for additional refactoring. Functions are organized to group similar functionality together, minimizing the amount of file-to-file transfers needed later. The main divisions are between VulkanDeviceAPI, VulkanModuleNode/VulkanWrappedFunc, VulkanThreadEntry, and VulkanContext. Other than minimal renaming of private functions and the addition of some comments, this commit should have zero changes to the function definitions themselves, only to their arrangement within the src/runtime/vulkan directory. Co-authored-by: Eric Lunderberg <elunderberg@octoml.ai>
- adding more test cases; handling the '0 < axis < num_axes - 1' case to give results equivalent to the Caffe framework - skipping the Relay multiplication if coeff is 1 Signed-off-by: zotanika <zotanika@gmail.com>
This reverts commit e26846f.
- Generate valid LLVM IR. - Set proper alignment on the constant variables.
This helps in debugging, as the function name, arguments, and docstring shown are those of the function from the source code instead of the wrapper function (e.g. `<function tvm.topi.cuda.dense.dense_small_batch(cfg, data, weight, bias=None, out_dtype=None)>` instead of `<function tvm.autotvm.task.topi_integration.register_topi_compute.<locals>._decorate.<locals>.wrapper(*args, **kwargs)>`). Co-authored-by: Eric Lunderberg <elunderberg@octoml.ai>
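A small, hedged illustration of the behavior described above in plain Python (not the actual TVM decorator): wrapping with `functools.wraps` makes the wrapper expose the original function's name, signature, and docstring to debuggers and `help()`:

```python
import functools

def register(fcompute):
    # With functools.wraps, the wrapper's __name__, __doc__, and (via __wrapped__)
    # its inspectable signature mirror fcompute's; without it, they would show
    # the generic wrapper(*args, **kwargs).
    @functools.wraps(fcompute)
    def wrapper(*args, **kwargs):
        return fcompute(*args, **kwargs)
    return wrapper

@register
def dense_small_batch(cfg, data, weight, bias=None, out_dtype=None):
    """Compute dense for a small batch (illustrative stub)."""

print(dense_small_batch.__name__)  # -> 'dense_small_batch', not 'wrapper'
```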
Reduced the default number of threads in reduction kernels for Metal. Default code generation produced a thread block of size 32x32x1. With this size, the number of threads per threadgroup was 1024 (32 * 32 * 1). Sometimes the device doesn't have enough resources, and in this case we get an exception that the block size is greater than the value of maxTotalThreadsPerThreadgroup. To prevent such a situation we decrease the default number of threads. With this fix every model should work with the default codegen, and auto-tuning or auto-scheduling will select the optimal number of threads.
…pache#8230) Currently board-specific config files (boards/*.conf) are not copied from the Zephyr project dir to the destination build dir, so as a consequence the per-board configs are not used when building the runtime libraries, like libcommon. Hence, for instance, it's currently not possible to set CONFIG_FPU per board since it only takes effect when it's set in the generic 'prj.conf' config file. This commit fixes it by copying to the build dir (to each lib dir) the proper .conf for the selected target board. For example, if target 'qemu_x86' is selected, 'qemu_x86.conf' is copied to the boards/ dir inside the lib dirs, so the Zephyr build system can find it and combine it with configs found in the generic 'prj.conf'. Signed-off-by: Gustavo Romero <gustavo.romero@linaro.org>
* Fixed the destruction order of tflite::Interpreter and EdgeTPUContext * Fixed include omission * Formatted
…ncubator-tvm into frontend-caffe-deconv
…DeviceAPI (apache#8196) * [Vulkan][Refactor] Moved VulkanStream ownership from VulkanThreadEntry to VulkanDevice - Implemented ThreadMap, a container for per-thread objects. Unlike dmlc::ThreadLocalStore, ThreadMap is intended for use as a non-static thread-specific lookup. - Added ThreadMap<VulkanStream> as a member to VulkanDevice, updated all uses. * [Vulkan][Refactor] Pulled VulkanBuffer allocation/deallocation into constructor/destructor. - VulkanBuffer owns the VkBuffer and VkDeviceMemory that it allocates, and deallocates them on destruction. - VulkanHostVisibleBuffer owns a VulkanBuffer, and additionally calls vkUnmapMemory on destruction. * [Vulkan][Refactor] Move the VulkanStagingBuffer to be owned by the VulkanDevice - Previously, was owned by VulkanThreadEntry, so any use required looking up both the thread entry and the device. Now, thread-specific lookup is handled in the VulkanDevice class. * [Vulkan][Refactor] Move ownership of per-thread uniform buffer to VulkanDevice - Previously, VulkanUniformBuffer was owned by VulkanThreadEntry, so any use required looking up both the thread entry and the device. Now, thread-specific lookup is handled in the VulkanDevice class. * [Vulkan][Refactor] Moved ownership of per-thread workspace pool to VulkanDeviceAPI - Previously, the WorkspacePool was owned by VulkanThreadEntry, and required a lookup from VulkanDeviceAPI::AllocWorkspace. As a result, non-global VulkanDeviceAPI instances would interact with each other. * [Vulkan][Refactor] Moved ownership of per-thread active device id to VulkanDeviceAPI - Previously, the active device was owned by VulkanThreadEntry, so lookups to multiple global variables were required. Now, everything goes through the VulkanDeviceAPI. - Removed VulkanThreadEntry, as all functionality has been moved to either VulkanDevice or VulkanDeviceAPI. Co-authored-by: Eric Lunderberg <elunderberg@octoml.ai>
* [BYOC][ACL] Prevent dilated pooling Added a check preventing avg_pool2d and max_pool2d from being scheduled for execution via the ACL* runtime if a dilation other than (1, 1) is provided, as ACL does not currently support the dilation attribute in pooling layers. *ACL stands for "Compute Library for the Arm® Architecture" Change-Id: If8f65d3a154e09f880bec73dd756d9f985a20ff2 * linter Change-Id: If91809350786e69f59596301e0cbd3def6815cd0
…he#7858) - Replaced capabilities header file with api calls introduced by the 20.11 ethosn driver stack release. - Removed 20.08 driver stack support and updated all affected code.
* num of cores * add target list * extension * qemu * fix * comments * add qemu to setup build * fix * add mps2 test * merge fix * add commit option * add log * fix * fix zephyr init * rename * fix zephyr init * uncomment * fixed qemu install * cleanup * version * add commit option * fixed qemu install * add docker import * cleanup * fix * cleanup * fix * fix zephyr path * fix * fix * address comments * fix test * fix * add wait * comments * changed test to script * add checks * fix zephyr * Revert "add wait" This reverts commit 70f3c7d. * address comments
…adcast (apache#8250) * Allow cblas batch_matmul implicit bcast * Add cblas batch_matmul bcast when batch_a=1
The micro TVM page was moved during a recent docs update. This patch moved the top level index to the former location.
…che#8245) * [CI] [ComputeLibrary] Use pre-built binaries instead of compiled Pre-built Compute Library binaries are now downloaded (credits to @leandorn) instead of on-site compilation. Change-Id: I9fd66ce02141813f02382b95351a382ccf775584 * Added Apache 2.0 License Change-Id: I3c2af1a86984f81c4ee9408925af9c51510a978f
…nto frontend-caffe-deconv
Reopened as #8260 on a clean branch.
Handling `group > 1` cases, assuming `group == output channels`. Decomposed into Relay `split`, `conv2d_transposed`, and multi-level `concatenate` ops.
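A minimal sketch of that decomposition, assuming NCHW data, an IOHW-style weight layout, and `groups` dividing both channel counts; the helper name and attribute plumbing are illustrative, not the exact frontend code:

```python
from tvm import relay

def grouped_deconv(data, weight, groups, out_channels, kernel_size, strides, padding):
    # Split the input along its channel axis and the weight along its
    # input-channel axis, one slice per group.
    data_slices = relay.split(data, indices_or_sections=groups, axis=1)
    weight_slices = relay.split(weight, indices_or_sections=groups, axis=0)
    # Run a single-group transposed convolution on each slice.
    outs = [
        relay.nn.conv2d_transpose(
            data_slices[i],
            weight_slices[i],
            channels=out_channels // groups,
            kernel_size=kernel_size,
            strides=strides,
            padding=padding,
        )
        for i in range(groups)
    ]
    # Concatenate the per-group results back along the channel axis.
    return relay.concatenate(outs, axis=1)
```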