Releases: sophgo/tpu-mlir
TPU-MLIR v1.8 Release
Highlights:
Enhancements:
- Added support for dynamic shape inference in various operations.
- Optimized core operations for better performance on specific models.
- Improved backend support for multiple chips such as BM1684X, BM1688, BM1690, and SG2380.
- Introduced new operations and patterns for more efficient model processing.
- Updated documentation for better clarity and user guidance.
Bug Fixes:
- Resolved issues related to input/output handling, kernel configurations, and model-specific bugs.
- Fixed bugs in dynamic compilation, core parallel processing, and various backend operations.
- Addressed errors in post-processing steps of specific models such as YOLOv5 and EfficientNet.
Performance Improvements:
- Optimized cycle calculations for multi-core models.
- Enhanced bandwidth usage statistics for better resource management.
- Accelerated compilation processes for training models using a new layer-group scheme.
New Features:
- Introduced new operations like attention quant block and prelu op, along with various dynamic compile features (a generic PReLU reference sketch follows this list).
- Added support for additional operations, weight location, and dynamic compile enhancements.
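As a reminder of what the new prelu op computes, here is a generic PReLU reference in NumPy. This is only a minimal sketch of the usual PReLU semantics; `prelu_ref` is a hypothetical helper name, not the TPU-MLIR kernel or its API.

```python
import numpy as np

def prelu_ref(x: np.ndarray, alpha) -> np.ndarray:
    """Generic PReLU semantics: y = x where x > 0, y = alpha * x otherwise."""
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 1.5], dtype=np.float32)
print(prelu_ref(x, np.float32(0.25)))   # [-0.5   -0.125  0.     1.5  ]
```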
Documentation Updates:
- Updated developer manuals, quick start guides, and model-specific documentation for better understanding.
Miscellaneous:
- Streamlined workflows for faster commit checks and improved debugging processes.
- Added new test cases for regression testing and script-based model evaluations.
- Fine-tuned backend operations for improved model performance and accuracy.
TPU-MLIR v1.7 Release
Change Log
New Features
- Added support for new operations including flash attention, custom op dynamic compile, and tpulang ops.
- Enabled AttnReorder and added support for dynamic indices in ops like onehot, scatterelements, and cumsum.
- Added `--dump_dataframe` option for bmodel_checker and support for transpose with order `[1, 2, 3, 0]`.
- Introduced Watchpoint feature to TDB and added support for mixed-precision networks.
- Implemented optimizations for DMA efficiency of flash attention and optimized the backend for various models.
- Added support for local memory dump in pcie mode and added various quantization features like eva quant, swin quant, and detr quant.
- Enhanced multi-core support, including LayerNorm and GroupNorm in coreParallel and multi-core data slice in tensorLocation (a conceptual data-slicing sketch follows this list).
- Added new patterns for Cswin and Einsum operations.
- Improved support for LLM (Large Language Models) in bm1688.
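The multi-core data slicing mentioned above can be pictured with a small NumPy sketch: split a tensor along the batch dimension, run the op on each slice (one slice per core), and concatenate the partial results. This is a conceptual illustration only; `run_on_cores` and `layer_norm` are hypothetical helpers, not coreParallel's actual interface.

```python
import numpy as np

def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Per-row LayerNorm over the last axis (no affine parameters)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def run_on_cores(x: np.ndarray, op, num_cores: int = 8) -> np.ndarray:
    """Conceptual data-parallel slicing: each core processes one batch slice."""
    slices = np.array_split(x, num_cores, axis=0)
    return np.concatenate([op(s) for s in slices], axis=0)

x = np.random.randn(16, 128).astype(np.float32)
assert np.allclose(run_on_cores(x, layer_norm), layer_norm(x), atol=1e-5)
```

Because LayerNorm normalizes each row independently, slicing along the batch axis is exact, which is what makes such ops good candidates for core-parallel execution.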
Bug Fixes
- Fixed various bugs including kernel_module msg_id, SAM-VIT-encoder regression, and attention accuracy problems.
- Addressed logical issues in AddToScale pattern and issues in fp_forward.
- Resolved bugs in model info core dump, op's liveRange in coreParallel, and DevParallel bugs.
- Fixed issues in model combine with io alone and bugs in various ops like interp, RotaryPosEmbPattern, and efficient-lite4 permute.
Performance Improvements
- Improved the performance of TDB and the bmodel_checker for 1684x pcie.
- Optimized facenet and fixed performance issues of 1688 multicore.
- Enabled single-core mode optimizations where necessary.
Documentation and Testing
- Updated documentation, refined custom chapters, and ensured consistency in quick start docs.
- Added test cases for custom tpulang, multi-core with subnets, and custom cpuop.
- Fixed various documentation errors and updated the release note.
Other Changes
- Added restrictions to tpulang ops and net test cases.
- Adjusted descriptions and refined interfaces for better user experience.
- Updated backend .so files and addressed sensitive words in the codebase.
- Added support for int4 dtype in tpu_profile and ensured tool/scripts work in Python virtual environments.
Technical Preview
Features
- Added support for LLM Decoding by utilizing multi-cores to enhance processing efficiency.
- Introduced `fx2mlir`, a new functionality for enhanced MLIR conversion.
- Implemented `nnvlc2.0` and `nnvlc1.0` local activation and weight operations, respectively, for improved neural network performance.
- Enabled `TPULANG` support for operations like sort, argsort, and additional ops, enhancing the language's functionality and flexibility.
- Added `cv186x` support in `run_sensitive_layer.py` and for the TDB, expanding compatibility and debugging capabilities.
- Introduced new ops and features like `Watchpoint` in TDB and `activation ops` support for scale & zero_point, broadening the range of functionalities available in the `tpu-mlir` project.
- Supports `BM1690`.
- L2mem performs intermediate data exchange for active tensors.
Bug Fixes
- Resolved a variety of bugs affecting backend processes, including issues with the `1684x` backend, `permutefuse2`, `permutemulconstswap`, and more, improving overall stability and performance.
- Fixed several critical issues across `tpulang`, including errors in the `sort_by_key`, `reshape`, and `where` operations, and more, enhancing the language's reliability for developers.
- Addressed bugs in model processing, including fixes for `concat` logic, `scale2conv`, `scale2conv3d`, `instance norm`, and several more, ensuring smoother model optimization and execution.
- Corrected errors in the documentation, providing clearer and more accurate information for users and developers.
Documentation Updates
- Updated `tpulang` documentation to include new functionalities and optimizations, making it easier for users to understand and utilize the language effectively.
Performance Improvements
- Optimized TDB and `bmodel_checker` for `1684x pcie` mode, significantly reducing processing times and enhancing efficiency for model analysis.
- Improved the efficiency of DMA in flash attention operations, ensuring faster data handling and processing (a generic tiled-attention sketch follows this list).
- Enabled IO tag mode and refined address mode for better memory management and operational flexibility.
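As a rough intuition for why tiling helps DMA efficiency, attention can be computed block by block so that only one block of queries needs to be resident at a time. The sketch below tiles over queries only, which is mathematically exact because softmax is row-wise; it is a generic NumPy illustration, not the flash-attention kernel or its actual tiling scheme.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

def attention_query_tiled(q, k, v, tile: int = 32) -> np.ndarray:
    """Process queries tile by tile; each tile only needs its own rows resident."""
    outs = [attention(q[i:i + tile], k, v) for i in range(0, q.shape[0], tile)]
    return np.concatenate(outs, axis=0)

q, k, v = np.random.randn(128, 64), np.random.randn(256, 64), np.random.randn(256, 64)
assert np.allclose(attention(q, k, v), attention_query_tiled(q, k, v), atol=1e-6)
```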
TPU-MLIR v1.6.1
Full Changelog: v1.6...v1.6.1
TPU-MLIR v1.6 release
Change Log
Bug Fixes
- Fixed documentation errors and added checks for documentation errors during build.
- Set workaround for `ar.copy` cycle issue to 0, avoiding potential data overwriting in inplacing operations.
- Addressed a bug in `Caffe DetectionOutput` and fixed a hang in `cv186x`.
- Corrected `Mul buffer` size alignment issues and various other buffer size corrections.
- Fixed issues with `attention accuracy`, `RotaryPosEmbPattern`, and `op status validation` before the matching process.
- Addressed a series of backend bugs, including daily build errors, performance declines, and incorrect return values.
- Fixed `data_checker` issues, an `api_conv` bug, and a local slice calculation bug.
- Resolved incorrect affineMap for Pooling buffer and fixed reshape bug for inner products.
- Corrected `Mul&Div` dynamic support for local operations and fixed issues with `Conv2d` buffer size calculations.
- Addressed various matmul bugs, including fp8 support issues and quantization inconsistencies.
Features
- Enabled multicore optimizations and added support for multi-core model tests.
- Updated `libbackend_1688.so` along with various backend updates for better performance and compatibility.
- Introduced the `groupParallel` operation and support for dynamic input data generation.
- Added support for new patterns such as the `Permute fuse pattern` and the `splitQuantizedMLP pattern` (a small permute-fusion check follows this list).
- Implemented the `npz compare visualizer` tool and added support for the `bm1688 backend`.
- Added `MatMul weight split case` and improved permute performance.
- Added support for the `img2col pattern`, attention interface, and several dialects for SG2260 operations.
Documentation Updates
- Updated release notes and resolved issues with document formatting.
- Standardized expression terminology and replaced sensitive words in documentation.
Performance Improvements
- Improved local softmax performance and optimized dataFlow checking in coreMatch.
- Enhanced performance for Vit L i8 4 batch operations and refined conv multi-core handling.
- Optimized VIT-B concurrency and addressed performance issues with `MaxPool` buffer sizes.
v1.6-beta.0
New Features
- Implemented SG2260 structureOp interface and structured transform, including a solver for finding transforms.
- Added OneHot converter and support for fp8 in the debugger (a generic one-hot reference follows this list).
- Supported MatMulOp for the special case of broadcast in batch dims and added an interface for attention.
- Provided "decompose linalg op" and "tile+fuse" passes so that MatMul parallelism supports more batch patterns.
- Added a Unet single block test.
- Implemented fp8 support for Matmul and other ops, including addconst, subconst, mul, add, sub, and abs.
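For reference, the usual semantics behind a OneHot converter can be written in a few lines of NumPy. This simplified sketch always inserts the new class axis last and takes scalar on/off values; it is not the converter's actual interface.

```python
import numpy as np

def one_hot(indices: np.ndarray, depth: int, on=1.0, off=0.0) -> np.ndarray:
    """out[..., c] == on where c equals the index, off elsewhere."""
    out = np.full(indices.shape + (depth,), off, dtype=np.float32)
    np.put_along_axis(out, indices[..., None], on, axis=-1)
    return out

print(one_hot(np.array([0, 2, 1]), depth=4))
# [[1. 0. 0. 0.]
#  [0. 0. 1. 0.]
#  [0. 1. 0. 0.]]
```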
Performance Improvements
- Improved Matmul fp8 performance with new backend support.
- Enabled distribute MLP and attention with improved performance for cascade_net input/output names and order.
- Refactored tdb to improve disassembler serialization and resolve a BM1688 decoding issue.
- Improved weight reorder for ConvOp and optimized the permute of attention matmul.
Bug Fixes
- Resolved various bugs in MatMul, Conv, and other ops across multiple chipsets including SG2260, BM1688, and CV18xx.
- Fixed bugs related to ReduceOp, ArgOp, SliceOp, and others for better operation and tensor handling.
- Addressed issues in SAM, daily test, and tdb related to core operations and functionality.
- Fixed memory and data handling bugs for more accurate and stable operation of the models.
Documentation Updates
- Updated documentation to remove sensitive words and improve clarity and comprehensiveness.
Miscellaneous
- Enhanced various backend libraries and supported new ops and patterns for more efficient and versatile model handling.
- Improved scatterE and reduce dynamic shape_value handling for better model optimization.
- Refinements in graph optimization, permute parallel indexMapping, and related areas for improved model processing.
Technical Preview
TPU-MLIR Project Update
Bug Fixes and Dependency Updates
- Fix Dependency: Fixed the dependency of MLIRInputConversion.
- SDK Release Workflow: Fixed tpu-mlir tag for building and added workflow file for SDK release.
- Softplus LoweringINT8: Fixed 1684 Softplus LoweringINT8 issue.
- Slice Begin Index: Fixed bm1684 slice begin_index problem.
- Mul Conflict Resolution: Partially fixed the output data sign of mul conflict with chip restriction.
Feature Enhancements and Support
- Subgraph Split Support: Enhanced support for subgraph split.
- Quant IO List Note: Added quant io list note for better quantization handling.
- New Full Operation: Supported the aten::new_full operation (a brief usage reminder follows this list).
- Torch Flip for bm1684x: Added support for torch.flip for bm1684x.
- Weight Input Shape Bind: Supported shape bind for weight input.
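As a reminder of what the converter has to reproduce, `aten::new_full` creates a tensor of a requested shape filled with a constant, inheriting dtype and device from the source tensor. A minimal PyTorch example (assuming torch is available in the environment):

```python
import torch

x = torch.zeros(2, 3, dtype=torch.float16)
y = x.new_full((4, 4), 1.5)          # same dtype/device as x, filled with 1.5
assert y.shape == (4, 4) and y.dtype == torch.float16
```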
Updates and Implementations for Specific Operations
- Backend Update for sg2260: Updated the sg2260 backend for tag31.
- ScatterElements Implementation: Implemented ScatterElements for any axis (a reference-semantics sketch follows this list).
- Unary Indexing Map: Added unary indexing map.
- Binary Indexing Map: Added binary (add/sub/mul/div/min/max) indexing map.
- Dynamic NMS Support: Featured support for dynamic nms for bm1684x.
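For the common case where `indices` and `updates` share a shape, the semantics of ScatterElements along an arbitrary axis match NumPy's `put_along_axis`. The reference below is an illustrative sketch (no reduction mode), not the backend implementation.

```python
import numpy as np

def scatter_elements(data, indices, updates, axis=0):
    """Copy `data`, then write updates[i, j] at the position indices[i, j] along `axis`."""
    out = data.copy()
    np.put_along_axis(out, indices, updates, axis=axis)
    return out

data = np.zeros((3, 3), dtype=np.float32)
indices = np.array([[1, 0, 2], [0, 2, 1], [2, 1, 0]])
updates = np.arange(9, dtype=np.float32).reshape(3, 3)
print(scatter_elements(data, indices, updates, axis=0))
```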
Codebase and Documentation Refinements
- Cleanup: Removed test/sg2260 dialect.
- Documentation Update: Updated nntoolchain README and lib.
- Codegen Documentation: Added documentation for codegen.
- Template Format Update: Updated import mlir file template format.
- Quick Start Docs Modification: Modified quick start docs for tpu-mlir.
Optimizations and Performance Improvements
- Kernel Module Usage: Reverted to using the old kernel module.
- MLIR Conv2D Optimization: Improved 1684 mlir conv2d with 3ic optimization.
- SWINT Quantization: Added swint quant for better performance.
- Opt Parameter Addition: Added an optimization parameter.
- Loop and Fusion Enhancements: Supported interchange of inner loop, padOp transform, tensor op collapse, fusion on linalg-on-tensor, etc.
Technical Preview
🐳 Docker Image Update
Changed required Docker image from sophgo/tpuc_dev:v2.2 to sophgo/tpuc_dev:v3.1, which is based on Ubuntu 22.04.
📖 Documentation
Updated docs to add more parameters in model deployment.
🐛 Bug Fixes
Fixed TPU-MLIR dialect Python binding for DEBUG mode.
Resolved backward training bug.
Addressed average pooling and max pooling issues.
Several other bug fixes related to Winograd inference, training, and more.
🚀 Feature Additions
Added Deconv3D backend support.
Support for dynamic tile added for bm1684x.
Added Winograd feature (a minimal 1-D F(2,3) illustration follows this list).
Several other feature additions, including dual-core support in debugger, MatMulSliceMerge support for int8/int4, and more.
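As background on what a Winograd convolution computes, the 1-D F(2,3) case produces two outputs of a 3-tap filter with four multiplies instead of six, using the standard textbook transform matrices shown below. This NumPy check only verifies the algebra; it says nothing about the backend's actual tiling or data layout.

```python
import numpy as np

# Standard F(2,3) Winograd transforms (textbook values).
BT = np.array([[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]], dtype=float)
G  = np.array([[1, 0, 0], [0.5, 0.5, 0.5], [0.5, -0.5, 0.5], [0, 0, 1]], dtype=float)
AT = np.array([[1, 1, 1, 0], [0, 1, -1, -1]], dtype=float)

d = np.random.randn(4)   # input tile of 4 samples
g = np.random.randn(3)   # 3-tap filter
y_winograd = AT @ ((G @ g) * (BT @ d))
y_direct = np.array([d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
                     d[1]*g[0] + d[2]*g[1] + d[3]*g[2]])
assert np.allclose(y_winograd, y_direct)
```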
🔧 Code Maintenance
Code renaming and cleaning.
Regression adjustments and tests.
⚙️ Backend Updates
Backend updates for various architectures including BM1684 and sg2260.
Technical Preview
New Features and Enhancements
- Support for Various Operations: Added support for exp, erf, gelu, loopop, and other operations for specific platforms.
- Tooling and Visualization: Renamed profile.py, added visual tools for weights, and enhanced debugging capabilities.
- Model Support and Adjustments: Added daily release models, scripts, and support for specific model types like yolov8, yolov4s.
- Distribution and Multicore Support: Implemented distribution steps, multicore support, and group convolution transformation.
Bug Fixes and Resolutions
- Model and Parsing Fixes: Resolved issues in emvd models, parsing errors, slice bugs, and fixed specific issues in bm1684 and bm1686.
- Codegen and Canonicalization Fixes: Addressed type errors, canonicalization failures, and operand kind checks.
- Inference and Optimization Fixes: Fixed inference issues in max, where, and slice operations, and refined canonicalization.
Documentation and Cleanup
- Documentation Updates: Refined tpu-mlir docs, added a supported-ops document, and updated specific documents.
- Code Cleanup and Refactoring: Removed unnecessary code, reconstructed permute move canonicalization, and prepared for LLVM upgrade.
Other Changes
- Testing and Calibration: Added test cases, calibration updates, and support for regression and tag in TDB.
- Backend and Runtime Adjustments: Updated backend, added support for auto-increase op, and fixed minor bugs.
Technical Preview
Features:
BM1686: supported post handle op, provided parallelOp codegen, and added DivOp for f16/bf16.
BM1684: supported dynamic-compilation loading of tensors from L2mem, implemented the GROUP_3D local layer function, and supported more dynamic ops (e.g., MinConst, MaxConst, Lut) as well as some static ops (e.g., deform_conv2d).
CV18XX: supported more ops such as equalOp.
Supported IfOp for f16/bf16/int8 mode.
Implemented the post-process function of the sensitive layer, handled unranked tensors and dynamic tensors at the frontend, and added empty and baddbmm torch converters/interpreters.
Supported weight split during layer group when the op is broadcastbinary, supported parsing the ops of each layer in top.mlir, and supported int32 to i/u8 inference for model_runner.py.
Removed onnx-sim and used unranked_type for all ops.
Implemented more graph optimizations: merging matmul + add into matmul for float types (a small numeric check follows this list), a fuse-same-operation pass, and weight trans when permute+add.
Supported more torch ops, such as rmsnorm, ceil, and remainder.
Other new operations: lowering of GatherElements and multi-input Add.
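The "merge matmul + add" rewrite mentioned in the list above folds the Add's per-channel constant into the MatMul's bias operand, which is exact in float arithmetic; presumably the per-op requantization in integer graphs is what makes the rewrite unsafe there, hence the float-only condition. A small NumPy check of the identity:

```python
import numpy as np

x = np.random.randn(4, 8).astype(np.float32)
W = np.random.randn(8, 16).astype(np.float32)
bias = np.random.randn(16).astype(np.float32)
add_const = np.random.randn(16).astype(np.float32)

before = (x @ W + bias) + add_const     # MatMul (with bias) followed by Add
after = x @ W + (bias + add_const)      # single MatMul, Add folded into the bias
assert np.allclose(before, after, atol=1e-5)
```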
Bug Fixes:
Fixed the chatglm2 rmsnorm untransformed problem, a ScaleOp inference error, bmodel_dis format bin, shape inference of matmul, and a subnet output-order mismatch that caused errors in the dynamic runtime.
Avoided duplicate names for inserted CastOps and distinguished caffe matmul shapes.
Code Refactoring:
Use llvm::md5, llvm::sha256.
Use Clang to speed up code compilation.
Remove some unused header files.
Use rewriter.eraseOp instead of op->erase, and use strings to define padding mode.
Refine disassembler, refactor mix_precision.
Documentation Updates:
Update document version and change some model-zoo requirements.
Modified the English parts and updated the developer_manual doc for the visual.py section.
Testing and Verification:
Updated list of test models supported by BM1684X.