v0.1.6.post2 #1164
LeiWang1999 announced in Announcements
The Last Release for Python 3.8 (without tvm-ffi) 🚀

What's Changed

- [Bugfix] Ensure correct handling for cases where `seq_q<seq_kv` in flash attention examples by @Rachmanino in #864
- [Fix] Fix bug 0905: tilelang doesn't vectorize `B[i,j] = c[i] + A[i,j]` by @kurisu6912 in #798
- [Bugfix] Use `ExprDeepEqual` instead of `StructuralEqual` when merge consecutive If stmt by @LeiWang1999 in #876
- [Precision] Introduce `T.ieee_rsqrt` and related high precision op by @LeiWang1999 in #882
- [Example] Introduce split+sum template, and optimize `atomic_add` performance for bwd examples by @LeiWang1999 in #940
- [Example] Add support for `bfloat16` and user-defined `sm_scale` in attention sink examples by @Rachmanino in #924
- [CI] add `pre-commit` integration by @XuehaiPan in #955
- [TileOp] Implememt `CumSum1D` by @LeiWang1999 in #978
- [Language] Enhance `T.alloc_var` for AugAssign and AnnAsign by @LeiWang1999 in #979
- [Refactor] Refactor Pass `InjectFenceProxy` and expose some warp group primitives in frontend by @LeiWang1999 in #977
- [Bugfix] Use `access_ptr("r")` instead of `access_ptr("w")` for correct pipeline analysis by @LeiWang1999 in #983
- [Bugfix] Fallback `torch.accelerator.synchronize()` to `torch.cuda.synchronize()` by @yyttt6 in #987
- [Transform] Migrate `LowerIntrin` from tvm into tilelang by @LeiWang1999 in #999
- [TIR] Revert some changes of Pass `LowerIntrin` by @LeiWang1999 in #1035
- [Env] Optimize the mechanism for locating `TL_LIBS` by @LeiWang1999 in #1038
- [Language] Expose `T.get_warp_idx_sync` and `T.shuffle_elect` for efficient thread election by @LeiWang1999 in #989
- [Refactor] Use `has_simt_copy` to decide whether to insert `set_max_nreg` by @chengyupku in #982
- [Refactor] Refactor Pass `LegalizeSafeMemoryAccess` to support recursive load/store rewrite by @SiriusNEO in #1050
- [Parallel] Support `T.Parallel` with dynamic extents by @LeiWang1999 in #990
- [Cache] raise errors for `tileang.clear_cache()` by @LeiWang1999 in #1077
- [Language] Recommend using `T.dynamic` instead of `T.symbolic` by @LeiWang1999 in #1076
- [Language] Efficient `T.reduce_` with shared memory input/output by @LeiWang1999 in #1080
- [Refactor] Rename cython output to `tilelang_cython` and relocate its path by @LeiWang1999 in #1086
- [Cleanup] Remove `tilelang.disable_cache()` calls from examples and tests by @Rachmanino in #1088
- [PassConfig] Introduce PassConfig `TL_STORAGE_REWRITE_DETECT_INPLACE` by @LeiWang1999 in #1089
- [Language] Support tilelang `alloc_var(dtype, init=x)` by @LeiWang1999 in #1092
- [Bugfix] Fix missing host `cuTensorMapEncodeIm2col` call by @chengyupku in #1094
- [CI][Lint] Retire `format.sh` and add `clang-tidy` to GHA workflow by @XuehaiPan in #1044
- [Refactor] Use forceinline in `ldmatrix` and update mamba scan kernel by @chengyupku in #1104
- [Maint] Update uncommitted change detection command in `format.sh` by @XuehaiPan in #1102
- [Feature] Support None type as input for `T.ptr` and `T.Tensor` by @xwhzz in #1114
- [Enhancement] Add missing `fence_barrier_init` primitive after mbarrier init by @chengyupku in #1121
- [CI] allow dirty workspace for `format.sh` and introduce loop carry thread sync unit test by @LeiWang1999 in #1153

New Contributors
Full Changelog: v0.1.6.post1...v0.1.6.post2
This discussion was created from the release v0.1.6.post2.
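
Note on #987: the entry describes falling back from `torch.accelerator.synchronize()` to `torch.cuda.synchronize()`. The snippet below is only a rough sketch of what such a fallback can look like, not the code from that PR. The helper name `synchronize_device` and the stand-in objects are hypothetical, and it assumes that `torch.accelerator` may be missing on older PyTorch builds.

```python
from types import SimpleNamespace


def synchronize_device(torch_module):
    """Wait for pending device work, preferring the newer API when present.

    Hypothetical helper: `torch_module` is expected to look like the `torch`
    package (an optional `accelerator` namespace plus a `cuda` namespace).
    """
    accelerator = getattr(torch_module, "accelerator", None)
    sync = getattr(accelerator, "synchronize", None)
    if callable(sync):
        sync()
        return "accelerator"
    # Assumption: builds without torch.accelerator still expose torch.cuda.
    torch_module.cuda.synchronize()
    return "cuda"


# Stand-in objects so the sketch runs without PyTorch installed.
calls = []
old_torch = SimpleNamespace(
    cuda=SimpleNamespace(synchronize=lambda: calls.append("cuda")),
)
new_torch = SimpleNamespace(
    accelerator=SimpleNamespace(synchronize=lambda: calls.append("accelerator")),
    cuda=SimpleNamespace(synchronize=lambda: calls.append("cuda")),
)

print(synchronize_device(old_torch))
print(synchronize_device(new_torch))
```

With a real installation, `synchronize_device(torch)` would take the same code path; passing the module in explicitly just keeps the sketch testable without a GPU.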