TVM v0.5 Roadmap #1596
Comments
Shall we add a heterogeneous graph runtime? @zhiics is working on that. |
I am interested in implementing the Intel CPU support for INT8 quantization |
I'm interested in implementing the Rust runtime. |
@tqchen @siju-samuel My Rust runtime (dylib) support, which follows the same generic API as Java for example (CPU, GPU, etc.), is 70%-ish done! I still need to finish the callback support, add docs and clean up. Any contributions are welcome! @nhynes Rust static support is in good shape as well, but is specific to CPU with a custom allocator etc. |
@ehsanmok OK |
@tqchen I have started working on an 8-bit quantizer and its operator support for conv2d, dense and relu. To avoid duplicate work, please let me know if anyone else is doing this work. |
@nhynes I meant you've defined your own allocator, threading and parallel backend support for CPU usage only, for staticlib compiling with xargo, while I've taken a different route, relying on existing layouts for example, and it seems to be working for GPU. Though I admit I've done the project for my own enrichment first. |
@PariksheetPinjari909 the UW SAML team is working on a generic n-bit quantizer and hopefully things will get RFCed and upstreamed in this release cycle |
Please feel free to open new issues to track the work items. @siju-samuel, standalone RPC is tracked by #1496 |
The first post contains an initial list of things based on the community feedback. Please also feel free to propose new things and we will add them to the roadmap |
Will the new graph runtime make it into this release? I'd love to upstream some training code, but it all depends on the semi-kluge |
@nhynes it belongs to the "high-level IR improvements" |
@tqchen OK. Let me know what support I can give on 8-bit quantization. I am interested in contributing here. |
I would like to take up the control flow ops. Let me know if someone is working on that. |
@PariksheetPinjari909 We will make a major RFC to upgrade the IR system, including control flow ops and the type system. After the first phase proposal is done, everyone is welcome to contribute |
Sorry for being late. I’d like to add preliminary support for an HLS scheduler to allow compiling actual neural networks with the AOCL and SDAccel backends. |
int8 cuda gemm recipe #1614 |
Re microkernels/tensorization, I've been looking at that stuff the last few months or so. There's some WIP stuff in https://github.com/ajtulloch/tvm/tree/tvm-using-val/tensorize, notably well-tuned assembly versions of:
My hypothesis is that we can get a pretty decent part of the way with just GEMM microkernels for a lot of these dense workloads, but it's to-be-tested currently. Some examples of using them in GEMM-based convs and for the batch gemm of a minimal F(6x6, 3x3) Winograd (~2-3x faster than current trunk on most configurations for ARMv7) are in that dir as well. For folks interested in the "Micro-asm kernel exploration" and "8-bit network stuff" (esp on CPUs), it'd be good to collaborate :). |
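For readers unfamiliar with GEMM-based convolution, the idea mentioned above can be sketched as follows. This is a minimal pure-Python illustration, not the code in the linked branch: the function names `im2col`, `gemm` and `conv2d_gemm` are hypothetical, and it assumes a single-channel input, stride 1 and no padding. A tuned microkernel would replace the naive `gemm` here.

```python
def im2col(x, kh, kw):
    """Unfold a single-channel H x W image into one row per output pixel.

    Each row holds the kh*kw input values covered by the kernel window.
    """
    h, w = len(x), len(x[0])
    rows = []
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            rows.append([x[i + di][j + dj] for di in range(kh) for dj in range(kw)])
    return rows


def gemm(a, b):
    """Naive matrix multiply: (m x k) times (k x n). Stand-in for a microkernel."""
    k, n = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n)] for row in a]


def conv2d_gemm(x, kernels):
    """Valid, stride-1 cross-correlation of one image with several kernels.

    Returns a matrix of shape (out_h * out_w) x num_kernels.
    """
    kh, kw = len(kernels[0]), len(kernels[0][0])
    patches = im2col(x, kh, kw)  # (out_h*out_w) x (kh*kw)
    # Flatten every kernel into one column: (kh*kw) x num_kernels.
    flat = [[k[di][dj] for k in kernels] for di in range(kh) for dj in range(kw)]
    return gemm(patches, flat)
```

Once the input is unfolded, all of the arithmetic sits in a single matrix multiply, which is why a fast GEMM microkernel can cover many dense workloads.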
@ajtulloch I am working on Intel 8-bit Conv implementation using Intel Skylake AVX512 instructions (with the long-term goal of using VNNI instructions). I am not using GEMM-based convolution though. I am starting from NCHWc format direct convolution present in current conv2d topi implementation. I should have some numbers for the conv operator by the next weekend and can share them. |
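As a rough illustration of the NCHWc layout mentioned above: the channel axis is split into blocks of a fixed size, and the block index becomes the innermost (fastest-varying) dimension, so that one vector load covers several channels of the same pixel. The sketch below is an assumption-laden pure-Python version; `nchw_to_nchwc` is a hypothetical helper name, not a TOPI function, and it works on flat lists rather than TVM tensors.

```python
def nchw_to_nchwc(data, n, c, h, w, block):
    """Repack a flat NCHW buffer into a flat (N, C // block, H, W, block) buffer."""
    assert c % block == 0, "channel count must be a multiple of the block size"
    out = [0] * (n * c * h * w)
    for ni in range(n):
        for ci in range(c):
            co, cb = divmod(ci, block)  # outer channel block, position inside block
            for hi in range(h):
                for wi in range(w):
                    src = ((ni * c + ci) * h + hi) * w + wi
                    dst = (((ni * (c // block) + co) * h + hi) * w + wi) * block + cb
                    out[dst] = data[src]
    return out
```

With `block` chosen to match the vector width (for example 16 lanes of 32-bit values on AVX512), the innermost loop of a direct convolution can operate on contiguous memory.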
@ajtulloch It will be great if you can send a tutorial or topi recipe |
@anijain2305 you might find https://github.com/ajtulloch/tvm/blob/tvm-using-val/tensorize/gemm__avx2.c#L424-L531 or a similar microkernel for AVX512 useful on Skylake (same as MKL-DNN's vpmaddubsw/vpmaddwd/vpaddd sequence on AVX2/AVX512 pre VNNI). @merrymercy what would be useful to have documented/tutorialized or made into a recipe? |
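The vpmaddubsw/vpmaddwd/vpaddd sequence mentioned above can be emulated lane by lane to see what it computes. The sketch below is a pure-Python model of the instruction semantics only, not the linked microkernel: the helper names are hypothetical, lists stand in for vector registers, and the second operand of the vpmaddwd step is assumed to be a vector of ones, which is the usual way to widen 16-bit partial sums to 32 bits.

```python
def sat16(v):
    """Clamp to the signed 16-bit range, as vpmaddubsw does."""
    return max(-32768, min(32767, v))


def wrap32(v):
    """Wrap to the signed 32-bit range, as vpaddd does."""
    return (v + 2**31) % 2**32 - 2**31


def pmaddubsw(a_u8, b_s8):
    """Unsigned 8-bit times signed 8-bit, adjacent pairs summed with saturation."""
    return [sat16(a_u8[i] * b_s8[i] + a_u8[i + 1] * b_s8[i + 1])
            for i in range(0, len(a_u8), 2)]


def pmaddwd(a_s16, b_s16):
    """Signed 16-bit products, adjacent pairs summed into 32-bit lanes."""
    return [wrap32(a_s16[i] * b_s16[i] + a_s16[i + 1] * b_s16[i + 1])
            for i in range(0, len(a_s16), 2)]


def paddd(acc, x):
    """Lane-wise 32-bit add."""
    return [wrap32(p + q) for p, q in zip(acc, x)]


def int8_dot_step(acc, a_u8, b_s8):
    """One accumulation step: every 4 bytes of input feed one 32-bit lane."""
    t16 = pmaddubsw(a_u8, b_s8)
    t32 = pmaddwd(t16, [1] * len(t16))
    return paddd(acc, t32)
```

Note the saturation in the first step: two products of large magnitude can overflow the 16-bit intermediate, which is one reason quantization schemes for this path often restrict the range of one operand.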
I think making a simple runnable conv2d example and showing its speedup will be very useful. |
+1 to one runnable conv2d example. Besides ARMv7 / AVX2, I think we should also add SSE, for embedded platforms that use Intel Atom processors. Intel Atom processors only support up to SSE4.2, not AVX2. |
0.5 release note candidate is now up at #2448 |
v0.5 is now tagged, next cycle roadmap issue is available at #2623 |
This is the roadmap for TVM v0.5. TVM is a community-driven project and we love your feedback and proposals on where we should be heading. Please open up discussion in the discussion forum as well as bring RFCs.
Features