
One-Year Roadmap #2398

Closed
27 of 43 tasks
k-ye opened this issue Jun 3, 2021 · 2 comments


k-ye commented Jun 3, 2021

Hi,

We are sharing this one-year roadmap to let you know which features are planned or being actively worked on. Hopefully some of these sound interesting/exciting! Let us know what you think so that we can adjust accordingly. Thanks!


New Features

Backends

Performance

Documentation

  • Add Python docstring for the public APIs (Add Python docstring for public APIs #2579)
  • Add documentation for the essential C++ APIs (e.g. Program, Kernel)
  • Automatically generate the documentation webpages from the Python/C++ API documentation.

Cleanups

  • Codebase refactoring
    • Remove the global variables, especially the global program and the frontend AST context ([refactor] [ir] IR system refactorings #988)
    • Clean up the frontend AST transformer
    • Simplify the C++ Expr, Expression classes and the Python Expr class.
    • ti.field should return either an SNode instance or an instance of a dedicated field class (currently it returns an Expr).
    • Simplify ti.Matrix and ti.Vector
  • Modularize the taichi package hierarchy (Clean up python import statements #2223)
    • Do not expose everything to the top-level namespace.
  • Control the symbol visibility of libtaichi_core.so and remove the linker script
  • Deprecate TI_NAMESPACE_{BEGIN, END}

Release

CI/Productivity

  • CI presubmit improvements
  • Reduce the number of CI platforms in use
    • Try to get rid of Appveyor
  • Enforce license header for each source code file
  • Provide a conda environment config file that automatically sets up the developer installation steps.
  • Move CI related scripts to ci/

In addition, the items below are also on our radar, but we haven't thought about too much yet:

  • Polish the autodiff system
    • Either remove/relax the Kernel Simplicity Rule, or provide a concrete error message explaining why the kernel breaks.
  • Modularize the C++ codebase
    • Ensure that the codebase has no circular dependencies (Reduce the C++ compilation duration #2203)
    • Make this a required status check before merging
    • Ideally, build each set of logically-related source files into its own static library.
  • Provide a more intuitive way for the users to leverage GPU shared memory
    • Rationale: If they know what shared memory is, they probably are experienced GPU programmers and care about the performance (over productivity).
  • Productionize the async engine
  • Support LLVM 12 (Upgrade to LLVM12 #3281)
  • Restore CPU vectorization
  • Fault-tolerance + checkpointing
  • Isolate CHI IR into its own standalone library. However, this work is blocked on several aforementioned items:
    • modularize the C++ codebase
    • real function support
    • C++ symbol visibility
ifsheldon commented Dec 14, 2021

Regarding polishing the autodiff system: I think the kernel simplicity rule comes from the "reverse replay" (if you will call it that) that autodiff performs, if I remember the DiffTaichi paper correctly. With the kernel simplicity rule in place, each inner iteration in the reverse-replay phase has a concrete compute graph that does not depend on anything computed in the outer-loop iterations (since the rule forbids any computation there).
For example, consider nested loops like these:

@ti.kernel
def broadcast_add_bad(array0: ti.template(), array1: ti.template(), matrix: ti.template()):
    for i in array0:
        num0 = array0[i]  # computed in the outer-loop body...
        for j in array1:
            num1 = array1[j]
            matrix[i, j] = num0 + num1  # ...but consumed in the inner loop

@ti.kernel
def broadcast_add_good(array0: ti.template(), array1: ti.template(), matrix: ti.template()):
    for i in array0:
        for j in array1:
            num0 = array0[i]  # all locals live inside the innermost loop body
            num1 = array1[j]
            matrix[i, j] = num0 + num1

Suppose we have two 10-element arrays, giving 100 iterations in total. In the bad example, the 10×10 inner-loop iterations are bundled with the 10 num0s, whereas in the good example we get 100 fully independent iterations.
Given my guess at how the "reverse replay" works, such a rule is understandable. Moreover, I think it actually originates from this design choice mentioned in the DiffTaichi paper:

a light-weight tape that only stores function pointers and arguments for end-to-end simulation differentiation

but if you want to relax or even remove the kernel simplicity rule, you have to record more information in the tape. For this example, the tape needs to record that each group of 10 inner-loop iterations is bundled to one num0. This may require a complete redesign of the tape, which I guess is non-trivial.
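To make the "function pointers and arguments" idea concrete, here is a minimal, hypothetical sketch of such a lightweight tape in plain Python. This is not Taichi's actual implementation: the `Tape` class, the `grad` attribute convention, and the toy `scale`/`scale_grad` kernels are all assumptions for illustration. The forward pass records each kernel call; the backward pass replays the records in reverse, invoking each kernel's gradient counterpart.

```python
class Tape:
    """A lightweight tape: stores only (function pointer, arguments) pairs."""

    def __init__(self):
        self.records = []

    def record(self, kernel, *args):
        # Forward pass: run the kernel and remember how it was called.
        self.records.append((kernel, args))
        kernel(*args)

    def backward(self):
        # Reverse replay: run the gradient kernels in the opposite order.
        for kernel, args in reversed(self.records):
            kernel.grad(*args)

# A toy "kernel": y = 2 * x, with a hand-written gradient kernel that
# accumulates dL/dx = 2 * dL/dy. Dicts stand in for Taichi fields, with
# "val" holding the primal value and "adj" the adjoint.
def scale(x, y):
    y["val"] = 2.0 * x["val"]

def scale_grad(x, y):
    x["adj"] += 2.0 * y["adj"]

scale.grad = scale_grad

x = {"val": 3.0, "adj": 0.0}
y = {"val": 0.0, "adj": 1.0}  # seed dL/dy = 1

tape = Tape()
tape.record(scale, x, y)  # forward: y["val"] becomes 6.0
tape.backward()           # reverse replay: x["adj"] becomes 2.0
```

Note how the tape only knows the call order and arguments, nothing about what happens inside a kernel; that is exactly why any cross-iteration bundling (like num0 in the bad example) would force it to store extra structure.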

I don't know whether this will help or whether you already know it, but you can check out Zygote, Julia Flux's source-to-source AD system, and its accompanying paper. Zygote can handle AD over complex control flow in Julia programs, and if you consider a kernel instance as a thread of a "Julia program", then maybe the ideas behind Zygote can be transferred to autodiff.


k-ye commented Apr 15, 2022

Hi there,

Taichi has finally reached v1.0.0. We will follow semver and bump the minor version for feature releases; the next minor release will be v1.1.0. Closing this issue now.
