
One-Year Roadmap #2398

Closed
27 of 43 tasks
k-ye opened this issue Jun 3, 2021 · 2 comments


k-ye commented Jun 3, 2021

Hi,

We are sharing this one-year roadmap to let you know which features are planned or being actively worked on. Hopefully some of these sound interesting/exciting! Let us know what you think so that we can adjust accordingly. Thanks!


New Features

Backends

Performance

Documentation

  • Add Python docstring for the public APIs (Add Python docstring for public APIs #2579)
  • Add documentation for the essential C++ APIs (e.g. Program, Kernel)
  • Automatically generate the documentation webpages from the Python/C++ API documentation.

Cleanups

  • Codebase refactoring
    • Remove the global variables, especially the global program and the frontend AST context ([refactor] [ir] IR system refactorings #988)
    • Clean up the frontend AST transformer
    • Simplify the C++ Expr, Expression classes and the Python Expr class.
    • ti.field should return either an SNode instance or an instance of a dedicated field class (currently it returns an Expr).
    • Simplify ti.Matrix and ti.Vector
  • Modularize the taichi package hierarchy (Clean up python import statements #2223)
    • Do not expose everything to the top-level namespace.
  • Control the symbol visibility of libtaichi_core.so and remove the linker script
  • Deprecate TI_NAMESPACE_{BEGIN, END}

Release

CI/Productivity

  • CI presubmit improvements
  • Reduce the number of CI platforms in use
    • Try to get rid of Appveyor
  • Enforce license header for each source code file
  • Provide a conda environment config file that automatically sets up the developer installation steps.
  • Move CI related scripts to ci/

In addition, the items below are also on our radar, but we haven't thought about too much yet:

  • Polish the autodiff system
    • Either remove/relax the Kernel Simplicity Rule, or provide a concrete error message explaining why the kernel breaks.
  • Modularize the C++ codebase
    • Ensure that the codebase has no circular dependencies (Reduce the C++ compilation duration #2203)
    • Make this a required status check before merging
    • Ideally, build each set of logically-related source files into its own static library.
  • Provide a more intuitive way for the users to leverage GPU shared memory
    • Rationale: If they know what shared memory is, they probably are experienced GPU programmers and care about the performance (over productivity).
  • Productionize the async engine
  • Support LLVM 12 (Upgrade to LLVM12 #3281)
  • Restore CPU vectorization
  • Fault-tolerance + checkpointing
  • Isolate CHI IR into its own standalone library. However, this work is blocked on several aforementioned items:
    • modularize the C++ codebase
    • real function support
    • C++ symbol visibility
ifsheldon commented Dec 14, 2021

Regarding polishing the autodiff system: I think the kernel simplicity rule comes from the "reverse replay" (if you will call it that) that autodiff performs, if I remember the DiffTaichi paper correctly. With the kernel simplicity rule in place, each inner iteration in the reverse-replay phase has a concrete compute graph that does not depend on anything computed in the outer-loop iterations (since the rule forbids any computation there).
For example, consider nested loops like these:

@ti.kernel
def broadcast_add_bad(array0: ti.template(), array1: ti.template(), matrix: ti.template()):
    for i in array0:
        num0 = array0[i]  # computed in the outer-loop body...
        for j in array1:
            num1 = array1[j]
            matrix[i, j] = num0 + num1  # ...but consumed in the inner loop

@ti.kernel
def broadcast_add_good(array0: ti.template(), array1: ti.template(), matrix: ti.template()):
    for i in array0:
        for j in array1:
            num0 = array0[i]  # all locals live inside the innermost loop body
            num1 = array1[j]
            matrix[i, j] = num0 + num1

Suppose we have two 10-element arrays, giving 100 iterations in total. In the bad example, the 10×10 inner-loop iterations are bundled with the 10 num0s, whereas in the good example we get 100 fully independent iterations.
Given my guess at how the "reverse replay" works, such a rule is understandable. Moreover, I think it actually originates from this design choice mentioned in the DiffTaichi paper:

a light-weight tape that only stores function pointers and arguments for end-to-end simulation differentiation

but if you want to relax or even remove the kernel simplicity rule, you have to record more information in the tape. For this example, the tape needs to record that each group of 10 inner-loop iterations is bundled to one num0. This may require a complete redesign of the tape, which I guess is non-trivial.
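To make the "function pointers and arguments" idea concrete, here is a minimal, hypothetical sketch of such a lightweight tape in plain Python. This is not Taichi's actual implementation: the `Tape` class, the `grad` attribute convention, and the toy `scale`/`scale_grad` kernels are all assumptions for illustration. The forward pass records each kernel call; the backward pass replays the records in reverse, invoking each kernel's gradient counterpart.

```python
class Tape:
    """A lightweight tape: stores only (function pointer, arguments) pairs."""

    def __init__(self):
        self.records = []

    def record(self, kernel, *args):
        # Forward pass: run the kernel and remember how it was called.
        self.records.append((kernel, args))
        kernel(*args)

    def backward(self):
        # Reverse replay: run the gradient kernels in the opposite order.
        for kernel, args in reversed(self.records):
            kernel.grad(*args)

# A toy "kernel": y = 2 * x, with a hand-written gradient kernel that
# accumulates dL/dx = 2 * dL/dy. Dicts stand in for Taichi fields, with
# "val" holding the primal value and "adj" the adjoint.
def scale(x, y):
    y["val"] = 2.0 * x["val"]

def scale_grad(x, y):
    x["adj"] += 2.0 * y["adj"]

scale.grad = scale_grad

x = {"val": 3.0, "adj": 0.0}
y = {"val": 0.0, "adj": 1.0}  # seed dL/dy = 1

tape = Tape()
tape.record(scale, x, y)  # forward: y["val"] becomes 6.0
tape.backward()           # reverse replay: x["adj"] becomes 2.0
```

Note how the tape only knows the call order and arguments, nothing about what happens inside a kernel; that is exactly why any cross-iteration bundling (like num0 in the bad example) would force it to store extra structure.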

I don't know whether this will help or whether you already know it, but you can check out Zygote, Julia Flux's source-to-source AD system, and its accompanying paper. Zygote can handle AD over complex control flow in Julia programs, and if you consider a kernel instance as a thread of a "Julia program", then maybe the ideas behind Zygote can be transferred to autodiff.


k-ye commented Apr 15, 2022

Hi there,

Taichi has finally reached v1.0.0. We will follow semver and bump the minor version for feature releases; the next minor release will be v1.1.0. Closing this issue now.
