[Relay][RFC] Algebraic Data Type #2175

MarisaKirisame · 2018-11-26T20:53:25Z

[RFC]: Algebraic Data Type for Relay

Motivation:

Algebraic Data Type(ADT) allow one to express compound data structure with great expressiveness, performance, and safety.** I will explain in detail when showing what an ADT is, and what they can do.

Explanation:

Very loosely speaking, an ADT is an enum/disjoint union (think std::variant). Here is a basic usage of ADT.
The syntax is make up on the fly, and we can talk to find out one that is most consistent with relay IR text format.

data Bool = True | False

This line define a new type, called Bool. There are exactly two way to make a Bool: by calling a function called True, with no argument (True()), or False(). True and False are constructor. Internally, every constructor has a tag, and ADT is a tagged union, but we never expose this fact to relay to retain type safety.

just like struct, an ADT can has subfield as well:

data Contact = Phone Int | Email String
Contact has two constructor: Phone and Email. Phone take an Int as argument, and return a Contact, Email take String as argument.

and they can be recursive:
data IntList = IntNil | IntCons Int IntList
IntNil take no argument, and IntCons take two : Int and IntList.

Take type parameter:
data List a = Nil | Cons a (List a)
data BinTree n = BinLeaf | BinNode (BinTree n) n (BinTree n)

Has only one constructor, and include other ADT:
data Age = MakeAge Int
data People = MakePeople Contact Age
data RoseTree a = MakeRoseTree (List (RoseTree a))

By representing data using ADT, we gain safety - there are no nullptr (so no NPE), and unlike tagged union, it is impossible to write a relay program that have the tag of 1 constructors, and subfield of another constructor.

ADT has more benefit when we combine them with pattern matching, the canonical way to use adt.
I will follow with some example which show what is pattern matching, and show why they allow better performance, expressiveness, and safety.

data Nat = Z | S Nat define natrual number
def @isZero(n) { n match { Z() => True() | S(_) => False() } } test whether the argument is Z.
We can read it as:
if n is constructed by Z(), the match expression evaluate to True()
if n is constructed by S(someNat), the match expression evaluate to False()
This give us safety over tag, as we can statically enforce all case are matched (or turn such exhaustiveness check off)

def @isEven(n) { n match { Z() => True() | S(Z()) => False() | S(S(m)) => isEven(m) } } test whether the argument is Even number.
We can read it as:
if n is constructed by Z(), evaluate to True()
if n is constructed by S(Z()), evaluate to False()
if n is constructed by S(S(someNat)), evaluate to isEven(someNat). Note that m is binded to someNat when we do pattern matching - this is not just another way to write if!
We can statically enforce m is of type Nat, or S can only take another subpattern, so there are no way to use invalid data type, compared to raw tagged union.
Another thing is, by expressing selection by pattern matching, compiler can batch the two S test into one (or even compile a decision tree to minimize amount of tag testing). Thus pattern matching is quicker then naive nested if dispatch.

armed with higher order function, we can maintain extreme expressiveness by defining NLP task as ADT, and define neural network using preexisting function, as suggested by http://colah.github.io/posts/2015-09-NN-Types-FP/.

For example, ignoring weight issues (which I have a potential solution, and I will open PR shortly), we can define
iterate : (Int, ((a) -> Tuple<a, b>), a) -> List b,
rnn_cell : (Tuple<hidden_type, input_type>) -> Tuple<Tuple<hidden_type, input_type>, output_type>
and define def char_rnn_generator : (h : hidden_type, i : input_type) : List output_type { iterate(20, rnn_cell, <h, i>) }
the a for iterate is Tuple<hidden_type, inut_type>, and b is output_type
Another example is, we can define treelstm as data RoseTree a = MakeRoseTree (List (RoseTree a)), and sum child map will be map : ((a -> b), List a) -> List b into all subrosetree (with mapper as itself, indirect recursion), followed by a reduce : (((a) -> a), List a, a) -> a (with reducer as sum).

Implementation

0: Add TypeCall, which take a std::vector
1: Add GlobalTypeVar, which is similar to GlobalVar but point to type
2: In unifier, unify inside TypeCall
3: Add TypeData, which hold a GlobalTypeVar, a std::vector
4: TypeConstructor hold a std::vector which is it's subfield, and it's name hint
5: Get Environment to store GlobalTypeVar -> TypeConstructor mapping, and KindCheck GlobalTypeVar
5: Add ConstructorVal, which hold a tag of the constructor, and std::vector child
6: Add Pattern, which is a family of Node outside of Type/Expr, and Match, an expression.
7: Get typechecking to work with pattern/match, and expand Interpreter too.

The text was updated successfully, but these errors were encountered:

tqchen · 2019-02-18T21:59:03Z

closed by #2442

tqchen added the status: RFC label Nov 26, 2018

MarisaKirisame mentioned this issue Dec 5, 2018

[Relay][RFC] Automatic Differentiation #2237

Closed

slyubomirsky mentioned this issue Jan 16, 2019

[Relay] Algebraic data types #2442

Merged

4 tasks

slyubomirsky mentioned this issue Feb 8, 2019

[Relay][Docs] Documentation for Algebraic Data Types #2575

Merged

tqchen closed this as completed Feb 18, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Relay][RFC] Algebraic Data Type #2175

[Relay][RFC] Algebraic Data Type #2175

MarisaKirisame commented Nov 26, 2018 •

edited

Loading

tqchen commented Feb 18, 2019

[Relay][RFC] Algebraic Data Type #2175

[Relay][RFC] Algebraic Data Type #2175

Comments

MarisaKirisame commented Nov 26, 2018 • edited Loading

Motivation:

Explanation:

Implementation

tqchen commented Feb 18, 2019

MarisaKirisame commented Nov 26, 2018 •

edited

Loading