Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Relay][RFC] Algebraic Data Type #2175

Closed
MarisaKirisame opened this issue Nov 26, 2018 · 1 comment
Closed

[Relay][RFC] Algebraic Data Type #2175

MarisaKirisame opened this issue Nov 26, 2018 · 1 comment

Comments

@MarisaKirisame
Copy link
Contributor

MarisaKirisame commented Nov 26, 2018

[RFC]: Algebraic Data Type for Relay

Motivation:

Algebraic Data Type(ADT) allow one to express compound data structure with great expressiveness, performance, and safety.** I will explain in detail when showing what an ADT is, and what they can do.

Explanation:

Very loosely speaking, an ADT is an enum/disjoint union (think std::variant). Here is a basic usage of ADT.
The syntax is make up on the fly, and we can talk to find out one that is most consistent with relay IR text format.

data Bool = True | False

This line define a new type, called Bool. There are exactly two way to make a Bool: by calling a function called True, with no argument (True()), or False(). True and False are constructor. Internally, every constructor has a tag, and ADT is a tagged union, but we never expose this fact to relay to retain type safety.

just like struct, an ADT can has subfield as well:

data Contact = Phone Int | Email String
Contact has two constructor: Phone and Email. Phone take an Int as argument, and return a Contact, Email take String as argument.

and they can be recursive:
data IntList = IntNil | IntCons Int IntList
IntNil take no argument, and IntCons take two : Int and IntList.

Take type parameter:
data List a = Nil | Cons a (List a)
data BinTree n = BinLeaf | BinNode (BinTree n) n (BinTree n)

Has only one constructor, and include other ADT:
data Age = MakeAge Int
data People = MakePeople Contact Age
data RoseTree a = MakeRoseTree (List (RoseTree a))

By representing data using ADT, we gain safety - there are no nullptr (so no NPE), and unlike tagged union, it is impossible to write a relay program that have the tag of 1 constructors, and subfield of another constructor.

ADT has more benefit when we combine them with pattern matching, the canonical way to use adt.
I will follow with some example which show what is pattern matching, and show why they allow better performance, expressiveness, and safety.

data Nat = Z | S Nat define natrual number
def @isZero(n) { n match { Z() => True() | S(_) => False() } } test whether the argument is Z.
We can read it as:
if n is constructed by Z(), the match expression evaluate to True()
if n is constructed by S(someNat), the match expression evaluate to False()
This give us safety over tag, as we can statically enforce all case are matched (or turn such exhaustiveness check off)

def @isEven(n) { n match { Z() => True() | S(Z()) => False() | S(S(m)) => isEven(m) } } test whether the argument is Even number.
We can read it as:
if n is constructed by Z(), evaluate to True()
if n is constructed by S(Z()), evaluate to False()
if n is constructed by S(S(someNat)), evaluate to isEven(someNat). Note that m is binded to someNat when we do pattern matching - this is not just another way to write if!
We can statically enforce m is of type Nat, or S can only take another subpattern, so there are no way to use invalid data type, compared to raw tagged union.
Another thing is, by expressing selection by pattern matching, compiler can batch the two S test into one (or even compile a decision tree to minimize amount of tag testing). Thus pattern matching is quicker then naive nested if dispatch.

armed with higher order function, we can maintain extreme expressiveness by defining NLP task as ADT, and define neural network using preexisting function, as suggested by http://colah.github.io/posts/2015-09-NN-Types-FP/.

For example, ignoring weight issues (which I have a potential solution, and I will open PR shortly), we can define
iterate : (Int, ((a) -> Tuple<a, b>), a) -> List b,
rnn_cell : (Tuple<hidden_type, input_type>) -> Tuple<Tuple<hidden_type, input_type>, output_type>
and define def char_rnn_generator : (h : hidden_type, i : input_type) : List output_type { iterate(20, rnn_cell, <h, i>) }
the a for iterate is Tuple<hidden_type, inut_type>, and b is output_type
Another example is, we can define treelstm as data RoseTree a = MakeRoseTree (List (RoseTree a)), and sum child map will be map : ((a -> b), List a) -> List b into all subrosetree (with mapper as itself, indirect recursion), followed by a reduce : (((a) -> a), List a, a) -> a (with reducer as sum).

Implementation

0: Add TypeCall, which take a std::vector
1: Add GlobalTypeVar, which is similar to GlobalVar but point to type
2: In unifier, unify inside TypeCall
3: Add TypeData, which hold a GlobalTypeVar, a std::vector
4: TypeConstructor hold a std::vector which is it's subfield, and it's name hint
5: Get Environment to store GlobalTypeVar -> TypeConstructor mapping, and KindCheck GlobalTypeVar
5: Add ConstructorVal, which hold a tag of the constructor, and std::vector child
6: Add Pattern, which is a family of Node outside of Type/Expr, and Match, an expression.
7: Get typechecking to work with pattern/match, and expand Interpreter too.

@tqchen
Copy link
Member

tqchen commented Feb 18, 2019

closed by #2442

@tqchen tqchen closed this as completed Feb 18, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants