Subtyping #8

Open
shelby3 opened this issue Sep 21, 2016 · 1,023 comments

Comments

@shelby3 commented Sep 21, 2016

Union types express a subtyping relationship, but I am unclear as to whether typeclasses (i.e. Rust's traits) do?

If a trait B extends another trait A and B reuses the implementations of A, can we assign a trait object that has a bound B to a trait object that has a bound A?

Seems the answer based on prior discussion is yes. But that is a subtyping relationship, which means we would need to deal with covariance on type parameters both when they are trait objects and when they are unions. Correct?

Prior discussion:
#6 (comment)
#1 (comment)
#2 (comment)
#1 (comment)

@shelby3 (Author) commented Sep 21, 2016

@keean wrote:

It will be interesting to see if we still need trait objects

They are the only way (other than subclassing) I can see to get variance in the polymorphism, i.e. we can assign new types into the trait object (Rust's name for an "existential type" with a typeclass bound) without breaking the invariance of the Array.

The advantage of trait objects compared to subclasses, is that types that can be inserted into a trait object don't have to conform to a pre-existing subclass hierarchy, i.e. it solves some aspect of the Expression Problem¹ but doesn't solve the scenario that unions solve. I am thinking we can view trait objects and unions as complementary in functionality offering different trade-offs on the multi-dimensional design space.

  1. Specifically when we use trait objects for polymorphism, we can add new operations to existing data types (i.e. the trait bound of the trait object), and we can add new data types to existing operations (i.e. implement data types for existing traits), but we cannot add new operations to an existing trait object (because the data types have been erased from the static compile-time knowledge of the trait bound type of the reference when assigned to the trait object). And that is the limitation of trait objects that my solution addresses with unions and delayed binding of the union to a trait bound at the function call site. OTOH, unlike unions employed in my solution, trait objects do not require invariant containers in order to add heterogeneous data types to the container, because the type of the trait object is invariant. An invariant container is for example a List. Trait objects would also work with an Array, which is not an invariant data structure. Apologies, I am not employing academic terminology such as kinds, rank, F-bounded, etc. Someone else can do a better job of translating my solution into that terminology.

@keean (Owner) commented Sep 21, 2016

@shelby3 I am thinking along the same lines. As far as I understand it there is no subtyping. With type classes when we say trait B extends trait A we mean that any implementation of trait B for a type requires there to also be an implementation of trait A, but that is it.

This applies to trait-bounds (type constraints) and trait-objects. Trait-objects are not really a valid type-system concept; these are more correctly called existential types. In Haskell, if we have an existential data type like this:

data Singleton = forall a. (TraitB a) => Singleton a

The forall a introduces a new scoped type variable, and because its scope is the container, we cannot ever from outside the container know the type of a, but it is constrained to implement trait B and that implies it also must implement trait A. This means we can call any function from the interfaces trait A and trait B on the contents of the container. An alternative way of writing the above (not valid Haskell) is:

data Singleton = Singleton (exists a . (TraitB a) => a)

which is where the term 'existential type' comes from. Generally forall corresponds to the logical concept of any (but may be none), and exists corresponds to the logical concept of some (but at least one).
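To make the "B requires A" relationship concrete, here is a minimal Haskell sketch (the class names TraitA/TraitB and their methods are invented for illustration): the superclass constraint plays the role of "trait B extends trait A", and the existential hides the concrete type while keeping both interfaces callable.

{-# LANGUAGE ExistentialQuantification #-}

-- Hypothetical classes: every instance of TraitB must also have a TraitA
-- instance (a superclass constraint), mirroring "trait B extends trait A".
class TraitA a where
  opA :: a -> String

class TraitA a => TraitB a where
  opB :: a -> String

-- The existential hides the concrete type; only the TraitB (and therefore
-- TraitA) interface remains visible from outside the container.
data Singleton = forall a. TraitB a => Singleton a

describe :: Singleton -> String
describe (Singleton x) = opA x ++ "/" ++ opB x

Note there is no subtype coercion here: describe simply uses the dictionaries captured when the Singleton was constructed.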

@shelby3 (Author) commented Sep 22, 2016

@keean wrote:

I didn't see any mention of type-classes when I looked at Ceylon. It also looks like Ceylon provides classical object inheritance and subtyping, which are both things ZenScript aims to avoid to keep the language small and simple.

Don't we need to be careful about differentiating between subclassing versus subtyping?

ZenScript will have subtyping because it will offer structural unions. An Int is a subtype of an Int|String. ZenScript will not have subclassing.

Ceylon has anonymous structural unions, but it doesn't have typeclasses. <truthful hyperbole>Also it has that Jurassic, ossifying, rigor mortis, anti-pattern subclassing paradigm, which will infect any program written in Ceylon with said virus. Ditto every statically typed language that offers subclassing, including Java and Scala (because programmers won't be disciplined enough to not use the subclassing virus in their design patterns).</truthful hyperbole> 😜


Update:

@naasking wrote:

Ceylon models Java's subtyping via first-class union and intersection types. It's not at all a classical subtyping model.

He replied as quoted above after I wrote the above. Please don't use the word 'subtyping' where you really mean 'subclassing'.

@naasking is apparently unaware of what the programmer can't accomplish with unions and intersections in terms of the Expression Problem if the language does not have the delayed binding of implementation to interface that typeclasses provide.

@shelby3 (Author) commented Sep 22, 2016

A typeclass's type parameters can't express genericity that doesn't specialize on implementation. Thus we can't for example parametrize a typeclass trait List<A> to track the type of the elements stored in the list.

Thus we must use a data type to express this:

data List a = Nil | Cons a (List a)

Or add member property names if don't want to be forced to destructure with pattern matching:

data List a = Nil | Cons { head :: a, tail :: (List a) }

In ZenScript perhaps:

data List<A> = Nil | Cons(A, List<A>)

Or perhaps:

interface List<A>
singleton Nil implements List<Never>  // Never is¹ the bottom ⊥ type
sealed class Cons<A>(head: A, tail: List<A>) implements List<A>

Note afaik, Haskell's data type expresses a 'hasA' not an 'isA' subclassing relationship between Cons and List, and a cannot be heterogeneous because Haskell doesn't have a first-class union type (their global inference doesn't allow for it). If we use this syntax in ZenScript then when instantiating a Cons("string", Cons(1, Nil)) then the type will be inferred List<Number|String>. And if we instantiate first a let x = Cons(1, Nil) its type will be inferred List<Number>. And if we then instantiate Cons("string", x) then the type will be inferred List<Number|String>.

But note even though the above is subtyping, there are no virtual methods on data types, thus no virtual inheritance and no subclassing. Of course any function can input a data type, so that is synonymous with a method, except it isn't virtual dispatch.


  1. Since an eager, CBV language makes Bottom an effect, I have argued the name of the Bottom type should be Never and not Nothing (which appears to be the choice TypeScript made). Bottom is for example the type of functions that never return (terminate). Whereas, in a lazy, CBN language such as Haskell, Bottom becomes a value, so I argue it should be named Nothing and not Never.

@keean (Owner) commented Sep 22, 2016

Or, with types

data Nil = Nil
data Cons<A, B> = Cons<A, B>
trait List A
impl List Nil
impl<A, B : List<A>> List Cons<A, B>

@keean (Owner) commented Sep 22, 2016

As a single type with multiple constructors (tagged union):

data List<A> = Nil | Cons<A, List<A>>

I'm still not sure about which keywords we should be using for trait, impl and data.

@shelby3 (Author) commented Sep 22, 2016

@keean wrote:

data Cons<A, B> = Cons<A, B>

data List<A> = Nil | Cons<A, List<A>>

That seems incorrect. It doesn't tell me how to construct an instance of Cons. Mine was correct:

data List<A> = Nil | Cons(A, List<A>)

And we can add member property names if we don't want to be forced to employ pattern matching to destructure:

data List<A> = Nil | Cons(head: A, tail: List<A>)
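For comparison, a small Haskell sketch of the same distinction (the names ListP/ListR/firstP/firstR are invented): positional constructor arguments force destructuring by pattern matching, while record syntax additionally generates named accessors.

data ListP a = NilP | ConsP a (ListP a)
data ListR a = NilR | ConsR { hd :: a, tl :: ListR a }

firstP :: ListP a -> Maybe a
firstP NilP        = Nothing
firstP (ConsP x _) = Just x        -- must destructure by position

firstR :: ListR a -> Maybe a
firstR NilR = Nothing
firstR r    = Just (hd r)          -- can use the named accessor instead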

@shelby3 (Author) commented Sep 22, 2016

@keean wrote:

data Nil = Nil
data Cons<A, B> = Cons<A, B>
trait List A
impl List Nil
impl<A, B : List<A>> List Cons<A, B>

That seems incorrect. I think it should be instead:

data Nil = Nil
data Cons<A> = Cons(A, List<A>)  // Edit: `List` is a typo and should be `Cons` per subsequent discussion
trait List
impl List Nil
impl<A> List Cons<A>

Remember the trait List should know nothing about A if its methods don't specialize on A.

@shelby3 wrote:

A typeclass's type parameters can't express genericity that doesn't specialize on implementation. Thus we can't for example parametrize a typeclass trait List<A> to track the type of the elements stored in the list.

Note I'd prefer to write that:

pluggable List
implement Nil for List
implement Cons<A> for List   // no need to write the type parameters twice per my
                             // proposal¹ that all UPPERCASE names are type parameters

Or much better:

pluggable List
Nil implements List
Cons<A> implements List

I think the last is best because it remains very similar to Java, yet we change the meaning of what is being implemented from interface ('isA' relationship) to pluggable ('hasA' relationship).

Thinking about a typeclass as a pluggable interface seems very intuitive. We can't use Rust's trait because trait has a different meaning in several other languages.

Q: "What is a pluggable API?"
A: "It means that you can replace the implementation."

Remember we both decided that clarity trumps brevity (e.g. implement instead of impl), especially for syntax which is not expression-level (because such declarations won't appear often; what appears most frequently in source code are expressions).

  1. Function Syntax #6 (comment)

@keean (Owner) commented Sep 22, 2016

Not quite :-) Some things to discuss. You have given value constructors round brackets, which seems okay to me.

data List<A> = Nil | Cons(head: A, tail: List<A>)

Normally the arguments to cons are positional like function arguments, and deconstructed by pattern matching. You would use record syntax to name them, so either of the following:

data List<A> = Nil | Cons(A, List<A>)
data List<A> = Nil | Cons {head: A, tail: List<A>}

We don't have to stick to that but it's how I was thinking.

This has more problems:

data Nil = Nil
data Cons<A> = Cons(A, List<A>) // list is not a type
trait List
impl List Nil
impl<A> List Cons<A> // cons needs two type parameters

So correcting this:

data Nil = Nil
data Cons<A, B> = Cons(A, B) // constraints on data bad.
trait List<A>
impl List<A> for Nil
impl<B : List<A>> List<A> Cons<A, B>

Note this is still Rust syntax that gives special treatment to the first type class parameter, and I am not sure that is best, but let's have a different topic for that when we have agreed this.

@shelby3 (Author) commented Sep 22, 2016

@keean wrote:

You have given value constructors round brackets, that seems okay to me.

Yeah to differentiate them from type constructors, and because value constructors in the Java-like languages use round brackets (aka parenthetical grouping).

Normally the arguments to Cons are positional like function arguments, and deconstructed by pattern matching. You would use record syntax to name them, so either of the following:

You are repeating what I wrote. I even linked to the Haskell record syntax upthread.

However the following is not naming the members of Cons (rather is only providing their positions and types), and can only be destructured with pattern matching as I already wrote in my prior comment:

data List<A> = Nil | Cons(A, List<A>)

And to stick with the Java-like syntax (and not mixing in Haskell syntax), I would prefer the following which I think will be much more clear to mainstream programmers coming from popular programming languages:

data List<A> = Nil | Cons(head: A, tail: List<A>)

The following is mixing a JavaScript unnamed Object with some new concept of a tag of Cons, which has no analogue for people coming from JavaScript or OOP languages (and our guiding principle is not to introduce unnecessary syntax, i.e. the new concept of a { ... } where we don't need to):

data List<A> = Nil | Cons {head: A, tail: List<A>}

@shelby3 (Author) commented Sep 22, 2016

@keean wrote:

This has more problems:

data Nil = Nil
data Cons<A> = Cons(A, List<A>) // list is not a type
trait List
impl List Nil
impl<A> List Cons<A> // cons needs two type parameters

I had a typo and Cons does not need two type parameters (two would mess up other things):

data Nil = Nil
data Cons<A> = Cons(A, Cons<A>)
trait List
impl List Nil
impl<A> List Cons<A>

The type parameter A in Cons<A> will be widened to the union (i.e. the least upper bound) of the two types used to construct a Cons. I had already explained that as follows.

@shelby3 wrote:

If we use this syntax in ZenScript then when instantiating a Cons("string", Cons(1, Nil)) then the type will be inferred List<Number|String>. And if we instantiate first a let x = Cons(1, Nil) its type will be inferred List<Number>. And if we then instantiate Cons("string", x) then the type will be inferred List<Number|String>.

Note in the above quoted text, I was referring to a data type List not a typeclass List. Refer to that quoted comment for the declaration I employed there (which differs from the List in this comment).


@keean wrote:

So correcting this:

That is still incorrect. You have a type parameter A on typeclass List which I already explained (and you even agreed!) is incorrect as follows.

@shelby3 wrote:

Remember the trait List should know nothing about A if its methods don't specialize on A.

@shelby3 wrote:

A typeclass's type parameters can't express genericity that doesn't specialize on implementation. Thus we can't for example parametrize a typeclass trait List<A> to track the type of the elements stored in the list.

Follow the link in the above quote to see where you had agreed. In fact, you were the one who explained the issue to me. And now it seems you have forgotten what you explained to me.

Actually the above is revealing a deeper issue to me about higher-kinds which I had realized when I woke up this morning. I am preparing to write about that.

@keean (Owner) commented Sep 22, 2016

The {} have the same use in C for structs, and in C++ and Java for object definitions, so they are not new as such.

In C, C++ and Rust we would write:

struct Cons { 
    head : Int, // this syntax is different for C
}

Which we are writing:

data Cons = Cons {
    head : Int
}

Or

data Cons = Cons (
    head : Int
)

I am happy with either, providing the field names are optional, but I wanted to point out that the data statement can be viewed as an extension of struct and object definition.

@keean (Owner) commented Sep 22, 2016

@shelby3 wrote

I had a typo and Cons does not need two type parameters (two would mess up other things):

Yes it does, what you wrote cannot ever end in a Nil.

@keean (Owner) commented Sep 22, 2016

@shelby3 this version is correct in Rust syntax:

data Nil = Nil
data Cons<A, B> = Cons(A, B) // constraints on data bad.
trait List<A>
impl List<A> for Nil
impl<B : List<A>> List<A> Cons<A, B>

Cons needs a second type parameter because B can either be another Cons or a Nil which are different types.

The trait List needs a type parameter for the type that is in the list, which is not the same as the type which is a member of the class (that is Nil or Cons<A, B>).

When we add Cons to the List type class we need to constrain B to be in the List type class so that you cannot put any random type as the second Cons parameter.

@keean (Owner) commented Sep 22, 2016

@shelby3 wrote:

A typeclass's type parameters can't express genericity that doesn't specialize on implementation. Thus we can't for example parametrize a typeclass trait List to track the type of the elements stored in the list

I guess I was wrong, a multi parameter type class can represent an arbitrary relation on types. I must have been sleepy when I agreed :-)

@shelby3 (Author) commented Sep 22, 2016

@keean wrote:

Yes it does, what you wrote cannot ever end in a Nil.

Yup. Your idea was fundamentally flawed. We can't express a generic List type as a typeclass interface. You'd have to implement the List for every possible data type you can put into a List, which is the antithesis of a generic List. I was responding that yours was incorrect and I was demonstrating that if I try to write it correctly, I can't.

That is why I had written (before you commented with the erroneous idea) the correct way to define a generic List as follows.

@shelby3 wrote:

In ZenScript perhaps:

data List<A> = Nil | Cons(A, List<A>)

Or perhaps:

interface List<A>
singleton Nil implements List<Never>  // Never is¹ the bottom ⊥ type
sealed class Cons<A>(head: A, tail: List<A>) implements List<A>

@keean wrote:

A typeclass's type parameters can't express genericity that doesn't specialize on implementation. Thus we can't for example parametrize a typeclass trait List to track the type of the elements stored in the list.

I guess I was wrong, a multi parameter type class can represent an arbitrary relation on types. I must have been sleepy when I agreed :-)

You weren't wrong the first time. It makes no sense to specialize the List on every data type we can add to the List.

@keean (Owner) commented Sep 22, 2016

Sorry you are wrong here. You can express a list as a type class and I have done it in Haskell and Rust. The HList paper I co-wrote with Oleg Kiselyov makes extensive use of this.

@shelby3 (Author) commented Sep 22, 2016

@keean wrote:

In C, C++ and Rust we would write:

struct Cons { 
    head : Int, // this syntax is different for C
}

Seems I recall that in the early days of C, it was only possible to use typedef to give a name to a struct.

I forgot about struct because I rarely code in C any more (and C++ I haven't touched since I stopped coding CoolPage in 2002). And when I think about struct from C, I don't think in terms of a language with objects and higher-order typing concepts, since Java and Scala don't have struct. So I guess that is why I didn't relate it. And afaik, { ... } in JavaScript is not tagged with a name, e.g. Cons.

@shelby3 (Author) commented Sep 22, 2016

@keean wrote:

Sorry you are wrong here. You can express a list as a type class and I have done it in Haskell and Rust. The HList paper I co-wrote with Oleg Kiselyov makes extensive use of this.

I will need to review this, so I can comment meaningfully. Where may I read the most succinct example which shows how I won't have to specialize the typeclass list for every data type I want to put into the list?

I presume a lot of HList boilerplate again?

Edit: I suppose the point I am making is that we are trying to eliminate boilerplate for ZenScript. If you are expecting mainstream programmers to use HList, I doubt it. But I need to review the examples before I can comment not just from guessing. The link above is I think probably instructive about this.

@keean (Owner) commented Sep 22, 2016

In Haskell this:

data Nil = Nil
data Cons a b = Cons a b
class List a
instance List Nil
instance (List b) => List (Cons a b)

We can look at Peano numbers as another example:

data Z = Z
data S x = S x
class Nat n
instance Nat Z
instance (Nat a) => Nat (S a)

Note the main syntactic difference between Haskell type classes and Rust traits is that a Rust trait has a concept of 'self' but Haskell does not. You can liken this to function syntax:

x.f(y) // object special syntax like Rust
f(x, y) // all parameters equal, better for multiple dispatch

Likewise with type classes, Rust makes the first type parameter special, so the Peano numbers above become:

struct Z {}
struct S<A>(A); // Rust tuple struct syntax
trait Nat {} // note: no type parameter
impl Nat for Z {}
impl<A: Nat> Nat for S<A> {}
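To illustrate the kind of encoding being referred to (a hedged sketch, not code from the HList paper; HShow/hshow are invented names): the list structure lives at the type level, so each cell can hold a different element type, and an operation over the whole list is driven by a type class rather than by specializing List per element type.

data Nil = Nil
data Cons a b = Cons a b

class HShow l where
  hshow :: l -> [String]

instance HShow Nil where
  hshow Nil = []

instance (Show a, HShow b) => HShow (Cons a b) where
  hshow (Cons x xs) = show x : hshow xs

-- hshow (Cons (1 :: Int) (Cons "two" Nil)) evaluates to ["1", "\"two\""]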

@shelby3 (Author) commented Sep 22, 2016

@shelby3 wrote:

You weren't wrong the first time. It makes no sense to specialize the List on every data type we can add to the List.

.

I presume a lot of HList boilerplate again?

Edit: I suppose the point I am making is that we are trying to eliminate boilerplate for ZenScript.

I expect you are taking what should be an orthogonal concept of a generic list and binding it to the data type in the list, and then using some boilerplate scaffolding to simulate genericity? This appears to be the basic theme of HList concepts as far as I can discern thus far (I may be wrong?): to sidestep a weakness in the type system and simulate the type system in code with scaffolding?

@shelby3 (Author) commented Sep 22, 2016

I had started to sniff a problem yesterday. I was starting to realize we probably have an unsolved problem in the design incorporating first-class anonymous structural unions.

@keean wrote:

In Haskell this:

data Nil = Nil
data Cons a b = Cons a b
class List a
instance List Nil
instance (List b) => List (Cons a b)

We need to remember that Haskell does not allow heterogeneous unions because, at least from what I've read, they would break Haskell's global inference.

Thus afaik in the above, b will also be the same as a | Nothing, which is just a, where Nothing is at the top of all types (because Haskell's call-by-name type system is inverted, so we use the Bottom type where we would use the Top type in a call-by-value language¹).

So afaics, that is not specializing the List typeclass for every data type a that can be put into the List because only one homogeneous type can be put into any list object due to Haskell's type system restrictions (lack of a first-class anonymous structural union type). There will ever be only two implementations (aka instances) of List: Nil and List (Cons a b) where b is (List a) | Nil and Nil is List Nothing.

I presume the same for Rust, but bottom type instead of top.

But for ZenScript we are proposing to support heterogeneous lists, so I am trying to figure out now what changes and what the problems and challenges are. I am thinking we will need higher-kinded types and there may be other problems.

I suppose you are saying we can simulate heterogeneous lists with HList concepts, but the point of the first-class union was to eliminate that boilerplate and make everything more extensible as I had attempted to explain/discuss at the Rust forum:

I presume a lot of HList boilerplate again?

It is possible you didn't realize how extensively I wanted to use the first-class unions. Perhaps you were thinking we'd be using HList concepts instead?

We have a design quagmire now. I am trying to get my mind wrapped around it. I am suspecting my concept may have failed, but I need to confirm by understanding this design quagmire fully.


  1. Something I published at the now defunct copute.com in 2011 when I was teaching myself some type theory (can still be found on archive.org):

    Inductive and coinductive types are categorical duals (if they produce the same objects in reversed partial-order), because inductive and coinductive category morphism functions have reversed directions[8]. The initial fixedpoint must be the least in the partial-order, thus inductive types have objects which are the output of a unique morphism function (i.e. the algebra recursively) that inputs the initial fixedpoint. Dually, the final object must be the greatest in the partial-order, thus the coinductive types have objects which are the side-effect "output" of a unique morphism function (i.e. the coalgebra recursively), which terminates with the final object when an object of the type is destructed.

    Since Monad and Comonad are categorical duals, they compose on outputs or inputs respectively.

    [8] Declarative Continuations and Categorical Duality, Filinski, section 1.3.1 Basic definitions.

@shelby3 (Author) commented Sep 22, 2016

Even more edits to my prior comment. I am suspecting potential failure of my design concept. :hurtrealbad: 😭

@shelby3 (Author) commented Sep 22, 2016

@keean wrote:

We can look at Peano numbers as another example:

Haskell doesn't have any inductive types, thus it doesn't have (the type of) Peano numbers.

We have to be careful when using Haskell's coinductive call-by-name type system where laziness-by-default makes non-termination a value, as a model for an inductive type system with eager evaluation strategy by default that adds first-class unions. Many aspects appear to change.

@keean (Owner) commented Sep 22, 2016

A couple of things.

First I think we should support multi-parameter type classes (with type-families, aka associated types) in full. People do not have to use the full power of this, but I don't want a false ceiling to the abstractions we can build.

Haskell has iso-recursive types (not equi-recursive) so it does have a kind of inductive type, thus Peano numbers work in Haskell :-) I can go on to define addition, subtraction, full arithmetic in the type system. Using polymorphic recursion you can even convert a value to a type for faked dependent types, but I don't think we should support this... That is why we are using parametric types, not universally quantified types.

So in our system those Haskell type Peano numbers have to be statically determined (effectively known at compile time). They cannot support runtime polymorphism without combining with existential types.
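As a hedged sketch of the "arithmetic in the type system" claim above, here is one classic pre-type-families encoding of Peano addition, using a multi-parameter type class with a functional dependency (the class Add and its method are invented names; the listed GHC extensions are required):

{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances, UndecidableInstances #-}

data Z = Z
data S n = S n

-- Add a b c means a + b = c; the functional dependency makes c a function of a and b.
class Add a b c | a b -> c where
  add :: a -> b -> c

instance Add Z b b where
  add Z b = b

instance Add a b c => Add (S a) b (S c) where
  add (S a) b = S (add a b)

-- add (S (S Z)) (S Z) :: S (S (S Z))   -- 2 + 1 = 3, computed by the type checker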

We are discussing different ways to do things, first-class union types give us:

type List<a> // a partial declaration which we need to tie the knot
data Nil = Nil
data Cons<a> = Cons(head: a, tail: List<a>)
type List<a> = Nil | Cons<a> // tie the knot; note the RHS are types in a 'type' declaration

@shelby3 (Author) commented Dec 26, 2018

Re-summarizing and expounding on what I learned about subtyping from the MLsub paper.

@NodixBlockchain

I have thought about all this, and I think I'm exploring what comes close to a good system.

When looking at OOP, one of the things that makes it confusing is that a class can be both equivalent to an implementation of an interface and used to encapsulate a context of execution for those 'methods'.

If the concept of 'class' in OOP is replaced by modules, and the function definitions in the modules follow a functional style with an immutable input type (optionally an operand) and a result type, there is no need for the 'encapsulation' of OOP's class definition, only the interface definition.

Modules can still define a record type. If runtime polymorphism is being achieved via dynamic linking, it means the runtime needs to convert the record type used by a module to the type used by the program, using dictionaries to find an intersection between the two record types and copying the values from one record to another.

Having type information in modules at runtime can also be useful for serialization/networking, if modules can be used in a distributed environment, and is necessary if runtime polymorphism is done using dynamic module linking.

References to 'dynamic modules' in a program can only be an interface, defined using a typeclass that allows the runtime to check the type of a dynamic module and relink the program with the functions that can take an immutable record as input or output a new one as a result, copying the fields in and out of the modules to the types used by the program based on dictionary definitions.

If the type of the module being used in the program can be determined statically at compile time, the name of the module can be implicit like in a Haskell typeclass.

If a module is being represented as a variable in a program, loading a new module would involve matching the type of the new module with the type of the interface used by the program, and relinking the program with the address of the correct monomorphized function in the new module, which would be the equivalent of runtime polymorphism.

I would still keep the possibility to have dynamic typing for things like closures + lambda functions when the types of the variables in the closure are defined close to the function and can't cross module interfaces; the closures would essentially be the only form of global variables with dynamic typing.

Defining modules as interfaces with a functional style allows for composability, and they can also be represented as an acyclic graph and used for 'functional reactive programming' / CSP, and also for AI or language processing: when the processor needs to build a tree of possible paths to evaluate and select from, it can easily build the path as a graph of module interfaces with more typing information than classic neural networks, and I'm sure it can also be very useful for making parser combinators and language processors.

@sighoya commented Feb 2, 2019

@NodixBlockchain wrote:

When looking at OOP, one of the things that makes it confusing is that a class can be both equivalent to an implementation of an interface and used to encapsulate a context of execution for those 'methods'.

That wouldn't be a contradiction, though, but I know what you mean...
As @keean said, they conflate the value level and the type level with each other, i.e. methods as values are bundled together with field signatures which can be seen as types.
In turn, are fields and methods members of a class?
If yes, why are fields instantiated by class instantiation while methods are not?
And if so, why are fields overlooked by interface implementation?

If the concept of 'class' in OOP is replaced by modules, and the function definitions in the modules follow a functional style with an immutable input type (optionally an operand) and a result type, there is no need for the 'encapsulation' of OOP's class definition, only the interface definition.

And what do you do with fields? In my eyes classes should be structs. Then we have the complementary problem: what do we do with the methods inside a class?
There are at least two options:

Either you define methods in the surrounding module and prepend the input tuple of each method with the formerly implicit class type (this is probably what @keean and @shelby3 are after).
Then methods are called either in prefix syntax or in dot syntax, where the object of the said class type is placed before the dot and the method is called with the rest of the arguments after the dot (uniform function call syntax, UFCS).
Or you define methods as associated/extension functions à la Kotlin's or Rust's syntax.
I only partially like UFCS, as you have to import all methods separately when importing the struct, or you do it automatically with all methods matching the imported struct in their first argument type, but then you may import too many methods, as I wouldn't see all methods as associated with the said struct.
So I'm leaning more toward Rust's solution.

There is no need for the 'encapsulation' of OOP's class definition, only the interface definition.

Well, modules also encapsulate.

Modules can still define a record type. If runtime polymorphism is being achieved via dynamic linking, it means the runtime needs to convert the record type used by a module to the type used by the program, using dictionaries to find an intersection between the two record types and copying the values from one record to another.

No, modules should be values, not types, but it is okay for me to merge structs with records, as they seem to be the same concept.
I wouldn't consider dynamic linking capable of runtime polymorphism; rather it allows for late compile-time polymorphism, as you still lack runtime information, but you may be right here.

record type used by a module to the type used by the program, using dictionaries to find an intersection between the two record types and copying the values from one record to another.

Sounds like structural typing. When do you do this? If they overlap, or only if one subsumes the other?

If a module is being represented as a variable in a program, loading a new module would involve matching the type of the new module with the type of the interface used by the program, and relinking the program with the address of the correct monomorphized function in the new module, which would be the equivalent of runtime polymorphism.

Yes, what you describe is the use of existentials.

I would still keep the possibility to have dynamic typing for things like closures + lambda functions when the types of the variables in the closure are defined close to the function and can't cross module interfaces; the closures would essentially be the only form of global variables with dynamic typing.

Dynamic typing is using the Any type with auto-generated pattern matching when assigning a value of the Any type to a variable of type <: Any. I also like this in some cases, though I don't know what you mean by "types of variables close to the surrounding function".
Why no Any type, @keean, @shelby3? Problems with principal typing or violation of invariants?

Defining modules as interfaces with a functional style allows for composability,

If modules are first class, then yes. But what you may mean is that hiding modules behind interfaces increases modularity and reduces coupling.

and they can also be represented as an acyclic graph and used for 'functional reactive programming' / CSP, and also for AI or language processing: when the processor needs to build a tree of possible paths to evaluate and select from, it can easily build the path as a graph of module interfaces with more typing information than classic neural networks, and I'm sure it can also be very useful for making parser combinators and language processors.

I think all you need here is first-class functions; you can do all of that with classes, too.

Further, you talk of interfaces, and you are right because interfaces can really be seen as the types of modules, but I would choose between structs and interfaces, as they are the same concept, unless interfaces take the role of typeclasses, too.

Having type information in modules at runtime can also be useful for serialization/networking, if modules can be used in a distributed environment, and is necessary if runtime polymorphism is done using dynamic module linking.

Are you talking about runtime reflection (RR)? It isn't needed here, because on the sender's side it is possible to partly serialize modules at compile time (string template) and feed them at runtime with the corresponding values.
On the receiver's side, you expect to load a module of certain type(s). The module string/binary blob contains all the typing information, but the value drawn from this doesn't need RR unless you assign modules to a module variable expecting a union of struct types, but union types support RR anyway.

@NodixBlockchain commented Feb 10, 2019

I wouldn't consider dynamic linking capable of runtime polymorphism; rather it allows for late compile-time polymorphism, as you still lack runtime information, but you may be right here.

The way I see it, it would not be very far from the way it's done with C++ and virtual function tables; if the linking is done like Visual Studio with an imported function table, it could become very close to the virtual methods of OOP.

Type information can be included in the linking information; it doesn't have to be only matching a symbol name with an address. This lack in the standard ABI in ELF or PE is probably what led to requiring the COM layer above the base DLL ABI to have better dynamic loading with more type information, which also allows cross-language interoperability with transtyping in the runtime to an internal format at interface boundaries.

I'm not sure if it's really type reflection. The way I see it for the moment, the compiler/runtime could have an internal representation of the module, and operations on a module reference would not be just storing a value at an address or simple arithmetic, but the equivalent of setting the virtual function table of an interface, and the runtime could have access to type information for the purpose of loading modules like a dynamic loader would do, but those would not necessarily be exposed to the programming language, much like the virtual function table of a class in C++.

If all the modules that can be assigned to an interface are known at compile time, the type checking can be done at compile time and the subtyping relationship established at compile time, and the runtime just needs to fill the function table of the interface from the module when it's assigned to it. In that case it could be close to Haskell dictionary passing.

But if it can load new modules dynamically, it needs to have type information for the call site and the module; if both are bound to an interface definition, the runtime can find the subtyping relationship between those interface definitions and fill the function pointer table with the module implementation.

I could easily see structural typing for records and nominal typing for interfaces, as function names can give a good hint about the semantics of the function. The subtyping relationship would be different for interfaces and records, but if modules can't store global/permanent references to records, and only read arguments and create new records/values, like with functional programming, structural typing would make it easier for the linker to match interface parameters with input from another module, while still keeping name semantics for interface definitions.
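A hedged sketch of the "interface = function table" reading of this (ShapeIface and the module names are invented): an interface is just a record of functions, and "loading a module" amounts to filling that record in, which is essentially what a vtable or a Haskell dictionary does.

-- An "interface" as a plain record of functions.
data ShapeIface = ShapeIface
  { area     :: Double
  , describe :: String
  }

-- Two "modules" that can be plugged in behind the same interface.
circleModule :: Double -> ShapeIface
circleModule r = ShapeIface { area = pi * r * r, describe = "circle" }

squareModule :: Double -> ShapeIface
squareModule s = ShapeIface { area = s * s, describe = "square" }

-- Code written against the interface works with either module.
report :: ShapeIface -> String
report m = describe m ++ ": " ++ show (area m)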

@sighoya commented Feb 10, 2019

@NodixBlockchain wrote:

the way I see it for the moment, the compiler/runtime could have an internal representation of the module, and operations on a module reference would not be just storing a value at an address or simple arithmetic, but the equivalent of setting the virtual function table of an interface, and the runtime could have access to type information

The way I see it, it would not be very far from the way it's done with C++ and virtual function tables; if the linking is done like Visual Studio with an imported function table, it could become very close to the virtual methods of OOP.

Afaics, the interface mechanism is much like (or the same as?) transmitting a vtable in addition to a module/object, so I tend to agree here.

Type information can be included in the linking information; it doesn't have to be only matching a symbol name with an address. This lack in the standard ABI in ELF

You mean the lack in the C++ standard ABI, because you can insert a new section into ELF's section list to encode type information.

I'm not sure if it's really type reflection.

Probably; the inspection of the underlying type behind an interface type can be considered a limited form of reflection.

and the runtime could have access to type information for the purpose of loading modules like a dynamic loader would do

Yes, dynamic loading, in contrast to dynamic linking, happens at runtime of the program and would require runtime polymorphism/structural matching/runtime reflection to inspect the shape of the included module.

I think the most interesting parts of reflection can be covered at compile time; that's the reason the D language people are working on compile-time reflection, e.g. listing all classes in the project, getting values from strings and so on. Only a small set of tasks would involve runtime reflection, mostly for objects/functions/interfaces/types created at runtime.

@sighoya commented Jun 30, 2020

@shelby3,
You said:

Trait objects bind the implementation to the interface at the time of assignment to the trait object, which is premature. My solution extends delayed binding of implementation to the function call site for the polymorphism case.

in the Union and Intersections Issue.

Further you state:

My solution instead retains the typing information about the data types in the structural union (instead of assigning to a trait object), so that the instances of the said structural union can be passed as input to any function with any trait bound

Did you consider the following problem with your statements:

object module1

import Trait
fun()=
    traitObject:Trait= 2:Int
    module2.fun(traitObject)

object module2

trait Trait
{
...
}
fun(traitObject:Trait)= otherFun(traitObject) //error: traitObject doesn't implement Show, but Int does
fun(finiteExistential:Int|Float)=otherFun(finiteExistential) //works because Int and Float both implement Show
otherFun(s:S) requires Show[S] = ...

Did you further propose that the caller passing a union value should also implicitly pass the typeclass instances of all variant types from the caller (module1) to the callee (module2), or are only the instances of the callee (module2) of interest:

object module1

instance Addable[Int] {(+):T->T->T=...} //module1:Addable

fun(i:Int)= module2.fun(2)

object module2

instance Addable[Int] {(+):T->T->T=...} //module2:Addable
fun(i:Int|Float)= i+i //did we consider module1:Addable or module2:Addable for i:Int here?

@keean (Owner) commented Jun 30, 2020

@sighoya I think the implementation mechanism is this: We assign a globally unique type ID (GUID? UUID? GUT? UUT?) to all types. For statically known types this can be erased as usual. For runtime dynamic types we can tag the type with the GUID tag (instead of a locally unique tag in a sum type). Type class dictionaries can be stored in a hash map from GUID tag to set of dictionaries. When a value with dynamic type is imported (into a module scope where the set of required type classes is closed), its tag can be looked up.
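A hedged sketch of that mechanism, using GHC's TypeRep as a stand-in for the globally unique type ID (Dyn, ShowDict and registry are invented names, and Data.Map stands in for the hash map):

import qualified Data.Map as Map
import Data.Typeable (TypeRep, typeOf)

-- A dynamically typed value: a globally unique type tag plus (say) a payload.
data Dyn = Dyn TypeRep String

-- A stored dictionary for one typeclass method.
newtype ShowDict = ShowDict { showIt :: String -> String }

-- The map from type tag to dictionaries.
registry :: Map.Map TypeRep ShowDict
registry = Map.fromList
  [ (typeOf (undefined :: Int),  ShowDict (\p -> "Int: "  ++ p))
  , (typeOf (undefined :: Bool), ShowDict (\p -> "Bool: " ++ p))
  ]

-- Importing a dynamic value: look up its tag to recover the dictionary.
printDyn :: Dyn -> Maybe String
printDyn (Dyn tag payload) = (`showIt` payload) <$> Map.lookup tag registry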

@sighoya commented Jun 30, 2020

@keean wrote:

f : (forall a . a -> a) -> (String | Int)
f : ((String | Int) -> a) -> a
f : ((String -> String) & (Int -> Int)) -> (String | Int)

can be inferred for:

f x = (x "ABC", x 123)

True, but I would always prefer Intersection types because of the highest degree of abstraction/freedom over Union Types and HRT. The correct signature is:

f[S,T] : ((String -> S) & (Int -> T)) -> (S | T) //for f x= (x "hello", x 2)

Note that Intersection types represent a specialization of HRTs.
Note, even if you could infer (String|Int->String)->(String,String) for the example above, I would prefer (String->String & Int->String)->(String,String), as by the law of contravariant subtyping for row-polymorphic function types you can pass a (String|Int)->String to String->String & Int->String.

And if you forbid function types in general you shouldn't have any issues with intersection types, I think.

@sighoya commented Jun 30, 2020

@keean wrote:

When a value with dynamic type is imported (into a module scope where the set of required type classes is closed), its tag can be looked up.

But what do you do if the dynamic value imported from module1 into module2 has variant data types Int, Float, String which have different typeclass instances in module1 and in module2 for type classes known in both modules?
Which do you choose?

@keean (Owner) commented Jun 30, 2020

@sighoya Specialisation and scope of instances is a different topic.

The type tag identifies the type exactly, so we can recover the type information at runtime, therefore any resolution mechanism we can do statically can be done dynamically.

What I would like to avoid is having the compiler generate code for the runtime. I would prefer an approach where the compiler checks that statements are sufficiently guarded at runtime. So if we default types to 'Unknown', you must test before using any interface.

Something like:

var data = file.parse() // type of data is "Unknown"
if (data `instanceof` Int) { // type of data is "Int" inside the if...
   ... now any typeclass implemented for Int can be used
}
if (data `implements` Print) { // data implements "Print" inside the if...
   ... can be any type that provides an implementation for Print
}

Edit: so the question is how could you determine which of two Print interfaces to use? All the type system has to do is to check that there is exactly one implementation of Print in scope inside the 'if' statement. So for example:

if (data `implements` Discriminator) { // we know data implements the Discriminator typeclass
   if (data.disc() > 32) {
      using PrintA
      ...
   } else {
      using PrintB
      ....
   }
}

so we can use any value-based runtime logic to bring only one of the possible typeclasses into implicit scope.
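The first kind of guard (testing for a concrete type before using its instances) can be approximated in Haskell with Typeable casts; a hedged sketch (Unknown and inspect are invented names):

{-# LANGUAGE ExistentialQuantification #-}
import Data.Typeable (Typeable, cast)

-- A value whose type is "Unknown" until we test it.
data Unknown = forall a. Typeable a => Unknown a

inspect :: Unknown -> String
inspect (Unknown x) =
  case (cast x :: Maybe Int) of        -- like: if (data `instanceof` Int)
    Just n  -> "an Int: " ++ show n    -- inside the guard, Int's instances apply
    Nothing -> "some other type"

The second kind of guard (choosing which of several instances to bring into scope) has no direct Haskell analogue, since Haskell enforces a single global instance per type.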

@keean (Owner) commented Jun 30, 2020

@sighoya So perhaps to formalise the above, one possibility that would have the desired semantics would be Agda-like implicit variables, where functions can be defined to take additional implicit variables. To allow intermediate functions to pass through typeclasses without modification we would need row-polymorphic implicits. We would also need the ability to filter the implicit environment to reduce the implementations in scope.

So if you take Agda implicit variables, and combine with row-polymorphism, I think you end up with Algebraic Effects. With effect inference, you would not have to write the effects into the type signature, except where you want to make these things explicit, like in module interfaces.

@sighoya commented Jun 30, 2020

@sighoya Specialisation and scope of instances is a different topic.

True, but I had related it to union types implying the use of subtyping. Maybe another thread is more appropriate.

Edit: so the question is how could you determine which of two Print interfaces to use? All the type system has to do is to check that there is exactly one implementation of Print in scope inside the 'if' statement. So for example:

if (data `implements` Discriminator) { // we know data implements the Discriminator typeclass
  if (data.disc() > 32) {
     using PrintA
     ...
  } else {
    using PrintB
     ....
  }
}

so we can use any value based runtime logic to bring only one of the possible typeclasses into implicit scope.

Hmm, this seems to utilize named instances to disambiguate issues of global coherence.
But the point is, you may not know all the names of the caller's typeclass instances, because the callee is decoupled from the caller via the library mechanism.

@keean (Owner) commented Jun 30, 2020

@sighoya I think the important point to remember is that typeclasses have dynamic scope. So the instance in scope can be determined anywhere further up the call-stack.

This works fine except where the type of a variable is 'Unknown', in which case we must use the type tag or value of the variable to determine the instance to use.

If we consider the two methods of selection I mentioned above: if we guard on a type like 'Int' then we would use the same instance we would use if the type were statically determined to be 'Int'. If we guard on a typeclass like 'implements Print' then we still require that whatever type is dynamically passed resolves to a single instance.

We could summarise this requirement as: we require 'global coherence' of typeclasses in every expression, but we do not require the same resolution in every expression. In Haskell we require global coherence globally, that is, wherever you are in a program the typeclass for a given type must resolve to exactly one instance, and it must be the same instance everywhere. What I am suggesting here is global coherence locally, i.e. there must be exactly one resolution at every point in the program, but the resolution can be different at every point in the program.

@sighoya commented Jun 30, 2020

@keean wrote:

I think the important point to remember is that typeclasses have dynamic scope. So the instance in scope can be determined anywhere further up the call-stack.

So you prefer dynamic resolution for trait objects and all the types which have instances associated with the typeclass of the trait object.

If we guard on a typeclass like 'implements Print' then we still require that whatever type is dynamically passed resolves to a single instance.

There is a problem with this view.

object module1
instance Trait for Int

fun()=module2.fun(2:Int)
gun()=module2.gun(2:Int)

object module2
instance Trait for Int

show[S](s:S)requires Show[S] =...
fun(t:Trait)=if(s `implements` Show) show(s)  else ... //what impl is chosen for Show? The one of module1 or module2
gun(t:Trait)=if(s `implements` Show) (i=s:Int;show(i))  else ... //different to fun above?

If you take the instance from module1 for s, then your instance resolution is dynamic, which I may assume you prefer. But what about module2.gun? If you do it likewise, any call of a function with any type, which doesn't necessarily have to be typeclass bounded, has to include all the dictionaries of the parameter types of the called functions. This would allow overriding any called typeclass method in the callee.
For instance:
You want to add numbers: linear(a:Int,x:Int,b:Int)=a*x+b, but the caller of fun provides new implementations of (+) and (*) for Int which may not correspond to addition/multiplication in any sense. Now, calling linear doesn't make sense anymore, as it is deceptive about what it does.

In my eyes, it is better that only trait bounds and trait objects have dynamic resolution, implying dynamic resolution in module2.fun, but not once the inner variant data type is visible from the trait object, implying no dynamic resolution in module2.gun.

This is the reason I'm against dynamic instance resolution for the variant data types in the union type because the variant data types are visible.

@keean (Owner) commented Jun 30, 2020

@sighoya Hmm I struggle with examples in Scala, because it's a mess, and I don't want to pollute my brain :-)

Instead lets think about simple runtime semantics. We pass dictionaries into functions (as implicit arguments) which is an implementation of typeclasses. This has 'dynamic scope' because the caller can choose which dictionaries to pass in.

We can see this working for all static cases, because if we statically know the types, we know which dictionaries to pass in.

For any function we know the complete list of functions it calls, which means we can always implicitly know the set of required functions for any call. We need this to be inferred, so library functions can pass through dictionaries to callbacks etc.

So given a heterogeneous collection we can calculate at link time the set of dictionaries required by any object put into the collection. This solves one degree of the expression problem because we can add any datatype providing it implements the correct set of typeclasses.

Alternatively we can know at link time the set of types that are put into the heterogeneous collection, in which case we can call any function on the collection that is defined for all types in the collection. This allows us to solve the other side of the expression problem, because we can implement a new function on the collection by providing overloads for all the types in the collection typelist.

Both of these methods work statically and we can imagine combining them, so to solve the expression problem we need to statically know all the types inserted into the list, and all the functions called on the list, which is of course possible to do statically by type inference and typeclass inference.

So this is great, the expression problem is solved, except that we must statically know the whole program. Of course extending this to modular compilation is easy, as we just need to know the list of types that are returned by every function, and the typeclass requirements for every argument. So providing our interface definitions carry sufficient information, there is no problem with modular compilation.

So the only remaining problem is with runtime module loading. Again we can solve this by implementing the correct checks on dynamic loading. If we are loading a module that applies functions to the values in a heterogeneous collection, then the dynamic loader needs to check that the functions typeclasses are implemented for all possible types that can be in the collection (remember we can determine this statically at link time). If the module inserts values into the collection then the dynamic loader needs to check that all typeclasses that are called on collection values are implemented for the types inserted.

To think about this another way, the problems with typeclasses come from the Open World assumption. We can avoid this by moving to a closed world assumption, but the problem is this limits the expression of extensibility. We can recover expression by deferring closing the world until link time by improving our module interface definitions with the extra information outlined above, but this precludes dynamic loading. We can recover dynamic extensibility by having a closed world with extensions. So the world is closed before the dynamic module load and it is closed after. This makes it the responsibility of the dynamic loader to prove the world is closed by checking the interface of the loaded module against the interface/signature type it is being loaded into.
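A hedged Haskell sketch of one side of this trade-off (Elem, collection and render are invented names): the element type is open, any type can be inserted, but the set of typeclasses that consumers of the collection may use is closed at the point where the element wrapper is defined.

{-# LANGUAGE ExistentialQuantification #-}

-- The closed set of required dictionaries (here just Show) is fixed when
-- Elem is defined; any type with those instances can be inserted later.
data Elem = forall a. Show a => Elem a

collection :: [Elem]
collection = [Elem (42 :: Int), Elem "hello", Elem True]

render :: [Elem] -> [String]
render = map (\(Elem x) -> show x)

Adding a new operation over the collection then means widening that constraint set, which is exactly the check described above for link time or dynamic loading.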

@keean (Owner) commented Jun 30, 2020

@sighoya Okay, I think I understand your example now, and the answer is, we cannot tell which module will be used, because:

  • neither module seems to define "show"
  • we cannot see the call to 'show' to determine the context.
  • 'show' seems to be recursive which must be some kind of error?

What we can say is that we must pass show a dictionary for typeclass 'Show' for the type s in the arguments to the function definition. Which dictionary is passed depends only on the caller, and not the definition of show.

@sighoya commented Jun 30, 2020

@keean wrote:

'show' seems to be recursive which must be some kind of error?

I forgot that show was part of the Show interface.

Which dictionary is passed depends only on the caller, and not the definition of show.

Does it hold for both module2.fun and module2.gun with variant data type Int?
Note, both module1 and module2 can have an implementation of Show for Int. Is the Show implementation then taken from module1 (the caller) or module2 (the callee)?

@sighoya commented Jun 30, 2020

To put it more clearly:
If I add two integers, which instance is chosen for (+)::Num p=>p->p->p?
The lexical one or the dynamic one?
If you say the dynamic one, you can never add two integers, because integers are trait objects in your language, and with dynamic instance resolution you can't know if there is an instance for Int provided by the caller; you may have a lexical instance for Int but you would never use it because the caller decides if there is one or not.

@keean (Owner) commented Jun 30, 2020

@sighoya

with dynamic instance resolution you can't know if there is an instance for Int provided by the caller; you may have a lexical instance for Int but you would never use it because the caller decides if there is one or not.

This doesn't make sense to me. The dictionary for Num is an implicit parameter, so it's always dynamically passed from the caller, even in Haskell:

double  :: (Num a) => a -> a 
double x = x + x

The typeclass constraint (trait bound) Num a is implemented by passing a dictionary for Num into double.
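A hedged sketch of the dictionary-passing translation being described (NumDict and double' are invented names standing in for GHC's internal dictionaries): the Num a constraint becomes an ordinary argument chosen by the caller.

-- What the constraint roughly desugars to: a record of the class methods.
newtype NumDict a = NumDict { plus :: a -> a -> a }

-- double, with its (Num a) constraint turned into an explicit argument.
double' :: NumDict a -> a -> a
double' d x = plus d x x

-- The caller supplies the dictionary, so resolution follows the call site.
intNum :: NumDict Int
intNum = NumDict (+)

-- double' intNum 21 evaluates to 42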

@sighoya commented Jun 30, 2020

The typeclass constraint (trait bound) Num a is implemented by passing a dictionary for Num into double.

This is the generic case, but what about this:
foo::Int->Int->Int
foo x y = x+y -- Is this valid?

@keean (Owner) commented Jun 30, 2020

@sighoya

This is the generic case, but what about this:

Well that's optimised, and not the "real" type:

foo :: (Num Int) => Int -> Int -> Int
foo x y = x + y

This is valid, and foo needs an argument of an implementation of Num for Int...

@sighoya commented Jun 30, 2020

This is valid, and foo needs an argument of an implementation of Num for Int...

This is the typical question of whether lexical entities have priority over or under dynamic entities.

To illustrate again the problem:

foo::Int->Int->Int 
foo x y = x+y

is in your language equivalent to:

foo :: (R requires Int[R],Num[R]) -> (S requires Integer[S],Num[S]) -> (T requires Integer[T],Num[T])
foo x y = x+y

So x,y,$returnValue are typeclass objects which may get dictionaries for Num for the types R,S,T from the caller (dynamic scoping).
Another possibility would be to inject the instances from Num for Int from the module declaring foo, in this case all Int Types are not typeclass objects.

What about this?:

foo2::Int->Void
foo2 a= (b::Int=a; fun(b))

Does it become?:

foo2::(T requires Integer[T])->Void
foo2 a = b::(T requires Integer[T])=a; fun(b)

or

foo2::(T requires Integer[T])->Void
foo2 a = b::Int=a; fun(b)

@keean (Owner) commented Jun 30, 2020

@sighoya
Depends on the requirements of fun. If we assume fun is foo:

foo2:: (Num Int) => Int -> ()
foo2 a = b::Int=a; foo(b)

This is true even for:

foo3 :: (Num Int) => ()
foo3 = foo 3

keean closed this as completed Jun 30, 2020
keean reopened this Jun 30, 2020
@keean (Owner) commented Jul 1, 2020

@sighoya Thinking about unifying typeclasses and algebraic effects, we might prefer to write:

foo2 :: Int -> {Num Int} ()
foo2 a = b::Int=a; foo(b)

The reason for this is that effects can be seen as a row-polymorphic monad, so just as we write IO a for the IO monad, we can write {...} a, where the curly brackets are the row. Row polymorphism solves the problem that monads do not compose, but row-polymorphic monads do.

In this way we can also keep the language pure, and encapsulate all state in algebraic effects.

https://overreacted.io/algebraic-effects-for-the-rest-of-us/

What is interesting about this article is that it shows how algebraic effects can remove the 'function colour' from sync/async functions in exactly the same way it can from pure/impure functions.
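As a hedged approximation of writing the effect in the type, here is mtl-style Haskell (not the proposed row-polymorphic notation; Teletype and greet are invented names), where the constraint list plays roughly the role of the {...} row:

-- The "effect" appears in the signature as a constraint on the monad.
class Monad m => Teletype m where
  putLine :: String -> m ()

greet :: Teletype m => String -> m ()
greet name = putLine ("hello " ++ name)

-- One handler: interpret the effect in IO.
instance Teletype IO where
  putLine = putStrLn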
