
Typeclass objects #27

Open
shelby3 opened this issue Oct 8, 2016 · 302 comments

shelby3 commented Oct 8, 2016

Typeclass objects offer dynamic (i.e. run-time and open modularity¹) polymorphism by binding the typeclass bound to the static (compile-time) type of the reference and tagging the data type instances that the reference can contain, so that the correct dictionary can be dispatched dynamically at run-time.

They are named "trait objects" in Rust and "existential quantification" (e.g. forall a) or "existential typeclass" in Haskell.

¹ Means we can't compile the universe, so we can't know all the data types required in dependency-injection scenarios. Dynamic polymorphism is not just a run-time artifact; it is also a fact of the real world, where total orders can't exist (i.e. the Second Law of Thermodynamics, the Halting Problem, the Incompleteness Theorem, Russell's Paradox, the Byzantine Generals Problem, Byzantine fault tolerance with the CAP theorem, the impossibility of existence without a quantified speed-of-light, etc.). As well, there is no total order in trustless, decentralized systems either.

Expression Problem

They have the benefit w.r.t. Wadler's Expression Problem of delaying the binding of interface to data until construction of the typeclass object, as opposed to OOP subclassing (and Scheme/JavaScript prototypes²) which binds the interface to all instances of the subclassed type (on instantiation of the data). So they solve the Expression Problem up to the point of instantiation of the typeclass object; beyond that point they allow adding new data types (via the aforementioned dynamic dispatch), but they don't allow extending new operations after the instantiation of the typeclass object has bound the allowed typeclass interface(s). Typeclass objects enable heterogeneous collections with dynamic extension of the collection of data types without requiring an invariant collections data structure (i.e. they can be the element type of an array). Whereas typeclass bounds on function definitions delay binding the interface until the function call site and thus solve the Expression Problem up to that point, but they don't enable heterogeneous collections with dynamic extension of the collection of data types except via unions. Unions can both enable heterogeneous collections with dynamic extension of the collection of data types and delay binding of the interface until the function call site, but they require an invariant collections data structure such as a list.

² Note that prototype programming can add new operations (but forces all instances of a prototype to take those new interfaces) and add new data types, but it can't add different interfaces to instances of the same prototype, whereas typeclassing can differentiate. (Edit: correction)
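
For readers, a minimal Haskell sketch of the heterogeneous-collection claim above (the wrapper name Showable is hypothetical, not part of any proposal here): the existential hides each element's concrete type while carrying the Show dictionary chosen at construction.

{-# LANGUAGE ExistentialQuantification #-}

-- Each element hides its concrete type and carries only the Show
-- dictionary bound at construction (the "typeclass object").
data Showable = forall a. Show a => Showable a

instance Show Showable where
  show (Showable x) = show x

-- A heterogeneous collection, dynamically extensible with any type
-- that has a Show instance.
xs :: [Showable]
xs = [Showable (1 :: Int), Showable "two", Showable True]

main :: IO ()
main = mapM_ print xs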

Syntax and Peculiarities

@shelby3 wrote:

Well obviously that is useless and not what I was referring to. I mean for example, I have a collection of typeclass objects and I want to sort or find them, so I need a relation.

A relation only applies to two objects of the same type.

Exactly. Thus Eq<A, A> is useless for what I wrote above.

One answer is to provide a mapping from each object type to a partial order (or total order).

So akin to subsumption to a subclass, except open to delayed definition. That might work for sort.

Or another alternative is Eq<A, B> accepts any type B, but always returns false if B is not A. So find will work.

Okay I see how maybe typeclass objects are still useful. Thanks.
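
For readers, the "always returns false if B is not A" idea can be sketched in Haskell using Typeable to recover type equality at run time (all names here are hypothetical, not part of any proposal):

{-# LANGUAGE ExistentialQuantification #-}

import Data.Typeable (Typeable, cast)

-- A typeclass object carrying Eq plus a run-time type tag.
data Obj = forall a. (Typeable a, Eq a) => Obj a

-- Compare two objects: recover type equality at run time via cast,
-- and report False when the hidden types differ, so find still works.
eqObj :: Obj -> Obj -> Bool
eqObj (Obj x) (Obj y) = maybe False (x ==) (cast y)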

@shelby3 wrote:

@keean wrote:

data Variant = for<A, B> Variant(A) requires A : Eq<B>

data Variant = Variant(A | Any) where A: Eq<Any>

That seems to be a general solution and I like it very much, unless someone can point out flaws.

So typeclass objects are always conceptually with | Any, e.g.:

let x: A where Show<A | Any> = MyType(1, 2)
let y: A where Eq<A | Any, Any> = MyType(1, 2)

Or in my (and Rust's) preferred OOPish style:

let x: A where (A | Any): Show = MyType(1, 2)
let y: A where (A | Any): Eq<Any> = MyType(1, 2)

But I prefer we can write that shorthand (and so we only need where clauses on function types and definitions):

let x: Show = MyType(1, 2)
let y: Eq<Any> = MyType(1, 2)

Hope you are also starting to see another reason why I favored an OOPish style: it is also more consistent with typeclass objects (and I actually did intuitively anticipate/expect this, though without the complete solution worked out).

P.S. my understanding is that any function defined for a typeclass bound would also accept typeclass objects with that bound. The compiler would create a dynamic-dispatch version of said function on demand, to be used whenever a reference to it is passed around. Seems to me that only functions called nominally can be monomorphized.

shelby3 (Author) commented Oct 8, 2016

@keean wrote:

This callback can only show the type A which is passed to f; the requirements should be at the top level.

f(callback: for<B> (B) => ()) where B: Show

Where the callback can be called on any type. In the first example the callback is monomorphic with respect to f; in the second, polymorphic.

I am instead proposing that be written the same as any typeclass bounded function without any for<B>:

f(callback: (B) => ()) where B: Show

I wrote in the OP:

P.S. my understanding is that any function defined for a typeclass bound would also accept typeclass objects with that bound. The compiler would create a dynamic-dispatch version of said function on demand, to be used whenever a reference to it is passed around. Seems to me that only functions called nominally can be monomorphized.

keean (Owner) commented Oct 8, 2016

This:

f(callback: (B) => ()) where B: Show

Doesn't work because you are not specifying the scope for 'B'; you could mean:

for<B> f(callback: (B) => ()) where B: Show

Or you could mean:

f(callback: for<B> (B) where B: Show => ())
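
In Haskell terms the two readings differ in rank; a sketch with RankNTypes (function names hypothetical):

{-# LANGUAGE RankNTypes #-}

-- Outer scope (rank-1): b is fixed at f1's call site, so f1 can apply
-- the callback to values of exactly one type.
f1 :: Show b => (b -> ()) -> ()
f1 _ = ()

-- Inner scope (rank-2): the callback itself is polymorphic, so f2 can
-- apply it to several different types within its body.
f2 :: (forall b. Show b => b -> ()) -> ()
f2 cb = cb (42 :: Int) `seq` cb "hi"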

shelby3 (Author) commented Oct 8, 2016

@keean wrote:

Doesn't work because you are not specifying the scope for B; you could mean:

I don't understand why that is important. Please explain.

keean (Owner) commented Oct 8, 2016

I don't understand why that is important. Please explain.

  • With the for<B> in the outer scope, the callback is monomorphic with respect to f, so the callback can only be called on one type, which is fixed at the time f is called.
  • With the for<B> in the inner scope, we are passing a polymorphic function that can be called on any B.

shelby3 (Author) commented Oct 8, 2016

@keean wrote:

  • With the for<B> in the outer scope, the callback is monomorphic with respect to f, so the callback can only be called on one type, which is fixed at the time f is called.

Okay, what you mean is that when f is called, then at the call site the types of the callback are monomorphized. This means f is only calling the callback with the said typeclass-bounded instances and not with concrete data instances.

I don't think the for<B> is necessary. The scope can be determined by whether the where clause appears at the top-level scope or not! So non-top-level indicates a typeclass object, whereas top-level indicates monomorphization at the call site of f. So now you have provided another, more important reason that we need non-top-level where clauses. I hate that ugly/noisy for<B>.

  • With the for<B> in the inner scope, we are passing a polymorphic function that can be called on any B.

Right, that is what I wrote elsewhere: it doesn't make any sense for f to request a typeclass-bounded callback (which isn't monomorphized at the call site of f) unless it is for dynamic-dispatch typeclass object(s).

keean (Owner) commented Oct 8, 2016

Yes, but I would rather make the programmer declare a datatype for a polymorphic callback function. This keeps the function syntax simpler, and is called first-class polymorphism.

shelby3 (Author) commented Oct 9, 2016

@keean wrote in other thread:

I think it's a bit more of a problem. It's not the typeclass object that is needed. Consider:

f(callback : MyCallbackFn) where ...

We want to create the where clause when declaring f but we don't yet know the callback function that will be passed. We don't know what typeclass bounds the callback will have.

If we want the callback to print it may need a Show constraint, we cannot know at the point f is declared what typeclass bounds callback will have, and to define them would be to restrict what the callback is allowed to do, which does not seem right.

I am pleased you raised this point.

First re-read my prior comment and click the link in it to remember what you pointed out to me about the scoping of the callback.

If the callback is scoped by the call site to function f, then the caller chooses the typeclass bound(s), which satisfies your point.

And if the callback is scoped by the function f, then f decides the typeclass bound(s) of the typeclass object(s) of the callback.

What needs to happen is we need to infer the typeclass bounds on f when f is called with a specific callback which has known typeclass bounds.

I think the callback example needs type-class bounds inference at the callsite of f in order to be useful.

If you are not referring to augmentation of type-class bounds, that would only be in the case when the callback is scoped by the call-site of f, except it is generally impossible. Let's clarify what kind of inference we are talking about.

The function f may call some operations on the callback and/or it may store the callback in a data structure imperatively, and thus we have no way of knowing which future operations on the callback will be allowed if f does not declare them. The compiler will force f to declare them. You could, I suppose, infer the operations that f internally requires in its function body using principal typings, but we don't know yet to what extent we can support principal typings, as generally they are undecidable in an open system.

So f probably must declare its typeclass bound(s) requirements (at least we decided at a minimum, it must be explicit at module boundaries).

If we want the callback to print it may need a Show constraint, we cannot know at the point f is declared what typeclass bounds callback will have, and to define them would be to restrict what the callback is allowed to do, which does not seem right.

But you are making a different point. You are talking about the caller of f being able to augment (not remove) the typeclass bounds that f requires of the callback, in the case where the callback is scoped at the call-site of f. In this case, I agree with you, since you want to delay unification of the type variables until the call-site.

So what you are saying is that f can require operations on the type variables of the callback and the caller can add some more required operations.

The caller can add more where requirements (either inferred or explicit) at the call site of f. The explicit ones will be done at the call site with a cast, either top-level:

f(callback): (MyCallback) => () [B: Print+Show]

Or non-top-level:

f(callback: (B) => () [B: Print+Show])

Note above that either will apply to the typeclass object or monomorphization cases, because per my proposal, f's scoping of its where clause determines which case it is.

shelby3 (Author) commented Oct 9, 2016

@keean wrote:

Yes but I would rather make the programmer declare a datatype for a polymorphic callback function. This keeps the function syntax simpler, and is called first-class-polymorphism.

But that forces structural typing; otherwise we can't reuse functions which have the same signature but different data type names. That is why I had suggested type aliases instead.

And type aliases must be expandable to any general types, so we won't be able to force (make) the programmer use type aliases for non-top-level where clauses (as they can use the general types anywhere a type signature is required).

Edit: more discussion.

shelby3 (Author) commented Oct 9, 2016

@keean wrote:

@shelby3 wrote:

@keean wrote:

@shelby3 wrote:

@keean wrote:

By using first class polymorphism we can assume all function signatures are monomorphic which avoids the need to keep track of the nesting where type variables are brought into scope

Can you please provide an example or more detailed explanation so I can grok this? (Also for readers' benefit)

, and allows us to standardise on a single top level where clause for all datatypes and function definitions. We can still construct arbitrary higher ranked types by using datatypes.

See this paper, I think it explains it quite well. http://web.cecs.pdx.edu/~mpj/pubs/fcp.html

@keean wrote:

However I think we should use first class polymorphism so we don't have such complex function signatures. With first class polymorphism we can declare a data-type:

data MyCallback = (B) => () requires B : Show

f(callback : MyCallback)

Please show me example syntax for instantiating an instance of MyCallback?

Shouldn't that be instead:

data MyCallback = MyCallback((B) => () requires B : Show)

Yes, you are right, we require the constructor for the type to have () brackets. Basically it would be the normal monomorphic function declaration inside a datatype, and any type variable that is free on the right hand side gets automatically universally quantified.

As the paper says, it keeps the syntax simpler and is easier to implement, but has the same expressive power as free annotation of functions. I think it will make for cleaner APIs too, as you can use the named type to check you have the right type for your callback (so less typing when using libraries).
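
For concreteness, the FCP pattern being described corresponds to this Haskell sketch (the quantifier and constraint live in the named datatype, so the consuming function's own signature stays rank-1):

{-# LANGUAGE RankNTypes #-}

-- The quantifier is wrapped in the named datatype.
newtype MyCallback = MyCallback (forall b. Show b => b -> ())

-- f's type is monomorphic in the wrapper, yet the callback inside
-- can be applied at several types.
f :: MyCallback -> ()
f (MyCallback cb) = cb "hello" `seq` cb (42 :: Int)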

Okay so the research paper you cited is all about achieving type inference.

So as I wrote in my prior comment in this thread, you are advocating abandoning the ability to reuse functions (which contain a type variable) based on their structural type, and instead forcing said functions to be instantiated nominally (thus making equivalent structural function types non-composable because they have different nominal constructors), all because you want to be able to use principal typings on these said functions. I've already warned and feared that you are going to make your principal-typings goal into a weapon to defeat degrees-of-freedom of expression. @skaller has also warned similarly that type inference is the enemy of expressiveness.

I don't think it is often going to be the case that we want to have a function taking a callback where both said function and the callback function are not annotated with quantified (aka concrete) types. And for those cases where we might have wanted to do that, we won't: we will have to annotate. For myself and also for readers, please provide me a convincing example showing this annotation overhead (in this specific case of type variables we are discussing) is a significant burden justifying abandoning the ability to compose functions generally?

Edit: also nothing stops you from using the method you prefer in your code and achieving type inference. I am asking why we must force everyone to abandon function-composition generality in all code, when they can choose to annotate.

I don't think I like complex code without type annotations. It raises cognitive load due to lack of explicit local reasoning. Type inference is useful where the reader can easily figure out in their head what the type annotations would have been. This is one of my big complaints that makes Haskell code so obtuse to read IMO.

keean (Owner) commented Oct 9, 2016

I don't think it is often going to be the case that we want to have a function taking a callback where both said function and the callback function are not annotated with quantified (aka concrete) types. And for those cases where we might have wanted to do that, we won't: we will have to annotate. For myself and also for readers, please provide me a convincing example showing this annotation overhead (in this specific case of type variables we are discussing) is a significant burden justifying abandoning the ability to compose functions generally?

There is no loss in expressive power (it even says so in the paper), and it will make the type signatures shorter and more readable.

There is no loss in composition, and I am not sure why you would suggest there is? Function type signatures don't compose, unless you are talking about copy-and-paste composition? Copy-and-paste just leads to stupid errors. Why bother with named functions at all? Why not just copy and paste the raw code? The same argument applies to types: it is better to have a named type you can re-use (like you call functions) than to have people copy-and-paste huge, complex, unintelligible type signatures.

keean (Owner) commented Oct 9, 2016

Okay so the research paper you cited is all about achieving type inference.

I disagree; it is all about avoiding having to write higher-ranked type annotations. In Haskell (and all other languages I know of, except those that use FCP) you must annotate higher-ranked types because they cannot be inferred. Haskell cannot infer them; no language I know of can infer them. So this is not about inference. This is about how you achieve the annotation. We already need first-class polymorphism to correctly handle type-class dictionaries, so we will already have the mechanism for it.

We can keep the language simpler and keep the full expressive power, how could anyone not want that?

shelby3 (Author) commented Oct 9, 2016

@keean wrote:

There is no loss in expressive power (it even says so in the paper),

I guess that is correct. I think I also used the term 'composition' and so it is really boilerplate that is the issue.

and it will make the type signatures shorter and more readable.

At the cost of causing other boilerplate...

There is no loss in composition, and I am not sure why you would suggest there is?

I can't reuse a MyCallback everywhere I can use a MyCallback2 (and vice versa), even if the signatures of the functions are otherwise identical.

Granted, I can destructure those constructors and re-construct a compatible instance, but this is boilerplate, and additionally afaics if I have these in a typeclass object then I will need a different implementation for each one. Afaics, FCP increases the boilerplate of composition in exchange for the hope of not having to write type annotations for higher-ranked types (and I say hope because I don't know if FCP inference will integrate with our unions and other features, and I don't know if we won't end up needing higher-ranked types anyway).

I disagree, it is all about avoiding having to have higher ranked type annotations.

Which is precisely all about achieving (a similar level of expressiveness with) type inference. But the entire point of type inference is to not have boilerplate. I hope you didn't really expect that FCP is some Holy Grail without trade-offs.

So this is not about inference. This is about how you achieve the annotation.

It is about how we support type variables with minimal annotation and boilerplate: the trade-off is enabling type inference at the cost of boilerplate wrapping, versus losing type inference with no boilerplate wrapping.

We can keep the language simpler and keep the full expressive power, how could anyone not want that?

How is it simpler? You trade off one thing for another.

The same argument applies to types, it is better to have a named type you can re-use (like you call functions) rather than have people copy-and-paste huge complex and unintelligible type signatures.

The difference is we don't normally expect structurally equivalent types to be equivalent (excluding our planned unions and intersections). But for functions, we normally do expect them to be composable on structural equivalence (which is why we allow anonymous functions).

Note you did not address my comment here:

@shelby3 wrote:

Edit: also nothing stops you from using the method you prefer in your code and achieving type inference. I am asking why we must force everyone to abandon function-composition generality in all code, when they can choose to annotate.

keean (Owner) commented Oct 9, 2016

How is it simpler? You trade off one thing for another.

You avoid nested where clauses in the syntax. The grammar will be simpler, (and this will be visible in the simple EBNF for the language).

The difference is we don't normally expect structurally equivalent types to be equivalent. But for functions, we normally do expect them to be composable on structural equivalence.

We don't normally expect any equivalence operator to work on functions, so I am not sure this is important. Remember, top-level functions can be fully polymorphic, so you only need to wrap the argument in the datatype at the call site. Like this:

data MyCallback = MyCallback((A) => A)

id(x : A) : A => x

fn(callback : MyCallback)

fn(MyCallback(id))
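
The same call-site wrapping, rendered in Haskell for comparison (a sketch mirroring the example above):

{-# LANGUAGE RankNTypes #-}

newtype MyCallback = MyCallback (forall a. a -> a)

fn :: MyCallback -> ()
fn (MyCallback g) = g ()

-- id remains an ordinary polymorphic function; it is wrapped in the
-- datatype only at the call site.
main :: IO ()
main = print (fn (MyCallback id))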

skaller commented Oct 9, 2016

What is wrong with type annotations?

They’re great! They add to the ability to reason about code in principle.
They allow error messages to make sense.

The downside of annotation is verbosity, which makes code harder to read.
The programmer should have the flexibility of choosing the tradeoff.


john skaller
skaller@users.sourceforge.net
http://felix-lang.org

shelby3 (Author) commented Oct 9, 2016

@keean wrote:

We don't normally expect any equivalence operator to work on functions, so I am not sure this is important.

Disagree. We expect assignment of functions to work on structural equivalence.

fn(MyCallback(id))

And so you are right back to type annotations again.

How is it simpler? You trade off one thing for another.

You avoid nested where clauses in the syntax. The grammar will be simpler, (and this will be visible in the simple EBNF for the language).

We avoid that but add the boilerplate back in other places, so it isn't necessarily a net gain in brevity. It also loses some local reasoning.

The nested where clauses can be avoided without forcing FCP, by offering a type alias as I mentioned twice before.

And you didn't address my comment:

@shelby3 wrote:

and additionally afaics if I have these in a typeclass object, then I will need a different implementation for each one

Generally wrapping interacts badly because it becomes deeply nested!!!

Which is why we want a first-class anonymous union instead of massive case-tree logic to destructure Either. OTOH, first-class without wrapping makes inference undecidable.
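
A small Haskell illustration of the nesting complaint (hypothetical function): once unions are emulated with nested Either, every consumer pays the case-tree tax.

-- Destructuring a nested Either requires a case per layer of wrapping.
describe :: Either Int (Either String Bool) -> String
describe (Left i)          = show i
describe (Right (Left s))  = s
describe (Right (Right b)) = show b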

keean (Owner) commented Oct 9, 2016

Edit: also nothing stops you from using the method you prefer in your code and achieving type inference. I am asking why we must force everyone to abandon function-composition generality in all code, when they can choose to annotate.

Because it adds complexity to the implementation and the grammar, and complexity is the enemy of good software. Expressive power is achieving the most with the least.

What is wrong with type annotations?

Nothing is wrong with writing types, but complex nesting of types, especially higher-ranked types, is confusing. If we are aiming to make this easy to pick up for people coming from OO, then we will want to make the type signatures as easy to understand as possible.

As I said above, this has nothing to do with inference, you have to write the types in either case, but the main difference is that with FCP all the quantifiers are implicit and all the where clauses are at the top level, simplifying the syntax, for no loss in expressive power. It makes the code cleaner and easier to read.

keean (Owner) commented Oct 9, 2016

and additionally afaics if I have these in a typeclass object, then I will need a different implementation for each one

No, a single record or typeclass object encapsulates all the polymorphic functions, so you do not need this.

For example:

data MyCallback = MyCallback {
    fna (A -> A) : A
    fnb (A -> A -> A) : A
}

Works exactly the same as the single argument tuple datatype posted before.
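
In Haskell the same record-of-polymorphic-functions shape would be (a sketch; the field types are assumed from context):

{-# LANGUAGE RankNTypes #-}

-- One named wrapper carrying several polymorphic functions.
data MyCallback = MyCallback
  { fna :: forall a. a -> a
  , fnb :: forall a. a -> a -> a
  }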

keean (Owner) commented Oct 9, 2016

Generally wrapping interacts badly, which is why we want a first-class anonymous union instead of massive case-tree logic to destructure Either. OTOH, first-class without wrapping makes inference undecidable.

It has nothing to do with inference, as you have to provide the type signature in either case; it is just where and how that signature is written that changes. FCP uses the same mechanism as type-class dictionaries.

keean (Owner) commented Oct 9, 2016

We avoid that but add the boilerplate back in other places. So isn't necessarily a net gain in brevity. It also loses some local reasoning.

Callbacks are not local in any case, as the callback could happen from anywhere. Consider async-IO, you call write with a callback, and flow immediately continues with the next statement. The callback will be called by some non-local code in the IO complete event handler. In JavaScript this can only happen after the current "program" has finished returning control to the main event-loop.

shelby3 (Author) commented Oct 9, 2016

Readers: to understand what we are discussing now, I think it is helpful to understand this way of emulating higher-ranked types, which was introduced to me at the Rust language forum:

use std::fs::File;
use std::io::{Cursor, Read};

trait Foo {
  fn call<A: Read>(&self, a: A);
}

fn f<G: Foo>(g: G, pred: bool) {
  if pred {
    g.call(File::open("foo").unwrap());
  } else {
    g.call(Cursor::new(b"foo"));
  }
}

By giving the name Foo to the trait that carries the generic method call, we are able to delay quantification of the type variable <A: Read> until the body of the function f. Compare that to the example which failed to compile:

fn f<A: Read, G: Fn(A) -> ()>(g: G, pred: bool) {
  if pred {
    g(File::open("foo").unwrap());
  } else {
    g(Cursor::new(b"foo"));
  }
}

The above won't compile without higher-ranked types (which Rust doesn't support), because the quantification of <A: Read> is at the call site of function f. So even if we did not place the : Read bound on A, the inference engine could not infer the correct type for A, which would have to be an intersection of two quantified function types, where A is a File or a Cursor.

In general, inference of function application over unions and intersections diverges into infinite recursion and is undecidable. @keean showed me an example of that using expansion variables in private discussions last May. The two higher-order features are like mirrors facing each other, thus enabling mutual recursion.

keean (Owner) commented Oct 9, 2016

Generally wrapping interacts badly because it becomes deeply nested!!!

It is still better than the alternative of a huge multi-line type signature all in one block.

keean (Owner) commented Oct 9, 2016

@shelby3 looking at the direction you want to go, we may be better off ditching quantified types and type-classes altogether and going for a full intersection and union type system with ad-hoc overloading?

shelby3 (Author) commented Oct 9, 2016

@keean I haven't made any decision. I am trying to analyze the trade-offs.

shelby3 (Author) commented Oct 9, 2016

@keean wrote:

Generally wrapping interacts badly because it becomes deeply nested!!!

It is still better than the alternative of a huge multi-line type signature all in one block.

Disagree! The deep nesting totally breaks any reasonable composition.

Also we don't need to force FCP in order to get type aliases:

@shelby3 wrote:

The nested where clauses can be avoided without forcing FCP, by offering a type alias as I mentioned twice before.

keean (Owner) commented Oct 9, 2016

@shelby3 I feel you are making statements about a style you have no experience of coding in, so I am not sure how much weight to give your objections.

keean (Owner) commented Oct 9, 2016

I haven't made any decision. I am trying to analyze the trade-offs.

Intersection types allow functions to be like a type-class, where a single function can have different signatures that apply to different types. You no longer need the boilerplate of declaring a type-class; you just define two functions with different type signatures that have the same name.

shelby3 (Author) commented Oct 9, 2016

@keean wrote:

and additionally afaics if I have these in a typeclass object, then I will need a different implementation for each one

No a single record or typeclass object encapsulates all the polymorphic functions so you do not need this.

You misunderstood. The context of my point was MyCallback and MyCallback2 both having the same structural type, but requiring separate implementations for a typeclass bound, because they have different (nominal) types.

And then factor in deep nesting and you are in implementation proliferation hell.

keean (Owner) commented Oct 9, 2016

The context of my point was MyCallback and MyCallback2 both having the same structural type

Why would you declare two of them then? That's like saying add and add2 both have different names despite both adding numbers exactly the same way. You don't need someone to stop you defining the add function more than once; why would you need someone to stop you defining MyCallback more than once? It's something you just wouldn't do.

shelby3 (Author) commented May 23, 2017

Of course it should be obvious to me that there is no way for the iterator object T to take a writable reference to the container when it is constructed by the iterator factory, if the compiler is enforcing a read-only invariant annotation on the container. Thus the T implementation (even though we do not see the details of it from the perspective of the Iter typeclass) cannot implement a Writable interface without violating its internal invariants, which are enforced by the hypothetical compiler.

keean (Owner) commented May 23, 2017

You are correct. I might explain it a bit differently: an 'Iterator' is a concrete type (like the above ReadOnlyIntegerArrayIterator), so if it is 'readonly' it does not provide the write interface.

It might be good to read about the 'Capability' security model, if you are not already familiar: https://en.wikipedia.org/wiki/Capability-based_security

The idea is the creator of the array is the 'owner' and the owner always has complete access. The owner can delegate authority to other code. So in any function like this:

f() =>
    let x = new Array()
    let ro = x.readOnlyIterator()
    let rw = x.readWriteIterator() 
    ...

We own the array and can create and use readonly and writeonly and readwrite iterators as we want to.

However a 'called' function like:

g1(i : I) requires Iterator[I], Readable[I]
g2(i : I) requires Iterator[I], Writable[I]
g3(i : I) requires Iterator[I], Readable[I], Writable[I]

g1 needs to be called with a readonly or readwrite iterator, g2 needs a writeonly or a readwrite iterator, and g3 requires a readwrite iterator. Note: If g1 tried to write to I we would either get a compiler error because we did not declare the requirement in the function definition (if the signature is explicit), or the compiler would automatically infer the typeclass and g1 would have the same type signature as g3 inferred for it. In either case the program will only compile if g1 has the correct signature and an iterator with the correct capabilities is passed to g1.

So just like the capability model, called code needs to be annotated with the permissions it requires, and it is up to the caller to delegate the permissions as required:

f() =>
    let x = new Array()
    let ro = x.readOnlyIterator()
    g1(ro)
    g2(ro)
    g3(ro)

So here the call to g1 will pass type-checking and the others will fail to type check, because the iterator I pass does not provide the capabilities required. So here the 'capability token' is actually encoded into the type of the iterator.

This means we can write secure code, where the type checker prevents code that violates the assumptions from compiling. Of course this is not 'runtime secure' and you could 'byte edit' the machine code to break the security. In a secure capability based operating system, the tokens are runtime values that are issued from the kernel, and on each access they are checked against the operating systems copy of the tokens. The enforcement here, using the type system is about stopping programs going wrong because they are written wrong, it is not about stopping malicious attacks. To prevent malicious attacks where the user has access to the binary requires privilege separation (like the Intel CPU security rings), and that all access is indirected by the kernel. This is why message passing performs well on multi-core systems, because we don't want each read to indirect via the kernel, it is better to receive and send a whole buffer at time, which you own, and therefore have full read/write access to.
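
A hedged Haskell sketch of the capability-as-typeclass idea described above (class and function names hypothetical, with Int standing in for the element type):

class Iterator i where
  next :: i -> i

class Readable i where
  readVal :: i -> Int

class Writable i where
  writeVal :: i -> Int -> i

-- g1 requires only the read capability; g3 requires read and write.
-- A read-only iterator type simply has no Writable instance, so
-- passing it to g3 is rejected at compile time.
g1 :: (Iterator i, Readable i) => i -> Int
g1 = readVal

g3 :: (Iterator i, Readable i, Writable i) => i -> i
g3 i = writeVal (next i) (readVal i)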

shelby3 (Author) commented May 23, 2017

It might be good to read about the 'Capability' security model, if you are not already familiar

Are you aware that I know of the Joe-E programming language?

Your explanation was good for other readers. Yes I am aware of all of that (one of the reasons I was concerned about read-only invariants).

keean (Owner) commented May 23, 2017

Yes I am aware of all of that (one of the reasons I was concerned about read-only invariants).

No programming language enforces read-only invariants. We can always patch the machine code to work around any security, as long as all the code is in the user security ring of the CPU hardware. Enforcing such things at runtime is the domain of the operating system, so unless we are going to re-invent Windows and Linux, and have a language that cannot run on mainstream OSs, we are stuck with the security mechanisms the OS provides as a library outside the language itself.

shelby3 (Author) commented May 25, 2017

We can always patch the machine code to work around any security

That is not true for example for Java or any equivalent sandbox.

You are essentially arguing that nothing is safe because of viruses, but that is not relevant in the context of the security of a high-level sandbox, which is the context relevant to the language I am creating.

keean (Owner) commented May 25, 2017

That is not true for example for Java or any equivalent sandbox

You can for example copy the Java VM, and patch it, then run as the user. I am arguing that no language level feature provides security against malicious attacks, only the operating system can do this.

shelby3 (Author) commented May 25, 2017

You can for example copy the Java VM, and patch it,

The definition of a sandbox prevents that.

I will reiterate that you are referring to context which is inapplicable to a sandbox. For example, my computer may be vulnerable to viruses, but when I run untrusted code in my sandbox, that untrusted code can not escape the sandbox and do the things you are contemplating.

keean (Owner) commented May 25, 2017

Only if the code is interpreted (in which case the interpreter forms the secure kernel), or you have byte-code verification. But this is not easy to get right; Java has been vulnerable to all sorts of attacks due to errors in the byte-code verifier. How are you going to do better?

shelby3 (Author) commented May 25, 2017

Only if the code is interpreted

You mean only if the code is compiled by the sandbox and not the untrusted party.

keean (Owner) commented May 26, 2017

That is not true for example for Java or any equivalent sandbox

Compiling by the sandbox can work, but compilers are big and complex things to distribute. Nobody really does this, probably because of platform differences between Windows/macOS/Linux, and because compilation time for big applications can be really long.

JavaScript is interpreted / JIT-compiled, so it is possibly the closest, but it's not really compiled, and WebAssembly will be a target you compile to.

Java compiles to a byte code, and the byte code has to be verified before you can run it, which is prone to problems.

Remote code execution has so far proved to be a major security hole, and is generally to be avoided if you don't want your machine compromised. Browser exploits that allow machine compromise exist in the wild.

Maybe when the web browsers are written in Rust they will have a lower bug rate (where the bugs allow security to be compromised).

So in theory it can work, but in practice it's not a good idea.

shelby3 (Author) commented May 26, 2017

but compilers are big and complex things to distribute

And C++ has a 1500 page manual. <joke>Maybe you and @skaller are the only humans on the planet who understand every nuance of C++.</joke>

And thus I am here trying to find an 80/20 rule sweet spot between Lucid™ simplicity and priorities on performance.

Complexity is not good. We should apply the K.I.S.S. principle.

Browser exploits that allow machine compromise exist in the wild

JavaScript is not Joe-E, i.e. it was not designed well for security. For example, there is the bug I identified in these issue threads: programming languages that do not have a set of brackets (which the user will never type!) for delimiting strings that can contain executable code thereby enable script-injection attacks. (And the people in charge of browser security, including the guy Ian in charge of HTML5 (also), would rather do security theater, i.e. Almost Better Than Nothing™ security, than actual security.)

Yeah many things suck in the real world of available programming languages and sandbox systems.

That is one of the many reasons I am working on Lucid.

We may not get it perfect the first iteration, but we can hopefully experiment on ideas for improvements.

Operating system and hardware security are also not impervious. Security will never be an absolute.

EDIT: security discussion continued in 2018 in a different thread.

shelby3 (Author) commented Jul 5, 2018

Although we noted up-thread that it’s possible to request a typeclass interface with factory for a parametrised CONTAINER (and even request an iterator or an iterator factory typeclass interface for that CONTAINER), the problem which @keean correctly pointed out is that there’s no type safety checked by the compiler for the implementation of the typeclass interface instance assuring that it operates on ELEMents of the CONTAINER. The only type checking comes from the contextual use of the said typeclass interface, but it’s not assured that such contextual usage would be correct. One could imagine different scenarios (perhaps not the specific iterator example considered here) in which type safety would be violated and not caught by the compiler because of the lack of HKT.

Analogously without HKT, there’s no way to express a functor map typeclass interface that is typesafe because there’s no way to express that the map function A → B operates on the element of the CONTAINER such that A and B are both elements of CONTAINER. We can create a map operation that type checks and works, but the necessary invariants have not been enforced by the compiler in the typeclass interface instance implementation.
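
For contrast, this is the standard Haskell formulation, where HKT lets the compiler enforce exactly that invariant: the container is abstracted as a type constructor, so map provably returns elements of the same container.

-- f is a type constructor (kind * -> *), not a concrete type.
class MyFunctor f where
  myfmap :: (a -> b) -> f a -> f b

instance MyFunctor [] where
  myfmap = map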

For this reason, I conclude we must support HKT.

Any objections?

keean (Owner) commented Jul 6, 2018

HKT tend to be associated with partial evaluation/currying, so they may not be the best fit for our language. We should consider alternatives, like type families (type-level functions).

shelby3 (Author) commented Jul 6, 2018

HKT tend to be associated with partial evaluation/currying, so they may not be the best fit for our language.

Could you please elaborate with an example that illustrates why you think this is the case?

And why would you assume our PL will not be using partial-evaluation/currying?

We should consider alternatives, like type-families (type level functions).

Could you show by example how this is an alternative?

I am thinking that iteration with typeclasses, and even your past sort example, are not really typesafe without HKT. Please show me how they are typesafe with type families. I thought about associated types, and AFAICT they do not solve the following quoted problem:

that there’s no type safety checked by the compiler for the implementation of the typeclass interface instance assuring that it operates on ELEMents of the CONTAINER

AFAICS, we end up assuming that the implementation of the typeclass does not violate that invariant but the compiler is unable to check it.

Also note that for Zer0, I’m proposing to forsake decidable global type inference, so HKT don’t present a hindrance in that respect.

sighoya commented Aug 29, 2018

@shelby3 wrote:

the problem which @keean correctly pointed out is that there’s no type safety checked by the compiler for the implementation of the typeclass interface instance assuring that it operates on ELEMents of the CONTAINER.

What about this:

typeclass ListLike<C>
   type E requires C.ValueType<E>
   add: C->E->C
   remove: C->E->C

keean (Owner) commented Aug 29, 2018

@sighoya I suggested allowing typeclass constraints on the associated type like this:

typeclass Collection(A)
   type Element = A requires Printable(A)
   add : A -> Element -> A
   first : A -> Element

fun(A) f(x : A) requires Collection(A)
   y := add x 27
   print(first(y))

would require that Int is Printable. In this way an associated type becomes an existential.
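
A rough Haskell analogue with an associated type (a sketch; names hypothetical, Show standing in for Printable). Note this version discharges the constraint at the use site, which is closer to shelby3's counter-proposal below, whereas keean's version bakes the constraint into the typeclass so every instance guarantees it:

{-# LANGUAGE TypeFamilies, FlexibleContexts #-}

class Collection c where
  type Element c
  add   :: c -> Element c -> c
  first :: c -> Element c

instance Collection [a] where
  type Element [a] = a
  add xs x = x : xs
  first    = head

-- The Show (Element c) constraint is stated where it is needed.
f :: (Collection c, Show (Element c)) => c -> IO ()
f c = print (first (add c (first c)))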

shelby3 (Author) commented Aug 29, 2018

@keean that is premature specialization. Put the requires Printable(Collection[A].Element) on the fun instead.

keean (Owner) commented Aug 30, 2018

@shelby3
That does not solve the runtime type dependency though. We don't want the type of Element determined by which implementation of Collection is used, because the Collection could be different at runtime. By constraining Element to Printable we know that no matter what type it is, we can print it.

This takes us back to the same general problem. This type naturally wants to be an existential. We could make it a closed union, but then we would have to edit the typeclass every time we add an instance.

An alternative solution would be an open-union that is closed at link-time.

shelby3 (Author) commented Aug 30, 2018

@keean wrote:

We could make it a closed union, but then we would have to edit the typeclass every time we add an instance.

I have told you this before and I will repeat it again. The compiler can generate the typeclass instance automatically for a closed union type (the non-disjoint, structural unions proposed for Zer0, not Haskell’s sum types, which are not even isomorphic to unions and thus are not unions) if every type in the closed union has a Printable instance.
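
A sketch of the mechanical derivation being described (hedged: the Haskell sum type below is only a stand-in, not the structural union shelby3 means, and Show stands in for Printable). The point is that the instance for a closed set of cases is boilerplate a compiler could emit:

data U = UInt Int | UString String

-- Every case already has a Show instance, so the instance for the
-- whole union is mechanical.
instance Show U where
  show (UInt i)    = show i
  show (UString s) = show s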

keean (Owner) commented Aug 30, 2018

@shelby3

I have told you this before and I will repeat it again. The compiler can generate the typeclass instance automatically for a closed union type

Yes it can, I even said this:

We could make it a closed union, but then we would have to edit the typeclass every time we add an instance.

We would have to write:

typeclass Collection(A)
   type Element = Int |  String
   add : A -> Element -> A
   first : A -> Element

The problem now is if we want to add an instance for a collection with a Float element type we cannot do so without editing the typeclass definition, which may be in a different module.

shelby3 (Author) commented Aug 30, 2018

@keean Element is an output type of the instance, so do not hardcode it in the typeclass definition as you have shown. The instance can set the type whenever it changes to a different closed union. Thus the compiler can generate it.

No need to edit the typeclass definition.

keean (Owner) commented Aug 30, 2018

@shelby3

The instance can set the type whenever it changes to a different closed union. Thus the compiler can generate it.

Not if we don't know the type that is selecting the instance. Consider

typeclass Test(A)
   type T 
   f : A -> T
instance Test(Int)
   type T = String
   f(x) => "hello"
instance Test(Float)
   type T = Int
   f(x) => 2

var x : exists a . Test(a) => a
x = get_x_from_io()
print(f(x))

This does not work. If x is existential then T must be existential too. However if x is a closed union, then type T can be a closed union too. If x is an open union then T would need to be an open union as well.

Personally I think it is more flexible to allow open unions, but I think you only want to allow closed ones?

For me an open union (resolved at link time) would offer all the functionality of a close union and an existential, with none of the complexities of the existential.

The good news is we can keep the same syntax and associated-type mechanics for both, or even for existentials if they are allowed. The type-checking rule is straightforward: we require associated types to be either open/closed unions or existentials, depending on the type that is selecting the instance.

shelby3 (Author) commented Aug 30, 2018

@keean wrote:

This does not work.

And your premature-specialization “solution” is an oxymoron. It isn’t a solution. It just aids and abets the clusterfuck of open union bounds. Rather, I prefer to use my list-of-lists “chunking” solution in the Subtyping issues thread #8 (which I need to explicate better soon), so we always have closed union bounds for chunks but effectively open union bounds, without incurring that clusterfuck of non-solutions you’re advocating.

Personally I think it is more flexible to allow open unions, but I think you only want to allow closed ones?

Premature specialization is not flexibility. It is an anti-pattern. Yeah I want my solution, not non-solutions.

Please don’t ask me to explain all this now in great detail. We will just get into an argument.

When the time is right I will. This isn’t my priority right now to explain it. I know what is best and I am moving forward.

Because I don’t have time to write another 100 posts to explain something which is obvious to me, but apparently impossible to explain easily to others in my current state-of-mind and haste.

You still don’t seem to grok the anti-modularity of your total-orders linking proposal even though I have written dozens of posts in the Subtyping thread about it.
