Typeclass objects #27
I am instead proposing that … be written the same as any typeclass-bounded function, without any …
I wrote in the OP:
This:
Doesn't work because you are not specifying the scope for 'B'. You could mean:
Or you could mean:
@keean wrote:
I don't understand why that is important. Please explain.
@keean wrote:
Okay, what you mean is that when … I don't think the …
Right, that is what I wrote elsewhere, that it doesn't make any sense for …
Yes, but I would rather make the programmer declare a datatype for a polymorphic callback function. This keeps the function syntax simpler, and is called first-class polymorphism.
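For illustration (not from the original thread), a minimal Haskell sketch of first-class polymorphism with a declared callback datatype; the names `PolyCallback` and `applyTwice` are hypothetical. The rank-2 type is confined to the wrapper, so the consuming function keeps an ordinary signature:

```haskell
{-# LANGUAGE RankNTypes #-}

-- The polymorphic callback is wrapped in a named datatype, so functions
-- that accept it keep ordinary rank-1 signatures.
newtype PolyCallback = PolyCallback (forall a. Show a => a -> String)

-- Unwraps the callback and applies it at two different types.
applyTwice :: PolyCallback -> (String, String)
applyTwice (PolyCallback f) = (f (42 :: Int), f True)

main :: IO ()
main = print (applyTwice (PolyCallback show))
```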
I am pleased you raised this point. First re-read my prior comment and click the link in it to remember what you pointed out to me about the scoping of the callback. If the callback is scoped by the call site to function … And if the callback is scoped by the function …
If you are not referring to augmentation of type-class bounds, that would only be in the case when the callback is scoped by the call-site of … The function … So …
But you are making a different point. You are talking about the caller of … So what you are saying is that … The caller can add more …
Or non-top-level:
Note above that either will apply to the typeclass object or monomorphization cases, because per my proposal, …
@keean wrote:
But that forces structural typing, otherwise we can't reuse functions which have the same signature but different … And … Edit: more discussion.
Okay, so the research paper you cited is all about achieving type inference. So as I wrote in my prior comment in this thread, you are advocating abandoning the ability to reuse functions (which contain a type variable) based on their structural type, and instead forcing said functions to be instantiated nominally (thus making equivalent structural function types non-composable because they have different nominal constructors), all because you want to be able to use principal typings on these said functions. I've already warned, and fear, that you are going to make your principal-typings goal into a weapon to defeat degrees-of-freedom of expression. @skaller has also warned similarly that type inference is the enemy of expressiveness.

I don't think it will often be the case that we want a function taking a callback where both said function and the callback are not annotated with quantified (aka concrete) types. And for those cases where we might have wanted to do that, we won't. We will have to annotate. For myself and also for readers, please provide a convincing example showing that this annotation overhead (in this specific case of type variables we are discussing) is a significant burden justifying abandoning the ability to compose functions generally.

Edit: also, nothing stops you from using the method you prefer in your own code and achieving type inference. I am asking why we must force everyone to abandon function-composition generality in all code, when they can choose to annotate. I don't think I like complex code without type annotations. It raises cognitive load due to the lack of explicit local reasoning. Type inference is useful where the reader can easily figure out in their head what the type annotations would have been. This is one of my big complaints that makes Haskell code so obtuse to read, IMO.
There is no loss in expressive power (it even says so in the paper), and it will make the type signatures shorter and more readable. There is no loss in composition, and I am not sure why you would suggest there is. Function type signatures don't compose, unless you are talking about copy-and-paste composition? Copy-and-paste just leads to stupid errors. Why bother with named functions at all, why not just copy and paste the raw code? The same argument applies to types: it is better to have a named type you can re-use (like you call functions) rather than have people copy-and-paste huge, complex and unintelligible type signatures.
I disagree; it is all about avoiding having to have higher-ranked type annotations. In Haskell (and all other languages I know of except those that use FCP) you must annotate higher-ranked types because they cannot be inferred. Haskell cannot infer them; no language I know of can infer them. So this is not about inference. This is about how you achieve the annotation. We already need first-class polymorphism to correctly handle type-class dictionaries, so we will already have the mechanism for it. We can keep the language simpler and keep the full expressive power; how could anyone not want that?
@keean wrote:
I guess that is correct. I think I also used the term 'composition' and so it is really boilerplate that is the issue.
At the cost of causing other boilerplate...
I can't reuse a … Granted, I can destructure those constructors and re-construct a compatible instance, but this is boilerplate, and additionally afaics if I have these in a typeclass object, then I will need a different implementation for each one. Afaics, FCP is increasing the boilerplate of composition in exchange for the hope of not having to write type annotations for higher-ranked types (and I say hope because I don't know if FCP inference will integrate with our unions and other features, and I don't know that we won't end up needing higher-ranked types anyway).
Which is precisely all about achieving (a similar level of expressiveness with) type inference. But the entire point of type inference is to not have boilerplate. I hope you didn't really expect that FCP is some Holy Grail without trade-offs.
It is about how we support type variables with minimal annotation and boilerplate, with the goal of enabling type inference with a boilerplate wrapping versus losing type inference with no boilerplate wrapping.
How is it simpler? You trade off one thing for another.
The difference is we don't normally expect structurally equivalent types to be equivalent (excluding our planned unions and intersections). But for functions, we normally do expect them to be composable based on structural equivalence (which is why we allow anonymous functions). Note you did not address my comment here: @shelby3 wrote:
You avoid nested where clauses in the syntax. The grammar will be simpler (and this will be visible in the simple EBNF for the language).
We don't normally expect any equivalence operator to work on functions, so I am not sure this is important. Remember top-level functions can be fully polymorphic, so you only need to wrap the argument in the datatype at the call site. Like this:
What is wrong with type annotations? They're great! They add to the ability to reason about code in principle. The downside of annotation is verbosity, which makes code harder to read.
@keean wrote:
Disagree. We expect assignment of functions to work on structural equivalence.
And so you are right back to type annotations again.
We avoid that but add the boilerplate back in other places, so it isn't necessarily a net gain in brevity. It also loses some local reasoning. The nested … And you didn't address my comment: @shelby3 wrote:
Generally wrapping interacts badly because it becomes deeply nested!!! Which is why we want a first-class anonymous union instead of massive case-tree logic to destructure …
Because it adds complexity to the implementation and the grammar, and complexity is the enemy of good software. Expressive power is achieving the most with the least.
Nothing is wrong with writing types, but the more complex the nesting of types, especially higher-ranked types, the more confusing it is. If we are aiming to make this easy for people coming from OO to pick up, then we will want to make the type signatures as easy to understand as possible. As I said above, this has nothing to do with inference; you have to write the types in either case, but the main difference is that with FCP all the quantifiers are implicit and all the where clauses are at the top level, simplifying the syntax, for no loss in expressive power. It makes the code cleaner and easier to read.
No, a single record or typeclass object encapsulates all the polymorphic functions, so you do not need this. For example:
Works exactly the same as the single-argument tuple datatype posted before.
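The example referred to above was lost in extraction. A hedged Haskell approximation of a single record encapsulating several polymorphic functions (field names are illustrative only) might look like:

```haskell
{-# LANGUAGE RankNTypes #-}

-- One record bundles several polymorphic functions, so a single wrapper
-- datatype covers all the callbacks a caller needs to pass.
data Callbacks = Callbacks
  { render    :: forall a. Show a => a -> String
  , duplicate :: forall a. a -> (a, a)
  }

-- Destructures the record and applies each polymorphic field.
useCallbacks :: Callbacks -> (String, (Bool, Bool))
useCallbacks (Callbacks r d) = (r (1 :: Int), d True)

main :: IO ()
main = print (useCallbacks (Callbacks show (\x -> (x, x))))
```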
It has nothing to do with inference, as you have to provide the type signature in either case; it is just where and how that signature is written that changes. FCP uses the same mechanism as type-class dictionaries.
Callbacks are not local in any case, as the callback could happen from anywhere. Consider async-IO: you call …
Readers, to understand what we are discussing now, I think it is helpful to understand this way of emulating higher-ranked types, which was introduced to me at the Rust language forum:

```rust
use std::fs::File;
use std::io::{Cursor, Read};

trait Foo {
    fn call<A: Read>(&self, a: A);
}

fn f<G: Foo>(g: G, pred: bool) {
    if pred {
        g.call(File::open("foo").unwrap());
    } else {
        g.call(Cursor::new(b"foo"));
    }
}
```

By giving a name (the trait `Foo`) to the polymorphic callback, each call to `g.call` can pick its own type for `A`. Compare passing the callback as an ordinary generic function parameter:

```rust
// This version does not compile.
fn f<A: Read, G: Fn(A) -> ()>(g: G, pred: bool) {
    if pred {
        g(File::open("foo").unwrap());
    } else {
        g(Cursor::new(b"foo"));
    }
}
```

The above won't compile without higher-ranked types (which Rust doesn't support), because the quantification of `A` is fixed at the call site of `f`, while the body applies `g` at two different types.

In general, inference of function application over unions and intersections diverges into infinite recursion in the inference and is undecidable. @keean showed me an example of that using expansion variables in private discussions last May. The two higher-order features are like mirrors facing each other, thus enabling mutual recursion.
It is still better than the alternative of a huge multi-line type signature all in one block.
@shelby3 looking at the direction you want to go, we may be better off ditching quantified types and type-classes altogether and going for a full intersection and union type system with ad-hoc overloading?
@keean I haven't made any decision. I am trying to analyze the trade-offs.
@keean wrote:
Disagree! The deep nesting totally breaks any reasonable composition. Also we don't need to force FCP in order to get type aliases: @shelby3 wrote:
@shelby3 I feel you are making statements about a style you have no experience of coding, so I am not sure how much weight to give your objections.
Intersection types allow functions to be like a type-class, where a single function can have different signatures that apply to different types. You no longer need the boilerplate of declaring a type-class; you just define two functions with different type signatures that have the same name.
@keean wrote:
You misunderstood. The context of my point was … And then factor in deep nesting and you are in implementation-proliferation hell.
Why would you declare two of them then? That's like saying …
Of course it should be obvious to me that there is no way for the iterator object …
You are correct. I might explain it a bit differently: an 'Iterator' is a concrete type (like the above …). It might be good to read about the 'Capability' security model, if you are not already familiar: https://en.wikipedia.org/wiki/Capability-based_security

The idea is that the creator of the array is the 'owner', and the owner always has complete access. The owner can delegate authority to other code. So in any function like this:
We own the array and can create and use readonly, writeonly and readwrite iterators as we want to. However, a 'called' function like:
So just like the capability model, called code needs to be annotated with the permissions it requires, and it is up to the caller to delegate the permissions as required:
So here the call to g1 will pass type-checking and the others will fail to type-check, because the iterator I pass does not provide the capabilities required. So here the 'capability token' is actually encoded into the type of the iterator. This means we can write secure code, where the type checker prevents code that violates the assumptions from compiling.

Of course this is not 'runtime secure' and you could 'byte edit' the machine code to break the security. In a secure capability-based operating system, the tokens are runtime values that are issued by the kernel, and on each access they are checked against the operating system's copy of the tokens. The enforcement here, using the type system, is about stopping programs going wrong because they are written wrong; it is not about stopping malicious attacks. To prevent malicious attacks where the user has access to the binary requires privilege separation (like the Intel CPU security rings), and that all access be indirected via the kernel. This is why message passing performs well on multi-core systems: because we don't want each read to indirect via the kernel, it is better to receive and send a whole buffer at a time, which you own and therefore have full read/write access to.
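Keean's original snippets above were lost in extraction. The following is a hedged Haskell approximation of the capability idea, with read and write capabilities modelled as separate typeclasses so that a callee's signature states exactly which permissions the caller must delegate (all names are hypothetical):

```haskell
import Data.IORef

-- Separate capabilities as typeclasses.
class ReadCap i  where readNext :: i a -> IO (Maybe a)
class WriteCap i where writeOne :: i a -> a -> IO ()

-- The owner's iterator holds both capabilities.
newtype OwnedIter a = OwnedIter (IORef [a])

instance ReadCap OwnedIter where
  readNext (OwnedIter r) = do
    xs <- readIORef r
    case xs of
      []       -> pure Nothing
      (y : ys) -> writeIORef r ys >> pure (Just y)

instance WriteCap OwnedIter where
  writeOne (OwnedIter r) x = modifyIORef r (++ [x])

-- g1 demands only the read capability; g2 demands both, so the type
-- checker rejects any caller that cannot delegate write access.
g1 :: ReadCap i => i Int -> IO (Maybe Int)
g1 = readNext

g2 :: (ReadCap i, WriteCap i) => i Int -> IO ()
g2 it = readNext it >>= maybe (pure ()) (writeOne it . (+ 1))

main :: IO ()
main = do
  it <- OwnedIter <$> newIORef [1, 2, 3 :: Int]
  g1 it >>= print
  g2 it
```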
Are you aware that I know of the Joe-E programming language? Your explanation was good for other readers. Yes, I am aware of all of that (one of the reasons I was concerned about read-only invariants).
No programming language enforces read-only invariants. We can always patch the machine code to work around any security, as long as all the code is in the user security ring of the CPU hardware. Enforcing such things at runtime is the domain of the operating system, so unless we are going to re-invent Windows and Linux, and have a language that cannot run on mainstream OSs, we are stuck with the security mechanisms the OS provides as a library outside the language itself.
That is not true, for example, for Java or any equivalent sandbox. You are essentially arguing that nothing is safe because of viruses, but that is not relevant to the discussion in the context of the security of a high-level sandbox, which is the context relevant to the language I am creating.
You can, for example, copy the Java VM, patch it, then run it as the user. I am arguing that no language-level feature provides security against malicious attacks; only the operating system can do this.
The definition of a sandbox prevents that. I will reiterate that you are referring to a context which is inapplicable to a sandbox. For example, my computer may be vulnerable to viruses, but when I run untrusted code in my sandbox, that untrusted code cannot escape the sandbox and do the things you are contemplating.
Only if the code is interpreted (in which case the interpreter forms the secure kernel), or you have bytecode verification. But this is not easy to get right; Java has been vulnerable to all sorts of attacks due to errors in the bytecode verifier. How are you going to do better?
You mean only if the code is compiled by the sandbox and not the untrusted party. |
Compiling by the sandbox can work, but compilers are big and complex things to distribute. Nobody really does this, probably because of platform differences between Windows/macOS/Linux, and because compilation time for big applications can be really long. JavaScript is interpreted / JIT-compiled, so it is possibly the closest, but it's not really compiled, and WebAssembly will be a target you compile to. Java compiles to a bytecode, and the bytecode has to be verified before you can run it, which is prone to problems. Remote code execution has so far proved to be a major security hole, and is generally to be avoided if you don't want your machine compromised. Browser exploits that allow machine compromise exist in the wild. Maybe when the web browsers are written in Rust they will have a lower bug rate (where the bugs allow security to be compromised). So in theory it can work, but in practice it's not a good idea.
And C++ has a 1500 page manual. <joke>Maybe you and @skaller are the only humans on the planet who understand every nuance of C++.</joke> And thus I am here trying to find an 80/20 rule sweet spot between Lucid™ simplicity and priorities on performance. Complexity is not good. We should apply the K.I.S.S. principle.
JavaScript is not Joe-E, i.e. it was not designed well for security. For example, the bug I identified in these issue threads wherein all programming languages that do not have a set of brackets (which the user will never type!) for delimiting strings that can contain executable code thereby enable script-injection attacks. (And the people in charge of browser security, including the guy Ian in charge of HTML5 (also), would rather do security theater, i.e. Almost Better Than Nothing™ security, than actual security.) Yeah, many things suck in the real world of available programming languages and sandbox systems. That is one of the many reasons I am working on Lucid. We may not get it perfect the first iteration, but we can hopefully experiment on ideas for improvements. Operating system and hardware security are also not impervious. Security will never be an absolute.

EDIT: security discussion continued in 2018 in a different thread.
Although we noted up-thread that it’s possible to request a typeclass interface with a factory for a parametrised … Analogously, without HKT there’s no way to express a functor … For this reason, I conclude we must support HKT. Any objections?
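To make the HKT point concrete, here is the standard functor signature in Haskell notation (the `MyFunctor`/`myMap` names are illustrative only): the class parameter `f` ranges over type constructors (kind `* -> *`), which is precisely what cannot be expressed without HKT.

```haskell
-- 'f' is a type constructor, not a plain type, so the class abstracts
-- over the container itself (Maybe, [], etc.).
class MyFunctor f where
  myMap :: (a -> b) -> f a -> f b

instance MyFunctor Maybe where
  myMap _ Nothing  = Nothing
  myMap g (Just x) = Just (g x)

instance MyFunctor [] where
  myMap = map

main :: IO ()
main = print (myMap (+ 1) (Just (1 :: Int)), myMap show [1, 2, 3 :: Int])
```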
HKT tend to be associated with partial evaluation/currying, so they may not be the best fit for our language. We should consider alternatives, like type families (type-level functions).
Could you please elaborate with an example that illustrates why you think this is the case? And why would you assume our PL will not be using partial-evaluation/currying?
Could you show by example how this is an alternative? I am thinking that iteration with typeclasses and even for your past …
AFAICS, we end up assuming that the implementation of the typeclass does not violate that invariant, but the compiler is unable to check it. Also note that for Zer0, I’m proposing to forsake decidable global type inference, so HKT don’t present a hindrance in that respect.
@shelby3 wrote:
What about this:
@sighoya I suggested allowing typeclass constraints on the associated type like this:
would require that …
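The proposed snippet was lost in extraction. A hypothetical Haskell rendering of a typeclass constraint on an associated type (the `Collection`/`Elem` names are illustrative only) might be:

```haskell
{-# LANGUAGE TypeFamilies, FlexibleContexts, UndecidableSuperClasses #-}

-- Every instance must choose an 'Elem' that satisfies the constraint,
-- here Ord, enforced as a superclass constraint on the associated type.
class Ord (Elem c) => Collection c where
  type Elem c
  insert :: Elem c -> c -> c

newtype SortedInts = SortedInts [Int]

instance Collection SortedInts where
  type Elem SortedInts = Int
  insert x (SortedInts xs) = SortedInts (x : xs)

main :: IO ()
main = case insert 3 (SortedInts [1, 2]) of
  SortedInts xs -> print xs
```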
@keean that is premature specialization. Put the …
@shelby3 This takes us back to the same general problem. This type naturally wants to be an existential. We could make it a closed union, but then we would have to edit the typeclass every time we add an instance. An alternative solution would be an open union that is closed at link time.
@keean wrote:
I have told you this before and I will repeat it again. The compiler can generate the typeclass instance automatically for a closed union type (the non-disjoint, structural unions proposed for Zer0, not Haskell’s sum types, which are not even isomorphic to unions and thus are not unions) if every type in the closed union has a …
Yes it can, I even said this:
We would have to write:
The problem now is that if we want to add an instance for a collection with a Float element type, we cannot do so without editing the typeclass definition, which may be in a different module.
@keean No need to edit the typeclass definition.
Not if we don't know the type that is selecting the instance. Consider …
This does not work. If … Personally, I think it is more flexible to allow open unions, but I think you only want to allow closed ones? For me an open union (resolved at link time) would offer all the functionality of a closed union and an existential, with none of the complexities of the existential. The good news is we can keep the same syntax and associated-type mechanics for both, or even existentials if they are allowed; the type-checking rule is straightforward, and we require associated types to be either open/closed unions or existentials, depending on the type that is selecting the instance.
@keean wrote:
And your premature-specialization “solution” is an oxymoron. It isn’t a solution. It just aids and abets the clusterfuck of open union bounds.

Rather, I prefer to use my list-of-lists “chunking” solution in the Subtyping issues thread #8, which I need to explicate better soon, so we always have closed union bounds for chunks but effectively open union bounds, without incurring that clusterfuck of non-solutions you’re advocating.
Premature specialization is not flexibility. It is an anti-pattern. Yeah, I want my solution, not non-solutions. Please don’t ask me to explain all this now in great detail. We will just get into an argument. When the time is right I will. This isn’t my priority right now to explain. I know what is best and I am moving forward, because I don’t have time to write another 100 posts to explain something which is obvious to me, but apparently impossible to explain easily to others in my current state-of-mind and haste. You still don’t seem to grok the anti-modularity of your total-orders linking proposal even though I have written dozens of posts in the Subtyping thread about it.
Typeclass objects offer dynamic (i.e. run-time and open modularity¹) polymorphism by binding the typeclass bound to the static (compile-time) type of the reference and tagging the data type instances that the reference can contain, so that the correct dictionary can be dispatched dynamically at run-time.

They are named "trait objects" in Rust and "existential quantification" (e.g. `forall a`) or "existential typeclass" in Haskell.

¹ Means we can't compile the universe, so we can't know all the data types required in dependency-injection scenarios. Dynamic polymorphism is not just a run-time artifact; it is also a fact of the real world, where total orders can't exist (i.e. the Second Law of Thermodynamics, the Halting Problem, the Incompleteness Theorem, Russell's Paradox, the Byzantine Generals Problem, Byzantine fault tolerance with the CAP theorem, the impossibility of existence without a quantified speed-of-light, etc.). As well, there are no total orders of trustless decentralization either.
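For concreteness, a minimal Haskell sketch of a typeclass object as an existential (the `Showable` wrapper is illustrative, not part of the proposal): the wrapper pairs each value with the `Show` dictionary for its hidden type, so heterogeneous values share one static type and dispatch at run-time.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- The hidden type 'a' is tagged together with its Show dictionary.
data Showable = forall a. Show a => Showable a

describe :: Showable -> String
describe (Showable x) = show x

main :: IO ()
main = mapM_ (putStrLn . describe)
             [Showable (42 :: Int), Showable True, Showable "hi"]
```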
Expression Problem
They have the benefit w.r.t. Wadler's Expression Problem of delaying the binding of interface to data until construction of the typeclass object, as opposed to OOP subclassing (… and Scheme/JavaScript prototypes²), which binds the interface to all instances of the subclassed type (on instantiation of the data). So they solve the Expression Problem up to the point of instantiation of the typeclass object, but beyond that point, while they allow adding new data types (via the aforementioned dynamic dispatch), they don't allow extending new operations after the instantiation of the typeclass object, which binds the allowed typeclass interface(s).

Typeclass objects enable heterogeneous collections with dynamic extension of the collection of data types without requiring an invariant collections data structure (i.e. they can be the element type of an array). Whereas typeclass bounds on function definitions delay binding the interface until the function call site and thus solve the Expression Problem up to that point, but they don't enable heterogeneous collections with dynamic extension of the collection of data types except via unions. Unions can both enable heterogeneous collections with dynamic extension of the collection of data types and delay binding of the interface until the function call site, but they require an invariant collections data structure such as a list.

² Note that prototype programming can add new operations (but forces all instances of a prototype to take those new interfaces) and add new data types, but it can't add different interfaces to instances of the same prototype. Whereas, type classing can differentiate. (Edit: correction)

Syntax and Peculiarities
@shelby3 wrote:
@shelby3 wrote:
That seems to be a general solution and I like it very much, unless someone can point out flaws.
So typeclass objects are always conceptually with `| Any`, e.g.:

Or in my (and Rust's) preferred OOPish style:
But I prefer that we can write that shorthand (and so we only need `where` clauses on function types and definitions):

Hope you are also starting to see another reason why I favored an OOPish style: it is also more consistent with typeclass objects (and I actually did intuitively anticipate/expect this, but not with the complete solution worked out).
P.S. My understanding is that any function defined for a typeclass bound would also accept typeclass objects with that bound. The compiler would create a dynamic-dispatch version of said function on demand, to be used whenever a reference to it is passed around. It seems to me that only functions called nominally can be monomorphized.
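A hedged Haskell analogy for this P.S. (names hypothetical): an ordinary typeclass-bounded function reused on a typeclass object, with the dictionary captured in the existential dispatched at run-time rather than monomorphized.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

data AnyShow = forall a. Show a => AnyShow a

-- An ordinary typeclass-bounded function...
label :: Show a => a -> String
label x = "value: " ++ show x

-- ...reused on the typeclass object: the Show dictionary stored in
-- AnyShow is the one dispatched when 'label' runs.
labelDyn :: AnyShow -> String
labelDyn (AnyShow x) = label x

main :: IO ()
main = mapM_ (putStrLn . labelDyn) [AnyShow (1 :: Int), AnyShow False]
```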