Compiler Implementation #18

Discussion about the ongoing implementation of the compiler.
The parser now has an implementation of the indentation-aware parser described in this paper: https://pdfs.semanticscholar.org/cd8e/5faaa60dfa946dd8a79a5917fe52b4bd0346.pdf

Here's the implementation of the indentation parser:

```js
// Holds the current indentation level as mutable state shared by the
// indentation-sensitive combinators below.
function IndentationParser(init) {
    this.indent = init
}

IndentationParser.prototype.get = function() {
    return this.indent
}

IndentationParser.prototype.set = function(i) {
    this.indent = i
}

// Succeeds if the indentation at the current position satisfies `relation`
// with respect to the stored level, and updates the stored level.
IndentationParser.prototype.relative = function(relation) {
    var self = this
    return Parsimmon.custom((success, failure) => {
        return (stream, i) => {
            // Count the spaces at the current position.
            var j = 0
            while (stream.charAt(i + j) == ' ') {
                j = j + 1
            }
            if (relation.op(j, self.indent)) {
                self.indent = j
                return success(i + j, j)
            } else {
                return failure(i, 'indentation error: ' + j + relation.err + self.indent)
            }
        }
    })
}

// Succeeds only if the indentation is exactly `target` columns.
IndentationParser.prototype.absolute = function(target) {
    var self = this
    return Parsimmon.custom((success, failure) => {
        return (stream, i) => {
            var j = 0
            while (stream.charAt(i + j) == ' ') {
                j = j + 1
            }
            if (j == target) {
                self.indent = j
                return success(i + j, target)
            } else {
                return failure(i, 'indentation error: ' + j + ' does not equal ' + target)
            }
        }
    })
}

// Relations that can be passed to `relative`.
IndentationParser.prototype.eq  = {op: (x, y) => {return x == y}, err: ' does not equal '}
IndentationParser.prototype.ge  = {op: (x, y) => {return x >= y}, err: ' is not equal or greater than '}
IndentationParser.prototype.gt  = {op: (x, y) => {return x > y},  err: ' is not greater than '}
IndentationParser.prototype.any = {op: (x, y) => {return true},   err: ' cannot fail '}
```

This is what a parser using these new parser combinators looks like:

```js
block = Parsimmon.succeed({}).chain(() => {
    var indent = Indent.get()
    return Parsimmon.seqMap(
        Indent.relative(Indent.gt).then(statement),
        (cr.then(Indent.relative(Indent.eq)).then(statement)).many(),
        (first, blk) => {
            blk.unshift(first)
            Indent.set(indent)
            return {'blk' : blk}
        }
    )
})
```

This parses a block of statements: the first line of the block must be more indented than the previous line, and the remaining lines must be indented the same amount as the first line.
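For reference, a minimal sketch of how these combinators might be wired up and exercised; `statement` and `cr` here are hypothetical placeholder terminals, not part of the code above:

```js
var Parsimmon = require('parsimmon')

// Shared indentation state, starting at column 0.
var Indent = new IndentationParser(0)

// Hypothetical terminals: a newline and a trivial one-word statement.
var cr = Parsimmon.string('\n')
var statement = Parsimmon.regex(/[a-z]+/)

// With `block` defined as above:
//   block.parse('  foo\n  bar')
// succeeds with value {blk: ['foo', 'bar']}, because the first line is
// more indented than the enclosing level (0) and the second line matches
// the indentation of the first.
```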
@keean I will catch up with you later on the parser combinator implementation. I haven't ever employed them, so I will need to dedicate some time to that. My first priority is to write the grammar into an EBNF file and check that it is conflict-free, LL(k), and hopefully also context-free. I read that parser combinators can't check those attributes. Also I will want to understand whether using a monadic parser combinator library forces our AST into a monadic structure, and whether that is the ideal way for us to implement. Anyway, you are rolling on implementation, so I don't want to discourage you at all. I will try to rally around one way and help code. I will need to study. My focus so far has been on nailing down the syntax and early design decisions.

Btw, congrats on getting rolling so quickly on the implementation!

Btw, I hate semicolons. Any particular reason you feel you need to litter the code with them? There are only a very few ASI gotchas in JavaScript with not including semicolons (and these I think can be checked with jslint), and they are easy to memorize, such as not having the rest of the line blank after a `return`.

Also I prefer the style of this latest code compared to what I saw before, because I don't like trying to cram too many operations into one LOC. It makes the code difficult to read IMO. Also, I think I would prefer to employ arrow functions as follows (we'll be porting to self-hosted later, so we'll have arrow functions as standard for any ES version), and to compromise at 3 spaces of indentation (even though I prefer 2 spaces lately):

```js
block = Parsimmon.succeed({}).chain(() => {
   var indent = Indent.get()
   return Parsimmon.seqMap(
      Indent.relative(Indent.gt).then(statement),
      (cr.then(Indent.relative(Indent.eq)).then(statement)).many(),
      (first, blk) => {
         blk.unshift(first)
         Indent.set(indent)
         return {'blk' : blk}
      }
   )
})
```

Also I would horizontally align as follows because I love pretty code, which is easier to read:

```js
IndentationParser.prototype.eq  = {op: eq(x, y) => {return x == y}, err: ' does not equal '              }
IndentationParser.prototype.ge  = {op: ge(x, y) => {return x >= y}, err: ' is not equal or greater than '}
IndentationParser.prototype.gt  = {op: gt(x, y) => {return x > y},  err: ' is not greater than '         }
IndentationParser.prototype.any = {op: gt(x, y) => {return true },  err: ' cannot fail '                 }
```

I may prefer:

```js
IndentationParser.prototype.eq  = { op: eq(x, y) => {return x == y},
                                    err: ' does not equal ' }
IndentationParser.prototype.ge  = { op: ge(x, y) => {return x >= y},
                                    err: ' is not equal or greater than '}
IndentationParser.prototype.gt  = { op: gt(x, y) => {return x > y},
                                    err: ' is not greater than ' }
IndentationParser.prototype.any = { op: gt(x, y) => {return true },
                                    err: ' cannot fail ' }
```

Above you are implicitly making the argument again that we should have the ability to name inline functions (without […]), which would have helped you catch the error on the duplication of the `gt` name in the `any` case. Or (unless we change the syntax): […]
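In today's JavaScript, the naming can be had with named `function` expressions; a sketch (my reconstruction, not the stripped snippet):

```js
IndentationParser.prototype.eq  = {op: function eq(x, y) {return x == y}, err: ' does not equal '}
// Had the relations been written this way, the copy-paste slip of reusing
// the name `gt` in the `any` entry would be visible at the definition site:
IndentationParser.prototype.any = {op: function any(x, y) {return true}, err: ' cannot fail '}
```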
The main reason to use […] With regards to our syntax, function definition should be an expression, so you should be able to include it inline in the object declaration. I think we would end up with something like this: […]
@keean wrote:
I know. That is why I wrote:
I had already explained we will get backwards compatibility for free, and by not putting […]. Who can't run our compiler in a modern browser in the meantime? This is only alpha. Please re-read my prior comment, as I added much to the end of it.
Regarding semi-colons, Douglas Crockford in "JavaScript: The Good Parts" recommends always using semi-colons explicitly, because JavaScript's semi-colon insertion can result in the code not doing what you intended.
I think you are right about '=>' for functions, as this is running in Node which supports them; however, I don't think porting will be that straightforward, as we won't directly support prototypes etc.
@keean wrote:
Did you not read what I wrote?
http://benalman.com/news/2013/01/advice-javascript-semicolon-haters/
Regarding semi-colons:
Semicolons won't help you here:

```js
return
some long shit;
```

You have to know the rules, whether you use semicolons or not. That is why I am happy we are going to use Python-style indenting. Semicolons are training wheels that don't protect against every failure.
Also jshint wants you to put them in, and I am using jshint as part of the build process. jshint catches the above error :-)
JSHint can be configured to allow ASI. And I think it will still warn you about ambiguous implicit cases, if I am not mistaken (it should).
Without semi-colons, JSHint cannot recognise the above error, because you might mean either of two things.
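The two stripped alternatives were presumably the two possible readings of the earlier `return` example:

```js
// Reading 1: ASI terminates the statement, and the value is never returned:
return;
some long shit;

// Reading 2: a single return statement was intended:
return some long shit;
```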
Bottom line is: if you have something at the start of a line which could possibly be a line continuation, then check to make sure you have made it unambiguous. That is the simple golden rule, and it applies whether using semicolons or not. That is not complicated. One simple rule.
JavaScript was never designed to be used without semi-colons... let's design our new language not to require them, but I don't see any point in fighting JavaScript... We will emit the semi-colons into the JS :-)
@keean wrote:
It should be warning that the case is ambiguous. I can't be faulted for the JSHint programmers being derelict (if they are; I did not confirm).
@keean wrote:
The intent isn't relevant. What is, is what is relevant. We need to know the rules whether we use semicolons or not. We are not stupid programmers who need to make ourselves feel more secure by not knowing the rules. I lost my training wheels 30 years ago. Otherwise we need to find a linter that warns of all ambiguous cases, with or without semicolons.

If JSHint isn't doing that checking, then it is derelict. We need to find a better linter. I wonder if Douglas Crockford ever considered that. If some influential people decide that semicolons everywhere is the prescribed way, then why the heck did JS offer ASI anyway? Perhaps he could have realized that the only sure way is to have a linter which properly warns of every ambiguous case, whether using semicolons or not. Instead perhaps these talking heads influenced the JSHint people to not add proper checking for the ASI case? Sigh.
So here's what the guy that created JS thinks: https://brendaneich.com/2012/04/the-infernal-semicolon/
It doesn't matter. It is just logic. There you go. Crockford doesn't agree to support ASI in his tool and thus promulgates that ASI is an error:
That's right:
Know the rules. Newline is not a statement nor an expression terminator in JavaScript. Simple as that. Resolve all ambiguous cases. It is analogous superfluous redundancy, as one wouldn't write […]
So I don't write code with syntactic errors... I write Python without semi-colons, I write C++ with them... it doesn't bother me, I go with what the language standard says...
Then why did he put it in JavaScript? Linters should do their job correctly. There is absolutely no reason you ever need a […]
There you go. JS can't require semicolons. So why do you? Probably because we can't use a proper (non-derelict) linter; probably because JSHint doesn't warn of all ambiguities with ASI enabled (though I didn't confirm that). We are moving to block indenting to avoid this entire mess.
Okay, so conclusion: I will use '=>' for anonymous functions, but leave the ';' in for now... Our language won't require semi-colons, just like Python does not...
The language standard says ASI is a supported feature. You have to know the rules whether you use semicolons or not. I will not repeat this again. Let's try to find a linter which isn't brain-dead.
The standard says it is a syntax error to omit the semi-colon.
Then why does it compile? Brendan told you that JS cannot require semicolons everywhere because it breaks other things.
You are right. If you really can't work with the semi-colons, I will get rid of them for this project.
Can I delete the semi-colon discussion, as it's cluttering the implementation thread... I am going to remove them. I discovered […]
Why parser combinators don't make grammar analyzers extinct: https://research.swtch.com/yaccalive

I had made a similar argument to @keean about the need to prove the lack of ambiguities that force backtracking.
That article is referring to regular expressions, not parser combinators, as far as I can see.
See the section Linear vs. Exponential Runtime. He is writing about context-free grammars in general.
That sounds to be in contention with my (our) stated goal:
Because OOP is an anti-pattern and we'll probably prefer to use functional programming and typeclasses. But I have not yet dug into this detail. Perhaps @keean has an opinion, given he has experience in creating a PL which I do not have?
Intuitively that probably also means carrying around a lot of cruft. Being a jack-of-all-trades has its downsides. But I would need to dig in to really analyse the issues accurately. Btw, I did briefly look at ANTLR in the past.
That's the same argument I made up-thread to @keean in favor of some integration with a generator. Whereas, @keean had begun implementation with parser combinator libraries, although I presume his work paused due to ongoing discussions and lack of consensus about what the design of the PL should be. Yet I think for myself the essence of what the design should be has solidified, although I am still remaining open to discussion on the design while moving forward on implementation.
Well the OOP part is only to make the visitor class and code analyser; in itself it can also parse functional languages or anything else, even text formats like XML/HTML.

I'm not extremely fond of OOP either, but it's only the parser structure, and in v4 they already removed a lot of cruft; they mostly entirely remade it.

Grammar parsers are quite generic. I would believe they all look the same, with more or less capacity to parse complex grammars, and more or less facility to define the language structure. ANTLR is known to be able to handle all forms of grammars.

What language would you use otherwise? I'm not sure Go is the best language to make a compiler. But maybe you can start from the official Go compiler and tweak it to add the other features.

There are not that many languages that support typeclasses that I know of, other than Haskell and Scala, and I'm not sure they are extremely useful for making a parser/compiler. You need languages that are efficient at text processing; they don't necessarily need to handle complex type relationships to do the compiler.

I'm not sure what would make a functional language better for making a parser/compiler. In itself grammar parsing is not something extremely algebraic.

But your goal is to make a functional language like Haskell, and then transpile it to Go? I'm not sure the structure of functional languages would be very fit to express a high level of concurrency, and Go is still pseudo-OOP; I'm not sure how efficient it would be to transpile a functional language to Go in order to exploit the concurrency capacity of Go. Or is the goal to get to some form of dataflow like FRP, and translate this to Go with channels?
ANTLR v4 is supposed to have improved the error handling; I didn't test it heavily though. But the hard part is generally not the grammar itself but the step after it, and there it's possible to have complete error messages with context etc. Even for the grammar parsing, ANTLR v4 is supposed to be able to generate error messages that are better than previously. For the performance, I guess it depends, but if the grammar file is not well structured, it can become slow.

For a simple, straightforward grammar it's probably ok, but it can be easy to write rules that become heavy to process without realizing it: http://www.antlr3.org/pipermail/antlr-interest/2009-January/032345.html With a hand-made parser, it's much less likely it will choke on complex rules. Parser-generator grammar definitions can still be optimized, but it's easy to write grammar rules that will become slow to process, and as it's general purpose, it might also generate some information that is not useful for the task. It seems to depend on the type of parsing; sometimes the ANTLR method is not well adapted to the type of grammar and can become inefficient. For simple grammars it seems to be equivalent to an average hand-made parser, but when the grammar becomes more complex it can probably become slower, depending on the case.
@NodixBlockchain wrote:
I don't know which language I would suggest to use to bootstrap with. Maybe whichever language accomplishes the bootstrapping most quickly, since the goal is to bootstrap to self-hosted. @keean might have an opinion on this?

And Rust. And Swift has something similar to typeclasses. Haskell might be the best choice for bootstrapping, but I feel awkward using it. I just can't seem to adjust to every function taking one argument.
For me the fastest way is clearly to use a parser generator. I can work out a grammar in a few days or a few weeks max, and the result will be reliable. And it's easy to make changes and tweak the thing around. And ANTLR has lots of tools to analyze the grammar and the parse tree, and to integrate with an IDE.

For performance, in the test I made with the C grammar plus just half a dozen rules added to it to parse my object graph, it takes several seconds with the C++ runtime to parse the files I tested, whereas it takes less than a second to compile the same file using gcc or Visual Studio. And it doesn't even do the actual compilation, only the parsing. So I would say that to parse a complex grammar, it's still much slower than production/industrial-grade compilers. And it doesn't seem to be easy to optimize it significantly.

I think the problem with error messages is mostly for parser generators before ANTLR v4, or other parser generators. It's more manageable with ANTLR v4, which keeps track of token positions in the source code and can have a system of hooks to handle errors during grammar parsing. Even the default error messages are not too bad; it can still say things like "unexpected token on line X" and give some info. And for the post-grammar step, when the parse tree is generated, it's easy to make error messages with full context.
@NodixBlockchain I saw your comment in the other thread about ANTLR. I'm currently working on a draft of the proposed syntax for a hopefully context-free LL(k=1) grammar using the analysis and generation SLK tool. When it's ready, you're free to try to plug it into ANTLR and see what advantages and disadvantages fall out. EDIT: here is what I have thus far, and it is LL(k=2) thus far. Also I found some criticism of the SLK tool I am using on the ANTLR forum.
@keean wrote in the Modularity thread #39:
You want the AST to have a canonical structure so that changes to the parser grammar don't require changes to compiler stages that manipulate the AST? Yet doesn't this still require a translation stage from the parser grammar to the s-expression grammar? Sounds like this will make the compiler more complex, because for example error messages won't be finely tuned to the parser grammar. And the compiler would be slower.

You're presumably envisioning a more reusable compiler layer upon which different parser grammars can be deployed. Or envisioning much experimentation and changes to the parser grammar. This has been your thesis since the beginning, that the compiler should be built in canonical orthogonal layers. Frankly, that sounds like an academic research project. Am I missing any important advantages you envision?

Also I want a self-hosted compiler, so there wouldn't be homoiconicity when programming the compiler passes over the AST. So we won't be able to write s-expression macros and remain self-hosted, unless we're using a different s-expression programming language for the AST compiler stages. As you know, a self-hosted compiler provides that anyone who knows the programming language that the compiler compiles can also read the source code of the compiler.

I'm trying to build a commercial tool. I had better know what parser grammar the market wants; otherwise, I shouldn't be doing this job. I don't say "we" because I think your focus is different than mine. I think I sort of know what you want based on our past interactions. You want a research project which will be a foundational layer for programming language design experimentation. However, if I'm somehow missing some key advantages or if my understanding is myopic, I do hope to learn from you why we should do it the way you're contemplating.

I need a commercial tool completed yesterday. I'm not wanting to dig into some "project for the ages." I know you have much more hands-on experience than I, both building compilers and using many different programming languages. So I always value your knowledge and opinion. But I also have to balance this against my instincts, intuitions, and some modicum of observation of extant programming languages and their issues. I have observed, as I read various posts by others on Quora and whatnot, that I seem to be attuned with what other programmers are thinking and wanting. We need a very streamlined Go with typeclass polymorphism, better modules, and some functional programming features. And we need it pronto.
@shelby3 software spends longer in the support and maintenance phase than it does in the writing stage. Separating the grammar allows unit testing of the parser (tests consist of input syntax and expected output s-expression) and of the other layers of the compiler. For something like a compiler, where regression testing after bug-fixing is essential, it's vital that we follow a Test-Driven Development approach and write the tests before the code we want to test.
@keean I fail to understand why regression tests must operate on an s-expression instead of an AST in our preferred programming language and data structures.
@shelby3 well you have to write the tests somehow, and human-readable is better for validation. Remember an s-expression is just a universal way of writing a tree (abstract or otherwise): the 'function', the first word after an open parenthesis '(', is the name of the node, and all the subsequent words are its children. For example we can parse an expression like (3 * x) + (2 * y) as:
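The snippet did not survive formatting; a conventional s-expression rendering of that tree (node names here are my choice) would be:

```
(add (mul 3 x) (mul 2 y))
```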
This is just a way of writing the abstract syntax tree that is human readable. Now it has some drawbacks, and the main one is that we only know the semantics of the arguments to the tree nodes by the position of their arguments, which is fine for simple arithmetic nodes like these.
The alternative is to use something like XML or JSON to give each node named properties.
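Reconstructed illustrations of the two alternatives (the original snippets were stripped; node and property names are my guesses). In XML:

```xml
<add>
  <mul><num>3</num><var>x</var></mul>
  <mul><num>2</num><var>y</var></mul>
</add>
```

or in JSON:

```json
{"node": "add",
 "args": [{"node": "mul", "args": [3, "x"]},
          {"node": "mul", "args": [2, "y"]}]}
```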
What's interesting is that both the s-expression and the XML have named objects, and both the JSON and the XML have named properties, although XML is not regular in that named properties (attributes) do not contain nested XML, which actually has positional arguments like s-expressions as well. This annoys me; it's like there are so many formats, but none of them quite get it right. So let's just try something like:
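Presumably something along these lines (a reconstruction; the `type:` property syntax is inferred from the description that follows):

```
(add (mul 3 x) (mul 2 y) type: Int)
```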
So here we can imagine that a named argument like "type" could contain further nested s-expressions, and we choose to omit the argument name where it is obvious. We have named objects and named properties for decoration, and those properties can be full expressions, so type itself could be a tree structure like:
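For example (again a reconstruction, with a hypothetical type constructor):

```
(add (mul 3 x) (mul 2 y) type: (Num Int))
```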
It's probably also worth mentioning "i-expressions" at this point too, which are s-expressions with the parentheses replaced by indentation, like this (and keeping the named arguments too):
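A reconstruction of the indented form:

```
add
    mul 3 x
    mul 2 y
    type: Int
```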
Which seems a nice universal way to write down data structures, but sometimes using the parentheses on a single line can be clearer for short expressions. Overall, allowing mix-and-match seems best:
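For instance (reconstructed):

```
add
    (mul 3 x)
    (mul 2 y)
    type: Int
```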
@keean we agreed in the Macros and DSLs thread to just use the language syntax as the textual serialization format. I already have named and positional arguments for […]. I like the idea of being able to use indenting in lieu of parentheses when the nesting is deep. I added that to the rough draft proposal for the Zer0 syntax as an optional function application syntax, along with the parenthetical tuple. Note the draft Zer0 grammar I proposed requires instead: […]
I haven't read all the comments, but which parser technology do you prefer, LL or LR or some extension of them, and why?
I personally prefer parser combinators, and parsing and lexing together; otherwise literals like strings cause problems. As for LL or LR: with parser combinators you can be LL where possible, and then backtrack where necessary. You can do this with combinators that are LL but have decision points that save the state for backtracking.
@keean […]
@sighoya have a look at my parser combinator library for C++. This works about as well as I could get it in C++. The interesting thing is the state for backtracking: currently this just saves a copy of the whole state at a decision point, and backtracking happens by resetting the file offset and restoring the saved state. I think something better could be done with a read-only state and 'diff'-based updates that can be undone, a bit like how we can unpick state in a Prolog interpreter by rolling back unifications. Prolog makes writing compilers so easy. If the self-interpreter was Lisp's special trick, I think the self-compiler is probably Prolog's party piece.
I am working on a proposed LL(k) grammar for the syntax of Zer0, and I hope to entirely avoid backtracking, hopefully with k limited to 2 tokens of lookahead. I will continue to update that linked post as I do more work on the grammar. I got sidetracked with the recent discussion about Subtyping and typeclasses. I need those design decisions so I know what to put in the grammar. Notwithstanding whether we implement that grammar with parser combinators or a table-driven LL(k) algorithm, I think checking the grammar in an LL(k) parser generator such as SLKpg is important. This allows me (us) to experiment rapidly with grammar tweaks and to check that it's sound. Note these comments from the ANTLR creators about the SLKpg parser generator. I suppose soon I will submit that grammar to a Zer0 Git repository so others can contribute. I wanted to get something fairly solid before doing that, since our design discussions are still in flux. I am thinking of using GitLab instead of GitHub, since GitHub has been bought by Microsoft. My recollection from prior discussion is that @keean thinks parser combinators are better for more precise compile-time errors. Note the SLKpg documentation says: […]
P.S. I hope others will help develop Zer0. I really don't want to try to do it all by myself. I am trying to find someone to hire to work on it with us. If anyone knows anyone sufficiently qualified who wants to work on it and be compensated, please have them contact me via email: shelby@coolpage or they can add me on LinkedIn: https://www.linkedin.com/in/shelby-moore-iii-b31488b0

I do not know yet whether what I want to do for Zer0 can incorporate a sufficient set of what @keean wants so that we work on the same code base. @keean had wanted to make the typeclasses sort of like a Prolog logic which compiler builders could build on top of. I think we discussed that up-thread. Also see me arguing against that on a specific issue recently in the Subtyping discussion. I want to go more directly at the specific goal I have for Zer0, and not try to build the most generalized underlying programmable engine possible, because my need is for this to be production-ready ASAP, not an ongoing experimental project. Yet I remain open-minded to whatever seems to make the most sense and is realistic within available resources and mutual goals.
Now that we're contemplating targeting Go as the initial compiler target for Zer0, and since we won't have a self-hosting compiler in the first working version of the compiler, what programming language do you guys think would be the best to implement the compiler in? In which PL could we most readily complete the compiler? Haskell? (OMG I suck at Haskell, but I guess I could learn.)
@keean wrote in 2016 in response to my comment:
Could you give me an example of the hand modification you would envision, if my responses below don't ameliorate your point? Because I'm contemplating that if we are the ones creating the parser generator, then we can modify it as necessary to accommodate whatever we need, including making sure the performance of the generated code is awesome. LL(k) table-driven parsing is more efficient than parser combinators or hand-coded recursive descent, because there's no backtracking.
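To illustrate the point, here is a toy sketch of my own (with a hypothetical three-rule grammar, not project code): an LL(1) parse is a loop of table lookups keyed by (nonterminal, lookahead), so it never backtracks.

```js
// Toy grammar:  E -> T E2    E2 -> '+' T E2 | (empty)    T -> 'num'
var table = {
    E:  {num: ['T', 'E2']},
    E2: {'+': ['+', 'T', 'E2'], $: []},
    T:  {num: ['num']}
}

function parse(tokens) {
    var stack = ['$', 'E']   // end marker and start symbol
    var pos = 0
    while (stack.length > 0) {
        var top = stack.pop()
        var look = pos < tokens.length ? tokens[pos] : '$'
        if (table[top] !== undefined) {
            // Nonterminal: expand using the single production the table allows.
            var prod = table[top][look]
            if (prod === undefined) throw new Error('unexpected ' + look)
            for (var k = prod.length - 1; k >= 0; k--) stack.push(prod[k])
        } else {
            // Terminal (including the end marker '$'): must match the input.
            if (top !== look) throw new Error('expected ' + top)
            pos = pos + 1
        }
    }
    return pos == tokens.length
}

// parse(['num', '+', 'num', '$'])  ->  true
```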
Agreed. I was not proposing to do that. I was proposing to have the source code file parsed by the generated parser into the initial AST. Hand-coded compilation stages would operate on that automatically generated AST.
I presume what you mean is that a particular function can operate on a subset of the grammar? Why wouldn't that also be the case with an LL(k) grammar? Backtracking is because you don't have an LL(k) table of (up to k tokens of) lookahead […]
Trying to grok your point. It seems you're referring to the boilerplate necessary to visit each node (object) in the DAG/tree of the AST and do useful work on it, such as transforming it or (apparently, as shown in your example) serializing it to the output code of the compiler. So AFAICT it seems that for every action we want to apply to the AST, we define a type class for that action and write a type class instance for every data type (i.e. where data types are declared with `data`).

The main boilerplate to eliminate in this hand-coded code is to automate the case where the type class instance does nothing but walk its children, so the only thing that changes between type classes is the name of the type class. Is it possible to express this in Haskell? I think we can do this in Scala with a macro. Seems very straightforward and elegant to me. What, if anything, am I missing from your point?

It's interesting to compare how the above would be accomplished without type classes. In that case we would write functions that input each object type, and which call other such functions for their child objects. So superficially there seems to be no significant advantage for type classes. But for example let's consider the multiple-choice productions of a grammar rule such as:
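A hypothetical stand-in for the elided rule (the names are mine, not from the original):

```
Statement ::= If | While | Assignment | Block
```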
The possible productions of such a rule each correspond to a distinct node type in the AST, i.e. the rule corresponds to a union of those types. So in the type class case, the type class bounded function (i.e. a function with a type class bound on its polymorphic argument) dispatches to the instance for whichever production type occurs. Without type classes, the dispatch would have to be hand-coded for each production type.

Note function overloading seems to accomplish much of what type classes do, except there is no concept of a default function and no associated output types (other than a return type of the overloaded function, if overloading allows varying the return type). Section §10.2.3 Ad-hoc polymorphism recognizes the congruence to function overloads with dependent return types (which is analogous to a type class with an output associated type): […]

As I alluded to, the case for type classes becomes even more compelling if some actions on the AST only process some types, so that many of the types will simply apply the action to any children of the said type (thus walking the AST). So if we have some way to make a compiler smarter about this case, then we can write one type class that does nothing but walk the AST. And then we can tell the compiler that when an instance of a type class for a specific action is not available, this means use the said default type class instead, which only walks, but applies the action type class to any children. And repeat that recursively. So in this way we would only have to write instances for types which do an action, unlike the case without type classes, where we would have to hand-code all the functions for every type for every action, even if only the default of walking children applies to a type for a particular action. Those who don't understand type classes well may not be able to visualize what I'm describing, but they will learn by example when I implement this.

Of course we may be able to accomplish analogous automation of coding and refactoring without type classes by employing macros,¹ but @keean already argued that macros are bad because they essentially fork the PL. Much better to have a consistent type class feature in the PL which all users of that PL already understand, so they don't have to learn custom macros for each different code base.

¹ One could presumably write a macro in some PLs which would automate that logic, but @keean already argued that macros are bad because they essentially fork the PL.

A tangential issue is how we will simulate anonymous, structural unions in Scala 2 (until Scala 3, which has unions, is ready to deploy ~2020). The only way I can think of that will really work correctly is to employ a […]² Note employing such inheritance thus allows the aforementioned "vacuous reasoning" if the said traits are employed in some type-parametrised logic. Normally we would want the compiler (e.g. Zer0) to exclude anonymous, structural unions from those cases that can cause "vacuous reasoning". So if manually coding said trait simulation of unions, the programmer should avoid passing the said traits into type parameters which would cause "vacuous reasoning". I think that could be accomplished by only applying type class bound operations on anonymous, structural unions (and thus their trait simulations), which effectively removes any overlapping non-disjoint portions of any union of said unions, because the type class instances are resolved at compile time and can't be overlapping.³

² With some algorithmically generated type name, such as all the possible member types concatenated with an underscore separator character, e.g. `Int_String` for the union of `Int` and `String`.

³ Haskell does have an optional extension for overlapping instances, but I am presuming we do not enable overlapping instances.
@shelby3 wrote:
Can't we lookahead in parser combinators? For the rest of the non-LL(k) grammar, why not spawn multiple threads parsing different rules and write their results to some shared state? The clue is that only one will write the result, as the others will kill themselves for lack of production satisfaction, implying no synchronization.

@keean wrote:

I don't see the reason why inheritance causes this bloat. What you may be complaining about is the use of the visitor anti-pattern, a fact accepted even by a fair number of OOP lovers.

@shelby3 wrote:

Dynamic dispatch, yes, but the functions are monomorphized anyway; only an initial branching is required to choose the appropriate monomorphized function, instead of calling one function accepting a box of possible types.

Without ad hoc polymorphism, method overloading would also suffice. And macros are indeed a bad deal for handling ad hoc dispatching.

Or inheritance in general: creating wrapper types all inheriting from the same supertype encoding the specified union type, where each wrapper type wraps all the values of exactly one type in the union. Then you would wrap and unwrap values and do some ugly casts.
@sighoya wrote:
Apparently not holistically enough to avoid all cases of backtracking, because that requires a holistic analysis of the grammar. You could for example familiarize yourself (if not already familiar) with the concepts of first and follow sets, which form a global table for the entire LL(k) grammar.
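To make that concrete with the toy grammar from the sketch above (E → T E2, E2 → '+' T E2 | ε, T → num): FIRST(E) = FIRST(T) = {num}, FIRST(E2) = {'+'} plus ε, and FOLLOW(E2) = {$}. Computing a FOLLOW set requires inspecting every rule in which the nonterminal appears, which is exactly the whole-grammar analysis that a combinator, seeing only one rule at a time, never performs.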
Agreed, we're not referring to the boxed […]
This entails adding virtual methods to a base class (of all types in the AST) for each action, instead of declaring a new type class for each action we want to perform on all types in the AST. Note that cases where the type is known at compile time will not monomorphise the dispatch, as they would with type classes, because virtual methods are always dynamically dispatched. Statically resolved function overloading would monomorphise the dispatch, not require adding methods to a base class, and avoid the subtyping issue below. But that doesn't remove the boilerplate of needing to implement that virtual method for every type, even if all that type does for a particular action is walk its children and delegate the action to them. Whereas I proposed an extension for type classes (not currently available in Haskell, apparently) which I posit would automate that boilerplate.

The problem with method overloading is that it requires subtyping, and in my prior post I have already explained (and linked to detailed explanations) that this subtyping causes unsoundness in the type system. I posited that only subtyping of the anonymous, structural unions can possibly be sound, by restricting them to type class bounded operations. All System F programming languages with subtyping (along with intersections and/or generics) are apparently unsound! I think that is a profound realization. And it basically vindicates @keean's dislike of the type systems in Scala and Java.
Scala has […]
@shelby3 @sighoya There are many comments I could make, but being pushed for time, I will go for the one that stands out the most. Of course overloading (providing it is multiple dispatch, so all function arguments take part in the specialisation) can do what type-classes can do; type-classes are constrained (controlled) overloading after all. The question is more the other way around: can you do everything useful with type-classes that you can do with overloads? If so, then type-classes have the advantage of having a static (polymorphic) type known at compile time for all overloads. As such they are a specification for all future overloads that can be defined. Ad-hoc overloading is fine for uni-typed languages, but in typed languages a future overload could break existing code (where there is dynamic dispatch). Further, finding the least general supertype is not solvable for all cases (not decidable) and is not principal (there is not always one clear least general supertype). Type-classes provide generalisation of types over overloads.
Let me give my interpretation of @keean's latest post above. AFAICT, he is essentially reiterating the point I was making¹ over at the Pony developer mailing list: that typeclasses can substitute for the need to have intersection types in the type system. Because instead of modeling the overloaded function as an intersection of function types in the type system (one for each specific permutation of concrete types for the overloaded input arguments), we instead model the function as polymorphic (i.e. with generic type parameters) on those overloaded input arguments. The overloaded implementations are created by the compiler by finding instances of the required typeclass interfaces, but the type system is not involved in this unless the required typeclasses have associated types. Yet as I had explained, even associated types don't introduce the need for intersection types nor subtyping.

So typeclasses enable controlling the overloading of a function definition's input argument types orthogonally to the type system. The function definition is fully polymorphic, so it will have the same polymorphic type in the future. Future, not-yet-contemplated overloads will be added orthogonally to the type of the function definition, at the function call site. Whereas with concrete (i.e. not polymorphic) ad hoc overloading, the function definition's type (which is an intersection type) changes when adding new overloads. This is why I suppose typeclasses are referred to as ad-hoc polymorphism, as terminology distinct from ad hoc overloading.

@keean is pointing out that intersection types inherently introduce subtyping, and that given a union or intersection of types, finding the greatest upper bound (GUB, aka least general supertype) makes principal type inference undecidable. That is why MLsub put all unions and intersections in the ground types, so that the GUB is always the intersection or union itself. Note however, as I cited for the MLsub paper, that when the return type of a function depends on input types via associated types, then the principal types are intersections, which is a contradiction of @keean's point. AFAICT, what is really going on is that Haskell has no subtyping other than what would be introduced by the inherent intersection types created by associated types. Haskell is able to attain global principal type inference by placing restrictions on the rest of the type system (i.e. no other subtyping in the type system).

I pointed out at the Pony developer mailing list how subtyping of traits and interfaces is ostensibly making the type system unwieldy, and that it could be discarded in favor of typeclasses as a suitable (and IMO superior) replacement. So I really don't find @keean's type-system-related point to be entirely consistent and compelling (the devil is in the details, and function overloading could also form principal types if the rest of the type system were limited as necessary, as in Haskell). Rather I think my point was the salient one: typeclasses provide a more structured form of overloading and thus can enable us to better automate boilerplate. It was very difficult for me to learn the generative essence of typeclasses, because everyone seems to mislead the reader, as for example @keean is doing here. The truth is actually what I wrote. But nowhere will anyone tell you this!

¹ Although my point was about intersection types in general, for forming subtypes from intersections of traits and interfaces. @keean is focused on intersections of function types, which, as I explained, AFAICT we still need with typeclasses if we have associated types. My point over at the Pony developer mailing list was to simplify (and remove a source of unsoundness from) the type system by removing intersection types for traits and interfaces, because we can attain analogous functionality with typeclasses.