Syntax summary #11
I am not sure I like
I am not sure we want to use […]. I prefer having 'implementation' before the type class; having the type class first for implements seems inconsistent to me. I also prefer to treat all type class parameters equally. The first is not special, so why give it special syntax?
I am not sure why you put types in the function call syntax? I don't think you need or want them; you only want types in function definitions. I don't like that the method syntax is different from the function definition syntax. I think we should have a unified record/struct syntax. If we have:
A record above is like a type-class but you can pass it as a first class value. If we can 'promote' this to implicit use, we can have a single unified definition syntax. Maybe:
@keean wrote:
Can't be
I personally can tolerate
Why not? Sum types are an "or" relationship. Unions are an "or" relationship.
Inconsistent with what?
Afaik, I didn't. What are you referring to?
Ah I see:
This is ambiguous... is it calling
@keean wrote:
Good catch. I missed that one. It indeed conflicts with comma delimited groups in general, not just function calls. I will remove after sleep. You didn't point out that problem to me when I suggested it. Remember I was trying to make the inline syntax shorter, to avoid the Edit: there is another option (again
But that is still NFG! Because it is LL(inf) because without the leading |
Personally I would rather have a single syntax for function definitions. If that is (with
Then passing to a callback would be:
and then things are consistent. I think keeping things short is important, but I think consistency is even more important.
@keean the only point was to have an optional shorthand syntax (instead of the inconsistent semantics of Thus we don't need
Which is shorter than and removes the garish symbol soup
That being generally useful shorthand, enables your request for an optional syntax in the special case of single argument (which I was against because it was only for that one special case):
Instead of:
However, it isn't that much shorter and the reduction in symbol soup isn't so drastic, so I am wondering if it violates the guiding principle that I promoted. Short inline functions might be frequent? If so, then I vote for having the shorthand alternative, since it would be the only way to write a more concise and less garish inline function in general for a frequent use case. Otherwise I vote against.
Are we optimising too soon? I have implemented the basic function parser for the standard syntax; is that good enough for now? I think maybe we should try writing some programs before coming back to optimise the notation. I would suggest sticking to "only one way to do things" for now, because that means there is only one parser for function definitions, which will keep the implementation simpler. What do you think?
Thanks for reminding me about when I reminded you about when you reminded others to not do premature optimization. I agree with not including the shorthand for now. Then we can later decide if we really benefit from it. I'll leave it in the syntax summary with a footnote.
The compiler can now take a string like this
compile it to:
The next thing to sort out is block indenting; then it should be able to compile multi-line function definitions and applications.
I think we should have a provisional section, so we can split the syntax into currently implemented and under consideration.
@keean wrote:
I'll do that if the † instances become numerous enough to justify duplication.
Link to discussion about unification of structural lexical scope syntax.
Are we sure having keywords |
@keean wrote:
Instead I have proposed unified functions around What would be the alternative to not having |
Agreed. My suggestions on types of name tokens for the lexer to produce:
The exclusivity for type parameters for all uppercase is so they don't have to be declared with Edit: the distinction between named functions and non-functions references will be useful, because unnamed functions references should be rarer. However, I was incorrectly thinking that it wouldn't make any sense to give function naming to unnamed function references (which have re-assignable references) because the reference would indicate it is for a function but I had the incorrect thinking the reference could be reassigned a non-function type (but reference types can never change after initial assignment). So I think it would be safe to change the above to:
The other advantage of that is the lexer can tell the parser to expect a function, which is more efficient and provides two-factor error checking. Note the compiler must check that the inferred type of the reference matches the function versus non-function token for the name. |
(Aside: Very few languages have a clean lexer and often you end up with lexer state depending on compiler state (string literals are a classic example). One of the advantages of parser combinators like Parsec is that you can write lexer-less parsers, and that cleans up the spaghetti of having the lexer depend on the state of the parse. )
Conclusion, nothing is going to be perfect. My favourite would be: datatypes and typeclasses : [A-Z][a-zA-Z_0-9][']+ This would have both type variables and value variables lower case. I like the mathematical notation of having a 'prime' variable:
@keean wrote:
Agreed. For readers, by "If we do not introduce type-variables" you mean if we do not prefix
You have a point, but it is not an unequivocal one. We can require typeclasses begin with a lowercase It not only helps to read the code without syntax highlighting (and even 'with', if you don't want rainbow coloring of everything), it also speeds up the parser (because the lexer provides context).
If the string literal delimiters are context-free w.r.t. the rest of the grammar, then the lexer can solely determine what is inside a string literal and thus not parse those characters as other grammar tokens (aka terminals). Edit: the proposed paired delimiters will resolve this issue. I believe if the grammar is context-free (or at least context-free w.r.t. name categories) this will reduce conflation of lexer and parser state machines. That is why I suggested that we must check the grammar in a tool such as SLK, so that we can be sure it has the desirable properties we need for maximum future optimization. I am hoping we can also target JIT and that ZenScript becomes the replacement for JavaScript as the language of the world. Perhaps the type checker for our system will be simpler than subclassing and thus performant. Even Google now realizes that sound typing is necessary to solve performance and other issues.
I still need to come up to speed on the library you are using to know what I think about tradeoffs. Obviously I am in favor of sound principles of software engineering, but I really can't comment about the details yet due to lack of sufficient understanding. I will just say I am happy you are working on implementing, and I am hoping to catch up and also look at other aspects you may or may not be thinking about.
The Note I had a logic error in my prior comment, in that single word function and non-function names were indistinguishable in what I proposed. But that doesn't destroy the utility of the cases where function names are camel case.
I want to make what I think should be a convincing rational point about proper names. I don't like So I can objectively conclude your preference is not consistent to types as proper names, headlines, or titles, which is what they are. <joke>You are British, so you should be more proper than me, lol.</joke> Although my last name is "Moore" and first name was a family name "Shelby" originating from north England meaning "willow manor". And I've got "Hartwick" (German), "Primo" (southern France/Italian) and "Deason" (diluted Cherokee native American) ancestry as well.
I don't think I have an objection to this as a suffix only. Edit: however one issue with camel case and no underscores is when an entire word which is an acronym is not delimited by the capitalization of the word which follows it, e.g. |
@keean wrote:
You didn't differentiate from ALLCAPS type parameters above. Also your regular expression seems incorrect, as Note that JavaScript allows So the Here is what I arrive at now in compromise:
I like the leading Edit: no need to allow uppercase in non-function references. Who on God's earth is using camel case for variable (i.e. non-function) reference names? 😆 |
👀 Type parameters will nearly always be a single letter. We both must compromise to what is rational. I have compromised above forsaking required camel case on functions. I also compromised (well more like I fell in love once we eliminated need for subclassing syntax) and accepted Haskell's I don't want the noise of declaring Also the lowercase letter choice for type parameters is not idiomatic and it is has no visual contrast in the Also type parameters are types, thus they should not be lowercase. That would be inconsistent with our uppercase first-letter on all types. The lowercase type parameters of Haskell (combined with lack of If you are making a Haskell language, I don't think it will be popular. I am here to make a popular language, thus I will resist you on this issue. One of my necessary roles here is to provide the non-Haskell perspective. Let's do something very cool and eliminate the need to declare
😧 I absolutely hate that. First time I saw that, I was totally confused. And I hate Rust's lifetime annotations littering the code with noise. I don't like Haskell and ML syntax. Not only am I lacking familiarity (not second-nature) with their syntax, but I dislike much of the syntax (and even some of the concepts) of those academic languages for logical reasons which I have explained in prior comments. I realize their target market is the 0.01 - 0.1% of the population that are academics (and what ever subset of that which are programmers). If you want to bring in most of the syntax and the obtuseness from those languages, then I think we have different understanding of what the mainstream wants. I am not a verbal thinker. I always score higher on IQ tests that are measuring visual mathematical skills, rather than verbal skills. My I/O engine is weaker than my conceptual thought engine (I think this is why I get fatigued with discussions because my I/O engine can't keep up with my thoughts). My reading comprehension of English is 99th percentile, but my articulation and vocabulary are in the high 80s or low 90s. So apparently I dislike complex linguistic computation. I seem to struggle more with sequencing or the flattening out what I "see" in multi-dimensions into a sequential understanding. My math and conceptual engine is higher (more rare) than 99th percentile, but not genius. So someone with more highly developed linguistic computation than myself, would probably find my desire for linguistic structure to be arbitrary and unnecessary. I've been working on my weakness, but I do find it takes energy away from my thought engine, which is where I feel more happy and efficient.
Please differentiate between function declaration and function call. I had written about that 3 days ago:
Please catch up with recent corrections to the syntax.
@shelby3 wrote:
So I am totally with you on the above. The problem is without introducing the type variables, how do we distinguish between types and variables, for example:
Are they single-letter types, or type variables? We often want to re-use type variables like 'A' a lot. Consider:
The problem with making type variables all uppercase is that it does not distinguish them from type names. Do we insist that all type names have more than one letter?
@keean wrote:
Type variable per the regular expressions I proposed. 💭 I see you are preferring "type variables" to the term "type parameters". I suppose this is to distinguish from function parameter (arguments).
Yes. However... I see now our conflict in preferences. I am thinking type names should be informational; single letter proper names are extremely rare and not self documenting, so I thought it was okay to just not allow them. You are apparently thinking of supporting math notation in code. Which is evident by your Mainstream programmers typically don't (or rarely) want do math notation in code. In my proposal they can still get math notation with 💡 I think there is another solution which would give you single-letter 🔨 And when there is a single-letter data name in conflict with a type parameter in scope, then I think we should have a compiler warning that must be turned off with compiler flag. The warning should tell the programmer to use Would that solve the problem for you? I don't think the single-letter |
@shelby3 wrote:
There is a problem remaining. The order of the type parameters in the optional Edit: and that leads to a very obscure and probably very rare programmer error, in that if not all the type variables are specified in the argument list (i.e. some are only in the Edit#2: and note it should be quite odd and extremely rare that the programmer wants to constrain at the call site, a type variable that is not in the argument list or result value. Also the following function call is much more informational than 💡
And allows us to specify only some constraints:
And it is more consistent with the syntax of function declaration. Or, if we prefer:
So maybe we can disallow |
The number and function of the type parameters is not the same as the number and type of the arguments, some type parameters may only occur in the where clause. Consider:
Note, Rust would not allow this, as you have to introduce all type parameters, which makes them less useful. Really, we have to have the type parameters if we want to have parametric types (that is, types that are monomorphisable). If we are happy to give up monomorphisation, we can have universally quantified types instead, and then there is no need to have type parameters at all. In some regards I would prefer universally quantified types from a purely type system perspective, but it is much easier to implement monomorphisation with parametric types. If you really want to get rid of the type parameters, then let's switch to universal quantification.
I would rather say a minimum of two letters, the second of which must be lowercase, for datatypes, and all caps for type variables. Also, we can use universal quantification to get rid of type parameters (although it does change which types are valid in the type system).
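To make the rule just stated concrete, here is a rough TypeScript sketch of how a lexer could classify names under it. The category names, the `classifyName` helper, and the pattern for value names are assumptions made for illustration; only the "all caps for type variables" and "capital then lowercase for datatypes" rules come from the comment above.

```typescript
// Hypothetical token categories; the names are illustrative only.
type NameKind = "typeVariable" | "dataType" | "value" | "invalid";

// All caps (a single capital letter included) is a type variable.
const TYPE_VARIABLE = /^[A-Z]+$/;
// At least two characters: a capital followed by a lowercase letter.
const DATA_TYPE = /^[A-Z][a-z][A-Za-z0-9]*$/;
// Assumed for this sketch: value names start with a lowercase letter.
const VALUE = /^[a-z][A-Za-z0-9_]*$/;

function classifyName(name: string): NameKind {
  if (TYPE_VARIABLE.test(name)) return "typeVariable";
  if (DATA_TYPE.test(name)) return "dataType";
  if (VALUE.test(name)) return "value";
  return "invalid";
}
```

Under these patterns `A` and `KEY` are type variables and `List` is a datatype, while a name such as `HTMLParser` matches neither type pattern, which is the acronym problem raised earlier in the thread.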
I would suggest lexical scoping for type variables, so in my example above the I think this satisfies the principle of least surprise. |
@keean wrote:
Did you not read the comment of mine immediately before yours? I also explained that exact issue and offered a solution.
Here's an interesting one, we need to write the type of a function, and we agree function definition should be an expression.
This should be possible too because I will want to pass functions to other functions:
Even with it working, it’s no good for typing at programming speed. But I bet we could write a utility which programmers could download to get their system to do easy mnemonic keystrokes. Also custom "click keys" keyboards are really something a serious programmer should have. These can be configured to dedicate a key for a custom action such as a Unicode character. They even sell customized key tops with the character laser printed on it. |
Regarding recent upthread discussion of JSON vs. XML which began far upthread, after further thought I agree it would be better to model for example HTML with the full generality our programming language such as actual data types (e.g. |
Indented blocks instead of curly brace-delimited blocks is one of (the lesser of) the significant motivations for me to create a transpiler instead of coding in TypeScript. The motivation was to reduce the noisy clutter of curly braces and gain more lucid structuring of the code for maximizing readability in the open source era. I have formed I think a firm opinion that indented (i.e. recursive) blocks within expressions constitutes code that is too much crammed into one conceptual unit and thus difficult to read. Also it creates a dangling terminating
Edit: unless Note Edit#2: note the option (c.f. the 1 footnote at the linked post) for potentially allowing semicolons. |
One reason to allow flexible indenting is to allow alignment with characters, for example:
@keean that is a line continuation and is supported by the #2 rules in my prior comment post excepting that I specified the line breaks be before the operators. I was thinking of The proposed 2 spaces rule (with the contextual exceptions for #1) for block indenting is for delineating blocks not for line continuations. Edit: another advantage of forcing operators to be at the end of lines (that are followed by line continuations) is that they can thus never appear at the start of line. Thus we could reuse one of them such as
The
If we specify line breaks after operators, then we always know when the line is continued. If before there are some circumstances when it can be ambiguous. JavaScript has this problem and needs semi-colons in certain circumstances. I think you are okay with && and ||, but it could be confusing to make exceptions for them.
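A minimal TypeScript sketch of the "break after the operator" rule, to show why it needs no lookahead: a physical line that ends in an operator is unconditionally continued by the next one. The operator list and the function names are assumptions for illustration, not part of any agreed grammar.

```typescript
// Operators assumed for this sketch; a real grammar would list its own.
const TRAILING_OPERATORS = ["&&", "||", "+", "-", "*", "/", "="];

function endsWithOperator(text: string): boolean {
  return TRAILING_OPERATORS.some((op) => text.slice(-op.length) === op);
}

// Joins physical lines into logical lines: a line ending in an operator
// is continued by the line that follows it.
function joinContinuations(lines: string[]): string[] {
  const logical: string[] = [];
  let pending = "";
  for (const line of lines) {
    const stripped = line.replace(/\s+$/, "");
    const text =
      pending === "" ? stripped : pending + " " + stripped.replace(/^\s+/, "");
    if (endsWithOperator(text)) {
      pending = text; // decision made from this line alone
    } else {
      logical.push(text);
      pending = "";
    }
  }
  if (pending !== "") logical.push(pending); // dangling operator at end of input
  return logical;
}
```

With this rule the decision is made when the end of a line is reached. With breaks before the operator, the lexer would have to look at the start of the following line before it could close the current expression.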
Agreed. I also know about that. A context-free grammar prevents those ambiguities (one of the reasons I am writing the LL(k) grammar and checking it with SLK), so any such issue would have come to light when checking the grammar. In this case however, I think the grammatical ambiguity cannot occur regardless of whether the line continuation is allowed before or after the operator, because the enforced block indenting differentiates a new expression from a line continuation.
Agreed. Good point. In the past with curly brace-delimited blocks, it seemed advantageous to not have them hanging off the far RHS at the end of lines given that conditional expressions can be quite long at times, and having at the start of lines helped to distinguish them as line continuations versus block indents (when the opening curly brace |
@keean made the point to me in private that we will need imperative control over iterators to achieve some more complex algorithms such as a binary search because of the need to treat iterators as points instead of ranges. I agreed and pointed out that I think I prefer my prior example HOF (higher-order function) approach for the range cases because it is less verbose and I think perhaps a smart compiler can in some cases convert those HOF variants to imperatively optimized code (e.g. filter and map algorithms), thus indicating that imperatively expressing them would be boilerplate. In general converting all HOF encoded algorithms to imperative algorithms is not practical, so I am not proposing that. For extensibility of the concept beyond some hard-coded optimization in the compiler for filter and map, I ponder if there is some way a HOF library could teach the compiler how to imperatively optimize the HOF userland (aka non-library) code employing the library. If it is boilerplate, then the transformation should be deterministic and probably can be done with an AST macro. @keean claimed we never need AST macros. Maybe we never want AST macros in the abstraction layer above the language (due to debugging obfuscation, enables unreadable non-standard syntax, etc), but below the language perhaps they are essentially compiler plugins. Hmm. |
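To illustrate the filter and map case mentioned above, here is a small TypeScript sketch of the HOF form next to the single-pass loop a compiler could in principle produce from it. Both function names are made up for the example, and no claim is made that any existing compiler performs this rewrite.

```typescript
// HOF form: two passes over the data and one intermediate array.
function squaresOfEvensHof(xs: number[]): number[] {
  return xs.filter((x) => x % 2 === 0).map((x) => x * x);
}

// Hand-fused equivalent: one pass, no intermediate array.
// This is the boilerplate the HOF form lets the programmer avoid writing.
function squaresOfEvensFused(xs: number[]): number[] {
  const out: number[] = [];
  for (const x of xs) {
    if (x % 2 === 0) out.push(x * x);
  }
  return out;
}
```

The rewrite is mechanical here because the predicate and the mapping function are pure; that determinism is what makes an AST-level transformation plausible for this particular pattern.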
I am modifying my original proposed syntax for Given the proposed simplification of one module per file, then we do not need the hierarchical When the non-type annotation uses of the imports are always accessed via a ( |
Personally I don't want one module per file. I have done that in Java, and I didn't like it. You can end up with a directory full of very small files, which makes the code hard to read. A module is about data hiding and API as well as separate compilation. I would also not require asynchronous loading; it tends to make applications very slow, and people only tend to use it in development, preferring to use tree-shaking tools like rollup to put everything in a single file for production (and minifying it too). I don't even use dynamic loading in development because it makes page load very slow.
We already discussed your disagreement on this issue. Java’s case was more egregious because, as you pointed out, Java requires the filename to be named according to the class, thus forcing a proliferation of files, and I have not proposed such a limitation. I understand that if modules will be quite small then a proliferation of small files means more files to open instead of just scrolling down. But in theory an IDE can solve this problem by offering to scroll vertically across multiple modules, making them appear as though they are in one file. Did no Java IDE ever offer such an improvement? (if not, what has happened to our profession, that programmers are not able to meet the needs of huge ecosystem markets!) Adding multiple modules within files requires adding a
And thus I see no valid reason to have more than one in the same file. A module is a unit of modularity, and thus separating modular items into files makes for a clean repository and version control system. One module per file is compatible with JavaScript and TypeScript. The least amount of unnecessary discord with the target output language is very desirable. PL design is about extreme prioritization and leaving as much as possible out of the language, not including every little nuanced option we might want to throw in the kitchen sink. You recently stated that you are coding a lot in TypeScript. How are you coping with this “major downgrade” to one module per file?
Code should be loaded once and then run many times over the course of application, thus any overhead of a
We do not want specialization in the source code. K.I.S.S. principle (do not eat the complexity budget). I think you are putting design in the wrong abstraction layer:
TypeScript allows multiple namespaces in the same file. It all depends on what you specifically mean by a "module".
Local disc latency is still a problem. You should have a play with dynamic loading tools like SystemJS, obviously it has to load over HTTP, how else can JavaScript load anything (unless you are talking about running in Node.js rather than in the browser, then you can output 'require' statements and have dynamic loading, but it won't work in the browser).
If you compile to TypeScript, then you can use the output options to choose CommonJS, ES6, SystemJS for loading, so it can work in Node or in the browser with dynamic loading or bundled. You would just use the normal TypeScript 'import' statements. |
And afaics it is not really recommended to combine them. It is adding complexity and I do not understand the claimed big win of doing so? I know sometimes perhaps we need to express privacy access inter-module, so if we supported that feature then we might like to have more than one module in a file (even though it still would not be absolutely necessary). But I did not yet propose offering that feature and probably want to avoid it unless we really need it.
Are you just stating that to document in detail your understanding of what I wrote I would do? Or is there some other reason you mention the details of my solution? The compiler will optimize it. If I have to fork TypeScript, then so be it. But hopefully I can work within the feature set of TypeScript. |
You wrote you wanted to use promises to dynamically load modules. I stated you were focusing on the wrong abstraction layer and should just output static import statements and let TypeScript handle it (or implement the same kind of compiler options as TypeScript if going straight to JavaScript). Edit: I think I see what you mean: you want to represent module loading as a promise in the source language, not the target language. This is problematic because it means modules must be first-class entities. This means that all module dependencies and relationships need to be expressible in the core type system so that modules are typeable values. This is possible, and is something I have been looking at doing, but it requires careful selection and combining of type system features to make sure the type system is sound.
@keean wrote (in private):
I was thinking about this also before I read your message. I then thought for those who want these groupings of modules, we could offer a grouping keyword and they could place this in a separate file. A smart editor would know when loading the said file, to also load the referenced modules as a linear text view. With this separation-of-concerns, we K.I.S.S. principle the base format for modules keeping them coherent with ECMAScript modules (one per file) and prevent clutter of Win-win design via separation-of-concerns. I understand it loses the simplicity of being able to see the linear text grouping with any unspecialized text editor (e.g. Notepad). But it does not lose the power of reasoning about language orthogonal to GUIs. And if we force everything to be literally copied in text, this is less powerful than structure by named reference, e.g. we would prefer
I understand; maybe I did not sufficiently explain my proposal. I am copying TypeScript’s method, where all imported typings from the module are resolved statically, and if none of the runtime code of the module is accessed then nothing is loaded at runtime. Any runtime code (and state) is loaded via an asynchronous model before it can be accessed (either via an explicit assignment of the I do not understand why you think this introduces type soundness issues? TypeScript does not seem to think so? Perhaps you are thinking that the types in the module would be first-class? I did not intend to propose that. I am proposing that only the object of (non-
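A rough TypeScript sketch of the split being described: the module's type is known statically, while its runtime value arrives through a promise that is created at most once. `MathModule`, `fetchMathModule` and `importMath` are hypothetical names, and the fetch is a stand-in that resolves immediately instead of loading a real file.

```typescript
// Hypothetical shape of a module's runtime exports; known at compile time.
interface MathModule {
  square(x: number): number;
}

let loadCount = 0;

// Stand-in for fetching and evaluating the module's runtime code.
function fetchMathModule(): Promise<MathModule> {
  loadCount += 1;
  return Promise.resolve({ square: (x: number) => x * x });
}

let cached: Promise<MathModule> | undefined;

// The runtime value is loaded on first use and shared afterwards.
function importMath(): Promise<MathModule> {
  if (cached === undefined) cached = fetchMathModule();
  return cached;
}
```

Only the loaded object is a runtime value here; the `MathModule` type never exists at runtime, which is the distinction between first-class module values and first-class module types that the exchange above turns on.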
A module is a unit of data hiding like an object. It doesn't necessarily have anything to do with files. See 'ML modules'.
Also type hiding.
Why have files? Let’s just put the entire program in one file. Or better yet, let’s put all programs that were ever created and will ever be created in one file. Obviously modularity has nothing to do with files. I think the more salient question is not whether we need more than one module per file, rather whether we need more than one file per module? I am approaching modules conceptually as units of reusable repositories in an open source ecosystem similar to Rust Crates or npmjs.
ML must be one of the most popular languages. Why are we creating a language? There is already an OCaml to JavaScript transpiler. I believe we discussed that typeclasses can be simulated with OCaml’s features. One of the reasons I provided was because we would lose integration with JavaScript such as So instead of appealing to authority of ML, could you instead provide examples of features where our modules will require multiple modules per file or multiple files per module? Because I am not really healthy enough to expend the effort to research ML modules and try to figure out myself which use cases dictate such. Edit: modularity-of-encapsulation might not be equivalent to modularity-of-reuse. A module-of-reuse might need more than one module-of-encapsulation, i.e. each of the modules of encapsulation is incomplete without the others. Yet this is expressed by typing and the consumer of the module-of-reuse has to The reason to put each module-of-encapsulation in a separate file is so that it can be referenced orthogonally by another module-of-reuse, which reuses only some of the modules of encapsulation. K.I.S.S. rather than add complexity. If a lot of small files makes it difficult to see the modules of reuse that group these small modules of encapsulation, then put them in a separate directory (folder) in the file system.
This is LL(k=2) thus far. Only 1 token of lookahead. I will update this as I progress to add the rest of the grammar. Eventually this will be added to a new Zer0 repository.
I wrote in 2016:
I propose for Zer0 to adopt Go’s decision to only support
The optional We will have iterators implemented in libraries instead of Go’s ranges:
If instead we are willing to sacrifice performance by making either
So I think we should adopt a shorthand syntax sugar which our compiler will expand to the optimally performant encoding for the first example above:
Similar to Go, let’s also offer optional syntax for the key (aka index) for iterables that have it:
And when only the key is desired:
See the example which exemplifies how much more readable this is as compared to the traditional |
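As a sketch of what such loop sugar could expand to, the TypeScript below drives an iterator explicitly and supplies a running key alongside each value. It uses the standard JavaScript iteration protocol as a stand-in for the library iterators; `forEachKeyed` is a hypothetical helper, not proposed Zer0 syntax.

```typescript
// Explicit iterator protocol: roughly what a value-and-key loop sugar
// would have to expand to when iterators live in a library.
function forEachKeyed<T>(
  items: Iterable<T>,
  body: (key: number, value: T) => void
): void {
  const it = items[Symbol.iterator]();
  let key = 0;
  for (let step = it.next(); !step.done; step = it.next()) {
    body(key, step.value);
    key += 1;
  }
}
```

A caller that only wants the key, or only the value, simply ignores the other parameter, which corresponds to the optional forms described above.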
I wrote on Quora:
I wrote on Quora:
I have updated the LL(k) grammar for the proposed syntax of Zer0. It’s not yet completed. Currently it’s still I haven’t yet added the syntax for type system and typeclass. That's next to do. Note one of the main changes in this update to the work-in-progress is making the function declaration syntax not so unfamiliar, which is also reflected in the edit of the examples in my July 6 post. Note I have chosen angle brackets for type parameters because:
Does anyone have any opinion or objection about the choice of brackets? Also I have P.S. I will probably start a new repository for Zer0 soon and perhaps on Gitlab instead of Github, since Microsoft acquired Github and I have observed Skype becoming more unusable after Microsoft acquired it. |
I wrote in the Subtyping thread:
I like camel case at least for type names, and I think the following does make an argument for employing hyphens instead of underscores, at least where uppercase camel humps are separate words: https://yosefk.com/blog/ihatecamelcase.html https://wiki.theory.org/index.php/YourLanguageSucks#Poor_Design
I think I also prefer camel case but with first letter lowercase for function and procedure names, because camel case employs fewer characters than hyphens or underscores for delimiting words. When transpiling to languages which don’t support hyphens in names, we then need to convert to underscores and check for name collisions if underscores have also been allowed.
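The collision check described above can be sketched in a few lines of TypeScript. The helper names `targetName` and `findCollisions` are invented for the example, and the mapping assumes hyphens are the only characters that need rewriting.

```typescript
// Rewrite a source identifier into a form the target language accepts.
function targetName(name: string): string {
  return name.replace(/-/g, "_");
}

// Group source names by their rewritten form and report every group
// in which two or more distinct source names collapse to the same target.
function findCollisions(names: string[]): string[][] {
  const byTarget = new Map<string, string[]>();
  for (const name of names) {
    const target = targetName(name);
    const group = byTarget.get(target) || [];
    group.push(name);
    byTarget.set(target, group);
  }
  const collisions: string[][] = [];
  byTarget.forEach((group) => {
    if (group.length > 1) collisions.push(group);
  });
  return collisions;
}
```

For the input `["max-size", "max_size", "min-size"]` this reports the single group `["max-size", "max_size"]`, which a transpiler would have to reject or rename.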
However the use of dashes instead of underscores doesn’t conserve any horizontal space with a monospace font and it resembles a crunched infix minus operator. So maybe I should just stick with underscores, if at least for type names? P.S. The author of the aforelinked blog is also the author of the C++ Frequently Questioned Answers satire. EDIT: another point about verbalization of case:
And being he was born in Russia, might explain why Yossi Kreinin hates camel case so much:
But I think much of this is perhaps pointless because names shouldn’t be so long and juxtaposed acronyms should be separated with an underscore:
Yet the following example convinces that underscores can be easier-to-read than camelCase in specific cases, even though a study found generally they’re roughly equivalent:
Camel case has much less symbol soupy noise: https://whatheco.de/2011/02/10/camelcase-vs-underscores-scientific-showdown/
In a block indenting, no curly brace style:
[…]
[…]
Editor considerations: https://csswizardry.com/2010/12/css-camel-case-seriously-sucks/
EDIT#2: here’s the decision I made thus far for the lexer. Essentially I decided to prioritize hyphens instead of underscores. Yet I still prefer camel case over hyphens where possible, except that identifiers which are not functions nor procedures should be all lowercase and if multiple word, must employ underscores to distinguish them from functions and procedures types. Thus:
I will maintain in this OP a summary of current proposed syntax as I understand it to be. Note this is not authoritative, subject to change, and it may be inaccurate. Please make comments to discuss.
: Type is always optional.
Sum, Recursive, Record, and Product parametrized data type with optional Record member names and optional Record (e.g. Cons) and Product (e.g. MetaData) names:
data List<T> = MetaData(Nil | Cons{head: T, tail: List<T>}, {meta: Meta<T>})
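For readers who think in terms of the TypeScript target, the List declaration above could be approximated with a tagged union as sketched below. The field names `kind`, `tag`, `items` and `info`, the shape of `Meta<T>`, and the `listLength` helper are all assumptions added for the example; the summary does not define them.

```typescript
// Meta<T> is not defined in the summary; this shape is a placeholder.
interface Meta<T> {
  note: string;
  example?: T;
}

// Product "MetaData" of (sum Nil | Cons, record {meta}).
type List<T> = {
  kind: "MetaData";
  items: { tag: "Nil" } | { tag: "Cons"; head: T; tail: List<T> };
  info: { meta: Meta<T> };
};

// Walks the Cons cells until Nil is reached.
function listLength<T>(list: List<T>): number {
  let count = 0;
  let current = list;
  for (;;) {
    const items = current.items;
    if (items.tag === "Nil") return count;
    count += 1;
    current = items.tail;
  }
}
```

The `tag` field plays the role that the constructor names Nil and Cons play in the proposed syntax, since TypeScript has no nominal sum types of its own.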
Typeclass interface and implementation:
References:
Functions:
Type parameters do not require declaration <A,B>.
Note that iterator types can be specified for the return value to return a lazy list as a generalized way of implementing generators. The optional (:Type) is necessitated for generator functions. Note the (x: Type y: Type): Type => x + y form is unavailable.
Assignment-as-expression:
† Not yet included in the syntax, as it would be a premature optimization. May or may not be added.
¹ #10