infix notation for more functions #4498

Closed
my-little-repository opened this issue Oct 13, 2013 · 35 comments
Labels
parser Language parsing and surface syntax speculative Whether the change will be implemented is speculative

Comments

@my-little-repository

This is a feature request. The in function has both a prefix and infix notation. It would be nice to have infix notation for the functions beginswith, endswith and contains, e.g.

julia> "julialang" beginswith 'j'
true
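
For reference, here is a minimal sketch of the two existing forms of `in` next to the call-only string predicates the request is about. It assumes a current Julia release, where `beginswith` has since been renamed `startswith`; the variable names are only illustrative.

```julia
# `in` can be called as an ordinary function or written infix.
prefix_form = in('j', "julialang")
infix_form  = 'j' in "julialang"

# The string predicates only have the ordinary call form.
# (`startswith` is the current name of the `beginswith` used in this thread.)
starts = startswith("julialang", 'j')
ends   = endswith("julialang", "lang")

println((prefix_form, infix_form, starts, ends))
```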

@lindahua
Contributor

If we provide infix form for beginswith, then a large number of other functions may want to get this privilege. I think it is enough to provide a very small number of infix operators that cover the most common operations.

Also "julialang" beginswith 'j' doesn't seem to be a huge improvement over beginswith("julialang", 'j') to me.

@my-little-repository
Author

I beg to differ. There are not a large number of functions that may be written in infix notation. When it comes to string manipulation, I have not seen any other functions than the ones I am describing above.

And "julialang" beginswith 'j' seems to give the same improvement over beginswith("julialang", 'j') that x in y gives over in(x,y). I would even dare to say that it is in fact a better improvement, because the infix in notation introduces an ambiguity in the language (in is also a synonym for = in for loops). There is no such ambiguity with beginswith.

@johnmyleswhite
Member

I agree with Dahua: there's no reason to do this unless we make it possible to use all binary functions as infix operators.

@quinnj
Member

quinnj commented Oct 13, 2013

Also see here for the discussion on infix operators.

#2703

This was closed when the in syntax was added, but there was also discussion of allowing for user specified infix functions along these lines. Perhaps another issue could be opened for the generalized case.

-Jacob

@JeffBezanson
Member

There are some problems with that, such as [1 + 2] and [x in y] being 1-element arrays and [x f y] being a 3-element array. Maybe a Haskell-like x `f` y could work.
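
The parsing difference described above can be checked without evaluating anything. This is a small sketch that only inspects the parsed expressions; `shape` is a hypothetical helper name, and the exact `Expr` heads are what a current Julia release is assumed to produce.

```julia
# Parse (do not evaluate) a bracket expression and report its head and element count.
shape(src) = (ex = Meta.parse(src); (ex.head, length(ex.args)))

for src in ("[1 + 2]", "[x in y]", "[x f y]")
    head, n = shape(src)
    println(rpad(src, 10), " head = :", head, ", elements = ", n)
end
```

Inside brackets, whitespace separates columns, so a bare function name between two values is read as a third element rather than as an operator.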

@johnmyleswhite
Member

I would be pretty into the Haskell approach to infix functions.

@diegozea
Contributor

I like the Haskell approach.
There is something similar in R: you can define an infix function by wrapping its name in % characters.

@ssfrr
Contributor

ssfrr commented Jan 27, 2014

Haskell automatically treats all-symbol functions as infix, and alphanumeric functions can be treated as infix with the backticks, as @JeffBezanson mentions. I like this in the sense that it's less special case ("these are the operators that we've hard-coded into the parser to be infix"). Of course with the unicode support it's a little more complicated to define clearly what's alphanumeric.

Though the trade off of making it easy to define custom symbolic infix operators is that people will. :)

Making symbolic functions infix and allowing alphanumeric functions to be "infixed" using backticks are orthogonal questions though, so perhaps if there's much more discussion on it we should split it out into a separate issue.

@ssfrr
Contributor

ssfrr commented Jan 27, 2014

Reading around a little more on this, the backticks have been discussed here where @StefanKarpinski is not a fan. In #552 @JeffBezanson also says that this isn't happening because of limited ascii space.

Unicode/symbolic operators were discussed in #552 as a possible future feature but not a high priority.

@damiendr
Contributor

The Haskell approach would be especially nice in the context of mini-domain specific languages.
Eg. being able to create a graph this way:

a -> b -> c

with -> being the "connect" function.

The downside of restricting infix functions to unicode characters is that there is no easy way to type →.

@StefanKarpinski
Member

\rightarrow<tab>
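
To make the graph example concrete, here is a rough sketch of a "connect" operator built on `→`, which current Julia parses as an infix operator. `Path` and `nodes` are hypothetical names invented for this illustration, not part of any library.

```julia
# Hypothetical container for a chain of connected nodes.
struct Path
    nodes::Vector{String}
end

# Treat a bare string as a one-node path.
nodes(s::String) = [s]
nodes(p::Path) = p.nodes

# "Connect" two nodes or partial paths. Accepting either kind on both sides
# keeps the result independent of how the parser groups a → b → c.
→(a, b) = Path(vcat(nodes(a), nodes(b)))

path = "a" → "b" → "c"
println(path.nodes)
```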

@jiahao
Member

jiahao commented Oct 31, 2014

I'm tempted to close this in light of #552, #6929.

@damiendr
Contributor

damiendr commented Nov 2, 2014

@StefanKarpinski This doesn't work in any of my code editors, and neither do Mac OS X's substitution rules (they seem to work only when editing rich text). But even then, that's 5 times the number of keystrokes (6 times on a French keyboard), plus remembering the macro name. It also confuses both Terminal.app and iTerm.app at the moment. I'm not entirely against the use of unicode in source code, but it feels like it's a bit ahead of its time...

The ability to neatly express embedded DSLs is a really, really powerful feature for a language, especially when coupled with Julia's metaprogramming. And freedom in defining your own infix operators is crucial for that! For instance let's say you want to use <- or :=. These operators can be typed in a completely obvious way with 2 keystrokes in Graphviz and Pascal, respectively. It would seem a bit impractical to require \colonequals<tab> or \leftarrow<tab> or alt-2254 in Julia... especially when ASCII == is accepted.

My feeling is that the current approach to infixes in Julia is a bit too restrictive for embedded DSLs, also because it's totally not obvious which of the many unicode math operators will work. An explicit approach would be much clearer IMHO.
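
On the question of which symbols will work: one way to check, sketched here under the assumption of a current Julia release, is to ask the parser. `Base.isoperator` is an internal, non-exported helper, so treat it as subject to change; the meaning given to `⊕` is purely illustrative.

```julia
# Ask the parser whether a name is treated as an operator.
for name in (:⊕, :→, :beginswith)
    println(name, " parses as an operator: ", Base.isoperator(name))
end

# A symbol the parser already treats as infix only needs a method to be usable.
⊕(a, b) = string(a, "+", b)   # hypothetical meaning, for illustration only
println("x" ⊕ "y")
```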

@StefanKarpinski
Member

> The ability to neatly express embedded DSLs is a really, really powerful feature for a language, especially when coupled with Julia's metaprogramming. And freedom in defining your own infix operators is crucial for that! For instance let's say you want to use <- or :=.

I see where you're coming from but I think many Lispers would strongly disagree with this position.

@eschnett
Contributor

eschnett commented Nov 2, 2014

I've thought about this before, and came to the conclusion that the cleanest way is to actually store unicode characters in the source code, as this gives a standardized and unambiguous representation of operators.

The presentation (i.e. how it looks in an editor) can be changed. I was thinking of adding hooks to emacs for loading/saving files in Julia mode e.g. to replace \in by ∈, but then discovered that emacs's Julia mode already handles this (when pressing the tab key), and the visual representation of unicode characters works fine as well (in emacs).

I don't know what I would do in another editor. I would hope that displaying unicode works fine everywhere, but entering unicode still seems to be a problem. Julia's emacs mode's choice of latex notation is just the right thing for most Julia users, but also shows that this is neither standardized nor universally available.

The best I could come up with would be a preprocessor that translates "longhand" notation (e.g. ->) into unicode before compiling. This may need to be coupled to an unambiguous syntax for these operators. It seems that -> is fine in this respect, but <- is not -- the expression 1<-2 already has a meaning.

In an editor, where the unicode character is created instantaneously and on demand, this is fine, but doing this in an automated way only works if e.g. all operators need to be surrounded by white space or parentheses, as in Lisp.

Fortress had some interesting ideas in this respect.
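
The point above about `1<-2` already having a meaning can be seen by parsing it. A short sketch, assuming a current Julia release:

```julia
# `<-` is not a token of its own: the parser reads `<` followed by a negative literal.
ex = Meta.parse("1<-2")
println(ex.args)   # the comparison function and its two arguments

# Evaluated, it is therefore just the comparison 1 < (-2).
println(1<-2)
```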

@jiahao
Member

jiahao commented Nov 2, 2014

> Julia's emacs mode's choice of latex notation is just the right thing for most Julia users, but also shows that this is neither standardized nor universally available.

The code for tab-completion also exists in the base REPLCompletions module, so to the extent that Julia's REPL builds and is useable, we do have some consistency in the user interface.

> I would hope that displaying unicode works fine everywhere

Unfortunately we have very little control over the display of Unicode; this is primarily a question of what fonts people use. In fact, some commonly used fonts have some inexcusable mistakes in their glyph tables (e.g. #8429 for a particularly insidious problem with phi and varphi due to old Unicode code point mappings). And quite a few fonts have incorrect glyph bounding box offsets for combining diacritics, so that they end up rendered on the wrong characters.

In general, we have seen that displaying with default fonts on OSX is pretty good, but atrocious on Linux and Windows. You can see for yourself by viewing the tab completion table with your font of choice.

> Fortress had some interesting ideas in this respect.

I had been told this too, and yet when I dug into the details, I found Fortress's Unicode support wanting in clarity. In particular, Fortress's spec (v1.0; pdf) conveys no evidence that they have thought about the visual ambiguities inherent in Unicode support.

  • The Fortress spec does not make clear what level of Unicode equivalence the language supports. If we assume no canonicalization (default), the language treats as semantically different visually ambiguous characters like µ (micro) vs. μ (mu), and Å (Ångström) vs. Å (A with ring) vs. Å (A with ring combining character). (Julia currently supports NFC canonicalization natively for identifiers (canonicalize unicode identifiers #5434) and there may be custom canonicalization support in the future (add custom JULIA normalization? JuliaStrings/utf8proc#11) since neither NFC nor NFKC does what we really want.)
  • Fortress's insistence on a standard rendering that looks mathematical (Appendix B) only makes things far worse for producing multiple identifiers which cannot be disambiguated visually.
    • The spec states that M is rendered as an italic M, which can then (depending on the font) be visually indistinguishable (yet semantically different) from 𝑀 (U+1D440).
    • The spec states that the variable called OMEGA13 (all ASCII) is rendered the same as Ω₁₃ (U+3A9 U+2081 U+2083), yet the spec appears to treat these two as semantically distinguishable.
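
The equivalence levels mentioned in the list above can be illustrated on plain strings with the `Unicode` standard library. This sketch only shows string normalization; it makes no claim about how the Julia parser canonicalizes identifiers. The variable names are illustrative.

```julia
using Unicode

angstrom_sign = "\u212b"     # ANGSTROM SIGN
a_with_ring   = "\u00c5"     # LATIN CAPITAL LETTER A WITH RING ABOVE
a_combining   = "A\u030a"    # A followed by COMBINING RING ABOVE
micro_sign    = "\u00b5"     # MICRO SIGN
greek_mu      = "\u03bc"     # GREEK SMALL LETTER MU

# All three spellings of the ringed A collapse to the same string under NFC.
println(Unicode.normalize(angstrom_sign, :NFC) == a_with_ring)
println(Unicode.normalize(a_combining, :NFC) == a_with_ring)

# The micro sign is left alone by NFC and is only folded into mu by NFKC.
println(Unicode.normalize(micro_sign, :NFC) == greek_mu)
println(Unicode.normalize(micro_sign, :NFKC) == greek_mu)
```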

@JeffBezanson
Member

Yes I think having multiple renderings of source code is a bad idea. Occasionally you typeset code for publication e.g. in a latex document, and that's fine as it's easily distinguished from other uses of the code. But having a preprocessor or multiple ways to type the "same" identifier strikes me as massive unnecessary confusion.

@eschnett
Contributor

eschnett commented Nov 3, 2014

I didn't refer to how Fortress handles unicode -- I wasn't familiar with that, and it's sad that this is so problematic. What I meant was that Fortress defines several styles for presenting code, one of them being essentially ASCII that can be easily displayed and input everywhere, and which is still quite readable once you get past the double brackets.

One could invent something similar for Julia (or rather for unicode in general): A way to enter and/or display unicode characters on devices that either don't have the respective fonts available, or where the backslash-name-tab completion is not available. This would not be part of the Julia language, but would be a convention that editors can follow, similar to backslash-name-tab.

@cdsousa
Contributor

cdsousa commented Nov 3, 2014

I would say that the backslash itself could be used for that purpose, after deprecating its (not so common?) current uses. Thus becoming a \times b equivalent to a × b ...

@StefanKarpinski
Member

Good luck prying backslash out of the linear algebraists' cold, dead hands.
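
For readers unfamiliar with the use being defended here: backslash is the left-division operator, used to solve linear systems. A tiny sketch with made-up numbers:

```julia
using LinearAlgebra

A = [2.0 0.0; 0.0 4.0]
b = [2.0, 4.0]

# Left division: solve A * x = b without forming inv(A) explicitly.
x = A \ b
println(x)
```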

@eschnett
Contributor

eschnett commented Nov 3, 2014

backslash-name-space won't quite work, because people may write a÷b which is then rendered as a\divb. Maybe backslash-brackets would work -- a\[div]b.

The notation does not need to be entirely unambiguous, since the regular uses of the characters one chooses could be escaped. For example (but I'm not suggesting this), one could also re-use plain square brackets for this, as in a[div]b. Any actual use of a bracket then needs to be translated to [openbracket] or [closebracket].

@ntessore

ntessore commented Nov 3, 2014

I'm sorry if this is a naive idea, but could one not resolve \leftarrow<tab> to a :leftarrow symbol (or maybe a special syntax, ::leftarrow or :leftarrow: to not clash with other symbols), and then let the :leftarrow symbol be rendered as a Unicode ← when the terminal supports it? One could then always type out the symbols normally, and nothing exotic would get written to files. This is, as far as I know, how Mathematica handles their special symbols.

Edit: I see now that this is probably what you are discussing anyway. Carry on.

@JeffBezanson
Member

I think the best solution is also the most realistic and most straightforward: wait for platforms, applications, and fonts to gradually get better unicode support. Even the humble misc-fixed fonts support almost all the characters you might want. I find browser, phone, and editor support is already quite good, and I don't even use any Apple products.

I don't think we need to pick up the slack for text editors. Doing technical programming in an editor that's not customizable and doesn't support entering special symbols seems like a very strange requirement. Seriously, get a better editor.

@StefanKarpinski
Member

Agree. We're designing a language for the next 20 years, not the last 20.

@tkelman
Contributor

tkelman commented Nov 3, 2014

> Yes I think having multiple renderings of source code is a bad idea. Occasionally you typeset code for publication e.g. in a latex document, and that's fine as it's easily distinguished from other uses of the code. But having a preprocessor or multiple ways to type the "same" identifier strikes me as massive unnecessary confusion.

Thoughts on Mathematica's typeset input style? I personally think that it (or something like it) is the nicest solution to the spaces-as-delimiters problem, it's clear from context when you're typing into a typeset matrix object, as it is in a textbook, where it really isn't so obvious in generally-monospaced unicode source.

Granted this shouldn't really be an issue for the base language, I see this as something that could be done at an IDE / IME level by IJulia or Juno down the road.

@damiendr
Contributor

damiendr commented Nov 4, 2014

As for distinguishing a <— 2 from a < -2, there is always the option to use a dash rather than a hyphen-minus, since most keyboard layouts have dashes.

In the end infix notation is syntactic sugar. It only makes sense when it's simpler and nicer to use than function call notation; else it's better to do without it. And if the expr space symbol space expr syntactic space is already too crowded to support Scala-style universal infixes, I guess the unicode solution is the only compromise left indeed... which I think is unfortunate but then, that's just a matter of taste.

Maybe what can be done is to write documentation that encourages editors to bundle both a syntax mode AND a set of useful macros for Julia.

I would also argue for making the list of accepted unicode characters as inclusive as possible, as the arity of an operator symbol might depend on whether it's being used in linear algebra, control theory or some obscure branch of topology.

@jiahao
Member

jiahao commented Nov 4, 2014

> Thoughts on Mathematica's typeset input style?

We already have tab-completion supporting LaTeX-style input, which is the de facto open source standard for non-ASCII input in an ASCII environment. Is there really a need for another completely different input method that users have to learn?

@StefanKarpinski
Member

LaTeX style input is supported in the REPL, vim, emacs, TextMate, Sublime, IJulia and probably several others. Not sure how much better we're supposed to be doing. If there are editors that don't yet have support for this, additions are certainly welcomed.

@ntessore

ntessore commented Nov 4, 2014

@StefanKarpinski Pardon me for going off on a tangent, but how do you get LaTeX-style input in TextMate 2?

@tkelman
Contributor

tkelman commented Nov 4, 2014

(apologies for somewhat hijacking the issue, again)

> We already have tab-completion supporting LaTeX-style input, which is the de facto open source standard for non-ASCII input in an ASCII environment.

Sure, for single characters at a time. But unless you're actually full-on rendering math-mode LaTeX, you're still limited to mostly-monospaced characters, uniform line heights, etc in an ASCII environment.

> Is there really a need for another completely different input method that users have to learn?

A need? No, definitely not. But I think it could be interesting to look into alternate methods of inputting and/or rendering math-formatted code down the line, to go beyond the limitations of an ASCII environment. Literate Julia, or something like it. Not that it worked out all that well for Fortress, or really any example aside from Mathematica. There may be architectural limits to IPython or mainstream editors that make this completely impractical, I don't know.

@StefanKarpinski
Member

Not actually sure – in Sublime Text, you can use the UnicodeMath package. I assume there's a similar package in TextMate but I might be mistaken.

@jakebolewski added the parser label Jun 2, 2015
@Glen-O

Glen-O commented Jul 20, 2015

Note: I don't know any details about changes possibly made in 0.4, I'm using 0.3.10.

How about making use of the existing meaning of |>, and generalising it slightly?

5|>mod<|6

Right now, we have "hello" |> print → print("hello"). Could we not simply extend this to have print <| "hello" → print("hello") as well, and "hello" |> print <| " world" → print("hello", " world")? Not only does it maintain an existing notation, but it extends it in a completely sensible way. The only caveat is that <| is currently an "unused" infix operator, so it's possible that some package is already using it for something.

Another currently-unused infix operator that would also work nicely is --. So the notation would be 5 --mod-- 6. And it looks like ** also is available (it currently throws an error saying to use ^ instead... kind of a waste to just block its use entirely), so 5 **mod** 6 could work.

Other notations that aren't in use that could work nicely include <>, ><, and .. (another currently-unused infix operator).

(incidentally, I'm the Glen O that suggested exploiting the colon notation for custom infix)
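
As a point of comparison for the proposal above, this is a sketch of what the existing `|>` already does and where it stops, assuming a current Julia release. The `<|` extension itself is not shown, since it is only a suggestion in this thread.

```julia
# `|>` applies the function on its right to the value on its left.
shout = "hello" |> uppercase

# A two-argument call needs an anonymous function to supply the second argument,
# which is the gap the proposed `<|` form was meant to fill.
remainder = 5 |> (x -> mod(x, 6))

println(shout, " ", remainder)
```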

@jakebolewski
Member

Closing in light of #6929. Any addition I feel will have to provide a compelling implementation.

@matt2000

I found this issue when searching for information on how to define my own infix operators/functions, so I'll share the solution I ultimately came up with, for the benefit of future searchers, and as an example of why Julia may not need this feature at the language level.

julia> macro <(arg, fn, args...)
           :($fn($arg, $args...))
       end
julia> @< "julialang" beginswith 'j'
true

On the other hand, if there's a better solution for this now, I'd appreciate pointers to any documentation.
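
A variant of the same idea for current Julia releases might look like the sketch below. The macro name `@infix` is hypothetical, `startswith` stands in for the since-renamed `beginswith`, and the `esc` call is an addition so that names resolve in the caller's scope.

```julia
# Hypothetical macro: rewrites `@infix a f b...` into the call `f(a, b...)`.
macro infix(arg, fn, args...)
    # `esc` makes the function and argument names resolve where the macro is used.
    esc(:($fn($arg, $(args...))))
end

result = @infix "julialang" startswith 'j'
println(result)
```

Because the rewrite happens at parse time, this costs nothing at run time, but the call site still needs the macro prefix, so it is not a true infix syntax.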

@stevengj
Member

stevengj commented Oct 6, 2017

See #16985 on defining custom infix operators.
