
Allow Unicode combining characters in identifiers #194

Closed
gopherbot opened this issue Nov 15, 2009 · 9 comments
Labels
FrozenDueToAge, LanguageChange (Suggested changes to the Go language), v2 (An incompatible library change)

Comments

@gopherbot
Contributor

by jjc.jclark.com:

The spec defines identifier like this:

  identifier = letter { letter | unicode_digit }

where letter is _ or a character in Unicode general category Lu, Ll, Lt, Lm, or Lo, and unicode_digit is a character in category Nd.

This doesn't work for languages that use combining characters (e.g. South and Southeast Asian languages). For example, in Thai some vowels have general category Lo and some have general category Mn (nonspacing mark), so the latter can never appear in an identifier.
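
A minimal Go sketch of the rule quoted above, using only the standard `unicode` package, makes the Thai problem concrete. The helper names `isSpecIdentifier` and `category` are hypothetical and are not taken from any compiler: U+0E32 THAI CHARACTER SARA AA is in category Lo and is accepted, while U+0E34 THAI CHARACTER SARA I is in Mn and is rejected.

```go
package main

import (
	"fmt"
	"unicode"
)

// isSpecIdentifier applies the rule identifier = letter { letter | unicode_digit },
// where letter is '_' or a rune in category L (Lu, Ll, Lt, Lm, Lo) and
// unicode_digit is a rune in category Nd.
func isSpecIdentifier(s string) bool {
	if s == "" {
		return false
	}
	for i, r := range s {
		isLetter := r == '_' || unicode.IsLetter(r)
		if i == 0 {
			// The first rune must be a letter or underscore.
			if !isLetter {
				return false
			}
			continue
		}
		if !isLetter && !unicode.IsDigit(r) {
			return false
		}
	}
	return true
}

// category reports whether r is in Lo, Mn, or Mc, or "other" otherwise.
func category(r rune) string {
	tables := []struct {
		name  string
		table *unicode.RangeTable
	}{
		{"Lo", unicode.Lo}, {"Mn", unicode.Mn}, {"Mc", unicode.Mc},
	}
	for _, t := range tables {
		if unicode.Is(t.table, r) {
			return t.name
		}
	}
	return "other"
}

func main() {
	// SARA AA (U+0E32) and SARA I (U+0E34) are both Thai vowels.
	for _, r := range "\u0E32\u0E34" {
		fmt.Printf("U+%04X %s\n", r, category(r))
	}
	fmt.Println(isSpecIdentifier("\u0E01\u0E32"))       // KO KAI + SARA AA: all Lo
	fmt.Println(isSpecIdentifier("\u0E01\u0E34\u0E19")) // contains SARA I (Mn)
}
```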

There are lots of details in

http://www.unicode.org/reports/tr31/

I would actually recommend using

http://www.unicode.org/reports/tr31/#Alternative_Identifier_Syntax

This keeps things simple and ensures that the definition of an identifier
is Unicode version independent.
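
The alternative syntax can be sketched as follows. This is a deliberately simplified reading, assuming only the core idea that every character is allowed unless it has the Pattern_Syntax or Pattern_White_Space property. The helper name `isAltIdentifier` and the no-leading-digit check are assumptions added for illustration, not part of the text of TR31.

```go
package main

import (
	"fmt"
	"unicode"
)

// isAltIdentifier is a simplified sketch of the TR31 alternative identifier
// syntax: any rune is allowed except Pattern_Syntax and Pattern_White_Space.
func isAltIdentifier(s string) bool {
	if s == "" {
		return false
	}
	for i, r := range s {
		if unicode.Is(unicode.Pattern_Syntax, r) || unicode.Is(unicode.Pattern_White_Space, r) {
			return false
		}
		// Assumption (not from TR31): forbid a leading decimal digit so that
		// numeric literals stay unambiguous.
		if i == 0 && unicode.IsDigit(r) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(isAltIdentifier("\u0928\u092E\u0938\u094D\u0924\u0947")) // Devanagari with combining marks
	fmt.Println(isAltIdentifier("a+b"))                                  // '+' is Pattern_Syntax
}
```

Pattern_Syntax and Pattern_White_Space are immutable properties under Unicode's stability policy, so that part of the check gives the same answer across Unicode versions, which is the version-independence point made above.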
@rsc
Contributor

rsc commented Nov 15, 2009

Comment 1:

Labels changed: added language-change.

Owner changed to r...@golang.org.

Status changed to Thinking.

@robpike
Contributor

robpike commented Nov 15, 2009

Comment 2:

Thanks for this pointer.  Definitely worth considering.

@gopherbot
Contributor Author

Comment 3 by jason.catena:

golang-nuts thread on currently invalid identifier and combining characters:
http://groups.google.com/group/golang-nuts/browse_thread/thread/13fd9002e3b029f/85e1167816017433

@gopherbot
Contributor Author

Comment 4 by jason.catena:

golang-nuts thread on currently invalid identifier and combining characters:
http://groups.google.com/group/golang-nuts/browse_thread/thread/13fd9002e3b029f/85e1167816017433

@gopherbot
Contributor Author

Comment 5 by natevw:

So far this ticket has focused on character classes, but I'd like to call specific attention to the equivalence issues raised by this post in the thread linked in #4:
http://groups.google.com/group/golang-nuts/msg/8c84eb183dcb672e?

I suggest that since the language specification has already broken orthogonality with the character encoding (as it rather must, anyway) and since the encoding of choice is Unicode, it would not be inappropriate to finish what has been started as far as Unicode-awareness goes. Define whitespace as General_Category Zs, Zl and Zp (or alternatively, Bidi_Class WS, B and S to allow non-breaking spaces within tokens) and compare tokens (or at least identifiers) in NFKD form. I can still see narrowing the identifier definition even further, to exclude non-idiomatic naming practices.
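
Both suggestions can be sketched with Go's standard `unicode` package. `isProposedSpace` is a hypothetical name that implements only the General_Category variant, since the standard library exposes no Bidi_Class tables. The second half shows why comparing identifiers without normalization is a problem; actually normalizing to NFKD would need a package such as `golang.org/x/text/unicode/norm` (`norm.NFKD.String`), which this stdlib-only sketch does not use.

```go
package main

import (
	"fmt"
	"unicode"
)

// isProposedSpace treats a rune as whitespace if it is in general
// category Zs, Zl, or Zp, as suggested above.
func isProposedSpace(r rune) bool {
	return unicode.In(r, unicode.Zs, unicode.Zl, unicode.Zp)
}

func main() {
	// U+00A0 NO-BREAK SPACE is Zs; U+2028 LINE SEPARATOR is Zl.
	fmt.Println(isProposedSpace(' '), isProposedSpace('\u00A0'), isProposedSpace('\u2028'))
	// Tab and newline are category Cc, so a Z-only rule does not cover them.
	fmt.Println(isProposedSpace('\t'), isProposedSpace('\n'))

	// The equivalence problem: precomposed U+00E9 and "e" + U+0301
	// COMBINING ACUTE ACCENT render alike but are different strings.
	precomposed := "caf\u00E9"
	decomposed := "cafe\u0301"
	fmt.Println(precomposed == decomposed) // false without normalization
}
```

Because tab and newline fall in category Cc, a purely category-based whitespace rule would still have to list them explicitly.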

@gopherbot
Contributor Author

Comment 6 by robpike:

A question has been added to the Language Design FAQ about this topic. The next release should make it visible directly at golang.org.

@rsc
Contributor

rsc commented Dec 2, 2009

Comment 7:

Labels changed: added languagechange, removed language-change.

@robpike
Contributor

robpike commented Dec 3, 2009

Comment 8:

Doing this "right" involves full canonicalization, which is not yet a well-defined concept. As the FAQ entry states, it's premature to address this issue in Go at the moment. The current situation is far from perfect but is an improvement over ASCII, and it permits expansion to a larger space of identifiers once things settle in the Unicode standard. Until then, the current design remains clear and straightforward, if limiting.

Status changed to WontFix.

gopherbot added the wontfix and LanguageChange labels Dec 3, 2009
@gorakhargosh

Sigh.

And here I was, thinking I could write code in Hindi.

package main

import "fmt"

func नमस्ते() {
    fmt.Println("Hello, world")
}

func main() {
    fmt.Println("Hello, 世界")
    fmt.Printf("Hello, %q\n", "something")
    नमस्ते()
}

Doesn't work at all.

~/workspace/play/lib/golang/src/basics
*git(b:master)  bash$ go build hindi-hello-world.go
# command-line-arguments
./hindi-hello-world.go:5: invalid identifier character U+094d
./hindi-hello-world.go:5: invalid identifier character U+0947
./hindi-hello-world.go:12: invalid identifier character U+094d
./hindi-hello-world.go:12: invalid identifier character U+0947
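
The two rejected characters can be reproduced with a short sketch. `invalidIdentRunes` is a hypothetical helper that applies the spec's identifier rule rune by rune; it approximates the compiler's complaint and is not the compiler's lexer. Both U+094D (DEVANAGARI SIGN VIRAMA) and U+0947 (DEVANAGARI VOWEL SIGN E) are nonspacing marks (Mn), which is exactly the case described at the top of this issue.

```go
package main

import (
	"fmt"
	"unicode"
)

// invalidIdentRunes returns the runes in name that the identifier rule
// (letter = '_' or category L; unicode_digit = Nd, not first) rejects.
func invalidIdentRunes(name string) []rune {
	var bad []rune
	for i, r := range name {
		ok := r == '_' || unicode.IsLetter(r) || (i > 0 && unicode.IsDigit(r))
		if !ok {
			bad = append(bad, r)
		}
	}
	return bad
}

func main() {
	name := "\u0928\u092E\u0938\u094D\u0924\u0947" // नमस्ते
	for _, r := range invalidIdentRunes(name) {
		// Mirrors the compiler's "invalid identifier character U+xxxx" wording.
		fmt.Printf("invalid identifier character U+%04x (Mn: %v)\n", r, unicode.Is(unicode.Mn, r))
	}
}
```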

bradfitz added the v2 label Dec 24, 2014
golang locked and limited conversation to collaborators Jun 24, 2016
rsc unassigned robpike Jun 22, 2022
This issue was closed.