-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove case insensitivity by changing the approach towards style insensitivity #547
Comments
As for me better to add (again) style skins (but more generalized): # possible syntax
import strutils
proc processIdent(s: string): string =
var afterWhitespace = false
for i in 0..<s.len - 1:
if s[i] == '_' and s[i + 1] != '_': result.add s[i + 1].toUpperAscii
elif not afterWhitespace: result.add s[i]
afterWhitespace = s[i] == '_' and s[i + 1] != '_'
if not afterWhitespace: result.add s[^1]
proc sameIdent(a, b: string): bool =
processIdent(a) == processIdent(b)
import std/syntaxskins # it contains also default skins
const mySkin = initSyntaxSkin(sameIdent = sameIdent)
{.syntaxSkin: mySkin.} It also solves problem with C wrappers (you can just disable case insensitivity for wrapper module and use case sensitivity wrapper from other module in case sensitivity manner) about interaction between modules: # gl.nim
import std/syntaxskins
{.syntaxSkin: caseSensive.} # or wrapperDefault, etc...
const GL_FLOAT = 0x1406
type GLfloat* = float32
# code.nim
import gl
# we have default syntax skin
let x = GL_FLOAT # external idents used with it's sameIdent logic
proc test(x: GLfloat) = discard
# but non external identifiers have normal logic
import snake_case_module
var x = weirdName # in snake_case_module it is weird_name |
Style skins seems like an interesting idea, but a bit irrelevant here as this RFC tries to address the issue of having multiple identifiers with different compound words that are treated as the same identifier, because the algorithm that converts them to an IR can't handle such cases. If your snake_case_module contains both weird_name and weirdname, Nim will still treat it as the same. Sure, you could just add {.syntaxSkin: caseSensive.} into snake_case_module, but I think it would be better to make Nim behave properly by default. |
Isn't the suggestion to remove case insensitivity? That is definitely breaking to code which uses different casings. |
Perhaps you still need to add a rule: It solves @metagn cases |
It is my understanding that it about changing the register insensitivity rules, not removing it. |
Sorry, I was wrong about |
newHTMLParser and newHtmlParser are the same |
With this proposal, they no longer become the same. |
this rule lets it be the same |
BTW, now I think that sameIdent shouldn't be in syntaxSkin, it should be another pragma like |
I think it would be better to convert every uppercase letter in a continuous sequence of uppercase letters to lowercase, except the first and the last ones, since a single identifier can have many words and (for me, at least) its not clear if compiler should alternate between upper and lower cases or make everything between first and last uppercase letter uppercase and what effects that may cause. With this logic newHTMLParser becomes newHtmlParser, GLFloat -> GlFloat, Gl_float, etc. What do you think? |
Maybe it's better, but I don't see much difference. getISLandFrame -> getIsLandFrame
get_is_land_frame -> getIsLandFrame # match but shouldn't
# must be valid only:
get_isl_and_frame -> getIslAndFrame # no match but should NOTE: for uppercase variant result is same # l - lowercase letter
# u - uppercase letter
# (l or u)* - zero and more matches
# (l or u)+ - one and more matches
# string with pattern:
l* u+ l+ u+ l* # in any case it will be the same Idealy, and must be camel case (And), but we can't detect it For getISLAndFrame it working as expected |
I thought about making Nim case sensitive but allowing for a |
I'm not sure I understand your examples correctly. get_is_land_frame and getIsLandFrame match and should match (both in current state of Nim and with proposed changes from this RFC). I don't see why get_isl_and_frame and getIslAndFrame woudn't match either (after both RFC changes and lowercasing letters between 2 capitals).
Get getISLAndFrame is problematic if you want to convert letters between 2 capitals to uppercase, because both it and getIslandFrame will be converted to getISLANDFrame. |
all ident matching and case insensitivity works via getIdent proc, if we add rules object for ident, we can
getISLandFrame this is a crazy example: getISLAndFrame will not be converted to getISLANDFrame it will be converted to getIslAndFrame |
This discussion flared up again because some people want to do this. type Foo = object
const FOO = 123 Before this, tons of people complained about not being able to use underscores on their own, like For any kind of distinction that style insensitivity in Nim removes, there are people that use that distinction in other languages, to the point that it supposedly stops them from using Nim. This has not changed since 8 years ago when I found out about Nim, and basically nothing has been done about it (I think the But it's still a feature that was promised to be part of Nim, hence 100+ people voted to keep it in #456. Removing it wouldn't be a development of Nim, it would create a new language called "Nim with complete identifier sensitivity". Many people would prefer or feel better about the original Nim over this new language. On the other end, it'll only be one thing potential new users/companies won't care about anymore while they find 20 other reasons not to use Nim. It's not a great compromise. I understand the desire here to find middle ground, but changing style insensitivity means people who are used to it will have to learn the more complex new rules, and only some of the people who are against the current rules will be accommodated while the rest will be equally placated and driven away due to the extra complexity. Instead, we can look at what exactly it is that the opponents of style insensitivity want that isn't possible in current Nim, and see if any of it is incompatible with what the proponents want that current Nim allows. If I had to say, it would be like: Opponents:
Proponents:
Notice the lack of overlap: the opponents don't care if we access the symbols with a different style than the definition as they're used to it just erroring, the proponents (probably) don't care if we define different symbols that would be considered the same under style insensitivity as, again, this would error in current Nim. Consider this design (ignore the complexity for the Nim compiler for now): Allow defining style sensitive different symbols. let foo = 1
let fO_o = 2 Allow referring to them with their exact definition style. echo foo # 1
echo fO_o # 2 If we refer to them with a different style, error only if more than 1 symbol matches the current style, otherwise allow the reference. echo fOo # error: ambiguous between foo and fO_o
let bar = 3
echo bA_r # 3 For routines defined with different styles (probably can be generalized to any overloaded symbol), if a style is used that exactly matches some of the definitions, only consider those as overloads. If a style is used that doesn't match any definition: if every definition has the same style, consider every overload, otherwise error due to ambiguous style. proc foo(x: int): int = x * 2
proc fO_o(x: int): int = x * 3
echo foo(1) # 2
echo fO_o(1) # 3
echo fOo(1) # error: ambiguous style between foo and fO_o
proc bar(x: int): int = x + 1
proc bar[T](x: T): T = x
echo bar(1) # 2
echo bA_r(1) # 2
echo bAr("abc") # 2 This part compromises on the style insensitivity camp because it doesn't let us swap the usage with any definition and have it still work in every case. But considering proc foo(x: int): int = x * 2
proc fO_o(x: int): int = x * 3
echo foo(1) # 2
echo fO_o(1) # 3
echo fOo(1) # error: both foo and fO_o match
proc bar(x: int): int = x + 1
proc bA_r[T](x: T): T = x # notice weaker match but different style
echo bar(1) # 2
echo bA_r(1) # 1
echo bAr(1) # 2 If this design is sound, then we have something that can make everyone happy, and we can iterate on it with regards to addressing complexity in the Nim compiler; or we can formulate other designs in the same way. But again, we can't compromise too much since Nim already promises style insensitivity and has for a long time. |
No benefits are known for SCREAMING constants, so this particular case is completely irrelevant for me personally. |
I personally don't like the fact that foo and fO_o are both compiled. But I think that idea of this RFC is quite logical because it’s strange to make gameStop == gamesTop. As for me we need case insensitivity but restricted where they have different meanings. The purpose of this RFC is to bring some clarity to case insensitivity. Yes maybe it need also aditional rules for equality foo and fO_o and Foo, FOO or make other rules in general. BTW I've never had a case where when using pure Nim, insensitivity gets in the way. However when wrapping C libraries it happens quite often, e.g. OpenGL, v4l2, libnx..... And I think you could add a pragma to specify case sensitivity: on | off, the same rules would apply for a module and identifiers from that module imported from other modules as case sensitivity in a module with a pragma. I find it quite useful, plus after these C-api wrappers there is likely to be a high-level api for nim. |
I also don't like the idea of having foo and fO_o in the same codebase, but mostly because in the context of Nim these things would be the same identifier, probably referring to the same thing in the same context, which is a huge code smell (that current Nim doesn't forbid anyway). And I do think that having style insensitivity for identifiers be allowed only if user's chosen style doesn't create any ambiguity when resolving identifier names is an interesting idea, but it's a bit out of scope of this RFC and is probably too complex for both camps to be happy with. The purpose of this RFC is to highlight a problem with current implementation of style insensitivity. People, when using snake_case or camelCase, don't place underscores and uppercase letters randomly, but tend to use them as delimiters for words inside of a single identifier name. This information is simply lost when Nim parses these names into IR, and in some cases this can create ambiguity amongst 2 different identifiers. And notice that people are mostly discussing snake_case and camelCase when talking about style insensitivity, and not styles like SCREAMINGCASE or nocase, so I personally don't find any value in a system that allows me to do C_R_A_Z_Y, sIlLy stufflikethis, but doesn't let me have new_freedom and new_free_dom in the same context. I consider that for styles that Nim realisitcally supports, case insensitivity is an improper tool to achieve insensitivity amongst them. If people find genuine issues with proposed changes that would break their workflow, I'd be happy to hear about those and have a discussion. But from what I can tell, these changes shouldn't affect anyone's code (that doesn't smell already) and shouldn't prohibit people from using their favorite style. |
No matter what algorithm we come up with some code will still break. In this sense, people do place underscores and uppercase letters randomly. There is also the problem that the simpler the algorithm is, the more code will break. If you don't want There is a case which compiles in current Nim: # a.nim
proc foo(x: int): int = x * 2 # b.nim
proc fO_o(x: int): int = x * 3 import a, b Trying to use either For this case, we can add another warning, for whenever we use a symbol when a symbol with another style was available. These warnings need to be opt in though because they will complain that people who prefer style sensitivity are using style sensitivity. |
From what I understand, I don't see how this is not possible to do in current compiler, since it requires a change in a compiler part that is already there and is responsible for current behavior in the first place (unless compiler, for some reason, is built around the fact that all identifiers internally are lowercase (except maybe for the first letter) and is now tightly coupled to this implementation detail).
If someone wants to use my Once again, if I'm missing something, and there is a genuine case where the proposed changes would break an otherwise reasonable code that doesn't adhere to either snake or camel case (or some mixture of both), then I'd like to see such an example (that isn't foobarbaz).
This will also compile with proposed changes, but will not be ambiguous, because you would be able to call |
The idea that the convenience style insensitivity is critical for deciding on using or ditching the language is hilarious. 🤷 |
I'm sure that's the rule that has been proposed here, but let me spell it out to see if there is disagreement: "Two identifiers are equal if they have separators in the same positions. A separator is either the underscore or a invisible between a case switch (lowercase followed by uppercase or vice versa). The first character is always case sensitive and other characters are not. An underscore cannot be followed by an underscore." This implies:
This is pretty intuitive, keeps style intensivity's benefits and tooling can easily provide a transition path. |
@Araq What about "A separator is either the underscore or a invisible between a case switch" (when lowercase followed by uppercase or one left if uppercase followed by lowercase). IMHO HTMLParser is more common style then HTMLparser so that
|
I am suggesting a new option entirely that would go with my proposal, it would just go under the styleCheck umbrella.
Yes, internally all identifiers are stored in a style insensitive way. You are suggesting to store them under a new scheme (no?), I am saying changing the scheme is going to cause more pain.
I don't want to indulge misunderstandings further here. The compiler, on a technical level, should be able to respect any different strings. The same way the compiler would allow
I meant that you cannot define both
I don't mean to be rude, but this is not an argument for your suggestion, it's a defense under the hypothetical argument "we should allow f_o_o_BaR". The point isn't that
I mean, this is the user's fault, there's nothing honest about changing
Yeah but are we really in a position to bargain against people who think this? We are only so influential.
I did not interpret the proposal like this at all, it's not what "convert snake case to camel case" usually means. The comment:
Implies that It's interesting though, and I'm not saying these conversion schemes don't make sense (if this was not clear) but I can't say that changing to a new scheme is a path we should be pursuing. To repeat, changing the scheme would:
If we are at the point that these are acceptable compromises then we can go nuts but I still have doubts. Final note: the majority use of style insensitivity is not the simultaneous use of camel case and snake case, most people adhere to NEP1 which says to use camel case. The majority use is cases like |
I agree but that's a (good!) argument against any proposal like this altogether. For now I'm interested in exploring the proposal, we can always reject it later. Of course, you can also say "waste of time" already. |
I like where this is going. Does this proposal handle all uppercase identifiers like this? |
@kuchta > What about "A separator is either the underscore or a invisible between a case switch" (when lowercase followed by uppercase or one left if uppercase followed by lowercase). Is that the same as "separator between uppercase letters AB if B is followed by a lowercase letter?" |
The ruleset is almost correct. Its just that a separator is either an underscore or a case switch from lowercase to uppercase (and not vice-versa). Also, like @metagn pointed out, a continuous sequence of uppercase letters gets converted into a separate word, except for the last character in such sequence, so people can use |
So AF_INET would be AfIneT? |
Perhaps there should be an exception would be made for the last character in sequence if that last character happens to be last character in identifier. So |
Maybe we could let the user decide what names the variables have, instead of having each variable name have x^n number of name permutations? It would make it easier for everyone, no inconsistencies. More power to the user. |
Sorry, didn't notice your comment earlier. Can you elaborate more on this? People here have already suggested using "style skins", and we already discussed their issues. |
I should probably read up on these style skins first, maybe they solve the issue already. |
Sorry for the late comment; I found this RFC when I searched for why the new Roadmap (Issue #556) doesn't mention fixing the style sensitivity issue for Version 3. Now I see why. Anyway, here are my thoughts on the matter. To me the aim of this discussion should be: how to facilitate the writing of code that is as easy as possible for people other than the original author to read and understand. It should not be about accommodating every individual's particular coding taste. For this reason I am against "style skins". That idea goes in the opposite direction from code readability - in order to understand someone else's code, one must first understand the style that is specified by the skin being used. Adding hoops to jump through when reading code is not my idea of a step forward. A major component of clear, understandable code is the use of descriptive identifiers, each of which may comprise several words. The situation we are faced with is that there are two main conventions/religions for separating words within an identifier:
From what I have read to date, it seems that the only thing that the proponents of the above two conventions have in common is unwillingness to use the other. Given that, the goal of this discussion should be to find a way to reconcile the above two conventions in a way that promotes the writing of easily readable code. So the challenge is to define a small, unambiguous set of simple rules for separating words within an identifier. Those rules must allow use of either case change or underscore as a word separator, such that identifiers that are comprised of the same sequence of words would be deemed to be the same - whether underscores are used, or case change, or some combination. That is, the compiler's internal representations of them are identical. There is one further necessary requirement. In general, a FFI may use use a totally different, arbitrary convention. In order to facilitate interactions with a FFI, it must be possible to write identifiers whose internal representation is exactly as typed. To that end, building on the discussion so far in this RFC, I propose the following set of rules. They are in decreasing order of precedence, and are in addition to the current constraints on identifiers as defined in the Manual (case sensitivity of the first character, restrictions on the use of underscore).
Finally, overriding the above rules:
Applying the above rules to the examples mentioned in previous comments, plus a few more, gives:
|
@lou15b For Nim 3 I am thinking of "just make it case sensitive but add {.snake_case.} and {.camelCase.} pragmas" which can be used for an entire module affecting all the imported symbols to adhere to the requested style. |
I think |
@ASVIEST I haven't thought about it much but in the new implementation there would be no hashing beyond what is used for identical strings. Instead on import there would be "try to import FooBar. Failed? Ok, so try FOO_BAR then". |
I think this would considerably simplify a number of things, not the least of which is to eliminate all the bike-shedding about naming conventions from discussions about the Nim language. I'm all for it.
I'm not sure what this means. If it means "convert all imported symbols to the specified style" then there must be rules defined for word separation, in order to properly convert those symbols. |
Sure but it doesn't look hard, lower-followed-by-upper or underscore is a word separation. |
True. And one can always fall back on the default - use the symbols as originally typed. |
There is an issue though, if I import fooBar as foo_bar and then the module later also adds foo_bar my code changes its meaning... Probably not too problematic though as a module that exports both fooBar and later foo_bar is not realistic. Or maybe within a module we can enforce that the used style for declared identifiers uses camelCase or underscores consistently. |
Shouldn't it be in stylecheck? And I think that you can write
Then names should not be in collision |
@ASVIEST I didn't bind the |
I came up with this quite a long time ago when I was playing with the compiler. The idea was to make a pragma # a.nim
{.sensitivity: sense.}
const GL_FLOAT* = 0x1406
type GLfloat* = float32
const megabyte* = 1024*8 import a # behaves by default as import a {. sensitivity: sense.} I.e. those ids that have become the same are treated sensitively (GL_FLOAT, GLfloat), the rest are insensitive.
assert GL_FLOAT == 0x1406 # valid
assert GLfloat is float32 # valid
assert megabyte == 1024*8 #valid Besides partial, there is sense and insense in this design (although I think insense can be made partial) # b.nim
import a {.sensitivity: sense.}
var megaByte* = "1024 * 8" #no error # c.nim
import a {.sensitivity: sense.} # or partial
import b {.sensitivity: sense.} # or partial
assert megaByte == "1024 * 8"
assert megabyte == 1024*8 Import with sensitivity defines the sensitivity in which the imported identifiers will be represented in the scope of current module |
Abstract
Instead of removing underscores and forcing every letter to lowercase, replace every occurrence of underscore followed by any-case letter with uppercase letter in snake_case identifiers, and keep camelCase identifiers as-is.
Motivation
Current style insensitivity algorithm can produce a few problems internally when trying to represent compound words that can be split more than one way in identifier names, as the internal all-lowercase identifier names are indistinguishable in certain scenarios (gameStop vs gamesTop, index vs inDEX, superb_owl vs super_bowl, island vs isLand, etc).
Description
I propose for identifiers to store their names with camelCase (or snake_case, or any other way to store separate words inside of a identifier), as a case of a letter or an underscore in the name clearly delimits the words used inside that name. This should resolve the issue presented above by allowing separate words to be represented properly.
Conversion between styles would preserve the words by forcing any letter preceded by an underscore to be uppercase when converting from snake_case to camelCase. When converting from camelCase to snake_case, just reverse the process.
Code Examples
No response
Backwards Compatibility
Macros will break if they generate an all-lowercase identifiers, but other than that I'm not aware of any other backwards-incompatible changes this would introduce to Nim.
The text was updated successfully, but these errors were encountered: