[RFC] Scopes for module/namespace access #1842

Open
deathaxe opened this issue Jan 16, 2019 · 34 comments

@deathaxe
Collaborator

Intro

In general, syntax definitions follow a common principle: the declaration/definition of a construct is scoped as entity.name.<construct>, while calling/using such a construct is scoped as something like variable.<construct> or variable.other.<construct>.

The most popular example is entity.name.function vs. variable.function.

The goal is clear - distinguish definition and usage of the same object.

Question

How should namespaces or modules be handled in that manner?

A couple of syntaxes, including C, C++, Python, PHP, Java, JavaScript, Erlang, ..., support such concepts. Most of them use entity.name.namespace to scope the identifier in the definition/declaration statement, as the ST3 documentation at https://www.sublimetext.com/docs/3/scope_naming.html says:

Namespaces, packages and modules use the following scope. There are usually not multiple types of such constructs in a language, so this scope should suffice.

entity.name.namespace

But I can't find a common solution for how to scope a namespace/module upon usage.

    module_name:function_name()
  • C# distinguishes between support.namespace.cs and variable.other.namespace.cs
  • CSS uses entity.other.namespace-prefix.css or entity.name.namespace.wildcard.css
  • Erlang uses entity.name.type.class.module.erlang
  • Python just scopes them as meta.generic-name upon usage or meta.import-name in import statements.
  • ...

Can we find a common scope for that usage?

  1. From my point of view anything starting with entity. is a no-go when we talk about usage.

  2. The most pleasant approach with respect to existing scoping guidelines and implementations seems to be variable.other.namespace. So we'd end up with

    entity.name.namespace vs. variable.other.namespace

  3. To stay consistent with the concept of function declaration and usage, I could also imagine using variable.namespace. So we'd end up with

    entity.name.namespace vs. variable.namespace

Thoughts?

@keith-hall
Collaborator

I completely agree with point 1.
For namespace usage, I think it's worth distinguishing between namespaces provided by the standard library and user-defined ones (like the C# syntax definition does at the moment), as I like to color them differently.

@FichteFoll
Collaborator

For method or static function calls on objects/classes, do we treat those as namespaces as well? There is no way to differentiate those from modules in most languages.

@deathaxe
Collaborator Author

I did not directly think of static class member access, but this is a very good question though.

Basically, a class is an extended namespace that allows creating copies of its internal variables through instantiation. Because of that, access semantics are the same in most languages and indeed can't be differentiated.

Furthermore, whatever most languages call a package, module or class is most likely nothing more than a namespace.

I personally like the idea of using namespace to generally identify all kinds of such constructs in an abstract way, which can be applied to several syntaxes.

In general, whether we use namespace or class might depend on the language's concepts.

Examples:

  1. C uses namespaces only, C++ knows both.
  2. A Perl source file can form a package, which can be instantiated like a C++ class.
  3. A Python source file (module) is nothing else than a namespace, while it can contain classes.
  4. Erlang uses modules which are basically namespaces only.

I'd suggest limiting scope names to namespace and class instead of using language-specific ones like package or module.

The open question is what to use if no decision can be made.

Scope names would look like:

- meta.namespace / entity.name.namespace / variable.other.namespace
- meta.class / entity.name.class / variable.other.class

@FichteFoll
Collaborator

Another question: when importing a module that is sourced from a file with the same name (and not explicitly specified), would the import statement be a usage or a definition of the namespace?

@deathaxe
Collaborator Author

I think this depends on the perspective.

Importing a module means declaring it so that it and its members can be used. It is comparable to declaring/defining a variable or a function.

From the global point of view, an import is a usage of an existing module.

I tend to prefer the local perspective.

See Python:

import os    # os <- entity.name.namespace

# os <- variable.other.namespace
os.getcwd()

The usage of os in os.getcwd() does not work without the declaration import os.

The interesting question here is - how about

from os import getcwd  # <- os = usage or definition?

getcwd()

or

from os import path   # <- os = usage, path = definition?

path.join()
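To make the "local perspective" concrete, the sketch below uses Python's standard ast module to list which names each top-level import statement actually binds in the importing module. It only illustrates Python's binding rules (a sublime-syntax does not build an AST), and the function name declared_names is hypothetical.

```python
import ast


def declared_names(source: str) -> list[tuple[str, list[str]]]:
    """For each top-level import statement, return the local names it binds."""
    result = []
    for node in ast.parse(source).body:
        if isinstance(node, ast.Import):
            # `import a.b` binds only the top-level package name `a` unless aliased.
            names = [alias.asname or alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # `from m import x` binds `x` (or its alias); `m` itself is only referenced.
            names = [alias.asname or alias.name for alias in node.names]
        else:
            continue
        result.append((ast.get_source_segment(source, node), names))
    return result


if __name__ == "__main__":
    src = "import os\nfrom os import getcwd\nfrom os import path\nimport os.path as osp\n"
    for statement, names in declared_names(src):
        print(f"{statement!r:28} declares {names}")
```

Under this view, os in from os import getcwd is only referenced, while getcwd, path and the alias osp are declared. A wildcard import such as from asyncio import * binds no name visible in the statement itself (ast reports it as "*").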

@wbond
Member

wbond commented Feb 15, 2019

The topic of scoping qualifiers is not well addressed currently, so I'd like to come up with something to move forward. I currently run into this issue in lots of places (like #737, where I was just working).

I think part of the issue is that sometimes we know looking at the code that a qualifier is a class name, or a namespace, etc. However, it isn't always clear when highlighting which we are currently dealing with, and for some it is impossible.

Because of this, and the fact that we have many different syntaxes with different nuances, I think we need to come up with a somewhat generic scope that can be applied to the identifier qualifiers in a "path". This could be for a function call, a type name, an inherited class name or an XML namespace.

Previously I think we had thrown around the idea of a new top-level scope, such as identifier (does that ring a bell @FichteFoll?). Either way, I'm not sold on that idea, but I was poking around at existing syntaxes and was thinking about using the following:

entity.other.qualifier

Currently entity.other is used for inherited-class and attribute-name primarily. This would somewhat play off of the entity.other.inherited-class scope, since this is a place that qualifiers are sometimes seen.

I would imagine that most users wouldn't want these too heavily colored. Additionally, a series of entity.other.qualifier and variable.* should be scoped with meta.path.

Thoughts?

@wbond
Member

wbond commented Feb 15, 2019

I think the most obvious alternative to entity.other.qualifier (or entity.qualifier, as @FichteFoll suggested on Discord) is:

variable.qualifier

This keeps most such identifiers in syntaxes under variable, for better or worse, and leaves entity.other as sort of a historical relic for inherited class names and HTML tags and attributes.

@Thom1729
Collaborator

It sounds like we're talking about the following four types of constructs:

  1. Definitions of namespaces (e.g. namespace foo {} in C++).

Here, the whole construct should get a meta scope, namespace should be storage, and foo should get entity.name.

  2. Explicit imports (e.g. import os in Python).

This deserves more examples, because import syntax varies greatly between languages. I think that we can come up with an answer generic enough for general use.

Taking the broad view, import os is a declaration that declares the name os. It would be reasonable to scope os with entity.name. The statement from os import getcwd declares the name getcwd, which we could also scope with entity.name, but it does not declare os, so we should not scope it with entity.name. In both cases, os serves a special syntactic purpose: it's the name of a module, and it doesn't behave like an ordinary identifier. I'm not sure what the right scope would be for that -- let's call it FOO for now. Then, the scopes should be as follows:

import os
       ^^ FOO entity.name.something

from os import getcwd
     ^^ FOO
               ^^^^^^ entity.name.something
  3. References to a namespace (e.g. os.getcwd() in Python).

My biggest concern is with (3). In many languages, namespaces or modules are first-class values and a name representing a namespace is used in the same way as any other name. For instance, in Python, os.getcwd() is an ordinary expression where os is an ordinary variable no different from the foo in foo.bar().

In such languages, the only way to try to highlight references to a namespace is to guess based on the name of the variable. This is a bad idea, because the guess would often be wrong, and users get annoyed when some identifiers are colored differently seemingly at random.

In some languages, namespace references are used with special syntax. For instance, in C++, a namespace reference may be followed by the scope resolution operator ::. We could scope foo specially in foo ::, just like in foo() we may scope foo as variable.function instead of variable.other. The key here is that the difference is syntactic. In Python, we can't say that foo in foo() is actually a function, or callable (it may be a runtime error), but there is nevertheless a purely syntactic justification for highlighting it as a function.

We should keep in mind that we can't scope these things reliably. In foo\n(), we cannot recognize foo as a function.

  4. General dotted paths.

Most of the time in most languages, a dotted path like foo.bar.baz represents a value foo and a sequence of property accesses. The semantics will vary. We don't have a standard scope for foo and bar, and we should (maybe something with "property" or "attribute" in it). On the other hand, in most common languages foo isn't really distinguishable from a baseline variable. In some languages, we can say that it's an "object" of some kind, but in many languages everything's an object anyway.

We can try to scope the entire path, but to be honest I've never liked doing this. It's fundamentally unreliable and annoying to implement, and even in real-world use a typical file would likely see a lot of misses. Plus, I don't really see the motivation: why have a special meta scope for foo.bar but not foo + bar? We're not going to highlight it differently, and we can't guess reliably enough for automated tools to use it, so why do it at all?


Other thoughts:

  • Regexps with a lot of captures or lookaheads are bad for performance. The fastest way for a sublime-syntax to work is one token at a time. Highlighting a token based on what comes after it is often impossible and always annoying.
  • We could do meta.path reliably with nondeterministic parsing. However, leaning on nondeterminism for such a ubiquitous construct might be bad for performance, because the performance penalty for nondeterminism is proportional to the cube of the depth of nondeterminism.
  • I don't think that a class is really a kind of namespace, at least not to the degree that they should use the same scopes.
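Both the foo\n() case and the lookahead point follow from the fact that a sublime-syntax matches its regexes against one line at a time. The sketch below is a simplified Python model of that constraint, assuming a naive lookahead pattern for function calls; the names FUNCTION_CALL and function_names are hypothetical.

```python
import re

# Naive "identifier followed by an opening paren" pattern (assumption for illustration).
FUNCTION_CALL = re.compile(r"\b([A-Za-z_]\w*)(?=\s*\()")


def function_names(source: str) -> list[str]:
    """Identifiers a per-line lookahead would scope as function calls."""
    found = []
    for line in source.splitlines():
        # Each regex only ever sees the current line, like a sublime-syntax match.
        found.extend(m.group(1) for m in FUNCTION_CALL.finditer(line))
    return found


if __name__ == "__main__":
    print(function_names("foo()"))    # ['foo']
    print(function_names("foo\n()"))  # []
```

Run over the whole text at once, the same pattern would find foo in foo\n(), because \s also matches a newline; line by line it cannot. The naive pattern would also flag keywords such as if in if (x), which a real syntax has to exclude separately.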

@deathaxe
Collaborator Author

I don't really see the motivation: why have a special meta scope for foo.bar but not foo + bar?

The initial post doesn't say anything about meta scopes.

The question simply is: which scope should be used for the declaration of a namespace vs. its usage?

The scoping guideline says meta.namespace entity.name.namespace is to be used for all kinds of namespace, module and package definitions, but there is no general rule about how to scope the usage of a namespace (assuming it can be identified). The initial post lists some examples from different syntaxes. Compared with function definitions and calls, scoping namespace access identifiers with entity. feels odd.

By analogy with variable.function in function calls, variable.namespace sounds reasonable. If a more general approach is desired, because we may not be able to distinguish namespaces from classes or similar constructs, variable.qualifier sounds good.

We should keep in mind that we can't scope these things reliably.

This issue was not raised to propose adding a lot of guesswork to syntaxes. It just tries to find answers for the situations/languages that allow namespaces to be identified, because existing syntax definitions were found to use different scopes for the same things.

If something can't be identified reliably, the most general scope possible should be applied.

@FichteFoll
Collaborator

This topic is huge. I apologize for a wall of text, but this touches a very fundamental concept of how definitions and references are applied (so it very much relates to #1861).

It seems the question is three-way.

  1. How to scope namespace definitions (namespace foo in C++)?
  2. How to scope an import/global usage of a namespace (import foo, from foo import bar, from foo import bar as baz)?
  3. How to scope a local usage of a namespace, which quickly becomes equivalent to asking how to scope a qualifier?

Usually it cannot be decided for 3. whether a namespace, a class, some random object or whatever is being accessed, so I'll hold off on that for now.

For 1., I believe entity.name.namespace is the correct scope for the final identifier, assuming a construct like namespace abc::def is allowed. meta.namespace should span the entire definition of the namespace including its body.

For 2., I second @deathaxe's opinion in that an import that assigns something to an identifier becomes a declaration and should then be scoped as entity.name.import (suggestion). I believe that to be a general enough scope to be usable for all various types you can import.

However, not all imports also assign to a specific identifier. Some imports work on a literal basis and behave as if the referenced file were pasted verbatim into the current file (barring preprocessor checks to prevent re-importing), e.g. #include <string>. The other form is wildcard imports like from asyncio import *. Those imports are usages of or references to namespaces (or files, which I guess can behave like namespaces for our purpose) and should thus follow general qualifier scoping rules, which don't exist yet (see 3. below). However, since we can be quite sure that at least one identifier can definitely be assumed to be a namespace, we might want to use variable.namespace for that usage, once we have decided on scope names for qualifiers (see below).


Now for the important part: How do we scope qualifiers and usages of variables or identifiers in general?

First, I believe we should clarify on terminology.

  • An identifier is a unique name made of words or characters that are atomic, i.e. changing any one character of it is impossible, and do not represent a keyword or a (language) constant, excluding case-insensitive languages. They can be used to reference a (virtual) storage of data or a (virtual) collection of structured data for storage.
  • A qualifier is a path built from one or multiple identifiers that can traverse a hierarchy of (virtually) structured data. Every identifier is also a qualifier on its own. Multiple identifiers that describe the path to a certain (virtual) data are joined using an accessor that separates identifiers syntactically. This may even be a whitespace character.
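A minimal sketch of these definitions, assuming a small, hypothetical set of accessors (., :: and the Erlang-style :); languages where whitespace acts as an accessor would need their own rule. It splits a qualifier into its identifiers and the accessors that join them.

```python
import re

# Hypothetical accessor set for illustration; `::` is listed first so it wins over `:`.
ACCESSOR = re.compile(r"::|[.:]")


def split_qualifier(qualifier: str) -> tuple[list[str], list[str]]:
    """Split a qualifier into its identifiers and the accessors between them."""
    return ACCESSOR.split(qualifier), ACCESSOR.findall(qualifier)


if __name__ == "__main__":
    for q in ("os", "collections.abc.Sequence", "std::chrono::seconds", "module_name:function_name"):
        print(q, "->", split_qualifier(q))
```

A single identifier such as os comes back as a one-element path with no accessors, matching the rule that every identifier is also a qualifier on its own.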

Unless I am mistaken, we can use these concepts to represent most, if not all, currently used patterns in programming languages.


Going forward, I conclude that we have to answer the following questions:

  1. Should we scope every identifier?
  2. Should we scope every qualifier? Note that every identifier is a qualifier.
  3. Which kind of scope should we use for identifiers/qualifiers?
    We can choose between a colored (variable), uncolored (meta) or new (identifier/qualifier) scope (on the first level).
    Should we introduce a new main
  4. Is "item access" in the form of indexing of a sequence or through key access of a mapping also part of the qualifier (and should thus be scoped as such)?
  5. What about function calls?
  6. (How much effort will this all be and can we ever hope to get all default syntaxes to follow it?)

I'll proceed with answering these question by myself, but I'm interested in your opinions.

  1. Yes. This question has two parts.

Currently, the variable scope has mostly been used for identifiers at various locations in a qualifier or depending on its semantics, like whether it represents a type/class or a function. However, its primary use is for the final identifier. Interestingly, support is equivalent in this regard except that identifiers scoped as support aren't user-defined and exist by default on behalf of the language's environment.
I believe it is in our best interest to continue this tradition and not break backwards compatibility basically everywhere if we were to apply a different top-level scope to identifier references.

Declarations or definitions of identifiers, i.e. places where they usually appear without qualifiers, are handled by entity.name, and we should keep that.
Exceptions: Almost all entity.other scopes like entity.other.inherited-class (used as a reference to a class to be inherited). Imo we should adjust these to new guidelines eventually.

The second part of the question considers identifiers that are not the last segment of a path and where we usually don't know much. Encountering them at any place in a language like Python could mean anything from a namespace (module/class?), a type (class), a function or any other object data reference. However, we may be able to guess at what the identifier references based on naming conventions for constants or (built-in) types.

To conclude, yes, we should scope each identifier to the best of our abilities. If we cannot tell what an identifier represents (at its position in a qualifier), choose a generic scope. To make things easier, all identifiers should get this generic scope and only for those whose meaning we can decipher feasibly we add another scope.

  2. Depends. I don't think this hurts us in any way, and we may be able to add sub-level scopes when we know what kind of qualifier we expect at this place. I don't really see a particular use case for color schemes currently, but the Expand Selection to Scope command could easily make use of this.

  3. Qualifiers should definitely not get a colored scope. meta.qualifier seems fine to me. We don't need a new top-level scope for non-atomic syntax elements.

For identifiers we currently have variable and support for those we can interpret. I don't think adding another top-level scope for basically the same purpose is worth it, so we should keep backwards compatibility on this. I don't think joining these two would be worth it either.

However, the scope for where we "don't know" is crucial. JavaScript currently scopes these as variable.other.readwrite; Python has meta.generic-name. That leaves two candidates: meta.identifier and variable.other. The latter clearly indicates that the semantics are unknown and no third level is needed for this assessment, while the former is more subtle and could be stacked with variable and support.

I think choosing variable.other is optimal because strictly speaking meta.identifier would need to be added to every entity.name as well and we don't really gain anything from that. Instead, all references of any type (function, class, namespace) may be found simply through a selector on variable, support (in case the user overrides built-ins).
Where known, the type of the referenced data should be added as the second level, i.e. variable.function, variable.namespace or variable.constant. Whether or not types should be scoped as variable.type or instead remain as storage I'll leave to another time and RFC.

  4. No. As mentioned in 2., we barely benefit from scoping the qualifier, and trying to include item access or more is a lot of effort for seemingly no gain. Besides, at least according to my initial definition, they aren't even part of a qualifier.

  5. Same as 4.

  6. I believe the work required is manageable.

For Python, most of this is already finished and I just need to exchange scope names. For other languages, you'll most likely end up with a similar context layout where you need to scope path segments and accessors individually anyway.

The largest effort, as always, comes down to establishing the updated scoping guidelines and color schemes together with user perception, although there are hardly any backwards-incompatible changes in here.
The biggest pain points will probably come from adding variable to identifiers that didn't have them before and changing variable.parameter to entity.name.parameter (not talked about in here).


I hope you've been able to follow me up to this point; I tried to structure it decently despite writing on it for an hour or so.

Did I miss a certain aspect? Is there a language you know that doesn't fit the structure I imagined? Do you think non-first-level identifiers in a qualifier should get variable.member instead of variable.other?

@Thom1729
Collaborator

Your thoughts parallel my own. In particular, I agree that we have to address the general problem in a general way.

I do think that variable.other is the correct scope for a “generic” identifier. However, I think that a property name (bar in foo.bar) is a different kind of identifier deserving a different scope: variable.member, or variable.property, or something. In my opinion, it's the first identifier that's different from the others, not the last.

In some cases, it may make sense to further specialize these scopes in a purely additive fashion and on a best-effort basis. For example:

foo
# <- variable.other
foo()
# <- variable.other.function
foo.bar
#   ^ variable.member
foo.bar()
#   ^ variable.member.function
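The additive, best-effort refinement can be sketched as a tiny tokenizer over simple dotted expressions. This is a hypothetical Python model, not sublime-syntax code: the first identifier gets variable.other, identifiers after a dot get variable.member, and .function is appended only when the next token is an opening parenthesis, so the decision stays purely syntactic.

```python
import re

# Tokens this sketch understands (assumption): identifiers, dots and an opening paren.
TOKEN = re.compile(r"[A-Za-z_]\w*|\.|\(")


def scope_path(expr: str) -> list[tuple[str, str]]:
    """Assign the proposed scopes to each identifier in a simple dotted expression."""
    tokens = TOKEN.findall(expr)
    scoped = []
    for i, tok in enumerate(tokens):
        if tok in (".", "("):
            continue
        # First identifier is a plain variable; anything right after a dot is a member.
        scope = "variable.member" if i > 0 and tokens[i - 1] == "." else "variable.other"
        # Additive refinement based only on syntax: followed by "(" means it is called.
        if i + 1 < len(tokens) and tokens[i + 1] == "(":
            scope += ".function"
        scoped.append((tok, scope))
    return scoped


if __name__ == "__main__":
    for expr in ("foo", "foo()", "foo.bar", "foo.bar()"):
        print(expr, scope_path(expr))
```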

All that said, I'm not convinced that it's worthwhile to scope “paths” in the general case. An example in Python:

class Foo(collections.abc.Sequence):
    ...

There's an argument for scoping the path collections.abc.Sequence (say, meta.superclass). But I think that this argument is slightly misplaced — collections.abc.Sequence is just an expression. We would want to scope it the same way even if it were some other kind of expression — not because of its “internal” nature as a qualified path, but because of its “external” nature as the superclass part of a class declaration. (Granted, in a lot of languages, the only expression allowed there would be a qualified path, but we have to design the scoping guidelines for the general case.)

In this example, it might make sense to use a special additional scope for Sequence. (Currently we use entity.other.inherited-class.) But we should still have the regular variable.member scope.

@Thom1729
Collaborator

I wrote up a rough draft based on the discussion. Comments welcomed.

entity.name: declaring a name.

When a name is declared, scope it under entity.name.

Use entity.name.function for the name of a function, method, or procedure:

function foo() {}
//       ^^^ entity.name.function

class Bar {
    baz() {}
//  ^^^ entity.name.function
};

Use entity.name.class for the name of a class:

class Foo {}
//    ^^^ entity.name.class

Use entity.name.namespace for the name of an explicit namespace, module, or package. Example in Java:

package foo.bar.Baz;
//              ^^^ entity.name.namespace

Use entity.name.label for the name of a labeled code block:

foo: for (const x of y) { break foo; }
// <- entity.name.label

Use entity.name.import for the name of an imported value:

import Foo from 'foo';
//     ^^^ entity.name.import

import { bar } from 'bar';
//       ^^^ entity.name.import

import { bar as baz } from 'bar';
//              ^^^ entity.name.import

Questions:

  • Many languages have variable declarations (e.g. const x = 1). Should we scope these entity.name.variable? Perhaps add .parameter?
  • What about unquoted object keys in JavaScript and similar languages? People complain that they're uncolored, but they also complained when they were colored as strings. Perhaps entity.name.member makes sense.
  • We already scope formal parameters variable.parameter. This is probably wrong, and we should use an entity scope. How do we do this gracefully?
  • entity.other.inherited-class does not fit. Should we deprecate it?

variable: referring to a name.

When a name is referenced, scope it under variable.

Use variable.other for variable-like names in a general context:

foo;
// <- variable.other

Use variable.member for the name of a member, property, or attribute:

foo.bar;
// <- variable.other
//  ^^^ variable.member

Use variable.label for the name of a code label:

break foo;
continue bar;

Use variable.parameter for the name of a function parameter at the call site. In Python:

print(sep=',')
#     ^^^ variable.parameter

When an identifier is being called in a function-like way, we may add .function:

foo();
// <- variable.other.function

foo.bar();
// <- variable.other
//  ^^^ variable.member.function

NOTE: This does not necessarily mean that the identifier represents a value which is semantically a function. We can't generally know that! All we know is that it is being used in a function-like manner.

When an identifier is being used in a class-like way, we may add .class:

new Foo();
//  ^^^ variable.other.class

new foo.Bar();
//  ^^^ variable.other
//      ^^^ variable.member.class

In Python, there is no new operator. A function call is syntactically indistinguishable from a class constructor call. However, there is a strong convention that classes begin with uppercase letters:

foo()
# <- variable.other.function

Foo()
# <- variable.other.class
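The convention-based guess could look like the following sketch (the function name call_scope is hypothetical). It only inspects the spelling of the called name, ignoring leading underscores, and is exactly the kind of heuristic that will sometimes be wrong.

```python
def call_scope(name: str) -> str:
    """Guess a scope for a bare called name from Python's naming convention."""
    # Heuristic only: strip leading underscores (e.g. _Private), then check case.
    if name.lstrip("_")[:1].isupper():
        return "variable.other.class"
    return "variable.other.function"


if __name__ == "__main__":
    for name in ("foo", "Foo", "_Foo", "__init__"):
        print(name, "->", call_scope(name))
```

ALL_CAPS constants that happen to be called would also be classified as classes; whether that matters is a judgment call for each syntax.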

When an identifier is being used in a namespace-like way, we may add .namespace. In C++:

void f() {
    Foo::bar;
//  ^^^ variable.other.namespace
}

When an identifier is also scoped support, then we may automatically add .function or .class as appropriate:

Object;
// <- support.class.builtin variable.other.class

isNaN;
// <- support.function.builtin variable.other.function

NOTE: This simplifies implementation because a single rule can be used to scope Object in virtually all contexts.
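A sketch of that single rule, assuming a hypothetical and heavily abbreviated table of JavaScript built-ins: the support.* scope determines the .class or .function suffix, so the same lookup yields the stacked scope wherever the name appears.

```python
# Hypothetical, abbreviated table of built-ins; a real syntax would list many more.
BUILTINS = {
    "Object": "support.class.builtin",
    "isNaN": "support.function.builtin",
}


def reference_scope(name: str) -> str:
    """Scope a bare reference, stacking the variable.* scope implied by support.*."""
    support = BUILTINS.get(name)
    if support is None:
        return "variable.other"
    kind = support.split(".")[1]  # "class" or "function"
    return f"{support} variable.other.{kind}"


if __name__ == "__main__":
    for name in ("Object", "isNaN", "foo"):
        print(name, "->", reference_scope(name))
```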

Qualifiers

We shouldn't scope qualifiers or paths, per se. We should scope decorators/annotations:

    @foo.bar(complex + expression)
//  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ meta.annotation (or whatever)

We should scope inherited classes:

class Foo extends (even+more/complex*expression) {}
//                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ meta.inherited-class (or whatever)

We could come up with more examples; the point is that these examples justify themselves without having to scope paths in general. Moreover, we can observe that scoping paths would not suffice in these examples because in a language as dynamic as JavaScript the thing in question may be an arbitrary expression (or something much more general than a dotted identifier path).

Is there a compelling reason to have a special scope for dotted paths, outside special cases like the above?

I don't think that it would be useful for highlighting. An identifier scoped variable.member would always be part of a dotted path, and I don't think that we'd want to color variable.other differently depending on whether it was in a path.

I don't think that it would be useful for tooling. Due to lookahead limitations, there's simply no way to do it well enough for a tool to rely on.

@FichteFoll
Collaborator

FichteFoll commented Mar 26, 2019

I agree with everything I don't comment on.

Definitions

Many languages have variable declarations (e.g. const x = 1). Should we scope these entity.name.variable? Perhaps add .parameter?

Yes and see below. Add entity.name.constant to that list.

What about unquoted object keys in JavaScript and similar languages?

Keep as is, for now. Most recently we agreed on using meta.mapping.key string.unquoted for these. That would be the more general solution considering you may use arbitrary expressions/values in various languages as mapping keys compared to strings. However, JavaScript (and Lua) alias member access to a string key lookup, so defining a string key is equal to defining a member.

We already scope formal parameters variable.parameter. This is probably wrong, and we should use an entity scope. How do we do this gracefully?

Yes. I wonder what to call it, though. Usually it's just a function-local variable, so entity.name.variable would be correct. However, keyword arguments are different because they also declare the external name to be used in a function call. Note that Swift allows different internal and external parameter names that are part of the function signature, e.g.:

func add(string x: String) {/* variable name is `x` inside body */}
func add(_ x: Int) {/* variable name is `x` inside body */}
add(2)
add(string: "12")

In the string example, string is the external and x the internal name. In Python, def add(*, string): pass means string is both the internal and external name. (If it weren't for *, you could still call this with a positional argument, though.)
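The Python half of this can be checked with the standard inspect module. In the sketch below (the function names are hypothetical), string in add is keyword-only, so callers must spell it and it acts as the external name, whereas x in add_positional may be passed either way.

```python
import inspect


def add(*, string):
    """Keyword-only: `string` is both the internal and the external name."""
    return string


def add_positional(x):
    """Positional-or-keyword: callable as add_positional(1) or add_positional(x=1)."""
    return x


def external_names(func) -> list[str]:
    """Parameter names a caller must spell out (keyword-only parameters)."""
    return [
        p.name
        for p in inspect.signature(func).parameters.values()
        if p.kind is inspect.Parameter.KEYWORD_ONLY
    ]


if __name__ == "__main__":
    print(external_names(add), external_names(add_positional))
```

Calling add("12") raises a TypeError because of the *, while add(string="12") works.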

Thus I suggest that if a name is (also) the external name, use entity.name.parameter, and if it is only the internal name, entity.name.other (see below).
For deprecation, variable.parameter entity.name.parameter/other?

I like variable.parameter for function calls.

entity.other.inherited-class does not fit. Should we deprecate it?

Yes. meta.inheritance <as_usual> sounds reasonable, e.g. meta.inheritance variable.other.class, which is especially useful for languages that allow expressions in the inheritance syntax like Python. I prefer "inheritance" over "inherited-class" because I think it has higher compatibility with trait-based models. I don't see a proper deprecation process here, though, because we want to get rid of a scope entirely, not just deprecate it.

References

I prefer variable.function over variable.other.function, so we can reserve variable.other for stuff we don't know anything about. variable.function has been used since TM days and we have enough freedom to add, say, variable.class next to variable.label. If we are confident enough to classify something as a function, we don't need to hide it behind variable.other.

Differentiating between variable.member and a generic non-member identifier seems reasonable, but variable.member.function or variable.member.class seem odd, because highlighting a function would then require matching both variable.function and variable.member.function. On the other hand, I do certainly see merit in highlighting members differently from top-level identifiers (variable.other) and having this information stack with function or class classification. A meta scope like meta.member would solve this.

func()
# <- variable.function
foo.func()
# <- variable.other
#   ^ meta.member variable.function
foo.bar.func()
# <- variable.other
#   ^ meta.member variable.member
#       ^ meta.member variable.function
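To illustrate how a meta.member stack would interact with selectors, here is a simplified Python model of descendant selector matching: each space-separated atom must match a scope in the stack in order, and an atom matches when its dotted parts are a prefix of the scope's. The stacks are what the example above would produce under this proposal, assuming source.python as the base scope; Sublime Text's real selectors support more operators than this.

```python
def atom_matches(atom: str, scope: str) -> bool:
    """An atom matches a scope if its dotted parts are a prefix of the scope's parts."""
    parts = atom.split(".")
    return scope.split(".")[: len(parts)] == parts


def stack_matches(selector: str, stack: list[str]) -> bool:
    """Descendant selector: atoms must match scopes of the stack in order (gaps allowed)."""
    atoms = selector.split()
    i = 0
    for scope in stack:
        if i < len(atoms) and atom_matches(atoms[i], scope):
            i += 1
    return i == len(atoms)


# Scope stacks for `func()` and for `func` in `foo.bar.func()` under the meta.member proposal.
BARE_CALL = ["source.python", "variable.function"]
MEMBER_CALL = ["source.python", "meta.member", "variable.function"]

if __name__ == "__main__":
    for sel in ("variable.function", "meta.member variable.function"):
        print(sel, stack_matches(sel, BARE_CALL), stack_matches(sel, MEMBER_CALL))
```

A single variable.function selector covers both calls, while meta.member variable.function picks out only the member call.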

When an identifier is also scoped support, then we may automatically add .function or .class as appropriate.

Everything that we would classify as support is also automatically a variable but built-in. To maintain compatibility, we should mirror the scope trees between the two, i.e. use support.function and support.constant. I don't see a need to add another variable name for the same thing.


More questions:

  1. What about generic variable declarations? Do it analogously to variable and assign entity.name.other?
  2. What about assignments? If a variable needs to be declared, e.g. in C, we can confidently say where a variable is introduced, but if declaration comes with assignment, does every assignment become a declaration? (imo no. Too much noise)
  3. variable.class => variable.type? Considering we use support.type for built-in classes.
  4. And similar to that, when does storage.type come into play?

@Thom1729
Collaborator

I prefer variable.function over variable.other.function, so we can reserve variable.other for stuff we don't know anything about.

I've been using the model of variable.<vocabulary>, where a “vocabulary” is a sort of lexical context. Plain variables belong to one vocabulary, and member names belong to another vocabulary that's not merely distinct but truly incommensurate. (True, in some languages, you can get a first-class environment whose members are lexical variables, but we have to draw the line somewhere.) In JavaScript, foo, bar(), and new baz() all refer to the same kind of name, and that is a fundamentally different kind of name than break foo or [].bar. The “other” scope component was meant to explicitly encompass this kind.

In this model, “other” isn't just a catch-all, but a category in its own right, so the “other” in variable.other.function is not vacuous. The name “other” probably does not communicate this very well; it's a compromise for compatibility. Maybe something more explicit would be better.

To find all lexical variables in a JS file, we could select variable.other. If we didn't use other consistently, then we'd have to enumerate the guesswork-based scopes like variable.function as well.

variable.member.function or variable.member.class seem odd. It means to highlight a function you need to match variable.function, variable.member.function.

sublimehq/sublime_text#2619 would solve this (and a host of other issues), though (alas) we have to plan around the features we have.

Everything that we would classify as support is also automatically a variable but built-in. To maintain compatibility, we should mirror the scope trees between the two, i.e. use support.function and support.constant. I don't see a need to add another variable name for the same thing.

The idea is that if a color scheme or tool selects (say) variable.member.function, then this should select all methods, including built-in methods. Otherwise, improving support scopes could be a breaking change.

What about generic variable declarations? Do it analogous to variable and assign entity.name.other?

I'm really starting to dislike other. entity.name.variable would make sense, but I don't think we want to do variable.variable.

What about assignments? If a variable needs to be declared, e.g. in C, we can confidently say where a variable is introduced, but if declaration comes with assignment, does every assignment become a declaration? (imo no. Too much noise)

Nah, explicit declarations only. In (say) Python, I'd rather say that there are no variable declarations than that every assignment is a declaration. (Function formal parameters, imports, and other constructs would still explicitly declare names.)

variable.class => variable.type? Considering we use support.type for built-in classes.

No strong opinion on class/type.

And similar to that, when does storage.type come into play?

Ugh. I don't like storage. I think that keywords should be keyword and references (even references to type names) should be variable. In some languages, types constitute a truly separate “vocabulary”; in others, like Python, they don't, but it shouldn't do any harm to pretend that they do. In a perfect world, I'd use variable.type for type names.

@FichteFoll
Collaborator

FichteFoll commented Mar 26, 2019

I've been using the model of variable.<vocabulary>, where a “vocabulary” is a sort of lexical context. Plain variables belong to one vocabulary, and member names belong to another vocabulary that's not merely distinct but truly incommensurate.

I can get behind this, but I also have a problem with specifically using the name variable.other for this purpose, although this is actually what the old TM docs suggest. I can't think of a better name for it either, however. Conceptually, it's just a "this isn't anything special syntax-wise".

I thought about reversing the relation, i.e. moving the suggested sub-scopes of variable.other to variable and the sub-scopes of variable to variable.other. That'll have its own variety of compatibility issues, though. The big advantage would be that we don't stash the most frequently used sub-scopes of variable into variable.other. Otoh we then get variable.other.member.function. 😕

We can use entity.name.variable for anything that is variable.other and not (!) covered by the likes of entity.name.function.
By the way, I'm actually starting to dislike entity.name.class now as opposed to entity.name.type.class, because a class is a type and the more generic concept, but w/e.

Otherwise, improving support scopes could be a breaking change.

Yeah, we can't really do that with your proposal but also don't need to. The support scope must come after variable.other, however. Coincidentally, this also allows support.function as a selector to work for both member and other variables.

In some languages, types constitute a truly separate “vocabulary”; in others, like Python, they don't, but it shouldn't do any harm to pretend that they do.

So, do we want to pretend they are different vocabulary when used in statically typed languages (or in Python type hints) or do we not?

In my opinion, tokens affecting storage type and kind in languages like C or even JavaScript are significant enough to warrant their own treatment. In statically typed languages, they are a different token entirely and should thus not be variable.other. variable.type would work, but then it becomes extremely similar to variable.other.type, which would be instantiation/usage of the class/type and something entirely different.

storage.modifier could only be moved to keyword.storage.modifier, at which point you might as well not do it at all and keep compatibility.

I'm undecided on whether support.type should only be applied to the usage or to both usage and static type.


Let's take a look at what this would imply. The following is a tree of "your suggestion" (I'll skip naming the concepts):

variable
  parameter
  label
  member
    type
    function
    constant
  other
    type
    function
    constant

This is "my suggestion":

meta
  member
variable
  parameter
  label
  member
  type
  function
  constant
  other

Downsides of your suggestion:

  • other and member subtrees are mirrored (and support, actually)
  • to target all function calls, you need variable.member.function, variable.other.function (and variable.function as the fallback, but that doesn't conflict) (and support.function because some syntaxes don't stack them, but that's irrelevant) (Approximate matching in selectors sublime_text#2619 would be lovely, indeed)
  • forwards-incompatible with variable.function, unless you use both for a period of time
  • variable.other will be used a lot (but so is entity.name)

Downsides of my suggestion:

  • the syntactically similar top-level identifiers (what you usually refer to as "variable") aren't grouped and instead mixed with variable.parameter and variable.member, for example
  • an additional meta scope

It fundamentally depends on how you weight these. I consider the first downside of my suggestion to be the most significant.
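The "how do I target all function calls" trade-off can be checked with a simplified matcher extended with `,` for alternatives. This is a sketch under assumed semantics (dot-prefix atoms, in-order descendant matching), not Sublime Text's real selector engine; the `.x` language suffix is a placeholder.

```python
from typing import List


def atom_matches(atom: str, scope: str) -> bool:
    return scope == atom or scope.startswith(atom + ".")


def matches(selector: str, stack: List[str]) -> bool:
    """Simplified matcher: ',' separates alternatives; within an alternative,
    space-separated atoms must match scopes in the stack in order."""
    for alternative in selector.split(","):
        atoms = alternative.split()
        i = 0
        for scope in stack:
            if i < len(atoms) and atom_matches(atoms[i], scope):
                i += 1
        if atoms and i == len(atoms):
            return True
    return False


# A plain call `func()` and a member call `foo.func()` under each proposed tree.
GROUPED = {  # variable.<vocabulary>.<kind>
    "plain": ["source.x", "variable.other.function.x"],
    "member": ["source.x", "variable.member.function.x"],
}
FLAT = {  # variable.<kind> plus a meta.member marker
    "plain": ["source.x", "variable.function.x"],
    "member": ["source.x", "meta.member.x", "variable.function.x"],
}


def covers(selector: str, tree: dict) -> bool:
    """True if the selector matches every function call in the tree."""
    return all(matches(selector, stack) for stack in tree.values())
```

With the flat tree, `variable.function` alone covers both calls; with the grouped tree, a color scheme needs `variable.other.function, variable.member.function`, which is exactly what approximate matching (sublime_text#2619) would simplify.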


Since your suggestion is more breaking than mine, I took a brief look at how breaking it would be.

The following is how the variable scope is used currently (as suggested by PackageDev, collected empirically):

variable
    language
    parameter
    function
    annotation
    other
        constant
        member
        readwrite

We didn't talk about variable.language, so let's assume that is unchanged. It's different across languages anyway, e.g. in C++ it's actually a keyword while in Python it's a normal variable. Bash uses it for $1, Makefile for $%, Go for _, and JavaScript for various tokens. Those usages are fine as-is.

variable.parameter we already discussed. Its usage in function definitions will change to entity.name.parameter while function calls remain the same.

I found variable.function in 20 syntaxes in the default packages. That's quite a significant portion. We may be able to change all of these in a single batch (can most likely just regex replace), but I wonder about color schemes and third-party syntaxes.

variable.annotation wasn't addressed, but due to lack of time, I'll skip that for now. (#737)

variable.other.constant already follows the suggested style. .member will be moved up a level and .readwrite becomes variable.other but excluding known submatches.

variable{.other,}.type isn't used anywhere.

Thus, I conclude that except for variable.function this is probably trivial to change.
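The batch change mentioned above (a regex replace across the default packages) could be sketched in Python as follows. The rename table is an assumption pieced together from this discussion: .readwrite folding into variable.other, .member moving up a level, and variable.function becoming variable.other.function. The variable.function.member → variable.member.function row is a hypothetical extrapolation. It only rewrites syntax text; color schemes and third-party syntaxes would still need their own migration.

```python
import re

# Hypothetical rename table; more specific names come first so they win.
RENAMES = [
    ("variable.function.member", "variable.member.function"),
    ("variable.other.member", "variable.member"),
    ("variable.other.readwrite", "variable.other"),
    ("variable.function", "variable.other.function"),
]


def _compile(old):
    # Match whole dotted names only: not preceded by a word char, dot or hyphen,
    # and not followed by a word char or hyphen (a following '.' is allowed,
    # so language suffixes like '.python' are preserved).
    return re.compile(r"(?<![\w.-])" + re.escape(old) + r"(?![\w-])")


COMPILED = [(_compile(old), new) for old, new in RENAMES]


def migrate(text: str) -> str:
    """Apply each rename once, in table order, to .sublime-syntax source text."""
    for pattern, new in COMPILED:
        text = pattern.sub(new, text)
    return text
```

Running it on a copy of the packages and reviewing the diff would be the safer workflow, since some syntaxes may use these names with different intent.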

Some statistics on `variable` usage to work with
$ rg -g \*.sublime-syntax -e 'variable\.[^\. ]+\.' -o --no-filename | sort | uniq -c
     14 variable.annotation.
      1 variable.attribute-name.
      5 variable.begin.
      5 variable.declaration.
      1 variable.documentroot.
      2 variable.element.
      5 variable.end.
      1 variable.entity.
      1 variable.finder.
     77 variable.function.
      4 variable.import.
      1 variable.itunes.
      4 variable.job.
      3 variable.label.
     92 variable.language.
      1 variable.loop.
      1 variable.mac-classic.
      1 variable.magic.
      1 variable.notation.
    184 variable.other.
      1 variable.package.
    209 variable.parameter.
      1 variable.standard-suite.
      2 variable.substitution.
      2 variable.substring.
      3 variable.type.
      1 variable.using.

$ rg -g \*.sublime-syntax -e 'variable\.[^\. ]+\.[^\. ]+\.' -o --no-filename | sort | uniq -c
      1 variable.annotation.cluster.
      2 variable.annotation.function.
      1 variable.annotation.package.
      1 variable.function.assumed-macro.
      1 variable.function.guard.
      9 variable.function.member.
      1 variable.function.reference.
      1 variable.function.tagged-template.
      1 variable.import.renamed-from.
      1 variable.import.renamed-to.
      1 variable.language.arguments.
      1 variable.language.array.
      1 variable.language.attribute.
      1 variable.language.automatic.
     19 variable.language.blank.
      1 variable.language.constructor.
      2 variable.language.deconstruction.
      3 variable.language.environment.
      1 variable.language.global.
      2 variable.language.import.
      1 variable.language.job.
      2 variable.language.omitted.
      1 variable.language.prototype.
      1 variable.language.qmark.
      1 variable.language.super.
      1 variable.language.target.
      2 variable.language.this.
      1 variable.language.tilde.
      6 variable.language.underscore.
      1 variable.language.wildcard.
      1 variable.other.alias.
     17 variable.other.backref-and-recursion.
      1 variable.other.back-reference.
      2 variable.other.base-class.
      1 variable.other.class.
     13 variable.other.constant.
      3 variable.other.dollar.
      3 variable.other.field.
      1 variable.other.function.
      2 variable.other.generic-type.
      5 variable.other.global.
      1 variable.other.group.
      2 variable.other.interpolated.
      1 variable.other.math.
     19 variable.other.member.
      2 variable.other.namespace.
      1 variable.other.parameter.
      1 variable.other.placeholder.
      2 variable.other.predefined.
      1 variable.other.property.
     49 variable.other.readwrite.
      4 variable.other.regexp.
      1 variable.other.section.
      1 variable.other.selector.
      1 variable.other.subpattern.
      2 variable.other.template.
      2 variable.other.valid.
      1 variable.parameter.ameter.
      1 variable.parameter.bracket.
      1 variable.parameter.class-inheritance.
     76 variable.parameter.function.
      3 variable.parameter.handler.
      1 variable.parameter.hyphenation.
      1 variable.parameter.input.
      3 variable.parameter.labeled.
      1 variable.parameter.loop.
      1 variable.parameter.multiline-width.
     34 variable.parameter.option.
      3 variable.parameter.optional.
      2 variable.parameter.output.
      1 variable.parameter.record.
      1 variable.parameter.regular-field.
      1 variable.parameter.special-field.
      2 variable.parameter.switch.
      1 variable.parameter.tuple.
      1 variable.parameter.type.
      1 variable.parameter.unit.
      2 variable.type.dollar.
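For anyone without ripgrep, a rough Python counterpart to the rg | sort | uniq -c pipelines above might look like this. It assumes the syntax files sit somewhere under `root`. Unlike rg, `Path.rglob` does not honor .gitignore or skip hidden files, and Python's `sorted` may order entries differently from a locale-aware `sort`, so numbers and ordering could differ slightly.

```python
import re
from collections import Counter
from pathlib import Path


def count_scope_prefixes(root: str, depth: int = 1) -> Counter:
    """Count `variable.<a>.` (depth=1) or `variable.<a>.<b>.` (depth=2)
    prefixes in every .sublime-syntax file below `root`."""
    pattern = re.compile(r"variable" + r"\.[^. ]+" * depth + r"\.")
    counts = Counter()
    for path in Path(root).rglob("*.sublime-syntax"):
        # rg matches line by line, so do the same to keep matches on one line.
        for line in path.read_text(encoding="utf-8").splitlines():
            counts.update(pattern.findall(line))
    return counts


def print_like_uniq_c(counts: Counter) -> None:
    """Print counts in roughly the same layout as `uniq -c`."""
    for prefix in sorted(counts):
        print(f"{counts[prefix]:7d} {prefix}")
```

As with the rg version, a scope such as variable.function.python only counts at depth 1, because the depth-2 pattern requires a trailing dot after the third component.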

It's getting pretty late over here and I should've been doing something entirely different, but this problem in particular takes a lot of consideration and I always end up working on multiple parts of my post at the same time or in short succession. Hopefully I didn't mix things up too much.

I would be very interested in other opinions. Besides us two, nobody has commented on the Big Picture Discussion so far.

@Thom1729
Collaborator

The support scope must come after variable.other, however.

Agreed.

So, do we want to pretend they are different vocabulary when used in statically typed languages (or in Python type hints) or do we not?

In my opinion, tokens affecting storage type and kind in languages like C or even JavaScript are significant enough to warrant their own treatment. In statically typed languages, they are a different token entirely and should thus not be variable.other. variable.type would work, but then it becomes extremely similar to variable.other.type, which would be instantiation/usage of the class/type and something entirely different.

I think that, where types are concerned, languages generally fit into three categories.

  1. Types do double-duty as de facto declaration keywords (e.g. C, Java, C#).
  2. Types stand alone, but belong to a separate vocabulary from first-class values (VB, Scala).
  3. Types are (usually) just first-class values (JavaScript, Python).

In C, a storage scope that includes both int and function makes sense, because those are both keywords signalling that a new name is being declared. In most languages, this is not the case; in Python, def and int are very different kinds of things. I think that there is general consensus on this point, so I will not belabor it.

For languages in category (2), variable.other is clearly wrong. For languages in category (3), some sort of variable scope is clearly right. variable.type seems like a reasonable compromise, but (as you point out) it does nearly conflict with variable.<whatever>.type.

An alternate approach would be to use storage.something, but to add variable.other in category-(3) languages. So in Scala:

val x: Int
//     ^^^ storage.type.primitive

And in Python:

x: int
#  ^^^ variable.other storage.type

v: sublime.View
#  ^^^^^^^ variable.other
#          ^^^^ variable.member storage.type

Let's take a look at what this would imply.

Generally agreed as to what the tradeoffs are.

In my mind, the biggest advantage to my suggestion is unifying variable scopes in a consistent fashion, and the biggest disadvantage is the change from variable.function to variable.other.function.

I hadn't considered using a meta scope. It almost seems like a hack to get around the lack of more powerful selectors, but we work with the system we have. Because these scopes should never cover more than one token, it should be safe to select meta.member variable or variable - meta.member. It looks generally fine to me, with the following thoughts:

  • variable.other seems vacuous; why not omit it?
  • variable.member seems redundant.
  • variable.label and .parameter don't quite fit with .type, .function, or .constant, but it's not too bad.
  • This would seem to go well with using storage for type names. In e.g. Python, those names could also be variable.
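The two selectors mentioned above, meta.member variable and variable - meta.member, can be sketched with a simplified matcher that also understands ` - ` exclusion. This is an assumed model for illustration only (no scoring, exclusion recognized only with surrounding spaces). The last stack shows why the meta scope must stay on a single token: if it spanned a whole call such as foo.bar(x), the argument x would be selected as a member too.

```python
from typing import List


def atom_matches(atom: str, scope: str) -> bool:
    return scope == atom or scope.startswith(atom + ".")


def descendant_match(atoms: List[str], stack: List[str]) -> bool:
    """Each atom must match a scope in the stack, in order, gaps allowed."""
    i = 0
    for scope in stack:
        if i < len(atoms) and atom_matches(atoms[i], scope):
            i += 1
    return bool(atoms) and i == len(atoms)


def matches(selector: str, stack: List[str]) -> bool:
    """Simplified 'A - B - C': A must match and none of B, C may match."""
    parts = [part.split() for part in selector.split(" - ")]
    include, excludes = parts[0], parts[1:]
    return descendant_match(include, stack) and not any(
        descendant_match(ex, stack) for ex in excludes
    )


PLAIN_CALL = ["source.x", "variable.function.x"]                  # func()
MEMBER_CALL = ["source.x", "meta.member.x", "variable.function.x"]  # foo.func()
# Hypothetical leak: meta.member wrongly covering the argument x in foo.bar(x).
LEAKED_ARG = ["source.x", "meta.member.x", "variable.other.x"]
```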

Also, I'm starting to become skeptical of variable.type. If we keep using storage, what are the use cases? That is, what is a syntactic context in which we would say that a variable is used in a type-like manner?

  • Inheritance, which already uses entity.other.inherited-class.
  • Type declarations, which would use storage.
  • new expressions. Is this really useful?

@FichteFoll
Collaborator

FichteFoll commented Mar 27, 2019

I think that, where types are concerned, languages generally fit into three categories.

First and third look good. I don't have experience with VB or Scala, so I can't really comment on your assessment regarding the second category. My hunch would be to only use storage.type.
I also like the alternative suggestion. The question arises whether to scope built-in types on top of that (as you'd do storage as a meta scope here, more or less), so with the example of Python:

x: int
#  ^^^ variable.other storage.type support.type
# (specifically not `variable.other.type` due to the naming being unconventional)

The good part is that you can highlight built-in types used in a declaration or function annotation easily with storage support.type. The bad part is that with lazy selectors built-in types look the same as when they are used for instantiation, but that's still the same we have currently. Another bad part is that variable.other support.type scores higher until sublimehq/sublime_text#2152 is addressed.

Note that we decided to use keyword.declaration for keywords like def or class.

I hadn't considered using a meta scope. It almost seems like a hack to get around the lack of more powerful selectors, but we work with the system we have.

Yes, this is a hack because we'd mask the same scope with a meta to differentiate between the two syntactically different usages but with a simpler selector. It still requires specifically excluding the meta scope when you want to target the non-member variant, so it's not an improvement over variable - variable.member but over variable - variable.member - variable.parameter - variable.label.

In my suggestion, variable.member was the analogue to variable.other for cases where we "don't know" because I suspected meta.member variable to score higher than variable.other.function as a member function. I just verified, using sublime.score_selector and to my confusion, that it doesn't.

Either way, the more I think about it, the more I like your grouping suggestion, although I already preferred it yesterday. Get approximate matching into core and I'm entirely sold: just match variable.*.function and all cases are covered.

Also, I'm starting to become skeptical of variable.type

In your or my suggestion? Assuming yours. Type (3) languages, or Python in particular, wouldn't use variable.type because they consider types to be first-class citizens, making them variable.other.type.

  • entity.other.inherited-class should be removed. meta.inheritance variable.type looks like the only candidate here.
  • We probably need to stack them.
  • While semantically similar to variable.other.type, I suppose we still need to be consistent with our syntactic classification if we agree on that being the deciding factor for the second variable level.

A notable concern with all this talk is that we might be overwhelming color scheme authors, although we don't exactly use rocket science here. Most useful selectors don't exceed two stacked scopes while maintaining lexical accuracy for more complex selectors or tools to work with. Maybe a compilation of standard scope coverage or common colorization efforts using the proposed schema would be useful.
An even bigger concern is making it harder for syntax authors to choose the correct scope names than it already is. This might be a good candidate as the first SNP for #1440.

Any other unanswered questions so far?

@Thom1729
Collaborator

The question arises whether to scope built-in types on top of that (as you'd do storage as a meta scope here, more or less), so with the example of Python:

I concur with the example.

Another bad part is that variable.other support.type scores higher until sublimehq/sublime_text#2152 is addressed.

Given the scope variable.other storage.type support.type, the selector variable.other support.type should score higher than storage.type regardless of sublimehq/sublime_text#2152. However, storage.type support.type should score higher yet.

Type (3) languages, or Python in particular, wouldn't use variable.type because they consider types to be first-class citizens, making them variable.other.type.

I think I wrote confusingly.

I'm skeptical of the scope that in my suggestion would be variable.*.type and in your suggestion would be variable.type. I think that we could specify that scope, but I'm not convinced that it would be useful. The alternative would be under my suggestion variable.* and under your suggestion variable.other or variable (if variable.other is vacuous). I'll phrase the below in terms of variable.*.type for clarity, but it should translate from my suggestion to yours.

variable.*.type would refer to a variable.* that is referenced in a type-like manner. I think that this would almost always be a redundant scope.

In a category (3) language like Python, a type name in an annotation might be variable.*.type. But it would already be scoped storage.type, so the .type subscope would add no new information.

If we remove entity.other.inherited-class, then meta.inheritance variable.type could be used to match the name of the inherited type (if applicable), but this gets tricky because in many languages meta.inheritance might contain an arbitrarily complex expression, and that selector could hit irrelevant variables inside that expression. I think that entity.other.inherited-class is kind of a mess, but it is by its nature laser-focused to only mean the name of the inherited type. I don't have a perfect answer for this.

For comparison, variable.*.function would generally be applied when a lookahead sees function arguments. This is conceptually simple, broadly applicable, and clearly useful, and it provides information that is not otherwise available. (A syntax that uses meta.function-call might be able to figure it out, but that scope is a mess with all the problems of trying to scope paths and also more, unique problems.) The implementation would usually be a simple rule matching {{identifier}}{{args_lookahead}}: two or three extra lines in a couple of places.
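A rough Python emulation of such an identifier-plus-lookahead rule is sketched below; in a real syntax it would be a single match with a lookahead. The IDENTIFIER and ARGS_LOOKAHEAD patterns are assumptions loosely modeled on JavaScript, not copied from the shipped JavaScript syntax, the scope names follow the variable.<vocabulary>.<kind> tree proposed in this thread, and keywords such as new are not handled.

```python
import re
from typing import List, Tuple

# Assumed, simplified patterns; not taken from any shipped syntax definition.
IDENTIFIER = r"[A-Za-z_$][A-Za-z0-9_$]*"
ARGS_LOOKAHEAD = re.compile(r"\s*\(")
TOKEN = re.compile(r"(?P<dot>\.\s*)?(?P<name>" + IDENTIFIER + r")")


def scope_identifiers(line: str) -> List[Tuple[str, str]]:
    """Return (identifier, scope) pairs using the variable.<vocabulary>.<kind> model."""
    result = []
    for m in TOKEN.finditer(line):
        # A leading '.' puts the name in the member vocabulary.
        vocabulary = "member" if m.group("dot") else "other"
        # The lookahead: an identifier followed by '(' is treated as a call.
        is_call = ARGS_LOOKAHEAD.match(line, m.end()) is not None
        kind = ".function" if is_call else ""
        result.append((m.group("name"), f"variable.{vocabulary}{kind}"))
    return result
```

For example, `scope_identifiers("foo.bar(1) + baz()")` yields foo as variable.other, bar as variable.member.function and baz as variable.other.function.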

By contrast, variable.*.type would be used in several syntactically distinct ways, most of which are quite narrow, most of which would be redundant with other scope information. In addition, implementation would be relatively complicated. In the JavaScript syntax, it takes about twenty extra lines of code to implement entity.other.inherited-class. It takes about thirty extra for new Foo() to use variable.type, and I've been hoping to rip that one out someday.

I'm not set against variable.*.type, and it would fit reasonably well in either suggested hierarchy. I'm just not sure that it hits a good balance of utility to complexity. Of course, standardizing it doesn't mean that syntax authors must implement it; it's helpful just to say that if an author implements something like that, then variable.*.type is the correct scope.

Any other unanswered questions so far?

Not that I can think of.

@FichteFoll
Collaborator

FichteFoll commented Mar 27, 2019

However, storage.type support.type should score higher yet.

Tbh, I don't remember what I was getting at with that. That is what I meant.

In a category (3) language like Python, a type name in an annotation might be variable.*.type. But it would already be scoped storage.type, so the .type subscope would add no new information.

Only in situations where they are used in a declaration, i.e. variable type hints and function annotations. Here's what I had in mind:

x: typing.Optional[Abc] = Abc(2.2)
# <- entity.name.variable?
#^^^^^^^^^^^^^^^^^^^^^^ meta.* (probably)
#  ^^^^^^ variable.*
#         ^^^^^^^^ variable.member.type storage.type support.type
#                  ^^^ variable.*.type storage.type
#                         ^^^ variable.*.type - storage

This translates to C++ just fine:

Type *a = new Type();
// <- storage.type
//    ^ entity.name.variable?
//            ^^^^ variable.type - storage

Here's another example where variable.*.type not implying storage.type would be useful:

isinstance(x, MyClass)
#             ^^^^^^^ variable.other.type

Tl;dr: use variable.type in C(++), use variable.*.type in Python.

While that seems redundant, since I doubt you'll colorize one of these differently from the other, it stays true to the grouping in variable.*, and if we don't do that, we might as well go back to variable.function, too, because there'd be no benefit from grouping if we don't do it consistently.

For inheritance in Python, a simple look-ahead to check for a simple type to be scoped as variable.other.type (or maybe a construct with typing metas) would be enough as a best effort imo. In fact, by what I suggested above, this is already accomplished by just including the expression context, since a standard usage of a variable following naming conventions will already be classified as a type. storage shouldn't be used in inheritance because this is referencing a type to create another type based on that, not declaring the storage type of a variable/identifier.

Edit: Actually, I just noticed a problem. What if the type in C++ is defined as a member, e.g. std::string?
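The Python best effort described above relies on naming conventions. Assuming code follows PEP 8 (CapWords for classes, UPPER_CASE for constants), the classification could be sketched roughly like this. The regexes are assumptions, not taken from the Python syntax, and names like IO or a lone T are inherently ambiguous.

```python
import re

# PEP 8 naming conventions used as a heuristic (assumption: code follows them).
CONSTANT = re.compile(r"_*[A-Z][A-Z0-9_]+\Z")  # MAX_SIZE, but not a lone "T"
TYPE = re.compile(r"_*[A-Z]\w*\Z")  # MyClass, Abc, T


def classify(name: str, is_call: bool = False, is_member: bool = False) -> str:
    """Guess a scope for a Python identifier reference from its spelling.

    A CapWords name is treated as a type even when called, so `Abc(2.2)`
    becomes variable.other.type rather than variable.other.function.
    """
    vocabulary = "member" if is_member else "other"
    if CONSTANT.match(name):
        kind = ".constant"
    elif TYPE.match(name):
        kind = ".type"
    elif is_call:
        kind = ".function"
    else:
        kind = ""
    return f"variable.{vocabulary}{kind}"
```

The same check would also cover `isinstance(x, MyClass)` and simple inheritance lists, since MyClass is classified as a type regardless of context.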

@Thom1729
Collaborator

I think I'm convinced. For one thing, if we didn't scope new-like constructs using variable.*.type, then in Python we'd scope Abc(2.3) as variable.other.function, which seems wrong in a very preventable fashion. (In Python, particularly, we would rely greatly on convention, but them's the breaks.) isinstance is a case I hadn't considered; it also brought to mind type casts in Flow. Special-casing the isinstance function does feel a bit weird, but it's no weirder than special-casing (say) require() in JavaScript.

Tl;dr: use variable.type in C(++)

Wouldn't we use storage.type instead of variable.type? Or would we stack them? Would storage.type ever be "naked"?

storage shouldn't be used in inheritance because this is referencing a type to create another type based on that, not declaring the storage type of a variable/identifier.

I think I'm a little confused as to the meaning of storage.type. It seems to be in an awkward in-between place: it doesn't merely mean that the token represents a type, but it also doesn't cover the entire value of a storage type declaration (e.g. List[str] or foo.Bar). Those would seem to me to be the right levels of abstraction: variable.type to indicate that a token is syntactically a type name (or variable.other.type as appropriate) and something like meta.type to cover a "type expression" in a declaration.

Edit: Actually, I just noticed a problem. What if the type in C++ is defined as a member, e.g. std::string?

std is variable.namespace and string is storage.type.

@FichteFoll
Collaborator

FichteFoll commented Mar 28, 2019

Wouldn't we use storage.type instead of variable.type?

Not always, as with my suggestion we wouldn't be using storage.type in new, inheritance or casts, even.

It seems to be in an awkward in-between place: it doesn't merely mean that the token represents a type, but it also doesn't cover the entire value of a storage type declaration.

Yes, this assessment is correct. It's in a weird compatibility limbo with being used extensively historically for int or float in declarations, but also for casts because the old standard didn't account for variable.type. Ideally we'd do the same as with annotations and class inheritance and wrap the entire thing in a meta scope (meta.type, meta.type.cast), but then literally everyone would lose their storage colorization and that's a huge change. Deprecating entity.other.inherited-class is manageable, but for a top-level scope like storage not so much.

A potentially less awkward solution would be to always use storage.type instead of variable.type, but only in the C-family languages. In Python we'd still stack variable.*.type storage.type.

@Thom1729
Collaborator

I think I get it now: storage.type would almost always be stacked with a variable scope.

It may not be completely redundant though. In C#, dynamic should probably be storage.type but not variable.type. The same may apply to ignore in Python and to some other special cases.

In fact, many C# "type names" are actually keywords: int is a keyword that means System.Int32, and so on. In these cases, should we go with variable.type support.type storage.type or with keyword.other.type storage.type?

@FichteFoll
Collaborator

Interesting case. Java also has primitive types like int that don't behave like object types and are in fact keywords.

I initially wanted to say that if they behave exactly like a type but are in fact an alias, that still qualifies as an identifier being used as a type (and not a keyword being replaced with a type). But they aren't in user space, because they are reserved keywords and may never be used as the name for a custom type. I suppose in that situation keyword.other.type storage.type (plus support.type?) makes sense.

Do you have an opinion on storage.type support.type order? I'm actually undecided and believe we should decide this on what color a user would rather want built-in types in declarations to look like, i.e. whether they should look different from other types. Both can easily be overridden with a respective selector, but the default is important.
I guess support last might also be easier to lay out in a syntax definition, but not by a significant margin.

@Remillard

Remillard commented Mar 6, 2020

This is late to the game, but as a user of scoping definitions, any time I run into other I can't help but think it means "Something that belongs here, but cannot be definitively chucked into a known bucket". As a result, I find the notion of other having submembers (e.g. variable.other.constant or the like) to be a little strange.

My own corner of the world is VHDL, a strongly and statically typed language. There are no types in the LRM reserved words, however the standard library (which might as well be considered part of the language as it doesn't even really need to be declared) defines boolean, integer, bit, character, real, and so forth. As a result, I end up scoping these as support.type.std.vhdl using support.type to denote that it's a known type from a supporting library, and std to indicate it is from the standard library.

Due to the concurrency of hardware, the language also has multiple constructs similar to "variables", the difference being when a value assigned to them takes effect (immediately in lexical order, or driven at a later resolution point). I find great value in using storage.type to differentiate these when I can. Broadly speaking, VHDL has signals, variables, and constants, so storage.type.signal and storage.type.variable carry lexical significance. You can also define your own types, and I think the best way to scope such a declaration is:

type MY_STATES is (IDLE, STAGE1, STAGE2, FINISH);
^------ storage.type.type
     ^------ entity.name.type
               ^------ keyword.other.block.is

And so forth. Here, though, that "other" feels kind of strange: it suggests we don't know what the token is, yet we do know what it is. And "is" is a strange token in its own right: it does double duty as a separator and as a block-start marker, and it is optional in some structures but mandatory in others.

Later on I might declare a signal with that type:

signal count : integer;
^------ storage.type.signal
       ^------ entity.name.signal
             ^------ punctuation.separator
                ^------ support.type.std
signal current_state : MY_STATES;
^------ storage.type.signal
       ^------ entity.name.signal
                     ^------ punctuation.separator
                       ^------ variable.??other??.type

Again, that other is a bit strange. The token in this field must be a type, and if it's a known type, I can try to categorize it as a support.type and if not it's a variable.??.type

Anyhow, I'm not sure how this factors into the discussion, other than to throw in one of the stranger tributaries of language scoping; maybe it'll aid the definition.

(I also wish that function in a scope name were replaced by subprogram, because I find myself wanting to distinguish between functions and procedures, but I suspect that ship has sailed as far as color schemes and such are concerned. I now just classify both as function so they show up similarly, e.g. entity.name.function for both functions and procedures.)

@ismell

ismell commented Mar 6, 2020

@Remillard Check out #1861; it might help.
As for variable.??.type. In #1831 I use support.type.c for unknown types. I have no way of knowing if something is a 'library type' or a 'user type', so I just classify them all as library types.

@deathaxe
Collaborator Author

deathaxe commented Mar 6, 2020

I find the notion of other having submembers (e.g. variable.other.constant or the like) to be a little strange.

The variable.other scope denotes ordinary user-defined variables. Variables defined by the language are scoped as variable.language, and function identifiers use variable.function.

I personally find variable.other.constant an oxymoron and would replace it with constant.other. I guess the original intention was to distinguish read-only variables (see final int var in Java) from read-write ones. The contextless parser of ST won't be able to distinguish them in most languages, so I find it a bit useless. That said, why is it odd to add a subscope to variable.other?

There are no types in the LRM reserved words, however the standard library (which might as well be considered part of the language as it doesn't even really need to be declared) defines boolean, integer, bit, character, real, and so forth.

Basically, support.type is the best choice here, although it may also be useful to scope things like int, bool, ... as storage.type.primitive, since storage is the primary scope for data types.

IIRC, I did so for several data types and functions in Perl: I scoped them as storage even though Perl calls them functions, just for a more consistent result.

Using storage.type.signal, and storage.type.variable has significance lexically.

I wouldn't recommend storage.type.variable, as variable is one of the primary scope names, which may easily lead to misunderstandings, especially as there are discussions about scoping data types as variable.type.

type MY_STATES is (IDLE, STAGE1, STAGE2, FINISH);

This is basically the situation we face with classes and structures in C/C++/Java/....

Even though a class or struct is primarily a kind of data type, these keywords are mainly used to declare/define complex data types. Thus I tend to think they should be scoped keyword.declaration.

type MY_STATES is (IDLE, STAGE1, STAGE2, FINISH);
^------ keyword.declaration.type
     ^------ entity.name.type
               ^------ keyword.declaration.is

Same here:

signal count : integer;
^------ keyword.declaration.signal
       ^------ entity.name.signal
             ^------ punctuation.separator
                ^------ storage.type.primitive
signal current_state : MY_STATES;
^------ keyword.declaration.signal
       ^------ entity.name.signal
                     ^------ punctuation.separator
                       ^------ storage.type.type

It is currently not clear what the best choice is for scoping user-defined complex data types like your MY_STATES. While entity.name.type <-> variable.type could be applied from the Goto Definition point of view, it feels strange to scope a data type as variable. The original TextMate scoping guidelines define storage as the primary keyword for such things.

I'd prefer the solution suggested in the two examples above keyword.declaration.type -> entity.name.type -> storage.type.type for all such cases in all languages.
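The proposal above can be illustrated on a single-line VHDL signal declaration. What follows is a minimal sketch in Python, a regex-based toy rather than a sublime-syntax definition. Several pieces are illustrative assumptions: the STD_TYPES subset (taken from the standard-library types listed earlier in the thread), the name scope_signal_decl, and the .vhdl suffixes.

```python
import re

# Hypothetical subset of VHDL standard-library types (from the list earlier in this thread).
STD_TYPES = {"boolean", "integer", "bit", "character", "real"}

# Toy pattern for a single-line "signal <name> : <type>;" declaration.
# VHDL is case-insensitive, hence re.IGNORECASE.
SIGNAL_DECL = re.compile(r"^\s*(signal)\s+(\w+)\s*(:)\s*(\w+)\s*;", re.IGNORECASE)


def scope_signal_decl(line: str) -> list[tuple[str, str]]:
    """Return (token, scope) pairs following the proposed
    keyword.declaration -> entity.name -> storage.type scheme."""
    m = SIGNAL_DECL.match(line)
    if m is None:
        return []
    keyword, name, colon, type_name = m.groups()
    # Known standard types vs. user-defined types such as MY_STATES.
    if type_name.lower() in STD_TYPES:
        type_scope = "storage.type.primitive.vhdl"
    else:
        type_scope = "storage.type.type.vhdl"
    return [
        (keyword, "keyword.declaration.signal.vhdl"),
        (name, "entity.name.signal.vhdl"),
        (colon, "punctuation.separator.vhdl"),
        (type_name, type_scope),
    ]


for decl in ("signal count : integer;", "signal current_state : MY_STATES;"):
    for token, scope in scope_signal_decl(decl):
        print(f"{token:<15} {scope}")
```

Under this scheme, the only thing that changes between the two declarations is the scope of the type token.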

@Remillard

Well, I think a selector consisting of just "variable" would only catch scopes whose name starts with variable (e.g. variable.....) and not those where variable appears further down the name (e.g. a.b.variable.....), so I think I'm probably safe from random selector shenanigans. I know there's potential for confusion, but I do feel that for this language I need to attempt to distinguish between them. Of course, not that anyone is writing a color scheme for VHDL specifically, but for Goto functionality and other things, it might be useful to key off of that quality.
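Here is a rough Python model of the prefix rule described above. It is a simplification written for illustration: it handles single atoms only, with no descendant or exclusion operators, and it is not Sublime Text's actual matcher. The name atom_matches is hypothetical.

```python
def atom_matches(atom: str, scope: str) -> bool:
    """Simplified model of how one selector atom matches one scope name:
    the atom must equal the leading dot-separated segments of the name."""
    atom_parts = atom.split(".")
    scope_parts = scope.split(".")
    return scope_parts[: len(atom_parts)] == atom_parts


for scope in (
    "variable.other.constant.vhdl",  # starts with "variable"      -> True
    "storage.type.variable.vhdl",    # "variable" only further down -> False
    "variables.custom.vhdl",         # partial segment, no match    -> False
):
    print(scope, atom_matches("variable", scope))
```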

As for storage.type vs keyword.declaration, I suppose I have no strong feelings, except that I get confused about where storage.type should be applied. My understanding is that it denotes a major or minor object classification relating to the way objects are used, which seems to fit what I'm doing. storage.type defines the classification of the object, entity.name defines the name when declared, and then variable.* denotes the object when in use. There's no way for me to really classify in-use objects without contextual information about them, of course, as you note.

It feels like this is an attempt to define a generic meta-language for describing languages: what concepts they embody and how they are used! Very tricky to cover all sorts of languages with equal notions.

wbond pushed a commit that referenced this issue Jul 21, 2020
* [Haskell] Rewrite operator matching

- Use variables
- Highlight '*' (and combinations) as operator
- Add punctuation scope to infix notation
- Scope non-infix notation as `keyword.operator`

* [Haskell] Update keyword matching

- Add proper scopes to control keywords
- Add proper scopes to declarations
- Use proper scope names for entities in declarations

* [Haskell] Restructure with contexts

* [Haskell] Remove usages of double quoted scalars

* [Haskell] Simplify string matches

Also highlight superfluous characters.

* [Haskell] Reduce max line length

* [Haskell] Match groups

* [Haskell] Adjust scopes for imports

Not final due to #1842 being
unresolved, but still an improvement.

* [Haskell] Match lists

* [Haskell] Correctly match idents with trailing '

* [Haskell] More gracious infix operator matching

* [Haskell] Adjust keyword scopes to recent standards

* [Haskell] match OPTIONS_HADDOCK

Same as https://github.com/sublimehq/Packages/pull/2270/files

* [Haskell] match deriving (..) via (..)

Same as https://github.com/sublimehq/Packages/pull/2271/files

* [Haskell] match @ and # in keyword.operator.haskell

Same as
- https://github.com/sublimehq/Packages/pull/2272/files
- https://github.com/sublimehq/Packages/pull/2273/files


Co-Authored-By: Nikos Baxevanis <nikos.baxevanis@gmail.com>

* [Haskell] match deriving instance (..)

* [Haskell] Match functions from the prelude

Based on https://github.com/atom-haskell/language-haskell/blob/e036e449909816e616b880157e2703e70fc9b5df/grammars/haskell.cson#L1306-L1307

Co-Authored-By: Nikos Baxevanis <nikos.baxevanis@gmail.com>

* [Haskell] Add tests for `via` derives

* [Haskell] match deriving instance (..) without breaking data deriving

This fixes a bug introduced via 0d36dd1

Co-authored-by: Nikos Baxevanis <nikos.baxevanis@gmail.com>
@mitranim
Contributor

mitranim commented Nov 3, 2020

I'd like to step away from specific scopes and classify the problem. Apologies for the infodump.


From a compiler's perspective, identifiers can be divided into:

  • Closed-set identifiers: keywords with special syntax for each.

  • Open-set identifiers: declared types, procedures, constants, variables, imports. May be predeclared by the language.

Types belong to the open set, because most languages let you define them.

Built-in types, constants, and functions belong to the open set of identifiers. In some languages, like Go, they're merely predeclared, not reserved, and can be redefined. Scoping built-ins as built-in is optional.
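Python behaves like Go in this respect. Keywords are reserved, while built-ins such as int are merely predeclared names in the builtins module. The small sketch below uses the standard keyword and builtins modules. The names classify, shadow_builtin, and rebind_keyword_fails are illustrative.

```python
import builtins
import keyword


def classify(name: str) -> str:
    """Rough closed-set / open-set split for Python identifiers."""
    if keyword.iskeyword(name):
        return "closed set: reserved keyword"
    if hasattr(builtins, name):
        return "open set: predeclared built-in"
    return "open set: user-defined or unknown"


def shadow_builtin() -> str:
    int = str  # legal: 'int' is only predeclared, so it can be rebound
    return int(42)  # now calls str(42)


def rebind_keyword_fails() -> bool:
    try:
        compile("if = str", "<demo>", "exec")
    except SyntaxError:
        return True  # keywords cannot be rebound
    return False


print(classify("if"), classify("int"), classify("my_var"), sep="\n")
print(shadow_builtin(), rebind_keyword_fails())
```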

It's worth extending the definition of closed-set identifiers to special symbols like =. In Haskell, this is used for function definitions.

Closed-set keywords usually have special syntax. Open-set identifiers usually don't, with the exception of custom operators and macros. See below.

Another, orthogonal, classification:

  • Identifiers with special syntax: keywords, prefix operators, infix operators.

  • Identifiers with no special syntax.

In languages with custom operators, such as Haskell, user-defined operators like + belong to the "open set", but usually have special syntax (always binary infix in Haskell). In Lisp dialects, + is a regular identifier and merely a function, with no special syntax.

Closed-set sub-classification:

  • Declaration keyword for adding an open-set identifier to the scope: import X, type X, class X, function X, const X, var X.
  • Flow control keyword: return, continue, try, if, for.
  • Operator keyword: and, typeof.
  • Modifier keyword: public, private, volatile, final, const, mut.
  • Pseudo-function keyword: sizeof(X). Since they don't use special syntax, they can be chucked into the open set.
  • Probably more.

Open-set sub-classification:

  • Name being declared: X in import X, type X, class X, function X, const X, var X, X = ....
  • Name being used.

Open-set "name being used" sub-classification:

  • Name as value: one.
  • Name as part of path: one.two. ....
  • Name called: one().
  • Name called in path: three in one.two.three().
  • Type as storage modifier: some_value: SomeType.
  • Type as value: isinstance(some_value, SomeClass).
  • Type in path: SomeType.typeid.
  • Type called with types: Option in Option<String> or Option a.
  • Type as parameter to type: String in Option<String>, a in Option a.
  • Type as contract: Ord in <A: Ord> or Ord a => ....
  • Probably more.
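Python's standard ast module happens to expose several of these roles directly (calls, attribute paths, annotations), so a rough classifier fits in a few lines. This is an illustration under simplifying assumptions. classify_uses and its role labels are hypothetical, and a sublime-syntax definition has to make the same decisions from regexes and lookahead, without a full parse tree. Note that SomeClass in the isinstance call comes out as a plain value: even a real parser cannot tell from syntax alone that it names a type.

```python
import ast


def classify_uses(source: str) -> dict[str, str]:
    """Assign a usage role to each identifier in a Python snippet.

    Simplified: only a few node types are handled explicitly, and if a
    name occurs more than once, the last role seen wins."""
    roles: dict[str, str] = {}

    def visit(node: ast.AST, role: str) -> None:
        if isinstance(node, ast.Name):
            roles[node.id] = role
        elif isinstance(node, ast.Attribute):
            # The last member of a called path is "called in path";
            # every other member of a path is "part of path".
            roles[node.attr] = "called in path" if role == "called" else "part of path"
            visit(node.value, "part of path")
        elif isinstance(node, ast.Call):
            visit(node.func, "called")
            for arg in node.args:
                visit(arg, "value")
            for kw in node.keywords:
                visit(kw.value, "value")
        elif isinstance(node, ast.AnnAssign):
            visit(node.target, "declared")
            visit(node.annotation, "type as storage modifier")
            if node.value is not None:
                visit(node.value, "value")
        else:
            for child in ast.iter_child_nodes(node):
                visit(child, "value")

    visit(ast.parse(source), "value")
    return roles


snippet = """
value: SomeType = make()
isinstance(obj, SomeClass)
one.two.three()
"""
for name, role in classify_uses(snippet).items():
    print(f"{name:<10} {role}")
```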

Sidenote. As far as I can tell, in C, C++, and other languages where functions are defined with some_type func_name(), the type does not act as a keyword that starts a function declaration. It's the syntax <any_type> <any_ident>() with parens that turns this into a function declaration. Substituting parens for = turns this into a variable declaration. Which of them is allowed in root scope or local scope has no bearing on this.
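The sidenote can be made concrete: a syntax definition can only decide between a function declaration and a variable declaration once it sees the token after the name. Below is a toy Python regex sketch of that lookahead for single-line C declarations. The pattern, classify_decl, and the partial keyword list are illustrative assumptions, far from a real C grammar.

```python
import re
from typing import Optional, Tuple

# Toy pattern: "<type> <name>" followed by "(", "=" or ";".
# Ignores pointers, qualifiers, arrays, and multi-word types such as "unsigned int".
DECL = re.compile(r"^\s*([A-Za-z_]\w*)\s+([A-Za-z_]\w*)\s*([(=;])")

# Partial list of C keywords that can precede an identifier without declaring it.
NOT_TYPES = {"return", "goto", "else"}


def classify_decl(line: str) -> Optional[Tuple[str, str, str]]:
    """Return (type, name, kind), or None if the line does not look like a declaration.

    The kind is decided only by the token that follows the name."""
    m = DECL.match(line)
    if m is None or m.group(1) in NOT_TYPES:
        return None
    type_name, name, follower = m.groups()
    kind = "function" if follower == "(" else "variable"
    return type_name, name, kind


for line in ("int main(void) {", "int count = 0;", "int count;"):
    print(line, "->", classify_decl(line))
```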


Whew! I hope this makes sense. The above was objective. Now for my subjective conclusions.

For me personally, the most important information is the role of the identifier in the current context.

Important role 1: whether it controls syntax. Special keywords, operators, and punctuation define a syntax structure with "holes" where you can plug the non-special words from the "open set". For this reason, scoping these two roles differently is most important.

The simplest approach is keyword for anything that involves special syntax, and no special scope for others. Optionally, give others a generic scope, such as word.

As noted earlier, some languages have custom operators which belong to the "open set" of identifiers, yet involve special syntax such as prefix or infix, distinct from normal function calls. I believe these should be treated as keywords, since syntactic structure is more important than whether something is "well known".

Important role 2: declaration or merely usage. Declarations are used for symbol navigation. From my perspective, declarations of root-level functions, types, variables, and constants are all equally important, and symbol search for all of them is useful in practice. For this reason, there should be one scope for declared names that should be indexed (currently entity), which should be used for all root-level declarations including global variables. However, block-scoped declarations should not be added to the symbol index, so there should be a standard way to exclude them; subtracting meta.block in the selector (- meta.block) might be enough.
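Here is a simplified Python model of the exclusion idea. A symbol indexer could use a selector such as entity.name - meta.block, so that names declared inside blocks are skipped. The matcher below handles only a single atom on each side of the minus sign. The selector, the scope stacks, and the .example suffix are hypothetical, and Sublime Text's real selector engine supports much more.

```python
def atom_matches(atom: str, scope: str) -> bool:
    """An atom matches a scope name if it equals its leading dot-separated segments."""
    parts = atom.split(".")
    return scope.split(".")[: len(parts)] == parts


def selector_matches(selector: str, stack: list[str]) -> bool:
    """Simplified 'A - B' selector: A must match some scope in the stack, B none."""
    include, _, exclude = selector.partition(" - ")
    if not any(atom_matches(include.strip(), s) for s in stack):
        return False
    return not (exclude and any(atom_matches(exclude.strip(), s) for s in stack))


# Hypothetical indexing selector and scope stacks (innermost scope last).
INDEX_SELECTOR = "entity.name - meta.block"
top_level_function = ["source.example", "meta.function.example", "entity.name.function.example"]
local_variable = ["source.example", "meta.function.example", "meta.block.example",
                  "entity.name.variable.example"]

print(selector_matches(INDEX_SELECTOR, top_level_function))  # True
print(selector_matches(INDEX_SELECTOR, local_variable))      # False
```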

Important role 2.1: declaration keyword or regular keyword. Traditionally, declaration keywords have been scoped as storage.type.function, storage.type.class, and so on. Somehow this spills over and applies to anonymous functions: func(){}. Personally, I have no opinion on this. We could probably do without this. Aside from tradition and habit, there might be practical reasons I'm not aware of.

Important role 3: storage properties of a value. Its memory layout, numeric or structured, available fields, constant or mutable, reference or value. Traditionally this has been storage. This conflation makes sense to me, but only in "type positions". When using a type as a value, it may be appropriate to scope it like a regular identifier. In languages without strong conventions for type names, it may be unavoidable. For example, in Go, identifiers in call positions are always scoped as functions, even though casts use the same syntax.

Traditionally, some syntaxes scope certain types as "classes", and many color schemes give them special colors. This never made sense to me. In languages with classes, for all intents and purposes they're types, and should receive no special treatment. Syntactically, it's usually impossible to distinguish.

A type can be used:

  • As a storage modifier.
  • For instantiation.
  • As a value.
  • As a namespace.

A type's role as a storage modifier is entirely unrelated to its role as a value or namespace. I believe they should be scoped and colored differently. In many languages it's already impossible to detect whether part of a namespace is a type or a package name. The same applies to using them as values. For this reason, I believe we should scope types as storage only in special type positions such as value: Type or new Type().

Important role 4: call or value. Identifiers "called" as functions, methods, or macros have a special semantic role, and need a generic scope. The current standard is variable.function. I would simply prefer call, but I'm not advocating for a big renaming.

It should be noted that given the same function name, calling it and passing it as a value are entirely different roles. Even if the syntax could unambuguously (pun intended) detect that the given value is a function, I want calls and values scoped and colored differently.


There are more conclusions to draw, but I ran out of steam and must return to work. This is already much to absorb. I apologize and hope that this is useful to the discussion.

@wbond
Member

wbond commented Nov 3, 2020

It might be helpful to note that we aren't working in a vacuum. We aren't going to break backwards compatibility of syntaxes and themes. How we scope keywords isn't going to change.

Part of the reason there has been no movement on this issue is:

  1. Different languages have vastly different approaches to identifiers, and it can often be difficult to know what we are dealing with.
  2. Different community members have different ideas about how much color they want, thus how specific the scoping needs to be, and how much they want to assume about a given token.

We have requests that run the gamut from "every identifier should be scoped as a support.type" to "if it isn't exactly known, don't scope it".

Overall if 50%+ of tokens in a source file are the same color, does it matter if they are the foreground, or another color? Or in other words, if everything is special, is nothing special?


Unfortunately I don't have time at the moment to devote to getting this unstuck, but I am hoping to during the next dev cycle.

@mitranim
Contributor

mitranim commented Nov 3, 2020

Of course. I'm all for compatibility. There isn't much to gain, and much to lose, by breaking the existing conventions. But I feel it would be useful to rebuild our mental model for this, figure out the consensus on how it "should" be in a vacuum, then see how existing syntaxes and color schemes can be nudged there with least blood.

@FichteFoll
Collaborator

@mitranim this is a very good breakdown, thanks for that.

A type's role as a storage modifier is entirely unrelated to its role as a value or namespace.

By this you are referring to the type of a variable declaration, correct?

@mitranim
Contributor

mitranim commented Nov 15, 2020

By this you are referring to the type of a variable declaration, correct?

Was referring to the difference in the role of SomeType between this:

var value: SomeType

func func_name(param: SomeType) {}

And this:

var value: AnyType = SomeType

SomeType::some_function()
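Python has both of these roles too, and its ast module distinguishes them, which makes the difference easy to demonstrate. Below, the examples above are transliterated to Python, with SomeType.some_function() standing in for SomeType::some_function(). The function roles_of and its labels are hypothetical. This is a sketch, not something a syntax definition actually does.

```python
import ast


def roles_of(name: str, source: str) -> list[str]:
    """Classify each occurrence of `name` in Python source, in source order."""
    tree = ast.parse(source)

    # Every node inside an annotation is in a "type position".
    in_annotation: set[int] = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.AnnAssign, ast.arg)) and node.annotation is not None:
            in_annotation.update(id(n) for n in ast.walk(node.annotation))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.returns is not None:
            in_annotation.update(id(n) for n in ast.walk(node.returns))

    # Names on the left of an attribute access act as a namespace (SomeType.x).
    path_prefixes = {id(n.value) for n in ast.walk(tree) if isinstance(n, ast.Attribute)}

    occurrences = sorted(
        (n for n in ast.walk(tree) if isinstance(n, ast.Name) and n.id == name),
        key=lambda n: (n.lineno, n.col_offset),
    )
    roles = []
    for n in occurrences:
        if id(n) in in_annotation:
            roles.append("type position")
        elif id(n) in path_prefixes:
            roles.append("namespace")
        else:
            roles.append("value")
    return roles


snippet = """
value: SomeType
def func_name(param: SomeType): ...
other: AnyType = SomeType
SomeType.some_function()
"""
print(roles_of("SomeType", snippet))
# ['type position', 'type position', 'value', 'namespace']
```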

mitranim pushed a commit to mitranim/Packages that referenced this issue Mar 25, 2022
@deathaxe
Collaborator Author

After another round of looking for meaningful common name qualifiers, I came up with two solutions that take both predefined and user-defined namespace variables into account.

| Meaning | Scope 1 | Scope 2 | Examples |
|---|---|---|---|
| Language predefined | support.namespace | variable.language.namespace | global::, parent::, self::, self., this |
| User defined | variable.namespace | variable.other.namespace | myns::ns::, myns.ns, myns\ns |

C#

C# has a predefined global:: namespace alongside user-defined ones such as user::.

C++

It may even make sense to scope the this variable as variable.language.namespace or support.namespace.

PHP

PHP knows about special namespace variables for late bindings such as self::, parent:: and static:: as well as $this for object member access.

Python

The same applies to self., which is already scoped variable.language but could be treated as a predefined namespace variable.
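The sketch below, in Python, shows how a syntax might apply either scope column of the table to qualified names. Three things in it are assumptions for illustration: the PREDEFINED qualifier set, the function scope_qualifiers, and the split on a fixed separator. Real syntaxes would need per-language rules (PHP's $this, C++'s this->, and so on).

```python
# Hypothetical predefined qualifiers, collected from the examples above.
PREDEFINED = {"global", "parent", "self", "static", "this"}

# The two alternatives from the table: (predefined scope, user-defined scope).
SCOPE_SETS = {
    1: ("support.namespace", "variable.namespace"),
    2: ("variable.language.namespace", "variable.other.namespace"),
}


def scope_qualifiers(path: str, sep: str = "::", variant: int = 1) -> list[tuple[str, str]]:
    """Scope every qualifier in a path such as 'myns::ns::member'.

    The final member is left out; it would be scoped by its own rules
    (function call, constant, ...)."""
    predefined_scope, user_scope = SCOPE_SETS[variant]
    *qualifiers, _member = path.split(sep)
    return [(q, predefined_scope if q in PREDEFINED else user_scope) for q in qualifiers]


print(scope_qualifiers("parent::create"))
print(scope_qualifiers("myns::ns::create"))
print(scope_qualifiers("self.create", sep=".", variant=2))
```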

Thoughts?

@53v3n3d4

Hi,

I came here from an issue/suggestion that I opened, #3676.

After reading this RFC, I agree with others that building a generic meta-language that satisfies the ideas of multiple languages and communities seems complex. Some people want all functions highlighted in the same color; others want different colors depending on role. There are also questions of whether something should be categorized as keyword or storage, and of performance; it is not only about syntax coloring.

I'll focus on my own experience trying to get syntax highlighting like this:

  • All classes/traits/structs same color
  • All functions same color
  • Not highlighting the first member of a path

It is an approach I mostly see on GitHub, in docs, etc., and I tend to prefer it today. But it seems difficult to achieve without customizing the syntax, mostly for classes and for the first member of a path: usually they have more generic scopes, or only the whole path can be colored.

The default color schemes take a different approach, which I respect and understand. Celeste is different again: it seems to color some things somewhat arbitrarily, based on two defined colors.

I feel that if the ideas discussed here were implemented, they would help in this case, and perhaps support more approaches.

I saw the Python examples in this RFC and in deathaxe's initial post, where he mentions meta.generic-name. This scope makes highlighting only classes very difficult. Sorry if I am not expressing this correctly.

Here are a few examples that illustrate what I try to achieve in ST but could not without customizing the syntax.

class Foo

Foo()
from test import Foo, bar
import { Foo, bar } from './test.js'
process.stdout.write()
// builder() is `meta.function meta.block`, while the other functions are `variable.function`
let req = Request::builder()
    .method(Method::POST)
    .uri(URL)
    .header(header::CONTENT_TYPE, "application/json")
    .body(POST_DATA.into())
    .unwrap();
