Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API Proposal]: System.Diagnostics.CodeAnalysis.StringSyntaxAttribute #62505

Closed
stephentoub opened this issue Dec 7, 2021 · 70 comments
Closed
Labels
api-approved API was approved in API review, it can be implemented area-System.Diagnostics feature-request
Milestone

Comments

@stephentoub
Copy link
Member

stephentoub commented Dec 7, 2021

Background and motivation

VS provides nice colorization of regular expressions when passed to known methods on Regex, e.g.
image

It also supports putting a comment before any string to mark the language being used in that string:
image

But there's currently no way to define an arbitrary API and annotate a string parameter as to what language is expected, e.g. you don't get this colorization today for RegexGenerator, because VS hasn't been taught about this API specifically:
image

While these examples are about regex, more generally this applies to arbitrary domain-specific languages. If VS adds colorization for JSON, for example, it'd be nice if every API that accepts JSON could be annotated appropriately.

API Proposal

namespace System.Diagnostics.CodeAnalysis
{
    [AttributeUsage(AttributeTargets.Parameter | AttributeTargets.Field | AttributeTargets.Property, AllowMultiple = false, Inherited = false)]
    public sealed class StringLanguageAttribute : Attribute
    {
        public StringLanguageAttribute(string language);
        public string Language { get; }
    }
}

API Usage

public sealed class RegexGeneratorAttribute : Attribute
{
    public RegexGeneratorAttribute([StringLanguage("regex")] string pattern);
}

Alternative Designs

No response

Risks

No response

@stephentoub stephentoub added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Dec 7, 2021
@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Dec 7, 2021
@dotnet-issue-labeler
Copy link

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@ghost
Copy link

ghost commented Dec 7, 2021

Tagging subscribers to this area: @tommcdon
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and motivation

VS provides nice colorization of regular expressions when passed to known methods on Regex, e.g.
image

It also supports putting a comment before any string to mark the language being used in that string:
image

But there's currently no way to define an arbitrary API and annotate a string parameter as to what language is expected, e.g. you don't get this colorization today for RegexGenerator, because VS hasn't been taught about this API specifically:
image

While these examples are about regex, more generally this applies to arbitrary domain-specific languages. If VS adds colorization for JSON, for example, it'd be nice if every API that accepts JSON could be annotated appropriately.

API Proposal

namespace System.Diagnostics.CodeAnalysis
{
    [AttributeUsage(AttributeTargets.Parameter)]
    public sealed class StringLanguageAttribute : Attribute
    {
        public StringLanguageAttribute(string language);
        public string Language { get; }
    }
}

API Usage

public sealed class RegexGeneratorAttribute : Attribute
{
    public RegexGeneratorAttribute([StringLanguage("regex")] string pattern);
}

Alternative Designs

No response

Risks

No response

Author: stephentoub
Assignees: -
Labels:

api-suggestion, area-System.Diagnostics, untriaged

Milestone: 7.0.0

@stephentoub
Copy link
Member Author

stephentoub commented Dec 7, 2021

@stephentoub
Copy link
Member Author

cc: @joperezr

@CyrusNajmabadi
Copy link
Member

FWIW, i would love this. Note: we're also leaving design space open with Raw String Literals to allow for a direct way in the language to do it. Specifically:

var x = """regex
        \G(\w+\s?\w*),?
        """;

Embedded languages being a first class concept would be great. We would also like to do dotnet/roslyn#42634 so that 3rd parties could then implement the LS support for a particular embedded language (instead of Roslyn being responsible for it).

@alexrp
Copy link
Contributor

alexrp commented Dec 8, 2021

AttributeTargets.Parameter

Though I can't think of any concrete use cases off-hand, my gut reaction is that Field and Property seem like reasonable additions since this is meant to be more general than just Regex.

@stephentoub
Copy link
Member Author

Though I can't think of any concrete use cases off-hand, my gut reaction is that Field and Property seem like reasonable additions since this is meant to be more general than just Regex.

Yeah, I had that initially, but it seemed like there would be various places where it'd be challenging to reflect that in a tool, e.g. passing such a field by ref would lose that information, and potentially ambiguities as to whether it should be applied, e.g. on a property, is that meant to impact assigning to the property and/or the code in the property returning a string. Combined with my not having scenarios in mind for it, I left it off. But, I'd leave this up to @CyrusNajmabadi and @jasonmalinowski and friends, if they expect it would be useful and feasible initially. It could always be added later.

@stephentoub stephentoub added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation untriaged New issue has not been triaged by the area owner labels Dec 8, 2021
@stephentoub
Copy link
Member Author

stephentoub commented Dec 8, 2021

so that 3rd parties could then implement the LS support for a particular embedded language

Cool. We have APIs in runtime that accept regexes, XML, and JSON; I expect we'd want to annotate many of those with this, e.g.

public static object? Deserialize(string json, Type returnType, JsonSerializerOptions? options = null)

We'll just need to standardize on a set of names for at least the languages we use, although maybe we would just snap to something like what GitHub uses for markdown language names. Thankfully, in general, the naming choice is obvious.

@jasonmalinowski
Copy link
Member

potentially ambiguities as to whether it should be applied, e.g. on a property, is that meant to impact assigning to the property and/or the code in the property returning a string. Combined with my not having scenarios in mind for it, I left it off.

Allowing it to be set on a property/field so the IDE can directly use that for the assignment/initializer. I'm guessing there's an API out there somewhere like:

rules.Add(new Rule { RegEx = "..." });

And the IDE could process that too. I don't see a huge risk in allowing the attribute against field/property even if we don't immediately support it, in the IDE; there's absolutely ecosystem risk of not allowing it against field/property until "later" and the forcing libraries to have to wait until later .NET versions before they can target that, or they have to start #ifdefing more stuff to deal with downlevel.

@terrajobst
Copy link
Member

Should we have a place where we document known languages as const fields?

@stephentoub
Copy link
Member Author

stephentoub commented Dec 8, 2021

Should we have a place where we document known languages as const fields?

You mean like adding:

public sealed class StringLanguageAttribute
{
    ...
    public const string Regex = "regex";
    public const string Xml = "xml";
    public const string Json = "json";
}

?

@terrajobst
Copy link
Member

terrajobst commented Dec 9, 2021

Yes. Not sure I'd want them on this particular class (because the class name is quite lengthy and AFAIK the field wouldn't be in scope when applying the attribute, so users would have to qualify the field references).

@FiniteReality
Copy link

Yes. Not sure I'd want them on this particular class (because the class name is quite lengthy and AFAIK the field wouldn't be in scope when applying the attribute, so users would have to qualify the field references).

How about System.Diagnostics.CodeAnalysis.KnownStringLanguages? That way it's closely related to StringLanguageAttribute, but still separate.

@stephentoub
Copy link
Member Author

Not sure I'd want them on this particular class (because the class name is quite lengthy

I don't have a strong opinion here. But for the number of places this will be used, I personally would be fine with e.g. [StringLanguage(StringLanguageAttribute.Json)]. If we want to spend another type on it and can make it much shorter and still discoverable, that's fine, too.

@CyrusNajmabadi
Copy link
Member

I agree with @jasonmalinowski here. Let the attribute appear in these places. Ide can then add support as necessary.

@terrajobst
Copy link
Member

terrajobst commented Dec 14, 2021

Video

  • We think we should rename "language" to "syntax" to avoid any connotations to localization
  • We should only expose the language constants as we're adding support for them
namespace System.Diagnostics.CodeAnalysis
{
    [AttributeUsage(AttributeTargets.Parameter |
                    AttributeTargets.Field |
                    AttributeTargets.Property,
                    AllowMultiple = false,
                    Inherited = false)]
    public sealed class StringSyntaxAttribute : Attribute
    {
        public StringSyntaxAttribute(string syntax);
        public string Syntax { get; }

        public const string Regex = "regex";

        // As we're adding more support we can add new languages like:
        // public const string Xml = "xml";
        // public const string Json = "json";
    }
}

@terrajobst terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Dec 14, 2021
@deeprobin
Copy link
Contributor

deeprobin commented Dec 19, 2021

@stephentoub Would like to get involved with this. Feel free to assign it to me.

@deeprobin
Copy link
Contributor

In my opinion, we should add a code analyzer for it, which gives a warning if you pass an unsupported language.

@stephentoub
Copy link
Member Author

In my opinion, we should add a code analyzer for it, which gives a warning if you pass an unsupported language

How do you define "unsupported"? VS isn't the only possible consumer, nor are we the arbiters for all possible languages. Someone should be able to write a library that accepts strings for any language they like and author tooling components to recognize those names.

@deeprobin
Copy link
Contributor

In my opinion, we should add a code analyzer for it, which gives a warning if you pass an unsupported language

How do you define "unsupported"? VS isn't the only possible consumer, nor are we the arbiters for all possible languages. Someone should be able to write a library that accepts strings for any language they like and author tooling components to recognize those names.

You are right :). I assumed that only mentioned languages should be supported.

@stephentoub
Copy link
Member Author

@bartonjs, opinions on the additional members?
#62505 (comment)

@bartonjs
Copy link
Member

No major opinions. Public arrays are always a fun "is it shared, or is it yours?", but it's an attribute, so it's probably not going to be passed around.

The lack of static typing will mean that anything touching an arguments-based syntax highlighter needs to be prepared for garbage... but they're generally dealing with probably-not-compilable code as input, so hopefully they're nice and defensive. (Documentation will, of course, be a pain)

@stephentoub
Copy link
Member Author

#62995

@stephentoub stephentoub changed the title [API Proposal]: System.Diagnostics.CodeAnalysis.StringLanguageAttribute [API Proposal]: System.Diagnostics.CodeAnalysis.StringSyntaxAttribute Jan 20, 2022
@madelson
Copy link
Contributor

Will/can we add a constant for sql? I realize that there are multiple SQL dialects out there, but there is enough commonality to make IDE syntax highlighting helpful. This would be great on DbCommand.CommandText as well as for libraries like Dapper.

@dotMorten
Copy link

@stephentoub Thanks for the demo about this today. You said it is available in VS17.2, and I'm trying with preview 1, and not seeing it working. Is it supposed to work and I'm doing something wrong, or not in the first preview? (I included the attribute code as you mentioned would still work so I get it in versions prior to .NET7)
image

@dotMorten
Copy link

D'oh! I just made the argument for why the "Json" text should be a constant, because I spelled it all lower-case. Works once properly cased

@CyrusNajmabadi
Copy link
Member

CyrusNajmabadi commented Feb 17, 2022

Will/can we add a constant for sql? I realize that there are multiple SQL dialects out there, but there is enough commonality to make IDE syntax highlighting helpful. This would be great on DbCommand.CommandText as well as for libraries like Dapper.

We have no plan on this currently. Primarily because there's no real way to estimate the cost of this work currently. For one thing, we have no tech available to us to even suitable lex/parse even one dialect of sql, let alone the myriad dialects actually used in practice. We'd need a strong proposal on how to actually achieve this.

Json has hte benefit of being a staggeringly simple language to add support for. And for regex we have hte canonical impl in teh runtime that we were able to ape in order to add this support.

@madelson
Copy link
Contributor

we have no tech available to us to even suitable lex/parse even one dialect of sql, let alone the myriad dialects actually used in practice

@CyrusNajmabadi is it a requirement that there be full lexing/parsing capabilities just to annotate this? Even very trivial support in an IDE (think highlighting common keywords like SELECT, WHERE, and JOIN plus operators) would make blocks of SQL text a lot easier on the eyes.

@terrajobst
Copy link
Member

terrajobst commented Feb 18, 2022

@stephentoub @bartonjs did we add any other constants for well-known languages? The approved shape only had one for Regex.

Here is my understanding of the final shape:

namespace System.Diagnostics.CodeAnalysis
{
    [AttributeUsage(AttributeTargets.Parameter |
                    AttributeTargets.Field |
                    AttributeTargets.Property,
                    AllowMultiple = false,
                    Inherited = false)]
    public sealed class StringSyntaxAttribute : Attribute
    {
        public StringSyntaxAttribute(string syntax);
        public StringSyntaxAttribute(string syntax, params object[] arguments);

        public string Syntax { get; }
        public object[] Arguments { get; }

        public const string Regex = "regex";

        // As we're adding more support we can add new languages like:
        // public const string Xml = "xml";
        // public const string Json = "json";
    }
}

@dotMorten
Copy link

dotMorten commented Feb 18, 2022

The string is Json (Upper case J). Lower case like above doesn’t work (at least in preview 1) and threw me off for a while.

/// <summary>The syntax identifier for strings containing date and time format specifiers.</summary>
public const string DateTimeFormat = nameof(DateTimeFormat);
/// <summary>The syntax identifier for strings containing JavaScript Object Notation (JSON).</summary>
public const string Json = nameof(Json);
/// <summary>The syntax identifier for strings containing regular expressions.</summary>
public const string Regex = nameof(Regex);

@terrajobst
Copy link
Member

terrajobst commented Feb 18, 2022

@madelson I'm not opposed to adding annotations for languages that Roslyn has no plans to support; this would allow other tools such as Rider to add support. However, I think we'll want at least one party that does something useful with it to ensure the annotations are sensible. For example, if you only ever want syntax highlighting, then sure, "SQL" might be good enough with a large dictionary of the various keywords used in SQL. However, if an IDE wants to do something more, such as code completion, then the string "SQL" might no longer be good enough. In the context of the BCL, "regex" basically means System.Text.RegularExpression, so that's in practice sufficiently unique. However, in .NET today there are various SQL dialects in use (MS SQL Server, Oracle, Postgress, etc). I can be convinced that this isn't a problem, but I'd like to hear this from an implementing party, not from a consumer because we'd likely make faulty assumptions.

@stephentoub
Copy link
Member Author

stephentoub commented Feb 18, 2022

did we add any other constants for well-known languages? The approved shape only had one for Regex.

We added Json, which is in the shown final shape as being fine to add when there's support, and we added DateTimeFormat, as you proposed and was discussed here #62505 (comment).

@jnyrup
Copy link
Contributor

jnyrup commented Feb 18, 2022

@stephentoub in the Community Standup you mentioned keeping StringSyntaxAttribute internal if we implement it for codebases not targeting .NET 7.

For a multi-targeting public API (think nuget package) the best workaround I can think of without risking assembly conflicts is

public class Class2
{
    public void MyMethod([StringSyntax("Regex")] string pattern) { }
}

#if !NET7_0_OR_GREATER
internal sealed class StringSyntaxAttribute : Attribute { ... }
#endif

Is there a way to expose StringSyntaxAttribute to older TFMs without assembly conflicts or is this a technical limitation?

@stephentoub
Copy link
Member Author

For a multi-targeting public API (think nuget package) the best workaround I can think of without risking assembly conflicts is

What's wrong with that approach? I don't think that's so much a workaround as it is the recommended course of action if you need to use the attribute downlevel.

@dotMorten
Copy link

Making it internal works beautifully for me in my multi targeted project

@jnyrup
Copy link
Contributor

jnyrup commented Feb 18, 2022

Sorry, thought I had tested this and thought the attribute had to be public to take effect.
It works brilliantly - this is awesome!

@lsoft
Copy link

lsoft commented Feb 19, 2022

sorry for dumb question: I ask about SQL. If I know which dialect we are using in our project, is there any way to colorize SQL keywords? I mean parse SQL not by VS itself, but I already have a piece of code that can grab SQL from the code (via Roslyn) + can parse it to the tokens, so I have an information about tokens and theoretically can provide this info to VS visualization layer somehow. is it feasible now?

@madelson
Copy link
Contributor

madelson commented Feb 19, 2022

@terrajobst I hear what you're saying about there being different dialects, but in many if not most cases SQL strings are being passed to a more generic SQL library (DbCommand.CommandText, EF, Dapper, etc) so even if the IDE had the ability to handle specific flavors of SQL in most cases it would probably have to fall back to lowest common denominator anyway.

Waiting to get buy-in from an implementer makes sense; just offering that as a consumer this would be valuable; definitely more so for my use-cases than JSON or date formats. I also suppose that there doesn't need to be an official constant here so long as the library authors and IDEs agree on a value to use for this.

@roji
Copy link
Member

roji commented Mar 3, 2022

For SQL, see this comment: #65634 (comment)

[...] even if the IDE had the ability to handle specific flavors of SQL in most cases it would probably have to fall back to lowest common denominator anyway.

I'm not sure what "lowest common denominator" means in SQL. Sure, there's a certain subset of statements which work everywhere, but it's really quite restricted. The moment you e.g. want to get only X rows, SQL Server has TOP, PG/SQLite have LIMIT/OFFSET, etc. IMHO for anything to be useful around SQL, there needs to be some setting somewhere that says what the dialect is, which the analyzer or IDE would pick up.

@CyrusNajmabadi
Copy link
Member

We are making this extensible at the Roslyn layer. But it will be up to a different party entirely to provide any level of SQL support.

@madelson
Copy link
Contributor

madelson commented Mar 4, 2022

I'm not sure what "lowest common denominator" means in SQL. Sure, there's a certain subset of statements which work everywhere, but it's really quite restricted.

I imagine that if VS/VSCode/Rider were to implement this then it would start with just the base keywords (SELECT, JOIN, WHERE, etc) and maybe some operators. I've worked with SQL in editors with this behavior and a little syntax highlighting goes a long way towards making the code easier to read.

IDEs can of course go beyond this by offering configurable dialect highlighting like JetBrains does.

@deeprobin
Copy link
Contributor

I'm not sure what "lowest common denominator" means in SQL. Sure, there's a certain subset of statements which work everywhere, but it's really quite restricted.

I imagine that if VS/VSCode/Rider were to implement this then it would start with just the base keywords (SELECT, JOIN, WHERE, etc) and maybe some operators. I've worked with SQL in editors with this behavior and a little syntax highlighting goes a long way towards making the code easier to read.

IDEs can of course go beyond this by offering configurable dialect highlighting like JetBrains does.

We could indeed support only ANSI/ISO SQL (ISO/IEC 9075-1) by means of the identifier sql and then build on it in the long run with sql+mysql, sql+mssql/sql+tsql (maybe with version identifier? - sql+mysql8.0)...
That would be my approach now, anyway. Feel free to leave feedback.

@madelson
Copy link
Contributor

madelson commented Mar 5, 2022

build on it in the long run with sql+mysql, sql+mssql/sql+tsql...

As mentioned previously, likely most applications of this would be SQL dialect agnostic members like DbCommand.CommandText, parameters to various methods in Dapper, EF, etc. The value is in the enhanced readability of SQL blocks in an IDE as opposed to perfect parsing. Many text editors/IDEs/wikis/etc offer this sort of agnostic SQL syntax highlighting indicating that it is indeed useful despite the differences in SQL dialects.

@ghost ghost locked as resolved and limited conversation to collaborators Apr 5, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Diagnostics feature-request
Projects
None yet
Development

No branches or pull requests