proposal: spec: improvements to raw strings #32590
Comments
Just found this as well: #24475 I did not see this one while making this proposal, my bad. But it does contain another interesting suggestion by @bcmills #24475 (comment)
I do not believe this is correct. Certainly for databases like MySQL and Postgres their quoting character is Ref: https://dev.mysql.com/doc/refman/8.0/en/string-literals.html Section 9.1.1
@davecheney these databases use single quotes for strings, yes. (Some allow Backticks are used by MySQL (and sqlite? any others?) if you need to quote identifiers: https://dev.mysql.com/doc/refman/8.0/en/identifiers.html (search for "backtick"). (Postgres uses |
I like @deanveloper's main proposal, but on reflection I like @ianlancetaylor's idea ( |
@cespare Thank you for the correction. I continue to assert that back ticks for SQL string construction is not a valid argument for a language change proposal -- users should not be constructing SQL query strings by hand. The fact that a. Alternative syntax like Gives weight to this position. |
@davecheney I gave other examples, such as JavaScript and Go, which are both languages that I can see being put into raw strings. JavaScript, of course because Go is used a lot for web backends, so, depending on the circumstance, it could make more sense to embed a small script into a raw string rather than create an entire separate file for it. Go may make sense for code generation purposes if you wish to detect a raw string in a file. Also, there is the possibility that someone may just need to encode a string that has many backticks in it. I still personally believe that SQL is a valid use-case for this. SQL Injection attacks are caused by people using string concatenation in SQL queries, not necessarily because they are crafting queries by hand. The
Double quotes are technically supported by MySQL, but only if ANSI_QUOTES mode is enabled (as shown by the page linked by @cespare). Enabling this mode also requires that string literals no longer use double quotes. Also, while backticks are not technically required in SQL languages, they are still common practice. If a DBA sends me a complex query to use, I don't want to go through the whole thing and remove each backtick, or need to put in a |
@davecheney, backticks are not in the SQL standard, they're database specific. For example MSSQL doesn't use backticks, but MySQL and Postgres do. With those DBs there's no alternative to using backticks sometimes. |
Could you please cite a source for this position. Thank you |
To clarify my position: if you're using SQL you should be using prepared statements. To the best of my knowledge, when that is done the DB drivers take care of quoting for identifiers and values. This is why I assert SQL queries are not a good premise for this proposal.
While this is not a direct source (I don't have one, but I also am not the one who claimed this fact so I think I get a bit of leeway): The lack of support for generics is a huge bummer for SQL Builders, because they cannot have type safety, which is really the only advantage they had over
This is not a problem for simple queries, however as queries get more complex, it gets exponentially worse.
Literals are common in SQL queries, in which case I need to worry about quoting. I don't want to replace every single one of my literals with a
There are definitely times where I want string literals embedded in my queries, and again as queries get more complex I definitely don't want to be moving string literals into |
Well, the way Go standard library ( Prepared or not, you still need to communicate your statement (aka query) to the database server, and this usually requires writing your own SQL literal string. Sure it'd be a big mistake to use string concatenation instead of binding the query params. Binding works for both usual and prepared statements. Btw, prepared statements have their own issues - for example they're very hard to manage with transaction based connection pooling (i.e. via pgbouncer) - so I tend to not use them that much. Performance wise, in many typical cases prepared statement slow down the overall system performance, rather than speed it up - so use them with caution and measure the effect. |
@MOZGIII i disagree with most of your advice, yes prepared statements have a cost, but a very clear benefit, and you don't need to quote the There must be a better use case for this proposal than SQL strings. |
@davecheney I've mentioned both in the original proposal and my first reply to you other possible use cases, but you never seem to address them. Even if they're relatively minor, there's just no good way to encode a multiline string that contains backticks. Is that really too much to ask, especially out of a general-purpose programming language? |
@deanveloper if we take SQL quoting out of the proposal, then the use cases you're suggesting are writing snippets of other languages in Go string literals? How common is this in the wild? |
@davecheney When variable binding is used, the actual variables and the query are typically passed as separate units to the database engine. I can assure that it is the case for postgres, but less advanced databases actually may do something different. But nonetheless, in postgres, the data packet to the database over the wire can be represented as the following tuple: Now, prepared statements. They can also use variable binding, but they don't have to, the same way as the regular, non-prepared statements. The problem with prepared statements is it's quite difficult to manage them on a per-session level, and nobody does that. The alternative is preparing the statement for every operation independently. This is slow, because it often causes multiple round trips. This may become a critical bottleneck, as it easily doubles the execution time in a well optimized system. As I said, variable binding is not only possible with the prepared statements. Under the hood it is implemented even for a regular SQL queries: https://github.com/jackc/pgx/blob/762e68533f0090ecb6bb1166d51966b326597ec7/query.go#L410-L453 Anyhow, let's get back to literals - I don't want this to be an SQL discussion. I would say that the lack of support to just copy and paste arbitrary string (SQL with quotes in particular) is a design flaw of the existing implementation. Especially while iterating on the whatever thing you need to represent as a literal - it's much easier to set the boundaries once rather than escaping the backticks on every iteration. |
@davecheney Embedding markdown is another example. I feel like pretty much every other language that has backticks as part of the syntax may cause issues if we attempt using it in the current raw string literals implementation. Those are bash, ruby, perl, TeX and many more. You never know all the ways the feature that's not in the language could've been used. Frankly, even for features that are in the language it's very hard to tell all possible uses. I think we should not accept "SQL is not a good example" as an argument. The rationale is a) the lack of a better example doesn't prove there's no problem, and b) SQL is not just a practical example, it is an actual pain point that I've encountered while solving my day-to-day tasks with Go. |
@davecheney I'm going to come at this from a different angle, actually. Regardless of if you want to use Go for code generation, which I personally do quite a bit, raw strings the way they are now are just poorly designed. Raw strings are designed to store large amounts of text, or text which contains special characters ( |
The more argument (and lack of agreement) I see on this matter, the more convinced I become that it's not worth doing anything here and we should just live with what we have. The present syntax for raw strings has the desirable properties (for a simple language such as Go) of being very easy to type, parse, read, explain, understand and remember. The only problem it has is not being able to include the back-tick character and opinions differ on how acute this problem actually is in the wild. All the proposed solutions (including those suggested by myself in #32190) either seriously compromise one or more of these properties and/or deal with the back-tick problem at the expense of creating a similar problem with some other (more rarely used) character. It's also worth remembering that, apart from the workaround mentioned by @deanveloper in his opening post, there are (at least) two other ways of dealing with the back-tick problem at the present time:
s := fmt.Sprintf(`This is a "raw" string including %cback-ticks%[1]c.`, '`')
fmt.Println(s)
// Output: This is a "raw" string including `back-ticks`.
Admittedly none of these solutions is exactly "nice" but, on balance, I think they are better than introducing some new (and probably contentious) syntax to deal with the back-tick problem which I don't think everyone regards as being particularly serious in any case.
This has my support. I'm a voice for Unicode as an out of (ASCII) bounds signalling means, but admit that while it solves the easy case of quoting all normal situations, it cannot alone address recursive self quoting; for that we must have a case-specific delimiter, and identifier-backtick is a clean, simple, universal mechanism. It is trivial to parse so I dispute any pushback on inconvenience of implementation. In addition to the likely uses...
...it also allows an extreme case: generate a huge random integer (256 bits, say), encode that in identifier form (letter or underscore followed by (letter|underscore|digit)*), and use that to blind quote text without looking at it and knowing it will work. This may never come to pass, but I like that it could:
This is something you could rely on when, say, packaging files in strings, like a go generate tool that pastes a copy of a source file into that source file. |
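A minimal sketch of the "blind quoting" idea above, written in today's Go. It only generates the delimiter; the literal syntax itself is hypothetical. The helper name `randomDelimiter` and the choice of an underscore followed by hex digits are assumptions made for illustration, not part of the proposal.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"strings"
)

// randomDelimiter returns an identifier-shaped string built from 256 random
// bits: a leading underscore followed by 64 hex digits.
// (Hypothetical helper; the encoding is an assumption.)
func randomDelimiter() (string, error) {
	var b [32]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	return "_" + hex.EncodeToString(b[:]), nil
}

func main() {
	text := "arbitrary text with `backticks` and `more`"
	delim, err := randomDelimiter()
	if err != nil {
		panic(err)
	}
	// Under the proposal the terminator would be a backtick followed by
	// delim, so a collision means that exact sequence occurs in text.
	fmt.Println(delim, strings.Contains(text, "`"+delim))
}
```

A generator could still check `strings.Contains` and retry, but with 256 random bits a collision is not a practical concern.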
My personal issue with this solution is that you then must apply transformations to the string before the code can compile. This is especially undesirable if you do not have syntax highlighting, which is a crutch that Go should not depend on. It is also annoying to write a program to apply the string transformation for you, as you cannot put the original string into a Go program. |
@deanveloper I write a lot of SQL and SQL systems and I write a lot of Go. I'm neutral on the core of this proposal. However the motivation for a change is important. When building SQL strings, you may want to quote identifiers and quote text/time/(a few other cases) values. It is assumed that you will use value parameters where appropriate. Also of note, a prepared query and a query that uses value parameters are not directly correlated in most systems. Different database systems support different ways to quote identifiers and values. SQL Server uses square brackets
Let's take your SQL problem and address it first. For this problem, the solution you have will work, but is not one I would recommend. It is too ad hoc for a system. Spend half a day and we can make a better solution. Spend a little bit more time in discovery to find other motivating pain points. The example you gave just isn't that motivating to me.
A better solution
There are lots of ways to solve the problem you presented. This would be one of them. I would argue it would do a better job of what you are wanting than what raw strings (current or improved) can give you. You could implement this as a runtime step or as a pre-compile step. I've also used text/template + some custom template functions to construct SQL. Constructing SQL is fine. Carefully including values in SQL text can be okay if done through a system (like above) to prevent silly mistakes. I don't care if you use parameters or not, I don't care if you prepare your SQL or not (with modern SQL engines a prepare is useless).
Again, SQL is not the only target of this. Raw strings as they are now, are poorly designed as illustrated in #32590 (comment). That comment is a better illustration of the problem that we currently face than all of this arguing on SQL Queries and how they should be done. |
@kardianos that, or there could've just been a string literal. It is a beautiful workaround for a problem that shouldn't exist in the first place. With this you can either do a compile-time codegen - in which case just having a The problem with the example above is that, while it solves one part of the problem - and that is allowing you to get a program that invokes the query you want - it doesn't address other parts - among those are the ability to author the same literal that the app will use. This is more important for literals talk than the ability to actually somehow make the string value that you want appear at runtime. Go already addresses this problem, it just does it poorly (we have the As a counter example of why we actually need to have good literals for raw strings: any byte array (even non-unicode) we can represent in a Base64 encoding, embed into the regular |
Spend a little bit more time in discovery to find other motivating pain points. |
@MOZGIII Any string literal by itself won't automatically escape value or identifiers, nor will it expand an array of columns nor will it escape and join together a list of strings or other values. I personally find these things important. You may not. |
@kardianos so, what you mean is your code does something valuable that's not directly correlated to representing literals, and as a bonus it solves the issues with
Are we still in the "why?", or are we in the "how?" already? If we're in the "how?" then I don't see the point of looking into the workarounds to the SQL use case - let's use that example to see what literals we can offer to solve the pain point.
PS: just noticed markdown supports multiple backticks for verbatim substrings. This means we can play around with this idea easily! Example: (working)
Result:
Example: (not working)
I must say I like |
It was a very narrow case where that sequence was explicitly required to be displayed; it is unlikely someone else stumbles upon this. That said - I've had way more cases where backticks and/or triple quotes have been insufficient for embedding code in them. They're not a solution for a generic case, that's a fact, but I've already said everything I had to say about this above.
So it appears that the only real difficulty is embedding source code in other languages and potentially in Go itself. I wonder how common this is in the wild? In most cases, I suspect that the amount of source code would be such that you'd want to load it in from a file. |
Almost every time I embed multi-line strings is to embed something that's code. 95%. I'd say it's a major use case. Very rarely I embed verbatim text paragraphs without typography (i.e. without html/markdown). And I have no idea what else can even be possibly considered not code. |
I think an issue with having two separate delimiters to represent the same thing is that people may get curious as to what the difference between backticks and triple-quotes/heredocs/etc, especially if they have dealt with languages like Bash where every kind of quote means a different thing. Finding answers online can also often be difficult, as people may often have trouble putting their question into words, and there are always many answers, even if the answer in the end is simply "they mean the same thing". So the huge advantage to |
Well, by source code, I meant code for general purpose programming languages as opposed to: SQL, XML, json, mark-up etc. But, if embedding code is as common as @MOZGIII says it is, then the simple solution I was proposing is not really good enough. FWIW, of the '100%' solutions I prefer the:
idea which I think @ianlancetaylor came up with. |
@deanveloper I don't really understand your point as ISTM that, no matter how you look at it, you're going to end up with two different types of delimiter for raw strings to solve the back-tick problem. If one accepts that the problem is worth solving at language level, there are basically two questions to answer:
As you will have gathered, I was hoping that 1. might be good enough but I'm now not so sure. |
@deanveloper Please stop repeating that something you disagree with is "poorly designed", you're trying to dress an opinion in words that imply more than it being just an opinion. Simple is great and I love the design of simple things, in computers and in real life objects.
[citation needed] I would argue that "large amounts of text" don't belong inside a Go source file. Personally, I have large SQL queries in separate files and use asset embedding techniques. @DeedleFake Disallowing empty raw strings is not realistic, they're out in the wild and justified well enough. For example:
"RAW" strings in Cuelang work by using N I'm having the issue that I would like to embed markdown strings in my Go program as strings that I could process and server (like a web version of docs / help flags). Perhaps there is a better asset embedding technique for Go programs that is recommended? Something that compresses it, embeds it, and then decompresses into mem when running? |
I find myself embedding markdown frequently in Go code and not being able to put a backquote into a backquoted string is a minor annoyance. I think this proposal is probably too complicated though. I think we should just copy what Python does and have triple quoted strings, both This is easy to lex, backwards compatible, familiar to Python users, simple to explain and covers nearly all use cases without getting overcomplicated.
I didn't see Lua's approach mentioned here, which I personally like. Lua lets you delimit a raw string with Things I like about this solution:
I did not think about the benefits as mentioned in #40393. To summarize that issue, named backtick literals would also allow for syntax highlighters to highlight the raw strings differently, which is especially useful for SQL/HTML strings. |
I have come to really appreciate Rust's raw string literals. I don't think we need to go as far as an arbitrary identifier as a delimiter. Such a change would warrant an addition to the type system to assign a name to forms of literals, which gets complicated. The design should be driven by the smallest addition to the current backtick literals that does not make lexing immensely more complicated. I think we can further simplify Rust's string literal syntax by excluding the I quite like the second option, and I'm in favor of using the unary
This proposal is also very important for writing multiline regexps (for matching and parsing the stdout/stderr of os.Cmd, for example).
I have written a proof-of-concept implementation for the
and produces a file like this:
The additions to |
I'd like to add my support for the "text block" style approach that @smasher164, @ncw, @alanfo and others have mentioned above. It seems the consensus of the problem we're really trying to solve here is "allow for inlined/pasted string literals without the programmer being required to reformat or otherwise modify the inline text". Text blocks are an elegant solution to this difficult parsing problem. They allow us to keep backwards compatibility with existing string literal options in a visually distinct way that is programmer friendly. This also seems reasonable to implement as smasher164 has shown. The most common implementation out there seems to be Personally I like the java/swift implementation the most (with the indenting being normalized by default). I don't believe that this particular conceptual wheel needs to be reinvented -- Go's solution does not need to be a highly custom or unique implementation. If it is necessary for some implementation reason that's fine, but I'm not convinced that it is. Regardless of my thoughts... How do we move this forward, one way or the other? =) |
@slycrel I should mention that my idea/implementation was based on the "delimiter-depth" idea from Rust. It was not based on the triple-quoted literals like in Python. I have no qualms about that idea either, I just wanted to mention that it's different. |
With the addition of |
Respectfully, Consider a medium-sized, predefined data structure that has many medium-sized strings (such as markdown and/or All I wanted to do is define a medium-size data structure in a |
@mpontillo Fair enough. There have been a number of different alternatives discussed in this issue, so I would suggest that if someone wanted to move this forward, they file a separate proposal with a specific language change in mind. |
The current source generation produces hex-encoded backticks, resulting in invalid code. There seems to be no way to properly encode or escape them, as the final code is itself part of a backticked multi-line string. Go does not allow nesting or escaping backticks. There are some proposals to address this issue: - golang/go#32190 - golang/go#32590 but it looks like nothing has been accepted there yet. The current sources have been updated to use string concatenation instead of backticks, and the generator has been updated to panic if it encounters any. This is not an ideal solution, as some sources may require the usage of backticks at some point, but it would probably require a deeper rework of the generation to fix it (maybe external files + go:embed?)
Background
This proposal was branched off of #32190, which was a proposed HEREDOC syntax for Go. It was concluded that HEREDOC was not the correct syntax for Go to use, however the proposal did point out a large problem that Go currently has:
Problem
There is only one option to use raw strings, which is the backtick. The nature of how raw strings works means that raw strings themselves cannot contain backticks, meaning that the current workaround for including a backtick in a raw string is:
Raw strings are often used for storing large strings, such as strings containing other languages, or Go code itself. In many languages, the backtick has significant meaning. For instance:
SELECT * FROM `database`.`table`
fun `a method with spaces`() { ... }
let str = `HELLO ${name.toLocaleUpperCase()}`
Of course there are far more examples of languages where the backtick is a significant character in the language. This makes embedding these languages in Go very hard.
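As a sketch of what the concatenation workaround looks like in practice, here is the SQL example above embedded in today's Go. This is one common form of the workaround, not necessarily the exact one the proposal had in mind. The constant name `query` and the trailing `WHERE` clause are assumptions added for illustration.

```go
package main

import "fmt"

// query embeds MySQL-style quoted identifiers. A raw string cannot contain a
// backtick, so each quoted identifier is spliced in as an interpreted
// (double-quoted) string literal instead.
const query = `SELECT * FROM ` + "`database`" + `.` + "`table`" + ` WHERE name = ?`

func main() {
	fmt.Println(query)
}
```

The query can no longer be pasted in or copied out verbatim, which is the friction the proposal is trying to remove.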
Proposed Solution
If there were a fixed number of ways to declare raw strings, the problem would, no matter what, arise that you would be unable to put Go code inside of Go code without some kind of need to transform the code. This means that there needs to be a variable way to create raw strings.
This proposal highlights one brought up here. It essentially improves on the current way to declare raw strings, allowing the following syntax:
Essentially, raw strings can be prefixed with a delimiter, and the string is then terminated with a backtick followed by the same delimiter.
Strings which are densely populated with words and backticks may make it hard to pick a word to use as the delimiter for the raw string, as the word may appear inside the string, which would end the string early and cause a syntax error. Allowing any identifier to be used as a delimiter would allow non-ascii characters to be used as well, meaning that in special cases, when it's really needed, one can use a non-ascii character as their delimiter.
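To make the rule concrete, here is a rough sketch of how a scanner might recognise such a literal, written as an ordinary Go function over a string. It is not taken from any real lexer. The function names and the rule that the closing delimiter must not run into a longer identifier are assumptions; the proposal text does not settle that detail.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// isIdentRune reports whether r may appear inside an identifier.
func isIdentRune(r rune) bool {
	return r == '_' || unicode.IsLetter(r) || unicode.IsDigit(r)
}

// scanDelimited reads a literal of the hypothetical form
//
//	DELIM`contents`DELIM
//
// from the start of src and returns the contents and the remaining input.
func scanDelimited(src string) (contents, rest string, ok bool) {
	open := strings.IndexByte(src, '`')
	if open <= 0 {
		return "", "", false // no delimiter before the opening backtick
	}
	delim := src[:open]
	for i, r := range delim {
		if !isIdentRune(r) || (i == 0 && unicode.IsDigit(r)) {
			return "", "", false // delimiter is not a valid identifier
		}
	}
	body := src[open+1:]
	closer := "`" + delim
	from := 0
	for {
		i := strings.Index(body[from:], closer)
		if i < 0 {
			return "", "", false // unterminated literal
		}
		end := from + i + len(closer)
		// Assumption: "`SQLITE" must not terminate a literal opened with "SQL`".
		next, _ := utf8.DecodeRuneInString(body[end:])
		if end == len(body) || !isIdentRune(next) {
			return body[:from+i], body[end:], true
		}
		from += i + 1 // skip this backtick and keep looking
	}
}

func main() {
	src := "SQL`SELECT * FROM `database`.`table``SQL;"
	contents, rest, ok := scanDelimited(src)
	fmt.Printf("%q %q %v\n", contents, rest, ok)
}
```

The scan is a single forward search for a fixed byte sequence, which suggests the lexer change could be modest, although a real implementation would also have to track line numbers and error positions.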
Concerns
@jimmyfrasche #32190 (comment)
I don't like the idea of complicating the language. I do not work with the internals of the language, so I am unsure of the magnitude of complication to the lexer that this change would bring. If it is too much, I don't think that it would at all be worth it, and maybe one of the alternatives below would be a better fit.
@ianlancetaylor #32190 (comment)
I share this sentiment. My response to this here was that establishing a convention to use short, noticeable identifiers (i.e. RAW, JS, SQL, etc.) helps with noticing where the string starts and ends. This could (possibly) be enforced by golint, but I'm not sure if that is a good idea or not.
Other Alternatives brought up
In #32190, there were several other alternatives that tried to achieve the same goal:
Variable numbers of backticks
Essentially, you could start the raw string with a certain number of backticks, and it would have to end with the same number of backticks.
This solution still had problems though. Strings cannot start with an even number of backticks, because any even number of backticks could also be interpreted as an empty string, introducing ambiguities. It also causes developers a bit of fuss when trying to get it to work inside of markdown, as markdown uses multiple backticks in order to signify a block of code.
Also, the strings could not start or end with backticks, which would be an unfortunate consequence.
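The constraints above can be sketched as a small helper that picks a fence for a given string. This illustrates the rules as described here, not any accepted design; `longestBacktickRun` and `fenceFor` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// longestBacktickRun returns the length of the longest run of consecutive
// backticks in s.
func longestBacktickRun(s string) int {
	longest, run := 0, 0
	for i := 0; i < len(s); i++ {
		if s[i] == '`' {
			run++
			if run > longest {
				longest = run
			}
		} else {
			run = 0
		}
	}
	return longest
}

// fenceFor returns the shortest odd-length backtick fence that could delimit
// s under the variable-backtick scheme. ok is false when s starts or ends
// with a backtick, which the scheme cannot represent.
func fenceFor(s string) (fence string, ok bool) {
	if strings.HasPrefix(s, "`") || strings.HasSuffix(s, "`") {
		return "", false
	}
	n := longestBacktickRun(s) + 1
	if n%2 == 0 {
		n++ // an even-length opener would read as empty raw string(s)
	}
	return strings.Repeat("`", n), true
}

func main() {
	fence, ok := fenceFor("SELECT * FROM `database`.`table` WHERE x = 1")
	fmt.Println(len(fence), ok)
}
```

For the SQL example the fence comes out as three backticks, which is exactly the sequence that collides with Markdown code fences.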
Variable number of backticks + no empty raw strings
This one is a breaking change, however I think it is my favorite solution out of all of the alternatives. It's the exact same as the previous one, but Go also introduces a breaking change to disallow empty raw strings. There is no need for raw strings to be used to represent an empty string, since the normal "" can do that, and is much more preferable. The only code this would break is people who have used a raw string to define an empty string by doing something like x := `` or funcCall(``, ...). It may be good to do some research on if empty raw strings are ever used in real code.
This solution still has the issue of being annoying to use with markdown's code fences. The argument was used that we shouldn't make language decisions based on other languages, however I personally do not like this argument. Sharing code is part of what a programmer does, and Markdown is a very widely used markup language that uses multiple backticks in a row to define a code fence. This feature may make it a bit difficult to share Go code over anything that uses Markdown (Slack, GitHub, Discord, and other services).
Despite making it difficult to share code via markdown-enabled chats, it is still easy to share code via something like gist.github.com or play.golang.org. If my original proposal proves to not work very well (doesn't feel Go-like, too difficult to implement, etc) I would love for this solution to be accepted in place.
Variable number of backticks + a quote
This proposal is actually pretty nice. It's similar to the previous proposal. Essentially, the starting delimiter is N backticks (N >= 2) followed by a quotation mark, and the ending delimiter is a quotation mark followed by the same number of backticks. Example:
This syntax is actually very nice in my opinion. It fixes the "odd-number-only" ambiguity from the previous example, as well as fixing the Markdown issue (as code fences must occur on their own line). It also fixes the "strings starting/ending with backticks" issue.
The only issue with this syntax is that it doesn't seem to work well with existing raw strings. I don't personally have data about how often this occurs, but I'd imagine that there are several times where raw strings are used to describe strings with quotes in them, making code like x := `"this is a string"` common. Newcomers to Go may see this and think that the `" is the delimiter to the raw string, when in reality the ` is the delimiter and the " is part of the string.
However that critique may be a bit nitpicky. I do like this syntax a lot.
Choosing a symbol pair that nobody uses
This alternative stated that Go should add another symbol to use to declare raw strings in Go. For instance, ⇶ to start the string and ⬱ to end the string. Go code is defined to be UTF-8 so file formatting issues should not happen. Another proposed idea was ≡ (U+2261 IDENTICAL TO).
This solution also has problems. What if our string has both backticks AND strange symbols (for instance if you were defining a list of mathematical symbols)? Or, what if you were trying to embed Go syntax inside of your strings? Also, the symbol is hard to type and not easy to find, so it may not be a good fit as a string delimiter.
Variable number of a special character
In #32590 (comment), another solution that I quite like was brought up, using a variable number of special characters. They propose using ^, and then the delimiters for the string become ^` and `^, where the number of ^ symbols is variable. They also created an implementation of it here.
For example:
Other languages
C++: R"delim(string)delim"
Rust: r#"string"#
Swift: #"string"#
It's important that we have some kind of variable delimiter, as that way if the string we are embedding somehow contains it, it is easy to change the string's delimiter in order to avoid the issue.
The delimiter doesn't have to be an identifier like it is in this main proposal, it could also be varying the number of backticks like the one a few paragraphs up.
Conclusion
Raw strings in Go are often used to be able to copy-paste text to be used as strings, or to embed code from other languages (such as JS, SQL, or even Go) into Go. However, if that text contains backticks, we need some way to make sure that those backticks do not terminate the string early.
I believe that the way to do this is allowing an identifier to precede the string, and to make sure that the terminating backtick must be followed by the same identifier in order to terminate the string.