String canonicalization #985
Comments
The phrasing of "string canonicalization" has always been vague. In fact, the specification does not require canonicalization, it only requires that

    const a = "s";
    const b = "s";
    print(identical(a, b)); // must be true because `identical(a, b)` is a constant expression
    var c = a;
    print(identical(c, b)); // false. Not bound by the specification.

We wouldn't want that, but the specification allows it because it doesn't talk about canonicalization of strings at all. If we read the specification as implicitly defining canonicalization of strings through their behavior wrt

I believe the current VM behavior is that they canonicalize all constant string literals, and they canonicalize the value of constant string expressions in constant contexts. That is This does ensure that

The JS compilation is limited by the underlying platform. A JS program cannot distinguish two strings with the same content in any way, so they effectively canonicalize all string values. We do not want the VM to do the same, so we will definitely need to allow more canonicalization than what the specification requires. We should probably specify the exact behavior that we require.
Then we can say that:
|
The API reference seems to show the behavior of `operator ==` on `String`: https://api.dart.dev/stable/2.8.3/dart-core/String/operator_equals.html |
We have indeed specified both It's still better to specify canonicalization explicitly instead of implicitly. (There have been a lot of different ways to read that part of the spec over time). |
Neither? Why is this a priority right now? |
It's definitely not a priority. I think it probably came up because we just migrated the language/identity tests which have some tests around strings and |
Feb 2024: Note that the original example prints 'true' in many (perhaps all?) configurations including

    void main() {
      var s1 = "ab";
      var s2 = "a" "b"; // or `"a" + "b"`.
      print(identical(s1, s2)); // 'true'.
    }
|
That used to work, but the VM still does not canonicalize It's not that |
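To make the distinction drawn in the comments above concrete, here is a minimal sketch that separates the one case the thread says the specification pins down (`identical` inside a constant expression) from a string created at run time. The helper name `buildAtRuntime` is hypothetical and only serves the illustration. No output is claimed for the last check, since the thread describes it as varying between implementations.

```dart
const a = "s";
const b = "s";

// `identical(a, b)` is itself a constant expression here, which is the
// case described above as "must be true".
const sameConstants = identical(a, b);

// Hypothetical helper: builds the string "s" at run time (0x73 is 's'),
// so the result is not the value of a constant expression.
String buildAtRuntime() => String.fromCharCodes([0x73]);

void main() {
  print(sameConstants);

  final r = buildAtRuntime();
  print(r == a); // same content, so `==` holds

  // Not asserted either way: whether the two objects are identical depends
  // on how much the implementation canonicalizes strings.
  print(identical(r, a));
}
```

The first two checks should hold everywhere. The third is the one a more explicit specification of string canonicalization would need to settle.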
The degree to which canonicalization occurs in Dart, in particular for strings, has always been somewhat unclear.

We have explicit rules requiring that the following forms of constant expressions must be evaluated to the same (canonical) object when evaluating any of the occurrences of such an expression:

- Symbol literals (such as `#foo` or `#+`) are canonicalized.
- `<constObjectExpression>` (such as `const C(2)`) is canonicalized.

The definition of `identical()` ensures that instances of `bool`, `int`, and `double` are canonicalized, in the sense that they are considered identical when they represent the same value (and it is not observable whether this happens because the representation is a tagged bit string, or because `identical` treats distinct boxed representations in a special way).

Strings have traditionally been treated differently:

The specification of `identical` actually requires that `identical(c1, c2)` must evaluate to true when

which implies that the behavior of `dart` is wrong, and `dart2js` is correct: `"ab"` is a constant expression of type `String`, `"a" "b"` is a constant expression of type `String`, and the two strings are equal according to `operator ==`. (We haven't specified the behavior of `operator ==` on strings, but we hardly want to make those two strings unequal).

I believe that all non-string objects are canonicalized or not in a way which is well-defined: It is specified for the basic forms mentioned above that every constant expression of these forms has a canonicalized value, and constant expressions do not allow for creating composite objects from existing objects, canonicalized or not, other than constant collection literals (e.g., we can't have `const C(x)`). In particular, we do not have to specify any additional rules saying that the value of a constant variable is canonicalized.

It is unfortunate that different tools behave differently, but it is also likely to be a serious breaking change to change the behavior of any of these tools. So do we wish to proceed and change the specification to make `identical()` implementation dependent on strings, or do we wish to choose a specific behavior and get that implemented?

@munificent, @lrhn, @leafpetersen, WDYT?
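For comparison with the string case, the non-string rules described above can be sketched as follows. This is only an illustration: the class `C` and the helper function names are hypothetical, and the values are arbitrary.

```dart
class C {
  final int x;
  const C(this.x);
}

// Two occurrences of the same symbol literal denote the same object.
bool symbolsAreCanonical() => identical(#foo, #foo);

// Two occurrences of the same constant object expression are canonicalized.
bool constObjectsAreCanonical() => identical(const C(2), const C(2));

// Constant collection literals are canonicalized as well.
bool constListsAreCanonical() => identical(const [1, 2], const [1, 2]);

// For numbers, `identical` compares by value, even when one operand is
// computed at run time rather than written as a constant.
bool numbersCompareByValue() {
  final n = int.parse('42');
  final d = double.parse('1.5');
  return identical(n, 42) && identical(d, 1.5);
}

void main() {
  print(symbolsAreCanonical());
  print(constObjectsAreCanonical());
  print(constListsAreCanonical());
  print(numbersCompareByValue());
}
```

Each of these checks follows from a rule stated above, which is what makes strings stand out: for them there is no comparable rule outside constant expressions.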