-
Notifications
You must be signed in to change notification settings - Fork 6k
WIP: Locale class: toLanguageTag(), parse() factory constructor, getters #6481
Conversation
Implements Unicode BCP47 Locale Identifiers with syntax checking and normalization. Provides iterator access to variants, does not yet provide access to extensions (other than via toLanguageTag()). Unit tests showing behaviour, including highlighting idiosyncratic behaviour.
This doesn't belong in Flutter, it belongs in all of Dart. For Flutter to be able to directly use the Dart-wide Locale class however, I believe a discussion on this pull request makes sense - irrespective of whether we include this in Flutter first, before migrating it to a new library, or whether we skip the Flutter inclusion and go straight for the new library. |
This makes Locale objects much heavier (multiple new pointers), which is a little concerning. It also adds numerous static RegExps onto the heap, which seems suboptimal at best. Can we keep the parsing logic in an optional package and only have this class be really simple? |
/// * The extensions map must contain only valid key/value pairs. "u" and "t" | ||
/// keys must be present, with an empty string as value, if there are any | ||
/// subtags for those singletons. | ||
Locale._internal( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hugo, I liked your idea of marking the (lang, country) constructor as "use with caution", and then provide a list of the most commonly used locales in a const fashion:
// en-US, guaranteed to be a valid Unicode BCP47 locale.
const Locale kEnUs = Locale._ourPrivateConstructor('en', 'US');
// ... and more ...
Then we could change the Flutter I18n examples to use the locales from that list rather than calling the somewhat "risky" const constructor which can't normalize well. That together with the usual way how code is cargo culted around would maybe already mitigate some of the ways in which invalid locales spring into existence.
How about this: class Locale {
const Locale(
this._languageCode, [
this._countryCode,
]) : assert(_languageCode != null),
this.scriptCode = null,
this.variants = null,
this.extensions = null;
const Locale.fromComponents({
String language,
String script,
String region,
this.variants,
this.extensions,
}) : assert(language != null),
assert(!(variants != null && variants.length == 0)),
assert(!(extensions != null && extensions.length == 0)),
_languageCode = language,
scriptCode = script,
_countryCode = region;
String get languageCode => _replaceDeprecatedLanguageSubtag(_languageCode);
final String _languageCode;
static String _replaceDeprecatedLanguageSubtag(String languageCode) {
//...
}
final String scriptCode;
String get countryCode => _replaceDeprecatedRegionSubtag(_countryCode);
final String _countryCode;
static String _replaceDeprecatedRegionSubtag(String regionCode) {
//...
}
final List<String> variants;
final List<String> extensions;
String toLanguageTag({ String separator = '-' }) {
final StringBuffer out = StringBuffer(languageCode);
if (scriptCode != null)
out.write('$separator$scriptCode');
if (_countryCode != null)
out.write('$separator$countryCode');
if (variants != null) {
for (String variant in variants)
out.write('$separator$variant');
}
if (extensions != null) {
for (String extension in extensions)
out.write('$separator$extension');
}
return out.toString();
}
@override
bool operator ==(dynamic other) {
if (identical(this, other))
return true;
if (runtimeType != other.runtimeType)
return false;
final Locale typedOther = other;
return languageCode == typedOther.languageCode
&& countryCode == typedOther.countryCode
&& scriptCode == typedOther.scriptCode
&& _listEquals<String>(variants, typedOther.variants)
&& _listEquals<String>(extensions, typedOther.extensions);
}
bool _listEquals<T>(List<T> a, List<T> b) {
if (a == null)
return b == null;
if (b == null || a.length != b.length)
return false;
for (int index = 0; index < a.length; index += 1) {
if (a[index] != b[index])
return false;
}
return true;
}
@override
int get hashCode => hashValues(languageCode, countryCode, scriptCode, hashList(variants), hashList(extensions));
@override
String toString() => toLanguageTag(separator: '_');
} |
I have reservations. See commit 761c842 for a direction I could propose taking it, though it conflicts with your design philosophy. My feedback/concerns regarding your suggestion: const Locale.fromComponents({
String language,
String script,
String region,
this.variants,
this.extensions,
}) : assert(language != null),
assert(!(variants != null && variants.length == 0)),
assert(!(extensions != null && extensions.length == 0)),
_languageCode = language,
scriptCode = script,
_countryCode = region; Another const-constructor gives me more headaches. Variants not being sorted means we have to do more work in toLanguageTag() to sort them, if we want to produce the canonical/normalized form. Other fields might be syntactically invalid, producing broken output in toLanguageTag(). Can we enforce correct normalizing to appropriate upper-case/lower-case form? I don't have enough understanding of why const construction is needed when parsing values I thought are returned from system calls? Are those not runtime constructed? (What are the compile-time/const needs exactly, could we provide a set of const-constructed constants in our library, so we can ensure they're well-formed?) final List<String> variants;
final List<String> extensions; I believe we don't want to expose directly modifiable contents, we do want Locale instances to be immutable. How do we protect against incorrect modifications of variants and extensions members? String toLanguageTag({ String separator = '-' }) {
if (variants != null) {
for (String variant in variants)
out.write('$separator$variant');
}
if (extensions != null) {
for (String extension in extensions)
out.write('$separator$extension');
}
return out.toString();
}
final List<String> extensions; Extensions have separators in them too - these should be changed to match the separator you're requesting via the parameter. If we're keeping things in a string rather than breaking down by key, I've started wondering if we could simply store the language tag as a whole and then grab substrings on demand. (Might even have a "const Locale.fromLanguageTag(String bcp47_locale_identifier)" const constructor then...) Storing offsets could help make field access faster, if we don't want to reparse every time someone wishes to access the region code. I'll steal your hashCode and operator==, thanks! |
We can remove the variants pointer by placing variants in the extensions map. Would this help? We could even place language, region and script in that map, to have only one map - although then the map will always be present. With only variants and extensions in the map, it can be null most of the time. What are your thoughts on this suggestion?
This would make me want to distinguish between trusted/normalized instances and untrusted, either by field indicating its state, or by having a second "normalized" class. If I should give up on the "normalized/trusted Locale instances" idea, then we could check validity and throw exceptions when producing interchange formats, e.g. conceptually |
The constructor would just have the contract that it must be given the data in normalized form, if the order matters. Whether the constructors are const or not doesn't really seem an important distinction. What matters is that the class is very cheap -- constructing it is merely allocation and assignment, comparing it is the minimum amount of work required, etc. Regarding the lists being mutable, I would just define the contract as being that the list shouldn't be modified. That's what we do with Flutter child lists and it seems fine. If it's really an issue, you could do: Iterable<String> get variants => _variants;
List<String> _variants; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry to be nit-picky, but ... you used RegExps. I can't not comment on those :)
Make other adjustments to have operator== and hashCode work correctly.
71adb46
to
640304d
Compare
@hugovdm what's the status of this PR? Should we close it? |
Identifiers with syntax checking and normalization.
and preserving them in toLanguageTag().
Unit tests show behaviour, including highlighting the idiosyncratic
behaviour coming from backward compatibility.