Skip to content
This repository was archived by the owner on Feb 25, 2025. It is now read-only.

WIP: Locale class: toLanguageTag(), parse() factory constructor, getters #6481

Closed
wants to merge 21 commits into from

Conversation

hugovdm
Copy link
Contributor

@hugovdm hugovdm commented Oct 9, 2018

  • Implements parsing and producing of Unicode BCP47 Locale
    Identifiers with syntax checking and normalization.
  • Adds scriptCode getter.
  • Provides iterator access to variants.
  • Does not yet provide access to extensions, other than parsing them
    and preserving them in toLanguageTag().

Unit tests show behaviour, including highlighting the idiosyncratic
behaviour coming from backward compatibility.

Implements Unicode BCP47 Locale Identifiers with syntax checking and
normalization. Provides iterator access to variants, does not yet
provide access to extensions (other than via toLanguageTag()).

Unit tests showing behaviour, including highlighting idiosyncratic
behaviour.
@hugovdm
Copy link
Contributor Author

hugovdm commented Oct 9, 2018

This doesn't belong in Flutter, it belongs in all of Dart. For Flutter to be able to directly use the Dart-wide Locale class however, I believe a discussion on this pull request makes sense - irrespective of whether we include this in Flutter first, before migrating it to a new library, or whether we skip the Flutter inclusion and go straight for the new library.

@hugovdm hugovdm changed the title Locale class: toLanguageTag(), parse() factory constructor, new getters WIP: Locale class: toLanguageTag(), parse() factory constructor, getters Oct 9, 2018
@Hixie
Copy link
Contributor

Hixie commented Oct 10, 2018

This makes Locale objects much heavier (multiple new pointers), which is a little concerning.

It also adds numerous static RegExps onto the heap, which seems suboptimal at best. Can we keep the parsing logic in an optional package and only have this class be really simple?

/// * The extensions map must contain only valid key/value pairs. "u" and "t"
/// keys must be present, with an empty string as value, if there are any
/// subtags for those singletons.
Locale._internal(
Copy link

@gnoack gnoack Oct 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hugo, I liked your idea of marking the (lang, country) constructor as "use with caution", and then provide a list of the most commonly used locales in a const fashion:

// en-US, guaranteed to be a valid Unicode BCP47 locale.
const Locale kEnUs = Locale._ourPrivateConstructor('en', 'US');
// ... and more ...

Then we could change the Flutter I18n examples to use the locales from that list rather than calling the somewhat "risky" const constructor which can't normalize well. That together with the usual way how code is cargo culted around would maybe already mitigate some of the ways in which invalid locales spring into existence.

@Hixie
Copy link
Contributor

Hixie commented Oct 10, 2018

How about this:

class Locale {
  const Locale(
    this._languageCode, [
    this._countryCode,
  ]) : assert(_languageCode != null),
       this.scriptCode = null,
       this.variants = null,
       this.extensions = null;

  const Locale.fromComponents({
    String language,
    String script,
    String region,
    this.variants,
    this.extensions,
  }) : assert(language != null),
       assert(!(variants != null && variants.length == 0)),
       assert(!(extensions != null && extensions.length == 0)),
       _languageCode = language,
       scriptCode = script,
       _countryCode = region;

  String get languageCode => _replaceDeprecatedLanguageSubtag(_languageCode);
  final String _languageCode;

  static String _replaceDeprecatedLanguageSubtag(String languageCode) {
    //...
  }

  final String scriptCode;

  String get countryCode => _replaceDeprecatedRegionSubtag(_countryCode);
  final String _countryCode;

  static String _replaceDeprecatedRegionSubtag(String regionCode) {
    //...
  }

  final List<String> variants;

  final List<String> extensions;

  String toLanguageTag({ String separator = '-' }) {
    final StringBuffer out = StringBuffer(languageCode);
    if (scriptCode != null)
      out.write('$separator$scriptCode');
    if (_countryCode != null)
      out.write('$separator$countryCode');
    if (variants != null) {
      for (String variant in variants)
        out.write('$separator$variant');
    }
    if (extensions != null) {
      for (String extension in extensions)
        out.write('$separator$extension');
    }
    return out.toString();
  }

  @override
  bool operator ==(dynamic other) {
    if (identical(this, other))
      return true;
    if (runtimeType != other.runtimeType)
      return false;
    final Locale typedOther = other;
    return languageCode == typedOther.languageCode
        && countryCode == typedOther.countryCode
        && scriptCode == typedOther.scriptCode
        && _listEquals<String>(variants, typedOther.variants)
        && _listEquals<String>(extensions, typedOther.extensions);
  }

  bool _listEquals<T>(List<T> a, List<T> b) {
    if (a == null)
      return b == null;
    if (b == null || a.length != b.length)
      return false;
    for (int index = 0; index < a.length; index += 1) {
      if (a[index] != b[index])
        return false;
    }
    return true;
  }

  @override
  int get hashCode => hashValues(languageCode, countryCode, scriptCode, hashList(variants), hashList(extensions));

  @override
  String toString() => toLanguageTag(separator: '_');
}

@hugovdm
Copy link
Contributor Author

hugovdm commented Oct 11, 2018

I have reservations. See commit 761c842 for a direction I could propose taking it, though it conflicts with your design philosophy. My feedback/concerns regarding your suggestion:

  const Locale.fromComponents({
    String language,
    String script,
    String region,
    this.variants,
    this.extensions,
  }) : assert(language != null),
       assert(!(variants != null && variants.length == 0)),
       assert(!(extensions != null && extensions.length == 0)),
       _languageCode = language,
       scriptCode = script,
       _countryCode = region;

Another const-constructor gives me more headaches. Variants not being sorted means we have to do more work in toLanguageTag() to sort them, if we want to produce the canonical/normalized form. Other fields might be syntactically invalid, producing broken output in toLanguageTag(). Can we enforce correct normalizing to appropriate upper-case/lower-case form?

I don't have enough understanding of why const construction is needed when parsing values I thought are returned from system calls? Are those not runtime constructed? (What are the compile-time/const needs exactly, could we provide a set of const-constructed constants in our library, so we can ensure they're well-formed?)

final List<String> variants;
final List<String> extensions;

I believe we don't want to expose directly modifiable contents, we do want Locale instances to be immutable. How do we protect against incorrect modifications of variants and extensions members?

String toLanguageTag({ String separator = '-' }) {
  if (variants != null) {
    for (String variant in variants)
      out.write('$separator$variant');
  }
  if (extensions != null) {
    for (String extension in extensions)
      out.write('$separator$extension');
    }
  return out.toString();
}

final List<String> extensions;

Extensions have separators in them too - these should be changed to match the separator you're requesting via the parameter.

If we're keeping things in a string rather than breaking down by key, I've started wondering if we could simply store the language tag as a whole and then grab substrings on demand. (Might even have a "const Locale.fromLanguageTag(String bcp47_locale_identifier)" const constructor then...) Storing offsets could help make field access faster, if we don't want to reparse every time someone wishes to access the region code.

I'll steal your hashCode and operator==, thanks!

@hugovdm
Copy link
Contributor Author

hugovdm commented Oct 11, 2018

This makes Locale objects much heavier (multiple new pointers), which is a little concerning.

We can remove the variants pointer by placing variants in the extensions map. Would this help? We could even place language, region and script in that map, to have only one map - although then the map will always be present. With only variants and extensions in the map, it can be null most of the time.

What are your thoughts on this suggestion?

It also adds numerous static RegExps onto the heap, which seems suboptimal at best. Can we keep the parsing logic in an optional package and only have this class be really simple?

This would make me want to distinguish between trusted/normalized instances and untrusted, either by field indicating its state, or by having a second "normalized" class.

If I should give up on the "normalized/trusted Locale instances" idea, then we could check validity and throw exceptions when producing interchange formats, e.g. conceptually toCanonicalLanguageTag(validate:true) could throw exceptions when validation fails.

@Hixie
Copy link
Contributor

Hixie commented Oct 11, 2018

The constructor would just have the contract that it must be given the data in normalized form, if the order matters.

Whether the constructors are const or not doesn't really seem an important distinction. What matters is that the class is very cheap -- constructing it is merely allocation and assignment, comparing it is the minimum amount of work required, etc.

Regarding the lists being mutable, I would just define the contract as being that the list shouldn't be modified. That's what we do with Flutter child lists and it seems fine. If it's really an issue, you could do:

  Iterable<String> get variants => _variants;
  List<String> _variants;

Copy link
Contributor

@lrhn lrhn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to be nit-picky, but ... you used RegExps. I can't not comment on those :)

@hugovdm hugovdm force-pushed the unicodebcp47 branch 3 times, most recently from 71adb46 to 640304d Compare November 5, 2018 22:11
@Hixie
Copy link
Contributor

Hixie commented Feb 11, 2019

@hugovdm what's the status of this PR? Should we close it?

@hugovdm hugovdm closed this Apr 3, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants