Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.
/ corefx Public archive

Disable some Calendars when using GlobalizationMode.Invariant. #38281

Closed
wants to merge 3 commits into from

Conversation

baulig
Copy link

@baulig baulig commented Jun 5, 2019

When using GlobalizationMode.Invariant, disable the Japanese, Taiwan and Hebrew calendars and make the code more linker friendly.

Since the linker does not currently support dead code elimination, it cannot break down any conditionals inside method bodies. One trick that we use to work around this is to move those conditional pieces into separate methods; then we can give the linker a list of those methods and tell it to replace their bodies with exceptions.

Martin Baulig added 3 commits June 5, 2019 17:05
Add `GlobalizationMode.Invariant` conditionals to some calendar code.

This allows the Japanese, Taiwan and Hebrew calendars to be disabled.

(cherry picked from commit f759dcc)
The goal is to allow the linker to remove some of the globalization code
that is conditional to `GlobalizationMode.Invariant`.

Since the linker does not currently support dead code elimination, it cannot
break down any conditionals inside method bodies.  One trick that we use to
work around this is to move those conditional pieces into separate methods;
then we can give the linker a list of those methods and tell it to replace
their bodies with exceptions.

In this particular case, we do not want to access the `JapaneseCalendar` and
`TaiwanCalendar` types from methods that could not (currently) be linked out.

The `DateTimeFormatInfo.PopulateSpecialTokenHashTable()` method does not
explicitly reference any of those, but it is quite big so we benefit from
having the non-invariant pieces in a separate method which can then be removed.
bool isHebrewCalendar = (cal.ID == CalendarId.HEBREW);
bool isJapaneseCalendar = (cal.ID == CalendarId.JAPAN);
bool isHebrewCalendar = !GlobalizationMode.Invariant && ((CalendarId)cal.ID == CalendarId.HEBREW);
bool isJapaneseCalendar = !GlobalizationMode.Invariant && ((CalendarId)cal.ID == CalendarId.JAPAN);
// This is a flag to indicate if we are formating hour/minute/second only.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When changing the definition of GlobalizationMode.Invariant into a constant

internal const bool Invariant = false;

this allows the C# compiler to eliminate the dead code.

@@ -2442,7 +2343,136 @@ internal TokenHashValue[] CreateTokenHashTable()
return (temp);
}

private void AddMonthNames(TokenHashValue[] temp, ReadOnlySpan<char> monthPostfix = default)
private void PopulateSpecialTokenHashTable(TokenHashValue[] temp, ref bool useDateSepAsIgnorableSymbol)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been moved into it's own method because it's quite large and we want the linker to remove it.

@@ -1160,7 +1169,21 @@ private static bool Lex(DS dps, ref __DTString str, ref DateTimeToken dtok, ref
return true;
}

private static bool VerifyValidPunctuation(ref __DTString str)
private static Calendar GetJapaneseCalendarDefaultInstance()
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The need to be split out into separate methods to avoid referencing the JapaneseCalendar / TaiwanCalendar classes from within a method body that cannot be linked out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth comments so nobody refactors them back in again someday?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth comments so nobody refactors them back in again someday?

Seems like this Linker feature ended up with the same design as dotnet/designs#42 (convert entire methods to a throw or stub them out), except the linking out behavior is defined far away from where the code is (and carries the risks Dan is pointing out since we're hardcoding CoreLib implementation details in the linker). The referenced design uses custom attributes because they're self-documenting and harder to miss during refactorings. It also opens this feature up to second/third party developers.

We talked about what we did in .NET Native here: dotnet/linker#486 (comment). Is there a reason why we're dismissing the custom attribute approach? The FX API committee didn't have objections to the new attribute in principle, but wanted to gather more data on the usefulness. So the attribute currently lives as a private thing here:

// Instructs IL linkers that the method body decorated with this attribute can be removed
// during publishing.
//
// By default, the body gets replaced by throwing code.
//
// UseNopBody can be set to suppress the throwing behavior, replacing the throw with
// a no-operation body.
[AttributeUsage(AttributeTargets.Method)]
internal class RemovableFeatureAttribute : Attribute
{
public bool UseNopBody;
public string FeatureSwitchName;
public RemovableFeatureAttribute(string featureSwitchName)
{
FeatureSwitchName = featureSwitchName;
}
}

The attribute is currently understood by .NET Native only and we have some APIs annotated with it.

We also discussed having a specialized exception type to be thrown when the code was removed: https://github.com/dotnet/corert/blob/697c70b9942f5b5c104f196b399e12dfb8a323bf/src/System.Private.CoreLib/src/System/Runtime/CompilerServices/FeatureRemovedException.cs#L7-L24

Copy link
Contributor

@marek-safar marek-safar Jun 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth comments so nobody refactors them back in again someday?

That should be just a temporary solution, we will be improving linker to detect code involving GlobalizationMode.Invariant automatically (basically do constant propagation for it), so the method could be renamed and additional attributes are not necessary

{
LazyInitializer.EnsureInitialized (ref m_hebrewNumberParser, () => new MatchNumberDelegate(DateTimeParse.MatchHebrewDigits));
parseInfo.parseNumberDelegate = m_hebrewNumberParser;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not statically initialize this via .cctor when we don't need it (like for instance in invariant mode).

{
internal sealed partial class GlobalizationMode
{
internal static bool Invariant { get; } = false;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't exactly sure what to do about the tests; we added the GlobalizationMode.Invariant conditional to them in Mono, so we can test both modes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danmosemsft @stephentoub how do you test for such general properties? Do you have any fancy attribute which could run each test or suite in both modes or something like that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

corefx\src\System.Globalization\tests\Invariant is the test with Invariant mode.

I don't think we need to have this new defined class.

Also, please move this PR to coreclr


In reply to: 290961607 [](ancestors = 290961607)

@baulig
Copy link
Author

baulig commented Jun 5, 2019

Upstreaming Mono PR's mono#270 and mono#298.

@stephentoub
Copy link
Member

stephentoub commented Jun 6, 2019

A few high-level concerns:

  • The product code being changed here is only compiled into Corelib; CI isn't testing it. Such changes should instead be made in the coreclr repo.

  • The PR is changing tests by adding in checks for GlobalizationMode.Invariant. We have the https://github.com/dotnet/corefx/tree/master/src/System.Globalization/tests/Invariant project for doing invariant tests; all other tests assume the default mode of culture data being available.

  • Why are we disabling various calendars when in Invariant mode? Is it a bug that we don't today (because data the calendars rely on isn't available without ICU), or is this changing the meaning of what Invariant is? Seems like the latter, in which case we should not be making such a change and tying it to that property. cc: @tarekgh.

  • Regarding "Since the linker does not currently support dead code elimination, it cannot break down any conditionals inside method bodies", isn't that something we should work to fix instead, rather than changing a whole bunch of code to work around the limitation?

cc: @jkotas

@stephentoub
Copy link
Member

Separately, @baulig, how did you identify this set of changes as something to change?

throw new PlatformNotSupportedException();
return JapaneseCalendar.GetDefaultInstance();
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are more references to these calendars in CultureInfo.GetCalendarInstanceRare. Do those need to be changed as well?

@stephentoub stephentoub added the * NO MERGE * The PR is not ready for merge yet (see discussion for detailed reasons) label Jun 6, 2019
@tarekgh
Copy link
Member

tarekgh commented Jun 6, 2019

I admit I am not getting the full picture why exactly we are doing that. @stephentoub raised some valid points in the comment #38281 (comment)

What I am seeing is we are adding some conditions to the Invariant mode which really not needed. for example, the following condition while invariant mode is on, will be false because we'll never set the Hebrew calendar. so why we need to add extra checks here? would it add more perf cost even if it is small?

 bool isHebrewCalendar = !GlobalizationMode.Invariant && ((CalendarId)cal.ID == CalendarId.HEBREW);

Also, does the linker work done during the compile time or during the runtime?

As I mentioned I am not familiar with this work so I cannot judge this kind of changes. more detailed information would help here.

@@ -414,6 +416,10 @@ public Calendar Calendar
}
set
{
if (GlobalizationMode.Invariant)
{
throw new PlatformNotSupportedException();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

throw new PlatformNotSupportedException(); [](start = 20, length = 42)

I don't think this is valid. the user can still set the Gregorian calendar and this should be allowed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This again was a size and dependency concern. If you allow any calendar to be set here, then the OptionalCalendars property will be accessed - which in turn calls CultureData.CalendarIds.

My primary goal was to break down those cross-dependencies as much as possible in Invariant mode - and eventually also reduce some of that CultureData code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't this condition written as

if (GlobalizationMode.Invariant && !(value is GregorianCalendar))

does this will still will not allow break the cross dependencies?

ch = Culture.TextInfo.ToLower(ch);
if (IsHebrewChar(ch) && TokenMask == TokenType.RegularTokenMask)
ch = this.Culture.TextInfo.ToLower(ch);
if (!GlobalizationMode.Invariant && IsHebrewChar(ch) && TokenMask == TokenType.RegularTokenMask)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IsHebrewChar(ch) [](start = 51, length = 17)

nit: I am not sure if we need to break this scenario. even with Invariant, we used to allow parse some Hebrew numbers even if we are not using Hebrew calendar.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To give you a little bit of background here: we did this in Mono as part of the Web Assembly work and the objection was to reduce the linked size of our mscorlib.dll as much as possible, moving all the globalization code that's not needed for US English into invariant mode.

I wrote a little tool that dumped IL size on a per-namespace as well as per-type basis and also looked at a general list of all the types still present in the generated code. With the objective of "every single kilobyte counts" I then went over that list to see what can be removed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying. I am talking more about the scenario that is used to work with the Invariant mode and now will be broken. I think if we are parsing some date string which include some Hebrew numbers, I believe this used to succeed even with the Invariant mode as we are parsing the Hebrew numbers regardless if we are using the Hebrew calendar or not. What I am trying to say here, we should be explicit saying we don't have to support such scenarios with the Invariant mode if we go with this change.

}

// CJK suffix
InsertHash(temp, CJKYearSuff, TokenType.SEP_YearSuff, 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CJKYearSuff [](start = 33, length = 11)

similar to my Hebrew numbers questions. we used to allow parsing some Japanese characters even with Invariant mode and not using Japanese calendar. Now we are disallowing that.

@baulig
Copy link
Author

baulig commented Jun 6, 2019

What I am seeing is we are adding some conditions to the Invariant mode which really not needed. for example, the following condition while invariant mode is on, will be false because we'll never set the Hebrew calendar. so why we need to add extra checks here? would it add more perf cost even if it is small?

 bool isHebrewCalendar = !GlobalizationMode.Invariant && ((CalendarId)cal.ID == CalendarId.HEBREW);

This is a little trick that I've been using while playing around with both the C# compiler as well as an enhanced linker module that DCE:

The second expression, (CalendarId)cal.ID == CalendarId.HEBREW checks the local variable cal, which is set from the parameter dtfi that's passed to this method.

Neither the C# compiler nor any Linker can know that in invariant mode, this will always be false.

Adding the globalization check, this will essentially resolve into

>  bool isHebrewCalendar = false && ((CalendarId)cal.ID == CalendarId.HEBREW);

when then resolves into

const bool isHebrewCalendar = false;

And then both the C# compiler as well as a DCE-enabled linker can remove all code that became unreachable because of it.

@stephentoub
Copy link
Member

And then both the C# compiler

The C# compiler could only do that if Invariant were a const.

@baulig
Copy link
Author

baulig commented Jun 6, 2019

  • Regarding "Since the linker does not currently support dead code elimination, it cannot break down any conditionals inside method bodies", isn't that something we should work to fix instead, rather than changing a whole bunch of code to work around the limitation?

@stephentoub @jkotas Before you guys invest any work in that direction, please talk to me first. I have already done quite a decent part of that work and would love to share it with you guys. It's currently in a private module on GitHub and splits methods into basic blocks, then performs basic flow analysis and dead code elimination, including dead variable removal. I also wrote some code rewriter and a few tests.

There had been some concerns about stability, performance and long-term maintainability and I was also in a hurry when writing that module, but I would love to see this work finished and reused by other people.

@baulig
Copy link
Author

baulig commented Jun 6, 2019

And then both the C# compiler

The C# compiler could only do that if Invariant were a const.

That's what I did as an experiment - and then I looked at the generated IL code and compared it with what my new linker module would generate.

The idea is essentially that the linker should "rewrite" that Invariant property into a const and then eliminate any dead code.

@baulig
Copy link
Author

baulig commented Jun 6, 2019

  • Why are we disabling various calendars when in Invariant mode? Is it a bug that we don't today (because data the calendars rely on isn't available without ICU), or is this changing the meaning of what Invariant is? Seems like the latter, in which case we should not be making such a change and tying it to that property.

We did this in Mono as part of our ongoing Web Assembly work to trim down the linked size of our mscorlib.dll as much as possible. Those three calendars, Japanese, Taiwan and Hebrew (as well as the Hebrew Numbers) are referenced by methods that are used by the DateTime class and always preserved when linking.

I explicitly looked at the IL code that was generated by the linker and all the types still present - and those were the only calendars still showing up. Then I looked at where those were referenced from and why the linker didn't remove them.

@jkotas
Copy link
Member

jkotas commented Jun 6, 2019

Before you guys invest any work in that direction,

@baulig The best way to avoid redundant efforts is to be transparent and make it easy to discover what is being worked on. Could you please make sure that there is issue in https://github.com/mono/linker that describes this specific linker feature and its current status.

@baulig
Copy link
Author

baulig commented Jun 6, 2019

Before you guys invest any work in that direction,

@baulig The best way to avoid redundant efforts is to be transparent and make it easy to discover what is being worked on. Could you please make sure that there is issue in https://github.com/mono/linker that describes this specific linker feature and its current status.

I already created a Meta-Issue there yesterday to track all the current and ongoing work and will update it to include some more details about my module: dotnet/linker#607.

@jkotas
Copy link
Member

jkotas commented Jun 6, 2019

dotnet/linker#607

The title of this issue is "Meta Issue: Linker Size Work". This is title is too broad and makes it hard to understand what it is about. The issues work best if they are about concrete piece of work. The github projects work best for grouping multiple issues together for project management purposes.

@jkotas
Copy link
Member

jkotas commented Jun 6, 2019

the scenario that is used to work with the Invariant mode and now will be broken

+1. This change is introducing breaking change in officially supported feature. We look carefully at impact of breaking changes. We do not make breaking changes lightly. This breaking change needs homework to evaluate the impact.

Also, we are close to 3.0 and have very limited runway to react to feedback. In any case, it is unlikely we will be able to take this breaking change in 3.0.

@baulig baulig closed this Jun 10, 2019
@karelz karelz added this to the 3.0 milestone Jul 16, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
* NO MERGE * The PR is not ready for merge yet (see discussion for detailed reasons)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants