Fix feet inches #803

tmilnthorp · 2020-06-01T20:13:08Z

I think we need to fix #794 by constructing the Regex with the separators.

This now passes except for two tests:

ParseWithCultureUsingDotAsThousandSeparators_ThrowsExceptionOnInvalidString actually now passes (more forgiving regex parses successfully)
ParseLengthToMetersUsEnglish with 1e-3 km fails (regex doesn't handle engineering notation).

I feel it's closer, but we should handle engineering notation.

tmilnthorp · 2020-06-01T20:40:57Z

UnitsNet.Tests/CustomCode/LengthTests.FeetInches.cs

+            new object[]{"-1′1″", -1.08333333, new CultureInfo("en-US")},       // Without space
+            new object[]{"-1 ft 1 in", -1.08333333, new CultureInfo("en-US")},
+            new object[]{"-1ft 1in", -1.08333333, new CultureInfo("en-US")},
+            new object[]{"1’000′", 1000, new CultureInfo("de-CH")},             // Feet only, with seperator


new object[]{"1’000′", 1000, new CultureInfo("de-CH")},             // Feet only, with seperator
new object[]{"1’000′ 6\"", 1000.5, new CultureInfo("de-CH")},       // Normal form, using separators for culture

These are the new tests with culture-specific separators. The rest are all in en-US format, so I'm being explicit about the culture.

Use new CultureInfo("...", false) everywhere. This makes sure that at least when developers use a similarly current version of Windows, the settings will be the same. (This ignores any manual overrides in control panel)
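A minimal sketch of that suggestion, assuming the tests build their cultures through a small helper. The helper name `CreateTestCulture` is hypothetical; the `CultureInfo(string, bool)` constructor is the standard .NET one, and passing `false` for `useUserOverride` ignores user-customized regional settings.

```csharp
using System;
using System.Globalization;

public static class CultureOverrideSketch
{
    // Hypothetical helper: builds a culture that ignores any per-user
    // overrides of the regional settings (a Windows Control Panel concept).
    public static CultureInfo CreateTestCulture(string name)
    {
        return new CultureInfo(name, useUserOverride: false);
    }
}
```

A test row would then use `CultureOverrideSketch.CreateTestCulture("de-CH")` instead of `new CultureInfo("de-CH")`. This does not remove differences between OS versions or globalization libraries; it only removes per-user customization.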

angularsen · 2020-06-01T21:29:56Z

UnitsNet.Tests/CustomCode/LengthTests.FeetInches.cs

+            new object[]{"-1′1″", -1.08333333, new CultureInfo("en-US")},       // Without space
+            new object[]{"-1 ft 1 in", -1.08333333, new CultureInfo("en-US")},
+            new object[]{"-1ft 1in", -1.08333333, new CultureInfo("en-US")},
+            new object[]{"1’000′", 1000, new CultureInfo("de-CH")},             // Feet only, with seperator


@pgrawehr also pointed out that in some Windows/Linux/Mac versions, the apostrophe symbol can vary for the same culture. But unless we actually run into this issue on our own machines or the AppVeyor VMs, I think a hard-coded string is better. I really don't expect this character to change much over time for a given culture unless they are fixing a consistency bug in one of the OS versions, but I am merely speculating.

angularsen · 2020-06-01T21:31:18Z

UnitsNet.Tests/CustomCode/LengthTests.FeetInches.cs

+            new object[]{"-1′1″", -1.08333333, new CultureInfo("en-US")},       // Without space
+            new object[]{"-1 ft 1 in", -1.08333333, new CultureInfo("en-US")},
+            new object[]{"-1ft 1in", -1.08333333, new CultureInfo("en-US")},
+            new object[]{"1’000′", 1000, new CultureInfo("de-CH")},             // Feet only, with seperator


For the sake of reviewing this PR, it would have been better to do the MemberData refactoring in a separate PR, but it helps to know that these were the only two new cases, plus the new culture parameter.

angularsen · 2020-06-01T21:33:48Z

UnitsNet/CustomCode/Quantities/Length.extra.cs

-            // Match entire string exactly
-            string pattern = $@"^(?<negativeSign>\-?)(?<feet>{footRegex})\s?(?<inches>{inchRegex})$";
+            var feetMatch = new Regex(footRegex, RegexOptions.Singleline).Match(str);
+            var inchesMatch = new Regex(inchRegex, RegexOptions.Singleline).Match(str);


Won't this allow invalid formats like inches first, feet after? 5" 1'
Or is this actually an acceptable form?

You're right, it wouldn't. I'll make sure to enforce ordering.
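One way to enforce the ordering when feet and inches are matched separately is to compare match positions. The sketch below is only an illustration: the two patterns are simplified, hypothetical stand-ins for `footRegex` and `inchRegex` (integers only, hard-coded abbreviations), not the patterns this PR builds.

```csharp
using System.Text.RegularExpressions;

public static class OrderingSketch
{
    // Hypothetical, simplified patterns: an integer followed by an abbreviation.
    private static readonly Regex Foot = new Regex(@"(?<value>\d+)\s*(ft|'|′)");
    private static readonly Regex Inch = new Regex(@"(?<value>\d+)\s*(in|""|″)");

    // Returns false when both parts are present but inches come first.
    public static bool IsFeetBeforeInches(string text)
    {
        Match feet = Foot.Match(text);
        Match inches = Inch.Match(text);
        if (!feet.Success || !inches.Success) return true; // nothing to order
        return feet.Index + feet.Length <= inches.Index;   // feet must end before inches start
    }
}
```

With this check, `5" 1'` is rejected while `1' 5"` and `1 ft 1 in` are accepted.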

angularsen · 2020-06-01T21:39:56Z

UnitsNet.Tests/CustomCode/LengthTests.FeetInches.cs

+            new object[]{"-1′1″", -1.08333333, new CultureInfo("en-US")},       // Without space
+            new object[]{"-1 ft 1 in", -1.08333333, new CultureInfo("en-US")},
+            new object[]{"-1ft 1in", -1.08333333, new CultureInfo("en-US")},
+            new object[]{"1’000′", 1000, new CultureInfo("de-CH")},             // Feet only, with seperator


Aren't you missing the problematic test case, where the thousand separator character exactly matches the foot or inch unit symbol? This requires a culture that uses this exact character, though; we can construct our own if necessary.

new object[]{"1′000′", 1000, new CultureInfo("de-CH")},
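Constructing such a culture does not depend on what the OS reports for de-CH. A sketch, assuming a clone of the invariant culture is an acceptable base for the test; the class and method names are hypothetical.

```csharp
using System;
using System.Globalization;

public static class PrimeSeparatorCultureSketch
{
    // Hypothetical helper: a writable culture whose thousand separator is
    // U+2032 PRIME, the same character used as the foot symbol.
    public static CultureInfo Create()
    {
        var culture = (CultureInfo)CultureInfo.InvariantCulture.Clone();
        culture.NumberFormat.NumberGroupSeparator = "′";
        return culture;
    }
}
```

With this culture, `double.Parse("1′000", NumberStyles.Number, culture)` yields 1000, so a row such as `new object[]{"1′000′", 1000, PrimeSeparatorCultureSketch.Create()}` exercises the collision directly.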

tmilnthorp · 2020-06-02T00:49:58Z

Indeed I did miss the problematic case!

I wonder if it would help simplify everything if we just search for the abbreviation ending, and just do a double.Parse or double.TryParse on the prefix.

In other words, parse the abbreviation suffix ourselves and let .NET parse everything before it. They've already done all the work - no sense in recreating the parsing regex.

angularsen · 2020-06-07T20:21:33Z

> In other words, parse the abbreviation suffix ourselves and let .NET parse everything before it.

Isn't that how the regex is designed though? To only split the string on known abbreviations and use double.TryParse on whatever was in front of the abbreviation?

I may remember it wrong.

tmilnthorp · 2020-06-09T15:42:34Z

> > In other words, parse the abbreviation suffix ourselves and let .NET parse everything before it.
>
> Isn't that how the regex is designed though? To only split the string on known abbreviations and use double.TryParse on whatever was in front of the abbreviation?
>
> I may remember it wrong.

It is, kind of. It fails when there's an extra apostrophe as this defect shows :)

My regex knowledge fails me here. We want to take everything until the last abbreviation string at the end. Almost as if to reverse the string or do a maximum look-ahead of some sort.
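The "take everything up to the last abbreviation" idea can also be written without a regex. The following is a rough sketch under stated assumptions, not the library's implementation: the abbreviation lists are hard-coded and hypothetical, the class name is made up, and the result is returned in feet.

```csharp
using System;
using System.Globalization;

public static class FeetInchesSketch
{
    // Hypothetical, hard-coded abbreviations; a real implementation would look them up per culture.
    private static readonly string[] FootAbbreviations = { "ft", "'", "′" };
    private static readonly string[] InchAbbreviations = { "in", "\"", "″" };

    public static bool TryParse(string text, CultureInfo culture, out double totalFeet)
    {
        totalFeet = 0;
        if (string.IsNullOrWhiteSpace(text)) return false;

        string rest = text.Trim();
        string feetText = null, inchesText = null;

        string inchSuffix = FindSuffix(rest, InchAbbreviations);
        if (inchSuffix != null)
        {
            rest = rest.Substring(0, rest.Length - inchSuffix.Length);
            inchesText = rest; // inches only, unless a foot abbreviation is found below

            // Split at the LAST foot abbreviation so earlier apostrophes stay with the number.
            int splitIndex = -1, splitLength = 0;
            foreach (string abbreviation in FootAbbreviations)
            {
                int index = rest.LastIndexOf(abbreviation, StringComparison.Ordinal);
                if (index > splitIndex) { splitIndex = index; splitLength = abbreviation.Length; }
            }
            if (splitIndex >= 0)
            {
                feetText = rest.Substring(0, splitIndex);
                inchesText = rest.Substring(splitIndex + splitLength);
            }
        }
        else
        {
            string footSuffix = FindSuffix(rest, FootAbbreviations);
            if (footSuffix == null) return false;
            feetText = rest.Substring(0, rest.Length - footSuffix.Length);
        }

        // Let .NET handle group separators, decimals and exponents (e.g. 1e-3).
        const NumberStyles styles = NumberStyles.Float | NumberStyles.AllowThousands;
        double feet = 0, inches = 0;
        if (feetText != null && !double.TryParse(feetText.Trim(), styles, culture, out feet)) return false;
        if (inchesText != null && !double.TryParse(inchesText.Trim(), styles, culture, out inches)) return false;

        // A negative feet part makes the inches count in the same direction.
        bool negative = feetText != null &&
            feetText.TrimStart().StartsWith(culture.NumberFormat.NegativeSign, StringComparison.Ordinal);
        totalFeet = negative ? feet - inches / 12 : feet + inches / 12;
        return true;
    }

    private static string FindSuffix(string text, string[] abbreviations)
    {
        foreach (string abbreviation in abbreviations)
            if (text.EndsWith(abbreviation, StringComparison.Ordinal)) return abbreviation;
        return null;
    }
}
```

One case stays ambiguous in this sketch: an inches-only value that contains the foot symbol as a group separator, such as `1′000″`, is read as 1 foot 0 inches, so a real fix would still need a rule for that input.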

angularsen · 2020-06-18T11:09:55Z

Are we simply overcomplicating this?
Do we really need to support 1'500' 2"? Is this a realistic use case?
Can we throw if there are more than two matches of each foot/inch abbreviation?

I'm sure we can get it working, but it feels very complicated for something that perhaps no one will ever benefit from. Maybe we can implement it properly if someone complains about this exception in scenarios like that?
At least we don't return incorrect values, which would be a whole lot worse.

In my feeble metric mind, I speculate these are the typical use cases:

Integers: 5' 2"
Fractions: 5 1/4"
Decimal point for either foot or inch, not both: 1.5' or 1.5", but I think fractions are more common?
Thousand separators for either foot or inch, not both: 1'500' or 1'500" (one thousand five hundred feet)

pgrawehr · 2020-06-18T13:38:56Z

I agree that we do not need the perfect solution here. I think it's ok to throw if something is ambiguous. But we should a) throw in a defined way and b) have all the unit tests pass regardless of the dev's culture settings. The fact that b) is currently not true was the reason I started all this.

tmilnthorp · 2020-06-18T14:32:53Z

I disagree, we should handle any culture (or grouping separator) gracefully.

We can't even round-trip right now for some cultures. The following throws a FormatException:

var feetInches = new FeetInches( 1234, 5 );

var culture = new CultureInfo( "de-CH" );

var deCHString = feetInches.ToString( culture );
var parsed = Length.Parse( deCHString, culture );

deCHString is: 1’234 ft 5 in

angularsen · 2020-06-18T15:04:27Z

I guess you make a fair point about the round-trip of ToString/Parse.

tmilnthorp mentioned this pull request Jun 1, 2020

Add tests for de-CH #799

Closed

angularsen mentioned this pull request Jun 1, 2020

Fix TryParseFeetInches when current locale uses ' as number separator #794

Closed

tmilnthorp closed this Jun 19, 2020

tmilnthorp force-pushed the FixFeetInches branch from d6d779a to 381e181 Compare June 19, 2020 14:58

pgrawehr mentioned this pull request Jul 26, 2020

Culture-Dependent parsing of feet/inches #817

Open
