Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: Optimize regular numbers parse logics #990

Merged
merged 3 commits into from
Nov 10, 2024

Conversation

filzrev
Copy link
Contributor

@filzrev filzrev commented Oct 9, 2024

This PR intended to optimize performance for number deserialization by changing Parse to TryParse method.
It's discussed at #794

Additionally this PR fix double value deserialization problems that are found on UnitTests.

What's Changes

  • Replace existing regular number Parse logics with TryParse method.
  • Separate float/double parse logics for .NET Framework.
  • Add code path for .NET that parse text as double, then checks float value ranges.
  • Add related unit tests (UnquotedStringTypeDeserialization_RegularNumbers)

.NET Behavior differences
On .NET Framework. float.Parse((double.MaxValue + 1).ToString()) throw OverflowException.
But on .NET Core. It returns PosititiveInfinite().

These behavior differences are described at Single.Parse Method page.

In .NET Core 3.0 and later, values that are too large to represent are rounded to [PositiveInfinity] or
NegativeInfinity as required by the IEEE 754 specification.
In prior versions, including .NET Framework, parsing a value that was too large to represent resulted in failure.

On current implementation.
Values outside the range of the float type are converted to -∞/.
So I've added logics to check float value ranges.

@filzrev filzrev force-pushed the perf-optimize-parse-numbers branch from 432f828 to a926285 Compare October 9, 2024 03:55
@filzrev
Copy link
Contributor Author

filzrev commented Oct 9, 2024

Microbenchmark results

Optimized version run about 10x faster. and memory allocation is reduced about 50%.

Baseline

Method Job Runtime Mean Error StdDev Exceptions Gen0 Gen1 Allocated
Deserialize ShortRun-.NET 8.0 .NET 8.0 176.9 us 18.02 us 0.99 us 18.0000 7.3242 - 29.96 KB
Deserialize ShortRun-.NET Framework 4.7 .NET Framework 4.7 588.5 us 60.13 us 3.30 us 23.0000 9.7656 0.9766 43.63 KB

Optimized

Method Job Runtime Mean Error StdDev Exceptions Gen0 Gen1 Allocated
Deserialize ShortRun-.NET 8.0 .NET 8.0 20.88 us 3.234 us 0.177 us - 4.0283 - 16.45 KB
Deserialize ShortRun-.NET Framework 4.7 .NET Framework 4.7 52.78 us 9.504 us 0.521 us - 5.3711 0.1221 23.66 KB

Test Code

[MemoryDiagnoser]
[ExceptionDiagnoser]
[ShortRunJob(RuntimeMoniker.Net80)]
[ShortRunJob(RuntimeMoniker.Net47)]
public class AttemptUnknownTypeDeserializationBenchmarks
{
    private static readonly IDeserializer deserializer = new DeserializerBuilder().WithAttemptingUnquotedStringTypeDeserialization().Build();

    private static readonly string TestData =
        """
        byteMaxValue:   255
        shortMaxValue:  32767
        longMaxValue :  9223372036854775807
        ulongMaxValue:  18446744073709551615
        floatMaxValue:  3.4028235E+38
        doubleMaxValue: 1.7976931348623157E+308
        """;

    [Benchmark]
    public object? Deserialize()
    {
        return deserializer.Deserialize(TestData);
    }
}

@EdwardCooke
Copy link
Collaborator

I like performance fixes. And this is a pretty big one! If you can address the 2 comments then I'll merge this in and get it out.

@filzrev
Copy link
Contributor Author

filzrev commented Oct 10, 2024

Apart from this PR.
I've noticed another performance relating issue.

https://github.com/aaubry/YamlDotNet/blob/master/YamlDotNet/Serialization/NodeDeserializers/ScalarNodeDeserializer.cs#L383

Above regex checking regular number(integer or floating number).
But any text that contains digits match regex.
So unneeded Parse operations are called multiple times.

Is it able to change Regex to use more strict regex with ^ and $ ?
or is there any YAML specs that require text partial match?

@EdwardCooke
Copy link
Collaborator

You can add those to that regex without any issue I would think.

@filzrev
Copy link
Contributor Author

filzrev commented Oct 10, 2024

You can add those to that regex without any issue I would think.

Thanks for your response.
I'll try to create another PR after this PR is merged.

@filzrev
Copy link
Contributor Author

filzrev commented Oct 26, 2024

I've removed wrong comment in commit.

float.IsNormal is available .NET Core 2.1 and The behavior will be different from what I'm expecting.
https://learn.microsoft.com/en-us/dotnet/api/system.single.isnormal?view=net-8.0#remarks

@EdwardCooke EdwardCooke merged commit 9a977ef into aaubry:master Nov 10, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants