Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for repeated XML elements without a name attribute #44608

Merged
merged 23 commits into from
Mar 10, 2021

Conversation

amoerie
Copy link
Contributor

@amoerie amoerie commented Nov 12, 2020

New PR because #43722 was out of date and CI was broken at the time. See that PR for initial review.

Fixes #36541
Fixes #36561

This pull request adds support in Microsoft.Extensions.Configuration.Xml for repeated XML elements without requiring a Name attribute.
This solves a particularly subtle bug when configuring Serilog from an XML configuration source. For a full description, see #36541

The original implementation of the XmlStreamConfigurationProvider has been modified to do the following:

Maintain a stack of encountered XML elements while traversing the XML source. This is needed to detect siblings.
When siblings are detected, automatically append an index to the generated configuration keys. This makes it work exactly the same as the JSON configuration provider with JSON arrays.
Tests are updated to reflect the new behavior:

Tests that verified an exception occurs when entering duplicate keys have been removed. Duplicate keys are supported now.
Add tests that verify duplicate keys result in the correct configuration keys, with all the lower/upper case variants and with support for the special "Name" attribute handling.
Thank you for your consideration.

This commit adds support in Microsoft.Extensions.Configuration.Xml for repeated XML elements without requiring a Name attribute.
This solves a particularly subtle bug when configuring Serilog from an XML configuration source. For a full description, see dotnet#36541

The original implementation of the XmlStreamConfigurationProvider has been modified to do the following:

- Maintain a stack of encountered XML elements while traversing the XML source. This is needed to detect siblings.
- When siblings are detected, automatically append an index to the generated configuration keys. This makes it work exactly the same as the JSON configuration provider with JSON arrays.

Tests are updated to reflect the new behavior:
- Tests that verified an exception occurs when entering duplicate keys have been removed. Duplicate keys are supported now.
- Add tests that verify duplicate keys result in the correct configuration keys, with all the lower/upper case variants and with support for the special "Name" attribute handling.
@ghost
Copy link

ghost commented Nov 12, 2020

Tagging subscribers to this area: @maryamariyan
See info in area-owners.md if you want to be subscribed.


Issue meta data
Issue content:
New PR because #43722 was out of date and CI was broken at the time. See that PR for initial review. 

Fixes #36541
Fixes #36561

This pull request adds support in Microsoft.Extensions.Configuration.Xml for repeated XML elements without requiring a Name attribute.
This solves a particularly subtle bug when configuring Serilog from an XML configuration source. For a full description, see #36541

The original implementation of the XmlStreamConfigurationProvider has been modified to do the following:

Maintain a stack of encountered XML elements while traversing the XML source. This is needed to detect siblings.
When siblings are detected, automatically append an index to the generated configuration keys. This makes it work exactly the same as the JSON configuration provider with JSON arrays.
Tests are updated to reflect the new behavior:

Tests that verified an exception occurs when entering duplicate keys have been removed. Duplicate keys are supported now.
Add tests that verify duplicate keys result in the correct configuration keys, with all the lower/upper case variants and with support for the special "Name" attribute handling.
Thank you for your consideration.

</td>
Issue author: amoerie
Assignees: -
Labels:
`area-Extensions-Configuration`

</td>
Milestone: -

@maryamariyan
Copy link
Member

@amoerie
Copy link
Contributor Author

amoerie commented Nov 13, 2020

@maryamariyan CI is green now. The tests that verify the configuration with duplicate keys in them are not compatible with the way the repeated elements work in XML configuration.

To be more specific, the 'DuplicatesTestConfig' provides the following test data:

public static TestSection DuplicatesTestConfig { get; }
                = new TestSection
                {
                    Values = new[]
                    {
                        ("Key1", (TestKeyValue)"Value1"),
                        ("Key1", (TestKeyValue)"Value1")
                    },
                    ...
                };

Later, this is verified:

protected virtual void AssertConfig(
            IConfigurationRoot config,
            bool expectNulls = false,
            string nullValue = null)
        {
            var value1 = expectNulls ? nullValue : "Value1";
            ...

            Assert.Equal(value1, config["Key1"], StringComparer.InvariantCultureIgnoreCase);
            ...
        }

But this fails, because "Key1" is not in the output. This is because the XML config provider will produce two entries, "Key1:0" and "Key1:1", both with value "Value1".
This is not a breaking change, because previously the XML provider simply threw an exception in this scenario.
However, it does mean we cannot simply reuse the duplicates tests from ConfigurationProviderTestBase, because the output does not match.

@amoerie
Copy link
Contributor Author

amoerie commented Dec 9, 2020

Any update on this?

@maryamariyan
Copy link
Member

Any update on this?

We'll be reviewing this week,
cc: @safern

The XML configuration provider now builds a simple in-memory model of the XML elements that it encounters
Each element keeps track of its attributes, children and content.
Furthermore, each element also has a reference to its siblings.
Each group of siblings all share the same list.

All of the above makes it possible to intelligently produce configuration keys and values, taking into account repeated XML elements.
@amoerie
Copy link
Contributor Author

amoerie commented Dec 18, 2020

@maryamariyan I took a look at your comments, and almost all of them stem from the same issue: the "Key" property was lazily computed in various places, because you cannot compute it once in the constructor. This is because the value of "IsMultiple" might change, because a new element is encountered during traversal that shares the same name. This triggers the new behavior where the "sibling index" is then added to the key.

To address your concerns, I slightly reworked the implementation, and I would be interested to hear your thoughts.

I've removed all "Key" and "Value" computation logic from the new classes, they are now extremely simple POCOs. Instead, all of the heavy lifting now happens in XmlStreamConfigurationProvider.

The "algorithm" now works as follows:

  1. Traverse the XML document using XmlReader. While traversing, build an object tree of XmlConfigurationElement. Each element has a reference to its attributes, text content and children. This traversal happens in XmlStreamConfigurationProvider.Read.
  2. While traversing, detect siblings. Siblings are elements that share the same parent and element name. (If a Name attribute is present, that must also be the same)
  3. After the XML document has been traversed, process the XmlConfigurationElement tree (starting from the root) and produce keys and values for the configuration dictionary. This happens in XmlStreamConfigurationProvider.ProvideConfiguration

@amoerie
Copy link
Contributor Author

amoerie commented Jan 22, 2021

@maryamariyan Any chance this PR could get another review?

@maryamariyan
Copy link
Member

maryamariyan commented Jan 22, 2021

Thanks for the update, I'll do a comprehensive review within next week,

@maryamariyan
Copy link
Member

Hey @amoerie, I haven't forgotten about this one, I'll make sure this keeps moving and totally appreciate how you kept the fix up to date.

@amoerie
Copy link
Contributor Author

amoerie commented Jan 30, 2021

Thanks! On the bright side, I guess I can say I've been contributing to Microsoft Open Source for 3 years already. 😉

@safern
Copy link
Member

safern commented Feb 24, 2021

@amoerie thanks for running the benchmarks. I don't think we have any benchmarks for Microsoft.Extensions.Configuration yet: https://github.com/dotnet/performance/tree/master/src/benchmarks/micro/libraries

based on the results and your investigation, I think taking a 5-20% regression is not acceptable, specially since we could try and improve your current implementation with the info you got from the profiling.

GetLineInfo : previously, line info was only read when an error occurred. Now, we must call it for every element and store it, in case we need to throw an exception later. One might question if this line info is really such an important feature.

Is there a way to get the line info only if an exception is thrown? Removing it from the error might be breaking and I think having it in the error is valuable as it helps you know where the error in your configuration file is.

@amoerie
Copy link
Contributor Author

amoerie commented Feb 28, 2021

@safern I've pushed some optimizations, if you would find the time again to review that.

GetLineInfo : previously, line info was only read when an error occurred. Now, we must call it for every element and store it, in case we need to throw an exception later. One might question if this line info is really such an important feature.

Is there a way to get the line info only if an exception is thrown? Removing it from the error might be breaking and I think having it in the error is valuable as it helps you know where the error in your configuration file is.

Avoiding the line info reading would require completely revising the algorithm, which I'm hesitant to do right now. But I found that the most time was spent on SR.Format which uses the LineNumber and LinePosition, so now I'm delaying that string format as long as possible.

This is a summary of the improvements I've made:

  • Store LineNumber and LinePosition instead of LineInfo. This avoids a call to string.Format in most cases.
  • Use a helper class to build the configuration keys. This uses StringBuilder instead of string.Join internally, which is much faster.
  • Keep a dictionary of XML children, keyed by their "SiblingName" (= ElementName + Name). This allows for a much more efficient lookup of siblings under the current parent.
  • Do not create this dictionary for XML elements with a single child. This is quite common, and avoiding it in this scenario is more efficient in terms of memory and execution.
  • Use Dictionary.TryAdd instead of Dictionary.ContainsKey + Dictionary.Add if we can (.NET Standard 2.1)

All of this amounts to an implementation that is even slightly faster than the original one! 🎉

These are the performance results of these modifications:

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i7-8700K CPU 3.70GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.103
  [Host]     : .NET Core 5.0.3 (CoreCLR 5.0.321.7212, CoreFX 5.0.321.7212), X64 RyuJIT
  DefaultJob : .NET Core 5.0.3 (CoreCLR 5.0.321.7212, CoreFX 5.0.321.7212), X64 RyuJIT

Method TestFile Mean Error StdDev Ratio RatioSD Gen 0 Gen 1 Gen 2 Allocated
Before deep.xml 125.49 μs 2.507 μs 2.682 μs 1.00 0.00 14.6484 1.9531 - 92048 B
After deep.xml 130.67 μs 1.721 μs 1.609 μs 1.04 0.03 15.8691 1.9531 - 99800 B
Improved deep.xml 124.07 μs 2.414 μs 2.965 μs 0.99 0.03 14.4043 1.8311 - 90648 B
Before names.xml 185.49 μs 2.378 μs 2.442 μs 1.00 0.00 24.1699 4.6387 - 151912 B
After names.xml 231.85 μs 4.546 μs 7.962 μs 1.27 0.05 27.3438 6.1035 - 172088 B
Improved names.xml 181.91 μs 1.792 μs 1.399 μs 0.98 0.02 25.8789 5.6152 - 163224 B
Before repeated.xml NA NA NA ? ? - - - -
After repeated.xml 183.69 μs 2.673 μs 2.501 μs ? ? 22.9492 4.1504 - 144720 B
Improved repeated.xml 156.03 μs 2.841 μs 3.157 μs ? ? 20.7520 4.1504 - 131392 B
Before simple.xml 92.73 μs 1.156 μs 0.903 μs 1.00 0.00 11.5967 1.2207 - 72864 B
After simple.xml 111.59 μs 1.996 μs 1.770 μs 1.21 0.02 12.5732 1.3428 - 79096 B
Improved simple.xml 92.49 μs 1.412 μs 1.179 μs 1.00 0.02 11.7188 1.3428 - 74232 B

Benchmarks with issues:
XmlConfigurationBenchmarks.Before: DefaultJob [TestFile=repeated.xml]

Base automatically changed from master to main March 1, 2021 09:07
@safern
Copy link
Member

safern commented Mar 8, 2021

Thanks, @amoerie. I'm happy that you were able to find a way that it even improves the perf. Now that you wrote some benchmarks, can we add them to: https://github.com/dotnet/performance/tree/master/src/benchmarks/micro/libraries ? That way our infra will catch if there are any regressions in the future.

I left a couple of comments, but overall looks good.

amoerie added 2 commits March 10, 2021 12:33
This avoids confusion with the ProcessXYZ helper methods that run during a different stage in the algorithm
@amoerie amoerie requested a review from safern March 10, 2021 12:10
@amoerie
Copy link
Contributor Author

amoerie commented Mar 10, 2021

Thanks, @amoerie. I'm happy that you were able to find a way that it even improves the perf. Now that you wrote some benchmarks, can we add them to: https://github.com/dotnet/performance/tree/master/src/benchmarks/micro/libraries ? That way our infra will catch if there are any regressions in the future.

I left a couple of comments, but overall looks good.

Thanks, I'll see if I can add the benchmarks there.

@safern
Copy link
Member

safern commented Mar 10, 2021

Timeout is: #49309

@safern safern merged commit e0af8cf into dotnet:main Mar 10, 2021
@safern
Copy link
Member

safern commented Mar 11, 2021

@amoerie thanks for the effort to get this fix in 🎉

@amoerie
Copy link
Contributor Author

amoerie commented Mar 12, 2021

@safern

You're very welcome! It's been quite the experience, I learned quite a few things during this endeavor. 💪

Thank you (and @maryamariyan too) for taking time out of your schedules to review and discuss the work of a total stranger. From my own work experience, I know this is not trivial.

For me personally, I consider it a very big deal to have contributed to .NET itself. So thanks for making that possible. 👍

@amoerie amoerie deleted the repeatedelementsinxmlconfig branch March 12, 2021 09:31
@safern
Copy link
Member

safern commented Mar 12, 2021

For me personally, I consider it a very big deal to have contributed to .NET itself. So thanks for making that possible.

Anytime! Feel free to browse for more up-for-grabs issues and help us make .NET better and reach out for any help if needed, we really appreciate and like the community help!

@ghost ghost locked as resolved and limited conversation to collaborators Apr 11, 2021
@karelz karelz added this to the 6.0.0 milestone May 20, 2021
@tarekgh tarekgh added the breaking-change Issue or PR that represents a breaking API or functional change over a prerelease. label Aug 10, 2022
@ghost ghost added the needs-breaking-change-doc-created Breaking changes need an issue opened with https://github.com/dotnet/docs/issues/new?template=dotnet label Aug 10, 2022
@eerhardt
Copy link
Member

I have created dotnet/docs#30660 to document this breaking change.

@eerhardt eerhardt removed the needs-breaking-change-doc-created Breaking changes need an issue opened with https://github.com/dotnet/docs/issues/new?template=dotnet label Aug 12, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-Extensions-Configuration breaking-change Issue or PR that represents a breaking API or functional change over a prerelease.
Projects
None yet
8 participants