Optimizations to "TokenParser" #964

Merged
merged 9 commits into pnp:dev on Apr 11, 2024

Conversation

fzbm
Contributor

@fzbm fzbm commented Jan 16, 2024

This pull request makes changes to the TokenParser to optimize its performance when working with larger templates. There is a companion ticket, #963, which also contains some questions that might result in additional commits once clarified; the Clone topic is one of those.

As described in that ticket, the performance of PnP was really poor when deploying a larger template that also contained a large number of localization resources. It took several minutes to "initialize the engine" (i.e. the TokenParser) and up to several minutes to deploy a field, content type or list; the problem is described in greater detail in ticket #963.

With the help of a profiler I was able to pinpoint the source of the problem to the TokenParser, which heavily taxed the CPU when processing the localization resources and also when parsing a string (ParseString), due to repeatedly iterating through all TokenDefinition instances and their tokens and processing them with Regex.Unescape. I made changes to the initialization of the localization resource tokens and to the parsing of strings, which greatly reduced the initialization time of the TokenParser and the overall CPU utilization of ParseString.

The following main changes have been made.

  • The token cache is now only created when using Rebase, when using the private constructor (for cloning) and when creating the instance.
  • Added a cache for non-cacheable TokenDefinition entries that holds the actual TokenDefinition instead of its value. This eliminates the need to build a dictionary of all non-cacheable definitions on most ParseString calls.
  • The caches for normal and non-cacheable definitions are now updated by processing only the affected TokenDefinition when AddToken or RebuildListTokens is called. In most cases the cache dictionaries are initialized with a calculated capacity to avoid resizing operations (a sketch of this caching approach follows this list).
  • Changed AddResourceTokens to use a dictionary instead of a list to avoid costly lookups when adding LocalizationToken entries. Additionally, the capacity of the _tokens list is extended to a calculated amount beforehand to avoid resizing operations.
  • GetListTitleForMainLanguage and _listsTitles are now instance members. This fixes issues in environments where multiple threads might deploy a template at the same time.
    • GetListTitleForMainLanguage now also uses a batch approach for retrieving the titles of the lists, reducing the time and the number of server calls required to retrieve them.
  • Removed the sorting of _tokens when adding a token or creating a TokenParser instance. It looks like sorting the list is not (or no longer) required, but it might be added back later based on the discussion in ticket #963 ("Question regarding TokenParser because of pending optimizations").
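To make the caching approach more concrete, here is a minimal sketch of the idea; it is not the actual PnP.Framework code, and all type and member names below are illustrative stand-ins based on the description above. Cacheable token values live in one dictionary, non-cacheable tokens map to their definition, and both dictionaries are created once with a calculated capacity so filling them never triggers a resize:

```csharp
using System;
using System.Collections.Generic;

// Stand-in for a token definition, reduced to the members used by the cache sketch.
public class DefinitionSketch
{
    private readonly string[] unescapedTokens;
    private readonly Func<string> getValue;

    public DefinitionSketch(bool isCacheable, Func<string> getValue, params string[] unescapedTokens)
    {
        IsCacheable = isCacheable;
        this.getValue = getValue;
        this.unescapedTokens = unescapedTokens;
    }

    public bool IsCacheable { get; }
    public int TokenCount => unescapedTokens.Length;
    public IReadOnlyList<string> GetUnescapedTokens() => unescapedTokens;
    public string GetValue() => getValue();
}

public class TokenCacheSketch
{
    // Values of cacheable tokens, resolved once per definition.
    private readonly Dictionary<string, string> cachedTokenValues;
    // Non-cacheable tokens map to their definition so the value can be resolved per parse run.
    private readonly Dictionary<string, DefinitionSketch> nonCacheableTokens;

    public TokenCacheSketch(IReadOnlyCollection<DefinitionSketch> definitions)
    {
        // Pre-size both dictionaries based on the total token count to avoid rehashing.
        int capacity = 0;
        foreach (var definition in definitions)
        {
            capacity += definition.TokenCount;
        }

        cachedTokenValues = new Dictionary<string, string>(capacity, StringComparer.OrdinalIgnoreCase);
        nonCacheableTokens = new Dictionary<string, DefinitionSketch>(capacity, StringComparer.OrdinalIgnoreCase);

        foreach (var definition in definitions)
        {
            Add(definition);
        }
    }

    // Equivalent of an AddToken-style update: only the new definition is processed,
    // existing cache entries stay untouched.
    public void Add(DefinitionSketch definition)
    {
        foreach (var token in definition.GetUnescapedTokens())
        {
            if (definition.IsCacheable)
            {
                cachedTokenValues[token] = definition.GetValue();
            }
            else
            {
                nonCacheableTokens[token] = definition;
            }
        }
    }
}
```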

The following additional changes have been made. The first two are not part of any of the commit messages due to an oversight on my side.

  • ResourceEntry
    • Changed the type of the LCID property from uint to int.
  • LocalizationToken
    • Changed LocalizationToken to use a Dictionary instead of a List to speed up lookups when retrieving a ResourceEntry for a specific language (a sketch of this shape follows this list).
      • The ResourceEntries property is now also an IReadOnlyList instead of a List. A caller should at least not be able to add or remove a ResourceEntry on a LocalizationToken instance, although an individual ResourceEntry object can still be modified. The change should not have any impact on existing applications, because the whole class is internal and not available outside of the library.
  • TokenParser
    • Formatted some code to enhance the readability.
    • Replaced some LINQ methods with native methods provided by the specific collection type.
    • Changed the code of some methods to exit them early, which reduces the nesting.
    • Removed unused parameters from private methods.
    • Removed some commented code.
    • Moved all variable declarations to the top of the class.
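For illustration, the reshaped LocalizationToken could look roughly like this. This is a hypothetical sketch based on the description above, not the actual internal class; apart from ResourceEntries and LCID, the member names are assumptions:

```csharp
using System.Collections.Generic;
using System.Linq;

internal class ResourceEntry
{
    public int LCID { get; set; }                     // changed from uint to int
    public string Value { get; set; } = string.Empty;
}

internal class LocalizationToken
{
    // Keyed by LCID, so finding the entry for a language is a hash lookup
    // instead of a scan over a list.
    private readonly Dictionary<int, ResourceEntry> entriesByLcid;
    private readonly List<ResourceEntry> entries;

    public LocalizationToken(IEnumerable<ResourceEntry> resourceEntries)
    {
        entries = resourceEntries.ToList();
        entriesByLcid = entries.ToDictionary(entry => entry.LCID);
    }

    // Read-only view: callers cannot add or remove entries, although an
    // individual ResourceEntry object can still be modified.
    public IReadOnlyList<ResourceEntry> ResourceEntries => entries;

    public bool TryGetEntry(int lcid, out ResourceEntry entry) =>
        entriesByLcid.TryGetValue(lcid, out entry);
}
```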

fzbm added 9 commits January 16, 2024 10:20
The extension method "PendingRequestCount" now uses a cached
"PropertyInfo" to access the "Actions" property of a "ClientContext".

Additional changes:
* Changed the code of "PendingRequestCount" to reduce nesting.
* Removed unnecessary initializations.
* Removed unnecessary references to the class when accessing class-level
  members.
* Fixed naming of static members.
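The cached-PropertyInfo idea from this commit can be sketched as follows. The reflection target ("Actions" on the pending request) follows the commit message, while the binding flags and the exact extension-method shape are assumptions:

```csharp
using System.Collections;
using System.Reflection;
using Microsoft.SharePoint.Client;

public static class ClientContextExtensionsSketch
{
    // Resolved once and reused, instead of looking the property up on every call.
    private static PropertyInfo actionsProperty;

    public static int PendingRequestCount(this ClientRuntimeContext context)
    {
        var pendingRequest = context?.PendingRequest;
        if (pendingRequest == null)
        {
            return 0;
        }

        // Cache the PropertyInfo on first use; subsequent calls reuse it.
        actionsProperty ??= pendingRequest.GetType().GetProperty(
            "Actions", BindingFlags.Instance | BindingFlags.NonPublic | BindingFlags.Public);

        return actionsProperty?.GetValue(pendingRequest) is ICollection actions
            ? actions.Count
            : 0;
    }
}
```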
"ProvisionObjects" of "ObjectSiteSecurity" now uses "OfType" instead of
a combination of "Where" and "Cast".
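A generic illustration of the pattern behind this change (the element types in ObjectSiteSecurity are different; this only shows the Where/Cast versus OfType equivalence):

```csharp
using System.Collections.Generic;
using System.Linq;

IEnumerable<object> items = new object[] { "a", 1, "b" };

// Before: filter with Where, then cast every remaining element.
IEnumerable<string> viaWhereCast = items.Where(item => item is string).Cast<string>();

// After: OfType filters and casts in a single step.
IEnumerable<string> viaOfType = items.OfType<string>();
```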
The following changes to "TokenDefinition" have been made to optimize
the general performance when working with those.

* The maximum token length now gets determined once when creating the
  "TokenDefinition" instance. Tokens can not be changed and therefore
  determining the maximum length dynamically when calling
  "GetTokenLength" is not necessary. Also switched from LINQ to a
  "Math.Max" approach.
* A list of by "Regex.Unescaped" processed tokens gets created when
  creating the "TokenDefinition" instance. This list is also exposed by
  the new "GetUnescapedTokens" method.
* Added a property to return the amount of tokens.
* Removed the remaining "this" references, which aligns the affected
  code with the rest of the class.
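A minimal sketch of the idea in this commit; the class below is a reduced, assumed stand-in rather than the real TokenDefinition, but GetTokenLength and GetUnescapedTokens follow the names used in the commit message:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public class TokenDefinitionSketch
{
    private readonly string[] tokens;
    private readonly string[] unescapedTokens;
    private readonly int maximumTokenLength;

    public TokenDefinitionSketch(params string[] tokens)
    {
        this.tokens = tokens;
        unescapedTokens = new string[tokens.Length];

        for (int i = 0; i < tokens.Length; i++)
        {
            // Tokens never change after construction, so the unescaped form and
            // the maximum length are computed once here instead of on every call.
            unescapedTokens[i] = Regex.Unescape(tokens[i]);
            maximumTokenLength = Math.Max(maximumTokenLength, tokens[i].Length);
        }
    }

    // New property returning the number of tokens.
    public int TokenCount => tokens.Length;

    // Previously recomputed on every call, now simply returns the cached value.
    public int GetTokenLength() => maximumTokenLength;

    // Exposes the tokens already processed with Regex.Unescape.
    public IReadOnlyList<string> GetUnescapedTokens() => unescapedTokens;
}
```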
The following changes to "SimpleTokenDefinition" have been made to
optimize the general performance when working with those.

* The maximum token length now gets determined once when creating the
  "SimpleTokenDefinition" instance. Tokens can not be changed and
  therefore determining the maximum length dynamically when calling
  "GetTokenLength" is not necessary. Also switched from LINQ to a
  "Math.Max" approach.
* A list of by "Regex.Unescaped" processed tokens gets created when
  creating the "SimpleTokenDefinition" instance. This list is also
  exposed by the new "GetUnescapedTokens" method.
* Added a property to return the amount of tokens.
* Removed the remaining "this" references, which aligns the affected
  code with the rest of the class.
The "TokenParser" had some performance issues and required a rework of
some parts of the code.

The main changes are:
* The token cache now only gets created when using "Rebase", the private
  constructor (for cloning) and creating the instance.
* Added a cache for non-cacheable "TokenDefinition" containing the
  actual "TokenDefinition" instead of the value. This has been added to
  eliminate the need of creating a dictionary with all non-cacheable
  definitions most of the times "ParseString" gets called.
* The normal and non-cacheable defintions cache gets updated by just
  processing the affected "TokenDefinition" when calling "AddToken" or
  "RebuildListTokens". In most cases the cache dictionaries are
  initialized with a calculated capacity to avoid resizing operations.
* Changed "AddResourceTokens" to use a dictionary instead of a list to
  avoid costly lookup operations when adding "LocalizationToken".
  Additionally the capacity of the "_tokens" list gets extended to a
  calculated amount beforehand to avoid resizing operations.
* "GetListTitleForMainLanguage" and "_listsTitles" are now instance
  members. This fixes issues with environments where multiple threads
  might deploy a template at the same time.
* "GetListTitleForMainLanguage" now uses a batch approach for retrieving
  the titles of the lists in order to reduce the time and amount of
  server calls required to retrieve them.
* Removed the sorting of "_tokens" when adding a token or creating a
  "TokenParser" instance. It looks like sorting the list is not/no
  longer required. This might be added back later.
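The batched retrieval of list titles can be pictured roughly like this: a simplified sketch that loads every list's title in a single round trip. The real GetListTitleForMainLanguage additionally resolves the title in the web's main language, and the helper shape below is an assumption:

```csharp
using System;
using System.Collections.Generic;
using Microsoft.SharePoint.Client;

internal static class ListTitleBatchSketch
{
    public static Dictionary<Guid, string> LoadListTitles(ClientContext context)
    {
        ListCollection lists = context.Web.Lists;

        // Queue the needed properties for all lists at once ...
        context.Load(lists, collection => collection.Include(
            list => list.Id,
            list => list.Title));

        // ... and resolve them with a single server call (ExecuteQueryRetry is the
        // throttling-aware PnP extension; plain ExecuteQuery would work as well).
        context.ExecuteQueryRetry();

        var titles = new Dictionary<Guid, string>(lists.Count);
        foreach (var list in lists)
        {
            titles[list.Id] = list.Title;
        }

        return titles;
    }
}
```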

Additional changes have been made:
* Formatted some code to enhance the readability.
* Replaced some LINQ methods with native methods provided by the
  collection type.
* Changed the code of some methods to exit them early, which reduces the
  nesting.
* Removed unused parameters from private methods.
* Removed some commented code.
* Moved all variable declarations to the top of the class.
Updated the "System.IdentityModel.Tokens.Jwt" NuGet package to 6.34 to
address NU1605 (package downgrade).
Updated the "System.IdentityModel.Tokens.Jwt" NuGet package to 6.34 to
address NU1605 (package downgrade).
# Conflicts:
#	src/lib/PnP.Framework/PnP.Framework.csproj
Updated the "System.IdentityModel.Tokens.Jwt" NuGet package to 6.35 to
address NU1605 (package downgrade) for the "PnP.Framework.Modernization
.Test" project.
@fzbm
Contributor Author

fzbm commented Feb 8, 2024

Are there any updates on this? This is somewhat of a production-impacting issue for an already active application and heavily impacts a planned feature.

@jansenbe
Contributor

@fzbm : this is a pretty big change requiring a lot of test effort, as the token parser is fundamental to the provisioning engine. Given I can only spare a few cycles a month on this project, I currently don't have the bandwidth to extensively test this. Can you explain more: are you using this in your application today, do you feel your provisioning templates are representative of what folks typically would do, and do they still work fine with your change?

@jansenbe jansenbe requested a review from PaoloPia February 15, 2024 09:07
@fzbm
Contributor Author

fzbm commented Feb 19, 2024

@jansenbe I don't know if I understand your question correctly, but we have been using PnP.Framework for a few years for our automated provisioning of SharePoint sites. The templates are not hand-tailored but extracted from live ("special" template) sites. We apply different fixes etc. to the templates produced by PnP, because not everything that was extracted can be provisioned again.

We also saw in the telemetry of the services that usually a few deployment jobs are enough to completely lock up the CPU. So while we discovered the issue with the TokenParser only recently, in reality it has been affecting the production system for a few years.


To answer your question about how representative our use case is: I would say PnP is mostly used for smaller tasks, for example PowerShell, providing utility methods, and sometimes deploying smaller, hand-tailored templates. So no, I don't think we are the usual PnP use case.


Regarding whether the use cases of other people would still work after these changes: I tested them multiple times, for example by comparing the state of the parser at different points in time with previously captured states, in order to identify possible issues that might have been introduced. I also let PnP deploy the offending template, which failed after, I think, a few content types or lists due to a different problem not related to the changes (just as a disclaimer).

This by no means implies that everything is fine and the changes should not be reviewed, because I might have overlooked something not so obvious.


Regardless of our use case, these changes should in my opinion be merged into the dev branch. Not because I made them, but because they clearly improve the performance of PnP when processing templates.

Currently the parser requires a large amount of CPU time, which is not really necessary. The fact that it is also always in the hot path does not help the case either.

These changes will speed up PowerShell scripts that use the template feature, backend processes doing something similar, etc.


Because reviewing these changes will take a fair amount of time, I think we will remove the NuGet package from the project and temporarily switch to the modified version by referencing it directly. This allows us to use these changes in production already, and we will also see whether everything still works as expected.

@fzbm
Contributor Author

fzbm commented Feb 22, 2024

A small update from my side.

The modified version has been in production for about two days and has since deployed 24 different templates to 97 sites. We have observed no issues so far, but what we definitely noticed is a huge improvement in performance.

There are times when 10 concurrent jobs deploy templates to 10 different sites. Before the changes, situations like this would lead to 100% CPU load and a really long runtime for each job. With these changes the average load in the same situation is about 40 to 45%, and the runtime is about the same as if only one job were executed. In both cases the code was running on a P1v2 App Service.

Consequently, we also noticed the runtime being cut in half in some cases when deploying a template, although this again depends on the template in question.

@NicolajHedeager

This is definitely a lot more efficient than the existing TokenParser.
Have been trying out this implementation for a few weeks, no issues so far.

@jansenbe jansenbe self-assigned this Apr 11, 2024
jansenbe added a commit that referenced this pull request Apr 11, 2024
@jansenbe jansenbe merged commit 3b951f1 into pnp:dev Apr 11, 2024
1 check passed
@fzbm fzbm deleted the feature/tokenparser-optimizations branch August 27, 2024 13:21