Optimizations to "TokenParser" #964

Merged
merged 9 commits into pnp:dev on Apr 11, 2024

Conversation

fzbm
Contributor

@fzbm fzbm commented Jan 16, 2024

This pull request makes changes to the TokenParser to optimize its performance when working with larger templates. There is a companion ticket, #963, which also contains some questions that might result in additional commits once clarified; the Clone topic is one of those.

As described in that ticket, the performance of PnP was really poor when deploying a larger template that also contained a large number of localization resources. It took several minutes to "initialize the engine" (i.e. the TokenParser) and up to several minutes to deploy a field, content type or list; the problem is described in greater detail in ticket #963.

With the help of a profiler I was able to pinpoint the source of the problem to the TokenParser, which heavily taxed the CPU when processing the localization resources and also when parsing a string (ParseString), due to repeatedly iterating through all TokenDefinition instances and their tokens and processing them with Regex.Unescape. I made changes to the initialization of the localization resource tokens and to the parsing of strings, which greatly reduced the initialization time of the TokenParser and the overall CPU utilization of ParseString.

The following main changes have been made.

  • The token cache is now only created when using Rebase, when using the private constructor (for cloning) and when creating the instance.
  • Added a cache for non-cacheable TokenDefinition entries that holds the actual TokenDefinition instead of its value. This eliminates the need to build a dictionary of all non-cacheable definitions on most ParseString calls.
  • The caches for normal and non-cacheable definitions are now updated by processing only the affected TokenDefinition when AddToken or RebuildListTokens is called. In most cases the cache dictionaries are initialized with a calculated capacity to avoid resizing operations (a sketch of this caching approach follows this list).
  • Changed AddResourceTokens to use a dictionary instead of a list to avoid costly lookups when adding LocalizationToken entries. Additionally, the capacity of the _tokens list is extended to a calculated amount beforehand to avoid resizing operations.
  • GetListTitleForMainLanguage and _listsTitles are now instance members. This fixes issues in environments where multiple threads might deploy a template at the same time.
    • GetListTitleForMainLanguage now also uses a batch approach for retrieving the titles of the lists, reducing the time and the number of server calls required to retrieve them.
  • Removed the sorting of _tokens when adding a token or creating a TokenParser instance. It looks like sorting the list is not (or no longer) required, but it might be added back later based on the discussion in ticket #963 ("Question regarding TokenParser because of pending optimizations").
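To make the caching approach more concrete, here is a minimal sketch of the idea; it is not the actual PnP.Framework code, and all type and member names below are illustrative stand-ins based on the description above. Cacheable token values live in one dictionary, non-cacheable tokens map to their definition, and both dictionaries are created once with a calculated capacity so filling them never triggers a resize:

```csharp
using System;
using System.Collections.Generic;

// Stand-in for a token definition, reduced to the members used by the cache sketch.
public class DefinitionSketch
{
    private readonly string[] unescapedTokens;
    private readonly Func<string> getValue;

    public DefinitionSketch(bool isCacheable, Func<string> getValue, params string[] unescapedTokens)
    {
        IsCacheable = isCacheable;
        this.getValue = getValue;
        this.unescapedTokens = unescapedTokens;
    }

    public bool IsCacheable { get; }
    public int TokenCount => unescapedTokens.Length;
    public IReadOnlyList<string> GetUnescapedTokens() => unescapedTokens;
    public string GetValue() => getValue();
}

public class TokenCacheSketch
{
    // Values of cacheable tokens, resolved once per definition.
    private readonly Dictionary<string, string> cachedTokenValues;
    // Non-cacheable tokens map to their definition so the value can be resolved per parse run.
    private readonly Dictionary<string, DefinitionSketch> nonCacheableTokens;

    public TokenCacheSketch(IReadOnlyCollection<DefinitionSketch> definitions)
    {
        // Pre-size both dictionaries based on the total token count to avoid rehashing.
        int capacity = 0;
        foreach (var definition in definitions)
        {
            capacity += definition.TokenCount;
        }

        cachedTokenValues = new Dictionary<string, string>(capacity, StringComparer.OrdinalIgnoreCase);
        nonCacheableTokens = new Dictionary<string, DefinitionSketch>(capacity, StringComparer.OrdinalIgnoreCase);

        foreach (var definition in definitions)
        {
            Add(definition);
        }
    }

    // Equivalent of an AddToken-style update: only the new definition is processed,
    // existing cache entries stay untouched.
    public void Add(DefinitionSketch definition)
    {
        foreach (var token in definition.GetUnescapedTokens())
        {
            if (definition.IsCacheable)
            {
                cachedTokenValues[token] = definition.GetValue();
            }
            else
            {
                nonCacheableTokens[token] = definition;
            }
        }
    }
}
```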

The following additional changes have been made. The first two are not part of any of the commit messages due to an oversight on my side.

  • ResourceEntry
    • Changed the type of the LCID property from uint to int.
  • LocalizationToken
    • Changed LocalizationToken to use a Dictionary instead of a List to speed up lookups when retrieving a ResourceEntry for a specific language (a sketch of this shape follows this list).
      • The ResourceEntries property is now also an IReadOnlyList instead of a List. A caller should at least not be able to add or remove a ResourceEntry on a LocalizationToken instance, although an individual ResourceEntry object can still be modified. The change should not have any impact on existing applications, because the whole class is internal and not available outside of the library.
  • TokenParser
    • Formatted some code to enhance the readability.
    • Replaced some LINQ methods with native methods provided by the specific collection type.
    • Changed the code of some methods to exit them early, which reduces the nesting.
    • Removed unused parameters from private methods.
    • Removed some commented code.
    • Moved all variable declarations to the top of the class.
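For illustration, the reshaped LocalizationToken could look roughly like this. This is a hypothetical sketch based on the description above, not the actual internal class; apart from ResourceEntries and LCID, the member names are assumptions:

```csharp
using System.Collections.Generic;
using System.Linq;

internal class ResourceEntry
{
    public int LCID { get; set; }                     // changed from uint to int
    public string Value { get; set; } = string.Empty;
}

internal class LocalizationToken
{
    // Keyed by LCID, so finding the entry for a language is a hash lookup
    // instead of a scan over a list.
    private readonly Dictionary<int, ResourceEntry> entriesByLcid;
    private readonly List<ResourceEntry> entries;

    public LocalizationToken(IEnumerable<ResourceEntry> resourceEntries)
    {
        entries = resourceEntries.ToList();
        entriesByLcid = entries.ToDictionary(entry => entry.LCID);
    }

    // Read-only view: callers cannot add or remove entries, although an
    // individual ResourceEntry object can still be modified.
    public IReadOnlyList<ResourceEntry> ResourceEntries => entries;

    public bool TryGetEntry(int lcid, out ResourceEntry entry) =>
        entriesByLcid.TryGetValue(lcid, out entry);
}
```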

fzbm added 9 commits January 16, 2024 10:20
The extension method "PendingRequestCount" now uses a cached
"PropertyInfo" to access the "Actions" property of a "ClientContext".

Additional changes:
* Changed the code of "PendingRequestCount" to reduce nesting.
* Removed unnecessary initializations.
* Removed unnecessary references to the class when accessing class-level
  members.
* Fixed naming of static members.
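The cached-PropertyInfo idea from this commit can be sketched as follows. The reflection target ("Actions" on the pending request) follows the commit message, while the binding flags and the exact extension-method shape are assumptions:

```csharp
using System.Collections;
using System.Reflection;
using Microsoft.SharePoint.Client;

public static class ClientContextExtensionsSketch
{
    // Resolved once and reused, instead of looking the property up on every call.
    private static PropertyInfo actionsProperty;

    public static int PendingRequestCount(this ClientRuntimeContext context)
    {
        var pendingRequest = context?.PendingRequest;
        if (pendingRequest == null)
        {
            return 0;
        }

        // Cache the PropertyInfo on first use; subsequent calls reuse it.
        actionsProperty ??= pendingRequest.GetType().GetProperty(
            "Actions", BindingFlags.Instance | BindingFlags.NonPublic | BindingFlags.Public);

        return actionsProperty?.GetValue(pendingRequest) is ICollection actions
            ? actions.Count
            : 0;
    }
}
```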
"ProvisionObjects" of "ObjectSiteSecurity" now uses "OfType" instead of
a combination of "Where" and "Cast".
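A generic illustration of the pattern behind this change (the element types in ObjectSiteSecurity are different; this only shows the Where/Cast versus OfType equivalence):

```csharp
using System.Collections.Generic;
using System.Linq;

IEnumerable<object> items = new object[] { "a", 1, "b" };

// Before: filter with Where, then cast every remaining element.
IEnumerable<string> viaWhereCast = items.Where(item => item is string).Cast<string>();

// After: OfType filters and casts in a single step.
IEnumerable<string> viaOfType = items.OfType<string>();
```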
The following changes to "TokenDefinition" have been made to optimize
the general performance when working with those.

* The maximum token length now gets determined once when creating the
  "TokenDefinition" instance. Tokens can not be changed and therefore
  determining the maximum length dynamically when calling
  "GetTokenLength" is not necessary. Also switched from LINQ to a
  "Math.Max" approach.
* A list of by "Regex.Unescaped" processed tokens gets created when
  creating the "TokenDefinition" instance. This list is also exposed by
  the new "GetUnescapedTokens" method.
* Added a property to return the amount of tokens.
* Removed the remaining "this" references, which aligns the affected
  code with the rest of the class.
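A minimal sketch of the idea in this commit; the class below is a reduced, assumed stand-in rather than the real TokenDefinition, but GetTokenLength and GetUnescapedTokens follow the names used in the commit message:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public class TokenDefinitionSketch
{
    private readonly string[] tokens;
    private readonly string[] unescapedTokens;
    private readonly int maximumTokenLength;

    public TokenDefinitionSketch(params string[] tokens)
    {
        this.tokens = tokens;
        unescapedTokens = new string[tokens.Length];

        for (int i = 0; i < tokens.Length; i++)
        {
            // Tokens never change after construction, so the unescaped form and
            // the maximum length are computed once here instead of on every call.
            unescapedTokens[i] = Regex.Unescape(tokens[i]);
            maximumTokenLength = Math.Max(maximumTokenLength, tokens[i].Length);
        }
    }

    // New property returning the number of tokens.
    public int TokenCount => tokens.Length;

    // Previously recomputed on every call, now simply returns the cached value.
    public int GetTokenLength() => maximumTokenLength;

    // Exposes the tokens already processed with Regex.Unescape.
    public IReadOnlyList<string> GetUnescapedTokens() => unescapedTokens;
}
```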
The following changes to "SimpleTokenDefinition" have been made to
optimize the general performance when working with those.

* The maximum token length now gets determined once when creating the
  "SimpleTokenDefinition" instance. Tokens can not be changed and
  therefore determining the maximum length dynamically when calling
  "GetTokenLength" is not necessary. Also switched from LINQ to a
  "Math.Max" approach.
* A list of by "Regex.Unescaped" processed tokens gets created when
  creating the "SimpleTokenDefinition" instance. This list is also
  exposed by the new "GetUnescapedTokens" method.
* Added a property to return the amount of tokens.
* Removed the remaining "this" references, which aligns the affected
  code with the rest of the class.
The "TokenParser" had some performance issues and required a rework of
some parts of the code.

The main changes are:
* The token cache now only gets created when using "Rebase", the private
  constructor (for cloning) and creating the instance.
* Added a cache for non-cacheable "TokenDefinition" containing the
  actual "TokenDefinition" instead of the value. This has been added to
  eliminate the need of creating a dictionary with all non-cacheable
  definitions most of the times "ParseString" gets called.
* The normal and non-cacheable defintions cache gets updated by just
  processing the affected "TokenDefinition" when calling "AddToken" or
  "RebuildListTokens". In most cases the cache dictionaries are
  initialized with a calculated capacity to avoid resizing operations.
* Changed "AddResourceTokens" to use a dictionary instead of a list to
  avoid costly lookup operations when adding "LocalizationToken".
  Additionally the capacity of the "_tokens" list gets extended to a
  calculated amount beforehand to avoid resizing operations.
* "GetListTitleForMainLanguage" and "_listsTitles" are now instance
  members. This fixes issues with environments where multiple threads
  might deploy a template at the same time.
* "GetListTitleForMainLanguage" now uses a batch approach for retrieving
  the titles of the lists in order to reduce the time and amount of
  server calls required to retrieve them.
* Removed the sorting of "_tokens" when adding a token or creating a
  "TokenParser" instance. It looks like sorting the list is not/no
  longer required. This might be added back later.
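The batched retrieval of list titles can be pictured roughly like this: a simplified sketch that loads every list's title in a single round trip. The real GetListTitleForMainLanguage additionally resolves the title in the web's main language, and the helper shape below is an assumption:

```csharp
using System;
using System.Collections.Generic;
using Microsoft.SharePoint.Client;

internal static class ListTitleBatchSketch
{
    public static Dictionary<Guid, string> LoadListTitles(ClientContext context)
    {
        ListCollection lists = context.Web.Lists;

        // Queue the needed properties for all lists at once ...
        context.Load(lists, collection => collection.Include(
            list => list.Id,
            list => list.Title));

        // ... and resolve them with a single server call (ExecuteQueryRetry is the
        // throttling-aware PnP extension; plain ExecuteQuery would work as well).
        context.ExecuteQueryRetry();

        var titles = new Dictionary<Guid, string>(lists.Count);
        foreach (var list in lists)
        {
            titles[list.Id] = list.Title;
        }

        return titles;
    }
}
```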

Additional changes have been made:
* Formatted some code to enhance the readability.
* Replaced some LINQ methods with native methods provided by the
  collection type.
* Changed the code of some methods to exit them early, which reduces the
  nesting.
* Removed unused parameters from private methods.
* Removed some commented code.
* Moved all variable declarations to the top of the class.
Updated the "System.IdentityModel.Tokens.Jwt" NuGet package to 6.34 to
address NU1605 (package downgrade).
Updated the "System.IdentityModel.Tokens.Jwt" NuGet package to 6.34 to
address NU1605 (package downgrade).
# Conflicts:
#	src/lib/PnP.Framework/PnP.Framework.csproj
Updated the "System.IdentityModel.Tokens.Jwt" NuGet package to 6.35 to
address NU1605 (package downgrade) for the "PnP.Framework.Modernization
.Test" project.
@fzbm
Contributor Author

fzbm commented Feb 8, 2024

Are there any updates on this? This is somewhat of a production-impacting issue for an already active application and heavily impacts a planned feature.

@jansenbe
Contributor

@fzbm : this is a pretty big change requiring a lot of test effort, as the token parser is fundamental to the provisioning engine. Given I can only spare a few cycles a month on this project, I currently don't have the bandwidth to extensively test this. Can you explain more: are you using this in your application today, do you feel your provisioning templates are representative of what folks typically would do, and do they still work fine with your change?

@jansenbe jansenbe requested a review from PaoloPia February 15, 2024 09:07
@fzbm
Contributor Author

fzbm commented Feb 19, 2024

@jansenbe I don't know if I understand your question correctly, but we have been using PnP.Framework for a few years for our automated provisioning of SharePoint sites. The templates are not hand-tailored but extracted from live ("special" template) sites. We apply different fixes etc. to the templates produced by PnP, because not everything that was extracted can be provisioned again.

We also saw in the telemetry of the services that usually a few deployment jobs are enough to completely lock up the CPU. So while we discovered the issue with the TokenParser only recently, in reality it has been affecting the production system for a few years.


To answer your question about how representative our use case is: I would say PnP is mostly used for smaller tasks, for example PowerShell, providing utility methods, and sometimes deploying smaller, hand-tailored templates. So no, I don't think we are the usual PnP use case.


Regarding whether the use cases of other people would still work after these changes: I tested them multiple times, for example by comparing the state of the parser at different points in time with previously captured states, in order to identify possible issues that might have been introduced. I also let PnP deploy the offending template, which failed after, I think, a few content types or lists due to a different problem not related to the changes (just as a disclaimer).

This by no means implies that everything is fine and the changes should not be reviewed, because I might have overlooked something not so obvious.


Regardless of our use case, these changes should in my opinion be merged into the dev branch. Not because I made them, but because they clearly improve the performance of PnP when processing templates.

Currently the parser requires a large amount of CPU time, which is not really necessary. The fact that it is also always in the hot path does not help the case either.

These changes will speed up PowerShell scripts that use the template feature, backend processes doing something similar, etc.


Because reviewing these changes will take a fair amount of time, I think we will remove the NuGet package from the project and temporarily switch to the modified version by referencing it directly. This allows us to use these changes in production already, and we will also see whether everything still works as expected.

@fzbm
Contributor Author

fzbm commented Feb 22, 2024

A small update from my side.

The modified version has been in production for about two days and has since deployed 24 different templates to 97 sites. We have observed no issues so far, but what we definitely noticed is a huge improvement in performance.

There are times when 10 concurrent jobs deploy templates to 10 different sites. Before the changes, situations like this would lead to 100% CPU load and a really long runtime for each job. With these changes the average load in the same situation is about 40 to 45%, and the runtime is about the same as if only one job were executed. In both cases the code was running on a P1v2 App Service.

Consequently, we also noticed the runtime being cut in half in some cases when deploying a template, although this again depends on the template in question.

@NicolajHedeager

This is definitely a lot more efficient than the existing TokenParser.
Have been trying out this implementation for a few weeks, no issues so far.

@jansenbe jansenbe self-assigned this Apr 11, 2024
jansenbe added a commit that referenced this pull request Apr 11, 2024
@jansenbe jansenbe merged commit 3b951f1 into pnp:dev Apr 11, 2024
1 check passed
@fzbm fzbm deleted the feature/tokenparser-optimizations branch August 27, 2024 13:21