Optimizations to "TokenParser" #964
Conversation
The extension method "PendingRequestCount" now uses a cached "PropertyInfo" to access the "Actions" property of a "ClientContext". Additional changes:
* Changed the code of "PendingRequestCount" to reduce nesting.
* Removed unnecessary initializations.
* Removed unnecessary references to the class when accessing class-level members.
* Fixed naming of static members.
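For illustration, here is a minimal sketch of the cached-reflection pattern described above. "SomeContext" and its non-public "Actions" property are stand-ins for this example, not the actual SharePoint client types used by PnP.Framework:

```csharp
// Illustrative only: "SomeContext" and its non-public "Actions" property are stand-ins,
// not the actual SharePoint client types.
using System.Collections;
using System.Reflection;

internal static class PendingRequestCountSketch
{
    // The PropertyInfo is resolved once and reused, instead of paying the
    // reflection lookup cost on every call.
    private static readonly PropertyInfo s_actionsProperty =
        typeof(SomeContext).GetProperty("Actions", BindingFlags.Instance | BindingFlags.NonPublic);

    public static int PendingRequestCount(SomeContext context)
    {
        // Early exit keeps the nesting flat, mirroring the commit's intent.
        if (s_actionsProperty?.GetValue(context) is not ICollection actions)
        {
            return 0;
        }

        return actions.Count;
    }
}

internal sealed class SomeContext
{
    private ArrayList Actions { get; } = new ArrayList();
}
```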
"ProvisionObjects" of "ObjectSiteSecurity" now uses "OfType" instead of a combination of "Where" and "Cast".
The following changes to "TokenDefinition" have been made to optimize the general performance when working with those instances:
* The maximum token length is now determined once when the "TokenDefinition" instance is created. Tokens cannot be changed, so determining the maximum length dynamically on every call to "GetTokenLength" is not necessary. Also switched from LINQ to a "Math.Max" approach.
* A list of tokens processed with "Regex.Unescape" is now created when the "TokenDefinition" instance is created. This list is also exposed by the new "GetUnescapedTokens" method.
* Added a property that returns the number of tokens.
* Removed the remaining "this" references, which aligns the affected code with the rest of the class.
The following changes to "SimpleTokenDefinition" have been made to optimize the general performance when working with those instances:
* The maximum token length is now determined once when the "SimpleTokenDefinition" instance is created. Tokens cannot be changed, so determining the maximum length dynamically on every call to "GetTokenLength" is not necessary. Also switched from LINQ to a "Math.Max" approach.
* A list of tokens processed with "Regex.Unescape" is now created when the "SimpleTokenDefinition" instance is created. This list is also exposed by the new "GetUnescapedTokens" method.
* Added a property that returns the number of tokens.
* Removed the remaining "this" references, which aligns the affected code with the rest of the class.
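A minimal sketch of the construction-time caching described in the two commits above, covering both classes. "GetTokenLength" and "GetUnescapedTokens" follow the commit text; "TokenCount" is an assumed name for the new token-amount property, and everything else is simplified for illustration:

```csharp
// Simplified stand-in for TokenDefinition/SimpleTokenDefinition; not the actual PnP.Framework code.
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public class TokenDefinitionSketch
{
    private readonly string[] _tokens;
    private readonly string[] _unescapedTokens;
    private readonly int _maximumTokenLength;

    public TokenDefinitionSketch(params string[] tokens)
    {
        _tokens = tokens;

        // Tokens cannot change after construction, so the maximum length and the
        // Regex.Unescape results are computed once here instead of on every call.
        _unescapedTokens = new string[tokens.Length];
        for (int i = 0; i < tokens.Length; i++)
        {
            _unescapedTokens[i] = Regex.Unescape(tokens[i]);
            _maximumTokenLength = Math.Max(_maximumTokenLength, tokens[i].Length);
        }
    }

    // Number of tokens of this definition (assumed name).
    public int TokenCount => _tokens.Length;

    // Previously recalculated per call via LINQ; now just returns the cached value.
    public int GetTokenLength() => _maximumTokenLength;

    // Exposes the pre-unescaped tokens so callers no longer unescape repeatedly.
    public IReadOnlyList<string> GetUnescapedTokens() => _unescapedTokens;
}
```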
The "TokenParser" had some performance issues and required a rework of some parts of the code. The main changes are:
* The token cache now only gets created when using "Rebase", when using the private constructor (for cloning) and when creating the instance.
* Added a cache for non-cacheable "TokenDefinition" entries containing the actual "TokenDefinition" instead of the value. This eliminates the need to create a dictionary with all non-cacheable definitions on most calls to "ParseString".
* The normal and non-cacheable definition caches are updated by processing only the affected "TokenDefinition" when calling "AddToken" or "RebuildListTokens". In most cases the cache dictionaries are initialized with a calculated capacity to avoid resizing operations.
* Changed "AddResourceTokens" to use a dictionary instead of a list to avoid costly lookup operations when adding "LocalizationToken" instances. Additionally, the capacity of the "_tokens" list is extended to a calculated amount beforehand to avoid resizing operations.
* "GetListTitleForMainLanguage" and "_listsTitles" are now instance members. This fixes issues in environments where multiple threads might deploy a template at the same time.
* "GetListTitleForMainLanguage" now uses a batch approach for retrieving the titles of the lists in order to reduce the time and number of server calls required to retrieve them.
* Removed the sorting of "_tokens" when adding a token or creating a "TokenParser" instance. It looks like sorting the list is not/no longer required. This might be added back later.

Additional changes:
* Formatted some code to enhance readability.
* Replaced some LINQ methods with native methods provided by the collection type.
* Changed the code of some methods to exit early, which reduces nesting.
* Removed unused parameters from private methods.
* Removed some commented code.
* Moved all variable declarations to the top of the class.
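The cache layout could look roughly like the following sketch. This is an assumption for illustration only (the field and method names are made up, not the actual "TokenParser" internals): cacheable tokens map straight to their value, while non-cacheable tokens keep their definition so the value is resolved on every parse.

```csharp
using System;
using System.Collections.Generic;

public class TokenCacheSketch
{
    private Dictionary<string, string> _tokenValueCache;          // cacheable token -> resolved value
    private Dictionary<string, Func<string>> _nonCacheableTokens; // non-cacheable token -> resolver

    public void BuildCache(IReadOnlyList<(string Token, Func<string> Resolve, bool Cacheable)> definitions)
    {
        // Initializing with a calculated capacity avoids repeated resize operations
        // while the dictionaries are being filled.
        _tokenValueCache = new Dictionary<string, string>(definitions.Count, StringComparer.OrdinalIgnoreCase);
        _nonCacheableTokens = new Dictionary<string, Func<string>>(definitions.Count, StringComparer.OrdinalIgnoreCase);

        foreach (var definition in definitions)
        {
            if (definition.Cacheable)
            {
                _tokenValueCache[definition.Token] = definition.Resolve();
            }
            else
            {
                // Keeping the definition (here: its resolver) instead of the value means the
                // parser no longer has to rebuild a non-cacheable dictionary on each parse.
                _nonCacheableTokens[definition.Token] = definition.Resolve;
            }
        }
    }

    public string GetValue(string token)
    {
        if (_tokenValueCache.TryGetValue(token, out var cached))
        {
            return cached;
        }

        // Non-cacheable tokens are re-evaluated on every request.
        return _nonCacheableTokens.TryGetValue(token, out var resolve) ? resolve() : null;
    }
}
```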
Updated the "System.IdentityModel.Tokens.Jwt" NuGet package to 6.34 to address NU1605 (package downgrade).
Updated the "System.IdentityModel.Tokens.Jwt" NuGet package to 6.35 to address NU1605 (package downgrade) for the "PnP.Framework.Modernization.Test" project.
Are there any updates on this? This is somewhat of a production-impacting issue for an already active application and heavily impacts a planned feature.
@fzbm : this is a pretty big change requiring a lot of test effort, as the token parser is fundamental to the provisioning engine. Given I can only spare a few cycles a month on this project, I currently don't have the bandwidth to extensively test this. Can you explain more about this: are you using this in your application today, do you feel your provisioning templates are representative of what folks typically would do, and do they still work fine using your change?
@jansenbe I don't know if I understand your question correctly, but we have been using PnP.Framework for a few years for our automated provisioning of SharePoint sites. The templates are not hand-tailored, but extracted from live ("special" template) sites. We apply different fixes to the templates produced by PnP, because not everything that was extracted can be provisioned again. We also saw in the telemetry of the services that usually a few deployment jobs are enough to completely lock up the CPU, which is how we discovered the issue with the offending template.

To answer your question about how representative our use case is: I would say PnP is mostly used for smaller tasks, for example PowerShell, providing utility methods and sometimes deploying smaller, hand-tailored templates. So no, I do not think we are the usual PnP use case.

Regarding whether the use cases of other people would still work after those changes: I tested them multiple times, for example by comparing the state of the parser at different points in time with previously created states, in order to identify possible issues that might have been introduced. I also let PnP deploy the offending template, which failed after, I think, a few content types or lists due to a different problem not related to the changes (just as a disclaimer). This does not mean that everything is fine and that the changes should not be reviewed, because I might have overlooked something not so obvious.

Regardless of our use case, those changes should be merged. Currently the parser requires a large amount of CPU time, which is not really necessary, and the fact that it is always in the hot path does not help either. Those changes will speed up PowerShell scripts that use the template feature, backend processes doing something similar, and so on.

Because reviewing those changes will take a fair amount of time, I think we will remove the NuGet package from the project and temporarily switch to the modified version by referencing it directly. This allows us to already use those changes in production, and we will also see if everything still works as expected.
A small update from my side. The modified version has been in production for about two days and has since deployed 24 different templates to 97 sites. We have observed no issues so far, but we definitely noticed a huge improvement in performance. There are times when 10 concurrent jobs deploy templates to 10 different sites. Before the changes, situations like this would lead to 100% CPU load and a really long runtime for each job. With those changes the average load in the same situation is about 40 to 45% and the time is about the same as if only one job was executed. In both cases the code was running on a P1v2 App Service. We also noticed the runtime of a single template deployment cut roughly in half in some cases, although this again depends on the template in question.
This is definitely a lot more efficient than the existing TokenParser. |
This Pull Request makes changes to the `TokenParser` to optimize the performance when working with larger templates. There is the companion ticket #963, which also contains some questions that might result in additional commits once clarified. The `Clone` topic is one of those.

As described in the ticket, the performance of PnP was really bad when deploying a larger template which also contained a larger amount of localization resources. It took several minutes to "initialize the engine" (aka the `TokenParser`) and also up to minutes when deploying a field, content type or list; the problem has already been described in greater detail in ticket #963.

With the help of a profiler I was able to pinpoint the source of the problem to the `TokenParser`, which really taxed the CPU when processing the localization resources and also when parsing a string (`ParseString`), due to repeatedly iterating through all `TokenDefinition` instances and their tokens and processing them with `Regex.Unescape`. I made some changes to the initialization of the localization resource tokens and the parsing of strings, which greatly reduced the initialization time of the `TokenParser` and also the overall CPU utilization when working with `ParseString`.

The following main changes have been made:
* The token cache now only gets created when using `Rebase`, when using the private constructor (for cloning) and when creating the instance.
* Added a cache for non-cacheable `TokenDefinition` entries containing the actual `TokenDefinition` instead of the value. This eliminates the need to create a dictionary with all non-cacheable definitions on most calls to `ParseString`.
* The normal and non-cacheable definition caches are updated by processing only the affected `TokenDefinition` when calling `AddToken` or `RebuildListTokens`. In most cases the cache dictionaries are initialized with a calculated capacity to avoid resizing operations.
* Changed `AddResourceTokens` to use a dictionary instead of a list to avoid costly lookup operations when adding `LocalizationToken` instances. Additionally, the capacity of the `_tokens` list is extended to a calculated amount beforehand to avoid resizing operations.
* `GetListTitleForMainLanguage` and `_listsTitles` are now instance members. This fixes issues in environments where multiple threads might deploy a template at the same time.
* `GetListTitleForMainLanguage` now also uses a batch approach for retrieving the titles of the lists in order to reduce the time and number of server calls required to retrieve them (see the sketch after this list).
* Removed the sorting of `_tokens` when adding a token or creating a `TokenParser` instance. It looks like sorting the list is not/no longer required, but might be added back later based on the discussion in ticket #963 (Question regarding `TokenParser` because of pending optimizations).
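As a rough sketch of that batching idea (not the actual `GetListTitleForMainLanguage` implementation in this PR), all list titles can be loaded in a single round trip instead of one request per list; `ExecuteQueryRetry` is the existing PnP.Framework extension method, the rest is plain CSOM:

```csharp
using System;
using System.Collections.Generic;
using Microsoft.SharePoint.Client;

internal static class ListTitleBatchSketch
{
    // Loads the Id and Title of every list in one server call and returns a lookup table,
    // instead of issuing a separate request per list.
    public static Dictionary<Guid, string> LoadListTitles(ClientContext context)
    {
        var lists = context.Web.Lists;
        context.Load(lists, collection => collection.Include(l => l.Id, l => l.Title));
        context.ExecuteQueryRetry(); // single round trip for all lists

        var titlesById = new Dictionary<Guid, string>(lists.Count);
        foreach (var list in lists)
        {
            titlesById[list.Id] = list.Title;
        }

        return titlesById;
    }
}
```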
The following additional changes have been made. The first two are not part of any of the commit messages due to an oversight on my side:
* Changed the `LCID` property from `uint` to `int`.
* Changed `LocalizationToken` to use a `Dictionary` instead of a `List` to speed up the lookup performance when trying to retrieve a `ResourceEntry` for a specific language (see the sketch below). The `ResourceEntries` property is now also an `IReadOnlyList` instead of a `List`; a caller should at least not be able to add or remove a `ResourceEntry` on a `LocalizationToken` instance, although the `ResourceEntry` items themselves can still be manipulated. The change should not have any impact on existing applications, because the whole class is `internal` and not available outside of the library.
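A minimal sketch of that lookup change, assuming simplified stand-in types (the real `LocalizationToken` and `ResourceEntry` are internal to PnP.Framework and carry more state than shown here):

```csharp
using System.Collections.Generic;

public class ResourceEntrySketch
{
    public int LCID { get; set; }    // now int instead of uint
    public string Value { get; set; }
}

public class LocalizationTokenSketch
{
    private readonly List<ResourceEntrySketch> _entries;
    private readonly Dictionary<int, ResourceEntrySketch> _entriesByLcid;

    public LocalizationTokenSketch(IEnumerable<ResourceEntrySketch> entries)
    {
        _entries = new List<ResourceEntrySketch>(entries);

        // Index by LCID once so per-language lookups are O(1) instead of scanning the list.
        _entriesByLcid = new Dictionary<int, ResourceEntrySketch>(_entries.Count);
        foreach (var entry in _entries)
        {
            _entriesByLcid[entry.LCID] = entry;
        }
    }

    // Exposed read-only so callers cannot add or remove entries.
    public IReadOnlyList<ResourceEntrySketch> ResourceEntries => _entries;

    public ResourceEntrySketch GetEntryForLanguage(int lcid)
        => _entriesByLcid.TryGetValue(lcid, out var entry) ? entry : null;
}
```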