New (internal) Cache and NoFileCache classes + implement use of these #332
Merged: jrfnl merged 15 commits into develop from feature/new-internal-cache-classes-with-implementations on Jun 30, 2022
jrfnl force-pushed the feature/new-internal-cache-classes-with-implementations branch 2 times, most recently from e1cdc0b to b3a5ccc (June 30, 2022 15:24)
Based on a similar class I wrote for the Requests library. If/when that class gets published as a package, this class should be removed in favour of the package.
... to allow for caching the results of processing intensive utility methods. The `Cache` class is intended to be used for utility functions which depend on the `$phpcsFile` object. The `NoFileCache` class is for file-independent functions. Use selectively and with care as the memory usage of PHPCS will increase when using these caches! Includes full set of tests to cover this new functionality. These new tests have been added to the `RunFirst` test suite to prevent the cache setting and clearing done in these tests from interfering with the rest of the tests.
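To make the distinction between the two classes concrete, here is a minimal sketch of a file-keyed cache next to a file-independent one. All class and method names below are hypothetical and chosen for illustration only; they are not taken from the PHPCSUtils source, and a plain string stands in for the `$phpcsFile` object.

```php
<?php
// Hypothetical sketch, NOT the PHPCSUtils implementation.
// Results are stored per file, per utility ($key), per input ($id).
final class FileCacheSketch
{
    /** @var array<string, array<string, array<string, mixed>>> */
    private static $cache = [];

    public static function isCached(string $fileId, string $key, string $id): bool
    {
        // array_key_exists() so that cached null/false results still count as cached.
        return isset(self::$cache[$fileId][$key])
            && array_key_exists($id, self::$cache[$fileId][$key]);
    }

    public static function get(string $fileId, string $key, string $id)
    {
        return self::$cache[$fileId][$key][$id] ?? null;
    }

    public static function set(string $fileId, string $key, string $id, $value): void
    {
        self::$cache[$fileId][$key][$id] = $value;
    }

    public static function clear(): void
    {
        self::$cache = [];
    }
}

// The file-independent variant simply drops the file dimension.
final class NoFileCacheSketch
{
    /** @var array<string, array<string, mixed>> */
    private static $cache = [];

    public static function isCached(string $key, string $id): bool
    {
        return isset(self::$cache[$key])
            && array_key_exists($id, self::$cache[$key]);
    }

    public static function get(string $key, string $id)
    {
        return self::$cache[$key][$id] ?? null;
    }

    public static function set(string $key, string $id, $value): void
    {
        self::$cache[$key][$id] = $value;
    }

    public static function clear(): void
    {
        self::$cache = [];
    }
}
```

Every stored result stays in memory until `clear()` is called, which is where the memory usage warning above comes from.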
Includes:
* Additional tests to cover this change on the off-chance that someone will use this functionality in a non-test situation.
* Making sure that the tests related to the `[NoFile]Cache` classes will always run with caching turned **on** as running the cache functionality tests without caching enabled is a little pointless (and would fail the tests).
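One way such an on/off switch could work is sketched below. The `$enabled` property, the class name and the `withCacheEnabled()` helper are assumptions made for this example, not the actual PHPCSUtils code. The helper shows how a cache-specific test could force caching on and restore the previous setting afterwards.

```php
<?php
// Hypothetical sketch: names are illustrative, not the PHPCSUtils API.
final class ToggleableCacheSketch
{
    /** @var bool Whether caching is active. */
    public static $enabled = true;

    /** @var array<string, array<string, mixed>> */
    private static $cache = [];

    public static function isCached(string $key, string $id): bool
    {
        // A disabled cache always reports a miss.
        return self::$enabled
            && isset(self::$cache[$key])
            && array_key_exists($id, self::$cache[$key]);
    }

    public static function set(string $key, string $id, $value): void
    {
        if (self::$enabled === false) {
            return; // Nothing is stored while caching is off.
        }

        self::$cache[$key][$id] = $value;
    }

    public static function clear(): void
    {
        self::$cache = [];
    }
}

/**
 * Run a callback with caching forced on, then restore the previous setting.
 */
function withCacheEnabled(callable $test)
{
    $previous = ToggleableCacheSketch::$enabled;
    ToggleableCacheSketch::$enabled = true;

    try {
        return $test();
    } finally {
        ToggleableCacheSketch::$enabled = $previous;
    }
}
```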
…ew `Cache` class As `declare` constructs using alternative syntax can be long, determining the scope opener/closer can involve a lot of token walking, which is why I'm implementing caching for it. Includes some minor tweaks to the exit routes of the function to ensure that the cache will always be set when the `T_DECLARE` token doesn't natively have a scope opener/closer assigned. Includes a dedicated test to verify the cache is used and working.
…he new `Cache` class While arrow functions are generally only small snippets of code, as they don't always have a clear "end token", determining whether something is an arrow function and retrieving the relevant open/close tokens can be token-walking intensive, which is why I'm implementing caching for it. Note: the results will only be cached for the backported functionality. When using a more recent PHPCS version, the cache code will only be reached in the case of parse errors. Includes a dedicated test to verify the cache is used and working.
jrfnl force-pushed the feature/new-internal-cache-classes-with-implementations branch from b3a5ccc to 58655b0 (June 30, 2022 20:59)
This method searches up in a file to try and find an applicable namespace declaration. As non-scoped namespace declarations are often at the top of the file, this can involve a **_lot_** of token walking, which is why I'm implementing caching for it. Includes some minor tweaks to the exit routes of the function to ensure that the cache will always be set when the token passed is not in a scoped namespace declaration. Includes a dedicated test to verify the cache is used and working.
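The pattern described here, a backwards token walk where every exit route stores its result, can be sketched as follows. This is a simplified illustration under stated assumptions: tokens are plain strings, the cache is keyed by stack pointer only (so it assumes a single file), and the class name and `$walks` counter are hypothetical.

```php
<?php
// Hypothetical sketch: a memoized upward search through a token list.
final class NamespaceLookupSketch
{
    /** @var array<int, int|false> Cached results, keyed by stack pointer. */
    private static $cache = [];

    /** @var int Number of times the token walk was actually executed. */
    public static $walks = 0;

    /**
     * @return int|false Index of the nearest preceding T_NAMESPACE token, or false.
     */
    public static function find(array $tokens, int $stackPtr)
    {
        // array_key_exists(): a cached "false" (nothing found) is still a cache hit.
        if (array_key_exists($stackPtr, self::$cache)) {
            return self::$cache[$stackPtr];
        }

        ++self::$walks;
        $result = false;

        for ($i = $stackPtr; $i >= 0; $i--) {
            if ($tokens[$i] === 'T_NAMESPACE') {
                $result = $i;
                break;
            }
        }

        // Single exit route, so the "not found" outcome is cached as well.
        self::$cache[$stackPtr] = $result;
        return $result;
    }
}
```

Funnelling all outcomes through one `return` is one way to guarantee the cache is always set; the negative result is often the most expensive one, as it requires walking all the way to the start of the file.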
…lass While most function calls only involve a few parameters, splitting out an array into the individual array items can be very processing-intensive and will slow things down considerably when multiple sniffs each need the breakdown of the same array. With this in mind, I'm implementing caching for it. The saved cache can be quite large for calls to this function; at the same time, the trade-off performance-wise is worth it in this case IMO. Note: as this function can be called with a `$limit`, we need to be able to distinguish between "limited" results and "full" results, but should also take advantage of available "full" results when a consecutive function call tries to retrieve a "limited" result. The implementation takes this into account, though the situation where a "limited" result was previously retrieved (and cached), and a new "limited" call is made with a _lower_ limit, currently does not take full advantage of the available cache. This is possibly an improvement which can still be made in the future. Includes a set of dedicated tests to verify the cache is used and working, including testing specifically how the cache is set and used when the `$limit` parameter has been passed.
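The "limited" versus "full" distinction could be handled along the lines of the sketch below. This is an assumption-laden illustration, not the actual `getParameters()` code: the class name, the `$id . '-' . $limit` key scheme and the `trim()` stand-in for the real parsing work are all hypothetical.

```php
<?php
// Hypothetical sketch of limit-aware caching; 0 means "no limit" (full result).
final class LimitedResultCacheSketch
{
    /** @var array<string, array> */
    private static $cache = [];

    /** @var int Number of times the (expensive) computation actually ran. */
    public static $computations = 0;

    public static function getItems(array $items, string $id, int $limit = 0): array
    {
        $key     = $id . '-' . $limit;
        $fullKey = $id . '-0';

        // Exact match: same input, same limit.
        if (isset(self::$cache[$key])) {
            return self::$cache[$key];
        }

        // A limited request can be served from an available full result.
        if ($limit > 0 && isset(self::$cache[$fullKey])) {
            return array_slice(self::$cache[$fullKey], 0, $limit, true);
        }

        ++self::$computations;
        $result = [];
        $count  = 0;

        foreach ($items as $k => $item) {
            $result[$k] = trim($item); // Stand-in for the real per-item processing.
            if ($limit > 0 && ++$count === $limit) {
                break;
            }
        }

        self::$cache[$key] = $result;
        return $result;
    }
}
```

As in the description above, a limited result cached under one limit is not reused for a later call with a lower limit; the sketch recomputes in that case.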
…Cache` class This method finds the end of a potentially multi-line text string. While most text strings are short and even multi-line ones are often only a few lines, some can be thousands of lines long. With that in mind, I'm implementing caching for this method. I've considered also applying caching for the `TextStrings::getCompleteTextString()` method, but as - in the case of thousands of lines long text - that would take a huge chunk of memory, I've decided against it. Caching the "end token" of a multi-line text string should sufficiently improve performance. Includes a dedicated test to verify the cache is used and working.
…ache` class Depending on whether or not group use or multi-use statements are used, this method can involve sufficient token walking to make caching relevant. Especially when keeping in mind that this function will likely be used a **_lot_** in the near future as part of the namespace resolution functionality, so retrieving the results via the cache instead of executing the same logic dozens of times for a file seems prudent. Includes a dedicated test to verify the cache is used and working.
While this function will generally be _fast_ for array entries _with_ a double arrow, it will be _slow_ for array entries without a double arrow, especially if the contents of the array item contain a lot of tokens, as the fact that the array item does not have a double arrow can only be determined when the end of the array item has been reached. With this in mind, I'm implementing caching for it. Includes some minor tweaks to the exit routes of the function to ensure that the cache will always be set. Includes a dedicated test to verify the cache is used and working.
While most lists only involve a few assignments, for more complex and nested list structures, splitting the list into the individual assignments can tax performance, which is why I'm implementing caching for it. Includes a dedicated test to verify the cache is used and working.
… `Cache` class These are easily the most used functions as well as the slowest functions (when dealing with large arrays/lists). While this commit already implements the use of the new `Cache` class, further improvements are still needed and will be pulled in a follow-up PR. Note: code coverage for these functions _may_ go down a little due to the multiple exit points in the functions. I'm not concerned about that as that - again - will be addressed in the follow-up PR. Includes a dedicated test to verify the cache is used and working.
… class Depending on the size of the text string, this function _could_ become pretty slow/performance-intensive, which is why I'm implementing caching for it. Includes a dedicated test to verify the cache is used and working.
This commit sets up a mechanism in the test `bootstrap.php` file to turn the cache on/off depending on an environment variable, which can be set from within a `phpunit.xml` file or on the OS-level. This allows for running the tests both with caching turned on, as well as with caching turned off. By default, the tests will run with caching turned **off**. Being able to run the tests with both settings will allow for:
* [Caching off] Making sure that all code paths can be reached (code coverage check). Caching being enabled could prevent some code paths from being tested.
* [Caching on] Making sure that caching does not (negatively) impact the actual results of the functions.
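A bootstrap toggle of this kind might look roughly like the sketch below. The environment variable name `EXAMPLE_USE_CACHE`, the function name and the accepted values are all placeholders invented for this example; the real name and logic live in the project's `bootstrap.php`. The sketch assumes the variable is visible to `getenv()`.

```php
<?php
// Hypothetical sketch: decide whether the cache should be on for a test run.
// "EXAMPLE_USE_CACHE" is a placeholder name, not the variable used by the project.
function cacheEnabledFromEnv(string $name = 'EXAMPLE_USE_CACHE'): bool
{
    $value = getenv($name);

    if ($value === false) {
        // Variable not set: default to caching turned off.
        return false;
    }

    // Accept a few common "truthy" spellings (an assumption for this sketch).
    return in_array(strtolower(trim($value)), ['1', 'true', 'on', 'yes'], true);
}
```

In a bootstrap file, the returned boolean would then be assigned to whatever on/off switch the cache classes expose before any test runs.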
Adjust the GH actions test workflows to run the complete test suite once with the cache turned on and once with the cache turned off. The test run with caching turned on will run the tests twice and only run the tests which don't need isolation. This should safeguard that the cache does not negatively influence sniff results. The code coverage job will (should) run with caching turned _off_ to prevent tests not reaching all code paths due to cached results from previous tests short-circuiting things. Note: the effect of caching on the test suite will be minimal as most tests will only execute a function once for a particular input/code snippet. Still good to safeguard/double-check though.
jrfnl force-pushed the feature/new-internal-cache-classes-with-implementations branch from 58655b0 to ecfef72 (June 30, 2022 21:09)
While it looked like Coveralls was starting to do better, it looks like when it comes down to it for crucial checks, they are still drunk. Merging, as in reality coverage has only gone down by less than 0.1%.
jrfnl deleted the feature/new-internal-cache-classes-with-implementations branch (June 30, 2022 21:20)
This was referenced Apr 29, 2023
Tests: add TypeProviderHelper class
Based on a similar class I wrote for the Requests library. If/when that class gets published as a package, this class should be removed in favour of the package.
✨ New (internal) Cache and NoFileCache classes
... to allow for caching the results of processing intensive utility methods.
The `Cache` class is intended to be used for utility functions which depend on the `$phpcsFile` object.
The `NoFileCache` class is for file-independent functions.
Use selectively and with care as the memory usage of PHPCS will increase when using these caches!
Includes full set of tests to cover this new functionality.
These new tests have been added to the `RunFirst` test suite to prevent the cache setting and clearing done in these tests from interfering with the rest of the tests.
[NoFile]Cache: allow for disabling the cache in test situations
Includes:
* Additional tests to cover this change on the off-chance that someone will use this functionality in a non-test situation.
* Making sure that the tests related to the `[NoFile]Cache` classes will always run with caching turned **on** as running the cache functionality tests without caching enabled is a little pointless (and would fail the tests).
ControlStructures::getDeclareScopeOpenClose(): implement use of the new `Cache` class
As `declare` constructs using alternative syntax can be long, determining the scope opener/closer can involve a lot of token walking, which is why I'm implementing caching for it.
Includes some minor tweaks to the exit routes of the function to ensure that the cache will always be set when the `T_DECLARE` token doesn't natively have a scope opener/closer assigned.
Includes a dedicated test to verify the cache is used and working.
FunctionDeclarations::getArrowFunctionOpenClose(): implement use of the new `Cache` class
While arrow functions are generally only small snippets of code, as they don't always have a clear "end token", determining whether something is an arrow function and retrieving the relevant open/close tokens can be token-walking intensive, which is why I'm implementing caching for it.
Note: the results will only be cached for the backported functionality. When using a more recent PHPCS version, the cache code will only be reached in the case of parse errors.
Includes a dedicated test to verify the cache is used and working.
Namespaces::findNamespacePtr(): implement use of the new `Cache` class
This method searches up in a file to try and find an applicable namespace declaration.
As non-scoped namespace declarations are often at the top of the file, this can involve a **_lot_** of token walking, which is why I'm implementing caching for it.
Includes some minor tweaks to the exit routes of the function to ensure that the cache will always be set when the token passed is not in a scoped namespace declaration.
Includes a dedicated test to verify the cache is used and working.
PassedParameters::getParameters(): implement use of the new `Cache` class
While most function calls only involve a few parameters, splitting out an array into the individual array items can be very processing-intensive and will slow things down considerably when multiple sniffs each need the breakdown of the same array.
With this in mind, I'm implementing caching for it.
The saved cache can be quite large for calls to this function; at the same time, the trade-off performance-wise is worth it in this case IMO.
Note: as this function can be called with a `$limit`, we need to be able to distinguish between "limited" results and "full" results, but should also take advantage of available "full" results when a consecutive function call tries to retrieve a "limited" result.
The implementation takes this into account, though the situation where a "limited" result was previously retrieved (and cached), and a new "limited" call is made with a _lower_ limit, currently does not take full advantage of the available cache. This is possibly an improvement which can still be made in the future.
Includes a set of dedicated tests to verify the cache is used and working, including testing specifically how the cache is set and used when the `$limit` parameter has been passed.
TextStrings::getEndOfCompleteTextString(): implement use of the new `Cache` class
This method finds the end of a potentially multi-line text string.
While most text strings are short and even multi-line ones are often only a few lines, some can be thousands of lines long.
With that in mind, I'm implementing caching for this method.
I've considered also applying caching for the `TextStrings::getCompleteTextString()` method, but as - in the case of thousands of lines long text - that would take a huge chunk of memory, I've decided against it.
Caching the "end token" of a multi-line text string should sufficiently improve performance.
Includes a dedicated test to verify the cache is used and working.
UseStatements::splitImportUseStatement(): implement use of the new `Cache` class
Depending on whether or not group use or multi-use statements are used, this method can involve sufficient token walking to make caching relevant.
Especially when keeping in mind that this function will likely be used a **_lot_** in the near future as part of the namespace resolution functionality, so retrieving the results via the cache instead of executing the same logic dozens of times for a file seems prudent.
Includes a dedicated test to verify the cache is used and working.
Arrays::getDoubleArrowPtr(): implement use of the new `Cache` class
While this function will generally be _fast_ for array entries _with_ a double arrow, it will be _slow_ for array entries without a double arrow, especially if the contents of the array item contain a lot of tokens, as the fact that the array item does not have a double arrow can only be determined when the end of the array item has been reached.
With this in mind, I'm implementing caching for it.
Includes some minor tweaks to the exit routes of the function to ensure that the cache will always be set.
Includes a dedicated test to verify the cache is used and working.
Lists::getAssignments(): implement use of the new `Cache` class
While most lists only involve a few assignments, for more complex and nested list structures, splitting the list into the individual assignments can tax performance, which is why I'm implementing caching for it.
Includes a dedicated test to verify the cache is used and working.
Arrays::isShortArray()/Lists::isShortList(): implement use of the new `Cache` class
These are easily the most used functions as well as the slowest functions (when dealing with large arrays/lists).
While this commit already implements the use of the new `Cache` class, further improvements are still needed and will be pulled in a follow-up PR.
Note: code coverage for these functions _may_ go down a little due to the multiple exit points in the functions. I'm not concerned about that as that - again - will be addressed in the follow-up PR.
Includes a dedicated test to verify the cache is used and working.
TextStrings::getStripEmbeds(): implement use of the new `NoFileCache` class
Depending on the size of the text string, this function _could_ become pretty slow/performance-intensive, which is why I'm implementing caching for it.
Includes a dedicated test to verify the cache is used and working.
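Tests/bootstrap: allow for running the tests with/without caching
This commit sets up a mechanism in the test `bootstrap.php` file to turn the cache on/off depending on an environment variable, which can be set from within a `phpunit.xml` file or on the OS-level.
This allows for running the tests both with caching turned on, as well as with caching turned off.
By default, the tests will run with caching turned **off**.
Being able to run the tests with both settings will allow for:
* [Caching off] Making sure that all code paths can be reached (code coverage check). Caching being enabled could prevent some code paths from being tested.
* [Caching on] Making sure that caching does not (negatively) impact the actual results of the functions.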
GH Actions: run the tests twice - once with cache, once without
Adjust the GH actions test workflows to run the complete test suite once with the cache turned on and once with the cache turned off.
The test run with caching turned on will run the tests twice and only run the tests which don't need isolation.
This should safeguard that the cache does not negatively influence sniff results.
The code coverage job will (should) run with caching turned off to prevent tests not reaching all code paths due to cached results from previous tests short-circuiting things.
Note: the effect of caching on the test suite will be minimal as most tests will only execute a function once for a particular input/code snippet. Still good to safeguard/double-check though.