Don't cache canonicalize calls when containingUrl is available #2215

Merged (3 commits) on Apr 11, 2024

@nex3 nex3 commented Apr 9, 2024

See #2208

@nex3 nex3 requested a review from Goodwine April 9, 2024 23:13
nex3 commented Apr 9, 2024

@ntkme

@nex3 nex3 force-pushed the canonicalize-cache branch from c97ef5e to 6021d0d Compare April 9, 2024 23:21
if (await _canonicalize(importer, url, baseUrl, forImport)
    case var result?) {
var key = (url, forImport: forImport);
if (_canonicalizeCache.containsKey(key)) return _canonicalizeCache[key];
Contributor

@ntkme ntkme Apr 9, 2024

I can think of an edge case with two importers [importerA, importerB]:

  1. Resolve an import x with containing URL u:y. importerA accesses containingUrl but cannot resolve the import; importerB then resolves it without using containingUrl, so the result gets cached.
  2. Resolve an import x with containing URL file://a.scss. importerA could now resolve it, because this containingUrl is one it can handle. However, because of the cached result from importerB, importerA would never even be attempted.

I think this would be unexpected, because importerA is before importerB.

Contributor Author

@nex3 nex3 Apr 9, 2024

In this case, _canonicalize() would return (null, false) for importerA because containingUrl is passed. That sets cacheable to false on line 188, so importerB's result won't be cached even if it would be cacheable in isolation (that case is guarded by when cacheable).
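
To make the flag propagation concrete, here is a minimal synchronous sketch of the behavior described above. It is a toy model, not the dart-sass code: ToyImporter, ToyImportCache, and the passContainingUrl field are hypothetical names, and the rule for when containingUrl gets passed is reduced to a per-importer boolean purely as an assumption.

```dart
// Toy model of a whole-load-path canonicalize cache (hypothetical names).
typedef Canonicalize = Uri? Function(Uri url, Uri? containingUrl);

class ToyImporter {
  final String name;
  // Assumption: whether containingUrl is passed is a fixed per-importer flag.
  final bool passContainingUrl;
  final Canonicalize canonicalize;
  ToyImporter(this.name, this.canonicalize, {this.passContainingUrl = false});
}

class ToyImportCache {
  final List<ToyImporter> importers;
  final _cache = <(Uri, {bool forImport}), (ToyImporter, Uri)?>{};
  ToyImportCache(this.importers);

  // Returns (result, cacheable). A call is uncacheable whenever containingUrl
  // was passed, regardless of whether the importer resolved the URL.
  (Uri?, bool) _canonicalize(ToyImporter importer, Uri url, Uri? baseUrl) {
    var pass = importer.passContainingUrl && baseUrl != null;
    var result = importer.canonicalize(url, pass ? baseUrl : null);
    return (result, !pass);
  }

  (ToyImporter, Uri)? canonicalize(Uri url,
      {Uri? baseUrl, bool forImport = false}) {
    var key = (url, forImport: forImport);
    if (_cache.containsKey(key)) return _cache[key];

    var cacheable = true;
    for (var importer in importers) {
      var (result, importerCacheable) = _canonicalize(importer, url, baseUrl);
      // One uncacheable importer taints every later importer for this load.
      if (!importerCacheable) cacheable = false;
      if (result != null) {
        var found = (importer, result);
        if (cacheable) _cache[key] = found;
        return found;
      }
    }
    if (cacheable) _cache[key] = null;
    return null;
  }
}

void main() {
  // importerA only resolves when the containing URL is a file: URL.
  var importerA = ToyImporter(
      'A',
      (url, containingUrl) => containingUrl?.scheme == 'file'
          ? Uri.parse('a:${url.path}')
          : null,
      passContainingUrl: true);
  var importerB = ToyImporter('B', (url, _) => Uri.parse('b:${url.path}'));
  var cache = ToyImportCache([importerA, importerB]);

  var first = cache.canonicalize(Uri.parse('x'), baseUrl: Uri.parse('u:y'));
  var second =
      cache.canonicalize(Uri.parse('x'), baseUrl: Uri.parse('file:///a.scss'));
  print(first?.$1.name); // B
  print(second?.$1.name); // A
}
```

In this model the first load falls through to importerB, but nothing is stored because importerA already cleared the flag, so the second load still tries importerA first.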

Contributor

Sorry, I forgot that in this first iteration cacheable is based on whether containingUrl is passed, not on whether it's actually used.

Contributor Author

The logic should work the same if it just checks access: we'd still return (null, false) in the case where importerA accessed containingUrl but couldn't resolve the import.
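
For comparison, an access-based check could be sketched as follows. This is an assumption-laden illustration rather than the actual implementation: ToyCanonicalizeContext, its wasContainingUrlAccessed flag, and canonicalizeTracked are hypothetical names, and importers are modeled as plain functions.

```dart
// Toy context that records whether the importer ever read containingUrl.
class ToyCanonicalizeContext {
  final Uri? _containingUrl;
  bool wasContainingUrlAccessed = false;
  ToyCanonicalizeContext(this._containingUrl);

  Uri? get containingUrl {
    wasContainingUrlAccessed = true;
    return _containingUrl;
  }
}

typedef TrackedImporter = Uri? Function(
    Uri url, ToyCanonicalizeContext context);

// Returns (result, cacheable): cacheable only if containingUrl was never read.
(Uri?, bool) canonicalizeTracked(
    TrackedImporter importer, Uri url, Uri? baseUrl) {
  var context = ToyCanonicalizeContext(baseUrl);
  var result = importer(url, context);
  return (result, !context.wasContainingUrlAccessed);
}

void main() {
  // Reads containingUrl but only resolves for file: URLs.
  Uri? readsAndFails(Uri url, ToyCanonicalizeContext context) =>
      context.containingUrl?.scheme == 'file'
          ? Uri.parse('a:${url.path}')
          : null;
  // Never touches containingUrl.
  Uri? ignores(Uri url, ToyCanonicalizeContext context) =>
      Uri.parse('b:${url.path}');

  var (failed, failedCacheable) =
      canonicalizeTracked(readsAndFails, Uri.parse('x'), Uri.parse('u:y'));
  var (resolved, resolvedCacheable) =
      canonicalizeTracked(ignores, Uri.parse('x'), Uri.parse('u:y'));
  print(failed); // null
  print(failedCacheable); // false
  print(resolved); // b:x
  print(resolvedCacheable); // true
}
```

The failed lookup is still reported as uncacheable, so a later importer's result would not be stored under the shared key in either variant.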

Contributor

@ntkme ntkme Apr 10, 2024

I think this works correctly.

If I read it correctly, any non-cacheable importer makes every importer after it non-cacheable for the current import. So if a FilesystemImporter has low priority, its results would not be cached and the cost could be high? If that's the case, it might still be worth having a per-importer cache key instead of a single cache key for all importers.

Contributor

If we have [nonCacheableImporter, loadPathImporter], then with the implementation in this PR the loadPathImporter's results would not be cached, and repeatedly loading the same file from the load path could get slower due to repeated I/O calls.

So I think it's worth the trade-off of using a little more memory to cache per importer (and even to cache failed canonicalizations when cacheable), plus a bit more CPU to re-check the cache per importer, like the pseudocode below:

for (var importer in importers) {
  // Key the cache per importer so one uncacheable importer doesn't block the rest.
  var key = (url, forImport: forImport, importer: importer);
  if (_canonicalizeCache.containsKey(key)) {
    // A cached null is a remembered failure: fall through to the next importer.
    var cached = _canonicalizeCache[key];
    if (cached != null) return cached;
  } else {
    var (result, cacheable) =
        await _canonicalize(importer, url, baseUrl, forImport);
    if (cacheable) _canonicalizeCache[key] = result;
    if (result != null) return result;
  }
}
return null;

The cost of CPU/memory overhead would likely be lower than the I/O overhead.
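
As a rough illustration of that trade-off, the sketch below counts how often each importer is actually invoked under a per-importer cache. CountingImporter and PerImporterCache are hypothetical names, the code is synchronous for brevity, and each importer's cacheability is fixed up front as a simplifying assumption.

```dart
// Toy importer that counts how often it is really invoked.
class CountingImporter {
  final String name;
  final bool cacheable; // Assumption: cacheability is fixed per importer.
  final Uri? Function(Uri url) resolve;
  int calls = 0;
  CountingImporter(this.name, this.resolve, {this.cacheable = true});

  (Uri?, bool) canonicalize(Uri url) {
    calls++;
    return (resolve(url), cacheable);
  }
}

class PerImporterCache {
  final List<CountingImporter> importers;
  final _cache = <(Uri, CountingImporter, {bool forImport}), Uri?>{};
  PerImporterCache(this.importers);

  Uri? canonicalize(Uri url, {bool forImport = false}) {
    for (var importer in importers) {
      var key = (url, importer, forImport: forImport);
      if (_cache.containsKey(key)) {
        var cached = _cache[key];
        if (cached != null) return cached;
        continue; // Cached failure: skip this importer without calling it.
      }
      var (result, cacheable) = importer.canonicalize(url);
      if (cacheable) _cache[key] = result;
      if (result != null) return result;
    }
    return null;
  }
}

void main() {
  var nonCacheable =
      CountingImporter('nonCacheable', (_) => null, cacheable: false);
  var loadPath = CountingImporter(
      'loadPath', (url) => Uri.parse('file:///lib/${url.path}.scss'));
  var cache = PerImporterCache([nonCacheable, loadPath]);

  for (var i = 0; i < 3; i++) {
    cache.canonicalize(Uri.parse('x'));
  }
  print(nonCacheable.calls); // 3
  print(loadPath.calls); // 1
}
```

Here the uncacheable importer is re-run on every load, while the load-path importer (the one that would hit the filesystem) is consulted only once.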

Contributor Author

I don't think it's crazy to have a per-importer canonicalize cache in addition to the current whole-load-path cache, although I think we might want to be a bit more sophisticated about only filling it if we run into an uncacheable load in practice. Either way, let's save that for a follow-up after we fix the initial bug.

if (_canonicalizeCache.containsKey(key)) return _canonicalizeCache[key];

var cacheable = true;
for (var importer in _importers) {
Member

Nit: I was able to follow the caching logic, but it wasn't very obvious at first. It would be great if the caching strategy were documented, but documenting the strategy also feels like documenting an implementation detail, so maybe it doesn't have to be documented at all 🤔.

Contributor Author

Added a comment to explain in more detail
