chore(jsii): cache results of case conversions #3602

Merged
merged 3 commits into main from rmuller/cache-case-conversions on Jun 16, 2022

Conversation

RomainMuller
Contributor

The `Case` library appears at the top of the heavy weights when profiling
`jsii` compilation runs on `aws-cdk-lib`, accounting for about `9.6%` of
the overall execution time (`6,712` out of `68,935` ms). This is quite
significant.

Faster case-conversion libraries exist; however, they do not offer the same
level of functionality that `Case` does, and they often require knowing which
casing the string is currently in, whereas `Case` performs normalization
before conversion (which likely explains why these conversions are somewhat
expensive) and is therefore more convenient for our use case. Replacing
`Case` with a faster implementation does not appear possible, as none seems
to exist.

A survey of all `Case` functions invoked during such a compilation produced
the following results:

```
Case.camel: 1881755 cached out of 1893235 (99.39%) - 11480 entries / 164.9 hits per entry
Case.pascal: 1890795 cached out of 1909959 (99.00%) - 19164 entries / 99.7 hits per entry
Case.snake: 1882773 cached out of 1895359 (99.34%) - 12586 entries / 150.6 hits per entry
```

Since the cache hit rate is about `99%` and entries are re-used about `100`
times each on average, adding a cache in front of those functions should
alleviate that cost. Indeed, after making this change, `Case` no longer
appears above the `100ms` mark in the list of heavy weights produced by the
profiler.

In order to avoid trading performance for out-of-memory errors, the caches
are keyed off weak references, so that they can be garbage collected under
memory pressure.
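
As an illustration, here is a minimal TypeScript sketch of that idea; it is not the exact code in this PR, and the `cached` helper plus the `Case` usage shown in the trailing comment are illustrative assumptions:

```ts
// Hypothetical sketch: memoize a case-conversion function behind a Map that is
// only reachable through a WeakRef, so the GC is free to discard the whole
// cache, in which case we simply start over with an empty one.
type Transform = (text: string) => string;

function cached(transform: Transform): Transform {
  let ref = new WeakRef(new Map<string, string>());

  return (text) => {
    let cache = ref.deref();
    if (cache == null) {
      // The previous cache was garbage collected; start a fresh one.
      cache = new Map<string, string>();
      ref = new WeakRef(cache);
    }
    const hit = cache.get(text);
    if (hit != null) {
      return hit;
    }
    const result = transform(text);
    cache.set(text, result);
    return result;
  };
}

// Usage (assuming the `case` package is imported as `Case`): wrap the hot
// functions once, then call the wrappers instead of Case.camel & friends.
// const camel = cached(Case.camel);
// const pascal = cached(Case.pascal);
// const snake = cached(Case.snake);
```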


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@RomainMuller RomainMuller self-assigned this Jun 15, 2022
@mergify mergify bot added the contribution/core This is a PR that came from AWS. label Jun 15, 2022
@RomainMuller RomainMuller force-pushed the rmuller/cache-case-conversions branch 9 times, most recently from c5a7bbb to baaa117 on June 16, 2022 10:12
@RomainMuller RomainMuller force-pushed the rmuller/cache-case-conversions branch from baaa117 to e48a0f9 on June 16, 2022 10:15

```ts
class CacheKey {
  // Storing cache keys as weak references to allow garbage collection if there is memory pressure.
  private static readonly STORE = new Map<any, WeakRef<CacheKey>>();
```
Contributor

Not sure I understand this additional cache - why do you need it?

Contributor Author

In order to allow the caches to be garbage collected under memory pressure, I'm using a `WeakMap` there, which only accepts object keys (primitives such as `string` are illegal to use as keys there).

The `CacheKey` is supposed to be a unique value associated with any given case-conversion function, and in order for the `WeakMap` to work its magic, I need to ensure I hold no strong reference to the `CacheKey` objects (or else they won't be GC'd, meaning the string cache also won't be GC'd). I store the function-to-`WeakRef<CacheKey>` association so that I'm able to get the "current" `CacheKey` for a function, but the `WeakRef`'s `deref` will return `undefined` after that key was GC'd, so I need to make a new one then.

Makes sense?
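
In code, the lookup described above could look roughly like this (a sketch only; the `lookup` method name and `context` parameter are illustrative, not necessarily what the PR uses):

```ts
class CacheKey {
  // Maps a case-conversion function to a weak reference to its current
  // CacheKey. Only the WeakRef is retained, so the CacheKey itself (and any
  // cache keyed on it in a WeakMap) remains collectible.
  private static readonly STORE = new Map<unknown, WeakRef<CacheKey>>();

  public static lookup(context: unknown): CacheKey {
    const existing = CacheKey.STORE.get(context)?.deref();
    if (existing != null) {
      return existing;
    }
    // Either no key was ever minted for this function, or it was GC'd and
    // deref() returned undefined: mint a new one and remember it (weakly).
    const key = new CacheKey();
    CacheKey.STORE.set(context, new WeakRef(key));
    return key;
  }

  private constructor() {}
}
```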

Contributor

Yeap, after I posted it dawned on me that the `WeakRef` can only accept, well, refs :)


```ts
class Cache {
  // Cache is indexed on a weak CacheKey so the cache can be purged under memory pressure
  private static readonly CACHES = new WeakMap<CacheKey, Map<string, string>>();
```
Contributor

Why is CacheKey an ok type to use in a map? Shouldn't it be hashable? Why can't this just be the function name?

Contributor Author

You must be thinking about Java. Any object can be used as a key of a `WeakMap`; keys are matched based on object identity (and V8 knows how to derive a hash for them).
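
A tiny, hypothetical example of that identity-based behaviour (the `Token` class is purely illustrative):

```ts
// Any object works as a WeakMap key; there is no hashCode/equals contract
// involved (unlike Java): lookups use object identity.
class Token {}

const caches = new WeakMap<Token, Map<string, string>>();

const key = new Token();
caches.set(key, new Map([['foo_bar', 'fooBar']]));

console.log(caches.get(key)?.get('foo_bar')); // 'fooBar'
console.log(caches.get(new Token()));         // undefined: different identity
```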

Contributor

I was definitely thinking about Java 👍

@RomainMuller RomainMuller removed the request for review from MrArnoldPalmer June 16, 2022 11:31
@mergify
Contributor

mergify bot commented Jun 16, 2022

Thank you for contributing! ❤️ I will now look into making sure the PR is up-to-date, then proceed to try and merge it!

@mergify mergify bot added the pr/ready-to-merge This PR is ready to be merged. label Jun 16, 2022
@mergify
Contributor

mergify bot commented Jun 16, 2022

Merging (with squash)...

@mergify mergify bot merged commit 8e2eea6 into main Jun 16, 2022
@mergify mergify bot deleted the rmuller/cache-case-conversions branch June 16, 2022 11:32
@mergify mergify bot removed the pr/ready-to-merge This PR is ready to be merged. label Jun 16, 2022