Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(rosetta): transliterate command does not translate in parallel #3262

Merged
merged 4 commits into from
Dec 17, 2021

Conversation

rix0rrr
Copy link
Contributor

@rix0rrr rix0rrr commented Dec 15, 2021

Make the rosetta transliterate command do translations in parallel.

Achieve this by effectively running rosetta extract beforehand, which
does to translations in parallel using worker threads, but don't save
the results anywhere. Instead, use those results as an in-memory cache
when doing the single pass.

Since there shouldn't be any untranslated snippets left between
extract and transliterate, set the UnknownSnippetMode to FAIL,
which netted me errors showing that extract and transliterate
don't look at the same fields! Specifically, initializer and parameters!


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Make the `rosetta transliterate` command do translations in parallel.

Achieve this by effectively running `rosetta extract` beforehand, which
does to translations in parallel using worker threads, but don't save
the results anywhere. Instead, use those results as an in-memory cache
when doing the single pass.

Since there shouldn't be any untranslated snippets left between
extract and transliterate, set the `UnknownSnippetMode` to `FAIL`,
which netted me errors showing that `extract` and `transliterate`
don't look at the same fields! Specifically, initializer and parameters!
@rix0rrr rix0rrr requested review from kaizencc and a team December 15, 2021 15:41
@rix0rrr rix0rrr self-assigned this Dec 15, 2021
@mergify mergify bot added the contribution/core This is a PR that came from AWS. label Dec 15, 2021
Copy link
Contributor

@kaizencc kaizencc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved, with some minor nits.

packages/jsii-rosetta/lib/snippet.ts Outdated Show resolved Hide resolved
| {
readonly api: 'parameter';
readonly fqn: string;
readonly methodName: string | typeof INITIALIZER_METHOD_NAME;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confused here... isn't typeof INITIALIZER_METHOD_NAME also a string? I'm not sure what's going on here but obviously this is something you intended to do. Why?

Copy link
Contributor Author

@rix0rrr rix0rrr Dec 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't typeof INITIALIZER_METHOD_NAME also a string?

In fact it's specifically the string '<initializer>', and no other string.

This is a TypeScript feature, where you can say that the type of a variable is a string literal. It's also used 2 lines up, where we say:

 readonly api: 'parameter';

api is not just any old string, but specifically the string 'parameter'.

(Obviously in this particular case '<initializer>'string so the union is kinda pointless, but I'm writing it here as formal documentation)

const rosetta = new RosettaTabletReader({
includeCompilerDiagnostics: true,
unknownSnippets: UnknownSnippetMode.TRANSLATE,
unknownSnippets: UnknownSnippetMode.FAIL,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nit: if we are removing the TRANSLATE option here, shouldn't we also remove loose: options.loose on line 74? It doesn't change anything, but it looks like loose for RosettaTabletReader is used when translating missing snippets, which we are no longer doing.

Since we are not translating and loose is an optional argument, we should remove it.

@kaizencc kaizencc added the pr/do-not-merge This PR should not be merged at this time. label Dec 16, 2021
rix0rrr and others added 2 commits December 17, 2021 09:55
Co-authored-by: Kaizen Conroy <36202692+kaizen3031593@users.noreply.github.com>
@mergify
Copy link
Contributor

mergify bot commented Dec 17, 2021

The title of this Pull Request does not conform with [Conventional Commits] guidelines. It will need to be adjusted before the PR can be merged.
[Conventional Commits]: https://www.conventionalcommits.org

@rix0rrr rix0rrr removed the pr/do-not-merge This PR should not be merged at this time. label Dec 17, 2021
@mergify
Copy link
Contributor

mergify bot commented Dec 17, 2021

Thank you for contributing! ❤️ I will now look into making sure the PR is up-to-date, then proceed to try and merge it!

@mergify mergify bot added the pr/ready-to-merge This PR is ready to be merged. label Dec 17, 2021
@mergify
Copy link
Contributor

mergify bot commented Dec 17, 2021

Merging (with squash)...

@mergify mergify bot merged commit beeadaa into main Dec 17, 2021
@mergify mergify bot deleted the huijbers/transliterate-parallel branch December 17, 2021 11:15
@mergify
Copy link
Contributor

mergify bot commented Dec 17, 2021

Merging (with squash)...

@mergify mergify bot removed the pr/ready-to-merge This PR is ready to be merged. label Dec 17, 2021
rix0rrr added a commit that referenced this pull request Jan 6, 2022
In #3262, the performance of `transliterate` was improved by running an
`extract` before translating, because `extract` has the facilities to do
parallel translate, and can read from a cache.

However, another thing `extract` does is *discard translations from the
cache* if they are considered "dirty" for some reason. This reason
can be: the current source is different than the cached source, the
translation mechanism has changed, compilation didn't succeed last time,
or the types referenced in the example have changed.

This is actually not desirable for `transliterate`, which wants to
do a last-ditch effort to translate whatever is necessary to translate,
but should really take what it can from the cache if there's a cached
translation available.

Add a flag to allow reading dirty translations from the cache.
mergify bot pushed a commit that referenced this pull request Jan 7, 2022
…#3324)

In #3262, the performance of `transliterate` was improved by running an
`extract` before translating, because `extract` has the facilities to do
parallel translate, and can read from a cache.

However, another thing `extract` does is *discard translations from the
cache* if they are considered "dirty" for some reason. This reason
can be: the current source is different than the cached source, the
translation mechanism has changed, compilation didn't succeed last time,
or the types referenced in the example have changed.

This is actually not desirable for `transliterate`, which wants to
do a last-ditch effort to translate whatever is necessary to translate,
but should really take what it can from the cache if there's a cached
translation available.

Add a flag to allow reading dirty translations from the cache.

Also adds the `rosetta coverage` command, which I used to find out why
the Construct Hub performance on the newest `aws-cdk-lib` version is
still poor.

---

By submitting this pull request, I confirm that my contribution is made under the terms of the [Apache 2.0 license].

[Apache 2.0 license]: https://www.apache.org/licenses/LICENSE-2.0
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
contribution/core This is a PR that came from AWS.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants