-
Notifications
You must be signed in to change notification settings - Fork 83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC 253: CDK Metadata v2 #254
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have not gone through entirely. Comments so far...
text/0253-cdk-metadata-v2.md
Outdated
* Verify metrics are coming in | ||
* Remove the old `Modules` field from the metadata resource. | ||
|
||
# Appendix 1: Reference implementation for tree-encoding used above |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't really understand what this code is doing, or why it's necessary in the RFC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is going to come in useful for the final implementation if it's someone other than me going to be doing it.
Pull request has been modified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments so far...
text/0253-cdk-metadata-v2.md
Outdated
files like `index.ts`). If it is, take the file with the file path deepest | ||
in the directory hierarchy. That approach is going to help us identify the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is deepest in the directory correct?
I'm a little concerned about the assumptions being made in this algorithm in terms of performance of these scans and jumping into the Nodejs internals. Are they guaranteed to be stable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems so. The API is declared "Stable": https://nodejs.org/api/modules.html
text/0253-cdk-metadata-v2.md
Outdated
const richBucket = new RichBucket(this, 'RichBucket', { ... }); | ||
``` | ||
|
||
### Identifier components |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Readability: Lift this section further up in the document and merge it perhaps with "Collecting construct identifiers"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mmm yeah. I've been debating the order to introduce stuff here. I feel like I have a cyclic dependency between what I want to collect (specifically: submodule name) and why I decide that I need to collect that (it's because the only thing I can get from NodeJS reflection is the filename).
I guess outlining the subproblems we have to solve first is going to make this section easier to follow.
text/0253-cdk-metadata-v2.md
Outdated
to obtain the module name", for example. This reliance on convention seems | ||
brittle and unflexible. Instead, let's use a configuration file. | ||
|
||
One possible candidate is the jsii manifest. It contains the right module names |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm looking at the jsii manifest for monocdk-experiment
and here's a snippet for eks Cluster and ecs Cluster -
"name": "Cluster",
"namespace": "aws_ecs",
"properties": [
{
"docs": {
"stability": "experimental",
"summary": "The Amazon Resource Name (ARN) that identifies the cluster."
},
"immutable": true,
"locationInModule": {
"filename": "lib/aws-ecs/lib/cluster.ts",
"line": 85
},
"name": "clusterArn",
"overrides": "monocdk-experiment.aws_ecs.ICluster",
"type": {
"primitive": "string"
}
},
"name": "Cluster",
"namespace": "aws_eks",
"properties": [
{
"docs": {
"remarks": "This role also has `systems:master` permissions.",
"stability": "experimental",
"summary": "An IAM role with administrative permissions to create or update the cluster."
},
"immutable": true,
"locationInModule": {
"filename": "lib/aws-eks/lib/cluster.ts",
"line": 685
},
"name": "adminRole",
"type": {
"fqn": "monocdk-experiment.aws_iam.Role"
}
},
It seems to me that from here we can derive the construct name (name
property), module it is in (namespace
property) and source file (locationInModule
property).
Maybe we can update the JSII manifest to contain a locationInModule
property for where the construct itself is, instead of individual properties.
What am I missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
locationInModule
is not guaranteed to exist. Only current purpose is to generate links to the GitHub repo in the docs.
Using the JSII manifest limits us to JSII libraries. We might say "well always compile with jsii" but what if for some reason some sister team doesn't want to, but does want tracking?
The JSII manifests are ginormous (22M) and I'd rather avoid reading it if I can.
We would have to make assumptions on filenames that I'm not comfortable with. Today, locationInModule
contains a source path (lib/aws-eks/lib/cluster.ts
), while the runtime path (we still have to read!) is going to refer to the compilation target path /path/to/node_modules/aws-cdk-lib/lib/aws-eks/lib/cluster.js
. So now we have to map .js
to .ts
.
To improve loading speed by a factor of 10 I'd want us to webpack the individual modules in the future (NodeJS takes a rather long time to load each and every individual source file), which means the jsii manifest will contain lib/aws-cdk/lib/cluster.ts
but the runtime path we detect will look like aws-cdk/lib/aws-eks/index.js
. Now we have to map that in addition.
I'd rather not tie these things together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if for some reason some sister team doesn't want to, but does want tracking?
I don't see a use case yet. This will be a nice-to-have, but we probably shouldn't index on this.
To improve loading speed by a factor of 10 I'd want us to webpack the individual modules in the future (NodeJS takes a rather long time to load each and every individual source file), which means the jsii manifest will contain lib/aws-cdk/lib/cluster.ts but the runtime path we detect will look like aws-cdk/lib/aws-eks/index.js. Now we have to map that in addition.
Isn't your current algorithm also subject to the same problems if we change how we pack our output artifacts (webpack or otherwise). Things like - determining module name from file path, using the deepest export - come to mind.
Between this comment and #254 (comment), an option to generate the necessary metadata at compile time or even pack time (so that the path-to-module mapping is consistent) feels worth exploring. Don't you think? At least called out as an alternative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just getting into the cdk-metadata.json
section. Maybe this is what you're already doing that file.
Co-authored-by: Niranjan Jayakar <nija@amazon.com>
Co-authored-by: Niranjan Jayakar <nija@amazon.com>
text/0253-cdk-metadata-v2.md
Outdated
|
||
For every construct used in a Stack from a library containing a | ||
`cdk-metadata.json` file, we will collect the library name, version and | ||
qualified class name (using a generated lookup table to obtain submodule |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we're measuring constructs. class is just the impl.
qualified class name (using a generated lookup table to obtain submodule | |
constructs used from the library (using a generated lookup table to obtain submodule |
text/0253-cdk-metadata-v2.md
Outdated
When we pick option 2, every stack is bound to end up with the constructs `Construct` | ||
and `Resource` (resulting form example code as in `(a)` and `(b)`), leading to noise. | ||
|
||
Nevertheless, we will still pick option 2 (following up the inheritance chain) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it makes sense to follow up the inheritance chain and just filter out certain base classes like Construct
and `Resource.
text/0253-cdk-metadata-v2.md
Outdated
to obtain the module name", for example. This reliance on convention seems | ||
brittle and inflexible. Instead, let's use a configuration file. | ||
|
||
#### jsii manifest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move to an appendix of "alternatives considered" since it's just a discussion
Points to discuss:
|
In v2, the `@aws-cdk/core` and `@aws-cdk/cx-api` modules are private. Update the tests so that it will only operate on v1. In v2, we will need to design a new set of tests. This will be included in the re-design of metadata collection - aws/aws-cdk-rfcs#254 ---- *By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
text/0253-cdk-metadata-v2.md
Outdated
|
||
export class SomeClass { | ||
// Every class gets an additional field added to it | ||
private static [vfqnSym] = 'module.SomeClass@1.2.3'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this work for "pure" interfaces? This includes cases where we return a private type that implements a public interface?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not. This is already explained elsewhere, and is a regrettable problem.
We might conceivably add a feature to jsii-rtti
to manually annotate an anonymous object instance. But that seems like a v2 concern.
/** | ||
* How many classes were skipped before finding an ancestor with type information | ||
* | ||
* This number will be 0 if the object itself was an instance of a class with | ||
* type information. | ||
*/ | ||
readonly classesSkipped: number; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the use-case for this information? Smells of future-proofing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given an inheritance chain that potentially looks like this:
Root -> ClassA -> ClassB -> (unknown class) -> (unknown class) -> INSTANCE
↑
|
this is the first one we can return
Just saying it's a ClassB
is not accurate.
I'm not 100% sure if it makes sense to report ClassB
in this case or simply not report anything. Having this information gives us the flexibility to easily switch between them.
text/0253-cdk-metadata-v2.md
Outdated
|
||
The list of construct identifiers we have discovered may be sizeable. A rough | ||
estimate of the maximum number of constructs used in a stack will be about | ||
400 (maximum of 200 L1s with a corresponding L2 wrapper). We will therefore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to record both L2 and L1 in case the L2 is used? It actually makes sense to omit the L1 in this case I think so we can easily identify direct L1 usage (which is of particular interest for our construct library planning).
The simplest solution would be to omit L1s that are named Resource
and that their patent construct inherits from the Resource base class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been on the fence about this, and trying to come up with a more general rule that would also cover this (but failing to).
But I guess it makes sense to just special-case it, you're right.
|
||
Encoding may change the type (for `afx` the encoded value is a `string` but the decoded | ||
value is a `string[]`). | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One concern this might raise is that the metadata sent to aws will become opaque. As mitigation maybe we can emit an additional file to the cloud assembly (analytics.json
) with the unencoded value. This way people will have visibility to it if they examine cdk.out
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can do that, and I'm not opposed to it. I'm not convinced that it'll be necessary though, given that the behavior of the afx-encoded string will be such that if you paste it into bash
by doing echo <afx-encoded-string-here>
, bash
brace expansion is going to reconstruct the full list.
It's not like it's a secret, we're publishing the spec out in the open.
But sure, if you feel it should be MORE readable than that, we can also emit the file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it will add a level of trust
See CDK RFC 253 (aws/aws-cdk-rfcs#254) for background and details. Currently -- if a user has not opted out -- an AWS::CDK::Metadata resource is added to each generated stack template with details about each loaded module and version that matches an Amazon-specific allow list. This modules list is used to: - Track what library versions customers are using so they can be contacted in the event of a severe (security) issue with a library. - Get business metrics on the adoption of CDK and its libraries. This modules list is sometimes inaccurate (a module may be loaded into memory without actually being used) and too braod to support CDK v2. This feature implements (mostly) the specification proposed in RFC 253 to include metadata about what constructs are present in each stack, rather than modules loaded into memory. The allow-list is still used to ensure only CDK/AWS constructs are reported on. Implementation notes: - The format of the Analytics property has changed slightly since the RFC. See the service-side code for justification and latest spec. - How to handle the jsii runtime information was left un-spec'd. I've chosen to create a psuedo-Construct to add to the list as the simplest solution. - `runtime-info.test.ts` leaps through some serious hoops to work equally well for both v1 and v2, and to fail somewhat gracefully locally if `tsc` was used to compile the module instead of `jsii`. Critques of this approach welcome! - I removed an annoyance from `resolve-version-lib.js` that produced error messages when running unit tests.
…13423) See CDK RFC 253 (aws/aws-cdk-rfcs#254) for background and details. Currently -- if a user has not opted out -- an AWS::CDK::Metadata resource is added to each generated stack template with details about each loaded module and version that matches an Amazon-specific allow list. This modules list is used to: - Track what library versions customers are using so they can be contacted in the event of a severe (security) issue with a library. - Get business metrics on the adoption of CDK and its libraries. This modules list is sometimes inaccurate (a module may be loaded into memory without actually being used) and too braod to support CDK v2. This feature (mostly) implements the specification proposed in RFC 253 to include metadata about what constructs are present in each stack, rather than modules loaded into memory. The allow-list is still used to ensure only CDK/AWS constructs are reported on. Implementation notes: - The format of the Analytics property has changed slightly since the RFC. See the service-side code for justification and latest spec. - How to handle the jsii runtime information was left un-spec'd. I've chosen to create a psuedo-Construct to add to the list as the simplest solution. - `runtime-info.test.ts` leaps through some serious hoops to work equally well for both v1 and v2, and to fail somewhat gracefully locally if `tsc` was used to compile the module instead of `jsii`. Critques of this approach welcome! - I removed an annoyance from `resolve-version-lib.js` that produced error messages when running unit tests. ---- *By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache-2.0 license