Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(aws-cdk-lib): Use a streaming json implementation when serializing the tree.json file #27261

Open
1 of 2 tasks
bmoffatt opened this issue Sep 22, 2023 · 8 comments
Open
1 of 2 tasks
Assignees
Labels
@aws-cdk/core Related to core CDK functionality bug This issue is a bug. p1

Comments

@bmoffatt
Copy link

Describe the feature

The tree.json file should be able to be arbitrarily large. Currently it is limited to ~512mb, node's max string size!

Use Case

I have a growing CDK application which is running into limitations of node.js when the tree.json file is being written during synthesis. I'm hitting the max string size!

RangeError: Invalid string length
    at JSON.stringify (<anonymous>)
    at TreeMetadata._synthesizeTree (/<redacted>/node_modules/aws-cdk-lib/core/lib/private/tree-metadata.js:1:1457)

With either of my workarounds, my tree.json ends up being ~630mb.

Proposed Solution

I tried patching aws-cdk-lib to use json-stream-stringify, and while this appeared to resolve my synthesis failures, it didn't pass all the unit tests, presumably because the file write no longer happens synchronously. I'd be willing to spend more time opening a pull-request and iterating on this approach if ya'll think it's reasonable.

bmoffatt@91edfec


The way I've actually unblocked myself before developing the above library change was patching my node.js build - I went this way because I knew I'd be able to validate it faster than learning how to modify and test and contributions to the aws-cdk. Obviously I don't want to maintain either patch long term :)

diff -ruN dist/node-v18.16.1/deps/v8/include/v8-primitive.h dist/node-v18.16.1-patched/deps/v8/include/v8-primitive.h
--- dist/node-v18.16.1/deps/v8/include/v8-primitive.h	2023-06-20 12:50:23.000000000 +0000
+++ dist/node-v18.16.1-patched/deps/v8/include/v8-primitive.h	2023-07-27 21:13:09.852080196 +0000
@@ -123,7 +123,7 @@
 class V8_EXPORT String : public Name {
  public:
   static constexpr int kMaxLength =
-      internal::kApiSystemPointerSize == 4 ? (1 << 28) - 16 : (1 << 29) - 24;
+      internal::kApiSystemPointerSize == 4 ? (1 << 28) - 16 : (1 << 30) - 24;
 
   enum Encoding {
     UNKNOWN_ENCODING = 0x1,

An alternative solution to my problem would be making the generation of the tree.json optional. I'm new-ish to CDK development and don't really know what it's for 🤷‍♂️ - my light googling makes me think it's an optional bit of metadata for use with visualizers

Other Information

No response

Acknowledgements

  • I may be able to implement this feature request
  • This feature might incur a breaking change

CDK version used

v2.91.0

Environment details (OS name and version, etc.)

macOS

@bmoffatt bmoffatt added feature-request A feature should be added or improved. needs-triage This issue or PR still needs to be triaged. labels Sep 22, 2023
@github-actions github-actions bot added the aws-cdk-lib Related to the aws-cdk-lib package label Sep 22, 2023
@peterwoodworth
Copy link
Contributor

peterwoodworth commented Sep 25, 2023

Thanks for the request @bmoffatt,

I have never seen an error related to generating tree.json before. I'm curious if you are able to share a way to reproduce this? If this is reasonable to run into in some use cases we can take this as a bug report

@peterwoodworth peterwoodworth added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. and removed needs-triage This issue or PR still needs to be triaged. labels Sep 25, 2023
@bmoffatt
Copy link
Author

Here's an (unreasonable) reproduction of the error :) - https://github.com/bmoffatt/cdk-tree-limit/tree/main - 9001 stacks each with 101 sqs queues!

The real-world application I'm working with is several large stacks, some with 100s of resources, fanned out to every AWS region.

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Sep 26, 2023
@peterwoodworth peterwoodworth added bug This issue is a bug. p1 needs-review and removed feature-request A feature should be added or improved. needs-review labels Sep 26, 2023
@mikewrighton mikewrighton self-assigned this Oct 10, 2023
@mikewrighton
Copy link
Contributor

Seems like there's a way to avoid the tree.json being generated, with:

const app = new cdk.App({
  treeMetadata: false
});

Is this a reasonable workaround for you?

@bmoffatt
Copy link
Author

Oh nice!

Trying it out briefly, I can see it'll break a workflow used for diff-ing changes during development and code review. I'll have to check with my team to see how feasible it is to change or go without.

@bmoffatt
Copy link
Author

bmoffatt commented Jan 3, 2024

So while no-one has complained yet about losing access to cdk diffs, someone on my team recently ran into an issue with an internal platform tool barfing on not being able to find and walk the tree file. For now, we're back to using my shady node patch workaround.

@scanlonp scanlonp added the @aws-cdk/core Related to core CDK functionality label Apr 11, 2024
@moelasmar moelasmar removed the aws-cdk-lib Related to the aws-cdk-lib package label Aug 28, 2024
@rowantran
Copy link

+1 on this issue -- one of our applications recently passed the threshold to trigger this issue, and setting treeMetadata to false breaks some of our workflows that rely on the tree.json.

We may be able to use something like @bmoffatt 's patch for now to get around this, but we definitely don't want to rely on a patched NodeJS longterm either :)

Would appreciate if the CDK team is able to prioritize this fix!

@comcalvi comcalvi self-assigned this Oct 4, 2024
@SaadRehmanCS
Copy link

SaadRehmanCS commented Oct 4, 2024

+1 our team has encountered this issue and we're currently working around it with the above mentioned ways but we're waiting for a long term fix :)

@rix0rrr rix0rrr self-assigned this Oct 14, 2024
@rix0rrr
Copy link
Contributor

rix0rrr commented Oct 18, 2024

We can switch to a streaming implementation for the writer, but this is used in the AWS plugin for VSCode which is also written in JS, and then it will refuse to read that large file.

As another way to make sure the file doesn't grow too large (a technique we could also apply to the manifest.json), how about the following:

  • After the tree is built, we go through it breadth-first.
  • As soon as we are approaching the size limits (we'll count that simply by counting nodes, not serialized bytes that's too finicky) we are going to replace nodes at the next level with a "your tree is in another castle" node, with a type, message and filename indicating that.
  • We write the tree sharded over files.

That way, consumers still have something, they just lose browseable detail if they have humongous trees, until the code is changed to deal with the indirection.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@aws-cdk/core Related to core CDK functionality bug This issue is a bug. p1
Projects
None yet
Development

No branches or pull requests

9 participants