-
Notifications
You must be signed in to change notification settings - Fork 83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: 0060 bazel #61
RFC: 0060 bazel #61
Conversation
outputs = ["lib/amazonmq.generated.ts"], | ||
srcs = glob(["lib/**/*.ts"]), | ||
deps = [ | ||
"//packages/@aws-cdk/cdk", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels like this information is duplicated from the package.json
file, right? When I investigated bazel
a short while ago this made me feel that those BUILD.bazel
file should be generated from existing project meta-information... Duplicating dependency declarations is neither enjoyable nor safe in the long run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned in Elad's comment above, while this would be duplicated for now, it would not be duplicated in the long run as the Bazel team is working on auto-extraction of dependencies from package.json files. However, I think it's worth biting the bullet for now, because it is just the short term, and the risks are minimal in my opinion. For instance, when was the last time you updated the dependencies of a long-standing CDK package? And even if you did, if you skipped adding it to the Bazel file, it wouldn't build, and if you skipped adding it to the package.json file, the integration tests wouldn't run.
|
||
For the first point, the current system has the advantage of being a glorified monorepo wrapper around NPM. Bazel, | ||
on the other hand, has a completely different semantic structure. While using it is straightforward for the most part, | ||
understanding how it works and more importantly how to extend it (by writing custom rules, for example) is not. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like this statement to be more substantiated. I think we're doing stuff in a way that is very much "how you do it in the node ecosystem"; but I'm also no expert on the subject (I mean, AWS CDK is the first ever node
project I work on) and am willing to believe I may be misled by my own perceptions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lerna
, while certainly a capable tool, is also a stagnant one. In essence, it's a glorified symlink manager for grouped dependencies in one repository. At one point in time, this was excellent and all that was needed for Node development. However, the size and scope of Node projects in general has increased in complexity, while Lerna's capabilities have not. To a certain degree, Yarn workspaces are trying to help with the innovative process, but it has a long way to go.
I'm not saying that Bazel is "the way to do things" in the JavaScript world (not yet, anyways), but it certainly has its advantages for a sprawling monorepo like the CDK. This would be very different if the CDK were a single module, but it's not. Bazel was built to solve the issue of tons of modules in various states across developers. Lerna was build to solve the issue of a developer with a couple related libraries housed in one repository.
|
||
## Rationale | ||
* Bazel is incremental, meaning that only the changed aspects of the project and its dependencies need to be rebuilt | ||
* Bazel caches all of its builds, making it much faster than the current solution |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fast is desirable. Safe is required. Cache often introduces safety issues, and I think it'd be interesting to add some wording around how Bazel makes sure it's caching mechanism is exempt of invalidation bugs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this something I can add in a references section?
* Bazel is highly extensible, meaning custom rules can be written solely in TypeScript/JavaScript as opposed to Bash, but | ||
still incorporated easily into the toolchain |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A drawback that has not been mentioned earlier is that Bazel introduces yet another dependency in the mix that you need to install (and later maintain) in order to contribute to CDK. I believe it can be installed from npm
(mitigating this problem to a large extent), but I think it needs to be mentioned in the document.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be installed from NPM, and in that way doesn't introduce that much overhead. But definitely worth a mention, you're totally right.
dependencies would be added to the top-level package.json, in addition to the nested dependencies from all of the | ||
subpackages. There should not be too many, since most of the "dependencies" are simply other CDK subpackages. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds ripe for automation/generation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned above, it's in progress, but there's no committed timeline as of yet.
|
||
# Unresolved questions | ||
|
||
* Will the project structure of the CDK remain stable over time, or will it move to a singular import? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Funny you'd ask that 🤣 we've been talking about it quite a lot lately... But we're not quite sure just yet whether this is the right move or not...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While the move to Bazel helps the current setup a lot, it would probably help a single-module setup a little bit less. It can certainly be used in both cases though. But just a matter of how much of a commitment we want to make now.
This proposal is for migrating the build system to [Bazel](https://bazel.build/), as it solves | ||
many if not all of the deficiencies of the current system. This involves adding Bazel resources and rules to the repository to manage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here - or in motivations - I would like to have more verbiage around what "all the deficiencies of the current system" are. Because as much as I'm willing to believe they exist, not explicitly listing them out means:
- We may have different perceptions of what the issues are
- We won't be able to determine whether the proposal actually addresses those
|
||
The CDK currently uses a combination of [Lerna](https://lerna.js.org/), [Yarn Workspaces](https://yarnpkg.com/lang/en/docs/workspaces/), | ||
and Bash scripts to orchestrate build and test workflows. This has several advantages for managing a monorepo structure, but can also | ||
be very slow, difficult to work with and extend, and hard to parallelize amongst the team (ie when one unrelated resource changes, the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the current thing fells not difficult to extend/parallelize to me. I'd love having some examples, or maybe a little tighter wording (I may simply be interpreting it wrong!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was meant to be more about how Bazel is literally "plug-any-play" with any rule. An output from one rule can easily be used and transformed by another. And yes, while you can do that with the current system as well, there are less guardrails and tooling in place to ensure that it works well out-of-the-box. Maybe not a dealbreaker, but it was noticeable to me. I can try to add an example to illustrate this, but I may just remove the verbiage too if it's too misleading.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I absolutely love your submitting this RFC! It's really interesting, and having looked into this myself (probably as a reaction to your suggesting it in an issue earlier) means I get to have a deeper look into what'd make it desirable...
This is a great start, although as I have commented I would really like it to go deeper on the details of what it'd improve, etc... There's also opportunity for writing minimal tooling to actually generate a lot of the needed configuration from the existing (and required) metadata documentation of a node project (which I'd like factored in; or proven detrimental to the whole premise of the change).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely needs more "meat" in terms of the problems this would solve. On the surface definitely feels like this can be valuable.
I am wondering if there is a way for us to prototype this at a smaller scale with less risk (maybe in jsii, @RomainMuller what do you think)?
# Basic Example | ||
|
||
Instead of running `bash build.sh` or `npm run build` on a subpackage, an engineer would simply run `bazel build <path-to-resource>` or | ||
`bazel test <path-to-resource`. Running a specific Bazel rule (for instance orchestrating a release) is also straightforward, with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
`bazel test <path-to-resource`. Running a specific Bazel rule (for instance orchestrating a release) is also straightforward, with | |
`bazel test <path-to-resource>`. Running a specific Bazel rule (for instance orchestrating a release) is also straightforward, with |
|
||
# Basic Example | ||
|
||
Instead of running `bash build.sh` or `npm run build` on a subpackage, an engineer would simply run `bazel build <path-to-resource>` or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is a "resource"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about use cases like:
- buildup/builddown
- foreach
How do they map to Bazel?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A resource in this case could be any build/test target in a subpackage. For instance, you could have one resource as the Lambdas that support a package, another for the API layer for that package, and another for the test suite that brings them together.
I'm not entirely sure what you mean by buildup/builddown and foreach. Can you give me an example of both?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BUILD.bazel | ||
package.json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there duplication between BUILD.bazel and package.json? For example, where do we specify the module's dependencies all the jsii information?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We would do that in the BUILD.bazel, similarly to how package.json handles it. There is work underway right now to pull the dependencies automatically from the package.json, but this would not extend to jsii
metadata. Personally I think it's ok to migrate jsii
information out of package.json, especially since it has no bearing on Node distributions (kind of breaking the contract of a Node distributed module).
In the meantime it would be duplicated, which I cop to is increased work, but the dependencies of packages change so infrequently, and if it's work that's missed it is also easily detectable.
The `BUILD` files themselves are quite simple, since Bazel has rules built-in for compiling a TypeScript library and for | ||
running JavaScript tests. We would need to write custom rules in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CDK is built with the jsii
compiler (which uses typescript under the hood). What are the implications?
Also, what are the implications on all the tools in tools/cdk-build-tools
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each of those tools would get wrapped with Bazel rules. This would "supercharge" them with the benefits of Bazel with minimal added work to migrate the existing build system. It also gives a great template for adding more scripts to the tools
part of the repository.
The implications for jsii
are exactly the same, we would just wrap invoking jsii
with a Bazel rule, which augments it with incrementality and all of the other Bazel-associated benefits.
``` | ||
|
||
The `pkg_npm` rule is also built-in to Bazel, and bundles a package based on given sources, along with other needed files like a package.json. | ||
Note that the `cdk_library` rule is only needed for packages that have a CFN to TS transformation. If that's not needed, it's simply a `ts_library` rule. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not really because we use jsii
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even better then!
|
||
There are two main drawbacks to migrating to Bazel: | ||
|
||
1. Lack of familiarity with Bazel amongst the team |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...and contributors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough, but like what the Angular team does, we could include a short reference guide, linked to from the contributor's guide. If your goal is to simply develop for the CDK, and not the tooling behind it, the commands to enable contributions are simple and easy to use.
There are two main drawbacks to migrating to Bazel: | ||
|
||
1. Lack of familiarity with Bazel amongst the team | ||
1. Relative newness of Bazel in the JavaScript/TypeScript ecosystem |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there examples of large typescript projects that use bazel?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Angular and RxJS come to mind, I'll try and find a few others.
The Bazel rules can be added in parallel with current efforts by the team. None of the resources involved would | ||
overwrite or interfere with the current project structure, so adding files would not interfere with developer flow. | ||
Switching over to Bazel could also be opt-in as people want to try it, with the legacy system in place to manage | ||
releases and for developers to fallback on if needed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But in the transition we will have to maintain two sets of build definitions, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Maintain" implies that both are ready for use, but with the adoption of any new system there is always a migratory period. How long that takes is up to the team, but while the new system is being built, there is only one system that builds the project, which needs to be maintained. The new one is being built, which would have to be done anyway. In the transition, it could be made clear that the legacy system is not supported during the migration, or that it will only be supported for a short time.
|
||
# Future Possibilities | ||
|
||
* We could setup Remote Build Execution (RBE) using a fleet on EC2 or ECS, further speeding up the team's builds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does that mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It means that instead of using your local machine to execute build and test tasks, your Bazel instance would "outsource" the job to a team-available fleet (only accessible by trusted contributors, of course). Since Bazel is parallelizable, the work for a large and complex build task like the CDK could be spread out over 3 or 4 EC2 hosts and return the result to you locally in a fraction of the time it would otherwise take.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please request review when you are done with the next revision
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent comments! I'll incorporate as many as I can into the RFC's next draft. I'll wait a little bit to get any responses to my responses before I do though.
The Bazel rules can be added in parallel with current efforts by the team. None of the resources involved would | ||
overwrite or interfere with the current project structure, so adding files would not interfere with developer flow. | ||
Switching over to Bazel could also be opt-in as people want to try it, with the legacy system in place to manage | ||
releases and for developers to fallback on if needed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Maintain" implies that both are ready for use, but with the adoption of any new system there is always a migratory period. How long that takes is up to the team, but while the new system is being built, there is only one system that builds the project, which needs to be maintained. The new one is being built, which would have to be done anyway. In the transition, it could be made clear that the legacy system is not supported during the migration, or that it will only be supported for a short time.
|
||
The CDK currently uses a combination of [Lerna](https://lerna.js.org/), [Yarn Workspaces](https://yarnpkg.com/lang/en/docs/workspaces/), | ||
and Bash scripts to orchestrate build and test workflows. This has several advantages for managing a monorepo structure, but can also | ||
be very slow, difficult to work with and extend, and hard to parallelize amongst the team (ie when one unrelated resource changes, the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was meant to be more about how Bazel is literally "plug-any-play" with any rule. An output from one rule can easily be used and transformed by another. And yes, while you can do that with the current system as well, there are less guardrails and tooling in place to ensure that it works well out-of-the-box. Maybe not a dealbreaker, but it was noticeable to me. I can try to add an example to illustrate this, but I may just remove the verbiage too if it's too misleading.
|
||
# Basic Example | ||
|
||
Instead of running `bash build.sh` or `npm run build` on a subpackage, an engineer would simply run `bazel build <path-to-resource>` or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A resource in this case could be any build/test target in a subpackage. For instance, you could have one resource as the Lambdas that support a package, another for the API layer for that package, and another for the test suite that brings them together.
I'm not entirely sure what you mean by buildup/builddown and foreach. Can you give me an example of both?
BUILD.bazel | ||
package.json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We would do that in the BUILD.bazel, similarly to how package.json handles it. There is work underway right now to pull the dependencies automatically from the package.json, but this would not extend to jsii
metadata. Personally I think it's ok to migrate jsii
information out of package.json, especially since it has no bearing on Node distributions (kind of breaking the contract of a Node distributed module).
In the meantime it would be duplicated, which I cop to is increased work, but the dependencies of packages change so infrequently, and if it's work that's missed it is also easily detectable.
The `BUILD` files themselves are quite simple, since Bazel has rules built-in for compiling a TypeScript library and for | ||
running JavaScript tests. We would need to write custom rules in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each of those tools would get wrapped with Bazel rules. This would "supercharge" them with the benefits of Bazel with minimal added work to migrate the existing build system. It also gives a great template for adding more scripts to the tools
part of the repository.
The implications for jsii
are exactly the same, we would just wrap invoking jsii
with a Bazel rule, which augments it with incrementality and all of the other Bazel-associated benefits.
|
||
## Rationale | ||
* Bazel is incremental, meaning that only the changed aspects of the project and its dependencies need to be rebuilt | ||
* Bazel caches all of its builds, making it much faster than the current solution |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this something I can add in a references section?
* Bazel is highly extensible, meaning custom rules can be written solely in TypeScript/JavaScript as opposed to Bash, but | ||
still incorporated easily into the toolchain |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be installed from NPM, and in that way doesn't introduce that much overhead. But definitely worth a mention, you're totally right.
dependencies would be added to the top-level package.json, in addition to the nested dependencies from all of the | ||
subpackages. There should not be too many, since most of the "dependencies" are simply other CDK subpackages. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned above, it's in progress, but there's no committed timeline as of yet.
|
||
# Unresolved questions | ||
|
||
* Will the project structure of the CDK remain stable over time, or will it move to a singular import? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While the move to Bazel helps the current setup a lot, it would probably help a single-module setup a little bit less. It can certainly be used in both cases though. But just a matter of how much of a commitment we want to make now.
|
||
# Future Possibilities | ||
|
||
* We could setup Remote Build Execution (RBE) using a fleet on EC2 or ECS, further speeding up the team's builds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It means that instead of using your local machine to execute build and test tasks, your Bazel instance would "outsource" the job to a team-available fleet (only accessible by trusted contributors, of course). Since Bazel is parallelizable, the work for a large and complex build task like the CDK could be spread out over 3 or 4 EC2 hosts and return the result to you locally in a fraction of the time it would otherwise take.
PS I think the idea of prototyping this with |
Closing this for now for lack of progress. Please reopen when ready. |
title: "RFC: #60 Bazel"
Notes
Rendered
By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache-2.0 license