Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

build: introduce mechanism to implant generated code from the bazel sandbox back into the source tree #58018

Closed
irfansharif opened this issue Dec 17, 2020 · 7 comments
Assignees
Labels
A-build-system C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-dev-inf

Comments

@irfansharif
Copy link
Contributor

irfansharif commented Dec 17, 2020

For bazel generated files, especially pb.go ones once we make that happen (#56067), we’ll want a way for it to exist outside the sandbox and in the source tree. The reason being that IDEs would want to know where certain definitions exist, and it’s difficult to reach individual IDEs/editors to look within the sandbox. (Related: bazel-contrib/rules_go#512)

See https://github.com/dropbox/dbx_build_tools, the folks over at dropbox have a way to explicitly pull out bazel generated files out of the sandbox and into the working directory for what I think is this exact reason.

Epic CRDB-8036

@irfansharif irfansharif added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-build-system labels Dec 17, 2020
@irfansharif
Copy link
Contributor Author

I believe this mechanism should be folded into #56965.

irfansharif added a commit that referenced this issue Dec 17, 2020
Part of #57801. Generating these files within the sandbox was a bit
tricky seeing as how the optgen language compiler uses itself to
generate its AST expressions. This is a form of compiler
bootstrapping[1], and needed some manual overrides to fit into how Bazel
wants things to be.

To this end, we define a second `go_library` target[2] that sources
pre-generated `og.go` files that are already checked into the source
tree. These files are then used to compile `langgen`, which is in turn
used to generate newer revisions of those same generated files. All
dependants of [3] will need to depend on the og.go files generated in
the Bazel sandbox, which is achieved by [4]. Conversely, to instruct
gazelle/Bazel to resolve langgen's import of [3] appropriately, we added
a custom resolve directive.

We'll eventually want to be able to implant the sandbox generated files
back into the source tree. This will be necessary for IDEs, but also for
a sane workflow for the engineers working on langgen. If they were only
using Bazel today, langgen dependent builds would misbehave as we
wouldn't be updating the checked-in og.go files used to build langgen
itself (assuming any diffs there would result in different
code/compilation behaviour). See
#58018.

[1]: https://en.wikipedia.org/wiki/Bootstrapping_(compilers)
[2]: //pkg/sql/opt/optgen/lang:bootstrap
[3]: github.com/cockroachdb/cockroach/pkg/sql/opt/optgen/lang
[4]: See the "gazelle:resolve" directive in the top-level BUILD.bazel file

Release note: None
irfansharif added a commit that referenced this issue Dec 17, 2020
Part of #57801. Generating these files within the sandbox was a bit
tricky seeing as how the optgen language compiler uses itself to
generate its AST expressions. This is a form of compiler
bootstrapping[1], and needed some manual overrides to fit into how Bazel
wants things to be.

To this end, we define a second `go_library` target[2] that sources
pre-generated `og.go` files that are already checked into the source
tree. These files are then used to compile `langgen`, which is in turn
used to generate newer revisions of those same generated files. All
dependants of [3] will need to depend on the og.go files generated in
the Bazel sandbox, which is achieved by [4]. Conversely, to instruct
gazelle/Bazel to resolve langgen's import of [3] appropriately, we added
a custom resolve directive.

We'll eventually want to be able to implant the sandbox generated files
back into the source tree. This will be necessary for IDEs, but also for
a sane workflow for the engineers working on langgen. If they were only
using Bazel today, langgen dependent builds would misbehave as we
wouldn't be updating the checked-in og.go files used to build langgen
itself (assuming any diffs there would result in different
code/compilation behaviour). See
#58018.

[1]: https://en.wikipedia.org/wiki/Bootstrapping_(compilers)
[2]: //pkg/sql/opt/optgen/lang:bootstrap
[3]: github.com/cockroachdb/cockroach/pkg/sql/opt/optgen/lang
[4]: See the "gazelle:resolve" directive in the top-level BUILD.bazel file

Release note: None
irfansharif added a commit that referenced this issue Dec 17, 2020
Part of #57801. Generating these files within the sandbox was a bit
tricky seeing as how the optgen language compiler uses itself to
generate its AST expressions. This is a form of compiler
bootstrapping[1], and needed some manual overrides to fit into how Bazel
wants things to be.

To this end, we define a second `go_library` target[2] that sources
pre-generated `og.go` files that are already checked into the source
tree. These files are then used to compile `langgen`, which is in turn
used to generate newer revisions of those same generated files. All
dependants of [3] will need to depend on the og.go files generated in
the Bazel sandbox, which is achieved by [4]. Conversely, to instruct
gazelle/Bazel to resolve langgen's import of [3] appropriately, we added
a custom resolve directive.

We'll eventually want to be able to implant the sandbox generated files
back into the source tree. This will be necessary for IDEs, but also for
a sane workflow for the engineers working on langgen. If they were only
using Bazel today, langgen dependent builds would misbehave as we
wouldn't be updating the checked-in og.go files used to build langgen
itself (assuming any diffs there would result in different
code/compilation behaviour). See
#58018.

[1]: https://en.wikipedia.org/wiki/Bootstrapping_(compilers)
[2]: //pkg/sql/opt/optgen/lang:bootstrap
[3]: github.com/cockroachdb/cockroach/pkg/sql/opt/optgen/lang
[4]: See the "gazelle:resolve" directive in the top-level BUILD.bazel file

Release note: None
irfansharif added a commit that referenced this issue Dec 17, 2020
Part of #57801. Generating these files within the sandbox was a bit
tricky seeing as how the optgen language compiler uses itself to
generate its AST expressions. This is a form of compiler
bootstrapping[1], and needed some manual overrides to fit into how Bazel
wants things to be.

To this end, we define a second `go_library` target[2] that sources
pre-generated `og.go` files that are already checked into the source
tree. These files are then used to compile `langgen`, which is in turn
used to generate newer revisions of those same generated files. All
dependants of [3] will need to depend on the og.go files generated in
the Bazel sandbox, which is achieved by [4]. Conversely, to instruct
gazelle/Bazel to resolve langgen's import of [3] appropriately, we added
a custom resolve directive.

We'll eventually want to be able to implant the sandbox generated files
back into the source tree. This will be necessary for IDEs, but also for
a sane workflow for the engineers working on langgen. If they were only
using Bazel today, langgen dependent builds would misbehave as we
wouldn't be updating the checked-in og.go files used to build langgen
itself (assuming any diffs there would result in different
code/compilation behaviour). See
#58018.

[1]: https://en.wikipedia.org/wiki/Bootstrapping_(compilers)
[2]: //pkg/sql/opt/optgen/lang:bootstrap
[3]: github.com/cockroachdb/cockroach/pkg/sql/opt/optgen/lang
[4]: See the "gazelle:resolve" directive in the top-level BUILD.bazel file

Release note: None
craig bot pushed a commit that referenced this issue Dec 17, 2020
58019: lang{,gen},bazel: generate more files within the sandbox r=irfansharif a=irfansharif

Part of #57801. Generating these files within the sandbox was a bit
tricky seeing as how the optgen language compiler uses itself to
generate its AST expressions. This is a form of compiler
bootstrapping [1], and needed some manual overrides to fit into how Bazel
wants things to be.

To this end, we define a second `go_library` target [2] that sources
pre-generated `og.go` files that are already checked into the source
tree. These files are then used to compile `langgen`, which is in turn
used to generate newer revisions of those same generated files. All
dependants of [3] will need to depend on the og.go files generated in
the Bazel sandbox, which is achieved by [4]. Conversely, to instruct
gazelle/Bazel to resolve langgen's import of [3] appropriately, we added
a custom resolve directive.

We'll eventually want to be able to implant the sandbox generated files
back into the source tree. This will be necessary for IDEs, but also for
a sane workflow for the engineers working on langgen. If they were only
using Bazel today, langgen dependent builds would misbehave as we
wouldn't be updating the checked-in og.go files used to build langgen
itself (assuming any diffs there would result in different
code/compilation behaviour). See
#58018.

```
[1]: https://en.wikipedia.org/wiki/Bootstrapping_(compilers)
[2]: //pkg/sql/opt/optgen/lang:bootstrap
[3]: github.com/cockroachdb/cockroach/pkg/sql/opt/optgen/lang
[4]: See the "gazelle:resolve" directive in the top-level BUILD.bazel file
```
Release note: None

Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
rickystewart added a commit to rickystewart/cockroach that referenced this issue May 4, 2021
* Add a target to build `docs/generated/logging.md` in the Bazel
  sandbox.
* Add some machinery to run `pkg/util/fsm/gen` in the sandbox, and add
  a target in `pkg/sql` to generate the `txnstatetransitions` files.
* A couple build targets that were already using `//pkg/util/log/gen`
  had it in their `srcs` list instead of the `exec_tools` list, so fix
  that in a few places.
* Augment the `check-genfiles` script to check that no new `go:generate`
  comments are added. This will help us ensure the shape of the build
  doesn't change under our feet too much.

The following work items still remain to be done:

* Start building this generated code in CI to ensure that there aren't
  regressions.
* Start consuming generated file targets from `dev` and stage them in
  the working directory (cockroachdb#58018).
* The following files contain `go:generate` comments that are not
  bazelfied: `pkg/ccl/sqlproxyccl/admitter/service.go`,
  `pkg/ccl/sqlproxyccl/tenant/directory.go`,
  `pkg/security/certmgr/cert.go`, and `pkg/server/api_v2.go`.

Release note: None
rickystewart added a commit to rickystewart/cockroach that referenced this issue May 10, 2021
* Add a target to build `docs/generated/logging.md` in the Bazel
  sandbox.
* Add some machinery to run `pkg/util/fsm/gen` in the sandbox, and add
  a target in `pkg/sql` to generate the `txnstatetransitions` files.
* A couple build targets that were already using `//pkg/util/log/gen`
  had it in their `srcs` list instead of the `exec_tools` list, so fix
  that in a few places.
* Augment the `check-genfiles` script to check that no new `go:generate`
  comments are added. This will help us ensure the shape of the build
  doesn't change under our feet too much.

The following work items still remain to be done:

* Start building this generated code in CI to ensure that there aren't
  regressions.
* Start consuming generated file targets from `dev` and stage them in
  the working directory (cockroachdb#58018).
* The following files contain `go:generate` comments that are not
  bazelfied: `pkg/ccl/sqlproxyccl/admitter/service.go`,
  `pkg/ccl/sqlproxyccl/denylist/service.go`,
  `pkg/ccl/sqlproxyccl/tenant/directory.go`,
  `pkg/security/certmgr/cert.go`, and `pkg/server/api_v2.go`.

Release note: None
craig bot pushed a commit that referenced this issue May 10, 2021
64658: bazel: build some generated files in the bazel sandbox r=rail a=rickystewart

* Add a target to build `docs/generated/logging.md` in the Bazel
  sandbox.
* Add some machinery to run `pkg/util/fsm/gen` in the sandbox, and add
  a target in `pkg/sql` to generate the `txnstatetransitions` files.
* A couple build targets that were already using `//pkg/util/log/gen`
  had it in their `srcs` list instead of the `exec_tools` list, so fix
  that in a few places.
* Augment the `check-genfiles` script to check that no new `go:generate`
  comments are added. This will help us ensure the shape of the build
  doesn't change under our feet too much.

The following work items still remain to be done:

* Start building this generated code in CI to ensure that there aren't
  regressions.
* Start consuming generated file targets from `dev` and stage them in
  the working directory (#58018).
* The following files contain `go:generate` comments that are not
  bazelfied: `pkg/ccl/sqlproxyccl/admitter/service.go`,
  `pkg/ccl/sqlproxyccl/denylist/service.go`,
  `pkg/ccl/sqlproxyccl/tenant/directory.go`,
  `pkg/security/certmgr/cert.go`, and `pkg/server/api_v2.go`.


Release note: None

Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com>
@rickystewart
Copy link
Collaborator

quoth irfan who poked at this during breather week:

this is too difficult so I’ll give up. the basic version through dev where it searches through bazel to find all generated pb.go files is easy enough, but I don’t think it’s the right approach. It doesn’t take advantage of any sort of lazy evaluation (“don’t copy any files if nothing’s changed”), and to generate the proto files, we’d basically have to re-build the entire source tree (there’s no easy way to bazel build a go_proto_library target).
I think the right approach would be adding a rule, that can depend on go_proto_library targets, use the rules api to determine the right dependencies, run the specific GoProtocGen action to generate the right go_generated_src file, and copy that over in the right path. when expressed as a rule that depends on bazel targets, we’d get the right kind of lazy evaluation and transitive ordering.

@irfansharif
Copy link
Contributor Author

irfansharif commented Jul 8, 2021

Some relevant links: bazel-contrib/rules_go#2111. I think the long-term strategy the bazel/rules_go folks are thinking about with GOPACKAGESDRIVER (bazel-contrib/rules_go#512, https://pkg.go.dev/golang.org/x/tools/go/packages?utm_source=godoc), though cool, is probably too much magic for me for the short-term. I also suspect the path to adoption for IDEs might be slow: https://youtrack.jetbrains.com/issue/GO-8753. I still think copying over the generated files (either in the dumb out-of-band manner, or in a smarter, rules-driven manner) is the way to go.

@irfansharif
Copy link
Contributor Author

I haven't studied the https://github.com/dropbox/dbx_build_tools too closely. @otan or @AlexTalks, by any chance do you know where this "copy over generated files" code lives? Or know a friend who knows?

@otan
Copy link
Contributor

otan commented Jul 8, 2021

gazel was done here: https://github.com/dropbox/dbx_build_tools/blob/master/build_tools/bzl_lib/gazel.py
and BUILD.in -> BUILD.bazel is done here: https://github.com/dropbox/dbx_build_tools/blob/master/build_tools/bzl_lib/gen_build_go.py#L75

note we didn't really allow go directives (e.g. go:generate ...), so we'll have to make our own story for that.

rickystewart added a commit to rickystewart/cockroach that referenced this issue Aug 11, 2021
One might be expected that this command be implemented as
`dev generate protobuf` or `dev generate execgen` or similar, but that's
not really practical for a couple reasons:

1. Building all the protobuf in-tree isn't really possible on its own:
   see cockroachdb#58018 (comment).
2. For non-proto generated code, one can imagine that we do a bunch of
   `bazel query`s to list all the generated files in tree and to find
   which files are generated. Given that there are hundreds of generated
   files in tree, and a single `bazel query` can take ~1s, this is not
   tractable.

The implementation here is ad-hoc and a bit hacky. Basically we search
`bazel-bin` for files ending in `.go` and apply some heuristics to
figure out whether they should be copied and where those files should
go.

This is gated behind a feature flag for now because actually copying
these files into the workspace right now dirties a ton of files. (The
diffs are benign -- mostly comments.) We can flip this option to be on
by default when it's safe (probably after the `Makefile` is deleted).

Closes cockroachdb#68565.

Release note: None
rickystewart added a commit to rickystewart/cockroach that referenced this issue Aug 11, 2021
One might be expected that this command be implemented as
`dev generate protobuf` or `dev generate execgen` or similar, but that's
not really practical for a couple reasons:

1. Building all the protobuf in-tree isn't really possible on its own:
   see cockroachdb#58018 (comment).
2. For non-proto generated code, one can imagine that we do a bunch of
   `bazel query`s to list all the generated files in tree and to find
   which files are generated. Given that there are hundreds of generated
   files in tree, and a single `bazel query` can take ~1s, this is not
   tractable.

The implementation here is ad-hoc and a bit hacky. Basically we search
`bazel-bin` for files ending in `.go` and apply some heuristics to
figure out whether they should be copied and where those files should
go.

This is gated behind a feature flag for now because actually copying
these files into the workspace right now dirties a ton of files. (The
diffs are benign -- mostly comments.) We can flip this option to be on
by default when it's safe (probably after the `Makefile` is deleted).

Closes cockroachdb#68565.

Release note: None
rickystewart added a commit to rickystewart/cockroach that referenced this issue Aug 12, 2021
One might be expected that this command be implemented as
`dev generate protobuf` or `dev generate execgen` or similar, but that's
not really practical for a couple reasons:

1. Building all the protobuf in-tree isn't really possible on its own:
   see cockroachdb#58018 (comment).
2. For non-proto generated code, one can imagine that we do a bunch of
   `bazel query`s to list all the generated files in tree and to find
   which files are generated. Given that there are hundreds of generated
   files in tree, and a single `bazel query` can take ~1s, this is not
   tractable.

The implementation here is ad-hoc and a bit hacky. Basically we search
`bazel-bin` for files ending in `.go` and apply some heuristics to
figure out whether they should be copied and where those files should
go.

This is gated behind a feature flag for now because actually copying
these files into the workspace right now dirties a ton of files. (The
diffs are benign -- mostly comments.) We can flip this option to be on
by default when it's safe (probably after the `Makefile` is deleted).

Closes cockroachdb#68565.

Release note: None
craig bot pushed a commit that referenced this issue Aug 12, 2021
68102: storage/storageccl: Add an option to break export mid key r=pbardea,sumeerbhola a=aliher1911

Previously when exporting ranges with very large number of versions
export could hit the situation where single key export would exceed
maximum export byte limit. To resolve this, safety threshold has to
be raised which could cause nodes going out of memory.

To address this, this patch adds an ability to stop and resume on
the arbitrary key timestamp so that it could be stopped mid key and
resumed later.

Release note: None

Fixes #68231

68390: ui: rebuild the databases pages r=matthewtodd a=matthewtodd

Previously, the databases pages featured an older, outdated design. This
change aligns their UX with that of the other modern pages in the DB
console, statements, transactions, and sessions.

This is a first pass at the job. We will return to layer on more
features, including search, and more data, including node & region
information.

Addresses #65737.

Before | After
--- | ---
<img width="1372" alt="Screen Shot 2021-08-05 at 4 54 31 PM" src="https://user-images.githubusercontent.com/5261/128420405-7a096c2f-32a1-42d9-a8f0-186f699fb612.png"> | <img width="1372" alt="Screen Shot 2021-08-03 at 2 50 12 PM" src="https://user-images.githubusercontent.com/5261/128070069-28d0f461-69dd-47cb-a04b-bc6cd0126ca4.png">
<img width="1372" alt="Screen Shot 2021-08-05 at 4 54 57 PM" src="https://user-images.githubusercontent.com/5261/128420491-71d03570-1c0d-413b-ab40-09b45cf614ee.png"> | <img width="1372" alt="Screen Shot 2021-08-03 at 2 50 21 PM" src="https://user-images.githubusercontent.com/5261/128070090-bd62d125-a1b2-4c83-b0a5-33d0723849eb.png">
<img width="1372" alt="Screen Shot 2021-08-05 at 5 03 37 PM" src="https://user-images.githubusercontent.com/5261/128420959-338ea79b-abf0-4773-91bf-cecac64fec5f.png"> | (This overview of database grants no longer exists.)
(This rolled-up view of table grants did not previously exist.) | <img width="1372" alt="Screen Shot 2021-08-03 at 2 50 27 PM" src="https://user-images.githubusercontent.com/5261/128070100-1305c94f-c017-40a5-ba2d-f19fd61bdecf.png">
<img width="1372" alt="Screen Shot 2021-08-05 at 4 55 21 PM" src="https://user-images.githubusercontent.com/5261/128420596-2d9f04ad-718c-4dce-b045-b5647f718339.png"> | <img width="1372" alt="Screen Shot 2021-08-03 at 2 50 35 PM" src="https://user-images.githubusercontent.com/5261/128070103-831fd5c3-38da-4864-9d82-9091f3209606.png">
<img width="1372" alt="Screen Shot 2021-08-05 at 4 55 28 PM" src="https://user-images.githubusercontent.com/5261/128420658-4042c631-b283-4b0c-ac4a-3235bd3bcf39.png"> | <img width="1372" alt="Screen Shot 2021-08-03 at 2 50 41 PM" src="https://user-images.githubusercontent.com/5261/128070113-73528f2c-56bf-4fcf-a8d8-fcc3f100695f.png">

Release note (ui change): The databases pages in the DB console were
updated to bring them into alignment with our modern UX.

68619: storage: add file storing min version needed for backwards compatibility r=jbowens a=andyyang890

A new STORAGE_MIN_VERSION file containing the minimum version that the
storage engine needs to maintain backwards compatibility with will be
added to the directory for each storage engine. This will be used in
the encryption-at-rest registry migration.

Release note: None

68621: dev: implement `dev generate docs` r=rail a=rickystewart

Do some refactoring so that we don't have to repeat code in a bunch of
places. The implementation is pretty straightforward and just involves
building the docs then copying them over to the workspace.

Closes #68563.

Release note: None

68663: dev: hoist generated code out of sandbox into workspace r=rail a=rickystewart

(Only the most recent commit applies for this review.)

One might be expected that this command be implemented as
`dev generate protobuf` or `dev generate execgen` or similar, but that's
not really practical for a couple reasons:

1. Building all the protobuf in-tree isn't really possible on its own:
   see #58018 (comment).
2. For non-proto generated code, one can imagine that we do a bunch of
   `bazel query`s to list all the generated files in tree and to find
   which files are generated. Given that there are hundreds of generated
   files in tree, and a single `bazel query` can take ~1s, this is not
   tractable.

The implementation here is ad-hoc and a bit hacky. Basically we search
`bazel-bin` for files ending in `.go` and apply some heuristics to
figure out whether they should be copied and where those files should
go.

This is gated behind a feature flag for now because actually copying
these files into the workspace right now dirties a ton of files. (The
diffs are benign -- mostly comments.) We can flip this option to be on
by default when it's safe (probably after the `Makefile` is deleted).

Closes #68565.

Release note: None

68806: bazel: set `--include_source_info` when generating protobuf code r=rail a=rickystewart

This flag [instructs Bazel](bazelbuild/rules_proto#56 (comment))
to set a [command-line flag](protocolbuffers/protobuf#7623 (comment))
when invoking `protoc` that causes the generated proto descriptor sets
to contain extra info:

```
  --include_source_info       When using --descriptor_set_out, do not strip
                              SourceCodeInfo from the FileDescriptorProto.
                              This results in vastly larger descriptors that
                              include information about the original
                              location of each decl in the source file as
                              well as surrounding comments.
```

Setting this solves two problems:

1. We need the descriptor sets to have comments for #65814.
2. Without this change, generated `.pb.go` files from the sandbox won't
   contain comments. This makes the files more difficult to read and
   dirties the files in your checkout if you copy those `.pb.go` files
   to your workspace.

Also delete an unnecessary `--symlink_prefix=_bazel/` from the `test`
configuration (it's inherited from the `build` configuration so it's
redundant).

Release note: None

Co-authored-by: Oleg Afanasyev <oleg@cockroachlabs.com>
Co-authored-by: Matthew Todd <todd@cockroachlabs.com>
Co-authored-by: Andy Yang <ayang@cockroachlabs.com>
Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com>
@rickystewart
Copy link
Collaborator

boom baby

@otan
Copy link
Contributor

otan commented Aug 18, 2021

noiceee

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-build-system C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-dev-inf
Projects
None yet
Development

No branches or pull requests

4 participants