Make bsc output an .embeds file together with the .ast file, if the file processed has embeds. It'll also print 1 to stdout if it found embeds. More about .embeds and its format later.
Run a PPX that replaces the embed tags with links to the generated module for that content. More on that later too.
Generators and embeds are used a bit interchangeably in the text below. A generator is the program that generates code from some source input. An embed is that source input, embedded in the ReScript source itself.
Configuring generators in the consuming project
We need a way to configure what generators to use, so the build system knows what to run for each embed. This should be done in rescript.json for consistency.
Suggestion: Like PPXes, point to a path
In this alternative, you point to a path. That path should lead to a configuration file that the build system reads once to figure out which generator this is and how to run it. Example: rescript.json in the consuming project points to an embed.json in the pgtyped-rescript package.
We'll go more into how to build generators later, but the build system would expect to be able to send some configuration as arg to that command and have it generate from that config.
Note that the command could be any type of binary. It's bun here, but it could be node, or a Rust/OCaml/whatever binary. It doesn't matter. It's up to the user to have what's needed installed on their system to be able to run the generation.
This leaves us room to add more configuration if wanted, as well as give good DX with minimal manual work.
So, to recap what the build system would do:
Read embeds in rescript.json
Resolve each embed as it resolves the path to a PPX today
Append .json if it's not already in the file path
Read the configuration in the embed json file
It now knows what generator this is, how to run it, and what tags to run it for.
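The recap above could be sketched roughly as follows. The `EmbedConfig` shape and its field names (`tags`, `command`) are assumptions made for illustration; the contents of the embed JSON file are not settled in this text. PPX-style path resolution and file reading are left out, so only the `.json` step and the tag lookup are shown.

```typescript
// Hypothetical shape of an embed configuration file. The field names are
// assumptions for illustration, not a settled format.
interface EmbedConfig {
  tags: string[]; // tags this generator handles, e.g. "sql.one"
  command: string; // how to run the generator
}

// Append ".json" unless the path already ends with it.
function withJsonExtension(path: string): string {
  return path.endsWith(".json") ? path : path + ".json";
}

// Build a tag -> command lookup. Tags claimed by more than one generator are
// collected rather than resolved, since how to handle clashes is still open.
function indexByTag(configs: EmbedConfig[]): {
  commands: Map<string, string>;
  clashes: string[];
} {
  const commands = new Map<string, string>();
  const clashes: string[] = [];
  for (const config of configs) {
    for (const tag of config.tags) {
      if (commands.has(tag)) {
        if (clashes.indexOf(tag) === -1) clashes.push(tag);
      } else {
        commands.set(tag, config.command);
      }
    }
  }
  return { commands, clashes };
}
```

Returning the clashing tags instead of throwing leaves the decision (error, warning, or first-wins) to the build system.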
Configuring where to emit the generated content
I think we should force the user to configure a central place where to emit generated files, like ./src/__generated__. This will simplify a lot, and scale well up to the point where there are so many files in the same folder that you start to get perf issues, at which point we can solve that in a number of ways.
We need to check, among other things, that the folder is inside a configured ReScript source folder, but that should be fine.
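A minimal sketch of that folder check, assuming both the generated folder and the source folders are given as paths relative to the project root, and that any folder nested inside a source folder counts. The function names are hypothetical.

```typescript
// Split a relative path into segments, ignoring "." and empty parts.
// Returns null for paths containing ".." so they are rejected outright.
function pathSegments(path: string): string[] | null {
  const parts = path.split("/").filter((p) => p !== "" && p !== ".");
  return parts.indexOf("..") === -1 ? parts : null;
}

// True when `generatedDir` is one of `sourceDirs` or nested inside one.
function isInsideSourceDir(generatedDir: string, sourceDirs: string[]): boolean {
  const target = pathSegments(generatedDir);
  if (target === null) return false;
  return sourceDirs.some((dir) => {
    const base = pathSegments(dir);
    if (base === null || base.length > target.length) return false;
    return base.every((part, i) => part === target[i]);
  });
}
```

Comparing whole segments rather than string prefixes avoids treating a folder like srcfoo as being inside src.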
Questions and things to figure out
What if things clash, as in several embeds operate on the same tag names?
Overview of potential setup in build system
Here's an overview of how the build system could handle running generators.
This is how it looks at a high level:
Finding embeds
You can embed other languages or any string content into tags inside of ReScript. Example:
let findOne = %sql.one(`select * from users where id = :id!`)
let findMany = %sql.many(`select * from users`)
If there's a generator configured for sql.one, bsc will spit out a .embeds file next to .ast when it's asked to produce the .ast file. It looks roughly like this (format very much subject to change, we'll make it whatever makes most sense and is easiest/most efficient to read from the build system):
<<- item begin ->>
sql.one
select * from users where id = :id!
1:23-1:60
<<- item begin ->>
sql.many
select * from users
3:88-3:109
If bsc found embeds and wrote an .embeds file, it'll output 1 to stdout.
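Reading the format shown above could look roughly like this. The format is explicitly subject to change, so this is a sketch that assumes each item is the marker line, then the tag, then the content (possibly spanning several lines), then a final start-end location line. All type and function names are hypothetical.

```typescript
interface Position {
  line: number;
  col: number;
}

interface Embed {
  tag: string;
  content: string;
  loc: { start: Position; end: Position };
}

const ITEM_MARKER = "<<- item begin ->>";

// Parses "1:23" into { line: 1, col: 23 }.
function parsePosition(text: string): Position {
  const parts = text.split(":");
  return { line: Number(parts[0]), col: Number(parts[1]) };
}

function parseEmbeds(fileText: string): Embed[] {
  const embeds: Embed[] = [];
  // Anything before the first marker is not part of an item, so drop it.
  const chunks = fileText.split(ITEM_MARKER).slice(1);
  for (const chunk of chunks) {
    const lines = chunk.split("\n");
    if (lines[0] === "") lines.shift(); // remainder of the marker line
    while (lines.length > 0 && lines[lines.length - 1] === "") lines.pop();
    if (lines.length < 3) continue; // skip malformed items
    const locParts = lines[lines.length - 1].split("-");
    embeds.push({
      tag: lines[0],
      // Everything between the tag line and the location line is content.
      content: lines.slice(1, -1).join("\n"),
      loc: { start: parsePosition(locParts[0]), end: parsePosition(locParts[1]) },
    });
  }
  return embeds;
}
```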
Running generators
Now, if we found embeds we'll want to run the appropriate generator for that file, if the embedded content has changed.
Generators are expected to be idempotent. We're building a pretty aggressive cache mechanism into this. This is important and will make the DX much better, including not having to run any generators in CI etc unless you really want to. Control that by simply committing or not committing the generated files.
So, we load the .embeds file, go through each of the embeds, and check whether they've already been generated. If they've been generated, we check if the generated content was generated from the same input, via a comment with a hash of the source content at the top of the generated file. If the generated file wasn't generated from the same source, or if it hasn't been generated yet, we run the appropriate generator and write the generated file.
Here are a number of hands-on examples:
First time a generation runs
// SomeFile.res
let findOne = %sql.one(`select * from users where id = :id!`)
let findMany = %sql.many(`select * from users`)
bsc extracts 2 embeds from SomeFile.res and prints 1 to stdout to signify that
The build system reads the SomeFile.embeds file generated by bsc, and figures out that 2 files are to be generated: src/__generated__/SomeFile__sql_one__M1.res and src/__generated__/SomeFile__sql_many__M1.res. Notice the file format <sourceModuleName>__<tagName.replace(".", "_")>__M<indexOfTagInFile>. If multiple embeds of the same tag exist in the same file (multiple %sql.one for example), the M part is incremented, like src/__generated__/SomeFile__sql_one__M2.res for the next embed.
The build system checks if the generated files exist already. They don't, so...
...the build system triggers the appropriate generator for each embedded content. Maybe by passing stringified JSON as the sole argument to the generator: /command/to/run/generator '{"tag":"sql.one","content":"select * from users where id = :id!","loc":{"start":{"line":1,"col":23},"end":{"line":1,"col":60}}}'. This can all be done in parallel, since the generators should be idempotent (at least to start with).
The generator runs, and returns either the generated content, or errors. More about errors below.
The build system writes the generated content, including a source hash for the input it was generated from at the top of each generated file. Here's how a file could look: src/__generated__/SomeFile__sql_one__M1.res
// @sourceHash 83mksdf8782m4884i34
type response = {...}
// More generated content in here
New files were added, so we need to add these new files to the build system build state, and trigger ast generation of them. Notice that embeds in files generated by other embeds are not allowed. That way we avoid potentially slow and recursive embeds.
The build system cleans up any lingering generated files that are now irrelevant, if they exist. Maybe by just querying the file system for src/__generated__/SomeFile__sql_one__*.res and src/__generated__/SomeFile__sql_many__*.res and then removing any of them that aren't in use any more. This also needs to be updated in the build state.
Finally, when things have settled and the build system is ready, we move on to the compilation phase, as usual.
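The file naming and the generator argument from the steps above can be sketched as follows. The types and function names are hypothetical, and the JSON payload simply mirrors the example given in the text, which is itself only a "maybe".

```typescript
interface Position {
  line: number;
  col: number;
}

interface Embed {
  tag: string;
  content: string;
  loc: { start: Position; end: Position };
}

// <sourceModuleName>__<tagName with "." replaced by "_">__M<indexOfTagInFile>.res
// split/join replaces every ".", not only the first one.
function generatedFileName(sourceModule: string, tag: string, index: number): string {
  return sourceModule + "__" + tag.split(".").join("_") + "__M" + index + ".res";
}

// Assigns each embed its file name. The M counter is tracked per tag and
// starts at 1, following the order the embeds appear in the file.
function planGeneratedFiles(sourceModule: string, embeds: Embed[]): string[] {
  const seen = new Map<string, number>();
  return embeds.map((embed) => {
    const index = (seen.get(embed.tag) ?? 0) + 1;
    seen.set(embed.tag, index);
    return generatedFileName(sourceModule, embed.tag, index);
  });
}

// The stringified JSON passed as the sole argument to the generator.
function generatorArgument(embed: Embed): string {
  return JSON.stringify({ tag: embed.tag, content: embed.content, loc: embed.loc });
}
```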
When generated content hasn't changed
The same setup as the first example, up until point 3, where instead:
3. Generated files exist for both embeds: src/__generated__/SomeFile__sql_one__M1.res and src/__generated__/SomeFile__sql_many__M1.res
4. The build system reads the first line of each of those files, and extracts the @sourceHash
5. It then compares the hash from the file with a hash of the content extracted from the .embeds file.
6. All hashes match, so no generation needs to run, and the build state can be considered valid. Continue to regular compilation.
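The hash check in these steps might look like the sketch below. It assumes the header has exactly the form shown in the earlier generated file example (a first line starting with `// @sourceHash `), and it takes the hash function as a parameter because the text does not say which one the build system would use. Function names are hypothetical.

```typescript
const HASH_PREFIX = "// @sourceHash ";

// Reads the hash from the first line of a generated file, or returns null
// when the first line is not a @sourceHash comment.
function extractSourceHash(generatedFileText: string): string | null {
  const firstLine = generatedFileText.split("\n")[0];
  return firstLine.startsWith(HASH_PREFIX)
    ? firstLine.slice(HASH_PREFIX.length).trim()
    : null;
}

// `generatedFileText` is null when the generated file does not exist yet.
// `hash` stands in for whatever hash function the build system uses.
function isUpToDate(
  generatedFileText: string | null,
  embedContent: string,
  hash: (input: string) => string
): boolean {
  if (generatedFileText === null) return false;
  return extractSourceHash(generatedFileText) === hash(embedContent);
}
```

A `false` result covers both the missing file and the changed content cases, which is when the generator needs to run.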
When generated content has changed
The same setup as above, but from point 5:
5. The hashes do not match. Run the generation again, as noted by point 4 in the first example.
Cleaning up
We'll need to continuously ensure that we clean up:
.embeds files when there aren't any embeds anymore (as noticed by bsc not writing 1 to stdout)
Generated files when their parent source tag doesn't exist anymore
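Finding the generated files to remove could be sketched like this, assuming the build system already has a listing of the file names in the generated folder and the names it expects for the current embeds. The function name is hypothetical, and actually deleting files and updating the build state is left out.

```typescript
// Returns the generated files belonging to `sourceModule` that no longer
// correspond to an embed in that module.
// Note: matching on the "<module>__" prefix would also catch files from a
// module whose own name starts with "<module>__"; a real implementation
// would need a stricter match.
function staleGeneratedFiles(
  sourceModule: string,
  filesInGeneratedDir: string[],
  expectedFiles: string[]
): string[] {
  const prefix = sourceModule + "__";
  const expected = new Set(expectedFiles);
  return filesInGeneratedDir.filter(
    (file) => file.startsWith(prefix) && file.endsWith(".res") && !expected.has(file)
  );
}
```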
When errors in generation happen
We can flesh this out more, but ideally, when errors in generation happen, we can propagate those to the build system and have the build system both fail and write them to .compiler.log so that they end up in the editor tooling.
The one thing to take care of here is translating error locations: the generator returns errors relative to the content it received, while the build system and the editor tooling present the error offset to the correct location in the source file.
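That translation could be as small as the sketch below. It assumes 1-based lines and 0-based columns in both coordinate systems, and that the embed's start location points at the first character of the content the generator received; none of this is specified in the text, and the names are hypothetical.

```typescript
interface Position {
  line: number;
  col: number;
}

// Converts a position relative to the embedded content into a position in
// the ReScript source file. Only the first content line shares a line with
// the embed start, so only there is the column shifted.
function toSourcePosition(embedStart: Position, relative: Position): Position {
  return {
    line: embedStart.line + relative.line - 1,
    col: relative.line === 1 ? embedStart.col + relative.col : relative.col,
  };
}
```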
Regenerating content?
The idea is that you can simply remove the generated file, at which point it'll be regenerated the next time the build system processes the file with the source content.
Questions and thoughts
Should generators be idempotent? This makes things a lot easier, and faster, but what about the scenario where for example a GraphQL schema changes, and we want to regenerate because of that? We probably need to figure out a few more strategies.
One idea for the case where there are additional inputs that should control whether something is regenerated or not (like with GraphQL where ideally both the actual GraphQL text input, and the source schema should control whether things are regenerated) - let people define additional input(s) that the build system can take into account when writing the hash:
The build system can then track and hash that file as well, and use the hash of that file in addition to the source hash when comparing whether things need to be regenerated or not.
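One way to fold additional inputs into the hash, as a sketch only. It assumes the additional inputs (for example the contents of a GraphQL schema file) have already been read into strings, and it takes the hash function as a parameter since none is named here. The function name and the way the hashes are joined are assumptions.

```typescript
// Combines the hash of the embedded content with the hashes of any
// additional inputs, so a change in either triggers regeneration.
// `hash` stands in for whatever hash function the build system uses.
function combinedHash(
  embedContent: string,
  extraInputs: string[],
  hash: (input: string) => string
): string {
  const parts = [hash(embedContent)].concat(
    extraInputs.map((input) => hash(input))
  );
  // Hash the joined hashes so the result keeps a fixed, header-friendly form.
  return hash(parts.join(":"));
}
```

The result would be written and compared in place of the plain source hash.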
@zth -- I've enabled wikis for the project so we can move these sorts of 'permanent' issues (that we want to keep around for documentation) there. Would you like me to move it over? I think you can do that as well, as you're an author 👌
This is a WIP discussion for implementing generators support in the style of https://github.com/zth/rescript-embed-lang natively in rewatch and the compiler itself.
Relevant compiler PR: rescript-lang/rescript#6823. That PR makes the compiler changes listed at the top of this text.