-
Notifications
You must be signed in to change notification settings - Fork 292
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🧹 Add a build tool #4799
🧹 Add a build tool #4799
Conversation
We are accumulating quite a lot of build scripts that are all written in bash, and need to be implemented in a just-so order, at various points in time. It might be time to start looking into moving to an actual tool to manage the build for us. These come in a lot of flavors with various capabilities. I've looked around a bunch, and I think [DoIt](https://pydoit.org) looks quite nice. Here are the features that I think make this tool quite nice: - It's written in Python, so comes in automatically as Python dependency during `pip install`. - The configuration file is written in Python as well, so we don't have to learn a new configuration language just for this tool. - Tasks have dependencies between them, so if there are multiple that need to run, they automatically run in the right order. - Tasks can (optionally) declare files that are inputs to the task itself, and if the input files have not changed the task is skipped. This makes it safe to just run the build toold and trust that you're not waiting too long for nothing. I've started converting some build scripts over to this new tool to show how it works. You can see it in action by going into the virtual env and running (must be in the root directory, unfortunately): ``` $ doit $ doit run typescript $ doit help ``` I've paused here because I'd like to solicit feedback. Does the tool look okay? Is this understandable? Do we prefer this to bash, or to other build tools?
for more information, see https://pre-commit.ci
) | ||
|
||
|
||
def task_typescript(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow I really like this!!!
WOW Rico, I think this is a magnificent idea! I personally don't love bash and indeed is quite difficult to be aware of the bunch of bash scripts we use for configuration. So this is great, it also seems like is very easy to work with! With this I imagine we'd also need to change GitHub actions a bit? |
Yes, I would say that we move everything from bash into here, and then everywhere we need to invoke some scripts we change them to invoke |
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
# file outputs, as well as other task dependencies. If none of the files that a task | ||
# depends on have changed since the last time it ran, the task will be skipped. That's | ||
# good for speed! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this sounds promising! How does it do that and does it do it with tasks as well? Or it's just with files and tasks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just with files. It does it by calculating and storing the md5
hash of every file. By doing that, it can check if files have changed, and skip the task if they haven't.
# Keep the following in mind: | ||
# - You can specify commands as a string `'mkdir dir'` or as an array `['mkdir', 'dir']`. | ||
# The second case doesn't allow shell features, but is safer in case of variables. | ||
# - `file_dep` requires a list of source files, but you can use `glob('*.yaml')` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering why you would need to add the source file twice; here and in actions!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Every task looks like this:
┌──────────────┐
│ │
input ────────▶│ Script │────────▶ output
│ │
└──────────────┘
file_dep
should at least be the set of input
files. That way, if any of the input files change, you know that the output
needs to be regenerated.
But what if you change the script itself? The thing that turns input
into output
?
If you don't put the script itself into file_dep
as well, doit
will think that output
is still up-to-date! Because none of the input
files have changed, so according to the list of instructions output
doesn't need to be regenerated.
You probably do want the task to run again if you change the script though, because probably the output
would change. That's why you should add the scripts to file_dep
as well.
And in actions
: there it means the command you want to run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, that makes sense. Shouldn't we then also include the .doit.db
? Otherwise all tasks will run no matter whether the inputs/outputs changed?
], | ||
title=lambda _: 'Generate Tailwind CSS', | ||
actions=[ | ||
[script], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so these files would still run with bash? I personally don't mind bash as a build-tool at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering since this PR aims at using doit
as the primary build-tool. But I'm guessing you kept this file just because we may need to run it locally; e.g., watching changes!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can implement tasks fully in Python, or you can write them in bash and call them using doit. I think both are fine solutions. In fact I did both here myself 🙃.
The most important part I think is that:
doit
is the primary interface, so everybody knows where to startdoit
will take care of dependencies: run tasks that depend on each other in the right order, won't run them twice, and won't run them if it's not necessary
Once we have that, whether a task is implemented in Python or in bash is an implementation detail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doit
will take care of dependencies: run tasks that depend on each other in the right order, won't run them twice, and won't run them if it's not necessary
Wanted to investigate whether it does that. So, i created this dummy function:
# Create a simple task
def task_copy_file():
return dict(
title=lambda _: "copy file into another",
actions=['cp "dev_database.json" "test here.txt"'],
file_dep=['dev_database.json'],
targets=['test here.txt']
)
Then ran: doit copy_file
. Notably, it did two things:
- it ran the task that you created:
devdb
andcopy_file
. Probably because the file dep in copy_file is in the target of devdb! - i ran it again after a few minutes, and as you explained, it actually did not rerun the tasks. Really nice!!
Later, i removed .doit.db
, and reran it, boths tasks were fired. So, i do believe we need to include it to git, otherwise all tasks will be executed!?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably because the file dep in copy_file is in the target of devdb!
Yes! Exactly, it traces task dependencies through declared file dependencies. In this case, that's probably not what we want to happen, so that might be something we need to be careful with...
.doit.db
[...] we need to include it to git, otherwise all tasks will be executed!?
I've been thinking about this a little, and I think it's safer to leave the database out of git for now.
- First, the database is a binary file, and git doesn't deal well with binary files usually
- Second, especially because it's a binary file, and many PRs will make changes that cause changes to this file, we will nearly always have conflicts on this file when trying to merge any PR. That will get annoying very quickly.
- If we check this file in, it will also go to the server, and the server will also skip build steps based on this file. That MAY be fine, but it may also become the cause of hard-to-diagnose bugs in the future, if any bugs slip in (if we make mistakes in dependency declarations on the tasks, for example).
The only reason to check in the database file would be to save running tasks across machines. I.e., I run the task, and then you don't have to anymore. That's nice, but it's not the end of the world if you have run the tasks once, and can skip them after.
All that is to say, I think the benefit is not massive, and the annoyances and risks are real, so I don't think checking in the database file would be worth it.
dodo.py
Outdated
# Use tsc to generate .js (this downlevels to old JavaScript versions for old browsers) | ||
['npx', 'tsc', '--outDir', '__tmp__'], | ||
|
||
# Then bundle JavaScript into a single bundle | ||
['npx', 'esbuild', '__tmp__/static/js/index.js', | ||
'--bundle', '--sourcemap', '--minify', '--target=es2017', | ||
'--global-name=hedyApp', '--platform=browser', | ||
'--outfile=static/js/appbundle.js'], | ||
|
||
# Delete tempdir | ||
['rm', '-rf', '__tmp__'], | ||
], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally don't like this syntax. I think having this command as a string and splitting it into multiple lines (if needed) would be more readable. And in the case of including variables, we could simply use string interpolation!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And in the case of including variables, we could simply use string interpolation!
Yeah but here's the thing: you need to escape special characters inside those strings, otherwise they get pass uninterpreted to the shell and the command might do something else than you want.
A trivial example that will probably only fail:
def copy_file(file1, file2):
run_command(f'cp {file1} {file2}') # Seems easy enough
# This is fine
copy_file('one.txt', 'two.txt') # cp one.txt two.txt
# This is not
copy_file('file with spaces.txt', 'two.txt') # cp file with spaces.txt two.txt
# ^^^ oops! Now the cp command gets 4 arguments instead of 2!
Not to mention special characters like &
and >
and ;
etc, that can make the shell do weird things.
copy_file('file1.txt', 'file2.txt; rm -rf /') # cp file1.txt file2.txt; rm -rf /
# ^^^^ the shell will happily execute this
I guess for the domain of a build tool, where we don't really handle untrusted strings, it matters less. But I've learned to be hyper-allergic to strings for shell commands, for this reason.
# We can also run in `--watch` mode, in which case we bypass the TypeScript | ||
# compiler and only run esbuild. This is because the TypeScript compiler is | ||
# rather slow (think 5-10 seconds to do a compilation), and we want to | ||
# optimize for iteration speed in that case. That does mean that the generated | ||
# JavaScript is slightly different between those cases--however, the final | ||
# JavaScript will always be generated on deploy using the official generation | ||
# method. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahhh is this why you kept the bash files even though we could run them with doit
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, no. This functionality will actually be gone. I don't know how many people use it...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I use it :c
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahah :D. We're going to need a doit
replacement for that then!
""" | ||
return dict( | ||
actions=None, | ||
task_dep=['frontend', 'backend'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so this means that the deploy
task would run both of the frontend
and backend
tasks? That's actually very nice for organization purposes!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what I was thinking, yes.
On deploy on a Heroku machine we would run the deploy
task, and that would just do "everything" that's necessary to get both the backend and frontend in order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, Rico! I have a couple of questions!
- When I execute a task it also generates this files that are tracked by git: .doit.db.bak, .doit.db.dat, .doit.db.dir perhaps we should also add them to .gitignore
- Why does the tailwind task also call these:
- Generate Python prefixes for TypeScript
- Compile Babel files
- Generate client messages.
Is this because the input files of this task are also the same for those other tasks? Im not saying good or bad! im just curious
- Is there a possibility to run these tasks on watch mode? I found that feature really useful when working on the front-end
Overall, this is really good,
return dict( | ||
actions=None, | ||
task_dep=[ | ||
'tailwind', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could add the lezer parsers here? They're also part of the front-end, and also since this is called on deploy we wouldn't call that task there, depending on people checking in the automatically generated files.
Oh, I hadn't seen those. I just saw
Yeah, good question. When I do:
It's showing that the Tailwind job depends on all the Because that list is So as far as the current dependency graph is aware: regenerating We could break the dependency by making sure that
Excellent question. The docs say that doit has a "watch" mode but on my machine it doesn't appear when I do We could go and try to figure out why that is, or add our own watch tasks (but they will necessarily only watch TypeScript, for example, not all files like I presume
I was under the assumption that the lezer parsers were being generated as part of the TypeScript build. But if they aren't, yeah, we should definitely add a dependency somewhere. Probably to the TypeScript task, as the lezer parsers generate @jpelay, in the interest of gaining some experience with this tool, would you mind taking a stab at these yourself? 😇 |
Yes! I actually wanna try those suggestions! |
I think I got Fwiw, we control the watch plugin code (I forked it from the original |
Thanks! Indeed it's become a bit much, and I like have a better organized set of tools. I don't think it's very high priority but definitely not complaining for getting something nice! I have no hands-on experience with doit but from what I've seen it's is a good middle position between bash/makefiles and full blown build tools like bazel. Sounds like the right approach! |
Let's hope that setting |
So if we change the moduleResolution to Bundler the Lezer tests won't work, because these tests call application typescript code, I think, and setting it to node makes the others fail because they need it to be Bundler |
This reverts commit 04ad130.
The problem seems to be here: I don't think the build was designed for this. The build was designed to build and bundle all I'm honestly not even sure how sideloading these lezer files from Cypress tests works at all, given that this requires |
It also looks like you're using the Cypress test framework to run JavaScript unit tests that don't interact with the browser at all. I guess that works, but it's also not necessary. If you want to run a suite of JavaScript tests, you can use a JavaScript test framework like Jest to run some tests before bundling even happens, and they don't need to run in the browser either: they can run in Node. |
I see the value in using Cypress to run JS tests, in that people wanting to run the tests don't need to install Node. I also now understand why the include of a So the pipeline for Cypress is that:
The very first error was I managed to fiddle with the config enough to disable that, but then the next error becomes:
And I don't know how to change this. I've been messing around with Cypress wrapping WebPack wrapping Babel wrapping TypeScript trying to pass configs around to maybe use an alternative I've been spending 2 hours on this already, I'm giving up for now. |
Ah, with
This leaves us stuck in the middle, unable to move to the
|
I made it that way, mainly because we already had Cypress and it worked fine, also since Mocha is already included in Cypress I didn't see the need. However, in light of these problems maybe we need to change the tests and use Jest or other framework, I'm totally up for it! So if you want we can disable the Lezer tests for now, we open an issue about replacing them with Jest or other similar framework and merge this! What do you think? |
Sounds good to me! I'm just a tiny tad confused why all these problems are suddenly cropping up on this branch, and not the |
I believe it's because in main we run a different command. Namely, we don't compile ts first and then run esbuild, as Lines 160 to 163 in 20c8d4f
I'm not sure what happens in the I was also following your progress not understanding why this is happening, but this is the only and main difference that would make sense on why! |
Ready to ship! 🎉 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazingly done 💯
Thank you for contributing! Your pull request is now going on the merge train (choo choo! Do not click update from main anymore, and be sure to allow changes to be pushed to your fork). |
We are accumulating quite a lot of build scripts that are all written in bash, and need to be implemented in a just-so order, and you need to know exactly when you need to run them. It might be time to start looking into moving to an actual tool to manage the build for us.
These come in a lot of flavors with various capabilities. I've looked around a bunch, and I think DoIt looks quite nice.
Here are the features that I think make this tool quite nice:
pip install
.I've started converting some build scripts over to this new tool to show how it works. You can see it in action by going into the virtual env and running (must be in the root directory, unfortunately):
The
.doit.db.db
that I've ignored is a file it uses internally to track changes to dependencies.I've paused here because I'd like to solicit feedback. Does the tool look okay? Is this understandable? Do we prefer this to bash, or to other build tools?