Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🧹 Add a build tool #4799

Merged
merged 36 commits into from
Jan 4, 2024
Merged

🧹 Add a build tool #4799

merged 36 commits into from
Jan 4, 2024

Conversation

rix0rrr
Copy link
Collaborator

@rix0rrr rix0rrr commented Nov 23, 2023

We are accumulating quite a lot of build scripts that are all written in bash, and need to be implemented in a just-so order, and you need to know exactly when you need to run them. It might be time to start looking into moving to an actual tool to manage the build for us.

These come in a lot of flavors with various capabilities. I've looked around a bunch, and I think DoIt looks quite nice.

Here are the features that I think make this tool quite nice:

  • It's written in Python, so comes in automatically as Python dependency during pip install.
  • The configuration file is written in Python as well, so we don't have to learn a new configuration language just for this tool.
  • Tasks have dependencies between them, so if there are multiple that need to run, they automatically run in the right order.
  • Tasks can (optionally) declare files that are inputs to the task itself, and if the input files have not changed the task is skipped. This makes it safe to just run the build toold and trust that you're not waiting too long for nothing.

I've started converting some build scripts over to this new tool to show how it works. You can see it in action by going into the virtual env and running (must be in the root directory, unfortunately):

$ doit
$ doit run typescript
$ doit help

The .doit.db.db that I've ignored is a file it uses internally to track changes to dependencies.

I've paused here because I'd like to solicit feedback. Does the tool look okay? Is this understandable? Do we prefer this to bash, or to other build tools?

We are accumulating quite a lot of build scripts that are all written
in bash, and need to be implemented in a just-so order, at various
points in time.

It might be time to start looking into moving to an actual tool to
manage the build for us.

These come in a lot of flavors with various capabilities. I've looked
around a bunch, and I think [DoIt](https://pydoit.org) looks quite nice.

Here are the features that I think make this tool quite nice:

- It's written in Python, so comes in automatically as Python
  dependency during `pip install`.
- The configuration file is written in Python as well, so we don't
  have to learn a new configuration language just for this tool.
- Tasks have dependencies between them, so if there are multiple
  that need to run, they automatically run in the right order.
- Tasks can (optionally) declare files that are inputs to the task
  itself, and if the input files have not changed the task is skipped.
  This makes it safe to just run the build toold and trust that you're
  not waiting too long for nothing.

I've started converting some build scripts over to this new tool
to show how it works. You can see it in action by going into
the virtual env and running (must be in the root directory,
unfortunately):

```
$ doit
$ doit run typescript
$ doit help
```

I've paused here because I'd like to solicit feedback.  Does the tool
look okay? Is this understandable? Do we prefer this to bash, or to
other build tools?
@rix0rrr rix0rrr requested a review from jpelay November 23, 2023 19:05
@rix0rrr rix0rrr marked this pull request as draft November 23, 2023 19:05
@ghost
Copy link

ghost commented Nov 23, 2023

👇 Click on the image for a new way to code review

Review these changes using an interactive CodeSee Map

Legend

CodeSee Map legend

)


def task_typescript():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow I really like this!!!

@jpelay
Copy link
Member

jpelay commented Nov 23, 2023

WOW Rico, I think this is a magnificent idea!

I personally don't love bash and indeed is quite difficult to be aware of the bunch of bash scripts we use for configuration. So this is great, it also seems like is very easy to work with! With this I imagine we'd also need to change GitHub actions a bit?

@rix0rrr
Copy link
Collaborator Author

rix0rrr commented Nov 24, 2023

With this I imagine we'd also need to change GitHub actions a bit?

Yes, I would say that we move everything from bash into here, and then everywhere we need to invoke some scripts we change them to invoke doit run <whatever> instead.

Comment on lines +7 to +9
# file outputs, as well as other task dependencies. If none of the files that a task
# depends on have changed since the last time it ran, the task will be skipped. That's
# good for speed!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this sounds promising! How does it do that and does it do it with tasks as well? Or it's just with files and tasks

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just with files. It does it by calculating and storing the md5 hash of every file. By doing that, it can check if files have changed, and skip the task if they haven't.

# Keep the following in mind:
# - You can specify commands as a string `'mkdir dir'` or as an array `['mkdir', 'dir']`.
# The second case doesn't allow shell features, but is safer in case of variables.
# - `file_dep` requires a list of source files, but you can use `glob('*.yaml')`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering why you would need to add the source file twice; here and in actions!

Copy link
Collaborator Author

@rix0rrr rix0rrr Dec 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every task looks like this:

               ┌──────────────┐                
               │              │                
input ────────▶│    Script    │────────▶ output
               │              │                
               └──────────────┘                

file_dep should at least be the set of input files. That way, if any of the input files change, you know that the output needs to be regenerated.

But what if you change the script itself? The thing that turns input into output?

If you don't put the script itself into file_dep as well, doit will think that output is still up-to-date! Because none of the input files have changed, so according to the list of instructions output doesn't need to be regenerated.

You probably do want the task to run again if you change the script though, because probably the output would change. That's why you should add the scripts to file_dep as well.

And in actions: there it means the command you want to run.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, that makes sense. Shouldn't we then also include the .doit.db? Otherwise all tasks will run no matter whether the inputs/outputs changed?

],
title=lambda _: 'Generate Tailwind CSS',
actions=[
[script],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so these files would still run with bash? I personally don't mind bash as a build-tool at all.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering since this PR aims at using doit as the primary build-tool. But I'm guessing you kept this file just because we may need to run it locally; e.g., watching changes!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can implement tasks fully in Python, or you can write them in bash and call them using doit. I think both are fine solutions. In fact I did both here myself 🙃.

The most important part I think is that:

  • doit is the primary interface, so everybody knows where to start
  • doit will take care of dependencies: run tasks that depend on each other in the right order, won't run them twice, and won't run them if it's not necessary

Once we have that, whether a task is implemented in Python or in bash is an implementation detail.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • doit will take care of dependencies: run tasks that depend on each other in the right order, won't run them twice, and won't run them if it's not necessary

Wanted to investigate whether it does that. So, i created this dummy function:

# Create a simple task
def task_copy_file():
    return dict(
      title=lambda _: "copy file into another",
      actions=['cp "dev_database.json" "test here.txt"'],
      file_dep=['dev_database.json'],
      targets=['test here.txt']
      )

Then ran: doit copy_file. Notably, it did two things:

  1. it ran the task that you created: devdb and copy_file. Probably because the file dep in copy_file is in the target of devdb!
  2. i ran it again after a few minutes, and as you explained, it actually did not rerun the tasks. Really nice!!

Later, i removed .doit.db, and reran it, boths tasks were fired. So, i do believe we need to include it to git, otherwise all tasks will be executed!?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably because the file dep in copy_file is in the target of devdb!

Yes! Exactly, it traces task dependencies through declared file dependencies. In this case, that's probably not what we want to happen, so that might be something we need to be careful with...

.doit.db [...] we need to include it to git, otherwise all tasks will be executed!?

I've been thinking about this a little, and I think it's safer to leave the database out of git for now.

  • First, the database is a binary file, and git doesn't deal well with binary files usually
  • Second, especially because it's a binary file, and many PRs will make changes that cause changes to this file, we will nearly always have conflicts on this file when trying to merge any PR. That will get annoying very quickly.
  • If we check this file in, it will also go to the server, and the server will also skip build steps based on this file. That MAY be fine, but it may also become the cause of hard-to-diagnose bugs in the future, if any bugs slip in (if we make mistakes in dependency declarations on the tasks, for example).

The only reason to check in the database file would be to save running tasks across machines. I.e., I run the task, and then you don't have to anymore. That's nice, but it's not the end of the world if you have run the tasks once, and can skip them after.

All that is to say, I think the benefit is not massive, and the annoyances and risks are real, so I don't think checking in the database file would be worth it.

dodo.py Outdated
Comment on lines 153 to 164
# Use tsc to generate .js (this downlevels to old JavaScript versions for old browsers)
['npx', 'tsc', '--outDir', '__tmp__'],

# Then bundle JavaScript into a single bundle
['npx', 'esbuild', '__tmp__/static/js/index.js',
'--bundle', '--sourcemap', '--minify', '--target=es2017',
'--global-name=hedyApp', '--platform=browser',
'--outfile=static/js/appbundle.js'],

# Delete tempdir
['rm', '-rf', '__tmp__'],
],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally don't like this syntax. I think having this command as a string and splitting it into multiple lines (if needed) would be more readable. And in the case of including variables, we could simply use string interpolation!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And in the case of including variables, we could simply use string interpolation!

Yeah but here's the thing: you need to escape special characters inside those strings, otherwise they get pass uninterpreted to the shell and the command might do something else than you want.

A trivial example that will probably only fail:

def copy_file(file1, file2):
  run_command(f'cp {file1} {file2}')  # Seems easy enough

# This is fine
copy_file('one.txt', 'two.txt')  # cp one.txt two.txt

# This is not
copy_file('file with spaces.txt', 'two.txt')   # cp file with spaces.txt two.txt
# ^^^ oops! Now the cp command gets 4 arguments instead of 2!

Not to mention special characters like & and > and ; etc, that can make the shell do weird things.

copy_file('file1.txt', 'file2.txt; rm -rf /')    # cp file1.txt file2.txt; rm -rf / 
                                        # ^^^^ the shell will happily execute this

I guess for the domain of a build tool, where we don't really handle untrusted strings, it matters less. But I've learned to be hyper-allergic to strings for shell commands, for this reason.

Comment on lines -23 to -29
# We can also run in `--watch` mode, in which case we bypass the TypeScript
# compiler and only run esbuild. This is because the TypeScript compiler is
# rather slow (think 5-10 seconds to do a compilation), and we want to
# optimize for iteration speed in that case. That does mean that the generated
# JavaScript is slightly different between those cases--however, the final
# JavaScript will always be generated on deploy using the official generation
# method.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhh is this why you kept the bash files even though we could run them with doit?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, no. This functionality will actually be gone. I don't know how many people use it...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I use it :c

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahah :D. We're going to need a doit replacement for that then!

"""
return dict(
actions=None,
task_dep=['frontend', 'backend']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this means that the deploy task would run both of the frontend and backend tasks? That's actually very nice for organization purposes!!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what I was thinking, yes.

On deploy on a Heroku machine we would run the deploy task, and that would just do "everything" that's necessary to get both the backend and frontend in order.

Copy link
Member

@jpelay jpelay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, Rico! I have a couple of questions!

  • When I execute a task it also generates this files that are tracked by git: .doit.db.bak, .doit.db.dat, .doit.db.dir perhaps we should also add them to .gitignore
  • Why does the tailwind task also call these:
    • Generate Python prefixes for TypeScript
    • Compile Babel files
    • Generate client messages.
      Is this because the input files of this task are also the same for those other tasks? Im not saying good or bad! im just curious
  • Is there a possibility to run these tasks on watch mode? I found that feature really useful when working on the front-end

Overall, this is really good,

return dict(
actions=None,
task_dep=[
'tailwind',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could add the lezer parsers here? They're also part of the front-end, and also since this is called on deploy we wouldn't call that task there, depending on people checking in the automatically generated files.

@rix0rrr
Copy link
Collaborator Author

rix0rrr commented Dec 8, 2023

When I execute a task it also generates this files that are tracked by git: .doit.db.bak, .doit.db.dat, .doit.db.dir perhaps we should also add them to .gitignore

Oh, I hadn't seen those. I just saw .doit.db I think. Yes, we should probably gitignore .doit.db*.

Why does the tailwind task also call these:

Yeah, good question. When I do:

$ doit info tailwind

tailwind

Build Tailwind.

status     : run
 * The following file dependencies have changed:
    - static/js/syntaxModesRules.ts
    - static/js/modal.ts
    - static/js/auth.ts
    - templates/customize-class/partial-sortable-adventures.html
    - templates/quiz/partial-review-quiz.html
    - templates/printable/certificate.html
    - static/js/teachers.ts
    - static/js/parsons.ts
    -  ....

It's showing that the Tailwind job depends on all the .ts files. Presumably because any of the .ts files could include references to Tailwind CSS class names, and so those must be found and not be stripped from the final CSS bundle.

Because that list is *.ts, it includes static/js/client-messages.ts, which is the output of generate client messages.

So as far as the current dependency graph is aware: regenerating client-messages.ts could potentially change the list of CSS classes referenced in the code, and so it must be rerun before generating the Tailwind CSS file.

We could break the dependency by making sure that client-messages.ts is NOT in the file_dep list for the Tailwind task.

Is there a possibility to run these tasks on watch mode? I found that feature really useful when working on the front-end

Excellent question. The docs say that doit has a "watch" mode but on my machine it doesn't appear when I do doit help.

We could go and try to figure out why that is, or add our own watch tasks (but they will necessarily only watch TypeScript, for example, not all files like I presume doit would do), or use an external watch tool, like for example entr (just found this by Googling, there are zillions of these).

Maybe we could add the lezer parsers here? They're also part of the front-end, and also since this is called on deploy we wouldn't call that task there, depending on people checking in the automatically generated files.

I was under the assumption that the lezer parsers were being generated as part of the TypeScript build. But if they aren't, yeah, we should definitely add a dependency somewhere. Probably to the TypeScript task, as the lezer parsers generate .ts files which still need to be compiled to .js (during the Typescript task).


@jpelay, in the interest of gaining some experience with this tool, would you mind taking a stab at these yourself? 😇

@jpelay
Copy link
Member

jpelay commented Dec 8, 2023

@jpelay, in the interest of gaining some experience with this tool, would you mind taking a stab at these yourself? 😇

Yes! I actually wanna try those suggestions!

@rix0rrr
Copy link
Collaborator Author

rix0rrr commented Dec 9, 2023

I think I got watch figured out. I added a plugin for the watch command, so now doit watch typescript or just doit watch should work.

Fwiw, we control the watch plugin code (I forked it from the original auto plugin, which doesn't work anymore but some other people helpfully left PRs to make it work again ☺️) so if we want anything changed on how the watching works, what it prints, whatever, we can.

@jtwaleson
Copy link
Collaborator

I've paused here because I'd like to solicit feedback. Does the tool look okay? Is this understandable? Do we prefer this to bash, or to other build tools?

Thanks! Indeed it's become a bit much, and I like have a better organized set of tools. I don't think it's very high priority but definitely not complaining for getting something nice! I have no hands-on experience with doit but from what I've seen it's is a good middle position between bash/makefiles and full blown build tools like bazel. Sounds like the right approach!

@jpelay
Copy link
Member

jpelay commented Jan 3, 2024

Let's hope that setting moduleResolution to node fixes the issue! (Edit: it didn't :/)

@jpelay
Copy link
Member

jpelay commented Jan 3, 2024

So if we change the moduleResolution to Bundler the Lezer tests won't work, because these tests call application typescript code, I think, and setting it to node makes the others fail because they need it to be Bundler

@jpelay jpelay changed the title [CHORE] Add a build tool 🧹 Add a build tool Jan 3, 2024
@rix0rrr
Copy link
Collaborator Author

rix0rrr commented Jan 4, 2024

The problem seems to be here:

image

I don't think the build was designed for this. The build was designed to build and bundle all .ts files into appbundle.js, which is to be loaded into the Hedy website.

I'm honestly not even sure how sideloading these lezer files from Cypress tests works at all, given that this requires .js files but in the source are only .ts files 🤔, and as far as I can tell there's nothing that produces the .js files...

@rix0rrr
Copy link
Collaborator Author

rix0rrr commented Jan 4, 2024

It also looks like you're using the Cypress test framework to run JavaScript unit tests that don't interact with the browser at all.

I guess that works, but it's also not necessary. If you want to run a suite of JavaScript tests, you can use a JavaScript test framework like Jest to run some tests before bundling even happens, and they don't need to run in the browser either: they can run in Node.

@rix0rrr
Copy link
Collaborator Author

rix0rrr commented Jan 4, 2024

I see the value in using Cypress to run JS tests, in that people wanting to run the tests don't need to install Node. I also now understand why the include of a .ts file just works -- it's Babel doing that!

So the pipeline for Cypress is that:

  • Cypress uses WebPack to bundle the .js files into a single file that can be loaded into the Electron instance
  • WebPack uses Babel to load both .js and .ts files, with some default behavior
  • Babel then uses TypeScript as soon as it finds a .ts file, with some default behavior

The very first error was TS5053: Option 'sourceMap' cannot be specified with option 'inlineSourceMap'. because Cypress' use of WebPack dictates the use of devtool: "inline-source-map", which ultimately gets passed to TypeScript and conflicts there with the sourceMap: true in tsconfig.json (you can't have both sourceMap: true and inlineSourceMap: true).

I managed to fiddle with the config enough to disable that, but then the next error becomes:

Module not found: Error: Can't resolve '../../../../../static/js/lezer-parsers/level1-parser' in '/Users/huijbers/Dev/hedy2/tests/cypress/e2e/tools/lezer'

And I don't know how to change this. I've been messing around with Cypress wrapping WebPack wrapping Babel wrapping TypeScript trying to pass configs around to maybe use an alternative tsconfig.webpack.json file or something, but I can't get it to work.

I've been spending 2 hours on this already, I'm giving up for now.

@rix0rrr
Copy link
Collaborator Author

rix0rrr commented Jan 4, 2024

Ah, with moduleResolution: "node16" we're running into the issue that the author of @codemirror/state and the TypeScript team disagree on where the "types" declaration on package.json should be if an exports key is present.

This leaves us stuck in the middle, unable to move to the node16 moduleResolution strategy.

Bundler is apparently a nice middle ground for TypeScript itself, but breaks when being used by Cypress-WebPack-Babel.

@jpelay
Copy link
Member

jpelay commented Jan 4, 2024

It also looks like you're using the Cypress test framework to run JavaScript unit tests that don't interact with the browser at all.

I guess that works, but it's also not necessary. If you want to run a suite of JavaScript tests, you can use a JavaScript test framework like Jest to run some tests before bundling even happens, and they don't need to run in the browser either: they can run in Node.

I made it that way, mainly because we already had Cypress and it worked fine, also since Mocha is already included in Cypress I didn't see the need. However, in light of these problems maybe we need to change the tests and use Jest or other framework, I'm totally up for it! So if you want we can disable the Lezer tests for now, we open an issue about replacing them with Jest or other similar framework and merge this! What do you think?

@rix0rrr
Copy link
Collaborator Author

rix0rrr commented Jan 4, 2024

Sounds good to me! I'm just a tiny tad confused why all these problems are suddenly cropping up on this branch, and not the main one...

@hasan-sh
Copy link
Contributor

hasan-sh commented Jan 4, 2024

Sounds good to me! I'm just a tiny tad confused why all these problems are suddenly cropping up on this branch, and not the main one...

I believe it's because in main we run a different command. Namely, we don't compile ts first and then run esbuild, as task_typescript now does:

hedy/dodo.py

Lines 160 to 163 in 20c8d4f

['npx', 'tsc', '--outDir', '__tmp__'],
# Then bundle JavaScript into a single bundle
['npx', 'esbuild', '__tmp__/static/js/index.mjs',

I'm not sure what happens in the generate-typescript bash file, but I do remember running esbuild on the index.ts file directly whenever there's a --tests flag, and also remember that you removed that flag from the cypresstests.yml file, assuming that only generating bundle files would solve the cypress problem. However, we somehow overlooked the fact that even if the flag wasn't passed, the generate-typescript file does does the same thing. https://github.com/hedyorg/hedy/blob/main/build-tools/heroku/generate-typescript#L73

I was also following your progress not understanding why this is happening, but this is the only and main difference that would make sense on why!

@rix0rrr
Copy link
Collaborator Author

rix0rrr commented Jan 4, 2024

Ready to ship! 🎉

Copy link
Contributor

@hasan-sh hasan-sh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazingly done 💯

Copy link
Contributor

mergify bot commented Jan 4, 2024

Thank you for contributing! Your pull request is now going on the merge train (choo choo! Do not click update from main anymore, and be sure to allow changes to be pushed to your fork).

@mergify mergify bot merged commit c234719 into main Jan 4, 2024
11 checks passed
@mergify mergify bot deleted the add-build-tool branch January 4, 2024 20:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants