Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add dev container configuration #34

Merged
merged 5 commits into from
Aug 3, 2023

Conversation

EliahKagan
Copy link
Collaborator

@EliahKagan EliahKagan commented Aug 1, 2023

This adds a dev container configuration for trying out or working on this project in a codespace or local dev container. The default dev container configuration using the universal image does not support this project well.

Why the default config doesn't support this project (click to expand if interested)

Not every project requires such a configuration to be usable in a dev container. When no devcontainer.json file is present, a universal image is used, which is a large image with tools for many languages and frameworks. Furthermore, it can be better to use the universal image than a custom image, if the custom image would have a longer startup time, as is often the case.

Unfortunately, the universal image is unsuitable for working on this project because the Python implementation it ships is built without lzma support. npc-gzip can be used as a library where lzma support is absent, so long as it is not needed. But it is needed for some unit tests to pass (as well as to actually do anything with the lzma compressor). Furthermore, because users often use a codespace to get a feel for a project, all of a project's core functionality should be usable in a dev container, where feasible. The lack of lzma support was helpful when reviewing #30 because it allowed me to verify that the error-handling logic for that condition was working, but for any other goal it is a disadvantage.

Another reason this project benefits from a custom devcontainer.json is that the most popular way to use dev containers is in GitHub Codespaces with small codespaces not offering GPU access. Nothing in this project requires a GPU--that's one of the benefits of the technique presented in the paper and implemented in this codebase--but the modules in examples use PyTorch. On GNU/Linux systems, installing torch from PyPI provides a build that expects CUDA libraries to be installed and raises ImportError when they are absent. One of the goals of dev containers is to be able to get up and running with minimal effort, and providing a custom devcontainer.json enables a workaround for this issue to be applied.

This adds a custom devcontainer.json using the Debian 12 based Python 3.11 image, which is one of the prebuilt Python images. Unlike the universal image, python in that image is built with lzma support, and all tests are able to run and pass. I also included a script run on dev container creation that takes care of installing poetry with useful plugins, using it to install the project, and modifying the resulting environment with a compatible torch package so that the code in examples can also be imported and run.

This supports using the new codebase, including the examples, with minimal effort. Although it would be useful for the dev container to also support using the old codebase with minimal effort, this does not attempt that, because:

  • I think it may be better to have a separate selectable dev container configuration for it, especially if its dependencies are to be automatically installed during container creation.
  • More importantly, installing the torch dependency--for old_codebase, not for the new code--uses more RAM than a default codespace provides. (I found that out when testing locally while limiting memory usage, and also on a separate branch with a 3.10 container, since the old codebase requirements.txt is not compatible with 3.11.)

This adds a dev container configuration for Python 3.11. The reason
for this is that the universal image used when creating a codespace
(or local dev container) for a repository that doesn't have a dev
container configuration ships a Python built without lzma. For this
project, that prevents some tests from being able to run, and also
prevents experimentation with lzma compression.
Instead of with the dev container feature.

This seems to be no slower, and often faster, and it makes it easy
to install plugins along with it. I've included two useful plugins.
This only affects the dev container. It consists of further
commands run in the script launched by the onCreateCommand.

The core package does not depend on torch (PyTorch), but the
examples do. By default, on GNU/Linux systems, torch expects to use
CUDA. This installs the CPU build of torch, replacing the one
poetry picks (because the GNU/Linux CPU build is not from PyPI).

Modifying pyproject.toml or poetry.lock to achieve this would get
in the way of using this dev container for development. To avoid
that, I use the somewhat ugly technique of running pip in the
poetry-managed environment.

It is possible to use a GPU in a dev container, and it's possible
someone is using this project together with work that benefits from
one. But this project does not require a GPU, and most use of dev
containers is in codespaces that don't offer a GPU by default and
where a codespace supporting GPU computations is more expensive.
So having PyTorch use the CPU to do computations is a reasonable
default, for a dev container whose chief purpose is to allow people
to get up and running fast.
This removes some VS Code extensions from the list in
devcontainer.json whose advantages don't clearly outweigh their
disadvantages in a dev container for this project.

That doesn't keep people from using those extensions in the dev
container if they install them, nor does it prevent them from being
installed automatically if they would be for some other reason.

Extensions I was listing before and have removed:

- markdown-preview-github-styles: Makes Markdown render the same as
  on GitHub, but we don't have a lot of Markdown, and also it will
  not always render the initial readme preview in this style, as
  that preview is shown before all extensions finish installing.

- vscode-github-actions: This extension is experimental and shows
  a lot of spurious warnings as well as indenting incorrectly (at
  least in this project).

- jupyter: Jupyter is very widely used and many peopele are likely
  to use it, even though this project doesn't ship any notebooks.
  But the extension increases load time for the dev container, and
  using Jupyter with this project already requires further changes
  because no dependency group in pyproject.toml lists ipykernel
  (nor any package that pulls it in as a dependency).

Extensions I have kept:

- gitlens: Extremely popular. Gives lots of useful Git information.

- git-graph: Visual history of commits. Especially helpful for
  projects that mostly use plain merges (rather than "rebase and
  merge" or "squash and merge").

- isort: Editor integration for isort.

- python: Self-explanatory.

- even-better-toml: Syntax highlighting for pyproject.toml.
@bazingagin
Copy link
Owner

wow this is very considerate @EliahKagan ! thanks!

@bazingagin bazingagin merged commit b737fb0 into bazingagin:main Aug 3, 2023
@EliahKagan
Copy link
Collaborator Author

No problem--I find it useful myself!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants