Initial docker setup in core ODC repo #382

Merged
merged 16 commits into from
Mar 14, 2018

Conversation

alexgleith
Contributor

@alexgleith alexgleith commented Mar 7, 2018

Reason for this pull request

This PR includes a suggested way to include a Dockerfile and docker-compose configuration for development.

Proposed changes

  • Add a Dockerfile, which uses the standard install procedure on top of an Ubuntu 16.04 image
  • Add a docker-compose file, which enables a dev environment with standard dependencies from the above Docker image
  • Swap a dependency for the recommended one (psycopg2 → psycopg2-binary)
  • Add a .dockerignore file

Edit: I think this is ready to go now.

  • Tests added / passed
  • Fully documented, including docs/about/whats_new.rst for all changes

Alex Leith added 4 commits March 7, 2018 18:55
This is required to stop the warnings being printed to stdout
The Dockerfile uses the standard install process from the readme, and uses the local files in the repo to install from.
This is a first step in setting up a docker-compose file to use in interactive development.
@coveralls

coveralls commented Mar 7, 2018

Coverage Status

Coverage remained the same at 82.639% when pulling 81feb1a on crc-si:docker-setup into 5b4e669 on opendatacube:develop.

@rtaib
Contributor

rtaib commented Mar 7, 2018

+1
I'll look into it next week.

datacube:
build: .
volumes:
- ./:/code
Member

So are we using a volume for code or are we copying code during build?
Seem to be doing both.

Contributor Author

@alexgleith alexgleith Mar 8, 2018

Hey Kirill, the docker-compose file uses the Dockerfile as a base, so that dependencies are already installed, and then mounts in the volume so that it has your local changes. This means you can change your code locally, and run it in Docker with the changes without rebuilding the image and creating a new container.

Note that it's using the path to the cli_app.py so that it is not using the installed version of the Open Data Cube, it's using the one from your local directory.

Member

I see, but won't it trigger a rebuild anyway? Since you changed the folder contents and those get copied into the image, everything from the COPY step down will need to be redone anyway?

Also we use python "entry points" (https://amir.rachum.com/blog/2017/07/28/python-entry-points/), how this will interact with the proposed setup?

Contributor Author

@alexgleith alexgleith Mar 8, 2018

With Docker Compose, the first build is mandatory, but later builds are manually triggered (with docker-compose build). So no, it won't trigger a new build.

I'm aware of the entry points in the setup.py file, and that's where I copied the main one from. I think a Docker Compose service can only define one command, so unless we set up a whole new way of doing it, the other commands will need to be referenced by their full path. That'll probably need to be documented.

Just FYI, if you use this docker-compose file, the entrypoint commands (like datacube and pixeldrill) will still work, but if you use them, they will point to the installed version, and not the version mounted by the volume. That's a bit of a gotcha. To run code that you have changed, you need to use the path, like /code/datacube/scripts/cli_app.py.
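The gotcha above can be illustrated with a couple of commands (a sketch, assuming the service is named datacube as in the compose snippet quoted earlier):

```shell
# Runs the copy installed into the image at build time:
docker-compose run datacube datacube --help

# Runs the mounted working copy, picking up local edits without a rebuild:
docker-compose run datacube python /code/datacube/scripts/cli_app.py --help
```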

Dockerfile Outdated
# Get the code, and put it in /code
ENV APPDIR=/code
RUN mkdir -p $APPDIR
COPY . $APPDIR
Member

Not too sure about the "get code from the working copy" strategy.
I'd rather fetch directly from GitHub, with a hash/tag/branch supplied as a build argument.
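That suggestion might look something like the following sketch (ODC_REF is an illustrative build-arg name, not one defined in this PR; the base image and package steps are assumptions):

```dockerfile
# Sketch: build from a pinned ref instead of the working copy.
FROM ubuntu:16.04
ARG ODC_REF=develop
RUN apt-get update \
    && apt-get install -y git python3-pip \
    && apt-get clean
RUN git clone https://github.com/opendatacube/datacube-core.git /code \
    && cd /code \
    && git checkout "$ODC_REF" \
    && pip3 install .
```

It would then be built with, for example, docker build --build-arg ODC_REF=<tag-or-hash> .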

Contributor Author

The concept here is that the Dockerfile lives with the code, so that it can be automatically built using Docker Cloud. That way we can configure tagged builds that automatically stay in sync with this repo.

At the moment, I'm thinking we have a latest tag for the docker image coming out of the develop branch (that should always build and work, right?) and then tagged builds from releases.

It was suggested that we do the install from PyPI, since that's the 'point of truth'. But if we have the code local to the build, there's no need to go to GitHub and fetch it again.

@alexgleith
Contributor Author

In thinking about this, the docker-compose file should really bring up Postgres too, as well as passing in DB username and password parameters. Are there any other requirements for a fully functioning development environment? Maybe a /data directory mounted in as a volume?
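Such a setup might look roughly like the following (a sketch only, not the compose file from this PR; service names, credentials, and volume paths are all illustrative):

```yaml
# Illustrative dev-environment sketch with a database service.
version: '2'
services:
  postgres:
    image: postgres:9.6
    environment:
      POSTGRES_USER: odc
      POSTGRES_PASSWORD: odcpass
  datacube:
    build: .
    environment:
      DB_HOSTNAME: postgres
      DB_USERNAME: odc
      DB_PASSWORD: odcpass
    volumes:
      - ./:/code     # local working copy
      - ./data:/data # sample data, if wanted
    depends_on:
      - postgres
```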

Member

@omad omad left a comment

Looks good Alex. I think the main thing missing now is documentation. Including a Dockerfile is a good start, but new users are going to need guidance on how to use it and what is included.

Adopting Docker as a Development and Test environment sounds reasonable to me, initially as an alternative to the current conda methods.

Eventually we could consider making Docker the primary supported development environment, maybe.

setup.py Outdated
@@ -84,7 +84,7 @@
'jsonschema',
'netcdf4',
'numpy',
'psycopg2',
'psycopg2-binary',
Member

I think this should still depend on psycopg2. The binary package is available as an installation option, but not as a package that should be depended on.

Judging by recent discussions on psycopg2 issue 674, it also doesn't sound completely resolved.

It would be totally fine to pip install psycopg2-binary in the Dockerfile, but leave the dependency alone.
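That approach keeps setup.py untouched; in the Dockerfile it could be as simple as (a sketch):

```dockerfile
# Sketch: install the pre-built wheel inside the image only,
# leaving the psycopg2 dependency in setup.py unchanged.
RUN pip install psycopg2-binary
```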

@alexgleith
Contributor Author

As discussed today, I've updated this PR with the following changes:

  • Removed docker-compose file. If we want to set up a docker-compose development environment, then let's do it in a separate PR with more thought
  • Changed how to handle the Psycopg2 warnings, by installing it using @Kirill888's PIP line (see Dockerfile)
  • Added a line to the docs for "what's new". I think we should do more documentation later, when there's a reference implementation. We can then write it up in docs or point to another location in the readme, I think. If there is somewhere we should be documenting, let me know.
  • Note that I didn't do any further cleanup; apt clean already runs after each install, so that's covered, and you can't make an image smaller by removing things in a later layer anyway. The image is big because it has some big dependencies.

@alexgleith
Contributor Author

Another change. It now has an entrypoint script that sets up the config file. This does need documenting... @omad (or others), where should I document it?
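An entrypoint like that typically writes a datacube config file from environment variables before handing off to the requested command; a hedged sketch (variable names, defaults, and the file layout are assumptions, not the exact script from this PR):

```shell
#!/bin/sh
# Sketch of a config-writing entrypoint (illustrative names only).
# Write ~/.datacube.conf from environment variables, with defaults.
cat > "$HOME/.datacube.conf" <<EOF
[datacube]
db_hostname: ${DB_HOSTNAME:-postgres}
db_database: ${DB_DATABASE:-datacube}
db_username: ${DB_USERNAME:-odc}
db_password: ${DB_PASSWORD:-odcpass}
EOF
# Then run whatever command was passed to the container.
exec "$@"
```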

@omad
Member

omad commented Mar 14, 2018

Just add something to the bottom of the README. I don't think inflicting the rest of the docs on you is necessary, yet. :) Also, I'm not quite sure where in there to put it.

@alexgleith
Contributor Author

I've added something to the readme. Took me a little while to realise reStructuredText is different from Markdown!

@alexgleith
Contributor Author

Ok, folks, how are we going with this? It would be great if we can get it merged in!

Member

@omad omad left a comment

This looks like a great start. Thanks Alex.

I'm sure there'll be issues and improvements required. But it's all new, so there's no chance of negatively impacting anything, and it's safe to merge.

@omad omad merged commit bcab111 into opendatacube:develop Mar 14, 2018
@roarmstrong
Contributor

Might be worth cleaning the pip cache with rm -rf $HOME/.cache/pip in each layer that installs with pip. That cuts the size of the final image down by ~150MB.
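Because each RUN creates its own layer, files deleted in a later layer still occupy space in the earlier one, so the cleanup has to happen in the same RUN as the install. A sketch (the requirements file name is an assumption):

```dockerfile
# Install and clean the cache within a single RUN so the cache
# never lands in a committed layer.
RUN pip install -r requirements.txt \
    && rm -rf "$HOME/.cache/pip"
```

Alternatively, pip install --no-cache-dir avoids writing the cache in the first place.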

@alexgleith
Contributor Author

Aw, @roarmstrong, you were too late! That's a great idea, though.

There are probably a few other things to consider, such as explicitly adding only the files that we need for each step, and removing them after.

Also, I will set up the auto-build on the develop branch, building to the latest tag on Docker Hub. But at some point (when the next release goes into stable), the latest build should probably be switched to there, with a develop tag built from the develop branch. Later, though.
