
Commit

Fix Dockerfile to inherit from a CUDA-enabled image, use 4GB of RAM, and edit PYTHONPATH.

Closes #2, #13, #14.
alexpolozov committed Aug 15, 2020
1 parent b61e39c commit 4a013a9
Showing 2 changed files with 17 additions and 9 deletions.
7 changes: 5 additions & 2 deletions Dockerfile
@@ -1,4 +1,4 @@
-FROM python:3.7-slim
+FROM pytorch/pytorch:1.5-cuda10.1-cudnn7-devel

ENV LC_ALL=C.UTF-8 \
LANG=C.UTF-8
@@ -42,6 +42,9 @@ RUN mkdir -p /mnt/data && \
ln -snf /mnt/data/wikisql wikisql

# Convert all shell scripts to Unix line endings, if any
-RUN /bin/bash -c 'if compgen -G "/app/**/*.sh" > /dev/null; then dos2unix /app/**/*.sh; fi'
+RUN /bin/bash -c 'if compgen -G "/app/**/*.sh" > /dev/null; then dos2unix /app/**/*.sh; fi'
+
+# Extend PYTHONPATH to load WikiSQL dependencies
+ENV PYTHONPATH="/app/third_party/wikisql/:${PYTHONPATH}"

ENTRYPOINT bash
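
The `ENV PYTHONPATH` line in this hunk takes over from the manual `export` step that the README used to describe. As a rough illustration of how one might confirm, inside the container, that the WikiSQL directory is actually listed, the sketch below checks a `PYTHONPATH`-style string for a given directory. The helper name `path_listed` is hypothetical and not part of RAT-SQL; only the path `/app/third_party/wikisql/` comes from the Dockerfile.

```python
import os


def path_listed(directory, pythonpath):
    """Return True if `directory` is one of the entries of a PYTHONPATH-style string."""
    wanted = os.path.normpath(directory)
    # Skip empty entries, e.g. the one left by a trailing separator.
    entries = [entry for entry in pythonpath.split(os.pathsep) if entry]
    return any(os.path.normpath(entry) == wanted for entry in entries)


if __name__ == "__main__":
    # Path taken from the Dockerfile; everything else here is illustrative.
    wikisql_dir = "/app/third_party/wikisql/"
    current = os.environ.get("PYTHONPATH", "")
    print("WikiSQL on PYTHONPATH:", path_listed(wikisql_dir, current))
```

Paths are normalized before comparison, so a trailing slash on either side does not affect the result.
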
19 changes: 12 additions & 7 deletions README.md
@@ -16,11 +16,18 @@ If you use RAT-SQL in your work, please cite it as follows:
}
```

+## Changelog
+
+**2020-08-14:**
+- The Docker image now inherits from a CUDA-enabled base image.
+- Clarified memory and dataset requirements on the image.
+
## Usage

### Step 1: Download third-party datasets & dependencies

-Download the datasets: [Spider](https://yale-lily.github.io/spider) and [WikiSQL](https://github.com/salesforce/WikiSQL). Unpack them somewhere outside this project to create the following directory structure:
+Download the datasets: [Spider](https://yale-lily.github.io/spider) and [WikiSQL](https://github.com/salesforce/WikiSQL). In case of Spider, make sure to download the `08/03/2020` version or newer.
+Unpack the datasets somewhere outside this project to create the following directory structure:
```
/path/to/data
├── spider
@@ -57,13 +64,11 @@ It assumes that you mount the datasets downloaded in Step 1 as a volume `/mnt/da
Thus, the environment setup for RAT-SQL is:
``` bash
docker build -t ratsql .
-docker run --rm -v /path/to/data:/mnt/data -it ratsql
+docker run --rm -m4g -v /path/to/data:/mnt/data -it ratsql
```

-Within the image, add the location of WikiSQL scripts to PYTHONPATH so that their internal imports can be resolved by Python:
-``` bash
-export PYTHONPATH=/app/third_party/wikisql/:$PYTHONPATH
-```
+Note that the image requires at least 4 GB of RAM to run preprocessing.
+By default, [Docker Desktop for Mac](https://hub.docker.com/editions/community/docker-ce-desktop-mac/) and [Docker Desktop for Windows](https://hub.docker.com/editions/community/docker-ce-desktop-windows) run containers with 2 GB of RAM.
+The `-m4g` switch overrides it; alternatively, you can increase the default limit in the Docker Desktop settings.
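
To make the 4 GB requirement easier to verify, here is a minimal sketch of a check that could be run inside the container before preprocessing. It is not part of RAT-SQL. It assumes a Linux container in which `/proc/meminfo` and the standard cgroup memory files are readable; the function names and the `REQUIRED_BYTES` constant are hypothetical.

```python
from pathlib import Path

REQUIRED_BYTES = 4 * 1024 ** 3  # 4 GB, the minimum stated in the README

# Standard locations of the container memory limit (assumed to be readable).
CGROUP_FILES = (
    "/sys/fs/cgroup/memory.max",                    # cgroup v2
    "/sys/fs/cgroup/memory/memory.limit_in_bytes",  # cgroup v1
)


def parse_meminfo_total(text):
    """Return MemTotal in bytes from /proc/meminfo-style text, or None."""
    for line in text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) * 1024  # the value is reported in kB
    return None


def parse_cgroup_limit(text):
    """Return a cgroup memory limit in bytes, or None for "max" (no limit)."""
    value = text.strip()
    return int(value) if value.isdigit() else None


def available_memory_bytes():
    """Smallest of the machine total and any cgroup limit that can be read."""
    candidates = []
    meminfo = Path("/proc/meminfo")
    if meminfo.is_file():
        candidates.append(parse_meminfo_total(meminfo.read_text()))
    for name in CGROUP_FILES:
        path = Path(name)
        if path.is_file():
            candidates.append(parse_cgroup_limit(path.read_text()))
    known = [c for c in candidates if c is not None]
    return min(known) if known else None


if __name__ == "__main__":
    limit = available_memory_bytes()
    if limit is None:
        print("Could not determine the memory limit.")
    elif limit < REQUIRED_BYTES:
        print(f"Only {limit / 1024 ** 3:.1f} GB visible; preprocessing may fail.")
    else:
        print(f"{limit / 1024 ** 3:.1f} GB visible.")
```

The check reads both sources because a container started without `-m` may report no cgroup limit while still being bounded by the memory of the machine or VM it runs on. `MemTotal` is usually slightly below the nominal RAM size, so a near-miss is a hint rather than a hard failure.
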

> If you prefer to set up and run the codebase without Docker, follow the steps in `Dockerfile` one by one.
> Note that this repository requires Python 3.7 or higher and a JVM to run [Stanford CoreNLP](https://stanfordnlp.github.io/CoreNLP/).
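
For a setup without Docker, the two prerequisites named above (Python 3.7 or higher and a JVM) can be checked up front. The following is only a sketch: `check_environment` is a hypothetical helper, not part of the repository, and it assumes that a usable JVM exposes a `java` executable on `PATH`.

```python
import shutil
import sys

MIN_PYTHON = (3, 7)  # minimum version stated in the README


def check_environment(version_info=sys.version_info, java_path=None):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append("Python 3.7 or higher is required.")
    # Assumption: a usable JVM exposes a `java` executable on PATH.
    if java_path is None:
        java_path = shutil.which("java")
    if not java_path:
        problems.append("No `java` executable found on PATH (needed for Stanford CoreNLP).")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("Python version and JVM look fine.")
```

Both parameters exist only so the check can be exercised without changing the real environment; calling `check_environment()` with no arguments inspects the running interpreter and the current `PATH`.
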
