Adjust local dataset defaults (#156)
* Add default to download script and adjust yamls

Co-authored-by: dblalock <davis@mosaicml.com>
Landanjs and dblalock authored Feb 11, 2023
1 parent 370a2a6 commit 47d5086
Showing 5 changed files with 14 additions and 10 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -1,5 +1,6 @@
 # Data
 my-copy-c4/
+examples/deeplab/ade20k
 *.jsonl*
 
 # WandB
4 changes: 2 additions & 2 deletions examples/deeplab/README.md
@@ -41,11 +41,11 @@ Now that you have explored the code, let's jump into the prerequisites for train
 
 ## Prepare your data
 
-This benchmark assumes that [ADE20k Dataset](https://groups.csail.mit.edu/vision/datasets/ADE20K/) is already stored on your local machine or stored in an S3 bucket after being processed into a streaming dataset. ADE20K can be downloaded by running:
+This benchmark assumes that [ADE20k Dataset](https://groups.csail.mit.edu/vision/datasets/ADE20K/) is already stored on your local machine or stored in an S3 bucket after being processed into a streaming dataset. ADE20K can be downloaded by running the command below. This takes up about 1GB of storage and will default to storing the dataset in `./ade20k`.
 
 ```bash
 # download ADE20k to specified local directory
-python download_ade20k.py path/to/data
+python download_ade20k.py
 ```
 
 To convert ADE20k to a [streaming format](https://github.com/mosaicml/streaming) for efficient training from an object store like S3, use [this script](https://github.com/mosaicml/streaming/blob/main/streaming/vision/convert/ade20k.py).
5 changes: 4 additions & 1 deletion examples/deeplab/download_ade20k.py
@@ -13,7 +13,10 @@
 
 parser = argparse.ArgumentParser()
 
-parser.add_argument('path', help='ADE20k Download directory.', type=str)
+parser.add_argument('--path',
+                    help='ADE20k Download directory.',
+                    type=str,
+                    default='./ade20k')
 
 args = parser.parse_args()
 
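A minimal sketch of how the new `--path` argument behaves, assuming standard `argparse` semantics. The parser copies the definition from the diff above; the argument values passed to `parse_args` are illustrative only.

```python
import argparse

# Same argument definition as the updated download_ade20k.py.
parser = argparse.ArgumentParser()
parser.add_argument('--path',
                    help='ADE20k Download directory.',
                    type=str,
                    default='./ade20k')

# No arguments: argparse falls back to the default directory.
default_args = parser.parse_args([])
print(default_args.path)  # ./ade20k

# An explicit directory is now passed through the optional flag.
custom_args = parser.parse_args(['--path', 'path/to/data'])
print(custom_args.path)  # path/to/data

# The old positional form is no longer accepted: argparse reports
# "unrecognized arguments" on stderr and exits with status 2.
try:
    parser.parse_args(['path/to/data'])
except SystemExit as exc:
    print(exc.code)  # 2
```

Because `path` changed from a positional argument to an optional flag, the README's old invocation `python download_ade20k.py path/to/data` would be rejected; a custom directory is given as `python download_ade20k.py --path path/to/data`.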
12 changes: 6 additions & 6 deletions examples/deeplab/yamls/deeplabv3.yaml
@@ -15,9 +15,9 @@ model:
 
 # Training Dataset Parameters
 train_dataset:
-  is_streaming: true # If true, use streaming dataset
-  path: s3://my-bucket/ade20k # Path to S3 bucket if streaming, otherwise path to local data directory
-  local: /tmp/mds-cache/mds-ade20k/ # Local cache when streaming data
+  is_streaming: false # If true, use streaming dataset
+  path: ./ade20k/ # Path to S3 bucket if streaming, otherwise path to local data directory
+  local: null # Local cache when streaming data
   base_size: 512 # Initial size of the image and target before other augmentations
   min_resize_scale: 0.5 # The minimum value the samples can be rescaled
   max_resize_scale: 2.0 # The maximum value the samples can be rescaled
@@ -27,9 +27,9 @@ train_dataset:
 
 # Validation Dataset Parameters
 eval_dataset:
-  is_streaming: true # If true, use streaming dataset
-  path: s3://my-bucket/ade20k # Path to S3 bucket if streaming, otherwise path to local data directory
-  local: /tmp/mds-cache/mds-ade20k/ # Local cache when streaming data
+  is_streaming: false # If true, use streaming dataset
+  path: ./ade20k/ # Path to S3 bucket if streaming, otherwise path to local data directory
+  local: null # Local cache when streaming data
   base_size: 512 # Initial size of the image and target before other augmentations
   min_resize_scale: 0.5 # The minimum value the samples can be rescaled
   max_resize_scale: 2.0 # The maximum value the samples can be rescaled
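The config change switches both datasets from streaming to local data. The sketch below uses a hypothetical `describe_dataset_source` helper (not part of the repository) on plain dicts holding the old and new `train_dataset`/`eval_dataset` values, interpreting the three fields as the inline YAML comments describe them. The S3 check is an added assumption, not something the training code is known to do.

```python
from typing import Any, Dict


def describe_dataset_source(cfg: Dict[str, Any]) -> str:
    """Hypothetical helper: summarize where a dataset config reads from.

    Follows the YAML comments: `path` is an S3 path when `is_streaming` is
    true and a local directory otherwise; `local` is only used as a cache
    when streaming.
    """
    path = cfg['path']
    if cfg['is_streaming']:
        return f"stream {path} (local cache: {cfg['local']})"
    # Assumed sanity check: a non-streaming run should not point at S3.
    if path.startswith('s3://'):
        raise ValueError(f'is_streaming is false but path is remote: {path}')
    # `local` is ignored here because nothing is streamed.
    return f'read local directory {path}'


# Dataset values before and after this commit.
old_cfg = {'is_streaming': True,
           'path': 's3://my-bucket/ade20k',
           'local': '/tmp/mds-cache/mds-ade20k/'}
new_cfg = {'is_streaming': False, 'path': './ade20k/', 'local': None}

print(describe_dataset_source(old_cfg))
# stream s3://my-bucket/ade20k (local cache: /tmp/mds-cache/mds-ade20k/)
print(describe_dataset_source(new_cfg))
# read local directory ./ade20k/
```

Under this reading, `local: null` is harmless in the new defaults, and `./ade20k/` lines up with the download script's default output directory.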
2 changes: 1 addition & 1 deletion examples/deeplab/yamls/mcloud_run.yaml
@@ -10,7 +10,7 @@ integrations:
 command: |
   cd examples/examples/deeplab
   composer main.py /mnt/config/parameters.yaml
-# Configuration copied from baseline.yaml
+# Configuration similar to baseline.yaml
 parameters:
   run_name: deeplabv3_ade20k # Name of the training run used for checkpointing and other logging
   is_train: true # Trains the model if true, otherwise runs evaluation
