Fashion MNIST using IMPROVE unified interface #108
base: develop
@@ -0,0 +1,43 @@
# INTRODUCTION
This directory contains the code to run the fashion-mnist example and explains the changes made to the original Python notebook to make it work with IMPROVE.

The Python notebook ***fashion-mnist.ipynb*** consists of 3 sections; please read the comments at the top of that notebook.

Notice the changes marked by the comment line *## IMPROVE* in the following files:
# Running
```
export PYTHONPATH=<PATH-TO-IMPROVE>:$PYTHONPATH
export IMPROVE_DATA_DIR=<PATH-TO-IMPROVE-DATA-DIR>

python preprocess.py
python train.py    # if you rerun this, it will restart from the checkpointed model
python infer.py
```
# NOTES
IMPROVE_DATA_DIR is the directory where the data is downloaded and preprocessed; the model checkpoints, the saved model, and the inference output are also stored under this directory.

All of the above scripts read their configuration and hyperparameters from the file *fashion-mnist_default_model.txt*.

The 3 sections in *fashion-mnist_default_model.txt* are [preprocess], [train] and [infer]; the [preprocess] section is used by preprocess.py, [train] by train.py, and [infer] by infer.py. [train] has a few more parameters than the other two, including the checkpointing parameters. The checkpointing functions used in the model are documented [here](https://candle-lib.readthedocs.io/en/latest/api_ckpt_pytorch_utils/_autosummary/candle.ckpt_pytorch_utils.CandleCkptPyTorch.html):
```
save_path='save/'
ckpt_save_best=True
ckpt_save_interval=1
```
Note that checkpointing is not enabled in the original Python notebook, but it is enabled in train.py. See train.py for implementation details.
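As a rough sketch of what that wiring looks like (assumed from the CANDLE documentation linked above and from the analogous restore code in infer.py, not verbatim train.py code):

```python
import candle

# Sketch: wire CANDLE checkpointing into a training loop (assumed, not verbatim train.py).
ckpt = candle.CandleCkptPyTorch(params)  # picks up save_path, ckpt_save_best, ckpt_save_interval
ckpt.set_model({"model": model, "optimizer": optimizer})
J = ckpt.restart(model)                  # restore the latest checkpoint, if one exists
initial_epoch = J["epoch"] + 1 if J is not None else 0

for epoch in range(initial_epoch, epochs):
    train_loss = train_one_epoch(model, trainloader, optimizer, criterion)  # hypothetical helper
    ckpt.ckpt_epoch(epoch, train_loss)   # save per ckpt_save_interval / ckpt_save_best
```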
Directory structure for IMPROVE:
IMPROVE_DATA_DIR should contain the following directory structure:
```
└── raw_data
    ├── splits
    ├── x_data
    └── y_data
```
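For instance, a hypothetical helper to create this layout (an illustration only, not code from this repo):

```python
import os
from pathlib import Path

# Create the expected raw_data layout under IMPROVE_DATA_DIR (hypothetical sketch).
data_dir = Path(os.environ["IMPROVE_DATA_DIR"])
for sub in ("splits", "x_data", "y_data"):
    (data_dir / "raw_data" / sub).mkdir(parents=True, exist_ok=True)
```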
@@ -0,0 +1,215 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Fashion MNIST Image Classification with PyTorch\n",
    "\n",
    "## Overview\n",
    "The [IMPROVE Project](https://jdacs4c-improve.github.io/docs/index.html) focuses on data standardization, quality, and integration for a variety of cancer drug response problems. This example shows how to modify a model to comply with the [IMPROVE library](https://github.com/JDACS4C-IMPROVE/IMPROVE). See the [Unified Interface Documentation](https://jdacs4c-improve.github.io/docs/content/unified_interface.html).\n",
    "\n",
    "Note: This model does not use any of the cancer data; it simply classifies images of clothing items from the Fashion MNIST dataset into 10 categories:\n",
    "\n",
    "T-shirt/top\n",
    "Trouser\n",
    "Pullover\n",
    "Dress\n",
    "Coat\n",
    "Sandal\n",
    "Shirt\n",
    "Sneaker\n",
    "Bag\n",
    "Ankle boot\n",
    "\n",
    "## Requirements\n",
    "The model's workflow is broken into three primary steps:\n",
    "\n",
    "Preprocessing\n",
    "Training\n",
    "Inference\n",
    "\n",
    "## Preprocessing\n",
    "\n",
    "Load the Fashion MNIST dataset using PyTorch's torchvision.datasets.FashionMNIST class.\n",
    "Split the data into training and validation sets.\n",
    "Scale pixel values to the range [0, 1] (via ToTensor), then normalize with a mean of 0.5 and a standard deviation of 0.5, which maps them to roughly [-1, 1].\n",
    "This helps stabilize the training process and improve model convergence.\n",
    "Use torchvision.datasets.FashionMNIST with ``download=True`` to download the dataset locally.\n",
    "\n",
"## Training\n", | ||
"This section assumes that the data has been preprocessed and is available for training.\n", | ||
"We first define a neural network architecture suitable for image classification (e.g., a convolutional neural network).\n", | ||
"Instantiate the model with PyTorch's torch.nn modules.\n", | ||
"Select a loss function: cross-entropy loss. Choose the optimizer SGD to update model parameters during training.\n", | ||
"Iterate through training data in batches for a fixed number of epochs:\n", | ||
"Feed a batch of images and labels to the model.\n", | ||
"Calculate the loss based on the model's predictions and true labels.\n", | ||
"Backpropagate the loss to update model parameters using the optimizer.\n", | ||
"\n", | ||
"## Inference\n", | ||
"\n", | ||
"Load the trained model's state.\n", | ||
"For new, unseen images:\n", | ||
"Preprocess them as done during training.\n", | ||
"Pass the preprocessed images through the model to compute predictions and compare with the ground truth labels.\n", | ||
"## Implementation Details\n", | ||
"\n", | ||
"Refer to the code for specific model architecture, hyperparameter choices, and implementation choices." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 9, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import torch\n", | ||
"import torch.nn as nn\n", | ||
"import torch.optim as optim\n", | ||
"import torchvision\n", | ||
"import torchvision.transforms as transforms\n", | ||
"\n", | ||
"# PREPARE DATA\n", | ||
"\n", | ||
"# Define transformations for data preprocessing\n", | ||
"transform = transforms.Compose([\n", | ||
" transforms.ToTensor(),\n", | ||
" transforms.Normalize((0.5,), (0.5,))\n", | ||
"])\n", | ||
"\n", | ||
"# Load Fashion MNIST dataset\n", | ||
"# NOTE: train=True for trainset and train=False for testset; dowload=True for both.\n", | ||
"trainset = torchvision.datasets.FashionMNIST(root='./data', train=True, download=True, transform=transform)\n", | ||
"testset = torchvision.datasets.FashionMNIST(root='./data', train=False, download=True, transform=transform)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 12, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"-- Epoch: 1, Loss: 1.027475444969338\n", | ||
"-- Epoch: 2, Loss: 0.5553531648317126\n", | ||
"-- Epoch: 3, Loss: 0.4895101859371291\n", | ||
"-- Epoch: 4, Loss: 0.45323895183262797\n", | ||
"-- Epoch: 5, Loss: 0.429904751845006\n", | ||
"-- Epoch: 6, Loss: 0.41118921979721673\n", | ||
"-- Epoch: 7, Loss: 0.3971544050458652\n", | ||
"-- Epoch: 8, Loss: 0.3844668244454525\n", | ||
"-- Epoch: 9, Loss: 0.37378820707040555\n", | ||
"-- Epoch: 10, Loss: 0.3639335038978408\n", | ||
"Training finished.\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"# Part 2: Model Training\n", | ||
"\n", | ||
"# Create data loaders using the trainset and testset in Part 1, with batch size 64 and shuffle=True for trainset\n", | ||
"trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)\n", | ||
"testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=False)\n", | ||
"\n", | ||
"# Define the neural network model\n", | ||
"class Net(nn.Module):\n", | ||
" def __init__(self):\n", | ||
" super(Net, self).__init__()\n", | ||
" self.fc1 = nn.Linear(784, 256)\n", | ||
" self.fc2 = nn.Linear(256, 128)\n", | ||
" self.fc3 = nn.Linear(128, 10)\n", | ||
"\n", | ||
" def forward(self, x):\n", | ||
" x = x.view(x.size(0), -1)\n", | ||
" x = torch.relu(self.fc1(x))\n", | ||
" x = torch.relu(self.fc2(x))\n", | ||
" x = self.fc3(x)\n", | ||
" return x\n", | ||
"\n", | ||
"# Create an instance of the model\n", | ||
"model = Net()\n", | ||
"\n", | ||
"# Define loss function and optimizer\n", | ||
"criterion = nn.CrossEntropyLoss()\n", | ||
"optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)\n", | ||
"\n", | ||
"# Train the model\n", | ||
"for epoch in range(10):\n", | ||
" running_loss = 0.0\n", | ||
" for i, data in enumerate(trainloader, 0):\n", | ||
" inputs, labels = data\n", | ||
"\n", | ||
" optimizer.zero_grad()\n", | ||
"\n", | ||
" outputs = model(inputs)\n", | ||
" loss = criterion(outputs, labels)\n", | ||
" loss.backward()\n", | ||
" optimizer.step()\n", | ||
"\n", | ||
" running_loss += loss.item()\n", | ||
" # Print loss every 200 mini-batches\n", | ||
" # if i % 200 == 199:\n", | ||
" # print(f'Epoch: {epoch + 1}, Batch: {i + 1}, Loss: {running_loss / 200}')\n", | ||
" # running_loss = 0.0\n", | ||
" \n", | ||
" loss = running_loss / len(trainloader)\n", | ||
" print(f'-- Epoch: {epoch + 1}, Loss: {loss}')\n", | ||
" \n", | ||
"print('Training finished.')" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 13, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"Accuracy on test set: 85.62%\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"# Part 3: Model Inferencing\n", | ||
"\n", | ||
"correct = 0\n", | ||
"total = 0\n", | ||
"with torch.no_grad():\n", | ||
" for data in testloader:\n", | ||
" images, labels = data\n", | ||
" outputs = model(images)\n", | ||
" _, predicted = torch.max(outputs.data, 1)\n", | ||
" total += labels.size(0)\n", | ||
" correct += (predicted == labels).sum().item()\n", | ||
"\n", | ||
"print(f'Accuracy on test set: {(100 * correct / total):.2f}%')\n" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Benchmarks", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.10.4" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |
@@ -0,0 +1,20 @@
[Global_Params]
model_name="fashion-mnist"

[Preprocess]
data_dir="./data"
batch_size=32

[train]
data_dir="./data"
batch_size=32
learning_rate=0.001
epochs=10
momentum=0.9
save_path='save/'
ckpt_save_best=True
ckpt_save_interval=1

[infer]
data_dir="./data"
batch_size=32
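For reference, the scripts read this file through the IMPROVE framework; a minimal sketch using the same frm.initialize_parameters call that appears in infer.py below (how the framework merges the sections into params is an assumption here):

```python
from pathlib import Path

from improve import framework as frm

filepath = Path(__file__).resolve().parent

# Sketch: load configuration/hyperparameters from the default model file,
# using the same call that infer.py makes below.
params = frm.initialize_parameters(
    filepath,
    default_model="fashion-mnist_default_model.txt",
    additional_definitions=[],  # extra parameter definitions, if any
    required=[],
)
print(params["batch_size"])  # e.g. 32, from the config above
```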
@@ -0,0 +1,115 @@ | ||
import torch | ||
import torch.nn as nn | ||
import torch.optim as optim | ||
import torchvision | ||
import torchvision.transforms as transforms | ||
|
||
# Define the neural network model | ||
class Net(nn.Module): | ||
def __init__(self): | ||
super(Net, self).__init__() | ||
self.fc1 = nn.Linear(784, 256) | ||
self.fc2 = nn.Linear(256, 128) | ||
self.fc3 = nn.Linear(128, 10) | ||
|
||
def forward(self, x): | ||
x = x.view(x.size(0), -1) | ||
x = torch.relu(self.fc1(x)) | ||
x = torch.relu(self.fc2(x)) | ||
x = self.fc3(x) | ||
return x | ||
|
||
## IMPROVE | ||
from improve import framework as frm | ||
import candle | ||
from pathlib import Path | ||
|
||
filepath = Path(__file__).resolve().parent | ||
|
||
|
||
# Part 3: Model Testing | ||
## IMPROVE | ||
def run(params): | ||
## | ||
|
||
# Need to get testloader from Part 1. | ||
transform = transforms.Compose([ | ||
transforms.ToTensor(), | ||
transforms.Normalize((0.5,), (0.5,)) | ||
]) | ||
|
||
# Get the data directory, batch size and other hyperparameters from params | ||
##IMPROVE | ||
batch_size = params["batch_size"] | ||
learning_rate = params["learning_rate"] | ||
momentum = params["momentum"] | ||
dataset_dir = params["data_dir"] | ||
|
||
# NOTE: using false now for data loading | ||
testset = torchvision.datasets.FashionMNIST(root=dataset_dir, train=False, download=False, transform=transform) | ||
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size, shuffle=True) | ||
> Review comment: Loading test data for inference? I would expect to load the model weights.
> Reply: See line 72, how are you going to get images to test?
> Reply: I don't understand, or is this a problem of naming conventions? We are doing label prediction in this script, not testing. Do you have a specific use case in mind?
> Reply: @rajeeja Any thoughts?

    # Check if GPU is available, else use CPU
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Create a neural network model
    model = Net().to(device)

    # Define optimizer
    optimizer = optim.SGD(model.parameters(), lr=learning_rate, momentum=momentum)
> Review comment: Same here.
> Reply: Same as above :) we can get the model weights in some other fashion. Do you know how else to get the model weights to perform inference? Where the optimizer or learning rate is not needed, it is a minor thing and can be ignored, IMO. The overall logic is to get the model weights and infer on what was done in the training step.
> Reply: I agree, get the model weights and infer on any input data in the input data directory. The model weights should be located there as well. The outputs of training are model weights and learning metrics.

    ## IMPROVE
    # Use CANDLE checkpointing to load the model weights for inferencing
    ckpt = candle.CandleCkptPyTorch(params)
    ckpt.set_model({"model": model, "optimizer": optimizer})
    J = ckpt.restart(model)
    ##

    correct = 0
    total = 0
    with torch.no_grad():
        for data in testloader:
            images, labels = data
            images, labels = images.to(device), labels.to(device)  # move the batch to the same device as the model
            outputs = model(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    print(f'Accuracy on test set: {(100 * correct / total):.2f}%')


## IMPROVE
# Note: some of these are similar to the previous section and may be adjusted per model requirements
model_infer_params = [
    {"name": "data_dir",  # default
     "type": str,
     "help": "Directory containing the Fashion MNIST dataset.",
     },
]

> Review comment: Which parameter is for loading model weights?
> Reply: I'm loading it from the ckpt files, so no specific parameter. This can be done by using a specific directory to save the model weights and load from there, something along the lines of

infer_params = [
    {"name": "batch_size",  # default
     "type": int,
     "help": "Batch size for creating data loaders.",
     },
]

req_infer_args = [ll["name"] for ll in infer_params]

def main():
    params = frm.initialize_parameters(
        filepath,
        default_model="fashion-mnist_default_model.txt",
        additional_definitions=model_infer_params,
        required=req_infer_args,
    )
    run(params)

if __name__ == "__main__":
    main()

## END IMPROVE
> Review comment: Why are these parameters for inference?
> Reply: @rajeeja I agree with @wilke, it's weird that these parameters are defined in the inference script. Generally, inference is done without knowledge of training settings (e.g., train batch size, learning rate, optimizer, etc.). Is there a reason why these are defined here?
> Reply: @adpartin @wilke
> Reply: Sorry, but it is unclear why you must load the data for inference in batches. Is this in any way faster than a simple
> Reply: There should be a better way of loading the model. We want the model from the input directory. No optimizer is needed. If this is a problem, I suggest writing a load_model_weights function as a wrapper.
> Reply: @rajeeja @adpartin If we use this as an example, we have to make it clean. This is a great hack but not a sustainable solution. Please come up with a better solution or hide it in a function call. These are constants in this case.
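A minimal sketch of the load_model_weights wrapper suggested above; the function name, the weights file name, and the saving convention are assumptions for illustration, not code from this PR:

```python
from pathlib import Path

import torch

def load_model_weights(model: torch.nn.Module, input_dir: str,
                       weights_file: str = "model.pt") -> torch.nn.Module:
    # Hypothetical wrapper: restore trained weights from the input directory,
    # with no optimizer or training hyperparameters required.
    weights_path = Path(input_dir) / weights_file
    state_dict = torch.load(weights_path, map_location="cpu")
    model.load_state_dict(state_dict)
    model.eval()  # switch to inference mode
    return model

# Usage in infer.py (sketch): model = load_model_weights(Net(), params["data_dir"])
```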