Fashion MNIST using improve unified interface #108

Open
wants to merge 6 commits into
base: develop
43 changes: 43 additions & 0 deletions test/fashion-mnist/Readme.md
@@ -0,0 +1,43 @@
# INTRODUCTION
This directory contains the code to run the fashion-mnist example and explains the changes made to the original python notebook to make it work with IMPROVE.

The python notebook ***fashion-mnist.ipynb*** consists of 3 sections.
Please read the comments on top of that notebook.

Notice the changes marked by the comment line *## IMPROVE* in the scripts listed under Running below.

# Running
```
export PYTHONPATH=<PATH-TO-IMPROVE>:$PYTHONPATH
export IMPROVE_DATA_DIR=<PATH-TO-IMPROVE-DATA-DIR>

python preprocess.py
python train.py  # rerunning this resumes from the checkpointed model

python infer.py
```

# NOTES
IMPROVE_DATA_DIR is the directory where the data is downloaded and preprocessed. The model checkpoints, the saved model and the inference output are stored under the same directory.

All the above scripts read their configuration and hyperparameters from the file *fashion-mnist_default_model.txt*.

The 3 subsections in *fashion-mnist_default_model.txt* are [preprocess], [train] and [infer]. The [preprocess] section is used by preprocess.py, the [train] section by train.py and the [infer] section by infer.py. The [train] section has a few more parameters than the other two, including the checkpointing parameters. The checkpointing functions used in the model are documented [here](https://candle-lib.readthedocs.io/en/latest/api_ckpt_pytorch_utils/_autosummary/candle.ckpt_pytorch_utils.CandleCkptPyTorch.html).

```
save_path='save/'
ckpt_save_best=True
ckpt_save_interval=1
```


Note that checkpointing is not enabled in the original python notebook, but it is enabled in train.py. See train.py for implementation details.
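
As a rough illustration of how the per-script sections are organised, the sketch below parses an abridged copy of the settings with Python's standard `configparser` and lists the keys that appear only in [train]. This is not how IMPROVE or CANDLE read the file (they use their own parameter handling), and the helper name `read_section` is made up for this example.

```python
import configparser

# Abridged copy of the settings in fashion-mnist_default_model.txt
CONFIG_TEXT = """
[train]
data_dir="./data"
batch_size=32
learning_rate=0.001
epochs=10
momentum=0.9
save_path='save/'
ckpt_save_best=True
ckpt_save_interval=1

[infer]
data_dir="./data"
batch_size=32
"""


def read_section(text, section):
    """Return one section as a dict of strings, with surrounding quotes stripped."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {key: value.strip("\"'") for key, value in parser[section].items()}


train_cfg = read_section(CONFIG_TEXT, "train")
infer_cfg = read_section(CONFIG_TEXT, "infer")

# Keys that exist only in [train]: the training and checkpointing parameters
train_only = sorted(set(train_cfg) - set(infer_cfg))
print(train_only)
```

Values come back as strings here, so a caller would still need to convert them (for example `int(train_cfg["batch_size"])`).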

Directory Structure for IMPROVE:
IMPROVE_DATA_DIR should contain the following directory structure:
```
└── raw_data
    ├── splits
    ├── x_data
    └── y_data
```
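
A minimal sketch of creating this layout, using only the Python standard library. The helper `make_raw_data_layout` is hypothetical and is not part of IMPROVE; the example writes into a temporary directory instead of the real IMPROVE_DATA_DIR.

```python
import tempfile
from pathlib import Path


def make_raw_data_layout(improve_data_dir):
    """Create raw_data/{splits,x_data,y_data} under the given directory."""
    raw = Path(improve_data_dir) / "raw_data"
    for sub in ("splits", "x_data", "y_data"):
        (raw / sub).mkdir(parents=True, exist_ok=True)
    return raw


# Use a throwaway directory so the example has no side effects
with tempfile.TemporaryDirectory() as tmp:
    raw = make_raw_data_layout(tmp)
    created = sorted(p.name for p in raw.iterdir())
    print(created)
```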
215 changes: 215 additions & 0 deletions test/fashion-mnist/fashion-mnist.ipynb
@@ -0,0 +1,215 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Fashion MNIST Image Classification with PyTorch\n",
"\n",
"## Overview\n",
"The [IMPROVE Project](https://jdacs4c-improve.github.io/docs/index.html) focuses on data standardization, quality and integration for a variety of cancer drug response problems. This example shows how to modify a model to comply with the [IMPROVE library](https://github.com/JDACS4C-IMPROVE/IMPROVE). See the [Unified Interface Documentation](https://jdacs4c-improve.github.io/docs/content/unified_interface.html).\n",
"\n",
"Note: This model does not use any of the cancer data; it simply classifies images of clothing items from the Fashion MNIST dataset into 10 categories:\n",
"\n",
"- T-shirt/top\n",
"- Trouser\n",
"- Pullover\n",
"- Dress\n",
"- Coat\n",
"- Sandal\n",
"- Shirt\n",
"- Sneaker\n",
"- Bag\n",
"- Ankle boot\n",
"\n",
"## Requirements\n",
"Break the model's workflow into three primary steps:\n",
"\n",
"Preprocessing\n",
"Training\n",
"Inference\n",
"\n",
"## Preprocessing\n",
"\n",
"Load the Fashion MNIST dataset using PyTorch's torchvision.datasets.FashionMNIST class.\n",
"Split the data into training and validation sets.\n",
"Scale pixel values to the range [0, 1] with ToTensor, then normalize them with a mean of 0.5 and a standard deviation of 0.5, which maps them to the range [-1, 1].\n",
"This helps stabilize the training process and improve model convergence.\n",
"Use torchvision.datasets.FashionMNIST with ``download=True`` to download the dataset locally.\n",
"\n",
"## Training\n",
"This section assumes that the data has been preprocessed and is available for training.\n",
"We first define a neural network architecture suitable for image classification (here, a small fully connected network with two hidden layers).\n",
"Instantiate the model with PyTorch's torch.nn modules.\n",
"Select a loss function: cross-entropy loss. Choose the optimizer SGD to update model parameters during training.\n",
"Iterate through training data in batches for a fixed number of epochs:\n",
"Feed a batch of images and labels to the model.\n",
"Calculate the loss based on the model's predictions and true labels.\n",
"Backpropagate the loss to update model parameters using the optimizer.\n",
"\n",
"## Inference\n",
"\n",
"Load the trained model's state.\n",
"For new, unseen images:\n",
"Preprocess them as done during training.\n",
"Pass the preprocessed images through the model to compute predictions and compare with the ground truth labels.\n",
"## Implementation Details\n",
"\n",
"Refer to the code for specific model architecture, hyperparameter choices, and implementation choices."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"import torch.nn as nn\n",
"import torch.optim as optim\n",
"import torchvision\n",
"import torchvision.transforms as transforms\n",
"\n",
"# PREPARE DATA\n",
"\n",
"# Define transformations for data preprocessing\n",
"transform = transforms.Compose([\n",
" transforms.ToTensor(),\n",
" transforms.Normalize((0.5,), (0.5,))\n",
"])\n",
"\n",
"# Load Fashion MNIST dataset\n",
"# NOTE: train=True for trainset and train=False for testset; download=True for both.\n",
"trainset = torchvision.datasets.FashionMNIST(root='./data', train=True, download=True, transform=transform)\n",
"testset = torchvision.datasets.FashionMNIST(root='./data', train=False, download=True, transform=transform)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"-- Epoch: 1, Loss: 1.027475444969338\n",
"-- Epoch: 2, Loss: 0.5553531648317126\n",
"-- Epoch: 3, Loss: 0.4895101859371291\n",
"-- Epoch: 4, Loss: 0.45323895183262797\n",
"-- Epoch: 5, Loss: 0.429904751845006\n",
"-- Epoch: 6, Loss: 0.41118921979721673\n",
"-- Epoch: 7, Loss: 0.3971544050458652\n",
"-- Epoch: 8, Loss: 0.3844668244454525\n",
"-- Epoch: 9, Loss: 0.37378820707040555\n",
"-- Epoch: 10, Loss: 0.3639335038978408\n",
"Training finished.\n"
]
}
],
"source": [
"# Part 2: Model Training\n",
"\n",
"# Create data loaders using the trainset and testset in Part 1, with batch size 64 and shuffle=True for trainset\n",
"trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)\n",
"testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=False)\n",
"\n",
"# Define the neural network model\n",
"class Net(nn.Module):\n",
"    def __init__(self):\n",
"        super(Net, self).__init__()\n",
"        self.fc1 = nn.Linear(784, 256)\n",
"        self.fc2 = nn.Linear(256, 128)\n",
"        self.fc3 = nn.Linear(128, 10)\n",
"\n",
"    def forward(self, x):\n",
"        x = x.view(x.size(0), -1)\n",
"        x = torch.relu(self.fc1(x))\n",
"        x = torch.relu(self.fc2(x))\n",
"        x = self.fc3(x)\n",
"        return x\n",
"\n",
"# Create an instance of the model\n",
"model = Net()\n",
"\n",
"# Define loss function and optimizer\n",
"criterion = nn.CrossEntropyLoss()\n",
"optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)\n",
"\n",
"# Train the model\n",
"for epoch in range(10):\n",
"    running_loss = 0.0\n",
"    for i, data in enumerate(trainloader, 0):\n",
"        inputs, labels = data\n",
"\n",
"        optimizer.zero_grad()\n",
"\n",
"        outputs = model(inputs)\n",
"        loss = criterion(outputs, labels)\n",
"        loss.backward()\n",
"        optimizer.step()\n",
"\n",
"        running_loss += loss.item()\n",
"        # Print loss every 200 mini-batches\n",
"        # if i % 200 == 199:\n",
"        #     print(f'Epoch: {epoch + 1}, Batch: {i + 1}, Loss: {running_loss / 200}')\n",
"        #     running_loss = 0.0\n",
"\n",
"    loss = running_loss / len(trainloader)\n",
"    print(f'-- Epoch: {epoch + 1}, Loss: {loss}')\n",
"\n",
"print('Training finished.')"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Accuracy on test set: 85.62%\n"
]
}
],
"source": [
"# Part 3: Model Inferencing\n",
"\n",
"correct = 0\n",
"total = 0\n",
"with torch.no_grad():\n",
"    for data in testloader:\n",
"        images, labels = data\n",
"        outputs = model(images)\n",
"        _, predicted = torch.max(outputs.data, 1)\n",
"        total += labels.size(0)\n",
"        correct += (predicted == labels).sum().item()\n",
"\n",
"print(f'Accuracy on test set: {(100 * correct / total):.2f}%')\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Benchmarks",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
20 changes: 20 additions & 0 deletions test/fashion-mnist/fashion-mnist_default_model.txt
@@ -0,0 +1,20 @@
[Global_Params]
model_name="fashion-mnist"

[Preprocess]
data_dir="./data"
batch_size=32

[train]
data_dir="./data"
batch_size=32
learning_rate=0.001
epochs=10
momentum=0.9
save_path='save/'
ckpt_save_best=True
ckpt_save_interval=1

[infer]
data_dir="./data"
batch_size=32
115 changes: 115 additions & 0 deletions test/fashion-mnist/infer.py
@@ -0,0 +1,115 @@
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms

# Define the neural network model
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 10)

    def forward(self, x):
        x = x.view(x.size(0), -1)
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return x

## IMPROVE
from improve import framework as frm
import candle
from pathlib import Path

filepath = Path(__file__).resolve().parent


# Part 3: Model Testing
## IMPROVE
def run(params):
    ##

    # Need to get testloader from Part 1.
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.5,), (0.5,))
    ])

    # Get the data directory, batch size and other hyperparameters from params
    ##IMPROVE
    batch_size = params["batch_size"]
    learning_rate = params["learning_rate"]
Contributor

Why are these parameters for inference?

Contributor

@rajeeja I agree with @wilke, it's weird that these parameters are defined in inference script. Generally, inference is done without knowledge of training settings (e.g., train batch size, learning rate optimizer, etc.). Is there a reason why these are defined here?

Collaborator Author

@adpartin @wilke

  1. we need to load the test data in batches -> Hence - batch_size (see line 50)
  2. I'm getting the model from the ckpt method, we can get it via other means also. The ckpt object needs optimizer to instantiate -> Hence - optimizer (see line 64)

Contributor

Sorry, but it is unclear why you must load the data for inference in batches. Is this in any way faster than a simple

for v in File
  label=infer(v)

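For reference, a minimal sketch of the two loading styles being compared. `predict_batch` is a hypothetical stand-in for a model call, not code from this PR. The point is only that both styles yield the same labels and that batching reduces the number of model calls; whether that is faster in practice depends on the model and hardware.

```python
from itertools import islice


def batched(items, batch_size):
    """Yield consecutive lists of at most batch_size items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def predict_batch(values):
    """Hypothetical stand-in for a model call: the label is the value modulo 10."""
    return [v % 10 for v in values]


inputs = list(range(25))

# One model call per input
one_by_one = [predict_batch([v])[0] for v in inputs]

# One model call per batch of up to 8 inputs
in_batches = [label for batch in batched(inputs, 8) for label in predict_batch(batch)]
calls = sum(1 for _ in batched(inputs, 8))

print(one_by_one == in_batches, calls)
```
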
Contributor

There should be a better way of loading the model. We want the model from the input directory. No optimizer is needed. If this is a problem, I suggest writing a load_model_weights function as a wrapper.
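
A minimal sketch of what such a wrapper could look like, assuming the training step saved the weights with `torch.save(model.state_dict(), path)` rather than through the CANDLE checkpoint object. The function body, the file name `model_weights.pt` and the stand-in `nn.Linear` model are illustrative assumptions, not code from this PR.

```python
import os
import tempfile

import torch
import torch.nn as nn


def load_model_weights(model, weights_path, device="cpu"):
    """Load a saved state_dict into model and put it in eval mode. No optimizer needed."""
    state_dict = torch.load(weights_path, map_location=device)
    model.load_state_dict(state_dict)
    model.to(device)
    model.eval()
    return model


# Round trip with a tiny stand-in model (assumption: weights saved as a state_dict)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model_weights.pt")
    trained = nn.Linear(4, 2)
    torch.save(trained.state_dict(), path)
    restored = load_model_weights(nn.Linear(4, 2), path)

# The restored model carries exactly the saved parameters
same = all(
    torch.equal(a, b)
    for a, b in zip(trained.state_dict().values(), restored.state_dict().values())
)
print(same)
```

With a helper of this shape, the inference script would only need the weights path and the data directory as parameters.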

Contributor

@rajeeja @adpartin If we use this as an example, we have to make it clean. This is a great hack but not a sustainable solution. Please come up with a better solution or hide it in a function call. These are constants in this case.

    momentum = params["momentum"]
    dataset_dir = params["data_dir"]

    # NOTE: using false now for data loading
    testset = torchvision.datasets.FashionMNIST(root=dataset_dir, train=False, download=False, transform=transform)
    testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size, shuffle=True)
Contributor

Loading test data for inference? I would expect to load the model weights.

Collaborator Author

See line 72, how are you going to get images to test?
outputs = model(images)

Contributor

I don't understand, or is this a problem of naming conventions? We are doing label prediction in this script, not testing. Do you have a specific use case in mind?

Contributor

@rajeeja Any thoughts?


    # Check if GPU is available, else use CPU
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Create a neural network model
    model = Net().to(device)

    # Define optimizer
    optimizer = optim.SGD(model.parameters(), lr=learning_rate, momentum=momentum)
Contributor

Same here

Collaborator Author

Same as above :) We can get the model weights in some other fashion; do you know how else to get the model weights to perform inference, where the optimizer or learning rate is not needed? It is a minor thing and can be ignored, IMO. The overall logic is to get the model weights and infer on what was done in the training step.

Contributor

I agree, get the model weights and infer on any input data in the input data directory. The model weights should be located there as well. The outputs of training are model weights and learning metrics.


    ##IMPROVE
    # Use CANDLE checkpointing to load the model weights for inferencing
    ckpt = candle.CandleCkptPyTorch(params)
    ckpt.set_model({"model": model, "optimizer": optimizer})
    J = ckpt.restart(model)
    ##

    correct = 0
    total = 0
    with torch.no_grad():
        for data in testloader:
            images, labels = data
            outputs = model(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    print(f'Accuracy on test set: {(100 * correct / total):.2f}%')




## IMPROVE
# Note some of these are similar to previous section and may be adjusted as per model requirements
model_infer_params = [
Contributor

Which parameter is for loading model weights?

Collaborator Author

I'm loading it from the ckpt files, so no specific parameter.

this can be done by using a specific directory to save the model weights and load from there, something along the lines of test_ml_data_dir


    {"name": "data_dir",  # default
     "type": str,
     "help": "Directory containing the Fashion MNIST dataset.",
     },
]

infer_params = [

    {"name": "batch_size",  # default
     "type": int,
     "help": "Batch size for creating data loaders.",
     },
]

req_infer_args = [ll["name"] for ll in infer_params]

def main():
    params = frm.initialize_parameters(
        filepath,
        default_model="fashion-mnist_default_model.txt",
        additional_definitions=model_infer_params,
        required=req_infer_args,
    )
    run(params)

if __name__ == "__main__":
    main()

## END IMPROVE