data/benchmarks #422
@@ -0,0 +1,34 @@

# Install dependencies

```
pip install -r benchmarks/requirements.txt
python setup.py develop
```

# Usage instructions

```
usage: run_benchmark.py [-h] [--dataset DATASET] [--model_name MODEL_NAME] [--batch_size BATCH_SIZE] [--device DEVICE] [--num_epochs NUM_EPOCHS]
                        [--report_location REPORT_LOCATION] [--num_workers NUM_WORKERS] [--shuffle] [--dataloaderv DATALOADERV]
```
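For example, a typical invocation (the flag values here are illustrative):

```
python run_benchmark.py --dataset gtsrb --model_name resnext50_32x4d --batch_size 32 --num_epochs 5
```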
## Available metrics

- [x] Total time
- [x] Time per batch
- [x] Time per epoch
- [x] Precision over time
- [x] CPU Load
- [x] GPU Load
- [x] Memory usage
## Additional profiling

The PyTorch profiler doesn't currently work well with `torchdata` (see https://github.com/pytorch/kineto/issues/609), but
there are other good options, such as `py-spy` or `scalene`, which can be invoked as `profiler_name run_benchmark.py`.
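For example, a sampling profile with `py-spy` (assuming it has been installed via `pip install py-spy`) can be collected like so:

```
py-spy record -o profile.svg -- python run_benchmark.py
```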
## Other benchmarks in the wild

- https://github.com/pytorch/kineto/blob/main/tb_plugin/examples/datapipe_example.py
- https://github.com/pytorch/text/tree/main/test/datasets
- https://github.com/pytorch/vision/tree/main/torchvision/prototype/datasets
@@ -0,0 +1,30 @@

from dataclasses import dataclass

from simple_parsing import ArgumentParser


@dataclass(frozen=True)
class BenchmarkConfig:
    dataset: str = "gtsrb"  # TODO: Integrate with HF datasets
    model_name: str = "resnext50_32x4d"  # TODO: torchvision models supported only
    batch_size: int = 1
    device: str = "cuda:0"  # Options are cpu or cuda:0
    num_epochs: int = 1
    report_location: str = "report.csv"
    num_workers: int = 1
    shuffle: bool = True
    dataloader_version: int = 1  # Options are 1 or 2


## Arg parsing
def arg_parser():
    parser = ArgumentParser()
    parser.add_arguments(BenchmarkConfig, dest="options")
    args = parser.parse_args()
    benchmark_config = args.options
    return benchmark_config


if __name__ == "__main__":
    arg_parser()
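A minimal usage sketch: `simple_parsing` turns each `BenchmarkConfig` field into a CLI flag, so assuming this module is saved as, say, `benchmark_config.py` (a hypothetical name):

```python
# Hypothetical shell invocation: python benchmark_config.py --batch_size 32 --device cpu
config = arg_parser()
print(config.batch_size, config.device)  # typed values from the frozen dataclass
```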
@@ -0,0 +1,34 @@

This folder contains templates that are useful for cloud setups.

The idea is to provision a machine by configuring it in a YAML file and then run a benchmark script on it automatically. This is critical both for reproducible ad hoc benchmarking and for including real-world benchmarks in a release.

We've provided some useful `yml` templates to get you started; for background on deploying them, see
https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-cli-creating-stack.html

## Setup aws cli

Run `aws configure` and enter your credentials.

## Setup stack (machine configuration)

```sh
aws cloudformation create-stack \
    --stack-name torchdatabenchmark \
    --template-body file://ec2.yml \
    --parameters ParameterKey=InstanceTypeParameter,ParameterValue=p3.2xlarge ParameterKey=DiskType,ParameterValue=gp3
```
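Once the stack reaches `CREATE_COMPLETE`, one way to find the elastic IP to ssh into (a sketch; assumes this is the only EIP allocated in the region):

```sh
aws ec2 describe-addresses --query 'Addresses[].PublicIp' --output text
```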
## Ssh into machine and run job

```
ssh elastic_ip
git clone https://github.com/pytorch/data
cd data/benchmarks
python run_benchmark.py
```

Visually inspect the logs.

## Shut down stack

`aws cloudformation delete-stack --stack-name torchdatabenchmark`
@@ -0,0 +1,84 @@

# This template sets up an EC2 instance with an elastic IP and a disk volume
Parameters:
  InstanceTypeParameter:
    Type: String
    Default: c5n.large
    AllowedValues:
      - c5n.large
      - p2.2xlarge
      - p3.2xlarge
      - p3.8xlarge
    Description: Instance type (CPU or GPU)
  DiskSize:
    Type: Number
    Default: 100
    Description: Disk size in GB
  DiskType:
    Type: String
    Default: gp2
    AllowedValues:
      - gp2
      - gp3
      - io1
      - io2
      - sc1
      - st1
      - standard
    Description: Disk type (SSD or HDD)

Resources:
  MyInstance:
    Type: AWS::EC2::Instance
    Properties:
      AvailabilityZone: us-west-2a
      ImageId: ami-0306d46d05aaf8663 # Deep Learning AMI
      InstanceType:
        Ref: InstanceTypeParameter
      SecurityGroups:
        - !Ref SSHSecurityGroup

  # Elastic IP so we can easily ssh into the machine
  MyEIP:
    Type: AWS::EC2::EIP
    Properties:
      InstanceId: !Ref MyInstance

  # Open security group for SSH
  SSHSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Enable SSH access via port 22
      SecurityGroupIngress:
        - CidrIp: 0.0.0.0/0
          FromPort: 22
          IpProtocol: tcp
          ToPort: 22

  NewVolume:
    Type: AWS::EC2::Volume
    Properties:
      Size:
        Ref: DiskSize
      VolumeType:
        Ref: DiskType
      AvailabilityZone: !GetAtt MyInstance.AvailabilityZone
      Tags:
        - Key: MyTag
          Value: TagValue
    DeletionPolicy: Snapshot

  MountPoint:
    Type: AWS::EC2::VolumeAttachment
    Properties:
      InstanceId: !Ref MyInstance
      VolumeId: !Ref NewVolume
      Device: /dev/sdh

  # # Volume
  # SSD:
  #   Type: AWS::EC2::VolumeAttachment
  #   Properties:
  #     InstanceId: !Ref MyInstance

  # HDD:
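One possible extension (not in the template above, just a sketch): expose the elastic IP as a stack output, so the ssh address can be read with `aws cloudformation describe-stacks` instead of hunting through the console.

```yaml
Outputs:
  InstancePublicIp:
    Description: Elastic IP to ssh into
    Value: !Ref MyEIP  # Ref on an AWS::EC2::EIP returns the allocated IP address
```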
@@ -0,0 +1,40 @@

from abc import ABC, abstractmethod

import torch
from torchvision import transforms


class DataPipeReadyBenchmark(ABC):
    @abstractmethod
    def prepare_pipe(self, params):
        raise NotImplementedError


class GTSRBReadyBenchmark(DataPipeReadyBenchmark):
    @staticmethod
    def transform(img):
        t = transforms.Compose(
            [
                transforms.ToPILImage(),
                transforms.Resize(size=(100, 100)),
                transforms.ToTensor(),
            ]
        )
        return t(img)

    @staticmethod
    def str_to_list(s):
        return [int(char) for char in s]

    def prepare_pipe(self, params):
        batch_size, device, dp = params

        # Filter out bounding box and path to image
        dp = dp.map(lambda sample: {"image": sample["image"], "label": sample["label"]})

        # Apply image preprocessing
        dp = dp.map(lambda sample: self.transform(sample.decode().to(torch.device(device))), input_col="image")
        dp = dp.map(
            lambda sample: torch.tensor(self.str_to_list(sample.to_categories())).to(torch.device(device)),
            input_col="label",
        )

        # Batch
        dp = dp.batch(batch_size)
        return dp


class HuggingFaceReadyBenchmark(DataPipeReadyBenchmark):
    def prepare_pipe(self, params):
        raise NotImplementedError
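A hedged usage sketch, assuming `dp` is a torchdata datapipe yielding GTSRB samples with `image` and `label` fields, as the lambdas above expect:

```python
benchmark = GTSRBReadyBenchmark()
prepared = benchmark.prepare_pipe((32, "cuda:0", dp))  # (batch_size, device, datapipe)
for batch in prepared:
    ...  # each element is a batch of 32 preprocessed (image, label) pairs
```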
@@ -0,0 +1,52 @@

import csv
from abc import ABC, abstractmethod
from dataclasses import dataclass, fields
from typing import Dict, List

import numpy as np

Duration = int


@dataclass
class MetricCache:
    epoch_durations: List[Duration]
    batch_durations: List[Duration]
    total_duration: int = 0


class MetricExporter(ABC):

> Following up on our discussion from Monday, I took a look at this and I think it should be reasonably simple to plug it into the other

    @abstractmethod
    def export(self, metric_cache: MetricCache) -> None:
        raise NotImplementedError

    def calculate_percentiles(self, metric_cache: MetricCache) -> Dict[str, List[float]]:
        output = {}
        for field in fields(metric_cache):
            duration_list = getattr(metric_cache, field.name)
            if not isinstance(duration_list, list):  # skip scalars such as total_duration
                continue
            output[field.name] = [
                np.percentile(duration_list, 50),
                np.percentile(duration_list, 90),
                np.percentile(duration_list, 99),
            ]
        return output


class StdOutReport(MetricExporter):
    def export(self, metric_cache: MetricCache) -> None:
        percentiles_dict = self.calculate_percentiles(metric_cache)
        for field_name, percentiles in percentiles_dict.items():
            print(f"{field_name} duration is {percentiles}")


class CSVReport(MetricExporter):
    def export(self, metric_cache: MetricCache, filepath: str = "report.csv") -> None:
        percentiles_dict = self.calculate_percentiles(metric_cache)
        with open(filepath, "w", newline="") as file:
            writer = csv.writer(file)
            for field_name, percentiles in percentiles_dict.items():
                writer.writerow([field_name] + percentiles)
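A short end-to-end sketch of the exporters above (the duration values are made up):

```python
cache = MetricCache(epoch_durations=[10, 12, 11], batch_durations=[1, 2, 1, 3])
StdOutReport().export(cache)             # prints p50/p90/p99 for each duration list
CSVReport().export(cache, "report.csv")  # writes one row per duration field
```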
@@ -0,0 +1,9 @@

--extra-index-url https://download.pytorch.org/whl/nightly/cu113
simple-parsing
dill
numpy
torch
torchvision
torchaudio
torchtext
transformers
@NivekT what do you think about this kind of reporting instead? There are probably still a few bugs, but I just want to make sure I'm not overengineering things.