Major update with CM automation for the latest ABTF model, Cognata dataset and loadgen #5

Merged 37 commits on Apr 19, 2024
Commits
0a24740
CM repo meta
gfursin Apr 10, 2024
f92b9f4
first commit
gfursin Apr 10, 2024
2dc1954
various fixes
gfursin Apr 10, 2024
2d95420
clean up
gfursin Apr 10, 2024
bdc534c
added README
gfursin Apr 10, 2024
75c4d70
Updated docs
gfursin Apr 10, 2024
2c2a0db
fixed typo
gfursin Apr 10, 2024
4401e1e
fixed typos in READMEs
gfursin Apr 10, 2024
7829237
improving readmes
gfursin Apr 10, 2024
7587490
clean up
gfursin Apr 10, 2024
2eeb1e4
clean up docs
gfursin Apr 10, 2024
96b2de6
Merge branch 'dev' of https://github.com/mlcommons/cm4abtf into dev
gfursin Apr 10, 2024
0304d25
clean up
gfursin Apr 10, 2024
84421a1
clean up
gfursin Apr 10, 2024
d4f0df0
clean up
gfursin Apr 10, 2024
90efeec
fixed _repo tag
gfursin Apr 10, 2024
47fe251
Merge branch 'dev' of https://github.com/mlcommons/cm4abtf into dev
gfursin Apr 10, 2024
193d090
fixed abtf model meta
gfursin Apr 10, 2024
e87f4d9
clean up readme
gfursin Apr 10, 2024
d91a51a
clean up
gfursin Apr 10, 2024
e6fb6d2
working on container demo
gfursin Apr 12, 2024
e92470a
working on container demo
gfursin Apr 12, 2024
4d8ddce
Merge branch 'dev' of https://github.com/mlcommons/cm4abtf into dev
gfursin Apr 12, 2024
de0a6eb
moved --gpus=all only for CUDA
gfursin Apr 13, 2024
7dffa8c
fixes to suppport containers
gfursin Apr 13, 2024
73fec8c
added deps on CM-MLOps repo
gfursin Apr 13, 2024
09e8c63
improving support for Docker
gfursin Apr 13, 2024
627de03
added requirement for GH private token and extra repo when using docker
gfursin Apr 15, 2024
6957460
clean up
gfursin Apr 15, 2024
ccf03d0
added default Docker base
gfursin Apr 15, 2024
7da54be
Merge branch 'dev' of https://github.com/mlcommons/cm4abtf into dev
gfursin Apr 15, 2024
d8f8d93
added CM repo deps and antideps
gfursin Apr 17, 2024
c179e42
updated deps
gfursin Apr 17, 2024
ab848c8
working on Cognata dataset
gfursin Apr 19, 2024
1140a91
created cognata automation
gfursin Apr 19, 2024
4584c2c
update
gfursin Apr 19, 2024
932166b
clean up
gfursin Apr 19, 2024
160 changes: 160 additions & 0 deletions .gitignore
@@ -0,0 +1,160 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
1 change: 1 addition & 0 deletions COPYRIGHT.txt
@@ -0,0 +1 @@
Copyright (c) 2024 MLCommons
8 changes: 8 additions & 0 deletions README.md
@@ -0,0 +1,8 @@
# Collective Mind interface and automation for ABTF

This repository contains [CM scripts (cross-platform automation recipes)](https://github.com/mlcommons/ck)
to make it easier to prepare and benchmark different versions of ABTF models
(public or private) with MLPerf loadgen across different software and hardware.

* Run and benchmark reference ABTF model via CM (CPU and CUDA): [README](docs/test-abtf-model/README.md)
* Knowledge base: [README](docs/test-abtf-model/README-kb.md)
13 changes: 13 additions & 0 deletions cmr.yaml
@@ -0,0 +1,13 @@
alias: mlcommons@cm4abtf
uid: 566d31eda11948a9

git: true

deps:
- alias: mlcommons@cm4mlops
uid: 9e97bb72b0474657

- alias: mlcommons@ck
uid: a4705959af8e447a
conflict: True

88 changes: 88 additions & 0 deletions docs/test-abtf-model/README-cuda.md
@@ -0,0 +1,88 @@
[ [Back to the main page](README.md) ]


## Prepare workflow to benchmark ABTF model on CUDA-based device

### Prerequisites

* We expect that you already have the CUDA driver installed.
* Tested with PyTorch 2.2.2 on CUDA 11.8 and CUDA 12.1.
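
The driver prerequisite can be checked before any CM script is run. The sketch below assumes that `nvidia-smi` is installed together with the NVIDIA driver and is on `PATH`; the helper names `have_cuda_driver` and `report_cuda_driver` are hypothetical and not part of CM.

```shell
# Sketch only: assumes nvidia-smi ships with the NVIDIA driver.
# Both helpers are hypothetical and not part of CM.

# Succeeds (exit status 0) when nvidia-smi can be found on PATH.
have_cuda_driver() {
  command -v nvidia-smi >/dev/null 2>&1
}

# Prints a one-line status; when the tool is found, also asks it
# for the GPU name and driver version.
report_cuda_driver() {
  if have_cuda_driver; then
    echo "CUDA driver found"
    nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
  else
    echo "CUDA driver not found"
  fi
}
```

Calling `report_cuda_driver` before the steps below gives an early hint when the driver is missing, instead of a failure later inside a CM script.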



### Detect or install CUDA toolkit and libraries

```bash
cmr "get cuda _toolkit _cudnn"
cmr "get cuda-devices"
```

### Build MLPerf loadgen

```bash
cmr "get mlperf inference loadgen _copy" --version=main
```


### Install or detect PyTorch and TorchVision

#### CUDA 11.8

```bash
cmr "get generic-python-lib _torch_cuda" --extra-index-url=https://download.pytorch.org/whl/cu118 --force-install
cmr "get generic-python-lib _torchvision_cuda" --extra-index-url=https://download.pytorch.org/whl/cu118 --force-install
```

#### CUDA 12.1

```bash
cmr "get generic-python-lib _torch_cuda" --extra-index-url=https://download.pytorch.org/whl/cu121 --force-install
cmr "get generic-python-lib _torchvision_cuda" --extra-index-url=https://download.pytorch.org/whl/cu121 --force-install
```
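
The two variants above differ only in the wheel index suffix (`cu118` or `cu121`). A minimal sketch of deriving that suffix from the CUDA version follows; the helper names are hypothetical, and only CUDA 11.8 and 12.1 are covered by this README, so other versions may not have a matching index.

```shell
# Hypothetical helpers, not part of CM.

# Map a CUDA version to the PyTorch wheel index URL,
# e.g. "11.8" -> ".../whl/cu118", "12.1" -> ".../whl/cu121".
pytorch_index_url() {
  cuda_tag="cu$(printf '%s' "$1" | tr -d '.')"
  printf 'https://download.pytorch.org/whl/%s\n' "$cuda_tag"
}

# Print (but do not run) the two install commands for a CUDA version.
print_torch_install_cmds() {
  index_url="$(pytorch_index_url "$1")"
  for lib in _torch_cuda _torchvision_cuda; do
    printf 'cmr "get generic-python-lib %s" --extra-index-url=%s --force-install\n' "$lib" "$index_url"
  done
}
```

`print_torch_install_cmds 12.1` prints the same two commands as the CUDA 12.1 block; printing rather than executing keeps the sketch safe to try on a machine without CM installed.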





## Test the model with a test image

```bash
cmr "test abtf ssd-resnet50 cognata pytorch _cuda" --input=0000008766.png --output=0000008766_prediction_test.jpg --config=baseline_8MP_ss_scales --num-classes=13
```

## Benchmark model with MLPerf loadgen

```bash
cmr "generic loadgen python _pytorch _cuda _custom _cmc" --samples=5 --modelsamplepath=0000008766.png.cuda.pickle --modelpath=baseline_8mp_ss_scales_ep15.pth --modelcfg.num_classes=13 --modelcfg.config=baseline_8MP_ss_scales
```


## Benchmarking other models

Other ways to download public or private model code and weights:
```bash
cmr "get ml-model abtf-ssd-pytorch _skip_weights" --adr.abtf-ml-model-code-git-repo.env.CM_ABTF_MODEL_CODE_GIT_URL=https://github.com/mlcommons/abtf-ssd-pytorch
cmr "get ml-model abtf-ssd-pytorch _skip_weights" --model_code_git_url=https://github.com/mlcommons/abtf-ssd-pytorch --model_code_git_branch=cognata-cm
cmr "get ml-model abtf-ssd-pytorch _skip_weights _skip_code"
```

Another way to run a local (private) model:

First copy the ABTF model code from GitHub to a local directory such as `my-model-code`, then pass that directory via `--modelcodepath`:

```bash
cmr "generic loadgen python _pytorch _cuda _custom _cmc" --samples=5 --modelsamplepath=0000008766.png.cpu.pickle \
--modelpath=baseline_8mp_ss_scales_ep15.pth \
--modelcfg.num_classes=13 \
--modelcodepath="my-model-code" \
--modelcfg.config=baseline_8MP_ss_scales
```
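
The command above depends on three local paths: the pickled sample, the weights file and the model code directory. A small pre-flight check can report missing ones before loadgen starts; this is a sketch, and `missing_inputs` is a hypothetical helper, not a CM command.

```shell
# Hypothetical helper, not part of CM: print every argument that does
# not exist as a file or directory, and return non-zero if any is missing.
missing_inputs() {
  status=0
  for path in "$@"; do
    if [ ! -e "$path" ]; then
      echo "missing: $path"
      status=1
    fi
  done
  return "$status"
}

# Example (not executed here):
#   missing_inputs 0000008766.png.cpu.pickle \
#                  baseline_8mp_ss_scales_ep15.pth \
#                  my-model-code && echo "inputs look complete"
```

The check only tests for existence; it does not verify that the files are valid for the selected device or model configuration.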





## Feedback

Join the MLCommons Discord or get in touch with the developer: gfursin@cknowledge.org

17 changes: 17 additions & 0 deletions docs/test-abtf-model/README-kb.md
@@ -0,0 +1,17 @@
# Issues

## Weird case on Windows

### 20240410: Grigori

If I download `baseline_8mp_ss_scales_ep15.pth` to the root directory of the virtual environment,
pip stops working because it treats this file as a broken package ...
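
A plausible explanation, not confirmed in this note: Python's `site` module reads every `*.pth` file found in a site directory as a path configuration file, and on Windows the environment prefix itself is typically one of those directories, so a binary weights file placed there may be picked up at interpreter startup. The sketch below lists `*.pth` files sitting directly in a given directory so they can be moved elsewhere; `list_root_pth_files` is a hypothetical helper.

```shell
# Hypothetical helper, not part of CM: list regular files ending in .pth
# that sit directly in the given directory (no recursion).
list_root_pth_files() {
  dir="$1"
  for f in "$dir"/*.pth; do
    # When nothing matches, the pattern stays literal and is skipped here.
    if [ -f "$f" ]; then
      echo "$f"
    fi
  done
  return 0
}

# Example (not executed here), with VIRTUAL_ENV set by an activated venv:
#   list_root_pth_files "$VIRTUAL_ENV"
```

Keeping model weights in a separate directory outside the environment root avoids the name clash altogether.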


# Misc commands

Register a local ABTF model in the CM cache as the default:

```bash
cmr "get ml-model abtf-ssd-pytorch _local.baseline_8mp_ss_scales_ep15.pth"
```