HET

This repository has been deactivated and merged into Hetu. HET is now one of the supported features of Hetu.

HET (previously named Athena) is a distributed deep learning framework for huge embedding model training, developed by the DAIR Lab at Peking University. This is a preview version provided so that reviewers can verify the reproducibility of our results; the full system has not yet been released. If you have any questions, please email xupeng.miao@pku.edu.cn.

Installation

Clone the repository.
Edit athena.exp to set the environment path for Python, then source it:

source athena.exp

CMake is used to compile Hetu. Generate the Makefile first:

conda install cmake # ensure cmake version >= 3.18
cp cmake/config.example.cmake cmake/config.cmake
# modify paths for CUDA, CUDNN, NCCL, MPI in cmake/config.cmake if necessary
mkdir build && cd build && cmake ..
# if NCCL is needed, download and install NCCL 2.7.8
# if the Hetu cache is needed, install pybind11: conda install pybind11
# if GNN support is needed, install METIS
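Because the build requires CMake >= 3.18, it can help to verify the installed version before running cmake. The script below is a minimal sketch of such a check; it is not part of the repository, and the helper names (parse_cmake_version, cmake_is_recent_enough) are hypothetical. It assumes `cmake --version` prints a first line of the form "cmake version X.Y.Z".

```python
import re
import shutil
import subprocess

MIN_CMAKE = (3, 18)  # minimum (major, minor) required by the build, per the note above


def parse_cmake_version(text):
    """Extract (major, minor, patch) from `cmake --version` output (hypothetical helper)."""
    match = re.search(r"cmake version (\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError("unrecognized `cmake --version` output")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def cmake_is_recent_enough(text, minimum=MIN_CMAKE):
    """Compare only major/minor of the reported version against the minimum."""
    return parse_cmake_version(text)[:2] >= minimum


if __name__ == "__main__":
    exe = shutil.which("cmake")
    if exe is None:
        print("cmake not found on PATH")
    else:
        out = subprocess.run(
            [exe, "--version"], capture_output=True, text=True, check=True
        ).stdout
        status = "OK" if cmake_is_recent_enough(out) else "too old, need >= 3.18"
        print(out.splitlines()[0], "->", status)
```

If the check reports an older version, `conda install cmake` (as above) is one way to obtain a newer one.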

Compile Athena with make:

# current directory is ./build/
make clean
make athena version=mkl -j 32
make athena version=gpu -j 32
# or: make athena version=all -j 32
make ps pslib -j 32 # for ps support
make mpi mpi_nccl -j 32 # for mpi-based allreduce, time-consuming
# alternatively, make -j 32 builds all of the above

Install graphviz to enable Graphboard visualization (not maintained; may be deprecated):

sudo apt-get install graphviz
sudo pip install graphviz

Run some simple examples

Train logistic regression on GPU:

python tests/models_tests/main.py --model logreg --validate

Train a 3-layer MLP on CPU:

python tests/models_tests/main.py --model mlp --validate --gpu -1

Train a 3-layer MLP on GPU:

python tests/models_tests/main.py --model mlp --validate

Train a 3-layer CNN on CPU:

python tests/models_tests/main.py --model cnn_3_layers --validate --gpu -1

Train a 3-layer CNN on GPU:

python tests/models_tests/main.py --model cnn_3_layers --validate
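The single-process examples above differ only in the model name and whether `--gpu -1` is passed (which selects the CPU; omitting it runs on GPU). The following is a minimal sketch of a driver that reproduces exactly those five commands; the helper names (build_command, run_all) are hypothetical, and it assumes it is run from the repository root. By default it only prints the commands; pass `--run` to execute them.

```python
import subprocess
import sys

SCRIPT = "tests/models_tests/main.py"

# (model name, use_gpu) pairs taken from the examples above.
EXAMPLES = [
    ("logreg", True),
    ("mlp", False),
    ("mlp", True),
    ("cnn_3_layers", False),
    ("cnn_3_layers", True),
]


def build_command(model, use_gpu, python=sys.executable):
    """Build the argv for one example run; `--gpu -1` selects the CPU."""
    cmd = [python, SCRIPT, "--model", model, "--validate"]
    if not use_gpu:
        cmd += ["--gpu", "-1"]
    return cmd


def run_all(dry_run=True):
    """Print (and, unless dry_run, execute) every example command in order."""
    for model, use_gpu in EXAMPLES:
        cmd = build_command(model, use_gpu)
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing example


if __name__ == "__main__":
    run_all(dry_run="--run" not in sys.argv)
```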

Train a 3-layer MLP with AllReduce on 2 GPUs (use the mpirun binary from your Open MPI installation):

path/to/deps/mpirun --allow-run-as-root -np 2 python tests/models_tests/allreduce_main.py --model mlp --validate

Train a 3-layer MLP with a parameter server (PS) on 1 server and 2 workers (the configurations must be set in the JSON files):

# in scheduler process
python tests/models_tests/ps_main.py --model mlp --setting scheduler_conf.json
# in server process
python tests/models_tests/ps_main.py --model mlp --setting server_conf.json
# in worker1 process
python tests/models_tests/ps_main.py --model mlp --setting worker_conf.json --validate
# in worker2 process
python tests/models_tests/ps_main.py --model mlp --setting worker_conf_2.json --validate
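When all four roles run on one machine, they can be started from a single terminal instead of four. The sketch below is not part of the repository and its helper names (build_commands, launch) are hypothetical. It assumes the JSON files are in the current directory and already configured for a single host, that starting the scheduler first with a short delay between processes is sufficient, and that the scheduler and server may keep running after the workers finish (so it terminates them).

```python
import subprocess
import sys
import time

SCRIPT = "tests/models_tests/ps_main.py"

# Role name -> extra arguments, mirroring the four commands above.
ROLES = [
    ("scheduler", ["--setting", "scheduler_conf.json"]),
    ("server", ["--setting", "server_conf.json"]),
    ("worker1", ["--setting", "worker_conf.json", "--validate"]),
    ("worker2", ["--setting", "worker_conf_2.json", "--validate"]),
]


def build_commands(model="mlp", python=sys.executable):
    """Return (role, argv) pairs for every process in the PS job."""
    return [(role, [python, SCRIPT, "--model", model] + extra) for role, extra in ROLES]


def launch(model="mlp", stagger_s=1.0):
    """Start all processes locally, wait for both workers, then clean up."""
    procs = {}
    for role, cmd in build_commands(model):
        procs[role] = subprocess.Popen(cmd)
        time.sleep(stagger_s)  # assumed delay so earlier roles can come up first
    try:
        for role in ("worker1", "worker2"):
            procs[role].wait()
    finally:
        # Assumption: scheduler/server may not exit on their own, so stop any survivors.
        for proc in procs.values():
            if proc.poll() is None:
                proc.terminate()
                proc.wait()
    return {role: proc.returncode for role, proc in procs.items()}


if __name__ == "__main__":
    for role, cmd in build_commands():
        print(role, " ".join(cmd))
    if "--run" in sys.argv:
        print(launch())
```

For a multi-machine setup, run each command from the list above on its own host as shown instead.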

During training, Graphboard is served at http://localhost:9997. The port can be changed via the PORT setting in mnist_dlsys.py (not maintained; may be deprecated).

Evaluation on CTR and GNN tasks:

Please refer to our examples.

License

The entire codebase is released under the Apache-2.0 license.
