Distil AutoML is an AutoML system that integrates with D3M.
More specifically, it is the TA2 system from Uncharted and Qntfy.
The main repo is https://github.com/uncharted-distil/distil-auto-ml
The TA2 system can be built and started via docker-compose; however, several static files must be downloaded beforehand.
Datasets to train on: these may be user-created, or many examples can be downloaded from https://datasets.datadrivendiscovery.org/d3m/datasets
To train using only the TA2, user-generated datasets must be formatted in the same way as the public datasets.
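As a quick sanity check on a user-created dataset, the sketch below looks for the files that the `d3m runtime` invocation later in this README points at (the problem document plus the TRAIN, TEST and SCORE dataset documents). The function name `check_d3m_dataset` is hypothetical, and this only confirms the directory skeleton; it does not validate the JSON against the D3M schemas.

```shell
# check_d3m_dataset DIR
# Hypothetical helper: verifies that DIR has the directory skeleton used by
# the public seed datasets. Prints each missing file and returns 1 if any
# file is missing, 0 otherwise.
check_d3m_dataset() {
    _dir=${1%/}
    _name=$(basename "$_dir")
    _missing=0
    for _rel in \
        "${_name}_problem/problemDoc.json" \
        "TRAIN/dataset_TRAIN/datasetDoc.json" \
        "TEST/dataset_TEST/datasetDoc.json" \
        "SCORE/dataset_SCORE/datasetDoc.json"
    do
        if [ ! -f "$_dir/$_rel" ]; then
            echo "missing: $_dir/$_rel"
            _missing=1
        fi
    done
    return "$_missing"
}
```

For example, `check_d3m_dataset seed_datasets_current/185_baseball` prints nothing and returns 0 when all four files are present.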
Static files may be pretrained weights of a neural network model, or a simple dictionary mapping tokens to necessary ids: essentially anything extra needed to run an ML model within the pipelines.
To bulk download all static files within the D3M universe (WARNING: this may be quite large):
docker-compose run distil bash
# then, inside the container:
cd /static && python3 -m d3m index download
One can also pick and choose which static files to download via:
python3 -m d3m primitive download -p d3m.primitives.path.of.Primitive -o /static
For more info on how static files integrate within D3M: https://datadrivendiscovery.org/v2020.11.3/tutorial.html#advanced-primitive-with-static-files
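When only a handful of primitives are needed, the per-primitive command above can be wrapped in a loop. The following is a minimal sketch: `download_static_files` and the `DRY_RUN` switch are hypothetical names introduced here, and the primitive paths in the usage example are placeholders, not real primitives.

```shell
# download_static_files OUTPUT_DIR PRIMITIVE_PATH...
# Runs the per-primitive download command once for each primitive path.
# Set DRY_RUN=1 to print the commands instead of executing them.
download_static_files() {
    _out_dir=$1
    shift
    for _prim in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "python3 -m d3m primitive download -p $_prim -o $_out_dir"
        else
            # stop at the first failed download
            python3 -m d3m primitive download -p "$_prim" -o "$_out_dir" || return 1
        fi
    done
}
```

Usage, with placeholder paths: `DRY_RUN=1 download_static_files /static d3m.primitives.path.of.PrimitiveA d3m.primitives.path.of.PrimitiveB` prints the two commands that would be run.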
Once the static files and the dataset(s) you want to run on are downloaded:
# symlink your datasets directory
ln -s ../datasets/seed_datasets_current seed_datasets_current
# choose the dataset you want to run
export DATASET=185_baseball
# run it
docker-compose up distil
There are two testing TA3 systems also available via docker-compose:
# run the dummy-ta3 test suite
docker-compose up distil dummy-ta3
# run the simple-ta3 system, which will then be available in the browser at localhost:80
# this requires a directory named 'output' to exist, in addition to the seed_datasets_current directory
docker-compose up distil envoy simple-ta3
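Before bringing the services up, it can help to confirm that the inputs described above are in place. The sketch below is a hypothetical `preflight_check` helper; it assumes the chosen dataset is a directory named after `$DATASET` directly under `seed_datasets_current`, which matches the example above but is not confirmed for every setup. A missing `output` directory is only reported as a warning, since it is needed for simple-ta3 alone.

```shell
# preflight_check [WORKDIR]
# Hypothetical helper: checks DATASET, seed_datasets_current and output
# relative to WORKDIR (default: current directory).
# Returns 1 if a required input is missing, 0 otherwise.
preflight_check() {
    _wd=${1:-.}
    _bad=0
    if [ -z "${DATASET:-}" ]; then
        echo "error: DATASET is not set (for example: export DATASET=185_baseball)"
        _bad=1
    fi
    # -d follows symlinks, so a symlinked seed_datasets_current passes
    if [ ! -d "$_wd/seed_datasets_current" ]; then
        echo "error: missing directory $_wd/seed_datasets_current"
        _bad=1
    elif [ -n "${DATASET:-}" ] && [ ! -d "$_wd/seed_datasets_current/$DATASET" ]; then
        # assumption: the dataset directory is named after $DATASET
        echo "error: dataset not found at $_wd/seed_datasets_current/$DATASET"
        _bad=1
    fi
    if [ ! -d "$_wd/output" ]; then
        echo "warning: missing directory $_wd/output (required by simple-ta3)"
    fi
    return "$_bad"
}
```

A typical use would be `preflight_check && docker-compose up distil`.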
- Python 3.6
- Pip (Python 3.6 should come with it)
- virtualenv
- Clone distil-auto-ml
git clone https://github.com/uncharted-distil/distil-auto-ml
- Install libraries on Linux
sudo apt-get install libsnappy-dev build-essential libopenblas-dev libcap-dev ffmpeg
- Install libraries on MacOS
brew install snappy cmake openblas libpcap ffmpeg
- Clone common-primitives
git clone https://gitlab.com/datadrivendiscovery/common-primitives.git
- Clone d3m-primitives
git clone https://github.com/cdbethune/d3m-primitives
- Clone d3m
git clone https://gitlab.com/datadrivendiscovery/d3m
- Clone distil-primitives
git clone https://github.com/uncharted-distil/distil-primitives
- Clone distil-primitives-contrib
git clone https://github.com/uncharted-distil/distil-primitives-contrib
- Change into the distil-auto-ml directory
cd distil-auto-ml
- To avoid package collisions, it is recommended to create a virtual environment
- If virtualenv is not installed, install it now
python3 -m pip install virtualenv
- Create the environment
python3 -m virtualenv env
- Activate the environment
source env/bin/activate
- Install the dependencies from server-requirements.txt (Linux)
pip install -r server-requirements.txt
- Install the dependencies from server-requirements.txt (MacOS)
CPPFLAGS="-I/usr/local/include -L/usr/local/lib" pip install -r server-requirements.txt
- Install all the other repository dependencies. IMPORTANT: if running on the CPU, replace [gpu] with [cpu]
cd ..
cd d3m
pip install -e .\[gpu\]
cd ..
cd common-primitives
pip install -e .\[gpu\]
cd ..
cd distil-primitives
pip install -e .\[gpu\]
cd ..
cd d3m-primitives
pip install -e .\[gpu\]
cd ..
cd distil-primitives-contrib
pip install -e .\[gpu\]
pip install python-lzo hyppo==0.1.3 mxnet
pip install -e git+https://github.com/NewKnowledge/simon-d3m-wrapper.git#egg=SimonD3MWrapper
pip install -e git+https://gitlab.com/datadrivendiscovery/sklearn-wrap.git@dist#egg=sklearn_wrap
pip install -e git+https://github.com/usc-isi-i2/dsbox-primitives#egg=dsbox-primitives
pip install -e git+https://github.com/neurodata/primitives-interfaces#egg=jhu-primitives
# if you hit an error involving enum and IntFlag, try: pip uninstall -y enum34
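The five editable installs above differ only in the repository name, so they can be written as one loop with the extra (`gpu` or `cpu`) as a parameter. This is a sketch under the assumption that all five repositories were cloned side by side in one parent directory, as in the clone steps; `install_local_repos` and `DRY_RUN` are hypothetical names.

```shell
# install_local_repos PARENT_DIR EXTRA
# EXTRA is "gpu" or "cpu", matching the extras used in the steps above.
# Set DRY_RUN=1 to print the commands instead of running pip.
install_local_repos() {
    _parent=$1
    _extra=$2
    case "$_extra" in
        gpu|cpu) ;;
        *) echo "EXTRA must be 'gpu' or 'cpu', got '$_extra'" >&2; return 2 ;;
    esac
    # same order as the manual steps
    for _repo in d3m common-primitives distil-primitives d3m-primitives distil-primitives-contrib; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "(cd $_parent/$_repo && pip install -e .[$_extra])"
        else
            # the subshell keeps the caller's working directory unchanged
            (cd "$_parent/$_repo" && pip install -e ".[$_extra]") || return 1
        fi
    done
}
```

From inside the distil-auto-ml directory with the virtual environment active, `install_local_repos .. cpu` would install every repository with CPU extras.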
- MongoDB
Distil AutoML uses MongoDB as a backend store for its internal hyperparameter tuning. The official MongoDB docs have good installation instructions for each OS: https://docs.mongodb.com/manual/installation/
- Distil-auto-ml is ready for use
./run.sh
- Generate pipelines
mkdir pipelines
python3 export_pipelines.sh
- Use D3M CLI to interface with distil-auto-ml
This section assumes the source has been successfully installed and the datasets have been downloaded. Launch d3m with the following arguments.
python3 -m d3m runtime -v {location/to/static_resources} -d {location/to/datasets/seed_datasets_current} fit-score \
  -r {..seed_datasets_current/LL1_PHEM_Monthly_Malnutrition_MIN_METADATA/LL1_PHEM_Monthly_Malnutrition_MIN_METADATA_problem/problemDoc.json} \
  -i {..seed_datasets_current/LL1_PHEM_Monthly_Malnutrition_MIN_METADATA/TRAIN/dataset_TRAIN/datasetDoc.json} \
  -t {..seed_datasets_current/LL1_PHEM_Monthly_Malnutrition_MIN_METADATA/TEST/dataset_TEST/datasetDoc.json} \
  -a {..seed_datasets_current/LL1_PHEM_Monthly_Malnutrition_MIN_METADATA/SCORE/dataset_SCORE/datasetDoc.json} \
  -p {..distil-auto-ml/pipelines/timeseries_rnn__a9cc5349-e328-401d-abb7-ada6b101e573.json} \
  -O {..distil-auto-ml/pipelines/timeseries_rnn__a9cc5349-e328-401d-abb7-ada6b101e573_run.yaml}
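Because every path in the invocation above is derived from the dataset name and the pipeline file, the command can be generated rather than typed by hand. The sketch below is a hypothetical `fit_score_cmd` helper that only prints the command; it assumes the dataset follows the seed dataset layout used above and that the run record should be written next to the pipeline with a `_run.yaml` suffix, as in the example.

```shell
# fit_score_cmd STATIC_DIR DATASETS_DIR DATASET PIPELINE_JSON
# Prints a "d3m runtime fit-score" command for one dataset and pipeline.
# Nothing is executed; paths containing spaces are not quoted in the output.
fit_score_cmd() {
    _static=$1
    _root=$2
    _name=$3
    _pipeline=$4
    _ds="$_root/$_name"
    # run record goes next to the pipeline: foo.json -> foo_run.yaml
    _run="${_pipeline%.json}_run.yaml"
    printf 'python3 -m d3m runtime -v %s -d %s fit-score \\\n' "$_static" "$_root"
    printf '  -r %s \\\n' "$_ds/${_name}_problem/problemDoc.json"
    printf '  -i %s \\\n' "$_ds/TRAIN/dataset_TRAIN/datasetDoc.json"
    printf '  -t %s \\\n' "$_ds/TEST/dataset_TEST/datasetDoc.json"
    printf '  -a %s \\\n' "$_ds/SCORE/dataset_SCORE/datasetDoc.json"
    printf '  -p %s \\\n' "$_pipeline"
    printf '  -O %s\n' "$_run"
}
```

The printed command can be reviewed and pasted into a shell, or run directly with `eval "$(fit_score_cmd ...)"` when none of the paths contain spaces.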
Building a docker image with CPU support is accomplished by invoking the docker_build.sh script.
On Linux or MacOS:
sudo ./docker_build.sh
On Windows, run the command prompt as administrator, then:
./docker_build.sh
Building a docker image with GPU support is accomplished by adding the -g flag to the docker_build.sh call.
On Linux or MacOS:
sudo ./docker_build.sh -g
On Windows, run the command prompt as administrator, then:
./docker_build.sh -g
If building the docker image fails even though all of the above criteria have been met, one can invoke the docker_build.sh script again, this time adding the -f flag. The -f flag forces the download and reinstall of all dependencies regardless of whether they already meet the criteria. Note: when building for GPU support, remember the additional -g flag.
On Linux or MacOS:
sudo ./docker_build.sh -f
On Windows, run the command prompt as administrator, then:
./docker_build.sh -f