- AutoML Server: automated machine learning server component that implements the D3M API.
- Primitives: set of primitives created for use by Distil as steps in a D3M pipeline and included in the base D3M image.
- Primitives Addendum: set of primitives created for use by Distil as steps in a D3M pipeline but not included in the base D3M image.
- Git and Git LFS: version control software.
- Go: programming language binaries, with the GOPATH environment variable set and $GOPATH/bin in your PATH.
- NodeJS: JavaScript runtime.
- Docker platform.
- Docker Compose (optional) for managing multi-container dev environments.
- GDAL v2.4.2 or later for geospatial data access. Available as a package for most Linux distributions, and for macOS through Homebrew.
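As a quick sanity check before installing, the prerequisites above can be probed from a shell. This is only a sketch: the executable names passed on the last line (for example git-lfs and gdal-config) are assumptions about how the tools are installed on your system, and missing_tools is a helper name introduced here for illustration.

```shell
#!/bin/sh
# Print the name of every command in the argument list that is not on PATH.
# Prints nothing when all of them are found.
missing_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
    fi
  done
}

# Executable names are assumptions; adjust them to match your installation.
missing_tools git git-lfs go node docker gdal-config
```

Any name printed by the last line is a tool that still needs to be installed or added to your PATH. Docker Compose is left out because it is optional.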
mkdir -p $GOPATH/src/github.com/uncharted-distil
cd $GOPATH/src/github.com/uncharted-distil
git clone git@github.com:unchartedsoftware/distil.git
cd distil
make install
Datasets are stored using Git LFS and can be pulled using the datasets.sh script.
./datasets.sh
To add or remove a dataset, modify the $datasets variable:
declare -a datasets=("185_baseball" "LL0_acled" "22_handgeometry")
If the api/compute/result/complex_field.peg file is changed, regenerate the pandas DataFrame parser by running:
make peg
The application requires:
- ElasticSearch
- PostgreSQL
- TA2 Pipeline Server Stub
Docker images for each are available at the following registry:
docker.uncharted.software
sudo docker login docker.uncharted.software
---
distil-auto-ml:
  image: docker.uncharted.software/distil-auto-ml
Pull docker images via Docker Compose:
./update_services.sh
Using three separate terminals:
Terminal 1 - Launch docker containers via Docker Compose:
./run_services.sh
Terminal 2 - Build and watch the web client:
yarn watch
Terminal 3 - Build, watch, and run the server:
make watch
The app will be accessible at localhost:8080.
The location of the dataset directory can be changed by setting the D3MINPUTDIR environment variable, and the location of the temporary data written out during model building can be set using the D3MOUTPUTDIR environment variable.
If the Docker containers are not running on localhost, their host IP address can be set with DOCKER_HOST (e.g. export DOCKER_HOST=192.168.0.10 && make watch).
These are used by the other Distil services that are launched via the run_services.sh script, and are typically set as global environment variables in .bashrc or similar.
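A .bashrc entry for the two directory variables might look like the sketch below. Both paths are hypothetical placeholders rather than project defaults, so substitute the directories you actually use.

```shell
# Hypothetical locations; substitute the directories used on your machine.
export D3MINPUTDIR="$HOME/distil/datasets"
export D3MOUTPUTDIR="$HOME/distil/output"
```

Open a new terminal (or source the file) before running run_services.sh so that the services pick up the values.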
For the VS Code editor, download and install the ESLint extension. Once installed, open the editor settings (hot key ⌘⇧P, then type "settings") and add the following to your settings file:
"eslint.lintTask.enable": true, // enable eslint to run
"eslint.validate": [
"vue", // tell eslint to read vue files
"html", // tell eslint to read html files
"javascript", // tell eslint to read javascript files
"typescript" // tell eslint to read typescript files
],
"eslint.workingDirectories": [{ "mode": "auto" }], // eslint will try its best to figure out the working directory of the project
At this point, save your settings file and restart VS Code. If the linter is still not working after the restart, check the output (^⇧`, then the OUTPUT tab, then select ESLint from the dropdown).
"../repo/subpackage/file.go:10:2: cannot find package "github.com/company/package/subpackage" in any of":
- Cause: Dependencies are out of date or have not been installed
- Solution: Run make install to install the latest dependencies.
"# pkg-config --cflags -- gdal gdal gdal gdal gdal gdal Package gdal was not found in the pkg-config search path."
- Cause: GDAL has not been installed
- Solution: Install GDAL using a package for your environment or download and build from source.
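To confirm that GDAL is visible to pkg-config and recent enough, a check along the following lines may help. It is a sketch that assumes a sort implementation supporting the -V (version sort) option, as GNU coreutils does; version_at_least is a helper name introduced here, not part of the project.

```shell
#!/bin/sh
# Succeed if dotted version $1 is at least dotted version $2.
# Relies on `sort -V` (version sort), which is an assumption about your system.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Ask pkg-config for the installed GDAL version, if any.
if gdal_version=$(pkg-config --modversion gdal 2>/dev/null); then
  if version_at_least "$gdal_version" 2.4.2; then
    echo "GDAL $gdal_version found"
  else
    echo "GDAL $gdal_version is older than 2.4.2"
  fi
else
  echo "GDAL is not visible to pkg-config"
fi
```

If GDAL is installed but still not found, the directory containing its .pc file may need to be added to PKG_CONFIG_PATH.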
Runtime error while training: "joblib.externals.loky.process_executor.TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker."
- Cause: Not enough Docker resources
- Solution: Increase the Docker resources to the recommended values: CPU: 10, RAM: 10 GB, Swap: 2.5 GB, Disk image size: 64 GB.
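To see what the Docker daemon currently has available, the CPU count and total memory can be read from docker info. The sketch below only reports the values so you can compare them with the recommendation by eye; the memory Docker reports may differ slightly from the configured amount, and bytes_to_gib is a helper name introduced here.

```shell
#!/bin/sh
# Convert a byte count to whole GiB (rounded down).
bytes_to_gib() {
  echo $(( $1 / 1024 / 1024 / 1024 ))
}

# NCPU and MemTotal are fields of `docker info`; MemTotal is in bytes.
if info=$(docker info --format '{{.NCPU}} {{.MemTotal}}' 2>/dev/null); then
  cpus=${info% *}       # text before the space
  mem_bytes=${info#* }  # text after the space
  echo "Docker sees $cpus CPUs and about $(bytes_to_gib "$mem_bytes") GiB of RAM"
else
  echo "Could not query Docker; is the daemon running?"
fi
```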