Gremlin, an adversarial evolutionary algorithm that discovers biases or weaknesses in machine learners
Gremlin learns where a given machine learning (ML) model performs poorly via an adversarial evolutionary algorithm (EA). The EA finds the worst-performing feature sets so that a practitioner can then, say, tune the training data to include more examples of those feature sets. The ML model can then be retrained with the updated training set in the hope that the additional examples are enough for it to perform better on those sets.
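To make the idea concrete, here is a minimal, self-contained sketch of an adversarial search in plain Python. It does not use Gremlin or LEAP: the accuracy table, `toy_model_accuracy`, and `find_weakest_digit` are hypothetical stand-ins, and the toy model is simply assumed to be weak on the digit 3. The only point it illustrates is that selection favors the individuals on which the model scores lowest.

```python
import random

# Hypothetical per-digit accuracies standing in for a real trained model.
ACCURACY_BY_DIGIT = {digit: 0.98 for digit in range(10)}
ACCURACY_BY_DIGIT[3] = 0.61  # assumed weak spot, for illustration only


def toy_model_accuracy(digit):
    """Stand-in for evaluating a trained model on one feature set."""
    return ACCURACY_BY_DIGIT[digit]


def find_weakest_digit(generations=30, pop_size=8, seed=0):
    """Evolve digits toward the one the toy model handles worst."""
    rng = random.Random(seed)
    population = [rng.randrange(10) for _ in range(pop_size)]
    for _ in range(generations):
        # Random-reset mutation: each parent yields one freshly drawn digit.
        offspring = [rng.randrange(10) for _ in population]
        # Adversarial selection: lower accuracy means a fitter individual,
        # so keep the pop_size individuals with the LOWEST accuracy.
        survivors = sorted(population + offspring, key=toy_model_accuracy)
        population = survivors[:pop_size]
    return population[0]


weakest = find_weakest_digit()
print(weakest, toy_model_accuracy(weakest))
```

In a real run, the stand-in accuracy function would be replaced by an evaluation of the actual model, and the surviving individuals would point to the feature sets that need more training examples.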
Gremlin is a 2022 R&D 100 Award Winner!
- Python 3.7, 3.8, 3.9, or 3.10
- LEAP https://github.com/AureumChaos/LEAP, version 0.8+
- Activate your conda or virtual environment
- cd into the top-level gremlin directory
- Run `pip install .`
Gremlin is essentially a thin convenience wrapper around [LEAP](https://github.com/AureumChaos/LEAP). Instead of writing a script in LEAP, one would instead point the `gremlin` executable at a YAML file that describes what LEAP classes, subclasses, and functions to use, as well as other salient run-time characteristics. `gremlin` will parse the YAML file and generate a CSV file containing the individuals from the run. This CSV file should contain information that can be exploited to tune training data.
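As a rough illustration of how such a CSV file might be exploited, the following sketch lists the rows with the lowest fitness using only the standard library. The column names (`generation`, `digit`, `fitness`), the sample data, and the helper `worst_individuals` are all assumptions made for this example; the columns Gremlin actually writes depend on the configuration, and whether lower or higher fitness marks a weak spot depends on how the problem is defined.

```python
import csv
import io


def worst_individuals(csv_file, fitness_column, n=3):
    """Return the n rows with the lowest value in fitness_column."""
    rows = list(csv.DictReader(csv_file))
    rows.sort(key=lambda row: float(row[fitness_column]))
    return rows[:n]


# Made-up sample standing in for a Gremlin output file;
# real column names and values will differ.
sample = io.StringIO(
    "generation,digit,fitness\n"
    "0,7,0.97\n"
    "1,3,0.61\n"
    "1,3,0.64\n"
    "2,5,0.95\n"
)

for row in worst_individuals(sample, fitness_column="fitness", n=2):
    print(row["digit"], row["fitness"])
```

With a real output file, `sample` would be replaced by an opened file handle, for example `open(path, newline="")`.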
More information on how to create a configuration file can be found here, and detailed documentation of the configuration parameters, with examples, can be found here.
Example code and configuration for a real problem can be found in `examples/MNIST`. In this problem, Gremlin discovers that one of the digits in the MNIST training data is poorly represented.
To run it, change into the `examples/MNIST` directory and execute:
$ gremlin config/common.yml config/bygen.yml
More detailed explanations for version changes can be found in CHANGELOG.
- v0.6, 3/3/23
  - Allow for using Dask Client subclasses, such as SSHCluster or SlurmCluster, which should make it easier to deploy on clusters, supercomputers, and in the cloud.
  - Re-organized how Dask distributed configuration is handled in YAML files.
  - The `bygen` algorithm, which is a traditional by-generation evolutionary algorithm, now supports distributed evaluations via Dask. One can also refer to the `parents` in pipeline operators; e.g., this is useful for truncation selection, which needs to take the best of offspring and parents.
  - Broke out how YAML configuration files are handled into separate modules. See `examples/MNIST/run.sh` for examples.
- v0.5, 2/3/23
  - Main installed executable is now `gremlin` and not `gremlin.py`. Added optional `async.with_client` config section. Improvements made to `setup.py`.
- v0.4, 9/30/22
  - Added config variable `async.with_client` that allows for interacting with Dask before the EA runs; e.g., `client.wait_for_workers()` or `client.upload_file()`.
  - Replaced `imports` with `preamble` in YAML config files, which gives more flexibility for importing dependencies and also allows for defining functions and variables that may be referred to in, say, the pipeline.
- v0.3, 3/9/22
  - Added support for config variable `algorithm` that denotes whether to use a traditional by-generation EA or an asynchronous steady-state EA.
- v0.2dev, 2/17/22
  - Revamped config system and heavily refactored/simplified code.
- v0.1dev, 10/14/21
  - Initial raw release.
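The v0.6 notes mention truncation selection, which takes the best of offspring and parents. A minimal plain-Python sketch of that idea follows; it is not LEAP's operator or Gremlin's pipeline code, and the function name, the dictionary-based individuals, and the `fitness` field are hypothetical choices made for illustration.

```python
def truncation_selection(parents, offspring, size, key):
    """Keep the `size` best individuals from parents and offspring combined."""
    combined = list(parents) + list(offspring)
    # Sort best-first according to `key`, then truncate to the desired size.
    return sorted(combined, key=key, reverse=True)[:size]


# Made-up individuals; higher fitness is treated as better here.
parents = [{"name": "p1", "fitness": 0.2}, {"name": "p2", "fitness": 0.9}]
offspring = [{"name": "o1", "fitness": 0.5}, {"name": "o2", "fitness": 0.7}]

survivors = truncation_selection(
    parents, offspring, size=2, key=lambda ind: ind["fitness"]
)
print([ind["name"] for ind in survivors])  # ['p2', 'o2']
```

Because the parents compete alongside the offspring, a strong parent such as `p2` survives into the next generation; the `key` function decides what "best" means for a given problem.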
- `gremlin/` -- main `gremlin` code
- `examples/` -- examples for using gremlin; currently only has the MNIST example
The `gremlin` GitHub repository is [https://github.com/markcoletti/gremlin](https://github.com/markcoletti/gremlin). `main` is the release branch, and active work occurs on the `develop` branch.