Code for Learned Optimizers that Scale and Generalize.
- Bazel (see its installation instructions)
- TensorFlow >= v1.3
- Python 2.7.x
In the top-level directory, metaopt.py contains the code to train and test a learned optimizer. metarun.py packages the actual training procedure into a single file, defining and exposing many flags to tune the procedure, from selecting the optimizer type and problem set to more fine-grained hyperparameter settings. There is no testing binary; testing can be done ad hoc via metaopt.test_optimizer by passing in an optimizer object and a directory containing a checkpoint.
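The ad hoc testing call can be sketched roughly as follows. This is an illustration only: the exact signature of metaopt.test_optimizer, the module paths, and the constructor arguments are assumptions, so check metaopt.py and the optimizer directory for the authoritative interface.

```python
# Sketch only: signatures and module paths are assumptions based on the
# directory layout described above; consult metaopt.py for the real API.
import metaopt
from optimizer import hierarchical_rnn  # assumed module path

# Instantiate the same optimizer class that was meta-trained
# (constructor arguments elided; they are defined in the repo).
optimizer = hierarchical_rnn.HierarchicalRNN(...)

# Evaluate the learned optimizer using a directory that holds a
# meta-training checkpoint.
metaopt.test_optimizer(optimizer, "/path/to/checkpoint_dir")
```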
The optimizer directory contains a base class in trainable_optimizer.py and a number of extensions, including the hierarchical_rnn optimizer used in the paper, a coordinatewise_rnn optimizer that more closely matches previous work, and a number of simpler optimizers that demonstrate the basic mechanics of a learnable optimizer.
The problems directory contains the code to build the problems that were used in the meta-training set.
metarun.py: meta-training of a learned optimizer

The flags most relevant to meta-training are defined in metarun.py. The default values will meta-train a HierarchicalRNN optimizer with the hyperparameter settings used in the paper.
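Since the defaults reproduce the paper's settings, meta-training can be launched directly; overriding a flag follows the usual `--flag=value` convention. The flag name below is hypothetical, so check metarun.py for the actual names:

```shell
# Meta-train a HierarchicalRNN optimizer with the paper's default settings.
python metarun.py

# Flags defined in metarun.py can override the defaults; "train_dir" here is
# a hypothetical flag name, so check the file for the real flag set.
python metarun.py --train_dir=/tmp/learned_optimizer
```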
The trainable_optimizer class inherits from tf.train.Optimizer, so a properly instantiated version can be used to train any model with any API that accepts this class. There are just two caveats:
- If using the Hierarchical RNN optimizer, the apply_gradients return type must be changed (see the inline comments for what exactly must be removed).
- Care must be taken to restore the optimizer's variables without overriding them. Optimizer variables should be loaded manually from a pretrained checkpoint using a tf.train.Saver built with only the optimizer variables. Then, when constructing the session, ensure that any automatic variable initialization does not re-initialize the loaded optimizer variables.
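The restore pattern described in this caveat can be sketched as below, using the TF 1.x API. The variable scope name is an assumption for illustration, not the scope actually used in this repo:

```python
import tensorflow as tf  # TensorFlow 1.x API, per the requirements above

# Assume the optimizer's variables live under a known scope; the scope name
# "optimizer_vars" is an assumption, not the repo's actual naming.
optimizer_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES,
                                   scope="optimizer_vars")
model_vars = [v for v in tf.global_variables() if v not in optimizer_vars]

# A Saver restricted to only the optimizer's variables.
optimizer_saver = tf.train.Saver(var_list=optimizer_vars)

with tf.Session() as sess:
    # Initialize only the model variables. Do NOT run
    # tf.global_variables_initializer() here, or it would re-initialize
    # (and thus clobber) the restored optimizer state.
    sess.run(tf.variables_initializer(model_vars))
    # Load the pretrained optimizer weights from the checkpoint.
    optimizer_saver.restore(sess, "/path/to/optimizer/checkpoint")
```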
- Olga Wichrowska (@olganw), Niru Maheswaranathan (@nirum)