`documentation/glossary.md` (new file):

# Concepts & Glossary

**Hook**: The main interface to use during training. This object can be passed as a model hook/callback
in TensorFlow and Keras. It keeps track of collections and writes output files at each step.
- `hook = smd.Hook(out_dir="/tmp/mnist_job")`

**Mode**: One of "train", "eval", "predict", or "global". Helpful for segmenting data based on the phase
you're in. Defaults to "global".
- `train_mode = smd.modes.TRAIN`

**Collection**: A group of tensors. Each collection contains its own save configuration and regexes for
tensors to include/exclude.
- `collection = hook.get_collection("losses")`

**SaveConfig**: An object specifying how often to save losses and tensors.
- `save_config = SaveConfig(save_interval=10)`

**ReductionConfig**: Allows you to save a reduction, such as 'mean' or the 'l1' norm, instead of the full tensor.
- `reduction_config = ReductionConfig(reductions=['min', 'max', 'mean'], norms=['l1'])`
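
Putting the last three together: a minimal sketch of attaching a save and reduction policy to a hook, assuming the `Hook` constructor accepts `save_config` and `reduction_config` keyword arguments and that the framework module re-exports `SaveConfig`/`ReductionConfig` (check your smdebug version):
```
# A sketch only; assumes Hook accepts save_config/reduction_config kwargs.
import smdebug.pytorch as smd  # any framework module works the same way

# Save every 10th step, and store reductions instead of full tensors.
save_config = smd.SaveConfig(save_interval=10)
reduction_config = smd.ReductionConfig(reductions=['min', 'max', 'mean'], norms=['l1'])

hook = smd.Hook(
    out_dir="/tmp/mnist_job",
    save_config=save_config,
    reduction_config=reduction_config,
)
```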

**Trial**: The main interface for analyzing a completed training job. Use it to access collections and tensors.
- `trial = smd.create_trial(out_dir="/tmp/mnist_job")`

**Rule**: A condition that will trigger an exception and terminate the training job early, for example a vanishing gradient.

`documentation/pytorch.md` (new file):

# PyTorch

Supported PyTorch versions: 1.2+.

## Module Loss Example
```
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

import smdebug.pytorch as smd
hook = smd.Hook(out_dir=args.out_dir)

class Model(nn.Module):
def __init__(self):
super().__init__()
self.fc = nn.Linear(784, 10)

def forward(self, x):
return F.relu(self.fc(x))

net = Model()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=args.lr)

# Register the hook and the loss
hook.register_hook(net)
hook.register_loss(criterion)

# Training loop as usual
for (inputs, labels) in trainloader:
optimizer.zero_grad()
outputs = net(inputs)
loss = criterion(outputs, labels)
loss.backward()
optimizer.step()
```

## Functional Loss Example
```
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

import smdebug.pytorch as smd
hook = smd.Hook(out_dir=args.out_dir)

class Model(nn.Module):
def __init__(self):
super().__init__()
self.fc = nn.Linear(784, 10)

def forward(self, x):
return F.relu(self.fc(x))

net = Model()
optimizer = optim.Adam(net.parameters(), lr=args.lr)

# Register the hook
hook.register_hook(net)

# Training loop, recording the loss at each iteration
for (inputs, labels) in trainloader:
optimizer.zero_grad()
outputs = net(inputs)
loss = F.cross_entropy(outputs, labels)

# Manually record the loss
hook.record_tensor_value(tensor_name="loss", tensor_value=loss)

loss.backward()
optimizer.step()
```
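
Either way, once the run finishes the saved loss can be read back through a trial. A short sketch, assuming the trial exposes `steps()` and a per-step `tensor(...).value(step)` accessor (names may vary by smdebug version):
```
import smdebug.pytorch as smd

trial = smd.create_trial(out_dir=args.out_dir)

# List what was saved, then read the loss value at each recorded step.
# "loss" matches the name recorded above; in the module-loss case,
# check trial.tensors() for the exact name the loss was saved under.
print(trial.tensors())
for step in trial.steps():
    print(step, trial.tensor("loss").value(step))
```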

`documentation/summary.md` (new file):

# SageMaker Debugger

- [Overview](#overview)
- [Install](#install)
- [Example Usage](#example-usage)
- [Concepts](#concepts)

## Overview
SageMaker Debugger is an AWS service to automatically debug your machine learning training process.
It helps you develop better, faster, cheaper models by catching common errors quickly.

## Install
```
pip install smdebug
```

Requires Python 3.6+.

## Example Usage
This example uses Keras. Say your training code looks like this:
```
model = tf.keras.models.Sequential([ ... ])
model.compile(
optimizer='adam',
loss='sparse_categorical_crossentropy',
)
model.fit(x_train, y_train, epochs=args.epochs)
model.evaluate(x_test, y_test)
```

To use SageMaker Debugger, add a callback hook:
```
import smdebug.tensorflow as smd
hook = smd.KerasHook(out_dir=args.out_dir)

model = tf.keras.models.Sequential([ ... ])
model.compile(
optimizer='adam',
loss='sparse_categorical_crossentropy',
)
model.fit(x_train, y_train, epochs=args.epochs, callbacks=[hook])
model.evaluate(x_test, y_test, callbacks=[hook])
```

To analyze the result of the training run, create a trial and inspect the tensors.
```
trial = smd.create_trial(out_dir=args.out_dir)
print(f"Saved tensor values for {trial.tensors()}")
print(f"Loss values were {trial.get_collection("losses").values()}")
```


## Concepts
The steps to use SageMaker Debugger in any framework are:

1. Create a `hook`.
2. Register your model and optimizer with the hook.
3. Specify the `rule` to be used.
4. After training, create a `trial` to manually analyze the tensors (steps 3 and 4 are sketched below).
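
Rules normally run as a managed service, but the same idea can be hand-rolled against a trial. Below is a sketch of a vanishing-gradient-style check; the gradient-name filter and threshold are illustrative, not library defaults:
```
import smdebug.tensorflow as smd

trial = smd.create_trial(out_dir=args.out_dir)

# Hand-rolled stand-in for a vanishing-gradient rule: flag any step where
# every saved gradient tensor has a tiny mean absolute value.
# Assumes gradients were saved at these steps under names containing "gradient".
THRESHOLD = 1e-7  # illustrative, not a library default
grad_names = [name for name in trial.tensors() if "gradient" in name]
for step in trial.steps():
    values = [abs(trial.tensor(name).value(step)).mean() for name in grad_names]
    if values and max(values) < THRESHOLD:
        raise RuntimeError(f"Possible vanishing gradient at step {step}")
```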

See the [glossary](https://link.com) to understand these terms better.

Framework-specific details are here:
- [TensorFlow](https://link.com)
- [PyTorch](https://link.com)
- [MXNet](https://link.com)
- [XGBoost](https://link.com)

`documentation/tensorflow.md` (new file):

# TensorFlow

Supported TensorFlow versions: 1.13, 1.14, and 1.15.

There are a few different hooks, depending on your use case:
```
import smdebug.tensorflow as smd

tf.keras -> smd.KerasHook()
tf.train.MonitoredSession -> smd.SessionHook()
tf.estimator.Estimator -> smd.EstimatorHook()
```

## Keras Example
```
import smdebug.tensorflow as smd
hook = smd.KerasHook(out_dir=args.out_dir)

model = tf.keras.models.Sequential([ ... ])
model.compile(
optimizer='adam',
loss='sparse_categorical_crossentropy',
)
# Add the hook as a callback
model.fit(x_train, y_train, epochs=args.epochs, callbacks=[hook])
model.evaluate(x_test, y_test, callbacks=[hook])
```

## MonitoredSession Example
```
import smdebug.tensorflow as smd
hook = smd.SessionHook(out_dir=args.out_dir)

loss = tf.reduce_mean(tf.matmul(...), name="loss")
optimizer = tf.train.AdamOptimizer(args.lr)

# Wrap the optimizer
optimizer = hook.wrap_optimizer(optimizer)

# Add the hook as a callback
sess = tf.train.MonitoredSession(hooks=[hook])

sess.run([loss, ...])
```

## Estimator Example
```
import smdebug.tensorflow as smd
hook = smd.EstimatorHook(out_dir=args.out_dir)

train_input_fn, eval_input_fn = ...
estimator = tf.estimator.Estimator(...)

# Set the mode and pass the hook as callback
hook.set_mode(mode=smd.modes.TRAIN)
estimator.train(input_fn=train_input_fn, steps=args.steps, hooks=[hook])

hook.set_mode(mode=smd.modes.EVAL)
estimator.evaluate(input_fn=eval_input_fn, steps=args.steps, hooks=[hook])
```
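
Because the hook's mode was set around each phase, saved data can be sliced by mode at analysis time. A short sketch, assuming `steps()` and `value()` accept a `mode` argument (check your smdebug version):
```
import smdebug.tensorflow as smd

trial = smd.create_trial(out_dir=args.out_dir)

# Read the loss separately for the TRAIN and EVAL phases.
# "loss" is an illustrative name; list trial.tensors() to see what was saved.
for mode in (smd.modes.TRAIN, smd.modes.EVAL):
    for step in trial.steps(mode=mode):
        print(mode, step, trial.tensor("loss").value(step, mode=mode))
```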