`documentation/glossary.md` (new file):

# Concepts & Glossary

**Hook**: The main interface to use during training. This object can be passed as a model hook/callback
in TensorFlow and Keras. It keeps track of collections and writes output files at each step.
- `hook = smd.Hook(out_dir="/tmp/mnist_job")`

**Mode**: One of "train", "eval", "predict", or "global". Helpful for segmenting data based on the phase
you're in. Defaults to "global".
- `train_mode = smd.modes.TRAIN`

**Collection**: A group of tensors. Each collection contains its own save configuration and regexes for
tensors to include/exclude.
- `collection = hook.get_collection("losses")`

**SaveConfig**: An object specifying how often to save losses and tensors.
- `save_config = SaveConfig(save_interval=10)`

**ReductionConfig**: Allows you to save a reduction, such as 'mean' or the 'l1' norm, instead of the full tensor.
- `reduction_config = ReductionConfig(reductions=['min', 'max', 'mean'], norms=['l1'])`
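
Putting the last three together: a minimal sketch of attaching a save and reduction policy to a hook, assuming the `Hook` constructor accepts `save_config` and `reduction_config` keyword arguments and that the framework module re-exports `SaveConfig`/`ReductionConfig` (check your smdebug version):
```
# A sketch only; assumes Hook accepts save_config/reduction_config kwargs.
import smdebug.pytorch as smd  # any framework module works the same way

# Save every 10th step, and store reductions instead of full tensors.
save_config = smd.SaveConfig(save_interval=10)
reduction_config = smd.ReductionConfig(reductions=['min', 'max', 'mean'], norms=['l1'])

hook = smd.Hook(
    out_dir="/tmp/mnist_job",
    save_config=save_config,
    reduction_config=reduction_config,
)
```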

**Trial**: The main interface for analyzing a completed training job. Use it to access collections and tensors.
- `trial = smd.create_trial(out_dir="/tmp/mnist_job")`

**Rule**: A condition that will trigger an exception and terminate the training job early, for example a vanishing gradient.

`documentation/pytorch.md` (new file):

# PyTorch

Supported PyTorch versions: 1.2+.

## Module Loss Example
```
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

import smdebug.pytorch as smd
hook = smd.Hook(out_dir=args.out_dir)

class Model(nn.Module):
def __init__(self):
super().__init__()
self.fc = nn.Linear(784, 10)

def forward(self, x):
return F.relu(self.fc(x))

net = Model()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=args.lr)

# Register the hook and the loss
hook.register_hook(net)
hook.register_loss(criterion)

# Training loop as usual
for (inputs, labels) in trainloader:
optimizer.zero_grad()
outputs = net(inputs)
loss = criterion(outputs, labels)
loss.backward()
optimizer.step()
```

## Functional Loss Example
```
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

import smdebug.pytorch as smd
hook = smd.Hook(out_dir=args.out_dir)

class Model(nn.Module):
def __init__(self):
super().__init__()
self.fc = nn.Linear(784, 10)

def forward(self, x):
return F.relu(self.fc(x))

net = Model()
optimizer = optim.Adam(net.parameters(), lr=args.lr)

# Register the hook
hook.register_hook(net)

# Training loop, recording the loss at each iteration
for (inputs, labels) in trainloader:
optimizer.zero_grad()
outputs = net(inputs)
loss = F.cross_entropy(outputs, labels)

# Manually record the loss
hook.record_tensor_value(tensor_name="loss", tensor_value=loss)

loss.backward()
optimizer.step()
```
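
Either way, once the run finishes the saved loss can be read back through a trial. A short sketch, assuming the trial exposes `steps()` and a per-step `tensor(...).value(step)` accessor (names may vary by smdebug version):
```
import smdebug.pytorch as smd

trial = smd.create_trial(out_dir=args.out_dir)

# List what was saved, then read the loss value at each recorded step.
# "loss" matches the name recorded above; in the module-loss case,
# check trial.tensors() for the exact name the loss was saved under.
print(trial.tensors())
for step in trial.steps():
    print(step, trial.tensor("loss").value(step))
```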

`documentation/summary.md` (new file):

# SageMaker Debugger

- [Overview](#overview)
- [Install](#install)
- [Example Usage](#example-usage)
- [Concepts](#concepts)

## Overview
SageMaker Debugger is an AWS service to automatically debug your machine learning training process.
It helps you develop better, faster, cheaper models by catching common errors quickly.

## Install
```
pip install smdebug
```

Requires Python 3.6+.

## Example Usage
This example uses Keras. Say your training code looks like this:
```
model = tf.keras.models.Sequential([ ... ])
model.compile(
optimizer='adam',
loss='sparse_categorical_crossentropy',
)
model.fit(x_train, y_train, epochs=args.epochs)
model.evaluate(x_test, y_test)
```

To use SageMaker Debugger, add a callback hook:
```
import smdebug.tensorflow as smd
hook = smd.KerasHook(out_dir=args.out_dir)

model = tf.keras.models.Sequential([ ... ])
model.compile(
optimizer='adam',
loss='sparse_categorical_crossentropy',
)
model.fit(x_train, y_train, epochs=args.epochs, callbacks=[hook])
model.evaluate(x_test, y_test, callbacks=[hook])
```

To analyze the result of the training run, create a trial and inspect the tensors.
```
trial = smd.create_trial(out_dir=args.out_dir)
print(f"Saved tensor values for {trial.tensors()}")
print(f"Loss values were {trial.get_collection("losses").values()}")
```


## Concepts
The steps to use SageMaker Debugger in any framework are:

1. Create a `hook`.
2. Register your model and optimizer with the hook.
3. Specify the `rule` to be used.
4. After training, create a `trial` to manually analyze the tensors (steps 3 and 4 are sketched below).
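
Rules normally run as a managed service, but the same idea can be hand-rolled against a trial. Below is a sketch of a vanishing-gradient-style check; the gradient-name filter and threshold are illustrative, not library defaults:
```
import smdebug.tensorflow as smd

trial = smd.create_trial(out_dir=args.out_dir)

# Hand-rolled stand-in for a vanishing-gradient rule: flag any step where
# every saved gradient tensor has a tiny mean absolute value.
# Assumes gradients were saved at these steps under names containing "gradient".
THRESHOLD = 1e-7  # illustrative, not a library default
grad_names = [name for name in trial.tensors() if "gradient" in name]
for step in trial.steps():
    values = [abs(trial.tensor(name).value(step)).mean() for name in grad_names]
    if values and max(values) < THRESHOLD:
        raise RuntimeError(f"Possible vanishing gradient at step {step}")
```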

See the [glossary](https://link.com) to understand these terms better.

Framework-specific details are here:
- [TensorFlow](https://link.com)
- [PyTorch](https://link.com)
- [MXNet](https://link.com)
- [XGBoost](https://link.com)

`documentation/tensorflow.md` (new file):

# TensorFlow

Supported TensorFlow versions: 1.13, 1.14, and 1.15.

There are a few different hooks, depending on your use case:
```
import smdebug.tensorflow as smd

tf.keras -> smd.KerasHook()
tf.train.MonitoredSession -> smd.SessionHook()
tf.estimator.Estimator -> smd.EstimatorHook()
```

## Keras Example
```
import smdebug.tensorflow as smd
hook = smd.KerasHook(out_dir=args.out_dir)

model = tf.keras.models.Sequential([ ... ])
model.compile(
optimizer='adam',
loss='sparse_categorical_crossentropy',
)
# Add the hook as a callback
model.fit(x_train, y_train, epochs=args.epochs, callbacks=[hook])
model.evaluate(x_test, y_test, callbacks=[hook])
```

## MonitoredSession Example
```
import smdebug.tensorflow as smd
hook = smd.SessionHook(out_dir=args.out_dir)

loss = tf.reduce_mean(tf.matmul(...), name="loss")
optimizer = tf.train.AdamOptimizer(args.lr)

# Wrap the optimizer
optimizer = hook.wrap_optimizer(optimizer)

# Add the hook as a callback
sess = tf.train.MonitoredSession(hooks=[hook])

sess.run([loss, ...])
```

## Estimator Example
```
import smdebug.tensorflow as smd
hook = smd.EstimatorHook(out_dir=args.out_dir)

train_input_fn, eval_input_fn = ...
estimator = tf.estimator.Estimator(...)

# Set the mode and pass the hook as callback
hook.set_mode(mode=smd.modes.TRAIN)
estimator.train(input_fn=train_input_fn, steps=args.steps, hooks=[hook])

hook.set_mode(mode=smd.modes.EVAL)
estimator.evaluate(input_fn=eval_input_fn, steps=args.steps, hooks=[hook])
```
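
Because the hook's mode was set around each phase, saved data can be sliced by mode at analysis time. A short sketch, assuming `steps()` and `value()` accept a `mode` argument (check your smdebug version):
```
import smdebug.tensorflow as smd

trial = smd.create_trial(out_dir=args.out_dir)

# Read the loss separately for the TRAIN and EVAL phases.
# "loss" is an illustrative name; list trial.tensors() to see what was saved.
for mode in (smd.modes.TRAIN, smd.modes.EVAL):
    for step in trial.steps(mode=mode):
        print(mode, step, trial.tensor("loss").value(step, mode=mode))
```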