TFSnippet is a set of utilities for writing and testing TensorFlow models.
The design philosophy of TFSnippet is non-interfering: it aims to provide a set of useful utilities that can be used alongside any other TensorFlow libraries and frameworks.
TFSnippet requires TensorFlow >= 1.5. Install ZhuSuan and TFSnippet directly from GitHub:

pip install git+https://github.com/thu-ml/zhusuan.git
pip install git+https://github.com/haowen-xu/tfsnippet.git
If you use tfsnippet.distributions to obtain random samples, you get enhanced tensor objects, from which you can compute the log-likelihood by simply calling log_prob().
from tfsnippet.distributions import Normal
normal = Normal(0., 1.)
# The type of `samples` is :class:`tfsnippet.stochastic.StochasticTensor`.
samples = normal.sample(n_samples=100)
# You may obtain the log-likelihood of `samples` under `normal` by:
log_prob = samples.log_prob()
# You may also obtain the distribution instance back from the samples,
# such that you may fire-and-forget the distribution instance!
distribution = samples.distribution
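
Since log_prob() gives back an ordinary tensor, it can be plugged directly into further TensorFlow computations. A minimal sketch, continuing the snippet above (the negative log-likelihood here is only an illustration, not part of TFSnippet's API):

import tensorflow as tf

# A hypothetical follow-up: reduce the per-sample log-likelihood into a
# scalar negative log-likelihood, e.g. to be used as (part of) a loss.
nll = -tf.reduce_mean(samples.log_prob())

with tf.Session() as sess:
    # For 100 samples from a standard normal this is roughly the entropy
    # of N(0, 1), i.e. about 1.42.
    print(sess.run(nll))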
The distributions from ZhuSuan can be cast into a tfsnippet.distributions.Distribution, in case we haven't provided a wrapper for a certain ZhuSuan distribution:
import zhusuan

from tfsnippet.distributions import as_distribution
uniform = as_distribution(zhusuan.distributions.Uniform())
# The type of `samples` is :class:`tfsnippet.stochastic.StochasticTensor`.
samples = uniform.sample(n_samples=100)
It is a common practice to iterate through a dataset in mini-batches. The tfsnippet.dataflow package provides a unified interface for assembling mini-batch iterators.
from tfsnippet.dataflow import DataFlow

# Obtain a shuffled, two-array data flow, with batch-size 64.
# Any batch with fewer than 64 samples would be discarded.
flow = DataFlow.arrays(
    [x, y], batch_size=64, shuffle=True, skip_incomplete=True)
for batch_x, batch_y in flow:
    ...  # Do something with batch_x and batch_y

# You may use a threaded data flow to prefetch the mini-batches
# in a background thread. The threaded flow is a context object,
# where exiting the context would destroy the background thread.
with flow.threaded(prefetch=5) as threaded_flow:
    for batch_x, batch_y in threaded_flow:
        ...  # Do something with batch_x and batch_y

# If you use `MLToolkit <https://github.com/haowen-xu/mltoolkit>`_,
# you can even load data from a MongoDB via a data flow. Suppose you
# have stored all images from ImageNet into a GridFS (of MongoDB),
# along with the labels stored as ``metadata.y``.
# You may then iterate through ImageNet in batches by:
from mltoolkit.datafs import MongoFS

fs = MongoFS('mongodb://localhost', 'imagenet', 'train')
with fs.as_flow(batch_size=64, with_names=False, meta_keys=['y'],
                shuffle=True, skip_incomplete=True) as flow:
    for batch_x, batch_y in flow:
        ...  # Do something with batch_x and batch_y. batch_x is the
             # raw content of the images you stored in the GridFS.
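
To see what skip_incomplete does in practice, here is a self-contained sketch with synthetic NumPy arrays standing in for a real dataset (the array shapes and sizes are arbitrary):

import numpy as np
from tfsnippet.dataflow import DataFlow

# Synthetic arrays purely for illustration: 1000 samples, 10 features.
x = np.random.randn(1000, 10).astype(np.float32)
y = np.random.randint(0, 2, size=1000)

flow = DataFlow.arrays(
    [x, y], batch_size=64, shuffle=True, skip_incomplete=True)
n_batches = 0
for batch_x, batch_y in flow:
    # With skip_incomplete=True every batch holds exactly 64 samples,
    # so the trailing 1000 - 15 * 64 = 40 samples are dropped.
    assert batch_x.shape == (64, 10) and batch_y.shape == (64,)
    n_batches += 1
assert n_batches == 15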
After you've built the model and obtained the training operation, you may quickly run a training loop by using the utilities from tfsnippet.scaffold and tfsnippet.trainer.
import tensorflow as tf

from tfsnippet.dataflow import DataFlow
from tfsnippet.scaffold import TrainLoop
from tfsnippet.trainer import Trainer, Evaluator, AnnealingDynamicValue

input_x = ...  # the input x placeholder
input_y = ...  # the input y placeholder
loss = ...  # the training loss
params = tf.trainable_variables()  # the trainable parameters

# We shall adopt learning-rate annealing: the initial learning rate is
# 0.001, and it is annealed by a factor of 0.99995 after every step.
learning_rate = tf.placeholder(shape=(), dtype=tf.float32)
learning_rate_var = AnnealingDynamicValue(0.001, 0.99995)

# Build the training operation with AdamOptimizer
optimizer = tf.train.AdamOptimizer(learning_rate)
train_op = optimizer.minimize(loss, var_list=params)

# Build the training data flow
train_flow = DataFlow.arrays(
    [train_x, train_y], batch_size=64, shuffle=True, skip_incomplete=True)
# Build the validation data flow
valid_flow = DataFlow.arrays([valid_x, valid_y], batch_size=256)

with TrainLoop(params, max_epoch=max_epoch, early_stopping=True) as loop:
    trainer = Trainer(loop, train_op, [input_x, input_y], train_flow,
                      metrics={'loss': loss})
    # Anneal the learning rate after every step by a factor of 0.99995.
    trainer.anneal_after_steps(learning_rate_var, freq=1)
    # Do validation and apply early-stopping after every epoch.
    trainer.evaluate_after_epochs(
        Evaluator(loop, loss, [input_x, input_y], valid_flow),
        freq=1
    )
    # You may log the learning rate after every epoch by adding a callback
    # hook. Of course, you may also add any other callbacks.
    trainer.after_epochs.add_hook(
        lambda: trainer.loop.collect_metrics(lr=learning_rate_var),
        freq=1
    )
    # Print training metrics after every epoch.
    trainer.log_after_epochs(freq=1)
    # Run all the training epochs and steps.
    trainer.run(feed_dict={learning_rate: learning_rate_var})
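
The snippet above leaves the model itself elided. Purely as a hypothetical stand-in (not part of TFSnippet), a minimal softmax classifier in plain TensorFlow 1.x could provide the input_x, input_y and loss that the trainer expects:

import tensorflow as tf

# Hypothetical model: one dense layer over 784-dim inputs, 10 classes.
input_x = tf.placeholder(tf.float32, [None, 784])
input_y = tf.placeholder(tf.int32, [None])
logits = tf.layers.dense(input_x, 10)
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=input_y, logits=logits))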
The tfsnippet.applications package provides useful utilities for loading and using pre-trained models (most of which are third-party models).
import imageio

from tfsnippet.applications import InceptionV3
# Model from https://www.tensorflow.org/tutorials/image_recognition
inception_v3 = InceptionV3()
image_data = imageio.imread('path-to-image.jpg')
class_proba = inception_v3.predict_proba([image_data])[0]
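
Since predict_proba returns per-class probabilities, plain NumPy is enough to rank the predictions. A small follow-up sketch (mapping class indices to human-readable ImageNet labels is left out, since that mapping is not part of the snippet above):

import numpy as np

# `class_proba` is assumed to be a 1-D vector of class probabilities.
top5 = np.argsort(class_proba)[::-1][:5]
print('top-5 class indices:', top5)
print('top-1 probability: %.4f' % class_proba[top5[0]])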
The tfsnippet.nn package provides numerically stable implementations of many advanced neural network math operations, and it supports both NumPy and TensorFlow. You may obtain the math operation for a particular backend by passing tfsnippet.nn.npyops (for NumPy) or tfsnippet.nn.tfops (for TensorFlow) as the first argument ops of every math operation function.
from tfsnippet.nn import npyops, tfops
from tfsnippet.nn import log_sum_exp, log_softmax
# Compute :math:`\log \sum_{k=1}^K \exp(x_k)` by TensorFlow
log_sum_exp(tfops, x, axis=-1)
# Compute :math:`\log \frac{\exp(x_k)}{\sum_i \exp(x_i)}` by NumPy
log_softmax(npyops, logits)
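
To see why the numerically stable implementations matter, compare the naive NumPy formula against log_sum_exp on large inputs; a small self-contained sketch:

import numpy as np
from tfsnippet.nn import npyops, log_sum_exp

x = np.array([1000., 1000.5, 999.])
# The naive formula overflows, because exp(1000.) is inf in float64.
naive = np.log(np.sum(np.exp(x)))         # -> inf
# The stable version shifts by the maximum before exponentiating.
stable = log_sum_exp(npyops, x, axis=-1)  # -> approximately 1001.1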