
Profiling of neural networks #26

Merged: 11 commits, Feb 9, 2019
26 changes: 23 additions & 3 deletions emloop_tensorflow/frozen_model.py
@@ -5,6 +5,7 @@

import emloop as el
import tensorflow as tf
from tensorflow.python.client import timeline

from .graph_tower import GraphTower
from .model import BaseModel
@@ -29,16 +30,18 @@ class FrozenModel(el.AbstractModel):
"""

def __init__(self,
inputs: List[str], outputs: List[str], restore_from: str,
session_config: Optional[dict]=None, n_gpus: int=0, **_):
log_dir: str, inputs: List[str], outputs: List[str], restore_from: str,
Copy link
Contributor:

The log_dir argument should rather be optional, as it was previously, if I am not mistaken. Of course, we should sanitize the arguments similarly to this:

if profile and not log_dir:  # works for both None and empty string
    raise ValueError('log_dir has to be specified with profile set to True')

session_config: Optional[dict]=None, n_gpus: int=0, profile: bool=False, **_):
"""
Initialize new :py:class:`FrozenModel` instance.

:param log_dir: path to the logging directory (wherein models should be saved)
Contributor:

This docstring is inaccurate, as a FrozenModel cannot be saved. Is the log_dir solely for the profile, or is it not?

:param inputs: model input names
:param outputs: model output names
:param restore_from: restore model path (either a dir or a .pb file)
:param session_config: TF session configuration dict
:param n_gpus: number of GPUs to use (either 0 or 1)
:param profile: whether profile.json should be saved to log_dir
Contributor:

Suggested change:
:param profile: whether profile.json should be saved to log_dir
:param profile: if true, profile the speed of model inference and save profile.json to the specified log_dir

"""
super().__init__(None, '', restore_from)
assert 0 <= n_gpus <= 1, 'FrozenModel can be used only with n_gpus=0 or n_gpus=1'
Expand All @@ -50,6 +53,7 @@ def __init__(self,
self._graph = tf.Graph()
if session_config:
session_config = tf.ConfigProto(**session_config)

Contributor:

Is this intentional?

self._session = tf.Session(graph=self._graph, config=session_config)

with self._graph.as_default():
Expand All @@ -60,6 +64,10 @@ def __init__(self,
except KeyError:
self._is_training = tf.placeholder(tf.bool, [], BaseModel.TRAINING_FLAG_NAME)

self._profile = profile
self._log_dir = log_dir


def run(self, batch: el.Batch, train: bool=False, stream: el.datasets.StreamWrapper=None) -> Mapping[str, object]:
"""
Run the model with the given ``batch``.
Expand All @@ -83,7 +91,19 @@ def run(self, batch: el.Batch, train: bool=False, stream: el.datasets.StreamWrap
for output_name in self.output_names:
fetches.append(self._tower[output_name])

outputs = self._session.run(fetches=fetches, feed_dict=feed_dict)
if self._profile:
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()

outputs = self._session.run(fetches=fetches, feed_dict=feed_dict,
options=run_options, run_metadata=run_metadata)

with open(path.join(self._log_dir, "profile.json"), "w") as ofile:
Contributor:

Suggested change:
with open(path.join(self._log_dir, "profile.json"), "w") as ofile:
with open(path.join(self._log_dir, 'profile.json'), 'w') as ofile:

tl = timeline.Timeline(run_metadata.step_stats)
ofile.write(tl.generate_chrome_trace_format())
Contributor:

Ditto: these are statistics from a single call only.

Contributor:

Well, this is a bit tricky. Both the first (warm-up) and the last (possibly smaller batch) profiles may be inaccurate. So what do we actually want to save? I guess keeping the last keep_profiles: int = 5 is reasonable.
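A minimal sketch of how such a rotation could look; the keep_profiles parameter, the per-call file names and the save_profile helper are assumptions for illustration, not part of this PR:

import os

def save_profile(log_dir: str, trace_json: str, run_index: int, keep_profiles: int = 5) -> None:
    """Write the chrome trace of one run and keep only the last ``keep_profiles`` files."""
    with open(os.path.join(log_dir, 'profile_{}.json'.format(run_index)), 'w') as ofile:
        ofile.write(trace_json)
    # drop the file that has just fallen out of the window, if any
    stale = os.path.join(log_dir, 'profile_{}.json'.format(run_index - keep_profiles))
    if run_index >= keep_profiles and os.path.isfile(stale):
        os.remove(stale)

The caller would pass tl.generate_chrome_trace_format() as trace_json and a running call counter as run_index.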


else:
outputs = self._session.run(fetches=fetches, feed_dict=feed_dict)

return dict(zip(self.output_names, outputs))

Expand Down
22 changes: 19 additions & 3 deletions emloop_tensorflow/model.py
@@ -8,6 +8,7 @@
import numpy as np
import emloop as el
import tensorflow as tf
from tensorflow.python.client import timeline

from .third_party.tensorflow.freeze_graph import freeze_graph
from .third_party.tensorflow.average_gradients import average_gradients
@@ -44,7 +45,7 @@ def __init__(self, # pylint: disable=too-many-arguments
dataset: Optional[el.AbstractDataset], log_dir: Optional[str], inputs: List[str], outputs: List[str],
Contributor:

I would indeed make log_dir optional, with a default of None.

Contributor Author (@bedapisl, Feb 8, 2019):

Is this in some way related to this pull request?

session_config: Optional[dict]=None, n_gpus: int=0, restore_from: Optional[str]=None,
optimizer=None, freeze=False, loss_name: str=DEFAULT_LOSS_NAME, monitor: Optional[str]=None,
restore_fallback: Optional[str]=None, clip_gradient: Optional[float]=None,
restore_fallback: Optional[str]=None, clip_gradient: Optional[float]=None, profile: bool=False,
**kwargs):
"""
Create new emloop trainable TensorFlow model.
Expand Down Expand Up @@ -82,6 +83,7 @@ def __init__(self, # pylint: disable=too-many-arguments
:param monitor: monitor signal mean and variance of the tensors which names contain the specified value
:param restore_fallback: ignored arg. (allows training from configs saved by emloop where it is added)
:param clip_gradient: limit the absolute value of the gradient; set to None for no clipping
:param profile: whether profile.json should be saved to log_dir
Contributor:

Ditto: the same applies to this description.

:param kwargs: additional kwargs forwarded to :py:meth:`_create_model`
"""
super().__init__(dataset=dataset, log_dir=log_dir, restore_from=restore_from)
Expand All @@ -91,6 +93,7 @@ def __init__(self, # pylint: disable=too-many-arguments
self._log_dir = log_dir
self._freeze_graph = freeze
self._clip_gradient = clip_gradient
self._profile = profile
self._loss_name = loss_name
self._train_ops = []
self._graph = self._saver = None
Expand Down Expand Up @@ -223,12 +226,25 @@ def run(self, batch: el.Batch, train: bool=False, stream: el.datasets.StreamWrap
for output_name in self.output_names:
fetches.append(tower[output_name])

run_options = None
Contributor:

This is more or less the same code as in FrozenModel. Can you perhaps wrap session.run in a util function that both classes would use?
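A minimal sketch of such a shared helper, for illustration only; the function name, its module and its exact signature are assumptions, not part of this PR:

import os
import tensorflow as tf
from tensorflow.python.client import timeline

def run_with_optional_profiling(session, fetches, feed_dict,
                                profile: bool = False, log_dir: str = None):
    """Run the session and, if requested, dump a chrome trace to ``log_dir``."""
    if not profile:
        return session.run(fetches=fetches, feed_dict=feed_dict)
    run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
    run_metadata = tf.RunMetadata()
    outputs = session.run(fetches=fetches, feed_dict=feed_dict,
                          options=run_options, run_metadata=run_metadata)
    # convert the collected step stats to a chrome://tracing compatible trace
    tl = timeline.Timeline(run_metadata.step_stats)
    with open(os.path.join(log_dir, 'profile.json'), 'w') as ofile:
        ofile.write(tl.generate_chrome_trace_format())
    return outputs

Note that TrainableModel would still need to enter stream.allow_buffering around the call when a stream is given, so the helper would have to be invoked inside that context.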

run_metadata = None
if self._profile:
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()

# run the computational graph for one batch and allow buffering in the meanwhile
if stream is not None:
with stream.allow_buffering:
outputs = self._session.run(fetches=fetches, feed_dict=feed_dict)
outputs = self._session.run(fetches=fetches, feed_dict=feed_dict,
options=run_options, run_metadata=run_metadata)
else:
outputs = self._session.run(fetches=fetches, feed_dict=feed_dict)
outputs = self._session.run(fetches=fetches, feed_dict=feed_dict,
options=run_options, run_metadata=run_metadata)

if self._profile:
with open(path.join(self._log_dir, "profile.json"), "w") as ofile:
Contributor:

Suggested change:
with open(path.join(self._log_dir, "profile.json"), "w") as ofile:
with open(path.join(self._log_dir, 'profile.json'), 'w') as ofile:

tl = timeline.Timeline(run_metadata.step_stats)
ofile.write(tl.generate_chrome_trace_format())
Contributor:

This seems to overwrite the profile file on every call to run, with only the statistics from that single call. Shouldn't it rather create the run_metadata variable in the constructor and accumulate the statistics from all the calls to run?
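A rough sketch of accumulating the statistics across calls; the class and method names are hypothetical, and merging the per-call StepStats protobufs via MergeFrom is an assumed approach, not part of this PR:

import os
from tensorflow.core.framework.step_stats_pb2 import StepStats
from tensorflow.python.client import timeline

class ProfileAccumulator:
    """Collect step stats from several session.run calls and write a single trace."""

    def __init__(self, log_dir: str):
        self._log_dir = log_dir
        self._step_stats = StepStats()

    def add(self, run_metadata) -> None:
        # StepStats is a protobuf message; MergeFrom appends the repeated device stats
        self._step_stats.MergeFrom(run_metadata.step_stats)

    def save(self) -> None:
        tl = timeline.Timeline(self._step_stats)
        with open(os.path.join(self._log_dir, 'profile.json'), 'w') as ofile:
            ofile.write(tl.generate_chrome_trace_format())

run would then call add(run_metadata) after every profiled batch, and save could be triggered at the end of training (or every N calls) instead of on every batch.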


if train:
outputs = outputs[1:]
Expand Down
14 changes: 7 additions & 7 deletions emloop_tensorflow/tests/frozen_model_test.py
@@ -18,24 +18,24 @@
def test_frozen_model_restore(tmpdir):
"""Test frozen model restoration."""
with pytest.raises(ValueError):
FrozenModel(inputs=[], outputs=[], restore_from=tmpdir) # there is no .pb file yet
FrozenModel(log_dir="/dev/null", inputs=[], outputs=[], restore_from=tmpdir) # there is no .pb file yet
Contributor:

Suggested change:
FrozenModel(log_dir="/dev/null", inputs=[], outputs=[], restore_from=tmpdir) # there is no .pb file yet
FrozenModel(log_dir='/dev/null', inputs=[], outputs=[], restore_from=tmpdir) # there is no .pb file yet


dummy_model = TrainableModel(dataset=None, log_dir=tmpdir, **_IO, freeze=True, optimizer=_OPTIMIZER)
dummy_model.save('')

# restore from directory
FrozenModel(**_IO, restore_from=tmpdir)
FrozenModel(log_dir="/dev/null", **_IO, restore_from=tmpdir)
Contributor:

Suggested change:
FrozenModel(log_dir="/dev/null", **_IO, restore_from=tmpdir)
FrozenModel(log_dir='/dev/null', **_IO, restore_from=tmpdir)


# restore from file
FrozenModel(**_IO, restore_from=path.join(tmpdir, 'model.pb'))
FrozenModel(log_dir="/dev/null", **_IO, restore_from=path.join(tmpdir, 'model.pb'))
Contributor:

Suggested change:
FrozenModel(log_dir="/dev/null", **_IO, restore_from=path.join(tmpdir, 'model.pb'))
FrozenModel(log_dir='/dev/null', **_IO, restore_from=path.join(tmpdir, 'model.pb'))


# wrong configurations
dummy_model.save('another')
with pytest.raises(ValueError):
FrozenModel(**_IO, restore_from=tmpdir) # multiple .pb files
FrozenModel(log_dir="/dev/null", **_IO, restore_from=tmpdir) # multiple .pb files
Contributor:

Suggested change:
FrozenModel(log_dir="/dev/null", **_IO, restore_from=tmpdir) # multiple .pb files
FrozenModel(log_dir='/dev/null', **_IO, restore_from=tmpdir) # multiple .pb files


with pytest.raises(ValueError):
FrozenModel(**_IO, restore_from='/something/that/does/not/exist')
FrozenModel(log_dir="/dev/null", **_IO, restore_from='/something/that/does/not/exist')
Contributor:

Suggested change:
FrozenModel(log_dir="/dev/null", **_IO, restore_from='/something/that/does/not/exist')
FrozenModel(log_dir='/dev/null', **_IO, restore_from='/something/that/does/not/exist')



def test_frozen_model_misc(tmpdir):
Expand All @@ -44,7 +44,7 @@ def test_frozen_model_misc(tmpdir):
dummy_model.save('')

# restore from directory
frozen_model = FrozenModel(**_IO, restore_from=tmpdir, session_config={'allow_soft_placement': True})
frozen_model = FrozenModel(log_dir="/dev/null", **_IO, restore_from=tmpdir, session_config={'allow_soft_placement': True})
Contributor:

Suggested change:
frozen_model = FrozenModel(log_dir="/dev/null", **_IO, restore_from=tmpdir, session_config={'allow_soft_placement': True})
frozen_model = FrozenModel(log_dir='/dev/null', **_IO, restore_from=tmpdir, session_config={'allow_soft_placement': True})

Contributor:

Line too long.


assert frozen_model.restore_fallback == 'emloop_tensorflow.FrozenModel'
assert frozen_model.input_names == _IO['inputs']
Expand All @@ -63,7 +63,7 @@ def test_frozen_model_run(tmpdir):
mainloop.run_training(None)
model.save('')

frozen_model = FrozenModel(inputs=['input'], outputs=['output'], restore_from=tmpdir)
frozen_model = FrozenModel(log_dir="/dev/null", inputs=['input'], outputs=['output'], restore_from=tmpdir)
Contributor:

Suggested change:
frozen_model = FrozenModel(log_dir="/dev/null", inputs=['input'], outputs=['output'], restore_from=tmpdir)
frozen_model = FrozenModel(log_dir='/dev/null', inputs=['input'], outputs=['output'], restore_from=tmpdir)


with pytest.raises(AssertionError):
frozen_model.run({}, True, None)
Expand Down