Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hparams: write hparams_config summary directly #2139

Merged
merged 8 commits into from
Apr 30, 2019

Conversation

wchargin
Copy link
Contributor

@wchargin wchargin commented Apr 22, 2019

Summary:
The Experiment object bundled HParams and Metrics with some
metadata that’s not actually used in the current UI. We don’t think that
it pulls its conceptual weight, so this commit replaces it with a direct
summary-writing operation.

This function will soon be extracted into a summary_pb2 module, as
part of a larger plan to refactor the api module. Making this
change first minimizes churn in the demo code.

Cf. #1998.

Test Plan:
Unit tests modified appropriately, and the demo still works.

wchargin-branch: hparams-experiment-writing

Summary:
This change introduces `HParam`, `Metric`, and `Experiment` classes,
which represent their proto counterparts in a more Python-friendly way.
It similarly includes a `Domain` class hierarchy, which does not
correspond to a specific proto message, but rather unifies the domain
variants defined on the `HParamInfo` proto.

The design is roughly as in the original sketch of #1998.

The primary benefit of this change is that having first-class domains
enables clients to reuse the domain information for both the experiment
summary and the underlying tuning algorithm. We don’t provide a method
to do this out of the box, because we don’t actually provide any tuners
at this time, but it’s easy to write (e.g.) a `sample_uniform` function
like the one included in this commit. Then, sampling is as easy as

```python
    hparams = {h: sample_uniform(h.domain, rng) for h in HPARAMS}
```

It is also now more convenient to reference hparam values such that
static analysis can detect potential typos, because the `HParam` objects
themselves can be declared as constants and used as keys in a dict.
Writing `hparams["dropuot"]` fails at runtime, but `hparams[HP_DROPUOT]`
fails at lint time.

As a pleasant bonus, hparam definitions are now more compact, fitting
on one line instead of several. The demo code has net fewer lines.

Manual summary writer management is still required. A future change will
introduce a Keras callback to reduce this overhead.

Test Plan:
Some unit tests included, and the demo still works.

wchargin-branch: hparams-structured-api
Summary:
A new `hparams.api.KerasCallback` class simplifies client APIs by
writing session start and end summaries automatically, with a dict of
hparams provided by the client.

This only works in TensorFlow eager mode. The stock `TensorBoard` Keras
callback works in both eager and graph modes, but to do so it must use
TensorFlow-internal symbols (`eager_mode` and `executing_eagerly` on the
context object, which we do not have access to).

Test Plan:
Unit tests included. The demo still works, generating valid data.

wchargin-branch: hparams-keras-callback
Summary:
This abstracts over the last `tf.summary.experimental.write_raw_pb` call
in the demo. The new `hp.experiment(experiment)` function works in both
eager and graph modes.

Test Plan:
Unit tests included; tested on 1.x and 2.0 nightlies. The demo continues
to work properly on the latest 2.0 nightly.

wchargin-branch: hparams-experiment-writing
tf.summary.experimental.write_raw_pb(experiment_string, step=0)
base_writer.flush()
base_writer.close()
hp.experiment(experiment)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: The awkwardness of hp.experiment(experiment), or the equivalent
but worse hp.experiment(hp.Experiment(…)), is why I’d originally
proposed having experiment.write(). I understand why you don’t want
that. If you have suggestions for improvement here, I’m all ears.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pattern that would be most consonant with what we do for other plugins would be for hparams to have a summary_v2.py containing a function like

def hparams(experiment):
  if not isinstance(experiment, Experiment):
      raise TypeError("type(experiment) is %r" % (type(experiment),))
  raw_pb = experiment.summary_pb().SerializeToString()
  summary_scope = (
      getattr(tf.summary.experimental, 'summary_scope', None) or
      tf.summary.summary_scope)
  with summary_scope('hparams_summary'):
    return tf.summary.experimental.write_raw_pb(raw_pb)

To be like the others, that would be exposed under tb.summary.v2. We could also expose it under hparams.api I suppose, though perhaps with the name summary() or write() or something instead.

Then the callsite would look like either tb.summary.hparams(experiment) or hp.summary(experiment) - something like that.

Down the road in the writer-method style of summary writing, this would become writer.hparams(experiment).

What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hesitate to just call the endpoints .hparams, because hparams really
has three kinds of summaries, and of the three kinds of summaries, the
name .hparams sounds more like session_start_pb (logging your actual
hparam values) than experiment_pb.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hesitate to just call the endpoints .hparams, because hparams really
has three kinds of summaries, and of the three kinds of summaries, the
name .hparams sounds more like session_start_pb (logging your actual
hparam values) than experiment_pb.

I guess we could provide summary_v2 with hparams, session_start,
and session_end (the last two for non-Keras users), but (a) these
would look more out of place as writer methods (“session start? what
session?”), and (b) that’s not great if we’re trying to get rid of the
confusing “session” terminology.

I’ll try to think about this a bit more and get back to it tomorrow.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair, but IMO the pretty strong reason to use hparams would be consistency with all the other plugins, which use their plugin name as (at least a prefix to) the summary op name when they have a dedicated summary op. And I assume we would consider writer.hparams_experiment() to be too unwieldy :)

In fairness, I think part of the difficulty here is that hparams sort of invented summary formats for concepts that are more properly broader than just the hparams plugin; "experiment" and "session" are constructs that go beyond just setting hparams. I don't expect us to resolve all of that in the short term, but what I'd maybe suggest is that we can figure out a subset of the full end-state API that we're pretty comfortable with now (and which still improves on the original API). We don't necessarily need to expose a full Experiment construct with a name and user when there is nothing in TensorBoard today to actually visualize that data.

For example, a subset of the API could be something like:

  • hparams_config(hparams, metrics) to log the hparam and metric definitions (optional and if not logged, TB will infer domains and use all scalar values as metrics)
  • hparams(hparams) to log the actual values

To me that seems like it already covers virtually all of the functionality that's actually usable today in the hparams dashboard; the main exception is the session end status value, which doesn't seem like a big deal to me (and there's still the "plumbing" API for now if folks who aren't using the keras callback really want that). Even the session group name doesn't add much value since session group aggregation isn't implemented on the frontend.

This would buy us a little more time to fine tune how we'd like to capture experiment and session information in the generalized summary context.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I can be convinced that Experiment isn’t worth its conceptual
weight, and that we should remove it from the public API in favor of
directly writing hparams_config(hparams, metrics).

Instead of complicating the module and dependency structure more by
introducing both api and summary_v2, I would like to rename api
to summary_v2, containing:

  • hparams(hparams) to log a session start summary;
  • hparams_config(hparams, metrics) to log an experiment summary;
  • HParam, Metric, and domain descriptors, as currently;
  • KerasCallback, as currently.

But KerasCallback has a hard (module-time) dependency on tf.keras,
so this would preclude summary_v2.hparams in notf builds, which is not
great. The two options that I see right now are:

  1. omit the callback from summary_v2 and instead put it in api
    (which would need to exist after all), and also let api forward
    the other summary_v2 symbols; or

  2. move the callback into a separate callback module.

Option (1) has the downside that I can’t rename api to __init__ as
I’d hoped, because you again wouldn’t be able to import summary_v2
without depending on tf.keras. Option (2) has the downside of being
ugly.

I guess option (1) is less bad. At least in that case, the vast majority
of users could just use api and not worry about summary_v2 unless
they need a notf build. I’ll implement it and see if I can come up with
anything better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline. We’re comfortable with this. Explicitly:

  • hparams.api:
    • is the one-stop user-facing shop, to be imported as hp
    • re-exports * from hparams.summary_v2
    • re-exports * from hparams.callback
    • transitively depends on tf.keras
  • hparams.summary_v2:
    • contains hparams(concrete_hparam_values)
    • contains hparams_config(hparams, metrics)
    • contains HParam, Metric, and domain descriptors
    • depends only on api_pb2, summary, etc.
    • is the only public API for notf users
  • hparams.callback:
    • contains KerasCallback
    • depends on summary_v2
    • depends on tf.keras
  • nothing is added to tb.summary.v2.*
  • nothing is currently added to tf.summary; in the future, either
    tf.summary or a unified TensorBoard writer may depend on
    summary_v2 but not callback to avoid a tf.keras dependency

with tf.compat.v1.Session() as sess:
with tf.compat.v2.summary.create_file_writer(logdir).as_default() as w:
with tf.compat.v2.summary.record_if(True):
tf.contrib.summary.initialize()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I couldn’t find a way to get the V1 test case to work without
calling into tf.contrib. If there is one, I’d love to hear it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sess.run(w.init()) should do the trick, if the problem was the writer not being initialized?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, that does it. Thanks!

@wchargin wchargin requested a review from nfelt April 22, 2019 22:11
@wchargin wchargin marked this pull request as ready for review April 22, 2019 22:11
wchargin added a commit that referenced this pull request Apr 22, 2019
Summary:
Depends on #2126, #2130, and #2139. Cf. #1998.

Test Plan:
None. These code snippets aren’t meant to be literally executable out of
context; that’s what the demo is for.

wchargin-branch: hparams-api-docs
tf.summary.experimental.write_raw_pb(experiment_string, step=0)
base_writer.flush()
base_writer.close()
hp.experiment(experiment)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pattern that would be most consonant with what we do for other plugins would be for hparams to have a summary_v2.py containing a function like

def hparams(experiment):
  if not isinstance(experiment, Experiment):
      raise TypeError("type(experiment) is %r" % (type(experiment),))
  raw_pb = experiment.summary_pb().SerializeToString()
  summary_scope = (
      getattr(tf.summary.experimental, 'summary_scope', None) or
      tf.summary.summary_scope)
  with summary_scope('hparams_summary'):
    return tf.summary.experimental.write_raw_pb(raw_pb)

To be like the others, that would be exposed under tb.summary.v2. We could also expose it under hparams.api I suppose, though perhaps with the name summary() or write() or something instead.

Then the callsite would look like either tb.summary.hparams(experiment) or hp.summary(experiment) - something like that.

Down the road in the writer-method style of summary writing, this would become writer.hparams(experiment).

What do you think?

base_writer.flush()
base_writer.close()
hp.experiment(experiment)
base_writer.close()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a little weird IMO to .close() a context-manager-returned object within the context. I might have expected this to fail actually since the context is supposed to .flush() the writer at the context exit, but I guess .flush() is a no-op on a writer that's already closed.

I think you can just leave out the .close(). The important part in terms of functionality is .flush() which is already implicit in the context exit (or could be made explicit). There isn't a real need to explicitly close the writer here since it will be closed when it goes out of scope anyway, and I believe that should actually happen at context exit anyway if you drop the as clause.

Ditto in the tests that also call .close() like this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it; I wasn’t sure what was idiomatic. Thanks.

else:
self.fail("No summary data found")

@test_util.run_v2_only("Requires eager summary writing semantics.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment on #2130 re: how you could run this in v1 also.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

with tf.compat.v1.Session() as sess:
with tf.compat.v2.summary.create_file_writer(logdir).as_default() as w:
with tf.compat.v2.summary.record_if(True):
tf.contrib.summary.initialize()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sess.run(w.init()) should do the trick, if the problem was the writer not being initialized?

logdir = os.path.join(self.get_temp_dir(), "logs")
with tf.compat.v1.Session() as sess:
with tf.compat.v2.summary.create_file_writer(logdir).as_default() as w:
with tf.compat.v2.summary.record_if(True):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line shouldn't be necessary with the V2 summary API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, interesting. I guess always_record_summaries was required in the
V1 contrib APIs, and so I assumed that record_if was necessary here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a feature of the TF 2.0 APIs that you no longer have to do that :)

if event.WhichOneof("what") != "summary":
continue
self.assertEqual(event.summary, summary_pb)
break
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, but you can simplify this to just return here and then skip the else clause and self.fail() unconditionally. It's one line shorter and avoids introducing a slightly more novel language feature when there's a more standard alternative.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(No longer needed.)

tensorboard/plugins/hparams/api_test.py Show resolved Hide resolved
@wchargin wchargin changed the base branch from wchargin-hparams-keras-callback to master April 26, 2019 17:44
wchargin-branch: hparams-experiment-writing

# Conflicts:
#	tensorboard/plugins/hparams/BUILD
#	tensorboard/plugins/hparams/api.py
#	tensorboard/plugins/hparams/api_test.py
#	tensorboard/plugins/hparams/hparams_demo.py
wchargin-branch: hparams-experiment-writing
wchargin-branch: hparams-experiment-writing
@wchargin wchargin changed the title hparams: add V2-style top-level experiment writing hparams: write hparams_config summary directly Apr 26, 2019
wchargin added a commit that referenced this pull request Apr 27, 2019
Summary:
Fixes an unintentional omission from #2130: `api.py` (not just its test)
actually does currently depend on TensorFlow.

Part of a reorganization to match the structure described here:
<#2139 (comment)>

wchargin-branch: hparams-api-refactor
wchargin added a commit that referenced this pull request Apr 27, 2019
Summary:
We’ll shortly reinstate `api` as a wrapper over `summary_v2` (in a
separate commit to help Git track the moves).

Part of a reorganization to match the structure described here:
<#2139 (comment)>

wchargin-branch: hparams-api-refactor
wchargin added a commit that referenced this pull request Apr 27, 2019
Summary:
We’ll shortly extract the Keras callback out of `summary_v2` such that
`summary_v2` does not need to depend on TensorFlow. It will continue to
be exposed from `api`.

Part of a reorganization to match the structure described here:
<#2139 (comment)>

wchargin-branch: hparams-api-refactor
wchargin added a commit that referenced this pull request Apr 27, 2019
Summary:
The `summary_v2` module no longer has any fundamental dependency on
TensorFlow. Its implementation still depends on `summary`, which does
directly reference the TensorFlow proto copies.

Part of a reorganization to match the structure described here:
<#2139 (comment)>

wchargin-branch: hparams-api-refactor
@wchargin
Copy link
Contributor Author

Ready for re-review when you get a chance.

@wchargin wchargin requested a review from nfelt April 29, 2019 21:14
class Experiment(object):
"""A top-level experiment description.
def hparams_config(hparams, metrics, time_created_secs=None):
"""Write a top-level experiment description.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's maybe worth tweaking this language a little now to reflect the new naming?

wchargin-branch: hparams-experiment-writing
wchargin-branch: hparams-experiment-writing
@wchargin wchargin merged commit b124e66 into master Apr 30, 2019
wchargin added a commit that referenced this pull request Apr 30, 2019
Summary:
Fixes an unintentional omission from #2130: `api.py` (not just its test)
actually does currently depend on TensorFlow.

Part of a reorganization to match the structure described here:
<#2139 (comment)>

wchargin-branch: hparams-api-refactor
wchargin added a commit that referenced this pull request Apr 30, 2019
Summary:
We’ll shortly reinstate `api` as a wrapper over `summary_v2` (in a
separate commit to help Git track the moves).

Part of a reorganization to match the structure described here:
<#2139 (comment)>

wchargin-branch: hparams-api-refactor
wchargin added a commit that referenced this pull request Apr 30, 2019
Summary:
We’ll shortly extract the Keras callback out of `summary_v2` such that
`summary_v2` does not need to depend on TensorFlow. It will continue to
be exposed from `api`.

Part of a reorganization to match the structure described here:
<#2139 (comment)>

wchargin-branch: hparams-api-refactor
wchargin added a commit that referenced this pull request Apr 30, 2019
Summary:
The `summary_v2` module no longer has any fundamental dependency on
TensorFlow. Its implementation still depends on `summary`, which does
directly reference the TensorFlow proto copies.

Part of a reorganization to match the structure described here:
<#2139 (comment)>

wchargin-branch: hparams-api-refactor
wchargin added a commit that referenced this pull request Apr 30, 2019
Summary:
Fixes an unintentional omission from #2130: `api.py` (not just its test)
actually does currently depend on TensorFlow.

Part of a reorganization to match the structure described here:
<#2139 (comment)>

wchargin-branch: hparams-api-refactor
wchargin added a commit that referenced this pull request Apr 30, 2019
Summary:
We’ll shortly reinstate `api` as a wrapper over `summary_v2` (in a
separate commit to help Git track the moves).

Part of a reorganization to match the structure described here:
<#2139 (comment)>

wchargin-branch: hparams-api-refactor
wchargin added a commit that referenced this pull request Apr 30, 2019
Summary:
We’ll shortly extract the Keras callback out of `summary_v2` such that
`summary_v2` does not need to depend on TensorFlow. It will continue to
be exposed from `api`.

Part of a reorganization to match the structure described here:
<#2139 (comment)>

wchargin-branch: hparams-api-refactor
wchargin added a commit that referenced this pull request Apr 30, 2019
Summary:
The `summary_v2` module no longer has any fundamental dependency on
TensorFlow. Its implementation still depends on `summary`, which does
directly reference the TensorFlow proto copies.

Part of a reorganization to match the structure described here:
<#2139 (comment)>

wchargin-branch: hparams-api-refactor
wchargin added a commit that referenced this pull request May 7, 2019
Summary:
This summarizes APIs introduced in #2126, #2130, #2139, and #2188. See
also #1998.

Test Plan:
None. These code snippets aren’t meant to be literally executable out of
context; that’s what the demo is for.

wchargin-branch: hparams-api-docs
@wchargin wchargin deleted the wchargin-hparams-experiment-writing branch May 9, 2019 19:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants