uploader: add ServerInfo protos and logic
#2878
Conversation
Summary:
This commit adds an RPC definition by which the uploader can connect to
the frontend web server at the start of an upload session. This resolves
a number of outstanding issues:
- The frontend can tell the uploader which backend server to connect
to, rather than requiring a hard-coded endpoint in the uploader.
- The frontend can tell the uploader how to generate experiment URLs,
rather than requiring the backend server to provide this information
(which it can’t, really, in general).
- The frontend can check whether the uploader client is recent enough
and instruct the end user to update if it’s not.
- The frontend can warn the user about transient issues in case the
service is down, degraded, under maintenance, etc.
An endpoint `https://tensorboard.dev/api/uploader` on the server will
provide this information.
Test Plan:
Unit tests suffice.
wchargin-branch: uploader-serverinfo-protos
Googlers: see http://cl/278027294 for server implementation.
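To make the shape of the handshake concrete, here is a minimal Python sketch of client-side handling of a ServerInfo-style response. The JSON field names and the use of JSON at all are assumptions for illustration only; the real `/api/uploader` endpoint serves the proto schema added in this PR.

```python
import json


def parse_server_info(response_body):
    # Hedged sketch: parse a hypothetical JSON rendering of the ServerInfo
    # response. Field names below are illustrative assumptions, not the
    # actual proto schema.
    info = json.loads(response_body)
    return {
        "api_endpoint": info.get("apiServer", {}).get("endpoint", ""),
        "url_template": info.get("urlFormat", {}).get("template", ""),
        "verdict": info.get("compatibility", {}).get("verdict", "VERDICT_OK"),
    }


# Example response a frontend might return:
body = json.dumps({
    "apiServer": {"endpoint": "api.tensorboard.dev:443"},
    "urlFormat": {"template": "https://tensorboard.dev/experiment/{{EID}}/"},
})
info = parse_server_info(body)
```

The point is that one response carries all three concerns from the summary: backend address, URL generation, and compatibility checking.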
wchargin-branch: uploader-serverinfo-protos wchargin-source: 16d41149a8ce9e2e816df874217a7da10d83810a
wchargin-branch: uploader-serverinfo-protos wchargin-source: f4f3624d7a37f1f2229970b0e80c28052f8b000a
// soon be unsupported, or if the server is experiencing transient issues.
VERDICT_WARN = 2;
// The client should cease further communication with the server and abort
// operation.
Maybe mention that there should be an accompanying `details` message in this case?
Done.
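For illustration, a client might act on the verdict like this. This is a sketch with assumed names; the PR itself only defines the enum values and the accompanying details string.

```python
def handle_compatibility(verdict, details=""):
    # Sketch of client-side verdict handling. VERDICT_ERROR is expected to
    # arrive with an accompanying `details` message explaining why.
    if verdict == "VERDICT_WARN":
        return "Warning [from server]: %s" % details
    if verdict == "VERDICT_ERROR":
        raise RuntimeError("Error [from server]: %s" % details)
    return None
```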
}

message ApiServer {
  // Address of a gRPC server. For example: "dns:///api.tensorboard.dev:443".
Did we establish whether we need the dns:/// prefix? It would be good if we could point to a more explicit reference that makes it clear whether the scheme is included. The canonical reference seems to be https://github.com/grpc/grpc/blob/master/doc/naming.md
Per that, we could explicitly state that this is a gRPC server URI, which is probably a bit better than the more generic "address" or "endpoint"?
Thanks for the reference! Looks like those docs say that dns is the
default scheme, which is consistent for all our observations except for
some Google-specific internal tools. That seems fine: changed to call
this a “gRPC server URI” and link to those docs, and changed the example
to omit the dns:/// prefix (like the actual server response will).
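Per the gRPC naming doc, a target without an explicit scheme falls back to the default dns resolver, so the two spellings are equivalent for ordinary endpoints. A small sketch of that rule (the helper name here is made up for illustration):

```python
def normalize_grpc_target(target, default_scheme="dns"):
    # Per https://github.com/grpc/grpc/blob/master/doc/naming.md, a target
    # with no scheme is resolved with the default scheme, which is "dns".
    known_schemes = ("dns:", "unix:", "ipv4:", "ipv6:")
    if target.startswith(known_schemes):
        return target
    return "%s:///%s" % (default_scheme, target)
```

So the server can return the bare `api.tensorboard.dev:443` form and gRPC clients will resolve it identically to the `dns:///`-prefixed form.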
string template = 1;
// Placeholder string that should be replaced with an actual experiment ID.
// (See docs for `template` field.)
string placeholder = 2;
Maybe id_placeholder or something just to hedge against a future in which we might have some other placeholder, like an alpha/beta/stable channel indicator or something fancy.
Done.
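The template/placeholder mechanism amounts to plain string substitution on the client side; roughly the following, using the `id_placeholder` name from this thread (the placeholder token value is an assumption for illustration):

```python
def format_experiment_url(template, id_placeholder, experiment_id):
    # All occurrences of the placeholder in the template are replaced with
    # the actual experiment ID.
    return template.replace(id_placeholder, experiment_id)


url = format_experiment_url(
    "https://tensorboard.dev/experiment/{{EXPERIMENT_ID}}/",
    "{{EXPERIMENT_ID}}",
    "abc123",
)
```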
}

message ExperimentUrlFormat {
  // Template string for experiment URLs. All occurrences of the value of the
Do we want these to be absolute or relative?
Relative has the advantage that it allows a single frontend to be exposed across multiple domain names, with clients generating the appropriate URLs. Of course, we could, I suppose, extract this from the incoming /api/uploader request and do the substitution server-side; I could see an argument for either approach.
Another possibility would be adding a template placeholder for the base URL, which if included is basically the same as a relative URL, but could be omitted in the actual template to allow the frontend to return an absolute URL if it wanted to avoid ambiguity (e.g., if it wanted to do that on prod).
Yeah, I was doing the expansion server side based on the requested host.
If we used relative URLs, then the client would need to pass around the
origin with the ServerInfoResponse everywhere—the latter would no
longer contain all the needed information. Not the end of the world, but
the server-side handling (http://cl/278027294) seemed simple to me.
SGTM, thanks for the explanation.
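The server-side expansion described above might look roughly like this sketch (names are assumptions; the real handler lives in the internal CL):

```python
def experiment_url_template_for_request(request_host, path="/experiment/"):
    # Expand the absolute template server-side from the Host header of the
    # incoming /api/uploader request, so that the ServerInfo response is
    # self-contained and the client need not track the origin separately.
    placeholder = "{{EXPERIMENT_ID}}"
    template = "https://%s%s%s/" % (request_host, path, placeholder)
    return {"template": template, "id_placeholder": placeholder}
```

This is why absolute URLs in the response are workable: the server already knows which domain name the request came in on.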
)

def create_server_info(frontend_origin, api_endpoint):
If this is only used in tests, I think it's preferable to keep it in test code or utilities.
It’s used outside of tests (as of #2879), when you directly specify an
endpoint. The idea is:
- `tensorboard dev --origin=https://tensorboard.dev --api_endpoint=`
  (normal prod usage; these are the default options)
- `tensorboard dev --origin=localhost:8080` (for development, running
  both a local frontend and a local backend)
- `tensorboard dev --api_endpoint=localhost:10000` (for development,
  running only a local backend)
Historically, it’s been possible to test the uploader by spinning up a
local gRPC server without having to also spin up a frontend. I wanted to
preserve that ability, because it’s convenient for testing flows that
don’t require a frontend (upload → delete, upload → export, etc.)
Ok, I didn't check follow-on changes for usage.
If the goal is to allow locally overriding the API endpoint though, it seems better to me to just have --api_endpoint override whatever's returned from the frontend's ServerInfo endpoint, rather than using it to construct a new ServerInfo without hitting the frontend at all. As-is, it's semantically not really right to always use /experiment/ as the URL pattern here, since that should be determined by the frontend.
Or concretely, if I were running local storzy but against the hosted-tensorboard-dev Spanner, I might want to be able to run tensorboard dev --origin=https://hosted-tensorboard-dev.appspot.com --api_endpoint=localhost:10000 and I would argue that should use the experiment pattern and other metadata fetched from the origin still.
If it would seem weird if specifying a local --api_endpoint but not specifying --origin for it to default to pinging tensorboard.dev, we could change the default for --origin so that it only defaults to tensorboard.dev when --api_endpoint is not passed, and otherwise it defaults to empty and then only in that case do we locally construct the ServerInfo (and maybe in that case we should just use a dummy experiment URL pattern).
Agreed; I also wasn’t fully content with the flags formulation in #2879.
Your approach sounds fine to me: I’ll revise that PR accordingly.
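The flag semantics agreed on above could be sketched as follows. Names, defaults, and the injected fetch/construct callables are assumptions for illustration; the real logic lands in the revision of #2879.

```python
def resolve_server_info(origin, api_endpoint, fetch_server_info,
                        create_local_server_info):
    # Sketch of the agreed flag semantics:
    #   --origin set: perform the handshake against the frontend.
    #   --api_endpoint also set: override only the backend address from
    #     the handshake result, keeping the frontend's URL pattern.
    #   --api_endpoint set with --origin empty: skip the handshake and
    #     construct a local ServerInfo (with a dummy URL pattern).
    if not origin and api_endpoint:
        return create_local_server_info(api_endpoint)
    info = fetch_server_info(origin)
    if api_endpoint:
        info["api_endpoint"] = api_endpoint
    return info
```

Under this scheme the hosted-tensorboard-dev example above works: the handshake runs against the given origin, and only the backend address is replaced.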
class _Ipv6CompatibleWsgiServer(simple_server.WSGIServer):
  address_family = _localhost()[0]
It's a wee bit ugly IMO to have this resolved at class definition time. Optional, but we could instead pass the address_family into __init__ explicitly and set it on the instance then (prior to calling superclass init which is what constructs the socket). We'd have to explicitly construct the server rather than using make_server() but it's only 3 lines anyway.
https://github.com/python/cpython/blob/2.7/Lib/wsgiref/simple_server.py#L147
OK: overriding address_family directly on the class like this is
common (e.g., this Stack Overflow question), but I’ll grant that
it’s weird for the right-hand side to not be a compile-time constant
expression.
For some reason, TCPServer (the ancestor class that actually consumes
address_family) explicitly says not to override its initializer,
so even though I can’t directly see what would go wrong if we
ignored this directive, I’ve gone ahead and just created the class
dynamically. Done.
I'm about 95% sure that the directive May be extended, do not override is a poorly worded instruction that subclasses that define __init__() should be sure to call super().__init__() and thus "extend" __init__(), as opposed to if they didn't call super() which would be "overriding".
Evidence for this would be that otherwise "extended" makes no sense, and also that plenty of subclasses of TCPServer exist that define their own __init__() including the werkzeug dev server we currently use, as well as some examples in stdlib like https://github.com/python/cpython/blob/484edbf9bf1a9e6bae0fcb10a0c165b89ea79295/Lib/logging/config.py#L869-L884
Manually constructing the type on the fly is... new to me :P But if you'd rather not touch __init__ then it will do.
Ah—not “This class may be extended; do not override its initializer” but
rather “You may extend this initializer, as long as you do not override
(= fail to invoke) its superclass implementation.” Yeah, that at least
makes some sense. I had also noted that TCPServer’s base class has the
same directive, and of course TCPServer defines an initializer, so
this is more evidence for your idea.
Either approach is fine with me, then, so I’ll keep it as is.
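Creating the class dynamically, as mentioned in this thread, can be done with the three-argument form of `type()`. A sketch of that approach (the factory name is made up; the point is that `address_family` is resolved at call time rather than at module import time):

```python
import socket
from wsgiref import simple_server


def make_ipv6_compatible_server_class(address_family):
    # Build the WSGIServer subclass at call time so that the address
    # family is resolved when the server is constructed, not when the
    # module is imported.
    return type(
        "_Ipv6CompatibleWsgiServer",
        (simple_server.WSGIServer,),
        {"address_family": address_family},
    )
```

This keeps TCPServer's own __init__ untouched while still letting the class attribute depend on a runtime lookup.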
)

py_library(
    name = "expect_requests_installed",
We should take an explicit dep on this in our setup.py now that it's a direct dependency, right?
Yes—thanks!
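For reference, the corresponding setup.py change would be along these lines (a sketch; the version bounds are an assumption for illustration, not the pin actually chosen):

```python
# setup.py (sketch): declare requests as a direct runtime dependency now
# that the uploader calls it directly. The version bounds are assumed.
REQUIRED_PACKAGES = [
    # ...existing dependencies...
    "requests >= 2.21.0, < 3",
]
```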
wchargin-branch: uploader-serverinfo-protos wchargin-source: d11627f29373cb13ececc774071c3cce365a7136
Summary:
We’ve deployed production servers that support the handshake protocol
specified in #2878 and implemented on the client in #2879. This commit
enables that protocol by default.
Test Plan:
Running `bazel run //tensorboard -- dev list` still properly connects and
prints valid URLs. Re-running with the TensorBoard version patched to
`2.0.0a0` (in `version/version.py`) properly causes a handshake failure.
Setting `--origin` to point to a non-prod frontend properly connects to
the appropriate backend. Setting `--api_endpoint` to point to a non-prod
backend connects directly, skipping the handshake and printing
`https://tensorboard.dev` URLs. Specifying both `--origin` and
`--api_endpoint` performs the handshake and overrides the backend server
only, printing URLs corresponding to the listed frontend. Running
`git grep api.tensorboard.dev` no longer finds any code results. As a
double check, building the Pip package and running it in a new
virtualenv still has a working `tensorboard dev upload` flow.
wchargin-branch: uploader-handshake