uploader: add ServerInfo protos and logic
#2878
Conversation
Summary:
This commit adds an RPC definition by which the uploader can connect to
the frontend web server at the start of an upload session. This resolves
a number of outstanding issues:
- The frontend can tell the uploader which backend server to connect
to, rather than requiring a hard-coded endpoint in the uploader.
- The frontend can tell the uploader how to generate experiment URLs,
rather than requiring the backend server to provide this information
(which it can’t, really, in general).
- The frontend can check whether the uploader client is recent enough
and instruct the end user to update if it’s not.
- The frontend can warn the user about transient issues in case the
service is down, degraded, under maintenance, etc.
An endpoint `https://tensorboard.dev/api/uploader` on the server will
provide this information.
Test Plan:
Unit tests suffice.
wchargin-branch: uploader-serverinfo-protos
Googlers: see http://cl/278027294 for server implementation.
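To make the shape of the handshake concrete, here is a minimal Python sketch of client-side handling of a ServerInfo-style response. The JSON field names and the use of JSON at all are assumptions for illustration only; the real `/api/uploader` endpoint serves the proto schema added in this PR.

```python
import json


def parse_server_info(response_body):
    # Hedged sketch: parse a hypothetical JSON rendering of the ServerInfo
    # response. Field names below are illustrative assumptions, not the
    # actual proto schema.
    info = json.loads(response_body)
    return {
        "api_endpoint": info.get("apiServer", {}).get("endpoint", ""),
        "url_template": info.get("urlFormat", {}).get("template", ""),
        "verdict": info.get("compatibility", {}).get("verdict", "VERDICT_OK"),
    }


# Example response a frontend might return:
body = json.dumps({
    "apiServer": {"endpoint": "api.tensorboard.dev:443"},
    "urlFormat": {"template": "https://tensorboard.dev/experiment/{{EID}}/"},
})
info = parse_server_info(body)
```

The point is that one response carries all three concerns from the summary: backend address, URL generation, and compatibility checking.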
wchargin-branch: uploader-serverinfo-protos wchargin-source: 16d41149a8ce9e2e816df874217a7da10d83810a
wchargin-branch: uploader-serverinfo-protos wchargin-source: f4f3624d7a37f1f2229970b0e80c28052f8b000a
// soon be unsupported, or if the server is experiencing transient issues.
VERDICT_WARN = 2;
// The client should cease further communication with the server and abort
// operation.
Maybe mention that there should be an accompanying `details` message in this case?
Done.
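For illustration, a client might act on the verdict like this. This is a sketch with assumed names; the PR itself only defines the enum values and the accompanying details string.

```python
def handle_compatibility(verdict, details=""):
    # Sketch of client-side verdict handling. VERDICT_ERROR is expected to
    # arrive with an accompanying `details` message explaining why.
    if verdict == "VERDICT_WARN":
        return "Warning [from server]: %s" % details
    if verdict == "VERDICT_ERROR":
        raise RuntimeError("Error [from server]: %s" % details)
    return None
```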
}

message ApiServer {
  // Address of a gRPC server. For example: "dns:///api.tensorboard.dev:443".
Did we establish whether we need the dns:/// prefix? It would be good if we could point to a more explicit reference that makes it clear whether the scheme is included. The canonical reference seems to be https://github.com/grpc/grpc/blob/master/doc/naming.md
Per that, we could explicitly state that this is a gRPC server URI, which is probably a bit better than the more generic "address" or "endpoint"?
Thanks for the reference! Looks like those docs say that dns is the
default scheme, which is consistent for all our observations except for
some Google-specific internal tools. That seems fine: changed to call
this a “gRPC server URI” and link to those docs, and changed the example
to omit the dns:/// prefix (like the actual server response will).
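Per the gRPC naming doc, a target without an explicit scheme falls back to the default dns resolver, so the two spellings are equivalent for ordinary endpoints. A small sketch of that rule (the helper name here is made up for illustration):

```python
def normalize_grpc_target(target, default_scheme="dns"):
    # Per https://github.com/grpc/grpc/blob/master/doc/naming.md, a target
    # with no scheme is resolved with the default scheme, which is "dns".
    known_schemes = ("dns:", "unix:", "ipv4:", "ipv6:")
    if target.startswith(known_schemes):
        return target
    return "%s:///%s" % (default_scheme, target)
```

So the server can return the bare `api.tensorboard.dev:443` form and gRPC clients will resolve it identically to the `dns:///`-prefixed form.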
string template = 1;
// Placeholder string that should be replaced with an actual experiment ID.
// (See docs for `template` field.)
string placeholder = 2;
Maybe id_placeholder or something just to hedge against a future in which we might have some other placeholder, like an alpha/beta/stable channel indicator or something fancy.
Done.
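The template/placeholder mechanism amounts to plain string substitution on the client side; roughly the following, using the `id_placeholder` name from this thread (the placeholder token value is an assumption for illustration):

```python
def format_experiment_url(template, id_placeholder, experiment_id):
    # All occurrences of the placeholder in the template are replaced with
    # the actual experiment ID.
    return template.replace(id_placeholder, experiment_id)


url = format_experiment_url(
    "https://tensorboard.dev/experiment/{{EXPERIMENT_ID}}/",
    "{{EXPERIMENT_ID}}",
    "abc123",
)
```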
}

message ExperimentUrlFormat {
  // Template string for experiment URLs. All occurrences of the value of the
Do we want these to be absolute or relative?
Relative has the advantage that it allows a single frontend to be exposed across multiple domain names, with clients generating the appropriate URLs. Of course, we could, I suppose, extract this from the incoming /api/uploader request and do the substitution server-side; I could see an argument for either approach.
Another possibility would be adding a template placeholder for the base URL, which if included is basically the same as a relative URL, but could be omitted in the actual template to allow the frontend to return an absolute URL if it wanted to avoid ambiguity (e.g., if it wanted to do that on prod).
Yeah, I was doing the expansion server side based on the requested host.
If we used relative URLs, then the client would need to pass around the
origin with the ServerInfoResponse everywhere—the latter would no
longer contain all the needed information. Not the end of the world, but
the server-side handling (http://cl/278027294) seemed simple to me.
SGTM, thanks for the explanation.
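The server-side expansion described above might look roughly like this sketch (names are assumptions; the real handler lives in the internal CL):

```python
def experiment_url_template_for_request(request_host, path="/experiment/"):
    # Expand the absolute template server-side from the Host header of the
    # incoming /api/uploader request, so that the ServerInfo response is
    # self-contained and the client need not track the origin separately.
    placeholder = "{{EXPERIMENT_ID}}"
    template = "https://%s%s%s/" % (request_host, path, placeholder)
    return {"template": template, "id_placeholder": placeholder}
```

This is why absolute URLs in the response are workable: the server already knows which domain name the request came in on.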
)

def create_server_info(frontend_origin, api_endpoint):
If this is only used in tests, I think it's preferable to keep it in test code or utilities.
It’s used outside of tests (as of #2879), when you directly specify an
endpoint. The idea is:
- `tensorboard dev --origin=https://tensorboard.dev --api_endpoint=`
  (normal prod usage; these are the default options)
- `tensorboard dev --origin=localhost:8080` (for development, running
  both a local frontend and a local backend)
- `tensorboard dev --api_endpoint=localhost:10000` (for development,
  running only a local backend)
Historically, it’s been possible to test the uploader by spinning up a
local gRPC server without having to also spin up a frontend. I wanted to
preserve that ability, because it’s convenient for testing flows that
don’t require a frontend (upload → delete, upload → export, etc.)
Ok, I didn't check follow-on changes for usage.
If the goal is to allow locally overriding the API endpoint though, it seems better to me to just have --api_endpoint override whatever's returned from the frontend's ServerInfo endpoint, rather than using it to construct a new ServerInfo without hitting the frontend at all. As-is, it's semantically not really right to always use /experiment/ as the URL pattern here, since that should be determined by the frontend.
Or concretely, if I were running local storzy but against the hosted-tensorboard-dev Spanner, I might want to be able to run tensorboard dev --origin=https://hosted-tensorboard-dev.appspot.com --api_endpoint=localhost:10000 and I would argue that should use the experiment pattern and other metadata fetched from the origin still.
If it would seem weird if specifying a local --api_endpoint but not specifying --origin for it to default to pinging tensorboard.dev, we could change the default for --origin so that it only defaults to tensorboard.dev when --api_endpoint is not passed, and otherwise it defaults to empty and then only in that case do we locally construct the ServerInfo (and maybe in that case we should just use a dummy experiment URL pattern).
Agreed; I also wasn’t fully content with the flags formulation in #2879.
Your approach sounds fine to me: I’ll revise that PR accordingly.
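The flag semantics agreed on above could be sketched as follows. Names, defaults, and the injected fetch/construct callables are assumptions for illustration; the real logic lands in the revision of #2879.

```python
def resolve_server_info(origin, api_endpoint, fetch_server_info,
                        create_local_server_info):
    # Sketch of the agreed flag semantics:
    #   --origin set: perform the handshake against the frontend.
    #   --api_endpoint also set: override only the backend address from
    #     the handshake result, keeping the frontend's URL pattern.
    #   --api_endpoint set with --origin empty: skip the handshake and
    #     construct a local ServerInfo (with a dummy URL pattern).
    if not origin and api_endpoint:
        return create_local_server_info(api_endpoint)
    info = fetch_server_info(origin)
    if api_endpoint:
        info["api_endpoint"] = api_endpoint
    return info
```

Under this scheme the hosted-tensorboard-dev example above works: the handshake runs against the given origin, and only the backend address is replaced.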
class _Ipv6CompatibleWsgiServer(simple_server.WSGIServer):
  address_family = _localhost()[0]
It's a wee bit ugly IMO to have this resolved at class definition time. Optional, but we could instead pass the address_family into __init__ explicitly and set it on the instance then (prior to calling superclass init which is what constructs the socket). We'd have to explicitly construct the server rather than using make_server() but it's only 3 lines anyway.
https://github.com/python/cpython/blob/2.7/Lib/wsgiref/simple_server.py#L147
OK: overriding address_family directly on the class like this is
common (e.g., this Stack Overflow question), but I’ll grant that
it’s weird for the right-hand side to not be a compile-time constant
expression.
For some reason, TCPServer (the ancestor class that actually consumes
address_family) explicitly says not to override its initializer,
so even though I can’t directly see what would go wrong if we
ignored this directive, I’ve gone ahead and just created the class
dynamically. Done.
I'm about 95% sure that the directive May be extended, do not override is a poorly worded instruction that subclasses that define __init__() should be sure to call super().__init__() and thus "extend" __init__(), as opposed to if they didn't call super() which would be "overriding".
Evidence for this would be that otherwise "extended" makes no sense, and also that plenty of subclasses of TCPServer exist that define their own __init__() including the werkzeug dev server we currently use, as well as some examples in stdlib like https://github.com/python/cpython/blob/484edbf9bf1a9e6bae0fcb10a0c165b89ea79295/Lib/logging/config.py#L869-L884
Manually constructing the type on the fly is... new to me :P But if you'd rather not touch __init__ then it will do.
Ah—not “This class may be extended; do not override its initializer” but
rather “You may extend this initializer, as long as you do not override
(= fail to invoke) its superclass implementation.” Yeah, that at least
makes some sense. I had also noted that TCPServer’s base class has the
same directive, and of course TCPServer defines an initializer, so
this is more evidence for your idea.
Either approach is fine with me, then, so I’ll keep it as is.
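Creating the class dynamically, as mentioned in this thread, can be done with the three-argument form of `type()`. A sketch of that approach (the factory name is made up; the point is that `address_family` is resolved at call time rather than at module import time):

```python
import socket
from wsgiref import simple_server


def make_ipv6_compatible_server_class(address_family):
    # Build the WSGIServer subclass at call time so that the address
    # family is resolved when the server is constructed, not when the
    # module is imported.
    return type(
        "_Ipv6CompatibleWsgiServer",
        (simple_server.WSGIServer,),
        {"address_family": address_family},
    )
```

This keeps TCPServer's own __init__ untouched while still letting the class attribute depend on a runtime lookup.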
)

py_library(
    name = "expect_requests_installed",
We should take an explicit dep on this in our setup.py now that it's a direct dependency, right?
Yes—thanks!
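For reference, the corresponding setup.py change would be along these lines (a sketch; the version bounds are an assumption for illustration, not the pin actually chosen):

```python
# setup.py (sketch): declare requests as a direct runtime dependency now
# that the uploader calls it directly. The version bounds are assumed.
REQUIRED_PACKAGES = [
    # ...existing dependencies...
    "requests >= 2.21.0, < 3",
]
```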
wchargin-branch: uploader-serverinfo-protos wchargin-source: d11627f29373cb13ececc774071c3cce365a7136
Summary:
We’ve deployed production servers that support the handshake protocol
specified in #2878 and implemented on the client in #2879. This commit
enables that protocol by default.
Test Plan:
Running `bazel run //tensorboard -- dev list` still properly connects and
prints valid URLs. Re-running with the TensorBoard version patched to
`2.0.0a0` (in `version/version.py`) properly causes a handshake failure.
Setting `--origin` to point to a non-prod frontend properly connects to
the appropriate backend. Setting `--api_endpoint` to point to a non-prod
backend connects directly, skipping the handshake and printing
`https://tensorboard.dev` URLs. Specifying both `--origin` and
`--api_endpoint` performs the handshake and overrides the backend server
only, printing URLs corresponding to the listed frontend. Running
`git grep api.tensorboard.dev` no longer finds any code results. As a
double check, building the Pip package and running it in a new
virtualenv still has a working `tensorboard dev upload` flow.
wchargin-branch: uploader-handshake