Skip to content

Conversation

@rileyajones
Copy link
Contributor

@rileyajones rileyajones commented Aug 28, 2023

Motivation for features / changes

#6552

Technical description of changes

The port scanning logic originally added in #1851 does not seem to work anymore. WerkzeugServer no longer raises an Error when attempting to bind to a port which is in used and instead handles the issue silently.

The current Werkzeug version is 2.3.7 but I had 2.3.4 installed when I first verified this. I tried installing 2.2.0 and the issues was not present so I assume it was introduced in 2.3.0.
https://pypi.org/project/Werkzeug

Detailed steps to verify changes work correctly (as executed by you)

  1. Start TensorBoard, note the port it is bound to (it should be 6006)
  2. In a separate terminal start TensorBoard again.
  3. TensorBoard should successfully bind to a different port (this time 6007)

@rileyajones rileyajones requested a review from nfelt August 28, 2023 19:35
@rileyajones rileyajones marked this pull request as ready for review August 28, 2023 20:22
Copy link
Contributor

@nfelt nfelt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing this! I wonder a little whether this may be quite slow in some cases (e.g. if connecting to the "test" socket will time out or otherwise fail in a less-than-clean way) but I suppose users can always pass a hardcoded port or script it (as you had in the issue) as a workaround.

The current Werkzeug version is 3.3.7

Just checking, I assume you mean 2.3.7? (that's the latest I can find)

I looked at the Werkzeug code and this commit looks like probably the culprit (moves the sys.exit() call introduced in this commit from a codepath I believe we don't call into one that we do).

Presumably Werkzeug is fine with this change being unfriendly to our use case, since they don't want people using their WSGI server for non-development purposes anyway 😒

for port in range(base_port, base_port + max_attempts):
subflags = argparse.Namespace(**vars(flags))
subflags.port = port
if is_port_in_use(port):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing the port-in-use check here won't respect the should_scan logic (based on flags.port).

Can we instead move it closer to where the logic previously happened (when binding to the socket would raise the exception)? I.e. move it into WerkzeugServer.__init__() before the call to the superclass init (which is where we'll attempt binding the socket) and make it raise TensorBoardPortInUseError, which will then trigger the existing except logic on the lines below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This portion of the code is a bit complicated so while I think I've made the suggested change, please let me know if I've missed something.

@rileyajones rileyajones requested a review from nfelt August 29, 2023 01:02
@rileyajones
Copy link
Contributor Author

rileyajones commented Aug 29, 2023

Just checking, I assume you mean 2.3.7? (that's the latest I can find)

I do mean 2.3.7, I've update the description.

@rileyajones rileyajones merged commit 26f5e33 into tensorflow:master Aug 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants