Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Server explains the reasons to the worker if no tasks are available #1582

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

maximmasiutin
Copy link
Contributor

For example, if the worker's machine had too low memory, that worker was not assigned any task by the server, without any explanation. Now the explanation is given.

@zungur
Copy link
Collaborator

zungur commented Mar 19, 2023

I think sri hash and worker version updates also accompany any changes made to the worker.py

@dubslow
Copy link
Contributor

dubslow commented Mar 19, 2023

Better to use a StrEnum https://docs.python.org/3/library/enum.html#enum.StrEnum I think than a scattering of arbitrary variables. Makes it more readable and less error prone in future coding.

@vdbergh
Copy link
Contributor

vdbergh commented Mar 19, 2023

There are already the info and error fields in the reply. Please use the info field. It will be handled correctly.

@vdbergh
Copy link
Contributor

vdbergh commented Mar 19, 2023

There is also the issue that the messages depend on the run that is being handled, which is suppressed in the feedback.

In general I think most messages are uninteresting to the worker. If we want to give some feedback then I think we should restrict ourselves to messages for maxmemory and minthreads, since these depend on settings the worker has control over.

@dubslow
Copy link
Contributor

dubslow commented Mar 19, 2023

Oh I see, the PR intent is to include all related reasons for skipping a task, not just one...

@dubslow
Copy link
Contributor

dubslow commented Mar 19, 2023

In general I think most messages are uninteresting to the worker. If we want to give some feedback then I think we should restrict ourselves to messages for maxmemory and minthreads, since these depend on settings the worker has control over.

Most of these look like worker-controllable problems. The main exception is the per-run core limit, and the lesser exception is the STC limit. But even the STC limit can be controlled by the worker's owner duplicating the worker, and in the per-run core limit, it's still nice to know that it's the server's fault, so to speak, not the worker.

@vdbergh
Copy link
Contributor

vdbergh commented Mar 19, 2023

I think there are only two serious misconfigurations a worker can do that may cause it to get no runs. Setting minthreads > 1 or setting maxmemory too low. To me it seems most useful to give feedback for these.

Anyway if we really want reasons for all runs then I would not use booleans but counts so that we can see how many runs where affected by a particular reason.

@vdbergh
Copy link
Contributor

vdbergh commented Mar 19, 2023

I would also make the server generate the info message. In that way the worker does not need to be changed.

@dubslow
Copy link
Contributor

dubslow commented Mar 19, 2023

I had mostly completed this code before seeing the two prior comments, and decided to finish it, if for no other reason than as exercise:

maximmasiutin#4

Refactors the present PR with a Flag Enum, greatly improving both maintainability and readability, imo, while more or less preserving the same functionality as desired by Maxim.

I think there are only two serious misconfigurations a worker can do that may cause it to get no runs. Setting minthreads > 1 or setting maxmemory too low. To me it seems most useful to give feedback for these.

It seems to me that every one of the following messages has considerable utility to a worker-operator:

__messages = {MachineLimit: "This user has reached the max machines limit.",
              LowThreads:   "An available run requires more than CONCURRENCY threads."
              HighThreads:  "An available run requires less than MIN_THREADS threads."
              LowMemory:    "An available run requires more than MAX_MEMORY memory."
              NoBinary:     "This worker has exceeded its GitHub API limit, and has no local binary for an availabe run."
              SkipSTC:      "An available run is at STC, requiring less than CONCURRENCY threads due to cutechess issues. Consider splitting this worker. See Discord."
              ServerSide:   "Server error or no active runs. Try again shortly."
             }

(Obviously some are more common than others, e.g. MIN_THREADS, but that just makes clear messages all the more important in such rare cases.)

Anyway if we really want reasons for all runs then I would not use booleans but counts so that we can see how many runs where affected by a particular reason.

Hm.... interesting idea. I don't really see the added benefit of having a count, rather than simply a boolean, but it could certainly be done.

I would also make the server generate the info message. In that way the worker does not need to be changed.

Well, I wrote my PR-to-this-PR before seeing this comment, but, at any rate, the error messages refer to worker-side command line options, so I'm not certain that referring to worker command line options in the server code is ideal, from a maintenance perspective. (On the other hand, I explicitly duplicated the Flag enum in both files, which is similar in terms of duplicating on either side, but still better imo because the duplicated code is meaningful to both sides, whereas the worker command line options are only meaningful to the worker, not the server.)

Use an Enum to greatly improve both maintainability and readability

Untested
@vdbergh
Copy link
Contributor

vdbergh commented Mar 19, 2023

Well, I wrote my PR-to-this-PR before seeing this comment, but, at any rate, the error messages refer to worker-side command line options, so I'm not certain that referring to worker command line options in the server code is ideal, from a maintenance perspective. (On the other hand, I explicitly duplicated the Flag enum in both files, which is similar in terms of duplicating on either side, but still better imo because the duplicated code is meaningful to both sides, whereas the worker command line options are only meaningful to the worker, not the server.)

The info messages refer to variables which are both meaningful for the worker and for server and which the worker can set via command line options or directly via the config file.

I think from a software engineering point of view it is much better that the worker does not control the info messages. Otherwise any server change requires a coordinated worker change.

@dubslow
Copy link
Contributor

dubslow commented Mar 19, 2023

Otherwise any server change requires a coordinated worker change.

At the present time, I'm not sure that can be avoided? And I guess your general principle here is that the server changes more often than the worker?

@vdbergh
Copy link
Contributor

vdbergh commented Mar 19, 2023

Otherwise any server change requires a coordinated worker change.

At the present time, I'm not sure that can be avoided? And I guess your general principle here is that the server changes more often than the worker?

Yes. Moreover worker changes are more difficult since they need to be tested on different architectures (@ppigazzini does a great job on that).

I think if the server sets the info field then the worker will do the right thing (that's the intention anyway).

If the worker needs to be adapted then it is preferable that the adaptation is generic so that it only has to be done once.

@maximmasiutin
Copy link
Contributor Author

@vdbergh Thank you very much by accepting my proposal and writing proper code! Your objections were more than reasonable, I also thought about implementing them since I didn't like lots of ifs and Booleans. I would not touch my code since you are already wrote the correct code.

@maximmasiutin
Copy link
Contributor Author

maximmasiutin commented Mar 19, 2023 via email

Refactor maxim's request task error messages
@dubslow
Copy link
Contributor

dubslow commented Mar 19, 2023

I think if the server sets the info field then the worker will do the right thing (that's the intention anyway).

It does not do anything except print a generic "no dice try again" message. Printing anything from the server will require updating the worker.

That said, as you say, it certainly be done once in a generic way -- simply print whatever messages the server sends.

@maximmasiutin
Copy link
Contributor Author

Now this pull request contains the updates made by @dubslow via https://github.com/maximmasiutin/fishtest/pull/4/commits

@maximmasiutin
Copy link
Contributor Author

There are pros and cons of formatting the message on the worker vs formatting it on the server.

If the message is formed by the server, than the worker will not need updates on new messages. If the message is formed by the worker, it can figure exactly the reason, can display language-specific messages, and has various options on how to proceed.

Therefore, I propose to combine both methods. The server sends kind of a dictionary of dictionaries which combines "code", "message" and optional values, like:

{
  {code: "worker_low_memory", message: "The MAX_MEMORY parameter is too low"},
  {code: "worker_core_limit_reached", message: "Exceeded the limit of 2048 cores set by the server", core_limit: 2048}
}

The client at the very least will just combine the messages and display, but will be able to also handle them properly if needed. The only thing to be careful about is not to catch deserialization exploits, i.e. we should write such a code that will not be affected by deserialization attack exploits.

@dubslow
Copy link
Contributor

dubslow commented Mar 19, 2023

Duplicating my discord comment:

If the message is formed by the worker, it can figure exactly the reason, can display language-specific messages, and has various options on how to proceed.

That's exactly the stuff that has to be updated in the worker every time we add a new reason or change existing reasons serverside. Vdbergh specifically wants to not update the worker when the server is updated. And I kind of agree, with more thought.

For now, I'll convert my recent commit to sending pre-formatted strings from the server to the worker. If we truly find something in the future that needs worker-side interpretation, it would be easy enough to add that then.

@dubslow
Copy link
Contributor

dubslow commented Mar 19, 2023

I've pushed a new version which moves all message interpretation from the worker to the server. Now the worker looks almost like master, only now it prints whatever string the server sends.

maximmasiutin#5

It still only does boolean error reporting, but it's not so hard (more or less trivial, I daresay) to add counting on top of that if we really want it.

dubslow and others added 2 commits March 19, 2023 05:51
Per vdbergh's comments

Still untested
Remove all message interpretation from worker to server
@maximmasiutin
Copy link
Contributor Author

That's exactly the stuff that has to be updated in the worker every time we add a new reason or change

No, @dubslow - the worker can just enumerate (foreach) and display all the messages. No update on the worker is needed on new message with my approach proposed in the comment #1582 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants