Scheduler enhancements #7703
Commits on Nov 30, 2021
Use cgroup limits in worker memory calculations
Worker processes may have memory limits imposed by systemd, but /proc/meminfo reports the entire system memory regardless of those limits. As a result, the scheduler believes the worker has all of the system memory available and allocates it too many tasks. This change attempts to read the cgroup memory limits for the worker process. It supports cgroups v1 and v2, compares the cgroup limits against the system memory, and returns the most conservative values, so that the worker is not allocated too many tasks and does not trigger an OOM event.
Commit e2a1ca7
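The "most conservative value" rule described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the function names `parseCgroupLimit` and `effectiveMemory`, the 64 GiB host size, and the file path read in `main` are assumptions made for the example. The path in particular depends on the cgroup version and on where the worker's cgroup is mounted.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseCgroupLimit parses the contents of a cgroup memory limit file.
// cgroups v2 writes "max" to memory.max when no limit is set, while
// cgroups v1 reports a very large number in memory.limit_in_bytes.
// ok is false when the contents do not describe a usable limit.
// (Hypothetical helper, named for this example only.)
func parseCgroupLimit(contents string) (limit uint64, ok bool) {
	s := strings.TrimSpace(contents)
	if s == "" || s == "max" {
		return 0, false
	}
	v, err := strconv.ParseUint(s, 10, 64)
	if err != nil {
		return 0, false
	}
	return v, true
}

// effectiveMemory returns the smaller of the system memory total and the
// cgroup limit, falling back to the system total when there is no limit.
func effectiveMemory(systemTotal, cgroupLimit uint64, hasLimit bool) uint64 {
	if hasLimit && cgroupLimit < systemTotal {
		return cgroupLimit
	}
	return systemTotal
}

func main() {
	// Assumed path for a cgroups v2 setup; adjust for the actual hierarchy.
	const path = "/sys/fs/cgroup/memory.max"
	// Assumed 64 GiB host; a real worker would read this from /proc/meminfo.
	const systemTotal = 64 << 30

	limit, ok := uint64(0), false
	if data, err := os.ReadFile(path); err == nil {
		limit, ok = parseCgroupLimit(string(data))
	}
	fmt.Println(effectiveMemory(systemTotal, limit, ok))
}
```

Taking the minimum also covers the cgroups v1 case where "unlimited" is reported as a huge number, since that number is larger than any real system total.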
Report memory used and swap used in worker res
Reporting "memory used by other processes" in the MemReserved field does not account for the fact that the system's used memory includes memory used by ongoing tasks. To account for this properly, the worker should report the memory and swap in use. The scheduler, which knows the memory requirements of each task, can then determine whether sufficient memory is available for a task.
Commit c4f4617
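One way to picture the scheduler-side check is the sketch below. It is an assumption-laden illustration rather than the scheduler's actual logic: the `WorkerMemory` type, its field names, and the choice to take the larger of the worker-reported usage and the scheduler's own bookkeeping are all hypothetical. The point it shows is that reported usage already contains running tasks, so it must not simply be added to what the scheduler has committed.

```go
package main

import "fmt"

// WorkerMemory is a hypothetical view of what a worker reports, in bytes.
type WorkerMemory struct {
	Physical uint64 // total physical memory
	Used     uint64 // memory in use, including memory used by running tasks
	Swap     uint64 // total swap
	SwapUsed uint64 // swap in use
}

func maxU64(a, b uint64) uint64 {
	if a > b {
		return a
	}
	return b
}

// fits reports whether a task needing `need` bytes fits on the worker.
// `scheduled` is the memory the scheduler has already committed to tasks
// on this worker. Because the reported usage includes those tasks, the
// sketch takes the larger of the two figures instead of summing them.
func fits(w WorkerMemory, scheduled, need uint64) bool {
	total := w.Physical + w.Swap
	used := maxU64(w.Used+w.SwapUsed, scheduled)
	if used > total {
		return false
	}
	return need <= total-used
}

func main() {
	w := WorkerMemory{Physical: 64 << 30, Used: 40 << 30}
	fmt.Println(fits(w, 32<<30, 20<<30)) // 24 GiB free, 20 GiB needed
	fmt.Println(fits(w, 32<<30, 30<<30)) // 24 GiB free, 30 GiB needed
}
```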
Use a float to represent GPU utilization
Before this change, a worker could be allocated only one GPU task, regardless of how much of the GPU that task uses or how many GPUs are in the system. This change makes GPUUtilization a float, which can represent a task that needs a fraction of a GPU or multiple GPUs. GPUs are now accounted for like RAM and CPUs, so workers with more GPUs can be allocated more tasks. A known issue is that PC2 cannot use multiple GPUs: even if a worker has multiple GPUs and is allocated multiple PC2 tasks, those tasks will run only on the first GPU. This could cause unexpected behavior when a worker with multiple GPUs is assigned multiple PC2 tasks. It should not surprise existing users who upgrade, since anyone running workers with multiple GPUs should already know this and be running one worker per GPU for PC2. Those users are now free to set the GPU utilization of PC2 below one and effectively run multiple PC2 processes in a single worker. C2 can utilize multiple GPUs, and workers can now be customized for C2 accordingly.
Commit 93e4656
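The fractional accounting described above can be illustrated with a small sketch. The function name `gpuFits` and its parameters are hypothetical and chosen for this example. It only shows the idea of treating GPU capacity as a float budget, in the same way RAM and CPUs are budgeted.

```go
package main

import "fmt"

// gpuFits reports whether a task needing gpuNeed GPUs (possibly a fraction,
// possibly more than one) fits on a worker with gpuCount GPUs of which
// gpuUsed are already committed to other tasks.
// (Hypothetical helper; the names are not taken from the commit.)
func gpuFits(gpuCount int, gpuUsed, gpuNeed float64) bool {
	if gpuNeed <= 0 {
		return true // the task does not use a GPU
	}
	return gpuUsed+gpuNeed <= float64(gpuCount)
}

func main() {
	// Two half-GPU tasks can share a single GPU.
	fmt.Println(gpuFits(1, 0.5, 0.5)) // true
	// A full-GPU task does not fit next to a half-GPU task on one GPU.
	fmt.Println(gpuFits(1, 0.5, 1.0)) // false
	// A task needing two GPUs fits on an idle worker with two GPUs.
	fmt.Println(gpuFits(2, 0, 2.0)) // true
}
```

With a budget like this, setting the utilization of a task type below one lets several such tasks share a GPU, and setting it above one reserves multiple GPUs for a single task.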
Permit workers to override resource table
In an environment with heterogeneous worker nodes, a universal resource table for all workers does not allow effective scheduling of tasks. Some workers may have different proof cache settings, which changes the memory required for different tasks. Some workers may have a different number of CPUs per core complex, which changes the maximum parallelism of PC1. This change allows workers to customize these parameters with environment variables. For example, a worker could set the environment variable PC1_MIN_MEMORY to customize the minimum memory requirement for PC1 tasks. If no environment variables are specified, the resource table on the miner is used, except for PC1 parallelism: if PC1_MAX_PARALLELISM is not specified and FIL_PROOFS_USE_MULTICORE_SDR is set, PC1_MAX_PARALLELISM is automatically set to FIL_PROOFS_MULTICORE_SDR_PRODUCERS + 1.
Commit 4ef8543
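The override order described in this commit can be sketched as below. The environment variable names come from the commit message. Everything else is an assumption for illustration: the function names, the `lookupFunc` type, parsing values as plain decimal integers, treating any present value of FIL_PROOFS_USE_MULTICORE_SDR as "set", and falling back to the table default when FIL_PROOFS_MULTICORE_SDR_PRODUCERS is missing or unparsable.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// lookupFunc matches the signature of os.LookupEnv so that the logic can
// be exercised without touching the real environment.
type lookupFunc func(key string) (string, bool)

// pc1MinMemory returns PC1_MIN_MEMORY if it is set to a valid integer,
// otherwise the default from the miner's resource table.
func pc1MinMemory(lookup lookupFunc, tableDefault uint64) uint64 {
	if v, ok := lookup("PC1_MIN_MEMORY"); ok {
		if n, err := strconv.ParseUint(v, 10, 64); err == nil {
			return n
		}
	}
	return tableDefault
}

// pc1MaxParallelism applies the order from the commit message:
// an explicit PC1_MAX_PARALLELISM wins; otherwise, when multicore SDR is
// enabled, the value is the producer count plus one; otherwise the table
// default is used.
func pc1MaxParallelism(lookup lookupFunc, tableDefault int) int {
	if v, ok := lookup("PC1_MAX_PARALLELISM"); ok {
		if n, err := strconv.Atoi(v); err == nil {
			return n
		}
	}
	if _, ok := lookup("FIL_PROOFS_USE_MULTICORE_SDR"); ok {
		if v, ok := lookup("FIL_PROOFS_MULTICORE_SDR_PRODUCERS"); ok {
			if n, err := strconv.Atoi(v); err == nil {
				return n + 1
			}
		}
	}
	return tableDefault
}

func main() {
	// The defaults passed here are placeholders, not the real table values.
	fmt.Println(pc1MinMemory(os.LookupEnv, 56<<30))
	fmt.Println(pc1MaxParallelism(os.LookupEnv, 1))
}
```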
Commit 36868a8
Commit b961e1a
Commit c9a2ff4
Commit 6d52d85
Commit f25efec
Commit a597b07
Commit 001ecbb
Commit cf20b0b
worker: Typo in resources cmd usage
Co-authored-by: Aayush Rajasekaran <arajasek94@gmail.com>
Commit 330cfc3