Implement resource_dict for file executor #456
@@ -9,14 +9,19 @@
 )
 from executorlib.standalone.thread import RaisingThread

+try:
+    from executorlib.standalone.cache.queue import execute_with_pysqa
+except ImportError:
+    # If pysqa is not available fall back to executing tasks in a subprocess
+    execute_with_pysqa = execute_in_subprocess
+

 class FileExecutor(ExecutorBase):
     def __init__(
         self,
         cache_directory: str = "cache",
-        cores_per_worker: int = 1,
-        cwd: Optional[str] = None,
-        execute_function: callable = execute_in_subprocess,
+        resource_dict: Optional[dict] = None,
+        execute_function: callable = execute_with_pysqa,
         terminate_function: Optional[callable] = None,
         config_directory: Optional[str] = None,
         backend: Optional[str] = None,
@@ -26,14 +31,24 @@ def __init__(

         Args:
             cache_directory (str, optional): The directory to store cache files. Defaults to "cache".
+            resource_dict (dict): A dictionary of resources required by the task. With the following keys:
+                              - cores (int): number of MPI cores to be used for each function call
+                              - cwd (str/None): current working directory where the parallel python task is executed
Comment on lines +34 to +36

Remove documentation for deprecated parameters.

The docstring still contains entries for the removed parameters cores_per_worker and cwd. Apply this diff to fix the docstring:

     resource_dict (dict): A dictionary of resources required by the task. With the following keys:
                       - cores (int): number of MPI cores to be used for each function call
                       - cwd (str/None): current working directory where the parallel python task is executed
     execute_function (callable, optional): The function to execute tasks. Defaults to execute_in_subprocess.
-    cores_per_worker (int, optional): The number of CPU cores per worker. Defaults to 1.
     terminate_function (callable, optional): The function to terminate the tasks.
-    cwd (str, optional): current working directory where the parallel python task is executed

Also applies to: 33-34
             execute_function (callable, optional): The function to execute tasks. Defaults to execute_in_subprocess.
             cores_per_worker (int, optional): The number of CPU cores per worker. Defaults to 1.
             terminate_function (callable, optional): The function to terminate the tasks.
             cwd (str, optional): current working directory where the parallel python task is executed
             config_directory (str, optional): path to the config directory.
             backend (str, optional): name of the backend used to spawn tasks.
         """
         super().__init__()
+        default_resource_dict = {
+            "cores": 1,
+            "cwd": None,
+        }
+        if resource_dict is None:
+            resource_dict = {}
+        resource_dict.update(
+            {k: v for k, v in default_resource_dict.items() if k not in resource_dict}
+        )
Comment on lines +43 to +51

🛠️ Refactor suggestion

Consider simplifying the resource dictionary initialization.

While the current implementation is correct, it could be more concise. Consider this simpler implementation:

-        default_resource_dict = {
-            "cores": 1,
-            "cwd": None,
-        }
-        if resource_dict is None:
-            resource_dict = {}
-        resource_dict.update(
-            {k: v for k, v in default_resource_dict.items() if k not in resource_dict}
-        )
+        resource_dict = {
+            "cores": 1,
+            "cwd": None,
+            **(resource_dict or {}),
+        }
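As a quick sanity check on that suggestion, here is a minimal standalone sketch (merge_verbose and merge_concise are hypothetical names, not part of executorlib) showing that both forms produce the same merged dictionary:

    def merge_verbose(resource_dict):
        # Update-based merge, as in the current implementation.
        default_resource_dict = {"cores": 1, "cwd": None}
        if resource_dict is None:
            resource_dict = {}
        resource_dict.update(
            {k: v for k, v in default_resource_dict.items() if k not in resource_dict}
        )
        return resource_dict

    def merge_concise(resource_dict):
        # Unpacking-based merge; caller-supplied keys win because they come last.
        return {"cores": 1, "cwd": None, **(resource_dict or {})}

    assert merge_verbose(None) == merge_concise(None) == {"cores": 1, "cwd": None}
    assert merge_verbose({"cores": 4}) == merge_concise({"cores": 4}) == {"cores": 4, "cwd": None}

One behavioral difference worth noting: the update-based version mutates the dictionary passed in, while the unpacking version builds a new one and leaves the caller's argument untouched.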
         if execute_function == execute_in_subprocess and terminate_function is None:
             terminate_function = terminate_subprocess
         cache_directory_path = os.path.abspath(cache_directory)

@@ -45,8 +60,7 @@ def __init__(
                 "future_queue": self._future_queue,
                 "execute_function": execute_function,
                 "cache_directory": cache_directory_path,
-                "cores_per_worker": cores_per_worker,
-                "cwd": cwd,
+                "resource_dict": resource_dict,
                 "terminate_function": terminate_function,
                 "config_directory": config_directory,
                 "backend": backend,
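Taken together, the hunks above change how a FileExecutor is constructed. A usage sketch of the new interface, assembled from the tests in this PR (the function and the resource values are illustrative, and passing execute_in_subprocess sidesteps the pysqa-backed default):

    from executorlib import FileExecutor
    from executorlib.standalone.cache.spawner import execute_in_subprocess

    def my_funct(a, b):
        return a + b

    with FileExecutor(
        cache_directory="cache",
        resource_dict={"cores": 1, "cwd": None},  # replaces cores_per_worker/cwd
        execute_function=execute_in_subprocess,
    ) as exe:
        fs = exe.submit(my_funct, 1, b=2)
        print(fs.result())  # 3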
@@ -50,8 +50,7 @@ def execute_tasks_h5(
     future_queue: queue.Queue,
     cache_directory: str,
     execute_function: callable,
-    cores_per_worker: int = 1,
-    cwd: Optional[str] = None,
+    resource_dict: dict,
     terminate_function: Optional[callable] = None,
     config_directory: Optional[str] = None,
     backend: Optional[str] = None,
@@ -62,9 +61,10 @@ def execute_tasks_h5(
     Args:
         future_queue (queue.Queue): The queue containing the tasks.
         cache_directory (str): The directory to store the HDF5 files.
-        cores_per_worker (int): The number of cores per worker.
+        resource_dict (dict): A dictionary of resources required by the task. With the following keys:
+                          - cores (int): number of MPI cores to be used for each function call
+                          - cwd (str/None): current working directory where the parallel python task is executed
Comment on lines +64 to +66

🛠️ Refactor suggestion

Improve docstring formatting for resource_dict.

The docstring for resource_dict could be formatted more consistently. Consider applying this change:

-        resource_dict (dict): A dictionary of resources required by the task. With the following keys:
-                          - cores (int): number of MPI cores to be used for each function call
-                          - cwd (str/None): current working directory where the parallel python task is executed
+        resource_dict (dict): A dictionary of resources required by the task with the following keys:
+            - **cores** (int): Number of MPI cores to be used for each function call.
+            - **cwd** (Optional[str]): Current working directory where the parallel Python task is executed.
         execute_function (callable): The function to execute the tasks.
-        cwd (str/None): current working directory where the parallel python task is executed
         terminate_function (callable): The function to terminate the tasks.
         config_directory (str, optional): path to the config directory.
         backend (str, optional): name of the backend used to spawn tasks.
@@ -97,16 +97,15 @@ def execute_tasks_h5(
                 memory_dict=memory_dict,
                 file_name_dict=file_name_dict,
             )
-            resource_dict = task_dict["resource_dict"].copy()
-            if "cores" not in resource_dict:
-                resource_dict["cores"] = cores_per_worker
-            if "cwd" not in resource_dict:
-                resource_dict["cwd"] = cwd
+            task_resource_dict = task_dict["resource_dict"].copy()
+            task_resource_dict.update(
+                {k: v for k, v in resource_dict.items() if k not in task_resource_dict}
+            )
Comment on lines +100 to +103

Handle missing "resource_dict" key in task_dict.

At line 100, the code assumes that task_dict always contains a "resource_dict" key, which will raise a KeyError if it is missing. Apply this diff to safely handle cases where the key is absent:

-task_resource_dict = task_dict["resource_dict"].copy()
+task_resource_dict = task_dict.get("resource_dict", {}).copy()

Additionally, ensure that the subsequent code accounts for the possibility of an empty task_resource_dict.
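To make the per-task override semantics of the merge above concrete, here is a small self-contained sketch (the two dictionaries are stand-ins for the values flowing through execute_tasks_h5): keys submitted with the task win, and the executor-level resource_dict only fills the gaps.

    resource_dict = {"cores": 1, "cwd": None}   # executor-level defaults
    task_resource_dict = {"cores": 4}           # submitted with one task

    # Same merge as in the diff above: copy over only the executor-level
    # keys that the task did not set itself.
    task_resource_dict.update(
        {k: v for k, v in resource_dict.items() if k not in task_resource_dict}
    )
    assert task_resource_dict == {"cores": 4, "cwd": None}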
             task_key, data_dict = serialize_funct_h5(
                 fn=task_dict["fn"],
                 fn_args=task_args,
                 fn_kwargs=task_kwargs,
-                resource_dict=resource_dict,
+                resource_dict=task_resource_dict,
             )
             if task_key not in memory_dict.keys():
                 if task_key + ".h5out" not in os.listdir(cache_directory):
@@ -115,12 +114,12 @@ def execute_tasks_h5(
                     process_dict[task_key] = execute_function(
                         command=_get_execute_command(
                             file_name=file_name,
-                            cores=cores_per_worker,
+                            cores=task_resource_dict["cores"],
Ensure the "cores" key exists before it is accessed.

At line 117, accessing task_resource_dict["cores"] will raise a KeyError if the key is missing. Consider providing a default value or adding a check:

Option 1: Provide a default value for cores:

-                            cores=task_resource_dict["cores"],
+                            cores=task_resource_dict.get("cores", 1),

Option 2: Add a check and raise an explicit error if "cores" is missing:

    if "cores" not in task_resource_dict:
        raise KeyError("'cores' key is missing in task_resource_dict")
    cores = task_resource_dict["cores"]
                         ),
                         task_dependent_lst=[
                             process_dict[k] for k in future_wait_key_lst
                         ],
-                        resource_dict=resource_dict,
+                        resource_dict=task_resource_dict,
                         config_directory=config_directory,
                         backend=backend,
                     )
@@ -4,19 +4,19 @@
 import shutil
 import unittest

+from executorlib.standalone.cache.spawner import (
+    execute_in_subprocess,
+    terminate_subprocess,
+)
 from executorlib.standalone.thread import RaisingThread

 try:
     from executorlib import FileExecutor
     from executorlib.cache.shared import execute_tasks_h5
-    from executorlib.standalone.cache.spawner import (
-        execute_in_subprocess,
-        terminate_subprocess,
-    )

-    skip_h5io_test = False
+    skip_h5py_test = False
 except ImportError:
-    skip_h5io_test = True
+    skip_h5py_test = True


 def my_funct(a, b):
@@ -28,18 +28,18 @@ def list_files_in_working_directory():


 @unittest.skipIf(
-    skip_h5io_test, "h5io is not installed, so the h5io tests are skipped."
+    skip_h5py_test, "h5py is not installed, so the h5py tests are skipped."
 )
 class TestCacheExecutorSerial(unittest.TestCase):
     def test_executor_mixed(self):
-        with FileExecutor() as exe:
+        with FileExecutor(execute_function=execute_in_subprocess) as exe:
             fs1 = exe.submit(my_funct, 1, b=2)
             self.assertFalse(fs1.done())
             self.assertEqual(fs1.result(), 3)
             self.assertTrue(fs1.done())

     def test_executor_dependence_mixed(self):
-        with FileExecutor() as exe:
+        with FileExecutor(execute_function=execute_in_subprocess) as exe:
             fs1 = exe.submit(my_funct, 1, b=2)
             fs2 = exe.submit(my_funct, 1, b=fs1)
             self.assertFalse(fs2.done())
@@ -48,7 +48,9 @@ def test_executor_dependence_mixed(self):

     def test_executor_working_directory(self):
         cwd = os.path.join(os.path.dirname(__file__), "executables")
-        with FileExecutor(cwd=cwd) as exe:
+        with FileExecutor(
+            resource_dict={"cwd": cwd}, execute_function=execute_in_subprocess
+        ) as exe:
             fs1 = exe.submit(list_files_in_working_directory)
             self.assertEqual(fs1.result(), os.listdir(cwd))
@@ -72,7 +74,7 @@ def test_executor_function(self):
                 "future_queue": q,
                 "cache_directory": cache_dir,
                 "execute_function": execute_in_subprocess,
-                "cores_per_worker": 1,
+                "resource_dict": {"cores": 1, "cwd": None},
                 "terminate_function": terminate_subprocess,
             },
         )
@@ -113,7 +115,7 @@ def test_executor_function_dependence_kwargs(self):
                 "future_queue": q,
                 "cache_directory": cache_dir,
                 "execute_function": execute_in_subprocess,
-                "cores_per_worker": 1,
+                "resource_dict": {"cores": 1, "cwd": None},
🛠️ Refactor suggestion

Consider refactoring duplicate resource configurations.

The same resource configuration {"cores": 1, "cwd": None} is repeated across several tests. Consider defining it once, for example as a class-level default:

 class TestCacheExecutorSerial(unittest.TestCase):
+    @classmethod
+    def setUpClass(cls):
+        super().setUpClass()
+        cls.default_resource_dict = {"cores": 1, "cwd": None}

     def test_executor_function(self):
         # ... existing code ...
         kwargs={
             "future_queue": q,
             "cache_directory": cache_dir,
             "execute_function": execute_in_subprocess,
-            "resource_dict": {"cores": 1, "cwd": None},
+            "resource_dict": self.default_resource_dict,
             "terminate_function": terminate_subprocess,
         }

Also applies to: 157-157
                 "terminate_function": terminate_subprocess,
             },
         )

@@ -154,7 +156,7 @@ def test_executor_function_dependence_args(self):
                 "future_queue": q,
                 "cache_directory": cache_dir,
                 "execute_function": execute_in_subprocess,
-                "cores_per_worker": 1,
+                "resource_dict": {"cores": 1, "cwd": None},
                 "terminate_function": terminate_subprocess,
             },
         )
💡 Codebase verification

Documentation and code still contain references to old resource_dict keys.

Several files still use the old keys and need to be updated:

- notebooks/examples.ipynb: contains multiple references to cores_per_worker and gpus_per_worker in documentation and code examples
- README.md: uses cores_per_worker in documentation
- executorlib/__init__.py: still documents the parameters as cores_per_worker and gpus_per_worker in docstrings
- executorlib/standalone/inputcheck.py: contains a function and error messages using gpus_per_worker
- tests/test_shared_input_check.py: test cases use gpus_per_worker

While some files, such as executorlib/interactive/executor.py, have been updated to use the new cores and gpus_per_core keys, the old terminology still appears in many places, which could lead to confusion.

🔗 Analysis chain

Ensure all references to old resource_dict keys are updated.

The keys in resource_dict have been renamed from cores_per_worker to cores and from gpus_per_worker to gpus_per_core. Please verify that all references to the old keys have been updated throughout the codebase and documentation to prevent inconsistencies.
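The verification script the bot ran is not reproduced here; as a rough stand-in, a scan along the following lines would surface the remaining references (the scanned file extensions are an assumption about the repository layout):

    import pathlib

    OLD_KEYS = ("cores_per_worker", "gpus_per_worker")
    for path in pathlib.Path(".").rglob("*"):
        if path.is_file() and path.suffix in {".py", ".md", ".ipynb"}:
            text = path.read_text(errors="ignore")
            for key in OLD_KEYS:
                if key in text:
                    print(f"{path}: still references {key!r}")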