Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

migrate folder to use cluster folder apis where relevant #1116

Merged
merged 68 commits into from
Aug 12, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
68 commits
Select commit Hold shift + click to select a range
bc875fc
add folder http endpoint
jlewitt1 Aug 5, 2024
02473ff
enforce write only access to folder http endpoint
jlewitt1 Aug 5, 2024
e829646
add folder http endpoint
jlewitt1 Aug 5, 2024
4da0c4d
Merge branch 'folder-http-endpoint' into folder-endpoint-access
jlewitt1 Aug 6, 2024
6e33577
add folder methods to rh.cluster
jlewitt1 Aug 5, 2024
ba45073
add folder methods to rh.cluster
jlewitt1 Aug 5, 2024
d1b4d98
cluster folder API tests
jlewitt1 Aug 6, 2024
4cf2141
add folder http endpoint
jlewitt1 Aug 5, 2024
0d24ea4
add folder http endpoint
jlewitt1 Aug 5, 2024
1722452
enforce write only access to folder http endpoint
jlewitt1 Aug 5, 2024
39dfcd3
add folder methods to rh.cluster
jlewitt1 Aug 5, 2024
b64671c
add folder methods to rh.cluster
jlewitt1 Aug 5, 2024
cd16185
Merge branch 'cluster-folder-methods' into test-cluster-folder-methods
jlewitt1 Aug 6, 2024
111c97f
migrate folder to use cluster folder apis where relevant
jlewitt1 Aug 6, 2024
314dcb3
add folder http endpoint
jlewitt1 Aug 5, 2024
ae3127d
add folder http endpoint
jlewitt1 Aug 5, 2024
03640e4
enforce write only access to folder http endpoint
jlewitt1 Aug 5, 2024
d88da6c
add folder methods to rh.cluster
jlewitt1 Aug 5, 2024
f0bbccb
add folder http endpoint
jlewitt1 Aug 5, 2024
41f5e30
enforce write only access to folder http endpoint
jlewitt1 Aug 5, 2024
3b8ceec
add folder methods to rh.cluster
jlewitt1 Aug 5, 2024
6897ffc
add folder http endpoint
jlewitt1 Aug 5, 2024
2949eae
add folder http endpoint
jlewitt1 Aug 5, 2024
1356dc1
add folder methods to rh.cluster
jlewitt1 Aug 5, 2024
47b7e7f
add folder methods to rh.cluster
jlewitt1 Aug 5, 2024
4039885
cluster folder API tests
jlewitt1 Aug 6, 2024
61ca459
add folder methods to rh.cluster
jlewitt1 Aug 5, 2024
9794d5a
add folder http endpoint
jlewitt1 Aug 5, 2024
e1fceb0
migrate folder to use cluster folder apis where relevant
jlewitt1 Aug 6, 2024
025839a
add folder http endpoint
jlewitt1 Aug 5, 2024
162b2fa
add folder http endpoint
jlewitt1 Aug 5, 2024
a37cdf6
add folder methods to rh.cluster
jlewitt1 Aug 5, 2024
a4c7c31
cluster folder API tests
jlewitt1 Aug 6, 2024
a0f1595
add folder methods to rh.cluster
jlewitt1 Aug 5, 2024
8583d38
Merge branch 'cluster-folder-methods' into test-cluster-folder-methods
jlewitt1 Aug 6, 2024
dd1312d
Merge branch 'folder-http-endpoint' into folder-endpoint-access
jlewitt1 Aug 6, 2024
23ac500
Merge branch 'folder-http-endpoint' into cluster-folder-methods
jlewitt1 Aug 6, 2024
4474b1e
Merge branch 'folder-endpoint-access' into cluster-folder-methods
jlewitt1 Aug 6, 2024
02bed52
Merge branch 'cluster-folder-methods' into test-cluster-folder-methods
jlewitt1 Aug 6, 2024
9189969
Merge branch 'test-cluster-folder-methods' into use-cluster-folder-apis
jlewitt1 Aug 6, 2024
a573fd2
Merge branch 'cluster-folder-methods' into test-cluster-folder-methods
jlewitt1 Aug 6, 2024
5e8e668
Merge remote-tracking branch 'origin/use-cluster-folder-apis' into us…
jlewitt1 Aug 6, 2024
d13f270
Merge branch 'test-cluster-folder-methods' into use-cluster-folder-apis
jlewitt1 Aug 6, 2024
ca83d03
add folder http endpoint
jlewitt1 Aug 5, 2024
59f0594
Merge branch 'folder-http-endpoint' into folder-endpoint-access
jlewitt1 Aug 8, 2024
a69c37c
Merge branch 'folder-endpoint-access' into cluster-folder-methods
jlewitt1 Aug 8, 2024
08a71e3
Merge branch 'cluster-folder-methods' into test-cluster-folder-methods
jlewitt1 Aug 8, 2024
f0340ec
Merge branch 'test-cluster-folder-methods' into use-cluster-folder-apis
jlewitt1 Aug 8, 2024
dbf99be
Merge branch 'cluster-folder-methods' into test-cluster-folder-methods
jlewitt1 Aug 8, 2024
5c0a60d
Merge branch 'test-cluster-folder-methods' into use-cluster-folder-apis
jlewitt1 Aug 8, 2024
bbd72e7
add folder http endpoint
jlewitt1 Aug 5, 2024
092875f
Merge branch 'folder-http-endpoint' into folder-endpoint-access
jlewitt1 Aug 10, 2024
5cb7f7e
Merge branch 'folder-endpoint-access' into cluster-folder-methods
jlewitt1 Aug 10, 2024
f1125f6
Merge branch 'cluster-folder-methods' into test-cluster-folder-methods
jlewitt1 Aug 10, 2024
e7d67eb
Merge branch 'test-cluster-folder-methods' into use-cluster-folder-apis
jlewitt1 Aug 10, 2024
e5e16e7
add folder http endpoint
jlewitt1 Aug 5, 2024
bfe2bc9
Merge branch 'folder-http-endpoint' into folder-endpoint-access
jlewitt1 Aug 10, 2024
7f31f51
Merge branch 'folder-endpoint-access' into cluster-folder-methods
jlewitt1 Aug 10, 2024
e1f5a09
Merge branch 'cluster-folder-methods' into test-cluster-folder-methods
jlewitt1 Aug 10, 2024
b9775d6
Merge branch 'cluster-folder-methods' into use-cluster-folder-apis
jlewitt1 Aug 10, 2024
667f5a5
Merge branch 'cluster-folder-methods' into test-cluster-folder-methods
jlewitt1 Aug 10, 2024
ebb8cd7
Merge branch 'cluster-folder-methods' into use-cluster-folder-apis
jlewitt1 Aug 10, 2024
fbe82a6
Merge branch 'folder-http-endpoint' into folder-endpoint-access
jlewitt1 Aug 10, 2024
88e36d7
Merge branch 'folder-endpoint-access' into cluster-folder-methods
jlewitt1 Aug 11, 2024
39dce2e
Merge branch 'cluster-folder-methods' into use-cluster-folder-apis
jlewitt1 Aug 11, 2024
95046a0
Merge branch 'folder-endpoint-access' into cluster-folder-methods
jlewitt1 Aug 11, 2024
fd26926
Merge branch 'cluster-folder-methods' into use-cluster-folder-apis
jlewitt1 Aug 12, 2024
afc8b31
Merge remote-tracking branch 'origin/main' into use-cluster-folder-apis
jlewitt1 Aug 12, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
230 changes: 141 additions & 89 deletions runhouse/resources/folders/folder.py
Original file line number Diff line number Diff line change
Expand Up @@ -131,6 +131,11 @@ def local_path(self):
else:
return None

@property
def _use_http_endpoint(self):
"""Whether to use system APIs to perform folder operations on the cluster via HTTP."""
return isinstance(self.system, Cluster) and not self.system.on_this_cluster()

def mv(self, system, path: Optional[str] = None, overwrite: bool = True) -> None:
"""Move the folder to a new filesystem or cluster.

Expand All @@ -142,31 +147,39 @@ def mv(self, system, path: Optional[str] = None, overwrite: bool = True) -> None
if path is None:
raise ValueError("A destination path must be specified.")

dest_path = Path(path).expanduser()
src_path = Path(self.path).expanduser()
src_path = Path()
dest_path = Path(path)

if not src_path.exists():
raise FileNotFoundError(f"The source path {src_path} does not exist.")
if self._use_http_endpoint:
self.system._folder_mv(
path=src_path, dest_path=dest_path, overwrite=overwrite
)
else:
dest_path = dest_path.expanduser()
src_path = src_path.expanduser()

if system == self.DEFAULT_FS:
if dest_path.exists():
raise FileExistsError(
f"The destination path {dest_path} already exists."
)
if not src_path.exists():
raise FileNotFoundError(f"The source path {src_path} does not exist.")

if system == self.DEFAULT_FS:
if dest_path.exists():
raise FileExistsError(
f"The destination path {dest_path} already exists."
)

# Create the destination directory if it doesn't exist
dest_path.parent.mkdir(parents=True, exist_ok=overwrite)
# Create the destination directory if it doesn't exist
dest_path.parent.mkdir(parents=True, exist_ok=overwrite)

# Move the directory
shutil.move(str(src_path), str(dest_path))
# Move the directory
shutil.move(str(src_path), str(dest_path))

# Update the path attribute
self.path = str(dest_path)
self.system = self.DEFAULT_FS
# Update the path attribute
self.path = str(dest_path)
self.system = self.DEFAULT_FS

else:
# TODO [JL] support moving to other systems
raise NotImplementedError(f"System {system} not supported for local mv")
else:
# TODO [JL] support moving to other systems
raise NotImplementedError(f"System {system} not supported for local mv")

def to(
self,
Expand Down Expand Up @@ -287,10 +300,14 @@ def _to_data_store(

def mkdir(self):
"""Create the folder in specified file system if it doesn't already exist."""
path = Path(self.path).expanduser()
if not path.exists():
path.mkdir(parents=True, exist_ok=True)
logger.info(f"Folder created in path: {path}")
if self._use_http_endpoint:
self.system._folder_mkdir(self.path)
else:
path = Path(self.path).expanduser()
if not path.exists():
path.mkdir(parents=True, exist_ok=True)

logger.debug(f"Folder created in path: {self.path}")

def _to_cluster(self, dest_cluster, path=None):
"""Copy the folder from a file or cluster source onto a destination cluster."""
Expand Down Expand Up @@ -512,35 +529,52 @@ def ls(self, full_paths: bool = True, sort: bool = False) -> List:
sort (Optional[bool]): Whether to sort the folder contents by time modified.
Defaults to ``False``.
"""
paths = [p for p in Path(self.path).expanduser().iterdir()]
if self._use_http_endpoint:
self.system._folder_ls(self.path)
else:
path = Path(self.path).expanduser()
paths = [p for p in path.iterdir()]

# Sort the paths by modification time if sort is True
if sort:
paths.sort(key=lambda p: p.stat().st_mtime, reverse=True)
# Sort the paths by modification time if sort is True
if sort:
paths.sort(key=lambda p: p.stat().st_mtime, reverse=True)

# Convert paths to strings and format them based on full_paths
if full_paths:
return [str(p.resolve()) for p in paths]
else:
return [p.name for p in paths]
# Convert paths to strings and format them based on full_paths
if full_paths:
return [str(p.resolve()) for p in paths]
else:
return [p.name for p in paths]

def resources(self, full_paths: bool = False):
"""List the resources in the *RNS* folder.
"""List the resources in the folder.

Example:
>>> resources = my_folder.resources()
"""
try:
resources = [
path for path in self.ls() if (Path(path) / "config.json").exists()
]
except FileNotFoundError:
return []

if full_paths:
return [self.rns_address + "/" + Path(path).stem for path in resources]
else:
return [Path(path).stem for path in resources]
if self._use_http_endpoint:
paths = self.system._folder_ls(path=self.path, full_paths=full_paths)
resources = [
path
for path in paths
if self.system._folder_exists(path=f"{path}/config.json")
]
return resources
else:
resources = [
path for path in self.ls() if (Path(path) / "config.json").exists()
]
if full_paths:
return [
self.rns_address + "/" + Path(path).stem for path in resources
]
else:
return [Path(path).stem for path in resources]

except Exception as e:
logger.error(f"Error listing resources in folder: {e}")
return []

@property
def rns_address(self):
Expand Down Expand Up @@ -667,6 +701,11 @@ def get(self, name, mode="rb", encoding=None):
Example:
>>> contents = my_folder.get(file_name)
"""
if self._use_http_endpoint:
return self.system._folder_get(
path=Path(self.path) / name, mode=mode, encoding=encoding
)

with self.open(name, mode=mode, encoding=encoding) as f:
return f.read()

Expand All @@ -682,8 +721,11 @@ def exists_in_system(self):
Example:
>>> exists_on_system = my_folder.exists_in_system()
"""
full_path = Path(self.path).expanduser()
return full_path.exists() and full_path.is_dir()
if self._use_http_endpoint:
return self.system._folder_exists(self.path)
else:
full_path = Path(self.path).expanduser()
return full_path.exists() and full_path.is_dir()

def rm(self, contents: list = None, recursive: bool = True):
"""Delete a folder from the file system. Optionally provide a list of folder contents to delete.
Expand All @@ -696,24 +738,26 @@ def rm(self, contents: list = None, recursive: bool = True):
Example:
>>> my_folder.rm()
"""
folder_path = Path(self.path).expanduser()

if contents:
Folder._delete_contents(contents, folder_path, recursive)
if self._use_http_endpoint:
self.system._folder_rm(self.path, contents, recursive)
else:
if recursive:
shutil.rmtree(folder_path)
folder_path = Path(self.path).expanduser()
if contents:
jlewitt1 marked this conversation as resolved.
Show resolved Hide resolved
Folder._delete_contents(contents, folder_path, recursive)
else:
if folder_path.is_dir():
for item in folder_path.iterdir():
if item.is_file():
item.unlink()
else:
raise ValueError(
f"Folder {item} found in {folder_path}, recursive is set to False"
)
if recursive:
shutil.rmtree(folder_path)
else:
folder_path.unlink()
if folder_path.is_dir():
for item in folder_path.iterdir():
if item.is_file():
item.unlink()
else:
raise ValueError(
f"Folder {item} found in {folder_path}, recursive is set to False"
)
else:
folder_path.unlink()

def put(self, contents, overwrite=False, mode: str = "wb"):
"""Put given contents in folder.
Expand All @@ -729,47 +773,55 @@ def put(self, contents, overwrite=False, mode: str = "wb"):
Example:
>>> my_folder.put(contents={"filename.txt": data})
"""
self.mkdir()

full_path = str(Path(self.path).expanduser())
# Handle lists of resources just for convenience
if isinstance(contents, list):
for resource in contents:
self.put(resource, overwrite=overwrite)
return

if isinstance(contents, Folder):
if contents.path is None: # Should only be the case when Folder is created
contents.path = os.path.join(full_path, contents.name)
return

if not isinstance(contents, dict):
raise TypeError(
"`contents` argument must be a dict mapping filenames to file-like objects"
if self._use_http_endpoint:
self.system._folder_put(
path=self.path, contents=contents, overwrite=overwrite, mode=mode
)

if overwrite is False:
# Check if files exist and raise an error if they do
existing_files = set(os.listdir(full_path))
intersection = existing_files.intersection(set(contents.keys()))
if intersection:
raise FileExistsError(
f"File(s) {intersection} already exist(s) at path: {full_path}. "
f"Cannot save them with overwrite={overwrite}."
else:
self.mkdir()
full_path = str(Path(self.path).expanduser())
if isinstance(contents, Folder):
if (
contents.path is None
): # Should only be the case when Folder is created
contents.path = os.path.join(full_path, contents.name)
return

if not isinstance(contents, dict):
raise TypeError(
"`contents` argument must be a dict mapping filenames to file-like objects"
)

for filename, file_obj in contents.items():
file_obj = self._serialize_file_obj(file_obj)
file_path = Path(full_path) / filename
if not overwrite and file_path.exists():
raise FileExistsError(f"File {file_path} already exists.")
if overwrite is False:
# Check if files exist and raise an error if they do
existing_files = set(os.listdir(full_path))
intersection = existing_files.intersection(set(contents.keys()))
if intersection:
raise FileExistsError(
f"File(s) {intersection} already exist(s) at path: {full_path}. "
f"Cannot save them with overwrite={overwrite}."
)

try:
with open(file_path, mode) as f:
f.write(file_obj)
for filename, file_obj in contents.items():
file_obj = self._serialize_file_obj(file_obj)
file_path = Path(full_path) / filename
if not overwrite and file_path.exists():
raise FileExistsError(f"File {file_path} already exists.")

except Exception as e:
raise RuntimeError(f"Failed to write {filename} to {file_path}: {e}")
try:
with open(file_path, mode) as f:
f.write(file_obj)

except Exception as e:
raise RuntimeError(
f"Failed to write {filename} to {file_path}: {e}"
)

@staticmethod
def _serialize_file_obj(file_obj):
Expand Down
2 changes: 1 addition & 1 deletion runhouse/servers/http/http_client.py
Original file line number Diff line number Diff line change
Expand Up @@ -312,8 +312,8 @@ def folder_put(
self,
path: Union[str, Path],
contents: Union[Dict[str, Any], Resource, List[Resource]],
mode: str,
overwrite: bool,
mode: str,
serialization: str,
):
folder_params = FolderPutParams(
Expand Down
7 changes: 5 additions & 2 deletions runhouse/servers/http/http_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -108,8 +108,8 @@ class FolderGetParams(FolderParams):

class FolderPutParams(FolderParams):
contents: Optional[Any]
mode: Optional[str] = None
overwrite: Optional[bool] = False
mode: Optional[str] = None
serialization: Optional[str] = None


Expand Down Expand Up @@ -429,7 +429,10 @@ def folder_put(
with open(file_path, mode) as f:
f.write(file_obj)
except Exception as e:
HTTPException(status_code=500, detail=f"Failed to write file: {str(e)}")
HTTPException(
status_code=500,
detail=f"Failed to write file with mode '{mode}': {str(e)}",
)

return Response(output_type=OutputType.SUCCESS)

Expand Down
3 changes: 1 addition & 2 deletions tests/test_resources/test_clusters/test_cluster.py
Original file line number Diff line number Diff line change
Expand Up @@ -949,8 +949,7 @@ def test_cluster_mkdir_mv_and_rm(self, cluster):
# Delete folder contents and directory itself
cluster._folder_rm(path="~/new-folder", recursive=True)

folder_contents: list = cluster._folder_ls(path="~")
assert "new-folder" not in [os.path.basename(f) for f in folder_contents]
assert not cluster._folder_exists(path="~/new-folder")

@pytest.mark.level("release")
@pytest.mark.clustertest
Expand Down
Loading