apacheGH-38704: [C++] Implement Azure FileSystem Move() via Azure DataLake Storage Gen 2 API (apache#39904)

### Rationale for this change

We need to move directories and files via the `arrow::FileSystem` interface.

### What changes are included in this PR?

 - A few filesystem error reporting improvements
 - A helper class to deal with Azure Storage leases [1]
 - The `Move()` implementation that can move files and directories within the same container on storage accounts with Hierarchical Namespace Support enabled
 - Lots of tests

[1]: https://learn.microsoft.com/en-us/rest/api/storageservices/lease-blob

### Are these changes tested?

Yes, by existing tests and by a large number of new tests added in this PR. The test code introduced here should be extracted into a reusable test module that we can use to test `Move()` in other file system implementations.

### Are there any user-facing changes?

No breaking changes, only new functionality.
* Closes: apache#38704

Authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
Signed-off-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
felipecrv authored and dgreiss committed Feb 17, 2024
1 parent cfd615d commit b8b20b6
Showing 7 changed files with 1,216 additions and 53 deletions.
725 changes: 695 additions & 30 deletions cpp/src/arrow/filesystem/azurefs.cc

Large diffs are not rendered by default.

19 changes: 19 additions & 0 deletions cpp/src/arrow/filesystem/azurefs.h
@@ -210,6 +210,25 @@ class ARROW_EXPORT AzureFileSystem : public FileSystem {

  Status DeleteFile(const std::string& path) override;

  /// \brief Move / rename a file or directory.
  ///
  /// There are no files immediately at the root directory, so paths like
  /// "/segment" always refer to a container of the storage account and are
  /// treated as directories.
  ///
  /// If `dest` exists but the operation fails for some reason, `Move`
  /// guarantees `dest` is not lost.
  ///
  /// Conditions for a successful move:
  /// 1. `src` must exist.
  /// 2. `dest` can't contain a strict path prefix of `src`. More generally,
  ///    a directory can't be made a subdirectory of itself.
  /// 3. If `dest` already exists and it's a file, `src` must also be a file.
  ///    `dest` is then replaced by `src`.
  /// 4. All components of `dest` must exist, except for the last.
  /// 5. If `dest` already exists and it's a directory, `src` must also be a
  ///    directory and `dest` must be empty. `dest` is then replaced by `src`
  ///    and its contents.
  Status Move(const std::string& src, const std::string& dest) override;

  Status CopyFile(const std::string& src, const std::string& dest) override;
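
The five conditions documented for `Move` can be read as a precondition check. The following is a minimal sketch over a toy in-memory tree, not the code in `azurefs.cc`: `ToyTree`, `MoveCheck`, `CheckMove` and the helper functions are hypothetical names introduced for illustration. It assumes paths without leading or trailing slashes, reads condition 2 as "`src` must not be a strict ancestor of `dest`", and does not special-case `src == dest`, which the doc comment does not cover.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical toy model (not Arrow code): path -> true for a directory,
// false for a file.
using ToyTree = std::map<std::string, bool>;

enum class MoveCheck {
  kOk,
  kSrcNotFound,         // violates condition 1
  kIntoItself,          // violates condition 2
  kSrcIsDirDestIsFile,  // violates condition 3
  kDestParentMissing,   // violates condition 4
  kSrcIsFileDestIsDir,  // violates condition 5 (src must be a directory)
  kDestNotEmpty         // violates condition 5 (dest must be empty)
};

// Parent of "a/b/c" is "a/b"; a single segment has an empty parent (the root).
std::string Parent(const std::string& path) {
  const auto pos = path.rfind('/');
  return pos == std::string::npos ? std::string() : path.substr(0, pos);
}

// True if `path` lies strictly below `ancestor`, e.g. "a" and "a/b".
bool IsStrictAncestor(const std::string& ancestor, const std::string& path) {
  return path.size() > ancestor.size() + 1 &&
         path.compare(0, ancestor.size(), ancestor) == 0 &&
         path[ancestor.size()] == '/';
}

// True if any entry of the tree lives below `dir`.
bool HasChildren(const ToyTree& tree, const std::string& dir) {
  const std::string prefix = dir + "/";
  const auto it = tree.lower_bound(prefix);
  return it != tree.end() && it->first.compare(0, prefix.size(), prefix) == 0;
}

MoveCheck CheckMove(const ToyTree& tree, const std::string& src,
                    const std::string& dest) {
  assert(!src.empty() && !dest.empty());  // the toy model has no root entry
  const auto src_it = tree.find(src);
  if (src_it == tree.end()) return MoveCheck::kSrcNotFound;
  if (IsStrictAncestor(src, dest)) return MoveCheck::kIntoItself;
  // In a consistent tree, checking the immediate parent covers all ancestors.
  const std::string parent = Parent(dest);
  if (!parent.empty()) {
    const auto parent_it = tree.find(parent);
    if (parent_it == tree.end() || !parent_it->second) {
      return MoveCheck::kDestParentMissing;
    }
  }
  const auto dest_it = tree.find(dest);
  if (dest_it == tree.end()) return MoveCheck::kOk;
  const bool src_is_dir = src_it->second;
  if (!dest_it->second) {  // dest is an existing file
    return src_is_dir ? MoveCheck::kSrcIsDirDestIsFile : MoveCheck::kOk;
  }
  if (!src_is_dir) return MoveCheck::kSrcIsFileDestIsDir;
  return HasChildren(tree, dest) ? MoveCheck::kDestNotEmpty : MoveCheck::kOk;
}

// Small example tree used to exercise the rules.
ToyTree MakeDemoTree() {
  return ToyTree{{"data", true},           {"data/a.txt", false},
                 {"data/b.txt", false},    {"data/sub", true},
                 {"data/sub/c.txt", false}, {"data/empty", true}};
}
```

With the demo tree, `CheckMove(tree, "data/sub", "data/empty")` passes because the destination directory is empty, while `CheckMove(tree, "data/empty", "data/sub")` is rejected as not empty. The real implementation also has to deal with containers, leases and service errors, which this model ignores.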
499 changes: 476 additions & 23 deletions cpp/src/arrow/filesystem/azurefs_test.cc

Large diffs are not rendered by default.

10 changes: 10 additions & 0 deletions cpp/src/arrow/filesystem/util_internal.cc
@@ -64,11 +64,21 @@ Status PathNotFound(std::string_view path) {
      .WithDetail(StatusDetailFromErrno(ENOENT));
}

Status IsADir(std::string_view path) {
  return Status::IOError("Is a directory: '", path, "'")
      .WithDetail(StatusDetailFromErrno(EISDIR));
}

Status NotADir(std::string_view path) {
  return Status::IOError("Not a directory: '", path, "'")
      .WithDetail(StatusDetailFromErrno(ENOTDIR));
}

Status NotEmpty(std::string_view path) {
  return Status::IOError("Directory not empty: '", path, "'")
      .WithDetail(StatusDetailFromErrno(ENOTEMPTY));
}

Status NotAFile(std::string_view path) {
  return Status::IOError("Not a regular file: '", path, "'");
}
6 changes: 6 additions & 0 deletions cpp/src/arrow/filesystem/util_internal.h
@@ -43,9 +43,15 @@ Status CopyStream(const std::shared_ptr<io::InputStream>& src,
ARROW_EXPORT
Status PathNotFound(std::string_view path);

ARROW_EXPORT
Status IsADir(std::string_view path);

ARROW_EXPORT
Status NotADir(std::string_view path);

ARROW_EXPORT
Status NotEmpty(std::string_view path);

ARROW_EXPORT
Status NotAFile(std::string_view path);

7 changes: 7 additions & 0 deletions cpp/src/arrow/util/io_util.cc
@@ -449,6 +449,13 @@ std::shared_ptr<StatusDetail> StatusDetailFromErrno(int errnum) {
  return std::make_shared<ErrnoDetail>(errnum);
}

std::optional<int> ErrnoFromStatusDetail(const StatusDetail& detail) {
  if (detail.type_id() == kErrnoDetailTypeId) {
    return checked_cast<const ErrnoDetail&>(detail).errnum();
  }
  return std::nullopt;
}

#if _WIN32
std::shared_ptr<StatusDetail> StatusDetailFromWinError(int errnum) {
  if (!errnum) {
3 changes: 3 additions & 0 deletions cpp/src/arrow/util/io_util.h
@@ -23,6 +23,7 @@

#include <atomic>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>
@@ -264,6 +265,8 @@ std::string WinErrorMessage(int errnum);

ARROW_EXPORT
std::shared_ptr<StatusDetail> StatusDetailFromErrno(int errnum);
ARROW_EXPORT
std::optional<int> ErrnoFromStatusDetail(const StatusDetail& detail);
#if _WIN32
ARROW_EXPORT
std::shared_ptr<StatusDetail> StatusDetailFromWinError(int errnum);
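
`ErrnoFromStatusDetail` recovers the errno value from a detail object only when the detail's type tag matches, and returns `std::nullopt` otherwise. The sketch below illustrates that pattern with stand-in types so it compiles without Arrow: `ToyDetail`, `ToyErrnoDetail`, `ToyOtherDetail`, `ToyErrnoFromDetail` and `IsNotEmptyError` are hypothetical names, not Arrow's classes, and the positive-errno assertion is an assumption of the toy only.

```cpp
#include <cassert>
#include <cerrno>
#include <optional>

// Stand-in for a polymorphic status detail (NOT arrow::StatusDetail).
class ToyDetail {
 public:
  virtual ~ToyDetail() = default;
  // Identifies the concrete detail type; compared by address, not by content.
  virtual const char* type_id() const = 0;
};

// Arrays give each tag a unique address, so pointer comparison is reliable.
constexpr char kToyErrnoTypeId[] = "toy::ErrnoDetail";
constexpr char kToyOtherTypeId[] = "toy::OtherDetail";

// Detail that carries an errno value.
class ToyErrnoDetail : public ToyDetail {
 public:
  explicit ToyErrnoDetail(int errnum) : errnum_(errnum) {
    assert(errnum > 0);  // toy invariant: only real error codes are stored
  }
  const char* type_id() const override { return kToyErrnoTypeId; }
  int errnum() const { return errnum_; }

 private:
  int errnum_;
};

// Some unrelated detail type that carries no errno.
class ToyOtherDetail : public ToyDetail {
 public:
  const char* type_id() const override { return kToyOtherTypeId; }
};

// Same shape as the function in the diff: check the tag, then downcast.
std::optional<int> ToyErrnoFromDetail(const ToyDetail& detail) {
  if (detail.type_id() == kToyErrnoTypeId) {
    return static_cast<const ToyErrnoDetail&>(detail).errnum();
  }
  return std::nullopt;
}

// Example caller: was the failure a "directory not empty" error?
bool IsNotEmptyError(const ToyDetail* detail) {
  if (detail == nullptr) return false;
  return ToyErrnoFromDetail(*detail) == ENOTEMPTY;
}
```

A caller holding a failed status could branch on the recovered errno in this way, for example to tell the `ENOTEMPTY` attached by `NotEmpty()` apart from the `ENOENT` attached by `PathNotFound()`, without parsing the error message.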