Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Fixing extra download for filesystem's localization #254

Draft
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

oandreeva-nv
Copy link
Contributor

To avoid unnecessary copy during localization, I added fetch_subdir parameter to LocalizePath: If specified, will only download provided sub directory, otherwise all subdirectories will be downloaded. Does not affect files individual files, located under path.

[WIP] tag due to the following. When we agree on the proposed solution, I will extend it to GCS and AS.

What was seen before:

root@oandreeva-dt:/opt/tritonserver# tree /tmp
/tmp
|-- folderDkrpXO
|   |-- 1
|   |   `-- model.pt
|   |-- 2
|   |   `-- model.pt
|   |-- 3
|   |   `-- model.pt
|   |-- config.pbtxt
|   `-- output0_labels.txt
|-- folderVuS3x0
|   |-- 1
|   |   `-- model.graphdef
|   |-- 2
|   |   `-- model.graphdef
|   |-- 3
|   |   `-- model.graphdef
|   |-- config.pbtxt
|   `-- output0_labels.txt
|-- folderfQ1yf3
|   |-- 1
|   |   `-- model.pt
|   |-- 2
|   |   `-- model.pt
|   |-- 3
|   |   `-- model.pt
|   |-- config.pbtxt
|   `-- output0_labels.txt
`-- folderwxZ6hr
    |-- 1
    |   `-- model.graphdef
    |-- 2
    |   `-- model.graphdef
    |-- 3
    |   `-- model.graphdef
    |-- config.pbtxt
    `-- output0_labels.txt

With this fix:

    root@oandreeva-dt:/opt/tritonserver# tree /tmp
/tmp
|-- folder9sfzV0
|   |-- 1
|   |   `-- model.pt
|   |-- config.pbtxt
|   `-- output0_labels.txt
|-- folderHBt0pT
|   |-- 3
|   |   `-- model.pt
|   |-- config.pbtxt
|   `-- output0_labels.txt
|-- folderZbfFyR
|   |-- 3
|   |   `-- model.graphdef
|   |-- config.pbtxt
|   `-- output0_labels.txt
|-- folderryK9pq
|   |-- 1
|   |   `-- model.graphdef
|   |-- config.pbtxt
|   `-- output0_labels.txt

Comment on lines 146 to 156
/// Create an object representing a local copy of a path.
/// \param path The path of the directory or file.
/// \param fetch_subdir If specified, will only download provided
/// sub directory, otherwise all subdirectories will be downloaded.
/// Does not affect files individual files, located under `path`.
/// \param localized Returns the LocalizedPath object
/// representing the local copy of the path.
/// \return Error status
Status LocalizePath(
const std::string& path, std::shared_ptr<LocalizedPath>* localized);
const std::string& path, const std::string& fetch_subdir,
std::shared_ptr<LocalizedPath>* localized);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the interface is insufficiently generic for functions at the level of LocalizePath(), because LocalizePath() should be available to any callers who want to localize any path on the cloud, but fetch_subdir imposes a limitation on the caller to either fetch all sub-directories within the path or only the sub-directory specified. The caller may not "choose" which sub-directories to fetch or not fetch. let alone choosing sub-directories within a sub-directory.

I think there are many different generic design choices. For example:

  1. Use bool recursive instead of string fetch_subdir. The caller can choose not fetching any sub-directories with recursive = false or all sub-directories with recursive = true. Then, the problem can be accomplished by first fetching the given model path with recursive = false (i.e. /repo/model) and then the model path with version with recursive = true (i.e. /repo/model/1). However, this will require the LocalizedPath object to be able to join the model path and model version path localization objects into one localization object.
  2. Allow the caller to specify a list of sub-paths to localize (or not localize). The "complicated" structure of sub-paths allows for fetching different combinations of sub-directories in a single call, which eliminates the need for joining model path and model version path localization objects, but the "complicated" structure is less intuitive for the caller to use and may not be easier to implement than the first design choice.

However, the interface accomplishes exactly what we want to solve at this time. I will defer to @GuanLuo and @nnshah1 to decide.

Copy link
Contributor Author

@oandreeva-nv oandreeva-nv Sep 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, this will require the LocalizedPath object to be able to join the model path and model version path localization objects into one localization object.

Do we? If I understand correctly, versioned path is set here:

core/src/backend_model.cc

Lines 113 to 114 in a23786c

const auto version_path =
JoinPath({localized_model_path, std::to_string(version)});

so instead of

const auto version_path =
      JoinPath({localized_model_path, std::to_string(version)});

we would do something like:

 std::shared_ptr<LocalizedPath> version_path;
  RETURN_IF_ERROR(LocalizePath(path/to/version, recursive=false, &version_path));

or am I missing something?

Copy link
Contributor

@kthui kthui Sep 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking something like:

std::string model_path = "/repo/model_dir";
std::string model_version_path = "/repo/model_dir/1";

std::shared_ptr<LocalizedPath> model_localized;
std::shared_ptr<LocalizedPath> model_version_localized;
LocalizePath(model_path, recursive=false, &model_localized);
LocalizePath(model_version_path, recursive=true, &model_version_localized);

model_localized->Include(model_version_localized);

The idea behind LocalizedPath::Include() is to join

model_tmp
|
`--- config.pbtxt

and

model_version_tmp
|
`--- model.py

to

model_tmp
|
`--- config.pbtxt
|
`--- 1
     |
     `--- model.py

on the local disk.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually am missing something. Even though I like option 1 suggestion, localization in this case will happen to another directory. Meaning, at current stage every call to LocalizePath creates a new temporary directory. Thus if we separately will try to load model and its version, they will be copied in 2 different directories. So unless we want to change that, option 1 is not the best option.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be ok to only implement the base case above, and throw an error if the caller is trying to join/include anything more complicated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on offline discussion, with @kthui, we can add one more parameter location to control where to localize provided path, i.e. something like

LocalizePath(model_version_path, recursive-true/false, location=<string>, &model_version_localized);

Then we can control where version directory will be localized , and if location is empty, we create a new, temporary directory.

This will also help with the follow up ticket.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meaning, at current stage every call to LocalizePath creates a new temporary directory. Thus if we separately will try to load model and its version, they will be copied in 2 different directories. So unless we want to change that, option 1 is not the best option.

I am thinking on the same direction as LocalizedPath::Include() that if we want to have a generic way to localizing directories (with customization on fetching subset of contents / recurrsiveness), we want to make LocalizedPath a "filler" class: as on calling fs::LocalizePath, it returns an fully localized object (current) or empty object (with local root path assigned). Then if user want to customize the localization, they can call:

LocalizedPath::Include("sub/path/content", recursive);

Although I think it then the process is convoluted that there is a iterative calls of GetDirectorySubdirs/Files and LocalizedPath::Include

Copy link
Contributor

@GuanLuo GuanLuo Sep 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this is really an internal API (unless we want to expose file system API in the future) so we are free to modify the interface as frequent as we want and don't need to worry about being generic. If localizing model directory is somewhat specific, we may just introduce a LocalizeModelDirectory method to fit the specific need.

Because when you look at what to be localized for a model version, it is actually "everything in the directory except for the unwanted version subdirectories (integer-named directories)", that is tricky to express it with some generic language.

@oandreeva-nv
Copy link
Contributor Author

oandreeva-nv commented Sep 12, 2023

@kthui, @GuanLuo and @nnshah1. It is ready for re-review.

I didn't implement LocalizedPath::Include, since it can be done as a separate follow-up PR with rename for Linux and MoveFile for Windows with concerns regarding

convoluted that there is a iterative calls of GetDirectorySubdirs/Files and LocalizedPath::Include

properly considered. As of now, I think specifying mount_dir of a model's localized path when calling LocalizePath( version_path, true /*recursive*/, localized_model_dir->Path(), &localized_version_dir) should be sufficient. To make sure that localized content for version persists, I add localized_version_dir to localized_model_dir->other_localized_path:
localized_model_dir->other_localized_path.push_back(localized_version_dir);

Change in LocalFileSystem::MakeDirectory : I've added one more allowed errno, i.e. File Exists, since it make sence to use LocalFileSystem::MakeDirectory in LocalizePath, and for multiple versions of the same model, creation of a folder with the same name (i.e. model name) occurs. To clarify, when we use mkdtemp then instead of model names folderXXXX is used, so they are differentiable. With mount directory specified, a folder with model's name will be created on the first call of LocalizePath . Thus, when the next version of this model is being processed, this folder will already exist.

Regarding environmental variable. Current name is TRITON_AWS_MOUNT_DIRECTORY. I think it would make sense to create filesystem type-specific env names for gcs and as as well. Open for discussion.
I thought about adding the ability to specify this location in credentials file, but I think since local mount directory is not really a credential, it is not the best location + code's logic reads contents into S3Credential struct, it would be strange to have mount_dir member in this file, thus it would need some other wrapping structure + more code change. We can do it later upon user request.

Finally, current requirements for mount directory:

  1. Should exist before
  2. Should not contain same elements (folders and files) as remote. Otherwise, upon stopping server, all same contents will be deleted from the local directory. Let me know if this is a reasonable ask.

Current look: bucket is mounted to /tmp/aws:

/tmp
`-- aws
    |-- graphdef_float32_float32_float32
    |   |-- 1
    |   |   `-- model.graphdef
    |   |-- 3
    |   |   `-- model.graphdef
    |   |-- config.pbtxt
    |   `-- output0_labels.txt
    `-- libtorch_float32_float32_float32
        |-- 1
        |   `-- model.pt
        |-- 3
        |   `-- model.pt
        |-- config.pbtxt
        `-- output0_labels.txt

@oandreeva-nv oandreeva-nv requested a review from kthui September 12, 2023 03:03
{
std::shared_ptr<FileSystem> fs;
RETURN_IF_ERROR(fsm_.GetFileSystem(path, fs));
return fs->LocalizePath(path, localized);
return fs->LocalizePath(path, recursive, "", localized);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this (L565-L567) can potentially be shortened to

retrurn LocalizePath(path, recursive, "", localized)

GuanLuo
GuanLuo previously approved these changes Sep 12, 2023
Copy link
Contributor

@GuanLuo GuanLuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments

@@ -363,7 +364,8 @@ GCSFileSystem::ReadTextFile(const std::string& path, std::string* contents)

Status
GCSFileSystem::LocalizePath(
const std::string& path, std::shared_ptr<LocalizedPath>* localized)
const std::string& path, const bool recursive, const std::string& mount_dir,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you going to implement the new API in other cloud file system in this PR as well? If not, should return error if !mount_dir.empty()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I wanted to first get an agreement on the implementation for s3. I'll populate it to GCS and AS in the next commit for easier review

// Only allow the error due to parent directory does not exist
// if 'recursive' is requested
// Return success if directory already exists
if (errno == EEXIST) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be a function argument on whether file exist is allowed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, I'll add it

kthui
kthui previously approved these changes Sep 12, 2023
/// \param path The path of the directory or file.
/// \param recursive If true, will fetch all sub-directories in
/// the provided path.
/// \param mount_dir If specified, will use provided local directory
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean "If not empty" rather than "If specified"? I do not think the caller can skip this parameter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll clarify, thank you for suggestion

@oandreeva-nv oandreeva-nv dismissed stale reviews from kthui and GuanLuo via 6903e1c September 13, 2023 02:02
@oandreeva-nv oandreeva-nv changed the title [WIP] Adding ability to specify subdirectory to download for LocalizePath Adding ability to specify subdirectory to download for LocalizePath Sep 13, 2023
if (recursive) {
for (const auto& directory_item : blob_prefixes) {
const auto& local_path = JoinPath({dest, BaseName(directory_item)});
int status = mkdir(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the FileSystem wrapped API be used here as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, for Windows. good catch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

Comment on lines +457 to +461
tmp_folder = mount_dir.empty() ? std::string(env_mount_dir) : mount_dir;
tmp_folder =
JoinPath({tmp_folder, path.substr(path.find_last_of('/') + 1)});
RETURN_IF_ERROR(triton::core::MakeDirectory(
tmp_folder, true /*recursive*/, true /*allow_dir_exist*/));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@oandreeva-nv just checking my understanding, I see your comment saying:

Finally, current requirements for mount directory:

  1. Should exist before

But in the code I'm seeing calls to MakeDirectory(), which I'm inferring would create the directory if it doesn't exist.

So to clarify, is the mount directory required to exist before server startup? Or will Triton create it on demand, similar to tmp folder workflow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will clarify this in a follow-up commit. This PR would need more de-bugging and testing

Comment on lines +451 to +453
const char* env_mount_dir = std::getenv("TRITON_AZURE_MOUNT_DIRECTORY");
std::string tmp_folder;
if (mount_dir.empty() && env_mount_dir == nullptr) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's already a follow-up ticket, I'd like to see use of some helper function like this one that safely checks if the env var is set or is nullptr, and just returns a std::string. This helper could probably just take an env var name as an argument to condense the usage.

Currently this if/else is OK, but if the conditions were to be tweaked in the future, there is some risk for std::string(env_mount_dir) when it is a nullptr.

@oandreeva-nv oandreeva-nv changed the title Adding ability to specify subdirectory to download for LocalizePath [WIP] Fixing extra download for filesystem's localization Sep 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants