Skip to content

Check model files when cached model folder exists #602

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from
Closed

Check model files when cached model folder exists #602

wants to merge 4 commits into from

Conversation

djdongjin
Copy link
Contributor

When loading a module with a TF-Hub link. The resolver only checks if the model folder exists but not checks if the model files exist inside the folder. The situation happens quite possible, since usually the module is cached in /tmp. So it's likely that the system only delete model files but keep the model folder, which causes errors like in #575.

@djdongjin
Copy link
Contributor Author

One thing is that if the fix is correct, I didn't explicitly delete the incomplete existing folder. I don't know if the downloading will cover the existing folder. So please let me know if such an action is needed.

@djdongjin
Copy link
Contributor Author

Anyone willing to review? Thanks!

@akhorlin
Copy link
Collaborator

akhorlin commented Jun 5, 2020

Sorry for the delay Jin. Thank you for taking a look into this issue.

Instead checking for presence of specific files, we could just check whether the directory is empty or not:

if tf_v1.gfile.Exists(module_dir) and tf_v1.gfile.ListDirectory(module_dir):

Also even with the check adjusted it looks Rename() below would fail with "already exists", so we probably should also add a removal of module_dir if it's empty there:

if not tf_v1.gfile.ListDirectory(module_dir):
tf_v1.gfile.DeleteRecursively(module_dir)

Let's also add tests to cover the case of trying to download into an empty directory.

@djdongjin
Copy link
Contributor Author

@akhorlin Thanks for the detailed reply! I will modify accordingly.

One question is that if it's possible for the system to delete some files (e.g. the model files, due to large size) but leave some tiny files (metadata) in a /tmp folder. If this is not an issue, then your solution will work great.

@akhorlin
Copy link
Collaborator

akhorlin commented Jun 5, 2020 via email

@djdongjin
Copy link
Contributor Author

okay, maybe we can check if the folder is empty first and add test accordingly. I think this should solve most of the corner cases encountered by me and others in the issue mentioned above.

@akhorlin
Copy link
Collaborator

akhorlin commented Jun 5, 2020 via email

@djdongjin
Copy link
Contributor Author

I did check it when it issue happened. The folder is indeed empty at that time.

@djdongjin
Copy link
Contributor Author

@akhorlin I changed the checking logic as mentioned above. Can you help me on adding a test for this case?

I looked into the resolver_test.py but it seems no existing testcase uses a TF-Hub link (thus no need to download). My intuition is that I should use a TF-Hub link to download a module first and then use test_utils.get_test_data_path(module_name) to obtain the path. Then I can delete content inside the folder and reload the model to check if the loading succeeds.

@akhorlin
Copy link
Collaborator

For testing, I think we can follow the pattern in resolver_test.py. They are indeed not e2e tests but they can exercise the logic in question. So the test could look something like:

  1. Create empty target dir
  2. Call atomic_download() with a test-specific download_fn that would create a test file.
  3. Check that the test file was created.

We do have internal e2e tests that use the handle directly. So we could add something there as well once the changes are in.

@djdongjin
Copy link
Contributor Author

@akhorlin I added a test. Please check : )

@@ -380,9 +380,12 @@ def atomic_download(handle,
overwrite=False)
# Must test condition again, since another process could have created
# the module and deleted the old lock file since last test.
if tf_v1.gfile.Exists(module_dir):
if tf_v1.gfile.Exists(module_dir) and \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line continuations with \ are not allowed in Google's py style-guide, lets do

if (tf_v1.gfile.Exists(module_dir) and
tf_v1.gfile.ListDirectory(module_dir)):
...

os.path.join(module_dir, "file"), "content", False)

# Delete existing folder and create an empty one.
if tf_v1.gfile.Exists(module_dir):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That shouldn't be the case sinec get_temp_dir() should return a unique directory. May be just self.assertFalse(tf_v1.gfile.Exists(...))

@akhorlin
Copy link
Collaborator

Overall looks good. Thank you for looking into this issue. I added a few minor comments, once resolved I can integrate the change into our repo.

@djdongjin
Copy link
Contributor Author

I fixed the comments. Please check : )

@akhorlin
Copy link
Collaborator

Looks good. I will work on integrating this change early next week.

@akhorlin
Copy link
Collaborator

The change has been integrated in c0610ab. Thank you for your contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants