Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check if repo or file exists #1591

Merged
merged 10 commits into from
Aug 28, 2023

Conversation

martinbrose
Copy link
Contributor

@martinbrose martinbrose changed the title 36 check if repo or file exists check if repo or file exists Aug 14, 2023
@martinbrose martinbrose changed the title check if repo or file exists Check if repo or file exists Aug 14, 2023
Copy link
Contributor

@Wauplin Wauplin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all of your work @martinbrose ! I've carefully reviewed you PR and added some comments to it. Most of them are cosmetic changes to format the docstring as expected.

I have also added a suggestion for to tests the new feature on private repos.

Finally, you also need to add repo_exists and file_exists as root methods in huggingface_hub. All main methods are accessible either with a custom HfApi client or at root level. To do so, you need to:

  • define the root methods here
  • add them to this list in __init__.py
  • run make style (which will run python utils/check_static_imports.py --update)
  • commit your changes

Please let me know if you have any questions! 🤗

src/huggingface_hub/hf_api.py Outdated Show resolved Hide resolved
src/huggingface_hub/hf_api.py Outdated Show resolved Hide resolved
src/huggingface_hub/hf_api.py Outdated Show resolved Hide resolved
src/huggingface_hub/hf_api.py Outdated Show resolved Hide resolved
src/huggingface_hub/hf_api.py Outdated Show resolved Hide resolved
src/huggingface_hub/hf_api.py Outdated Show resolved Hide resolved
src/huggingface_hub/hf_api.py Outdated Show resolved Hide resolved
Comment on lines 167 to 180
@retry_endpoint
def test_repo_exists(self):
self.assertTrue(self._api.repo_exists("gpt2"))
self.assertFalse(self._api.repo_exists("repo-that-does-not-exist")) # missing repo

@retry_endpoint
def test_file_exists(self):
self.assertTrue(self._api.file_exists("config.json", "gpt2"))
self.assertFalse(self._api.file_exists("config.json", "repo-that-does-not-exist")) # missing repo
self.assertFalse(self._api.file_exists("file-does-not-exist", "gpt2")) # missing file
self.assertFalse(
self._api.file_exists("config.json", "gpt2", revision="revision-that-does-not-exist")
) # missing revision

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you've asked, here is a suggestion to test private repos. I would create a new class HfApiRepoFileExistsTest to test your two new methods so that tests are self-contained. In the setup of the tests, you create a private repo + upload a dumb file to it. Then you can try repo/file existence in various cases. Finally, you delete the repo in the tearDown method. All of this runs on the staging environment.

class HfApiRepoFileExistsTest(HfApiCommonTest):
    def setUp(self) -> None:
        super().setUp()
        self.repo_id = self._api.create_repo(repo_name(), private=True).repo_id
        self._api.upload_file(repo_id=self.repo_id, path_in_repo="file.txt", path_or_fileobj=b"content")

    def tearDown(self) -> None:
        self._api.delete_repo(self.repo_id)
        return super().tearDown()

    @retry_endpoint
    def test_repo_exists(self):
        self.assertTrue(self._api.repo_exists(self.repo_id))
        self.assertTrue(self._api.repo_exists(self.repo_id, token=False))  # private repo
        self.assertFalse(self._api.repo_exists("repo-that-does-not-exist"))  # missing repo

    @retry_endpoint
    def test_file_exists(self):
        self.assertTrue(self._api.file_exists(self.repo_id, "file.txt"))
        self.assertFalse(self._api.file_exists("repo-that-does-not-exist", "file.txt"))  # missing repo
        self.assertFalse(self._api.file_exists(self.repo_id, "file-does-not-exist"))  # missing file
        self.assertFalse(
            self._api.file_exists(self.repo_id, "file.txt", revision="revision-that-does-not-exist")
        )  # missing revision
        self.assertFalse(self._api.file_exists(self.repo_id, "file.txt", token=False))  # private repo

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this in addition to the test I have added to class HfApiEndpointsTest(HfApiCommonTest)?

Currently, some of the tests in HfApiRepoFileExistsTest(HfApiCommonTest) are failing...
I believe I don't understand all the above-proposed tests completely yet.
I need to read up on what behavior private repos should have and get a fresh look at this in a few days.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests would be a replacement to the tests you've written. I haven't changed much of the logic compared to what you did except:

  • move them to a standalone class. It's better than extending existing classes. It allows to do pytest tests/test_hf_api.py -k HfApiRepoFileExistsTest to run only those ones.
  • use HfApiCommonTest => used to setup a HfApi client pointing to the staging environment. The user token is already set for this one (TOKEN). It is a copy of the real Hub that we use only for tests. You can see it as a playground in which even if you do something bad, it's not a problem.
  • add a setUp method: this one is run before each test. It creates a new repo on the staging Hub + upload a first file to it.
  • add a tearDown method: this one is run after each test. It deletes the temporary repo we've created.
  • finally, redefine the tests you've written. I have added 2 cases to check that file_exists and repo_exists are returning false when ran on a private repo without the accurate token.

Copy link
Contributor Author

@martinbrose martinbrose Aug 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I'd like to clarify. In the tests you propose for test_repo_exists, you propose:

self.assertTrue(self._api.repo_exists(self.repo_id, token=False))  # private repo

Maybe I'm confusing something here, but shouldn't that be a self.assertFalse?
Because we are providing a false token, so we shouldn't see the private repo?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm confusing something here, but shouldn't that be a self.assertFalse?

Yes totally! My bad 😁

src/huggingface_hub/hf_api.py Outdated Show resolved Hide resolved
src/huggingface_hub/hf_api.py Show resolved Hide resolved
@martinbrose
Copy link
Contributor Author

Many thanks for the detailed review points! I'll attend to them in the next few days.

@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Aug 16, 2023

The documentation is not available anymore as the PR was closed or merged.

@martinbrose
Copy link
Contributor Author

Hi @Wauplin,

I believe I addressed all of your review points.
The test cases were a hard nut to crack, but I think I got there in the end...

Copy link
Contributor

@Wauplin Wauplin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Well done for the tests, those are really important! 👍 🎉

@Wauplin Wauplin merged commit c862d10 into huggingface:main Aug 28, 2023
@martinbrose martinbrose deleted the 36-check-if-repo-or-file-exists branch September 7, 2023 14:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants