Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX Avoid creating repository when it exists on remote #900

Merged
merged 17 commits into from
Jun 15, 2022
Merged

FIX Avoid creating repository when it exists on remote #900

merged 17 commits into from
Jun 15, 2022

Conversation

merveenoyan
Copy link
Contributor

@merveenoyan merveenoyan commented Jun 10, 2022

This PR partially fixes #672.

I observed that the following error pops up when I create a repository using huggingface API create_repo and then have a local repository cloning from the URL using a token (here)
ValueError: No space_sdk provided. create_repo expects space_sdk to be one of ['gradio', 'streamlit', 'static'] when repo_type is 'space'
So I noticed the following block was causing the problem:

if token is not None:
                whoami_info = self.client.whoami(token)
                user = whoami_info["name"]
                valid_organisations = [org["name"] for org in whoami_info["orgs"]]

                if namespace is not None:
                    repo_id = f"{namespace}/{repo_id}"
                repo_url += repo_id

                scheme = urlparse(repo_url).scheme
                repo_url = repo_url.replace(f"{scheme}://", f"{scheme}://user:{token}@")

                if namespace == user or namespace in valid_organisations:
                    self.client.create_repo(
                        repo_id=repo_id,
                        token=token,
                        repo_type=self.repo_type,
                        exist_ok=True,
                        private=self.private,
                    )
            else:
                if namespace is not None:
                    repo_url += f"{namespace}/"
                repo_url += repo_id

Only name being valid shouldn't be enough to create a repository, we have to check if the repository exists on remote so I added a small check on that (thanks @muellerzr for pointing out to particular HfApi function that does it).

With this, space_sdk error should be gone. I'll add test once @LysandreJik approves my logic in here.

Also this is separate but IDK if it's good to have an attribute clone_from and the clone_from() function. I can refactor couple of things I saw there that's not good to me as well.

@merveenoyan merveenoyan requested review from LysandreJik and removed request for LysandreJik June 10, 2022 13:47
@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Jun 10, 2022

The documentation is not available anymore as the PR was closed or merged.

Copy link
Contributor

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also tests? :D

try:
_ = HfApi().repo_info(f"{namespace}/{repo_id}")
except HTTPError:
self.client.create_repo(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm very confused, why would clone_from create a repo remotely under any circumstances?

Copy link
Contributor Author

@merveenoyan merveenoyan Jun 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's because link given to clone_from might not be valid (there might not be a repository in remote under given namespace). Maybe it's better if I put a warning saying the remote repository doesn't exist instead. :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's complicated. I really don't think it's appropriate for clone_from to create anything. It should raise an error if the remote repository doesn't exist. Fixing it would be backward incompatible, but I think we can think of it as a bugfix. WDYT @LysandreJik ?

Copy link
Contributor Author

@merveenoyan merveenoyan Jun 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized there are many tests that will fail and we'll break backwards compatibility if we change behavior of clone_from to only clone remote repository, I also thought that it was only used to clone existing repository when creating repository on local. pinging @LysandreJik about why the design is that way 😅

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(ok apparently you left a comment but I didn't see it bcs I didn't refresh the page lol)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(i agree that Repository clone_from shouldn't create a repo)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(can't find at the moment the past discussion with a potential explicit Repository.create() -> Repository)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I also think that clone_from shouldn't create the repository if it doesn't exist, it isn't really its place to do so. I wouldn't say removing it is a bugfix as when I implemented it I shared it as such, but we can deprecate it for sure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I say we merge this and then deprecate it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM!

@merveenoyan
Copy link
Contributor Author

merveenoyan commented Jun 10, 2022

Super weird thing
I can do following (which was my original problem) with no issue.

api = HfApi()

temp_repo_url = api.create_repo(
            repo_id="merve/temp", token=token, 
            repo_type="space", space_sdk="gradio"
        )
repo = Repository("./dummy", clone_from=temp_repo_url,
        use_auth_token=token
        )

But when I run the test I've written (below) I still get the same error.

def test_init_clone_from(self):
        temp_repo_url = self._api.create_repo(
            repo_id=f"{self.REPO_NAME}-temp", token=self._token, 
            repo_type="space", space_sdk="gradio"
        )
        repo = Repository(WORKING_REPO_DIR, clone_from=temp_repo_url,
        use_auth_token=self._token
        )
        self._api.delete_repo(
                repo_id=f"{self.REPO_NAME}-temp", token=self._token
            )

I have other stuff to do so will leave this for monday.

@osanseviero
Copy link
Contributor

"I can do following (which was my original problem) with no issue."
Are you sure you don't have installed from your git branch? This should not work as Repository calls create_repo without the sdk which was the origin of the issue

@merveenoyan
Copy link
Contributor Author

merveenoyan commented Jun 10, 2022

@osanseviero yup I'm sure (I uninstalled and installed again)
What I'm testing here is that it should bypass creating the repository remotely given there's a repository remotely already (which we check with HfApi.repo_info()) it shouldn't care about space_sdk at all.

When I test this in a separate script, the repo is created remotely and is cloned well (when we instantiate Repository) that I can see there's README file of space which is the behavior we want here.

@osanseviero
Copy link
Contributor

This seems to be passing the tests now 😄

Copy link
Contributor

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to merge and deprecate in a separate PR.

Copy link
Member

@LysandreJik LysandreJik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Only left a comment regarding space_sdk.

try:
_ = HfApi().repo_info(f"{namespace}/{repo_id}")
except HTTPError:
self.client.create_repo(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM!

token=token,
repo_type=self.repo_type,
exist_ok=True,
private=self.private,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this also need a space_sdk in case the repository in question is a space? i.e., if the repo is a space that doesn't exist, it'll try to create it but fail at doing so.

As we're deprecating, maybe that's intended and we keep that use-case as it is (i.e., not working)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will raise a ValueError in case someone tries to create Space that doesn't exist because otherwise I have to add space_sdk as a class attribute to Repository (we're later deprecating the repository creation in clone_from anyway so there's no point to add that). I'll add two tests, one for model repo and one for Space repo to see if the error is raised. We can merge tomorrow ^^

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds great, thank you @merveenoyan!

@merveenoyan
Copy link
Contributor Author

Please do not review I know what the problem with tests is :D

@merveenoyan
Copy link
Contributor Author

merveenoyan commented Jun 14, 2022

I added three tests:

  • test_init_clone_from() checks that when space exists it's cloned well
  • test_clone_from_space() makes sure the space is not created if it doesn't exist
  • test_clone_from_model() makes sure that model is created if it doesn't exist.

@merveenoyan
Copy link
Contributor Author

merveenoyan commented Jun 14, 2022

Tests are not passing because moon-landing staging is down 😅

@LysandreJik
Copy link
Member

@merveenoyan it seems you're using an outdated version of black. Could you updated to the one required by huggingface_hub and run the code quality fix? Should be something like the following, from the root of your clone:

pip install -U -e .[quality]
make style
make quality

@SBrandeis, @XciD, would you have why we're getting errors 500 on the test_init_clone_from test? It seems like creating a space using create_repo doesn't work and throws an error 500. Reproducible code example:

temp_repo_url = create_repo(
    repo_id=f"temp",
    token=token,
    repo_type="space",
    space_sdk="streamlit"
)

I tried with both streamlit, gradio. However, static seems to work just fine!

@XciD
Copy link
Member

XciD commented Jun 14, 2022

could this be related to new spaces infra @christophe-rannou

@christophe-rannou
Copy link

@LysandreJik I just created a repo using the CLI with success. Maybe there's an issue with your token ?

@LysandreJik
Copy link
Member

@christophe-rannou the token is that of the CI, we're using the same token for all CI. Were you on the production endpoint or the staging endpoint?

@christophe-rannou
Copy link

Production one, it's the staging one that is not working ?

@XciD
Copy link
Member

XciD commented Jun 14, 2022

Production one, it's the staging one that is not working ?

Seems related to CI, will investigate.

@merveenoyan
Copy link
Contributor Author

This one fails HfApiEndpointsTest.test_create_update_and_delete_space_repo with 500 like it was failing for my PR, see below where the error comes from (due to picking other SDKs for others)

for sdk in SPACES_SDK_TYPES:
            SPACE_REPO_NAME = space_repo_name(sdk)
            self._api.create_repo(
                repo_id=SPACE_REPO_NAME,
                token=self._token,
                repo_type=REPO_TYPE_SPACE,
                space_sdk=sdk,
            )
            res = self._api.update_repo_visibility(
                repo_id=SPACE_REPO_NAME,
                token=self._token,
                private=True,
                repo_type=REPO_TYPE_SPACE,
            )
            self.assertTrue(res["private"])
            res = self._api.update_repo_visibility(
                repo_id=SPACE_REPO_NAME,
                token=self._token,
                private=False,
                repo_type=REPO_TYPE_SPACE,
            )

@LysandreJik
Copy link
Member

LysandreJik commented Jun 15, 2022

This exact test fails on main since #898 was merged, I would offer to skip that test until this is resolved, and merge this PR. Ok for you @merveenoyan ?

cf https://github.com/huggingface/huggingface_hub/runs/6882266625?check_suite_focus=true

@merveenoyan
Copy link
Contributor Author

@LysandreJik sure!

@merveenoyan merveenoyan merged commit c9fb53e into huggingface:main Jun 15, 2022
@osanseviero
Copy link
Contributor

Was this supposed to be merged? As @LysandreJik mentioned, I think we should add the flag/decorator to skip the test so the tests are not failing ( @unittest.SkipTest for example)

@LysandreJik
Copy link
Member

No problem @osanseviero, I opened a PR to fix the problem there: #908

But now we have another problem with our CI because of the new datasets release, so waiting until that is fixed to merge it.

@XciD
Copy link
Member

XciD commented Jun 15, 2022

About the spaces bug, we found it.
Waiting for https://github.com/huggingface/moon-landing/pull/3252/commits/f2e8068348e225ba1f07c5d27a38843253e23dcb to be merged.

@XciD
Copy link
Member

XciD commented Jun 15, 2022

About the spaces bug, we found it. Waiting for huggingface/moon-landing@f2e8068 to be merged.

Fixed

TristanThrush pushed a commit that referenced this pull request Jun 19, 2022
* fix for spaces

* fix for spaces

* removed creating repository and added warning

* revert my changes

* added tests

* removed debugger 😐

* fixed repository removal

* Added tests and error

* import pytest

* fixed tests

* fixed tests

* style

* removed repo removal

* make style

* fixed test with Lysandres patch

* added fix
adrinjalali pushed a commit that referenced this pull request Jun 21, 2022
* added autoeval fields to repocard types

* modified test, linted

* Fix typo (#902)

* Refine 404 errors (#878)

* Add errors

* Style

* Review

* hf_hub_download support

* Errors for hf_hub_download

* Typo

* Handle 401 error

* Tests for 401 error

* Typo

* Review comments

* added documentation for the repocard change in repocard, updated metadata_eval_result, added documentation

* 🏗 Use `hub-ci` for tests (#898)

* 🏗 Use `hub-ci` for tests

cc @XciD

* 🩹 Also update URL for staging mode

* ✅ 401 is raised when the user is not authenticated

* 🗑 Deprecare `identical_ok`

* Longer deprecation period

* ✅ Fix the last failing test

* Warning match docstring

* FIX Avoid creating repository when it exists on remote (#900)

* fix for spaces

* fix for spaces

* removed creating repository and added warning

* revert my changes

* added tests

* removed debugger 😐

* fixed repository removal

* Added tests and error

* import pytest

* fixed tests

* fixed tests

* style

* removed repo removal

* make style

* fixed test with Lysandres patch

* added fix

* Remove deprecations (#910)

* Remove deprecations

* Typo

* Update src/huggingface_hub/README.md

Co-authored-by: Julien Chaumond <julien@huggingface.co>

Co-authored-by: Julien Chaumond <julien@huggingface.co>

* Add request ID to all requests (#909)

* Add request ID to all requests

* Typo

* Typo

* Review comments

* Update src/huggingface_hub/utils/_errors.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Invert deprecation for create_repo (#912)

* Invert deprecation for create_repo

* Set to 0.10

* Constant was accidentally removed during deprecation transition (#913)

Co-authored-by: Lyle Nel <lyle@owlin.com>

* Update src/huggingface_hub/repocard_types.py

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update repocard_types.py

* Update repocard_types.py

* reorded metadata_eval_request_docs, added metrics_config and metrics_verified to tests

Co-authored-by: lsb <leebutterman@gmail.com>
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Simon Brandeis <33657802+SBrandeis@users.noreply.github.com>
Co-authored-by: Merve Noyan <merveenoyan@gmail.com>
Co-authored-by: Julien Chaumond <julien@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lyle Nel <lyle-nel@users.noreply.github.com>
Co-authored-by: Lyle Nel <lyle@owlin.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Repository does not work with Spaces
8 participants