Make deleted Dandisets private #57

Merged: 1 commit, Aug 12, 2024
src/backups2datalad/datasetter.py: 48 changes (35 additions & 13 deletions)

@@ -17,7 +17,7 @@

 import anyio
 from anyio.abc import AsyncResource
-from dandi.consts import EmbargoStatus, dandiset_metadata_file
+from dandi.consts import DANDISET_ID_REGEX, EmbargoStatus, dandiset_metadata_file
 from dandi.exceptions import NotFoundError
 from datalad.api import clone
 from ghrepo import GHRepo

@@ -90,25 +90,47 @@
                     if d.embargo_status is EmbargoStatus.OPEN
                     else "private"
                 )
+        if not dandiset_ids and self.config.gh_org is not None:
[Review thread on the added line above]

Member: So we will delete only when run on all Dandisets, not on specific ones. That should be fine for the majority of use cases, but I am afraid it might cause confusion, since I will forget such a peculiarity and might need/try to run on specific one(s). Is it too convoluted to make it also work if dandiset_ids were provided?

Member Author:
> Is it too convoluted to make it also work if dandiset_ids were provided?

In that case, what deletions would you expect to be detected? Do you want all recently-deleted Dandisets to be marked private, or only those that are included in the given list of IDs? For the latter option, note that requesting a backup of a nonexistent or deleted Dandiset currently results in an error, and I currently don't see a decent way to turn that error into some "Dandiset was deleted" marker.

Member: I was thinking about the "latter" option. Interesting. OK, so if we error upon explicit specification of a deleted dataset, then I guess it is OK as it is now. Thank you!

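As an illustration of the behavior settled on above (not part of the diff): deletion detection only makes sense on a full run, because only then does absence from the backup report imply that a Dandiset no longer exists on the server. A minimal, self-contained sketch of that set logic follows; the regex value, sample identifiers, and subdataset names are assumptions for demonstration, not taken from this PR.

    import re

    # Assumption: dandi.consts.DANDISET_ID_REGEX matches six-digit IDs.
    DANDISET_ID_REGEX = r"[0-9]{6}"

    # Hypothetical inputs: every identifier reported by the server in this
    # full run (successes and failures), plus the superdataset's submodules.
    extant = {"000003", "000004", "000005"}
    subdatasets = ["000003", "000004", "000005", "000027", ".github"]

    # A submodule counts as a deleted Dandiset only if its path looks like
    # a Dandiset ID and the server did not report it in this run.
    deleted = [
        d for d in subdatasets
        if re.fullmatch(DANDISET_ID_REGEX, d) and d not in extant
    ]
    print(deleted)  # ['000027'] -> its GitHub mirror gets made private

The diff continues below.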
+            extant = {d.identifier for d, _ in report.results}
+            extant.update(d.identifier for d in report.failed)
+            for sub_info in await superds.get_subdatasets(result_xfm="relpaths"):
+                d = sub_info["path"]
+                if (
+                    re.fullmatch(DANDISET_ID_REGEX, d)
+                    and d not in extant
+                    and (exclude is None or not exclude.search(d))
+                    and sub_info.get("gitmodule_github-access-status", "public")
+                    == "public"
+                ):
+                    log.info(
+                        "Dandiset %s has been deleted; making GitHub backup private",
+                        d,
+                    )
+                    await self.manager.edit_github_repo(
+                        GHRepo(self.config.gh_org, d),
+                        private=True,
+                    )
+                    access_status[d] = "private"
         if to_save:
             log.debug("Committing superdataset")
             superds.assert_no_duplicates_in_gitmodules()
             msg = await self.get_superds_commit_message(superds, to_save)
             await superds.save(message=msg, path=to_save)
-            if access_status:
-                for did, access in access_status.items():
-                    await superds.set_repo_config(
-                        f"submodule.{did}.github-access-status",
-                        access,
-                        file=".gitmodules",
-                    )
-                await superds.commit_if_changed(
-                    "[backups2datalad] Update github-access-status keys in .gitmodules",
-                    paths=[".gitmodules"],
-                    check_dirty=False,
-                )
             superds.assert_no_duplicates_in_gitmodules()
             log.debug("Superdataset committed")
+        if access_status:
+            log.debug("Ensuring github-access-status in .gitmodules is up-to-date")
+            for did, access in access_status.items():
+                await superds.set_repo_config(
+                    f"submodule.{did}.github-access-status",
+                    access,
+                    file=".gitmodules",
+                )
+            await superds.commit_if_changed(
+                "[backups2datalad] Update github-access-status keys in .gitmodules",
+                paths=[".gitmodules"],
+                check_dirty=False,
+            )
         if report.failed:
             raise RuntimeError(
                 f"Backups for {quantify(len(report.failed), 'Dandiset')} failed"

Codecov: added lines 97, 105, 109, 113, 122, 124, and 129 were not covered by tests.
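Side note on the privacy flip itself: the diff delegates to self.manager.edit_github_repo(..., private=True). A hypothetical stand-alone equivalent using GitHub's REST API (PATCH /repos/{owner}/{repo}) is sketched below; the requests library, the GITHUB_TOKEN environment variable, and the example org/repo names are assumptions, not part of this PR.

    import os

    import requests

    def make_repo_private(org: str, repo: str) -> None:
        # PATCH /repos/{owner}/{repo} with {"private": true} flips the
        # repository's visibility (requires a token with admin rights).
        r = requests.patch(
            f"https://api.github.com/repos/{org}/{repo}",
            headers={
                "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
                "Accept": "application/vnd.github+json",
            },
            json={"private": True},
            timeout=30,
        )
        r.raise_for_status()

    # e.g. make_repo_private("dandisets", "000027")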
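Likewise, since .gitmodules uses git's config-file format, the github-access-status bookkeeping that set_repo_config(..., file=".gitmodules") performs can be reproduced with plain git config. A sketch, assuming git is on PATH and the current directory is the superdataset:

    import subprocess

    def set_access_status(dandiset_id: str, access: str) -> None:
        # .gitmodules is an INI-style git config file, so arbitrary keys
        # such as submodule.<id>.github-access-status can be written with
        # "git config --file".
        subprocess.run(
            [
                "git", "config", "--file", ".gitmodules",
                f"submodule.{dandiset_id}.github-access-status", access,
            ],
            check=True,
        )

    # e.g. set_access_status("000027", "private")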