[Feature Request] improve DataHUB sync UX #309

Open
Brilator opened this issue Dec 4, 2024 · 4 comments
Labels: Type: Bug (Something is not working, and it is confirmed by maintainers to be a bug.)

Comments

@Brilator
Member

Brilator commented Dec 4, 2024

From our recent interactions with users working on their ARCs, the major challenges keep being related to Git and Git-LFS.
When sync times are long, users tend to interrupt (and later restart) the processes. It gets worse when ARCitect crashes completely, which sometimes happens when very many changes are pushed at once.

There are also general file-handling pitfalls in a git repository that coders usually know to avoid, but that are not obvious to people who are new to git or do not know they are using it.

Consequently, troubleshooting, which often comes down to guessing how and why people run into (git) errors, is frustratingly complicated.

My approaches to solving these problems include:

  • asking the users to start the ARC completely from scratch
  • asking the users to remove the .git folder, re-initialize the repository, and try uploading again
  • re-directing users to ARC commander (which again needs arc and git installed)
  • oftentimes for large repos: completely taking over for the users, basically doing it via git push taking care of lfs by hand
    • in this case they have to give me access to their machine / server + to their ARC in the Hub

Those are always "quick" fixes that (a) are frustrating for the users and (b) require too much 1:1 support, i.e. will not be sustainable for more users.

I'm wondering how we can improve this, e.g.

  1. It might help to collect / log and timestamp the git stdout / stderr somewhere to help understand what users did.
  2. "Expectation management", i.e. give approximate feedback on how long a push might take, based on file size + bandwidth.
  3. Run (large) git push processes in the background (e.g. overnight). Although this might create more confusion, we should be safe, since a new commit (a parallel git process on the same ARC) cannot be executed in the meantime.
  4. Better align ARCitect and ARC commander. E.g. a default new branch with access token created in ARCitect is called git.nfdi4plants.org, while ARC commander by default pushes to origin. So switching between the systems is not seamless.
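
Points 1 and 2 could be prototyped roughly as follows. This is only a sketch under assumptions, not how ARCitect currently works: `run_logged` and `estimate_push_seconds` are hypothetical helper names, the log format is made up, and the estimate is a plain size / bandwidth lower bound that ignores compression, LFS overhead, and server-side limits.

```python
import subprocess
from datetime import datetime, timezone


def estimate_push_seconds(total_bytes: int, upload_mbit_per_s: float) -> float:
    """Rough lower bound for a push: payload size divided by upload bandwidth."""
    if upload_mbit_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    bytes_per_second = upload_mbit_per_s * 1_000_000 / 8
    return total_bytes / bytes_per_second


def run_logged(cmd: list[str], cwd: str, log_file: str) -> int:
    """Run cmd in cwd and append every output line, timestamped, to log_file."""
    with open(log_file, "a", encoding="utf-8") as log:

        def write(line: str) -> None:
            stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
            log.write(f"{stamp} {line.rstrip()}\n")
            log.flush()  # keep the log usable even if the app crashes mid-sync

        write(f"$ {' '.join(cmd)}")
        proc = subprocess.Popen(
            cmd,
            cwd=cwd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # git reports progress and errors on stderr
            text=True,
            errors="replace",
        )
        assert proc.stdout is not None
        for line in proc.stdout:
            write(line)
        code = proc.wait()
        write(f"exit code {code}")
        return code
```

A call could look like `run_logged(["git", "push", "--progress", "origin", "main"], arc_dir, "sync.log")`, where the remote, branch, and log path are placeholders. `--progress` is passed because git otherwise suppresses progress output when stderr is not a terminal.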
github-actions bot added the "Status: Needs Triage" (This item is up for investigation.) label Dec 4, 2024
@Brilator
Member Author

Brilator commented Dec 4, 2024

To get some hands-on experience, I'd strongly recommend doing what some users simply do:

  1. Preferably work on Windows: it seems to create some extra git burdens
  2. Create a large (>1TB) ARC
  3. Add very many files: the number of files, not only their size, seems to matter
  4. Add files with cryptic file names containing special characters
  5. Try to commit and push in one go (one commit, one sync) via ARCitect
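
A scaled-down version of steps 2 to 4 could be generated with a small script like the one below. It is a sketch only: `make_stress_tree`, the folder layout, and the sample file names are assumptions made for illustration, and the sizes are parameters so that a test run does not need anywhere near 1 TB.

```python
import os
from pathlib import Path

# Hypothetical examples of awkward but valid file name stems
AWKWARD_STEMS = [
    "spaces in name",
    "umlaut_äöü",
    "percent%20sign",
    "hash#tag",
    "brackets[1](2)",
]


def make_stress_tree(root, n_files, bytes_per_file=1024, files_per_dir=1000):
    """Create n_files files of random bytes under root, spread over subfolders."""
    root = Path(root)
    created = 0
    for i in range(n_files):
        sub = root / f"batch_{i // files_per_dir:05d}"
        sub.mkdir(parents=True, exist_ok=True)
        stem = AWKWARD_STEMS[i % len(AWKWARD_STEMS)]
        path = sub / f"{stem}_{i:07d}.bin"
        # Random content, so identical files are not deduplicated by git
        path.write_bytes(os.urandom(bytes_per_file))
        created += 1
    return created
```

For example, `make_stress_tree("my-arc/assays/stress/dataset", 10_000)` (the path is a placeholder) would add 10,000 small files that can then be committed and synced in one go.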

@ZimmerD

ZimmerD commented Dec 5, 2024

I can second this. While the behavior on small changes (a low number of changed files) seems to be quite stable, I frequently encounter problems when working with a high number of changes, ranging between approximately 500 and 10,000 changed files.

To illustrate one of the issues:
I am currently trying to commit such an ARC, and while the commit seemingly finished (the dialog was closable via the OK button), I still have high disk usage by my ARCitect instance:
image

and cannot push to the hub. In the commit dialog, the button "Abort merge" showed up, though its implication is not obvious to me:
image

JonasLukasczyk self-assigned this Dec 9, 2024
JonasLukasczyk added the "Type: Bug" label and removed the "Status: Needs Triage" label Dec 9, 2024
@JonasLukasczyk
Collaborator

The best way to debug such issues is to run the git commands manually and see where the actual bottlenecks appear. Is there already a similar ARC for which I can reproduce this behavior?
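
One way to run the commands manually and compare where time is spent is a small timing wrapper like the sketch below. `time_commands` is a hypothetical helper, and the git commands in the usage note are an assumption about a typical commit-and-push sequence, not necessarily the exact calls ARCitect makes.

```python
import subprocess
import time


def time_commands(commands, cwd):
    """Run each command in cwd; return (command, seconds, exit code) per step."""
    results = []
    for cmd in commands:
        start = time.perf_counter()
        completed = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
        elapsed = time.perf_counter() - start
        results.append((" ".join(cmd), elapsed, completed.returncode))
    return results
```

It could be called with something like `time_commands([["git", "add", "--all"], ["git", "commit", "-m", "test"], ["git", "push", "origin", "main"]], arc_dir)`, with remote and branch as placeholders. Setting the environment variable `GIT_TRACE_PERFORMANCE=1` additionally makes git print its own timing data.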

I also don't understand why git seems to think that there is a merge process happening.
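
Git considers a merge to be in progress while the file `MERGE_HEAD` exists in the repository's git directory, so checking for it is a quick way to confirm the state behind such a button. The sketch below assumes a regular clone where `.git` is a directory (not a worktree or submodule); `merge_in_progress` is a hypothetical helper name. Why the file would be present in this case is not known from the thread.

```python
from pathlib import Path


def merge_in_progress(repo_dir) -> bool:
    """Return True if git has recorded an unfinished merge in repo_dir."""
    # Assumes a regular clone, where .git is a directory
    return (Path(repo_dir) / ".git" / "MERGE_HEAD").is_file()
```

If this returns True before any manual merge was started, one thing to check is whether an earlier pull or sync was interrupted.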

@ZimmerD

ZimmerD commented Dec 9, 2024

I just invited you to a private ARC which I could not commit and push using the ARCitect (see screenshots above). However, I was successfully able to commit and push my changes with the ARCommander.
I believe you should be able to reproduce the behaviour locally after pulling and soft resetting the last commit.


3 participants