Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some scalar clones are putting GVFS packs in local object dir instead of shared object dir #645

Closed
2 tasks done
derrickstolee opened this issue Apr 29, 2024 · 6 comments · Fixed by #660
Closed
2 tasks done

Comments

@derrickstolee
Copy link
Collaborator

derrickstolee commented Apr 29, 2024

Setup

  • Which version of microsoft/git are you using? Is it 32-bit or 64-bit?
$ scalar diagnose
Collecting diagnostic info

git version 2.44.0.vfs.0.2
cpu: x86_64
built from commit: 105fa1d4cd50575fc1e499d0e4159a5e5b62bf7d
sizeof-long: 4
sizeof-size_t: 8
shell-path: /bin/sh
feature: fsmonitor--daemon
Repository root: C:/mono_scalar/src
Cache Server: https://wbp-adoprx.microsoft.engineering/ba47400f-eebe-4850-a8bf-ef694ef81414
Local Cache: c:\.scalarCache/id_ba47400f-eebe-4850-a8bf-ef694ef81414

Available space on 'C:/': 33.43 GiB

Are you using Scalar or VFS for Git?

Scalar.

  • Which version of Windows are you running? Vista, 7, 8, 10? Is it 32-bit or 64-bit?

Windows Server 2022. 64-bit.

  • Any other interesting things about your environment that might be related
    to the issue you're seeing?

The machines with this issue are self-hosted ADO build agents intended to run pipeline tasks that require a 1JS Git enlistment.

Details

On these machines, running scalar clone during image creation will cause the prefetch packfiles and GVFS protocol blob packs to be installed in the local .git/objects/pack directory instead of the shared object cache. The gvfs.sharedCache config option is set in .git/config and the folder it points to exists.

(There was some time where I considered that alt_odb_usable() was not registering the cache directory as being valid, but all evidence points to implying that Git can create that directory, so it shouldn't be blocked by something like that.)

  • If the problem was occurring with a specific repository, can you specify
    the repository?

    • Office Monorepo
@AtOMiCNebula
Copy link
Member

The agents in question are actually OMR-based (and technically Server 2022 instead of Win11). We have 1JS agents too, but haven't dug in there because they aren't impacted like our OMR agents are.

@AtOMiCNebula
Copy link
Member

I figured this one out. When running scalar clone initially, we clone by specifying a path like c:\foo, with the problem itself being the lowercase c. The lowercase c persists itself all the way through into the repo's config:

[gvfs]
  sharedCache = c:\\.scalarCache/id_ba47400f-eebe-4850-a8bf-ef694ef81414

This becomes problematic in this code, which uses strcmp instead of something like stricmp:

git/gvfs-helper-client.c

Lines 326 to 331 in b68812e

for (odb = the_repository->objects->odb->next; odb; odb = odb->next) {
if (!strcmp(odb->path, gvfs_shared_cache_pathname.buf)) {
gh_client__chosen_odb = odb;
return;
}
}

odb->path above uses C: instead of c:, the comparison doesn't match, and it falls back to .git/objects.

@dscho
Copy link
Member

dscho commented May 10, 2024

@AtOMiCNebula great find! This strcmp() should be replaced with fspathcmp(). Would you mind opening a PR that makes it so?

@derrickstolee
Copy link
Collaborator Author

One thing @AtOMiCNebula pointed out is that we want to make sure that users in this state are getting into a better state when the config is parsed correctly.

A possible direction to be as safe as possible is to update the incremental-repack maintenance step to move objects from .git/objects into the shared object cache. Move packs wholesale to keep their prefetch-<timestamp>... names. This could be a helpful feature, anyway, since the local object store isn't being repacked in Scalar clones with a shared object cache.

@derrickstolee
Copy link
Collaborator Author

I created #660 to fix this.

dscho pushed a commit that referenced this issue Jun 21, 2024
Resolves #645.

When on Windows, these paths may differ only by case in the config but
also correspond to the same paths on disk. Use fspathcmp() instead.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
dscho added a commit that referenced this issue Jun 21, 2024
…660)

Resolves #645.

When on Windows, these paths may differ only by case in the config but
also correspond to the same paths on disk. Use fspathcmp() instead.

---

* [X] This change only applies to interactions with Azure DevOps and the
      GVFS Protocol.
@dscho
Copy link
Member

dscho commented Jun 21, 2024

A possible direction to be as safe as possible is to update the incremental-repack maintenance step to move objects from .git/objects into the shared object cache.

My major concern about this strategy is that .git/objects can very well contain objects that have not originated from the remote repository, and one single ill-considered re-creation of the shared object cache might very well wipe out objects that will now be irretrievably lost.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants