Verify commit-graph and multi-pack-index after writing #658
Conversation
Force-pushed from 7a22c4c to cd2f2fe
@jrbriggs pointed out that this would be better to do after every write to these files. That limits the window where there could be problems.
I like this suggestion. It also keeps our maintenance tasks more deterministic, in that it will run in response to generating these files instead of at a different set of (seemingly random) times. Do we have any ideas on what is causing the corruption? For instance, are we writing out corrupt files, or is something else corrupting them after we generate them? If we are generating corrupt files, then this approach should let us identify it right away. If something else is corrupting them, then we might not pick it up as quickly...
In the cases where I could get in contact with the user, the only ways the file could have been corrupted were on-disk bit flips or in-memory bit flips. We were literally validating the data right before sending it (via a buffer) to disk. The users that ran memcheck found issues with their RAM. The perf machine recently had an issue that was caught by the
If corruption occurs strictly after a write (e.g., `midx verify` would have passed immediately after the write but failed a day later), I don't think it's within the scope of this project to fix. If we can detect that corrupt data was written (e.g., a bit flip in memory was persisted to disk), we should mitigate.
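To make the distinction concrete: a verify run immediately after the write catches corruption persisted at write time, while later failures point at the disk or RAM. A minimal shell sketch of that pairing, using stock Git commands (the throwaway repo and paths below are illustrative scaffolding, not GVFS code):

```shell
#!/bin/sh
set -e
# Illustrative only: create a throwaway repo with one pack so the
# multi-pack-index commands have something to operate on.
repo=$(mktemp -d)
git init --quiet "$repo"
cd "$repo"
git -c user.name=demo -c user.email=demo@example.com \
    commit --quiet --allow-empty -m "seed"
git repack -a -d --quiet

# Write the multi-pack-index, then verify it immediately.
git multi-pack-index write --object-dir .git/objects

# A failure here means corrupt data was persisted at write time,
# rather than the file being corrupted later on disk.
git multi-pack-index verify --object-dir .git/objects \
    && echo "multi-pack-index verified after write"
```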
Force-pushed from cd2f2fe to 0772173
@derrickstolee Is it safe to assume the verify / write should be quick tasks?
Naming nits, and a question about the pattern.
```csharp
this.GetPackFilesInfo(out int afterCount, out long afterSize, out hasKeep);

GitProcess.Result verifyResult2 = this.RunGitCommand((process) => process.VerifyMultiPackIndex(this.Context.Enlistment.GitObjectsRoot));
```
`verifyAfterRepackResult`?
```csharp
metadata["TryDeleteFileResult"] = this.Context.FileSystem.TryDeleteFile(multiPackIndexPath);
activity.RelatedError(metadata, "multi-pack-index is corrupt after write. Deleting and rewriting.");

this.RunGitCommand((process) => process.WriteMultiPackIndex(this.Context.Enlistment.GitObjectsRoot));
```
Same comment about writing the midx without validating. This would still leave enlistments broken if it was corrupted. Deletion without rewrite would leave the enlistment functional, but slow. What are your thoughts on how we should handle this?
What if we were a bit more aggressive and had WriteMultiPackIndex do the verification / retry? This way we can be sure the write/verify are always paired together.
I agree it seems like it would be good to know if the rewrite was successful. Would it make sense to make the verification a warning if we succeed on the second try?
I guess the primary question to answer is whether we think regenerating the file will fix the corruption.
The examples we've seen all have the problem that the multi-pack-index or commit-graph files were corrupted, and the problem persisted only because we continued to trust that data when writing a new one. Deleting the file and rewriting it (based on the .pack and .idx files) has always fixed the issue.
If the rewrite did not fix the issue, then we would see failures from the same enlistment multiple times in a row. I plan to set alerts for any user hitting this even once, so we would be ready to intervene if this happens repeatedly.
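The delete-and-rewrite mitigation could be sketched roughly as below, in shell rather than the GVFS `GitProcess` wrappers; the object-directory default is an assumption, and this expects to run inside a repository that already has a multi-pack-index:

```shell
#!/bin/sh
# Sketch: on verification failure, delete the multi-pack-index and
# rebuild it from the .pack/.idx files, which are still trusted.
objdir="${1:-.git/objects}"

if ! git multi-pack-index verify --object-dir "$objdir"; then
    echo "multi-pack-index is corrupt; deleting and rewriting" >&2
    rm -f "$objdir/pack/multi-pack-index"
    # The rewrite reads only the pack .idx files, so it does not
    # carry corrupt data forward from the old multi-pack-index.
    git multi-pack-index write --object-dir "$objdir"
fi
```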
Makes sense. So we expect the validate/rewrite cycle to fix the issue in all cases we have seen so far? With monitors in place, I'm happy with this pattern.
Yes, this will automatically fix all cases we have seen so far.
They take about as long as it would take to write them from scratch. I've measured 11-16 seconds.
Force-pushed from ed8d7d9 to e698810
Run the `git commit-graph verify` and `git multi-pack-index verify` commands after any command that would change those files. If these fail, then delete the corrupt file and rewrite it.
We've had issues with users' data in the past, and this gives us a way to automatically detect and repair these scenarios. The immediate rewrite should work since we regenerate from the other Git data. The issues we've seen in the past are related to trusting the content in these files and carrying that data forward into future versions of the file.
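Putting the description above together, the verify-then-repair step for both files might look like this in shell using stock Git commands (the actual change goes through GVFS's `GitProcess` wrappers, and the object-directory path here is an assumption):

```shell
#!/bin/sh
# After any command that rewrites these files, verify each one.
# On failure, delete the corrupt file and write it again from the
# underlying commit and pack data.
objdir="${1:-.git/objects}"

git commit-graph write --reachable --object-dir "$objdir"
if ! git commit-graph verify --object-dir "$objdir"; then
    rm -f "$objdir/info/commit-graph"
    git commit-graph write --reachable --object-dir "$objdir"
fi

git multi-pack-index write --object-dir "$objdir"
if ! git multi-pack-index verify --object-dir "$objdir"; then
    rm -f "$objdir/pack/multi-pack-index"
    git multi-pack-index write --object-dir "$objdir"
fi
```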