@derrickstolee

This replaces #106. That PR was complicated and difficult to understand because, instead of using structured data, it relied on our simple arrays and overloaded the meaning of their values.

This is a bigger change, but results in code that is (hopefully) easier to understand. The new flow for writing a multi-pack-index is as follows:

  1. Construct a list of `midx_info` structs that contain the details of the packs. This list starts with the packs in the existing midx, followed by the new packs to add. Keep track of the `orig_pack_int_id` for these packs.

  2. Construct the list of object entries. The `pack_int_id` we use here corresponds to the `orig_pack_int_id` for the pack we are using.

  3. Sort the packs by name.

  4. If we have packs to drop, identify where they are in the list of packs. We can use the sorted nature of the list to know we will find them in the correct order.

  5. Determine the `new_pack_int_id` for each `struct midx_info` by counting how many packs have been dropped up to that point in the list.

  6. Construct a new permutation array that maps from `orig_pack_int_id` to `new_pack_int_id`. If a pack is expired, its entry in this array is an invalid value, and any object that tries to use that value triggers an error.

  7. Count the total length of the pack names we will write, and pad that length for alignment if necessary.

  8. Write the midx as usual, keeping in mind that there are `packs.nr - drop_count` packs to write.

  9. When writing the object offsets, use `packs.perm` to translate from `orig_pack_int_id` to `new_pack_int_id`.
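
The flow above can be sketched in C. This is a minimal, self-contained illustration under stated assumptions, not the code from this PR: the fields of `struct midx_info` are simplified, the chunk alignment is assumed to be 4 bytes, and `PACK_EXPIRED`, `CHUNK_ALIGNMENT`, `build_pack_perm()`, `pack_names_len()` and `translate_pack_id()` are hypothetical names introduced only for this example.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the PR's struct midx_info. */
struct midx_info {
	uint32_t orig_pack_int_id; /* position in the step-1 list, before sorting */
	const char *pack_name;
	int expired;               /* set in step 4 for packs being dropped */
};

#define PACK_EXPIRED UINT32_MAX /* hypothetical "invalid" perm value */
#define CHUNK_ALIGNMENT 4       /* assumed alignment of the pack-name chunk */

static int midx_info_cmp(const void *a, const void *b)
{
	const struct midx_info *x = a, *y = b;
	return strcmp(x->pack_name, y->pack_name);
}

/*
 * Steps 3-6: sort packs by name, mark packs listed in 'drop' (which must
 * also be sorted by name) as expired, and fill perm[orig] = new id.
 * 'perm' must have room for 'nr' entries. Returns the number dropped.
 */
static uint32_t build_pack_perm(struct midx_info *packs, uint32_t nr,
				const char **drop, uint32_t drop_nr,
				uint32_t *perm)
{
	uint32_t d = 0, dropped = 0;

	qsort(packs, nr, sizeof(*packs), midx_info_cmp); /* step 3 */

	for (uint32_t i = 0; i < nr; i++) {
		assert(packs[i].orig_pack_int_id < nr);

		/* Step 4: both lists are sorted, so one forward pass suffices.
		 * This sketch silently skips drop names that match no pack. */
		while (d < drop_nr && strcmp(drop[d], packs[i].pack_name) < 0)
			d++;
		if (d < drop_nr && !strcmp(drop[d], packs[i].pack_name)) {
			packs[i].expired = 1;
			d++;
		}

		/* Steps 5-6: new id = sorted position minus drops so far. */
		if (packs[i].expired) {
			dropped++;
			perm[packs[i].orig_pack_int_id] = PACK_EXPIRED;
		} else {
			perm[packs[i].orig_pack_int_id] = i - dropped;
		}
	}
	return dropped;
}

/* Step 7: total length of the surviving names (with NULs), padded. */
static size_t pack_names_len(const struct midx_info *packs, uint32_t nr)
{
	size_t len = 0;

	for (uint32_t i = 0; i < nr; i++)
		if (!packs[i].expired)
			len += strlen(packs[i].pack_name) + 1;
	return (len + CHUNK_ALIGNMENT - 1) / CHUNK_ALIGNMENT * CHUNK_ALIGNMENT;
}

/* Step 9: map an object's orig_pack_int_id to the id that is written. */
static uint32_t translate_pack_id(const uint32_t *perm, uint32_t orig)
{
	if (perm[orig] == PACK_EXPIRED) {
		fprintf(stderr, "object refers to expired pack %" PRIu32 "\n", orig);
		exit(1);
	}
	return perm[orig];
}
```

For example, with `pack-22.pack` and `pack-4444.pack` in the existing midx (original ids 0 and 1), new packs `pack-1.pack` and `pack-333.pack` (original ids 2 and 3), and `pack-22.pack` dropped, `build_pack_perm()` returns 1 and fills `perm` with `{PACK_EXPIRED, 2, 0, 1}`, so 4 - 1 = 3 packs are written. The three surviving names take 41 bytes including their NUL terminators, which `pack_names_len()` pads to 44. Indexing `perm` by the original id means the object entries built in step 2 never need to be rewritten after the sort; the translation happens once, while the object offsets are written.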

While this PR is just one giant commit, I will split it into multiple commits for upstream. These will be interleaved with the commits already in `microsoft/git:master`.


@jeffhostetler jeffhostetler left a comment

Looks good!

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
@derrickstolee derrickstolee merged commit ab10b47 into microsoft:vfs-2.20.1 Jan 4, 2019
derrickstolee added a commit to microsoft/VFSForGit that referenced this pull request Jan 7, 2019
… fix and trace2 v3

* [PR 107 in microsoft/git](microsoft/git#107) includes a fix for the way we permute the set of packs during a `git multi-pack-index expire` command. Specifically, this fixes an issue with the order of the packs when we expire packs and add packs to the multi-pack-index at the same time.

* [PR 100 in microsoft/git](microsoft/git#100) includes updates to trace2.
dscho pushed a commit that referenced this pull request Feb 27, 2019
dscho pushed a commit that referenced this pull request Mar 29, 2019
dscho pushed a commit that referenced this pull request May 25, 2019
dscho pushed a commit that referenced this pull request May 27, 2019