-
Notifications
You must be signed in to change notification settings - Fork 306
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DAOS-5758 vos: Add garbage collection at container level #4997
Conversation
Since DTX entries can still reference objects that are being deletect by vos_obj_delete, we need to pass a container handle when removing them so that any DTX references can be removed upon deletion. This change adds a container level garbage collection heap so we can meet this requirement. When a container is destroyed, the collection of that container will move any outstanding items to the pool level heap. Signed-off-by: Jeff Olivier <jeffrey.v.olivier@intel.com>
d2328fa
to
4130002
Compare
Test stage checkpatch completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-4997/1/execution/node/72/log |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. No errors found by checkpatch.
added also are added to container heaps Signed-off-by: Jeff Olivier <jeffrey.v.olivier@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. No errors found by checkpatch.
Test stage Unit Test with memcheck completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-4997/3/execution/node/896/log |
the bag Signed-off-by: Jeff Olivier <jeffrey.v.olivier@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. No errors found by checkpatch.
Signed-off-by: Jeff Olivier <jeffrey.v.olivier@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. No errors found by checkpatch.
an existing container if there are outstanding items to be collected. Signed-off-by: Jeff Olivier <jeffrey.v.olivier@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. No errors found by checkpatch.
Also add a test for close and open container Signed-off-by: Jeff Olivier <jeffrey.v.olivier@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. No errors found by checkpatch.
Test stage Unit Test completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-4997/7/execution/node/924/log |
Signed-off-by: Jeff Olivier <jeffrey.v.olivier@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. No errors found by checkpatch.
do not go to container lists I believe the PR that ensures no orphaned subtrees Signed-off-by: Jeff Olivier <jeffrey.v.olivier@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. No errors found by checkpatch.
Test stage Unit Test completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-4997/9/execution/node/909/log |
Test stage Unit Test with memcheck completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-4997/9/execution/node/910/log |
Signed-off-by: Jeff Olivier <jeffrey.v.olivier@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. No errors found by checkpatch.
Signed-off-by: Jeff Olivier <jeffrey.v.olivier@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. No errors found by checkpatch.
Failure is due to DAOS-7016 |
* the list and give another container a turn next time. | ||
*/ | ||
cont = d_list_pop_entry(&pool->vp_gc_cont, struct vos_container, | ||
vc_gc_link); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if a container is destroyed? The vos_container will be freed but still linked?
Instead of passing 'coh' to GC, Is there any way to make DTX to skip the entries belong to deleted objects?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do d_list_del in cont_free (when last reference is dropped). In such case, the entries in the container can't be aggregated until the container is destroyed or reopened. When it's destroyed, the entries will be moved to the pool bins.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The patch looks good to me in general, few minor questions.
Changing GC to solve the problem is ok to me, but looks not the most efficient way, since DTX will have to do commit for deleted object (before GC reclaiming them)? I think a better way is to have DTX taking proper actions on the entries belongs to deleted object. Let's see if @Nasf-Fan have any good idea.
@@ -611,6 +618,8 @@ discard_1(void **state) | |||
ds.td_agg_epr.epr_lo, ds.td_iod_size); | |||
aggregate_basic(arg, &ds, 0, NULL); | |||
} | |||
|
|||
cleanup(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see the purpose of this call for each discard/aggregation test case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to assert that the aggregation entries get added to the pool heap. Running the garbage collector after each aggregation test allows me to test this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, but most aggregation/discard test cases won't add anything to GC. Maybe only object/key punch involved tests adds things to GC.
* the list and give another container a turn next time. | ||
*/ | ||
cont = d_list_pop_entry(&pool->vp_gc_cont, struct vos_container, | ||
vc_gc_link); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks.
@@ -459,9 +459,6 @@ vos_obj_delete(daos_handle_t coh, daos_unit_oid_t oid) | |||
rc = umem_tx_end(umm, rc); | |||
if (rc) | |||
goto out; | |||
|
|||
/* NB: noop for full-stack mode */ | |||
gc_wait(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why it's removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For testing purposes mainly. It's only for unit tests and I wanted the gc tests to actually test garbage collection after the fact and not having that gives me more control over when it runs. It really should have no effect on anything but unit tests since only reintegration uses these APIs and it's a noop in that case.
I think it could be tricky to do. Adding it to the garbage collector doesn't impose any ordering requirements for commit. The reason for adding the container to the callback is so that the transaction entries get removed when the object/key/value gets removed by the garbage collector. If commit happens first, it will do a little extra work and save the garbage collector a little work. If the delete happens first, commit will have less stuff in its list of things to do. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only some leak issues in the tests.
Signed-off-by: Jeff Olivier <jeffrey.v.olivier@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. No errors found by checkpatch.
Sorry but we cannot pass container "handle" to GC functions, the GC bins & bags are stored in PMEM so GC can proceed even after server shutdown/crash. Container handle is an ephemeral address in DRAM, so it wouldn't work anymore after reboot. Can we just pass vos_cont_df into GC? It also means that we don't need an extra parameter for: The whole intention of having the "args" is for this kind of requirement. |
The DTX entry index is in DRAM, please check lrua_lookupx(), we can find the DTX entry only when we have the container handle in DRAM. The vos_cont_df does NOT work for such purpose. |
Right. I had a chat with Liang last night after he posted the comment. The container handle is only used to
The container handle itself isn't stored anywhere and when a container is closed and reopened, the new handle will be tracked if there are outstanding objects to be collected. |
Since DTX entries can still reference objects that are being deleted by vos_obj_delete, we need to pass a container handle when removing them so that any DTX references can be removed upon deletion. This change adds a container level garbage collection heap so we can meet this requirement. When a container is destroyed, the collection of that container will move any outstanding items to the pool level heap. Signed-off-by: Jeff Olivier <jeffrey.v.olivier@intel.com>
Since DTX entries can still reference objects that are being deleted by vos_obj_delete, we need to pass a container handle when removing them so that any DTX references can be removed upon deletion. This change adds a container level garbage collection heap so we can meet this requirement. When a container is destroyed, the collection of that container will move any outstanding items to the pool level heap. Signed-off-by: Jeff Olivier <jeffrey.v.olivier@intel.com>
Since DTX entries can still reference objects that are
being deleted by vos_obj_delete, we need to pass
a container handle when removing them so that any
DTX references can be removed upon deletion.
This change adds a container level garbage collection heap
so we can meet this requirement. When a container is
destroyed, the collection of that container will move
any outstanding items to the pool level heap.
Signed-off-by: Jeff Olivier jeffrey.v.olivier@intel.com