Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DAOS-6967 vos: Data loss on updates after uncommitted punch #4899

Merged
merged 17 commits into from
Mar 17, 2021

Conversation

jolivier23
Copy link
Contributor

@jolivier23 jolivier23 commented Mar 6, 2021

The main issue we are trying to prevent is removal of a subtree that isn't empty.
This typically means that we have a bug in aggregation or in the incarnation log

This patch does the following:

  1. Modifies the btree code so that it returns aborted entries during aggregation.
  2. Modify ilog_aggregate so it takes a minor epoch into account. Previously, if we had a punch of a parent, it would remove the child at the same major epoch even if it had updates afterward.
  3. Modify the incarnation log to ensure we take the punch of a parent into account when doing an update
    4.Modify aggregation so that we skip aggregation of parent objects if a child iterator hits -DER_TX_BUSY. This avoids removing orphaned subtrees.
  4. Aggregation of the incarnation log was returning -DER_TX_BUSY for entries that are after the aggregated range. Moved the check to after the range check.
  5. Modify evtree aggregation to restart the current tree in case of removing an aborted entry rather than returning -DER_TX_BUSY which would cause upper layer to abort too.
  6. Fixed a few other issues with aggregation.

Signed-off-by: Jeff Olivier jeffrey.v.olivier@intel.com

If a punch of a parent object/key is uncommited, a new incarnation
log entry needs to be inserted.   This reguressed with the addition
of minor epochs.
Added a regression test that exercises the problematic case.

Signed-off-by: Jeff Olivier <jeffrey.v.olivier@intel.com>
@jolivier23 jolivier23 requested a review from Nasf-Fan March 6, 2021 03:56
Copy link
Collaborator

@daosbuild1 daosbuild1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. No errors found by checkpatch.

src/vos/vos_io.c Outdated Show resolved Hide resolved
src/vos/vos_obj.c Outdated Show resolved Hide resolved
Signed-off-by: Jeff Olivier <jeffrey.v.olivier@intel.com>
@jolivier23 jolivier23 requested a review from ryon-jensen March 8, 2021 17:31
Signed-off-by: Jeff Olivier <jeffrey.v.olivier@intel.com>
Copy link
Collaborator

@daosbuild1 daosbuild1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. No errors found by checkpatch.

ryon-jensen
ryon-jensen previously approved these changes Mar 8, 2021
NiuYawei
NiuYawei previously approved these changes Mar 9, 2021
Copy link
Collaborator

@daosbuild1 daosbuild1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. No errors found by checkpatch.

* Modify aggregate to bypass aggregation if child tree was aborted
* Fix an issue with incarnation log.  We don't need to abort
  aggregation if the uncommitted entry is after the aggregation period
  so move the check.

Note, there is an issue with single value still because it doesn't
abort the aggregation under such cases so this is a WIP

Skip-test: true

Signed-off-by: Jeff Olivier <jeffrey.v.olivier@intel.com>
@jolivier23 jolivier23 dismissed stale reviews from NiuYawei and ryon-jensen via 318aaf1 March 11, 2021 00:43
Copy link
Collaborator

@daosbuild1 daosbuild1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. No errors found by checkpatch.

btr_check_availability for single value but I think this is
best left for 2.0
Modify the reset to also reset the ta_reprobe flags

Signed-off-by: Jeff Olivier <jeffrey.v.olivier@intel.com>
Copy link
Collaborator

@daosbuild1 daosbuild1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. No errors found by checkpatch.

@jolivier23
Copy link
Contributor Author

Note, I have ideas for lots more VOS level testing....but it doesn't seem practical for a 1.2 patch at this point

ryon-jensen
ryon-jensen previously approved these changes Mar 12, 2021
@jolivier23
Copy link
Contributor Author

Failure is due to DAOS-7016

@@ -3658,6 +3665,10 @@ dbtree_iter_delete(daos_handle_t ih, void *args)
if (rc != 0)
return rc;

/** The probed entry is not committed so return -DER_TX_BUSY */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why an aborted entry can't be removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. This needs to be for in-progress entries. We need another way to determine if an entry is aborted. I don't think the dtx_status stuff is working for single value

@@ -409,6 +413,16 @@ vos_agg_sv(daos_handle_t ih, vos_iter_entry_t *entry,

delete:
rc = agg_del_entry(ih, agg_param->ap_umm, entry, acts);
if (rc == -DER_TX_BUSY) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the uncommitted check done above (by checking the entry->ie_dtx_state)?

@@ -180,7 +183,8 @@ agg_del_entry(daos_handle_t ih, struct umem_instance *umm,
rc = umem_tx_commit(umm);

if (rc) {
D_ERROR("Failed to delete entry: "DF_RC"\n", DP_RC(rc));
D_CDEBUG(rc == -DER_TX_BUSY, DB_TRACE, DLOG_ERR,
"Failed to delete entry: "DF_RC"\n", DP_RC(rc));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think current aggregation can ensure not deleting prepared entries, no?

* moving forward.
* moving forward. We do, however, need to ensure we do not
* aggregate anything in the parent path. Otherwise, we could
* orphan the current entry due to incarnation log semantics.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite see the "incarnation log semantics" here, could you explain a bit?

I don't see how this "ap_skip_obj/dkey/akey" works as well, the intention is to set the skip flag for current epoch range aggregation? Or for all epoch ranges aggregation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I discussed this with you offline but for everyone's benefit, the issue is that the incarnation log doesn't contain enough information to know at the key level that there are in-progress updates that we may orphan by aggregating the ilog. The example I used is this one

  1. committed update at 1
  2. in progress update at 2 - this will not generate an incarnation log entry
  3. punch at 3

From the ilog perspective, we can delete everything but the subtree will not be empty due to the inprogress update.

may need re-probe.   Evtree reprobe does the sort over
again anyway.

Signed-off-by: Jeff Olivier <jeffrey.v.olivier@intel.com>
@daosbuild1
Copy link
Collaborator

Test stage NLT completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-4899/7/display/redirect

Copy link
Collaborator

@daosbuild1 daosbuild1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. No errors found by checkpatch.

break;
D_DEBUG(DB_EPC, "Hit uncommitted single value at epoch:"
DF_X64"\n", entry->ie_epoch);
return -DER_TX_BUSY;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comments above need be revised, the original design is to skip the uncommitted epoch and continue on the next lower epoch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I will fix that.

if (acts & VOS_ITER_CB_RESTART) {
daos_anchor_set_zero(anchor);
probe_anchor = NULL;
goto probe;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simply re-probe won't work for sorted evtree iteration, you need to re-prepare the sorted entries on restart, since the deletion before restart have invalidated the visibility of evtree entries.

I think a simpler way is to rely on next round of aggregation to aggregate the same epoch range again. (we just need to skip updating HAE when aggregation aborted on hitting aborted or uncommitted entries.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to re-prepare. For sorted iterator, probe is what re-reads all the entries and sorts them so it works to simply re-probe in this case. See evt_iter_probe_sorted. I suppose it may be a bit weird for upper levels but it's only really used by evtree so maybe it's fine. I originally did re-prepare but this actually requires a reprobe of the parent iterator and is more complicated.

We could do as you say and just abort but what is the advantage of doing that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I was thinking reprobe won't re-read all entries.

@jolivier23 jolivier23 requested a review from NiuYawei March 16, 2021 02:42
Signed-off-by: Jeff Olivier <jeffrey.v.olivier@intel.com>
Copy link
Collaborator

@daosbuild1 daosbuild1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. No errors found by checkpatch.

NiuYawei
NiuYawei previously approved these changes Mar 16, 2021
ryon-jensen
ryon-jensen previously approved these changes Mar 16, 2021
return true;

return minor_epc <= agg_arg->aa_punched_minor;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (entry->ie_id.id_punch_minor_eph > entry->ie_id.id_update_minor_eph &&
entry->ie_id.id_punch_minor_eph > agg_arg->aa_punched_minor &&
entry->ie_id.id_epoch == agg_arg->aa_punched)

Then it is a punched entry or not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the entry is punched (e.g. punch_minor > update_minor) then we will determine later that the entry is punched. But this check is only about whether it is punched entirely by the parent.

@@ -683,11 +681,18 @@ vos_iterate_internal(vos_iter_param_t *param, vos_iter_type_t type,
if (acts & VOS_ITER_CB_ABORT)
break;

if (acts & VOS_ITER_CB_RESTART) {
daos_anchor_set_zero(anchor);
probe_anchor = NULL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Set "probe_anchor" as NULL may cause some potential trouble in our current implementation, please check evt_iter_probe_sorted(), it assumes that the input parameter "rect" and "anchor" will not be both NULL. But that is not true in your patch. I am not quite sure for that yet, but need some detailed check/test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If anchor is NULL, it passes EVT_ITER_FIRST to evt_iter_probe. In this case, it doesn't reach the scenario your speak of.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right.

*acts |= VOS_ITER_CB_ABORT;
if (rc == -DER_CSUM)
agg_param->ap_csum_err = true;
if (rc == -DER_TX_RESTART)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check will be always false since you have checked "if (rc == -DER_CSUM || rc == -DER_TX_BUSY)" at L1834.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. I guess I never actually hit this case because the case that was causing issues was that aborted entries were getting ignored entirely. I'll fix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reworked this a bit

assert_memory_equal(buf, "xxxxx", 5);

memset(buf, 'x', sizeof(buf));
epoch += 10;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is unnecessary to bump the epoch again for another fetch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True but it also doesn't hurt anything

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not hurt anything, just redundant.

Signed-off-by: Jeff Olivier <jeffrey.v.olivier@intel.com>
Copy link
Collaborator

@daosbuild1 daosbuild1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. No errors found by checkpatch.

Signed-off-by: Jeff Olivier <jeffrey.v.olivier@intel.com>
Copy link
Collaborator

@daosbuild1 daosbuild1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. No errors found by checkpatch.

assert_memory_equal(buf, "xxxxx", 5);

memset(buf, 'x', sizeof(buf));
epoch += 10;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not hurt anything, just redundant.

@@ -683,11 +681,18 @@ vos_iterate_internal(vos_iter_param_t *param, vos_iter_type_t type,
if (acts & VOS_ITER_CB_ABORT)
break;

if (acts & VOS_ITER_CB_RESTART) {
daos_anchor_set_zero(anchor);
probe_anchor = NULL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right.

@jolivier23 jolivier23 merged commit e8f5784 into master Mar 17, 2021
@jolivier23 jolivier23 deleted the jvolivie/punch branch March 17, 2021 21:32
jolivier23 added a commit that referenced this pull request Mar 17, 2021
The main issue we are trying to prevent is removal of a subtree that isn't empty.
This typically means that we have a bug in aggregation or in the incarnation log

This patch does the following:

Modifies the btree code so that it returns aborted entries during aggregation.
Modify ilog_aggregate so it takes a minor epoch into account. Previously, if we had a punch of a parent, it would remove the child at the same major epoch even if it had updates afterward.
Modify the incarnation log to ensure we take the punch of a parent into account when doing an update
4.Modify aggregation so that we skip aggregation of parent objects if a child iterator hits -DER_TX_BUSY. This avoids removing orphaned subtrees.
Aggregation of the incarnation log was returning -DER_TX_BUSY for entries that are after the aggregated range. Moved the check to after the range check.
Modify evtree aggregation to restart the current tree in case of removing an aborted entry rather than returning -DER_TX_BUSY which would cause upper layer to abort too.
Fixed a few other issues with aggregation.

Signed-off-by: Jeff Olivier <jeffrey.v.olivier@intel.com>
jolivier23 added a commit that referenced this pull request Mar 20, 2021
…5059)

The main issue we are trying to prevent is removal of a subtree that isn't empty.
This typically means that we have a bug in aggregation or in the incarnation log

This patch does the following:

Modifies the btree code so that it returns aborted entries during aggregation.
Modify ilog_aggregate so it takes a minor epoch into account. Previously, if we had a punch of a parent, it would remove the child at the same major epoch even if it had updates afterward.
Modify the incarnation log to ensure we take the punch of a parent into account when doing an update
4.Modify aggregation so that we skip aggregation of parent objects if a child iterator hits -DER_TX_BUSY. This avoids removing orphaned subtrees.
Aggregation of the incarnation log was returning -DER_TX_BUSY for entries that are after the aggregated range. Moved the check to after the range check.
Modify evtree aggregation to restart the current tree in case of removing an aborted entry rather than returning -DER_TX_BUSY which would cause upper layer to abort too.
Fixed a few other issues with aggregation.

Signed-off-by: Jeff Olivier <jeffrey.v.olivier@intel.com>
@ashleypittman ashleypittman mentioned this pull request Apr 28, 2021
@ashleypittman ashleypittman mentioned this pull request May 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants