DAOS-1946 md: fix initial solution for metadata full handling #2057

Merged: 1 commit into master from kccain/daos_1946_pr2 on Mar 10, 2020

@kccain kccain commented Mar 9, 2020

The original solution for rdb_raft_state checking to handle the
-DER_NOSPACE condition was to trigger log compaction and step down
as the service leader. A problem with that implementation, in commit
a614cb1, is that it compacts to the current index. With this fix, it
compacts to the committed index.

Test-tag-hw-large: pr,hw,large metadatafill metadata_free_space

Signed-off-by: Ken Cain <kenneth.c.cain@intel.com>
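
To make the difference between the two compaction targets concrete, here is a minimal toy model in Python. It is not the DAOS rdb code (which is C), and every name in it (`ToyReplica`, `NoSpaceError`, `handle_append`, `compact_to`) is hypothetical. It assumes the general Raft rule that only committed entries are safe to fold into a snapshot, since uncommitted entries may still be replaced by a later leader; that is presumably why the fix targets the committed index rather than the current one.

```python
from dataclasses import dataclass, field


class NoSpaceError(Exception):
    """Stand-in for a -DER_NOSPACE return code (hypothetical name)."""


@dataclass
class ToyReplica:
    capacity: int                 # max log entries the toy store can hold
    is_leader: bool = True
    base: int = 0                 # last index already folded into the snapshot
    committed: int = 0            # highest index known to be committed
    log: dict = field(default_factory=dict)  # index -> entry

    @property
    def last_index(self):
        # The "current" index: the newest entry, committed or not.
        return self.base + len(self.log)

    def append(self, entry):
        if len(self.log) >= self.capacity:
            raise NoSpaceError()
        self.log[self.last_index + 1] = entry

    def compact_to(self, index):
        # Drop entries (base, index] from the log and advance the base.
        for i in range(self.base + 1, index + 1):
            del self.log[i]
        self.base = max(self.base, index)

    def handle_append(self, entry):
        try:
            self.append(entry)
            return True
        except NoSpaceError:
            # Reclaim space from committed entries only, then step down.
            # Using self.last_index here instead would also discard
            # entries 3 and 4 in the demo below, which are not committed.
            self.compact_to(self.committed)
            self.is_leader = False
            return False


r = ToyReplica(capacity=4)
for n in range(4):
    r.handle_append(f"entry-{n}")
r.committed = 2                   # entries 3 and 4 are still uncommitted
r.handle_append("entry-4")        # log is full: compact and step down
print(sorted(r.log), r.base, r.is_leader)  # prints: [3, 4] 2 False
```

In this sketch the uncommitted entries 3 and 4 survive compaction, and the replica gives up leadership instead of retrying the append.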

@kccain kccain requested review from liw and ravalsam March 9, 2020 17:45

@liw liw left a comment

Thanks.

liw commented Mar 10, 2020

Functional failed to start daos_agent... That's strange.

kccain commented Mar 10, 2020

> Functional failed to start daos_agent... That's strange.

Created a bug for this https://jira.hpdd.intel.com/browse/DAOS-4313

@kccain kccain requested a review from johannlombardi March 10, 2020 15:22
@johannlombardi johannlombardi merged commit 60bb952 into master Mar 10, 2020
@johannlombardi johannlombardi deleted the kccain/daos_1946_pr2 branch March 10, 2020 17:22
@kccain kccain requested a review from a team March 10, 2020 17:24
kccain added a commit that referenced this pull request Mar 10, 2020
Cherry picked:
commit a614cb1 PR #1956 and
commit 60bb952e808906c1575e33a345a6d95a4c3f5bc2 PR #2057
from daos master branch to release/0.9 branch.

Change the rdb_raft state checking code so that when a -DER_NOSPACE
condition is observed while appending to the raft log, it is handled
like the -DER_NOMEM case (become follower, step down). Also trigger
rdb log compaction aggressively to reclaim space. Before this
change, stopping the service could leave it "dead", impacting subsequent
resource destroy operations (e.g., pool destroy).

Re-enable the metadatafill test and run it with multiple (4) servers
and pool service replicas (3). Adjust the maximum number of containers
to approximately 98% of what can be accommodated in a metadata
capacity of 128MB.

Test-tag-hw-large: pr,hw,large metadatafill metadata_free_space

Signed-off-by: Ken Cain <kenneth.c.cain@intel.com>
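
The 98% container limit mentioned in the commit message above comes down to simple arithmetic, sketched here in Python. The per-container metadata cost is not stated anywhere in this thread, so the 64 KiB figure below is a made-up placeholder rather than a measured DAOS value, and "128MB" is assumed to mean 128 MiB. The function name `max_containers` is also hypothetical.

```python
MIB = 1024 * 1024


def max_containers(capacity_bytes, per_container_bytes, fill_percent=98):
    """Number of containers that fit in fill_percent of the capacity, rounded down."""
    if per_container_bytes <= 0:
        raise ValueError("per_container_bytes must be positive")
    # Integer arithmetic avoids floating-point rounding near the boundary.
    return capacity_bytes * fill_percent // (100 * per_container_bytes)


# Placeholder cost of 64 KiB per container (an assumption, not a DAOS figure).
limit = max_containers(128 * MIB, 64 * 1024)
print(limit)  # prints: 2007
```

A test would need the real per-container footprint, measured on the target system, in place of the placeholder.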
kccain added a commit that referenced this pull request Mar 18, 2020
Cherry picked:
commit a614cb1 PR #1956 and
commit 60bb952e808906c1575e33a345a6d95a4c3f5bc2 PR #2057
from daos master branch to release/0.9 branch.

Change the rdb_raft state checking code so that when a -DER_NOSPACE
condition is observed while appending to the raft log, it is handled
like the -DER_NOMEM case (become follower, step down). Also trigger
rdb log compaction aggressively to reclaim space. Before this
change, stopping the service could leave it "dead", impacting subsequent
resource destroy operations (e.g., pool destroy).

The metadatafill test is being disabled again. While it passes
frequently with the above change to DAOS, intermittently it fails
with different symptoms when metadata storage is exhausted.

Test-tag-hw-large: pr,hw,large metadatafill metadata_free_space

Signed-off-by: Ken Cain <kenneth.c.cain@intel.com>
jolivier23 pushed a commit that referenced this pull request Mar 18, 2020
Cherry picked:
commit a614cb1 PR #1956 and
commit 60bb952e808906c1575e33a345a6d95a4c3f5bc2 PR #2057
from daos master branch to release/0.9 branch.

Change the rdb_raft state checking code so that when a -DER_NOSPACE
condition is observed while appending to the raft log, it is handled
like the -DER_NOMEM case (become follower, step down). Also trigger
rdb log compaction aggressively to reclaim space. Before this
change, stopping the service could leave it "dead", impacting subsequent
resource destroy operations (e.g., pool destroy).

The metadatafill test is being disabled again. While it passes
frequently with the above change to DAOS, intermittently it fails
with different symptoms when metadata storage is exhausted.

Signed-off-by: Ken Cain <kenneth.c.cain@intel.com>