This repository has been archived by the owner on May 3, 2024. It is now read-only.

CORTX-33537: allow degraded i/o for the new objects #1959

Merged: 4 commits merged on Jul 11, 2022


@andriytk (Contributor) commented Jul 5, 2022

Currently, if no clean pool version (actual or formulaic)
can be found for a new object when it is created, -ENOENT is
returned and the object cannot be created. We want users to
be able to create new objects even if this means degraded
i/o on them.

Solution: return the actual pver from conf_pver_find_locked()
when nothing better (cleaner) can be found.
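The fallback described above can be sketched as follows. This is a simplified, hypothetical model: `struct pver`, `enum pver_kind`, `pver_find()` and the `failed_devs` counter are illustrative names only, not Motr's `conf_pver_find_locked()` or its real configuration structures. A clean pool version is preferred; if none is clean, the first actual pver is returned so that the object can still be created and served with degraded i/o, where the old behaviour returned -ENOENT.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified model; NOT Motr's struct m0_conf_pver. */
enum pver_kind {
	PVER_ACTUAL,    /* pool version as configured */
	PVER_FORMULAIC  /* pool version derived to avoid failed devices */
};

struct pver {
	const char    *name;
	enum pver_kind kind;
	int            failed_devs; /* failed devices this pver would still use */
};

/*
 * Hypothetical stand-in for the pver lookup done on object creation.
 * Prefers a clean pver (failed_devs == 0). If none is clean, falls back to
 * the first actual pver, accepting degraded i/o, instead of -ENOENT.
 * Returns -ENOENT only if there is no actual pver to fall back to.
 */
int pver_find(struct pver *pvers, size_t nr, struct pver **out)
{
	struct pver *actual = NULL; /* first actual pver seen, for fallback */
	size_t       i;

	assert(out != NULL);
	assert(pvers != NULL || nr == 0);

	for (i = 0; i < nr; i++) {
		if (pvers[i].failed_devs == 0) {
			*out = &pvers[i];   /* clean: no degraded i/o */
			return 0;
		}
		if (actual == NULL && pvers[i].kind == PVER_ACTUAL)
			actual = &pvers[i];
	}
	if (actual == NULL)
		return -ENOENT;
	*out = actual;              /* degraded, but creation can proceed */
	return 0;
}
```

For example, given an actual pver with one failed device and a formulaic pver that still cannot avoid the failures, this sketch returns 0 and points at the actual pver rather than failing with -ENOENT. Remembering the first actual pver during the same scan keeps the lookup to a single pass.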

Closes #1958.
Relates Seagate/cortx-hare#2123.

Coding

Checklist for Author

  • Coding conventions are followed and code is consistent

Testing

Checklist for Author

  • Unit and System Tests are added
  • Test Cases cover Happy Path, Non-Happy Path and Scalability
  • Testing was performed with RPM

Impact Analysis

Checklist for Author/Reviewer/GateKeeper

  • Interface changes (if any) are documented
  • Side effects on other features (deployment/upgrade)
  • Dependencies on other component(s)

Review Checklist

Checklist for Author

  • JIRA number/GitHub Issue added to PR
  • PR is self reviewed
  • JIRA state/status is updated and the JIRA is updated with the PR link
  • Check if the description is clear and explained

Documentation

Checklist for Author

  • Changes done to WIKI / Confluence page / Quick Start Guide

@andriytk force-pushed the disable-alt-pvers branch 2 times, most recently from d97232f to 12f0856, July 6, 2022 09:56
@andriytk (Contributor, Author) commented Jul 6, 2022

conf/pvers.c (outdated, resolved)
@andriytk changed the title from "CORTX-33537: disable alternative pvers by default" to "CORTX-33537: allow degraded i/o for the new objects", Jul 6, 2022
@andriytk force-pushed the disable-alt-pvers branch from 12f0856 to 7881087, July 6, 2022 14:51
@andriytk (Contributor, Author) commented Jul 6, 2022

@cortx-admin: This comment was marked as outdated.

@cortx-admin commented:

Jenkins CI Result: Motr#1429

Motr Test Summary

❌ Failed: 2

03motr-single-node/71spiel-sns-motr-repair
01motr-single-node/00userspace-tests

🏁 Skipped: 32

01motr-single-node/28sys-kvs
01motr-single-node/35m0singlenode
01motr-single-node/04initscripts
01motr-single-node/37protocol
02motr-single-node/51kem
02motr-single-node/20rpc-session-cancel
02motr-single-node/10pver-assign
02motr-single-node/21fsync-single-node
02motr-single-node/13dgmode-io
02motr-single-node/14poolmach
02motr-single-node/11m0t1fs
02motr-single-node/26motr-user-kernel-tests
02motr-single-node/08spiel
03motr-single-node/06conf
03motr-single-node/36spare-reservation
04motr-single-node/34sns-repair-1n-1f
04motr-single-node/08spiel-sns-repair-quiesce
04motr-single-node/28sys-kvs-kernel
04motr-single-node/11m0t1fs-rconfc-fail
04motr-single-node/08spiel-sns-repair
04motr-single-node/19sns-repair-abort
04motr-single-node/22sns-repair-ios-fail
05motr-single-node/18sns-repair-quiesce
05motr-single-node/12fwait
05motr-single-node/16sns-repair-multi
05motr-single-node/07mount-fail
05motr-single-node/15sns-repair-single
05motr-single-node/23sns-abort-quiesce
05motr-single-node/17sns-repair-concurrent-io
05motr-single-node/07mount
05motr-single-node/07mount-multiple
05motr-single-node/12fsync

✔️ Passed: 41

01motr-single-node/43m0crate
01motr-single-node/05confgen
01motr-single-node/06hagen
01motr-single-node/52motr-singlenode-sanity
01motr-single-node/01net
01motr-single-node/01kernel-tests
01motr-single-node/03console
01motr-single-node/02rpcping
02motr-single-node/07m0d-fatal
02motr-single-node/67fdmi-plugin-multi-filters
02motr-single-node/53clusterusage-alert
02motr-single-node/41motr-conf-update
03motr-single-node/61sns-repair-motr-1n-1f
03motr-single-node/72spiel-sns-motr-repair-quiesce
03motr-single-node/08spiel-multi-confd
03motr-single-node/69sns-repair-motr-quiesce
03motr-single-node/62sns-repair-motr-mf
03motr-single-node/70sns-failure-after-repair-quiesce
03motr-single-node/63sns-repair-motr-1k-1f
03motr-single-node/60sns-repair-motr-1f
03motr-single-node/66sns-repair-motr-abort-quiesce
03motr-single-node/24motr-dix-repair-lookup-insert-spiel
03motr-single-node/68sns-repair-motr-shutdown
03motr-single-node/64sns-repair-motr-ios-fail
03motr-single-node/24motr-dix-repair-lookup-insert-m0repair
03motr-single-node/04sss
03motr-single-node/65sns-repair-motr-abort
04motr-single-node/48motr-raid0-io
04motr-single-node/49motr-rpc-cancel
04motr-single-node/25m0kv
04motr-single-node/44motr-rm-lock-cc-io
04motr-single-node/45motr-rmw
05motr-single-node/23dix-repair-m0repair
05motr-single-node/43motr-sync-replication
05motr-single-node/42motr-utils
05motr-single-node/45motr-sns-repair-N-1
05motr-single-node/40motr-dgmode
05motr-single-node/23dix-repair-quiesce-m0repair
05motr-single-node/23spiel-dix-repair-quiesce
05motr-single-node/44motr-sns-repair
05motr-single-node/23spiel-dix-repair

Total: 75

CppCheck Summary

   Cppcheck: No new warnings found 👍

@madhavemuri (Contributor) commented:

@andriytk:
https://eos-jenkins.colo.seagate.com/job/Cortx-PR-Build/job/Motr/1429/testReport/junit/01motr-single-node/00userspace-tests/attachments/00userspace-tests.stdout.log

The following UT seems to be stuck, and this appears to be related to this PR only:

conf-pvers-ut
  fid        0.00 sec   0 B
  pver-find

Currently, if no clean pool version (actual or formulaic)
can be found for a new object when it is created, -ENOENT is
returned and the object cannot be created. We want users to
be able to create new objects even if this means degraded
i/o on them.

Solution: return the actual pver from conf_pver_find_locked()
when nothing better (cleaner) can be found.

Closes Seagate#1958.
Relates Seagate/cortx-hare#2123.

Signed-off-by: Andriy Tkachuk <andriy.tkachuk@seagate.com>
@andriytk force-pushed the disable-alt-pvers branch from 7881087 to 7c2b423, July 7, 2022 09:15
@andriytk (Contributor, Author) commented Jul 7, 2022

@cortx-admin commented:

Jenkins CI Result: Motr#1438

Motr Test Summary

Test ResultCountInfo
❌ Failed: 1

01motr-single-node/00userspace-tests

🏁 Skipped: 32

01motr-single-node/28sys-kvs
01motr-single-node/35m0singlenode
01motr-single-node/04initscripts
01motr-single-node/37protocol
02motr-single-node/51kem
02motr-single-node/20rpc-session-cancel
02motr-single-node/10pver-assign
02motr-single-node/21fsync-single-node
02motr-single-node/13dgmode-io
02motr-single-node/14poolmach
02motr-single-node/11m0t1fs
02motr-single-node/26motr-user-kernel-tests
02motr-single-node/08spiel
03motr-single-node/06conf
03motr-single-node/36spare-reservation
04motr-single-node/34sns-repair-1n-1f
04motr-single-node/08spiel-sns-repair-quiesce
04motr-single-node/28sys-kvs-kernel
04motr-single-node/11m0t1fs-rconfc-fail
04motr-single-node/08spiel-sns-repair
04motr-single-node/19sns-repair-abort
04motr-single-node/22sns-repair-ios-fail
05motr-single-node/18sns-repair-quiesce
05motr-single-node/12fwait
05motr-single-node/16sns-repair-multi
05motr-single-node/07mount-fail
05motr-single-node/15sns-repair-single
05motr-single-node/23sns-abort-quiesce
05motr-single-node/17sns-repair-concurrent-io
05motr-single-node/07mount
05motr-single-node/07mount-multiple
05motr-single-node/12fsync

✔️ Passed: 42

01motr-single-node/43m0crate
01motr-single-node/05confgen
01motr-single-node/06hagen
01motr-single-node/52motr-singlenode-sanity
01motr-single-node/01net
01motr-single-node/01kernel-tests
01motr-single-node/03console
01motr-single-node/02rpcping
02motr-single-node/07m0d-fatal
02motr-single-node/67fdmi-plugin-multi-filters
02motr-single-node/53clusterusage-alert
02motr-single-node/41motr-conf-update
03motr-single-node/61sns-repair-motr-1n-1f
03motr-single-node/72spiel-sns-motr-repair-quiesce
03motr-single-node/08spiel-multi-confd
03motr-single-node/69sns-repair-motr-quiesce
03motr-single-node/62sns-repair-motr-mf
03motr-single-node/70sns-failure-after-repair-quiesce
03motr-single-node/63sns-repair-motr-1k-1f
03motr-single-node/60sns-repair-motr-1f
03motr-single-node/66sns-repair-motr-abort-quiesce
03motr-single-node/24motr-dix-repair-lookup-insert-spiel
03motr-single-node/68sns-repair-motr-shutdown
03motr-single-node/64sns-repair-motr-ios-fail
03motr-single-node/71spiel-sns-motr-repair
03motr-single-node/24motr-dix-repair-lookup-insert-m0repair
03motr-single-node/04sss
03motr-single-node/65sns-repair-motr-abort
04motr-single-node/48motr-raid0-io
04motr-single-node/49motr-rpc-cancel
04motr-single-node/25m0kv
04motr-single-node/44motr-rm-lock-cc-io
04motr-single-node/45motr-rmw
05motr-single-node/23dix-repair-m0repair
05motr-single-node/43motr-sync-replication
05motr-single-node/42motr-utils
05motr-single-node/45motr-sns-repair-N-1
05motr-single-node/40motr-dgmode
05motr-single-node/23dix-repair-quiesce-m0repair
05motr-single-node/23spiel-dix-repair-quiesce
05motr-single-node/44motr-sns-repair
05motr-single-node/23spiel-dix-repair

Total: 75

CppCheck Summary

   Cppcheck: No new warnings found 👍

@andriytk (Contributor, Author) commented Jul 7, 2022

----- run_ut -----
...
cas-service [Nikita]
  init-fini [Nikita]                              0.38 sec   74 MiB
  init-fail [Leonid]                              0.39 sec   81 MiB
  re-init [Egor]                                  1.99 sec    6 MiB
  re-start [Nikita] 
motr[101522]:  4b10  FATAL  [lib/assert.c:50:m0_panic]  panic: ((loc->fl_wail_nr) != 0) at m0_fom_ready() (fop/fom.c:440)  [git: sage-base-1.0-962-ga0d2d7a-dirty] /var/motr/m0ut/m0trace.101522.2022-07-07-03:35:17
Motr panic: ((loc->fl_wail_nr) != 0) at m0_fom_ready() fop/fom.c:440 (errno: 0) (last failed: none) [git: sage-base-1.0-962-ga0d2d7a-dirty] pid: 101522  /var/motr/m0ut/m0trace.101522.2022-07-07-03:35:17
/root/motr/motr_test_github_workdir/workdir/src/motr/.libs/libmotr.so.2(m0_arch_backtrace+0x20)[0x7effa0bbfdc0]
/root/motr/motr_test_github_workdir/workdir/src/motr/.libs/libmotr.so.2(m0_arch_panic+0xe6)[0x7effa0bbff76]
/root/motr/motr_test_github_workdir/workdir/src/motr/.libs/libmotr.so.2(+0x3a2eb4)[0x7effa0badeb4]
/root/motr/motr_test_github_workdir/workdir/src/motr/.libs/libmotr.so.2(+0x373bdb)[0x7effa0b7ebdb]
/root/motr/motr_test_github_workdir/workdir/src/motr/.libs/libmotr.so.2(m0_sm_asts_run+0xb7)[0x7effa0c54297]
/root/motr/motr_test_github_workdir/workdir/src/motr/.libs/libmotr.so.2(+0x375257)[0x7effa0b80257]
/root/motr/motr_test_github_workdir/workdir/src/motr/.libs/libmotr.so.2(m0_thread_trampoline+0x5e)[0x7effa0bb52fe]
/root/motr/motr_test_github_workdir/workdir/src/motr/.libs/libmotr.so.2(+0x3b5c5d)[0x7effa0bc0c5d]
/lib64/libpthread.so.0(+0x7ea5)[0x7effa02f4ea5]
/lib64/libc.so.6(clone+0x6d)[0x7eff9ec7396d]
/root/motr/motr_test_github_workdir/workdir/src/utils/m0run: line 433: 101522 Aborted                 (core dumped) $(srcdir_path_of $binary) "$@"

This seems to be a known issue, not caused by this patch. Right, @madhavemuri, @huanghua78?

@huanghua78 commented:


This will be fixed in #1951

@rkothiya (Contributor) commented:

retest this please

Successfully merging this pull request may close these issues: Disable alternative pool versions by default
7 participants