This repository has been archived by the owner on May 3, 2024. It is now read-only.

Client crashes on too many failures #1838

Closed
andriytk opened this issue May 30, 2022 · 9 comments · Fixed by #1856
Labels
Status: L2 Triage, Triage: DevTeam (Triage owner is on the dev team)

Comments


andriytk commented May 30, 2022

motr[00001]:  12a0  ERROR  [io_req.c:1551:device_check]  <! rc=-5 [0x558882659000] too many failures: nodes=1 + svcs=1 + devs=0, allowed: nodes=1 or svcs=1 or devs=2
motr[00001]:  14a0  ERROR  [io_req.c:549:ioreq_iosm_handle_executed]  iro_dgmode_write() failed, rc=-5
<7600000000000001:b8>: nr_failures:5 max_failures:2 event_index:10 event_state:3
motr[00001]:  3da0  ERROR  [pool/pool_machine.c:783:m0_poolmach_state_transit]  <7600000000000001:f5>: nr_failures:3 max_failures:2 event_index:5 event_state:3
motr[00001]:  3da0  ERROR  [pool/pool_machine.c:783:m0_poolmach_state_transit]  <7680000000000002:6>: nr_failures:5 max_failures:2 event_index:8 event_state:3
motr[00001]:  3da0  ERROR  [pool/pool_machine.c:783:m0_poolmach_state_transit]  <7680000000000002:11>: nr_failures:5 max_failures:2 event_index:8 event_state:3
motr[00001]:  3da0  ERROR  [pool/pool_machine.c:783:m0_poolmach_state_transit]  <7680000000000002:a>: nr_failures:3 max_failures:2 event_index:8 event_state:3
motr[00001]:  3da0  ERROR  [pool/pool_machine.c:783:m0_poolmach_state_transit]  <7680000000000002:3>: nr_failures:5 max_failures:2 event_index:8 event_state:3
motr[00001]:  3da0  ERROR  [pool/pool_machine.c:783:m0_poolmach_state_transit]  <7600000000000001:b8>: nr_failures:6 max_failures:2 event_index:11 event_state:3
motr[00001]:  3da0  ERROR  [pool/pool_machine.c:783:m0_poolmach_state_transit]  <7680000000000002:6>: nr_failures:6 max_failures:2 event_index:9 event_state:3
motr[00001]:  3da0  ERROR  [pool/pool_machine.c:783:m0_poolmach_state_transit]  <7680000000000002:11>: nr_failures:6 max_failures:2 event_index:9 event_state:3
motr[00001]:  3da0  ERROR  [pool/pool_machine.c:783:m0_poolmach_state_transit]  <7680000000000002:a>: nr_failures:4 max_failures:2 event_index:9 event_state:3
motr[00001]:  3da0  ERROR  [pool/pool_machine.c:783:m0_poolmach_state_transit]  <7680000000000002:3>: nr_failures:6 max_failures:2 event_index:9 event_state:3
motr[00001]:  ada0  ERROR  [pool/pool_machine.c:783:m0_poolmach_state_transit]  <7600000000000001:b8>: nr_failures:5 max_failures:2 event_index:4 event_state:1
motr[00001]:  ada0  ERROR  [pool/pool_machine.c:783:m0_poolmach_state_transit]  <7680000000000002:11>: nr_failures:5 max_failures:2 event_index:4 event_state:1
motr[00001]:  ada0  ERROR  [pool/pool_machine.c:783:m0_poolmach_state_transit]  <7680000000000002:14>: nr_failures:3 max_failures:2 event_index:4 event_state:1
motr[00001]:  ada0  ERROR  [pool/pool_machine.c:783:m0_poolmach_state_transit]  <7680000000000002:3>: nr_failures:5 max_failures:2 event_index:2 event_state:1
motr[00001]:  ada0  ERROR  [pool/pool_machine.c:783:m0_poolmach_state_transit]  <7680000000000002:6>: nr_failures:5 max_failures:2 event_index:2 event_state:1
motr[00001]:  ada0  ERROR  [pool/pool_machine.c:783:m0_poolmach_state_transit]  <7600000000000001:b8>: nr_failures:4 max_failures:2 event_index:5 event_state:1
motr[00001]:  ada0  ERROR  [pool/pool_machine.c:783:m0_poolmach_state_transit]  <7680000000000002:11>: nr_failures:4 max_failures:2 event_index:5 event_state:1
motr[00001]:  ada0  ERROR  [pool/pool_machine.c:783:m0_poolmach_state_transit]  <7680000000000002:3>: nr_failures:4 max_failures:2 event_index:3 event_state:1
motr[00001]:  ada0  ERROR  [pool/pool_machine.c:783:m0_poolmach_state_transit]  <7680000000000002:6>: nr_failures:4 max_failures:2 event_index:3 event_state:1
motr[00001]:  3da0  ERROR  [pool/pool_machine.c:783:m0_poolmach_state_transit]  <7600000000000001:b8>: nr_failures:5 max_failures:2 event_index:4 event_state:1
motr[00001]:  3da0  ERROR  [pool/pool_machine.c:783:m0_poolmach_state_transit]  <7680000000000002:6>: nr_failures:5 max_failures:2 event_index:2 event_state:1
motr[00001]:  3da0  ERROR  [pool/pool_machine.c:783:m0_poolmach_state_transit]  <7680000000000002:11>: nr_failures:5 max_failures:2 event_index:4 event_state:1
motr[00001]:  3da0  ERROR  [pool/pool_machine.c:783:m0_poolmach_state_transit]  <7680000000000002:14>: nr_failures:3 max_failures:2 event_index:4 event_state:1
motr[00001]:  3da0  ERROR  [pool/pool_machine.c:783:m0_poolmach_state_transit]  <7680000000000002:3>: nr_failures:5 max_failures:2 event_index:2 event_state:1
motr[00001]:  3da0  ERROR  [pool/pool_machine.c:783:m0_poolmach_state_transit]  <7600000000000001:b8>: nr_failures:4 max_failures:2 event_index:5 event_state:1
motr[00001]:  3da0  ERROR  [pool/pool_machine.c:783:m0_poolmach_state_transit]  <7680000000000002:6>: nr_failures:4 max_failures:2 event_index:3 event_state:1
motr[00001]:  3da0  ERROR  [pool/pool_machine.c:783:m0_poolmach_state_transit]  <7680000000000002:11>: nr_failures:4 max_failures:2 event_index:5 event_state:1
motr[00001]:  3da0  ERROR  [pool/pool_machine.c:783:m0_poolmach_state_transit]  <7680000000000002:3>: nr_failures:4 max_failures:2 event_index:3 event_state:1
motr[00001]:  dda0  ERROR  [pool/pool_machine.c:783:m0_poolmach_state_transit]  <7600000000000001:b8>: nr_failures:5 max_failures:2 event_index:4 event_state:1
motr[00001]:  dda0  ERROR  [pool/pool_machine.c:783:m0_poolmach_state_transit]  <7680000000000002:11>: nr_failures:5 max_failures:2 event_index:4 event_state:1
motr[00001]:  dda0  ERROR  [pool/pool_machine.c:783:m0_poolmach_state_transit]  <7680000000000002:14>: nr_failures:3 max_failures:2 event_index:4 event_state:1
motr[00001]:  dda0  ERROR  [pool/pool_machine.c:783:m0_poolmach_state_transit]  <7680000000000002:3>: nr_failures:5 max_failures:2 event_index:2 event_state:1
motr[00001]:  dda0  ERROR  [pool/pool_machine.c:783:m0_poolmach_state_transit]  <7680000000000002:6>: nr_failures:5 max_failures:2 event_index:2 event_state:1
motr[00001]:  dda0  ERROR  [pool/pool_machine.c:783:m0_poolmach_state_transit]  <7600000000000001:b8>: nr_failures:4 max_failures:2 event_index:5 event_state:1
motr[00001]:  dda0  ERROR  [pool/pool_machine.c:783:m0_poolmach_state_transit]  <7680000000000002:11>: nr_failures:4 max_failures:2 event_index:5 event_state:1
motr[00001]:  dda0  ERROR  [pool/pool_machine.c:783:m0_poolmach_state_transit]  <7680000000000002:3>: nr_failures:4 max_failures:2 event_index:3 event_state:1
motr[00001]:  dda0  ERROR  [pool/pool_machine.c:783:m0_poolmach_state_transit]  <7680000000000002:6>: nr_failures:4 max_failures:2 event_index:3 event_state:1
motr[00001]:  6010  ERROR  [rpc/rpc.c:119:m0_rpc__post_locked]  <! rc=-107
motr[00001]:  6540  ERROR  [cas/client.c:556:cas_req_failure_ast]  <! rc=-107
motr[00001]:  9010  ERROR  [rpc/rpc.c:119:m0_rpc__post_locked]  <! rc=-107
motr[00001]:  9540  ERROR  [cas/client.c:556:cas_req_failure_ast]  <! rc=-107
motr[00001]:  f010  ERROR  [rpc/rpc.c:119:m0_rpc__post_locked]  <! rc=-107
motr[00001]:  f540  ERROR  [cas/client.c:556:cas_req_failure_ast]  <! rc=-107
motr[00001]:  9010  ERROR  [rpc/rpc.c:119:m0_rpc__post_locked]  <! rc=-107
motr[00001]:  9540  ERROR  [cas/client.c:556:cas_req_failure_ast]  <! rc=-107
motr[00001]:  5610  ERROR  [fd/fd.c:425:tolerance_check]  <! rc=-22
motr[00001]:  5740  FATAL  [lib/assert.c:50:m0_panic]  panic: (({ unsigned __nr = (depth); unsigned i; for (i = 0; i < __nr && ({ children_nr[i] != 0 ; }); ++i) ; i == __nr; })) at pool_width_calc() (fd/fd.c:482)  [git: 2.0.0-790-9-g662e7a18] /etc/cortx/log/rgw/dbcf46ecb8524a26b17c207373397162/motr_trace_files/m0trace.1.2022-05-30-11:05:05
Motr panic: (({ unsigned __nr = (depth); unsigned i; for (i = 0; i < __nr && ({ children_nr[i] != 0 ; }); ++i) ; i == __nr; })) at pool_width_calc() fd/fd.c:482 (errno: 11) (last failed: none) [git: 2.0.0-790-9-g662e7a18] pid: 1  /etc/cortx/log/rgw/dbcf46ecb8524a26b17c207373397162/motr_trace_files/m0trace.1.2022-05-30-11:05:05
/lib64/libmotr.so.2(m0_arch_backtrace+0x33)[0x7f77f382f1a3]
/lib64/libmotr.so.2(m0_arch_panic+0xe9)[0x7f77f382f379]
/lib64/libmotr.so.2(m0_panic+0x13d)[0x7f77f381de2d]
/lib64/libmotr.so.2(m0_fd__tile_init+0x1ae)[0x7f77f37dc24e]
/lib64/libmotr.so.2(m0_fd_tile_build+0xad)[0x7f77f37dc5cd]
/lib64/libmotr.so.2(m0_pool_version_init_by_conf+0x1a7)[0x7f77f3890c07]
/lib64/libmotr.so.2(m0_pool_version_append+0xf8)[0x7f77f3892118]
/lib64/libmotr.so.2(+0x4010ff)[0x7f77f38980ff]
/lib64/libmotr.so.2(m0_pool_version_get+0x18c)[0x7f77f3890a0c]
/lib64/libmotr.so.2(m0_layout_find_by_objsz+0x34)[0x7f77f38187e4]
/lib64/libradosgw.so.2(_ZN3rgw3sal10MotrObject11create_mobjEPK18DoutPrefixProviderm+0x42e)[0x7f77f993627e]
/lib64/libradosgw.so.2(_ZN3rgw3sal16MotrAtomicWriter5writeEv+0x980)[0x7f77f9948890]
/lib64/libradosgw.so.2(_ZN9RGWPutObj7executeE14optional_yield+0xd5d)[0x7f77f96a6f5d]
/lib64/libradosgw.so.2(_Z25rgw_process_authenticatedP15RGWHandler_RESTRP5RGWOpP10RGWRequestP9req_state14optional_yieldPN3rgw3sal5StoreEb+0xb3f)[0x7f77f92fb7df]
/lib64/libradosgw.so.2(_Z15process_requestPN3rgw3sal5StoreEP7RGWRESTP10RGWRequestRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKNS_4auth16StrategyRegistryEP12RGWRestfulIOP10OpsLogSink14optional_yieldPNS_7dmclock9SchedulerEPSC_PNSt6chrono8durationImSt5ratioILl1ELl1000000000EEEESt10shared_ptrI11RateLimiterEPi+0x25c6)[0x7f77f92fe6f6]
/lib64/libradosgw.so.2(+0x4cd0aa)[0x7f77f926b0aa]
/lib64/libradosgw.so.2(+0x4ce751)[0x7f77f926c751]
/lib64/libradosgw.so.2(+0x4ce8cc)[0x7f77f926c8cc]
/lib64/libradosgw.so.2(make_fcontext+0x2f)[0x7f77f9b8a65f]
*** Caught signal (Aborted) **
 in thread 7f77d92a4700 thread_name:radosgw
 ceph version 17.0.0-10334-gbdae4dbc0c9 (bdae4dbc0c9a5ccd3d2d3cb430f4d0085802cef4) quincy (dev)
 1: /lib64/libpthread.so.0(+0x12b30) [0x7f77f67b8b30]
 2: gsignal()
 3: abort()
 4: /lib64/libmotr.so.2(+0x398383) [0x7f77f382f383]
 5: m0_panic()
 6: m0_fd__tile_init()
 7: m0_fd_tile_build()
 8: m0_pool_version_init_by_conf()
 9: m0_pool_version_append()
 10: /lib64/libmotr.so.2(+0x4010ff) [0x7f77f38980ff]
 11: m0_pool_version_get()
 12: m0_layout_find_by_objsz()
 13: (rgw::sal::MotrObject::create_mobj(DoutPrefixProvider const*, unsigned long)+0x42e) [0x7f77f993627e]
 14: (rgw::sal::MotrAtomicWriter::write()+0x980) [0x7f77f9948890]
 15: (RGWPutObj::execute(optional_yield)+0xd5d) [0x7f77f96a6f5d]
 16: (rgw_process_authenticated(RGWHandler_REST*, RGWOp*&, RGWRequest*, req_state*, optional_yield, rgw::sal::Store*, bool)+0xb3f) [0x7f77f92fb7df]
 17: (process_request(rgw::sal::Store*, RGWREST*, RGWRequest*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rgw::auth::StrategyRegistry const&, RGWRestfulIO*, OpsLogSink*, optional_yield, rgw::dmclock::Scheduler*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*, std::shared_ptr<RateLimiter>, int*)+0x25c6) [0x7f77f92fe6f6]
 18: /lib64/libradosgw.so.2(+0x4cd0aa) [0x7f77f926b0aa]
 19: /lib64/libradosgw.so.2(+0x4ce751) [0x7f77f926c751]
 20: /lib64/libradosgw.so.2(+0x4ce8cc) [0x7f77f926c8cc]
 21: make_fcontext()
2022-05-30T16:10:39.967+0000 7f77d92a4700 -1 *** Caught signal (Aborted) **
 in thread 7f77d92a4700 thread_name:radosgw

Versions:

[root@ssc-vm-g4-rhev4-1490 ~]# kubectl exec -it cortx-server-ssc-vm-g4-rhev4-1490-59469c57cd-9xdz6 -c cortx-rgw -- rpm -qa | grep -E cortx\|radosgw
cortx-rgw-integration-2.0.0-5068_765d062.noarch
cortx-motr-2.0.0-5155_git662e7a18.el8.x86_64
cortx-provisioner-2.0.0-5038_0aa6ce08.noarch
cortx-py-utils-2.0.0-5043_a2e13c4.noarch
ceph-radosgw-17.0.0-10334.gbdae4dbc0c9.el8.x86_64
cortx-hare-2.0.0-5229_git5443f9c.el8.x86_64

Container image versions from the solution.yaml file:

    cortxcontrol: cortx-docker.colo.seagate.com/seagate/cortx-all:2.0.0-5518
    cortxdata: cortx-docker.colo.seagate.com/seagate/cortx-data:2.0.0-5518
    cortxserver: cortx-docker.colo.seagate.com/seagate/cortx-rgw:2.0.0-5518
    cortxha: cortx-docker.colo.seagate.com/seagate/cortx-all:2.0.0-5518
    cortxclient: cortx-docker.colo.seagate.com/seagate/cortx-data:2.0.0-5518

Script to reproduce the issue:

#!/bin/bash

dd if=/dev/urandom of=/tmp/200m bs=1M count=200
dd if=/tmp/200m of=/tmp/196m bs=1M count=196

pods=($(kubectl get pods | grep x-data | awk '{print $1}'))

put='aws s3api put-object --bucket test-bucket --key 200m --body /tmp/200m --endpoint-url http://192.168.60.187:30080'
get='aws s3api get-object --bucket test-bucket --key 200m --range bytes=0-205520895 /tmp/196m.check --endpoint-url http://192.168.60.187:30080'

# kills random m0d-ios
kill()
{
  n=${#pods[@]}
  i=$(($RANDOM % $n))
  c=$(($RANDOM % 2 + 1))
  kubectl exec -it ${pods[$i]} -c cortx-motr-io-00$c -- /bin/pkill -9 m0d
}

$put || exit 1
rc=$?

while [[ $rc -eq 0 ]] && $get && cmp /tmp/196m{,.check}; do
  { $put; rc=$?; } &
  sleep 3
  $(kill)
  wait
  kubectl get pods
done

To see the motr errors from rgw, run this command:

kubectl get pods | grep x-server | awk '{print $1}' | while read p; do kubectl logs $p -c cortx-rgw -f & done

For the convenience of the Seagate development team, this issue has been mirrored in a private Seagate Jira Server: https://jts.seagate.com/browse/CORTX-31844. Note that community members will not be able to access that Jira server but that is not a problem since all activity in that Jira mirror will be copied into this GitHub issue.


siningwuseagate commented Jun 6, 2022

After analysing the m0trace files collected from the crash, the code path leading to the crash can be explained in more detail as follows:

  1. m0_layout_find_by_objsz() --> m0_pool_version_get() --> pool->po_pver_policy->pp_ops->ppo_get()
    ppo_get is the function pointer to pver_first_available_get(), which does the following work:
    it tries to get a clean pool version from the cache by calling m0_pool_clean_pver_find(). In our case, as we observed,
    the cached pool version is dirty (we may need to dig further to understand how this happens), so a new pool version
    is created by calling m0_conf_pver_get() and appended to the pool using m0_pool_version_append().

  2. m0_pool_version_append() --> m0_pool_version_init_by_conf() --> m0_fd_tile_build()
    m0_fd_tile_build() calls symm_tree_attr_get() to get the failure domain tree's attributes, one of which is the
    minimum number of children at each level.

    In our case, the tree for the pool version has 4 levels
    (root, M0_CONF_PVER_LVL_ENCLS, M0_CONF_PVER_LVL_CTRLS, M0_CONF_PVER_LVL_DRIVES), and the minimum
    number of children at the top 3 levels is 3, 1, 0. Although symm_tree_attr_get() calls tolerance_check()
    at the end to check the failure settings, tolerance_check() only checks the top 2 levels and ignores the 3rd level
    (M0_CONF_PVER_LVL_CTRLS).

  3. After the above checks, m0_fd__tile_init() is called and it calls pool_width_calc(), which asserts that the minimum
    number of children at each of the top 3 levels (root, M0_CONF_PVER_LVL_ENCLS, M0_CONF_PVER_LVL_CTRLS) is not 0.
    Since the number at M0_CONF_PVER_LVL_CTRLS is 0, that leads to the panic (see the sketch below).
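
The assertion reported in the panic message reduces to the check sketched below; this is a paraphrase of the macro visible in the trace, with illustrative names rather than the actual motr source:

#include <stdbool.h>
#include <stdint.h>

/*
 * Paraphrase of the condition asserted in pool_width_calc() as shown in the
 * panic message: walk the per-level children counts and require every level
 * up to `depth` to be non-zero.  A 0 at any level makes the check fail and
 * triggers m0_panic().
 */
static bool children_nr_all_nonzero(const uint64_t *children_nr, unsigned depth)
{
	unsigned i;

	for (i = 0; i < depth && children_nr[i] != 0; ++i)
		;
	return i == depth; /* false as soon as some level reports 0 children */
}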

To avoid the panic, the fix adds a check in symm_tree_attr_get() to ensure that the minimum number of children at each level is greater than 0; otherwise -EINVAL is returned.
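
For illustration only, here is a minimal sketch of the kind of validation the fix adds in symm_tree_attr_get(); the function, parameter, and array names below are assumptions, not the actual motr identifiers (see PR #1856 for the real change):

#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical sketch of the extra validation: reject the tree attributes if
 * any level of the failure-domain tree ends up with a minimum of 0 children,
 * so the caller gets -EINVAL instead of asserting later in pool_width_calc().
 */
static int children_min_nr_check(const uint64_t *children_min_nr, unsigned depth)
{
	unsigned i;

	for (i = 0; i < depth; ++i) {
		if (children_min_nr[i] == 0)
			return -EINVAL; /* fail gracefully instead of panicking */
	}
	return 0;
}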

PR for the fix: #1856

siningwuseagate added a commit to siningwuseagate/cortx-motr that referenced this issue Jun 6, 2022
Problem: as described in issue Seagate#1838, there exists a case in
which the failure domain tree built for a pool version has 4 levels
(root, M0_CONF_PVER_LVL_ENCLS, M0_CONF_PVER_LVL_CTRLS,
M0_CONF_PVER_LVL_DRIVES), and the minimum number of children at the
top 3 levels is 3, 1, 0.

Although symm_tree_attr_get() calls tolerance_check() at the end to
check the failure settings, tolerance_check() only checks the top 2
levels and ignores the 3rd level (M0_CONF_PVER_LVL_CTRLS).

After the above checks, m0_fd__tile_init() is called and it calls
pool_width_calc(), which asserts that the minimum number of children
at each of the top 3 levels (root, M0_CONF_PVER_LVL_ENCLS,
M0_CONF_PVER_LVL_CTRLS) is not 0. Since the number at
M0_CONF_PVER_LVL_CTRLS is 0, that leads to the panic.

Solution: to avoid the panic, add a check in symm_tree_attr_get() to
ensure the minimum number of children at each level is greater than 0;
otherwise -EINVAL is returned.

Signed-off-by: Sining Wu <sining.wu@seagate.com>
@siningwuseagate

RGW fix to avoid the panic by Andriy: Seagate/cortx-rgw@12d90d3

@r-wambui added the Triage: DevTeam and Status: L2 Triage labels Jun 8, 2022

stale bot commented Jun 13, 2022

This issue/pull request has been marked as needs attention as it has been left pending without new activity for 4 days. Tagging @nkommuri @mehjoshi @huanghua78 for appropriate assignment. Sorry for the delay & Thank you for contributing to CORTX. We will get back to you as soon as possible.

@chandradharraval

Hi @andriytk,
I see the RGW fix is integrated, based on the above comments from @siningwuseagate. Is any further work pending for this issue?

stale bot removed the needs-attention label Jun 13, 2022
@chandradharraval

Hi @andriytk,
Are we good to close this based on the above comment?

@andriytk
Contributor Author

No, the fix has not landed yet - #1856.

mehjoshi pushed a commit that referenced this issue Jun 17, 2022
Problem: as described in issue #1838, there exists a case in which
the failure domain tree built for a pool version has 4 levels
(root, M0_CONF_PVER_LVL_ENCLS, M0_CONF_PVER_LVL_CTRLS,
M0_CONF_PVER_LVL_DRIVES), and the minimum number of children at the
top 3 levels is 3, 1, 0.

Although symm_tree_attr_get() calls tolerance_check() at the end to
check the failure settings, tolerance_check() only checks the top 2
levels and ignores the 3rd level (M0_CONF_PVER_LVL_CTRLS).

After the above checks, m0_fd__tile_init() is called and it calls
pool_width_calc(), which asserts that the minimum number of children
at each of the top 3 levels (root, M0_CONF_PVER_LVL_ENCLS,
M0_CONF_PVER_LVL_CTRLS) is not 0. Since the number at
M0_CONF_PVER_LVL_CTRLS is 0, that leads to the panic.

Solution: to avoid the panic, add a check in symm_tree_attr_get() to
ensure the minimum number of children at each level is greater than 0;
otherwise -EINVAL is returned.

* conf: check pvs_tolerance is greater than 0 before decreasing it

Signed-off-by: Sining Wu <sining.wu@seagate.com>

Gaurav Chaudhari commented in Jira Server:

motr - main branch build pipeline SUCCESS

Build Info:

Image Location:

  • cortx-docker.colo.seagate.com/seagate/cortx-all:2.0.0-5638
  • cortx-docker.colo.seagate.com/seagate/cortx-rgw:2.0.0-5638
  • cortx-docker.colo.seagate.com/seagate/cortx-data:2.0.0-5638
  • cortx-docker.colo.seagate.com/seagate/cortx-control:2.0.0-5638


Chandradhar Raval commented in Jira Server:

Marking this issue Closed; the corresponding PR #1856 is merged.

mehjoshi pushed a commit to mehjoshi/cortx-motr that referenced this issue Jul 18, 2022
…ate#1856)

