This repository has been archived by the owner on May 3, 2024. It is now read-only.

reqh: md device service type is not M0_CST_IOS #1388

Merged
merged 1 commit into Seagate:main on Feb 1, 2022

Conversation


@mssawant mssawant commented Jan 14, 2022

While generating configuration, Hare assigns the metadata device to the CAS
service type, M0_CST_CAS. But m0_reqh_mdpool_service_index_to_session()
expects it to be M0_CST_IOS and asserts the same.

Solution:
Expect metadata device service type to be M0_CST_CAS instead
of M0_CST_IOS.

Signed-off-by: Mandar Sawant mandar.sawant@seagate.com


andriytk commented Jan 14, 2022

According to @mssawant this patch should fix the issue I'm seeing now with the latest hare+motr:

12:46 ant@centos8:mcp$ hctl status
Data pool:
    # fid name
    0x6f00000000000001:0x2f 'the pool'
Profile:
    # fid name: pool(s)
    0x7000000000000001:0x4d 'default': 'the pool' None None
Services:
    centos8  (RC)
    [started]  hax        0x7200000000000001:0x6   inet:tcp:192.168.180.182@2001
    [started]  confd      0x7200000000000001:0x9   inet:tcp:192.168.180.182@3001
    [started]  ioservice  0x7200000000000001:0xc   inet:tcp:192.168.180.182@3002
    [offline]  m0_client  0x7200000000000001:0x29  inet:tcp:192.168.180.182@5001
    [unknown]  m0_client  0x7200000000000001:0x2c  inet:tcp:192.168.180.182@5002
12:47 ant@centos8:mcp$ ./mcp -ep inet:tcp:192.168.180.182@5001 -proc 0x7200000000000001:0x29 -prof 0x7000000000000001:0x4d -hax inet:tcp:192.168.180.182@2001 -v -osz $((10*1024)) /dev/zero 12345:6789001
motr[175700]:  4d10  FATAL  [lib/assert.c:50:m0_panic]  panic: (cas_svc->sc_type == M0_CST_CAS) at dix_cas_rops_send() (dix/req.c:1762)  [git: 2.0.0-585-8-g4de43b22] 
Motr panic: (cas_svc->sc_type == M0_CST_CAS) at dix_cas_rops_send() dix/req.c:1762 (errno: 0) (last failed: none) [git: 2.0.0-585-8-g4de43b22] pid: 175700  
/lib64/libmotr.so.2(m0_arch_backtrace+0x33)[0x7f5a706ba523]
/lib64/libmotr.so.2(m0_arch_panic+0xe9)[0x7f5a706ba6f9]
/lib64/libmotr.so.2(m0_panic+0x13d)[0x7f5a706a91ad]
/lib64/libmotr.so.2(+0x32130e)[0x7f5a7064d30e]
/lib64/libmotr.so.2(+0x321f44)[0x7f5a7064df44]
/lib64/libmotr.so.2(+0x32259f)[0x7f5a7064e59f]
/lib64/libmotr.so.2(+0x322cf8)[0x7f5a7064ecf8]
/lib64/libmotr.so.2(m0_sm_asts_run+0x131)[0x7f5a70751581]
/lib64/libmotr.so.2(m0_dix_cli_lock+0x38)[0x7f5a7064a7a8]
/lib64/libmotr.so.2(m0_dix_cli_start_sync+0x41)[0x7f5a7064aa31]
/lib64/libmotr.so.2(+0x3c7bee)[0x7f5a706f3bee]
/lib64/libmotr.so.2(+0x3a7aa1)[0x7f5a706d3aa1]
/lib64/libmotr.so.2(+0x426ba0)[0x7f5a70752ba0]
/lib64/libmotr.so.2(m0_client_init+0x270)[0x7f5a706d5dc0]
...
12:50 ant@centos8:mcp$ rpm -qa | grep cortx
cortx-motr-2.0.0-1_git4de43b22_any.el8.x86_64
cortx-py-utils-2.0.0-3_61edb2c.noarch
cortx-motr-devel-2.0.0-1_git4de43b22_any.el8.x86_64
cortx-hare-2.0.0-1_git3cf6efa.el8.x86_64

Meanwhile, it works fine on one of the previous hare commits suggested by @mssawant - 83dcfe2.


andriytk commented Jan 14, 2022

Hare assigns metadata device to CAS
service type, M0_CST_CAS. But m0_reqh_mdpool_service_index_to_session()
expects it to be M0_CST_IOS and asserts the same.

@mssawant, can you elaborate a bit: why does m0_reqh_mdpool_service_index_to_session()
expect it to be M0_CST_IOS? How did it work before, what has changed, and why does it not work now?

@mssawant (Author)

@andriytk, earlier, even though the Hare CDF had a provision to specify a metadata device, that device was not part of the pool devices or the configuration; the process's metadata device was simply set to the given device.
OLD cfgen code

        m0conf[proc_id] = cls(nr_cpu=facts['processorcount'],
                              memsize_MB=facts['_memsize_MB'],
                              endpoint=ep,
                              meta_data=meta_data, <<<<
                              services=[])
        m0conf[parent].processes.append(proc_id)

        for stype in service_types(proc_t, proc_desc):
            svc_id = ConfService.build(m0conf, proc_id, stype, ep, ctrl)
            if stype is SvcT.M0_CST_IOS:  # XXX What about M0_CST_CAS? <<<<<<
                assert proc_desc is not None
                for disk in proc_desc['_io_disks']:
                    assert ctrl is not None
                    ConfDrive.build(m0conf, ctrl,
                                    ConfSdev.build(m0conf, svc_id, disk))
        return proc_id

Before Seagate/cortx-hare#1888, a dummy device (/dev/null) was used for the CAS service; now we create a separate ConfDrive object for the metadata device mentioned in the CDF, associated with the CAS service.
OLD code

    if ptype is PoolT.dix:
        cas = []
        for item in m0conf:
            if item.type is ObjT.enclosure:
                encl_id = item
                for proc_id in m0conf[m0conf[encl_id].node].processes:
                    for svc_id in m0conf[proc_id].services:
                        if m0conf[svc_id].type is SvcT.M0_CST_CAS:
                            cas.append((svc_id, m0conf[svc_id].ctrl_id))

        return [ConfDrive.build(m0conf, ctrl_id,
                                ConfSdev.build(m0conf, svc_id,
                                               Disk(path='/dev/null',
                                                    size=1024,
                                                    blksize=1)))
                for (svc_id, ctrl_id) in cas]

Now we explicitly create a separate ConfDrive object for the metadata device given in the CDF, associated with the CAS service type (as well as with the process), and it is also part of the configuration.

        if proc_desc is not None and proc_t is ProcT.m0_server:
            meta_data = proc_desc['io_disks'].get('meta_data')

        m0conf[proc_id] = cls(nr_cpu=facts['processorcount'],
                              memsize_MB=facts['_memsize_MB'],
                              endpoint=ep,
                              meta_data=meta_data,
                              services=[])
        m0conf[parent].processes.append(proc_id)

So, as you can see, earlier a ConfDrive object was not created for the metadata device specified in the CDF, and thus it was not part of the configuration; the CAS device was set to /dev/null. Frankly, I am not sure how things worked; it was using the first data device attached to the ioservice.
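The mismatch under discussion can be sketched as a toy Python model (the classes and variable names here are hypothetical; only the service-type constants and the motr function name come from this thread): the new cfgen attaches the CDF metadata device to a CAS service, while the motr-side lookup still asserts that the owning service is an IOS.

```python
from dataclasses import dataclass
from enum import Enum, auto

class SvcT(Enum):                 # service types named in this thread
    M0_CST_IOS = auto()
    M0_CST_CAS = auto()

@dataclass
class ServiceCtx:                 # toy stand-in for motr's service context
    sc_type: SvcT
    device: str

def mdpool_service_ctx(dev2svc, idx):
    """Toy analogue of m0_reqh_mdpool_service_index_to_session():
    look up the service owning the idx-th mdpool device and assert
    that it is an IOS, as the pre-patch motr code does."""
    ctx = dev2svc[idx]
    assert ctx.sc_type == SvcT.M0_CST_IOS, \
        "panic: (ctx->sc_type == M0_CST_IOS)"
    return ctx

# Old cfgen: the mdpool device belonged to the ioservice -- lookup succeeds.
mdpool_service_ctx([ServiceCtx(SvcT.M0_CST_IOS, "/dev/loop0")], 0)

# New cfgen: the CDF meta_data device is attached to CAS -- the assert fires.
try:
    mdpool_service_ctx([ServiceCtx(SvcT.M0_CST_CAS, "/dev/sde")], 0)
except AssertionError as exc:
    print("assert fired:", exc)
```

The patch in this PR corresponds to flipping the expected type in that assert to M0_CST_CAS; the comments below discuss why that alone is not sufficient.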

@mssawant
Copy link
Author

CAS device mapping from old hare cfgen code in confd.xc,

 {0x73| ((^s|1:25), @M0_CST_CAS, [1: "inet:tcp:10.230.250.51@3002"], [0], [1: ^d|1:53])},
 {0x64| ((^d|1:53), 6, 2, 1, 1, 0x400, 0, 0, "/dev/null")},

CAS device mapping in latest Hare main by cfgen in confd.xc

 {0x73| ((^s|1:15), @M0_CST_CAS, [1: "inet:tcp:10.230.250.51@3002"], [0], [1: ^d|1:16])},
 {0x64| ((^d|1:16), 0, 2, 1, 1, 0x400, 0, 0, "/dev/sde")},

for the same CDF,

# Cluster Description File (CDF).
# See `cfgen --help-schema` for the format description.

create_aux: false # optional; supported values: "false" (default), "true"
nodes:
  - hostname: ssc-vm-1623.colo.seagate.com     # [user@]hostname
    data_iface: eth0        # name of data network interface
    #data_iface_type: o2ib  # type of network interface (optional);
                            # supported values: "tcp" (default), "o2ib"
    transport_type: libfab
    m0_servers:
      - runs_confd: true
        io_disks:
          data: []
      - io_disks:
          meta_data: /dev/sde
          data:
            - path: /dev/sdb
            - path: /dev/sdc
            - path: /dev/sdd
            # - path: /dev/sde
      - io_disks:
          meta_data: /dev/sdi
          data:
            - path: /dev/sdf
            - path: /dev/sdg
            - path: /dev/sdh
            # - path: /dev/sdi
    m0_clients:
      s3: 1         # number of S3 servers to start
      other: 2      # max quantity of other Motr clients this host may have
pools:
  - name: dix-pool
    type: dix  # optional; supported values: "sns" (default), "dix", "md"
    disk_refs:
      - { path: /dev/sde, node: ssc-vm-1623.colo.seagate.com }
      - { path: /dev/sdi, node: ssc-vm-1623.colo.seagate.com }
    data_units: 1
    parity_units: 0
    spare_units: 0
  - name: sns-pool
    type: sns  # optional; supported values: "sns" (default), "dix", "md"
    disk_refs:
      - { path: /dev/sdb, node: ssc-vm-1623.colo.seagate.com }
      - { path: /dev/sdc, node: ssc-vm-1623.colo.seagate.com }
      - { path: /dev/sdd, node: ssc-vm-1623.colo.seagate.com }
      # - { path: /dev/sde, node: ssc-vm-1623.colo.seagate.com }
      - { path: /dev/sdf, node: ssc-vm-1623.colo.seagate.com }
      - { path: /dev/sdg, node: ssc-vm-1623.colo.seagate.com }
      - { path: /dev/sdh, node: ssc-vm-1623.colo.seagate.com }
      # - { path: /dev/sdi, node: ssc-vm-1623.colo.seagate.com }
    data_units: 1
    parity_units: 0
    spare_units: 0
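As an aside, the consistency rules implied by this CDF (every pool disk_ref must name a device declared on its node, and a meta_data device must not also appear in a data list) can be expressed as a small validation sketch. This helper is purely illustrative and not part of cfgen; it operates on the parsed-YAML form of a CDF, shown here as a plain dict to stay self-contained.

```python
# Hypothetical validator (not part of cfgen): check that every pool
# disk_ref names a device declared on its node, and that a meta_data
# device is not also listed as a data device.

cdf = {
    'nodes': [{
        'hostname': 'ssc-vm-1623.colo.seagate.com',
        'm0_servers': [
            {'runs_confd': True, 'io_disks': {'data': []}},
            {'io_disks': {'meta_data': '/dev/sde',
                          'data': [{'path': '/dev/sdb'},
                                   {'path': '/dev/sdc'}]}},
        ],
    }],
    'pools': [{
        'name': 'dix-pool', 'type': 'dix',
        'disk_refs': [{'path': '/dev/sde',
                       'node': 'ssc-vm-1623.colo.seagate.com'}],
    }],
}

def check_cdf(cdf):
    errors = []
    declared = {}                      # hostname -> declared device paths
    for node in cdf['nodes']:
        paths = set()
        for srv in node.get('m0_servers', []):
            disks = srv.get('io_disks', {})
            md = disks.get('meta_data')
            data = {d['path'] for d in disks.get('data', [])}
            if md:
                if md in data:
                    errors.append(f'{md} is both meta_data and data')
                paths.add(md)
            paths |= data
        declared[node['hostname']] = paths
    for pool in cdf.get('pools', []):
        for ref in pool.get('disk_refs', []):
            if ref['path'] not in declared.get(ref['node'], set()):
                errors.append(f"{pool['name']}: {ref['path']} not declared "
                              f"on {ref['node']}")
    return errors

print(check_cdf(cdf))   # no violations in this sample CDF
```

For the CDF above, check_cdf returns an empty list: /dev/sde is declared as a meta_data device and is referenced by dix-pool but not by any data list.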


madhavemuri commented Jan 17, 2022

@mssawant : There are two types of metadata. One is Motr-internal metadata, stored in cobs (mdcob) by the ioservice, which lives in the BE seg1 associated with that ioservice.
The other is external metadata stored via the DIX API, which lives in BE seg1 under the CAS service.

When the ioservice and the CAS service are part of the same m0d, they share BE seg1.

For motr clients other than S3, we need both of them.

@@ -213,7 +213,7 @@ m0_reqh_mdpool_service_index_to_session(const struct m0_reqh *reqh,
 		pd_sdev_idx;
 	ctx = md_pv->pv_pc->pc_dev2svc[idx].pds_ctx;
 	M0_ASSERT(ctx != NULL);
-	M0_ASSERT(ctx->sc_type == M0_CST_IOS);
+	M0_ASSERT(ctx->sc_type == M0_CST_CAS);

This should be IOS only. The MDPOOL associated with the IOS needs to have dummy devices, or the first device in each m0d, in order to use the mdcobs associated with the m0d's.

They are not related to CAS service.


Suppose the CAS service is part of a separate m0d where the ioservice is not present; this won't work.

@madhavemuri madhavemuri Jan 17, 2022

Also, even if this change is there, the issue still comes up:

 # ./build-deploy/cortx-motr/utils/m0cp -l inet:tcp:10.230.250.75@5001 -H inet:tcp:10.230.250.75@2001 -p 0x7000000000000001:0x4f -P 0x7200000000000001:0x29 -o 12:39 -s 1m -c 1 -L 9 /tmp/128M
motr[126663]:  4cf0 ALWAYS  [client_init.c:468:client_net_init]  trasnport ep:inet:tcp:10.230.250.75@5001
motr[126663]:  4650  FATAL  [lib/assert.c:50:m0_panic]  panic: (cas_svc->sc_type == M0_CST_CAS) at dix_cas_rops_send() (dix/req.c:1749)  [git: 2.0.0-527-112-g409f711-dirty] /root/m0trace.126663
Motr panic

@mssawant (Author)

@madhavemuri, so we have 2 devices then:

  1. CAS device: is that the meta_data device mentioned in the CDF?
  2. IOS md device: there is no other mechanism to specify the IOS metadata device, and one device cannot be associated with multiple service types.

MDPOOL associated with IOS needs to have dummy devices or first device in each m0d to use mdcobs associated with the m0d's.

  • dummy device: how will it be used?
  • first device of the ioservice: that means it is the same as the data device and will share space with data. Is that what we want? Will the first-device assumption always hold true? Or do we want an explicit mechanism to specify the CAS device and the IOS md device separately?

Also even if this change is there, issue is still coming up,

@madhavemuri, this means a wrong CAS device is being accessed. The problem seems to be the lack of a clear differentiation and assignment of metadata devices: which device must be used for CAS and which for IOS metadata.


madhavemuri commented Jan 17, 2022

diff --git a/cfgen/cfgen b/cfgen/cfgen
index 647c53a..e2907f1 100755
--- a/cfgen/cfgen
+++ b/cfgen/cfgen
@@ -1132,8 +1132,9 @@ class ConfProcess(ToDhall):
             disk = '/dev/null'
             if meta_data:
                 disk = proc_desc['io_disks'].get('meta_data')
-                assert ctrl is not None
-                ConfDrive.build(m0conf, ctrl,
+
+            assert ctrl is not None
+            ConfDrive.build(m0conf, ctrl,
                             ConfSdev.build(
                                 m0conf, svc_id,
                                 Disk(

@mssawant @supriyachavan4398 : After the above changes in cfgen, m0cp is working fine.

cc: @andriytk @yatin-mahajan

@mssawant (Author)

@madhavemuri, the patch you mentioned is good: if a metadata device is not specified in the CDF, cfgen presently does not create a default ConfDrive object for CAS (i.e. /dev/null). Created Seagate/cortx-hare#1952; good catch. But even with the cfgen patch we hit:

[root@ssc-vm-1623:root] m0cp -l 'inet:tcp:10.230.250.51@5001' -H 'inet:tcp:10.230.250.51@2001' -p '<0x7000000000000001:0x57>' -P '0x7200000000000001:0x32' -o 12:39 -s 1m -c 1 -L 9 /tmp/128M
motr[121777]:  a660  FATAL  [lib/assert.c:50:m0_panic]  panic: (ctx->sc_type == M0_CST_IOS) at m0_reqh_mdpool_service_index_to_session() (reqh/reqh.c:216)  [git: 2.0.0-585-11-g49443624] /root/m0trace.121777
Motr panic: (ctx->sc_type == M0_CST_IOS) at m0_reqh_mdpool_service_index_to_session() reqh/reqh.c:216 (errno: 95) (last failed: none) [git: 2.0.0-585-11-g49443624] pid: 121777  /root/m0trace.121777
/lib64/libmotr.so.2(m0_arch_backtrace+0x2f)[0x7fd9edfca16f]
/lib64/libmotr.so.2(m0_arch_panic+0xf3)[0x7fd9edfca353]
/lib64/libmotr.so.2(+0x36a6f4)[0x7fd9edfb96f4]
/lib64/libmotr.so.2(m0_reqh_mdpool_service_index_to_session+0x1d5)[0x7fd9ee0330d5]
/lib64/libmotr.so.2(+0x3ac3f1)[0x7fd9edffb3f1]
/lib64/libmotr.so.2(+0x3ac9fd)[0x7fd9edffb9fd]
/lib64/libmotr.so.2(+0x3acf55)[0x7fd9edffbf55]
/lib64/libmotr.so.2(m0__obj_namei_send+0x15d)[0x7fd9edffc4fd]
/lib64/libmotr.so.2(+0x3ae322)[0x7fd9edffd322]
/lib64/libmotr.so.2(m0_op_launch_one+0x12a)[0x7fd9edfe0ffa]
/lib64/libmotr.so.2(m0_op_launch+0x54)[0x7fd9edfe1154]
m0cp[0x401f46]
m0cp(m0_write+0x12b)[0x4026ab]
m0cp(main+0xbc)[0x401c4c]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7fd9ebfe2555]
m0cp[0x401d42]
Aborted (core dumped)

This is because the default metadata device is now created for the M0_CST_CAS service type and not for the M0_CST_IOS service type, so we need #1388 as well.
I think we still need to clarify which service the metadata device mentioned in the CDF must be associated with, and whether we need further amendments to the CDF. It would be better to understand this and make things more explicit.

@mssawant (Author)

@madhavemuri,

@mssawant : There are two types of meta-data, motr internal meta-data which is stored in cobs called mdcob in ioservice, which is stored in BE seg1 associated with the ioservice.
Another one is external or stored using DIX api, which is stored in BE seg1 using CAS service.
When ioservice and CAS service are part of same m0d, they share the BE seg1.
For motr clients other than S3, we need both of them.

I understand that there are 2 types of metadata, but there are not 2 types of metadata devices. If a CAS device is required and by default we set it to /dev/null, how does that work? I mean, if S3 uses it to create objects, how are they referred back?
And regarding sharing the same device: let's do that explicitly. For now, we can create the ConfDrive object for the metadata device specified in the CDF for M0_CST_IOS, until we separate the CAS service. But that needs to be in sync with all the Motr clients; it must not be that one type of client expects the devices to be associated with the CAS service and another with the IOS.
Or, if we need 2 different devices, then let's do that.

@madhavemuri (Contributor)

@madhavemuri,

@mssawant : There are two types of meta-data, motr internal meta-data which is stored in cobs called mdcob in ioservice, which is stored in BE seg1 associated with the ioservice.
Another one is external or stored using DIX api, which is stored in BE seg1 using CAS service.
When ioservice and CAS service are part of same m0d, they share the BE seg1.
For motr clients other than S3, we need both of them.

I understand that there are 2 types of metadata, but there are not 2 types of metadata devices. If a CAS device is required and by default we set it to /dev/null, how does that work? I mean, if S3 uses it to create objects, how are they referred back? And regarding sharing the same device: let's do that explicitly. For now, we can create the ConfDrive object for the metadata device specified in the CDF for M0_CST_IOS, until we separate the CAS service. But that needs to be in sync with all the Motr clients; it must not be that one type of client expects the devices to be associated with the CAS service and another with the IOS. Or, if we need 2 different devices, then let's do that.

Yes, we need the M0_BE_CST service, where the metadata device needs to be added along with the CAS service.

@madhavemuri (Contributor)

@mssawant : I think this PR is not needed, as hare PR 1952 has landed:
Seagate/cortx-hare#1952

Optimizations related to BE service can be taken up in a separate task.

@mssawant (Author)

@madhavemuri, as I mentioned above, even with #1952, if a metadata device is specified in the CDF then we see:

 [root@ssc-vm-1623:root] m0cp -l 'inet:tcp:10.230.250.51@5001' -H 'inet:tcp:10.230.250.51@2001' -p '<0x7000000000000001:0x57>' -P '0x7200000000000001:0x32' -o 12:39 -s 1m -c 1 -L 9 /tmp/128M
motr[121777]:  a660  FATAL  [lib/assert.c:50:m0_panic]  panic: (ctx->sc_type == M0_CST_IOS) at m0_reqh_mdpool_service_index_to_session() (reqh/reqh.c:216)  [git: 2.0.0-585-11-g49443624] /root/m0trace.121777
Motr panic: (ctx->sc_type == M0_CST_IOS) at m0_reqh_mdpool_service_index_to_session() reqh/reqh.c:216 (errno: 95) (last failed: none) [git: 2.0.0-585-11-g49443624] pid: 121777  /root/m0trace.121777
/lib64/libmotr.so.2(m0_arch_backtrace+0x2f)[0x7fd9edfca16f]


stale bot commented Jan 25, 2022

This issue/pull request has been marked as needs attention as it has been left pending without new activity for 4 days. Tagging @nkommuri @mehjoshi @huanghua78 for appropriate assignment. Sorry for the delay & Thank you for contributing to CORTX. We will get back to you as soon as possible.


madhavemuri commented Jan 31, 2022

@madhavemuri, as I mentioned above even with #1952, if a metadata device is specified in CDF then we see,

 [root@ssc-vm-1623:root] m0cp -l 'inet:tcp:10.230.250.51@5001' -H 'inet:tcp:10.230.250.51@2001' -p '<0x7000000000000001:0x57>' -P '0x7200000000000001:0x32' -o 12:39 -s 1m -c 1 -L 9 /tmp/128M
motr[121777]:  a660  FATAL  [lib/assert.c:50:m0_panic]  panic: (ctx->sc_type == M0_CST_IOS) at m0_reqh_mdpool_service_index_to_session() (reqh/reqh.c:216)  [git: 2.0.0-585-11-g49443624] /root/m0trace.121777
Motr panic: (ctx->sc_type == M0_CST_IOS) at m0_reqh_mdpool_service_index_to_session() reqh/reqh.c:216 (errno: 95) (last failed: none) [git: 2.0.0-585-11-g49443624] pid: 121777  /root/m0trace.121777
/lib64/libmotr.so.2(m0_arch_backtrace+0x2f)[0x7fd9edfca16f]

The metadata pool expects a device from each ioservice in the cluster, as it uses the cob domain and in turn needs to refer to the ioservice to create mdcobs.

Up to hare commit 8147220cdaf7f2760f7eea28d1ee28e16c949008:

{0x6f| ((^o|1:63), 0, [1: ^v|1:66])},
 {0x64| ((^d|1:64), 10, 2, 1, 1, 0x400, 0, 0, "/dev/null")},
 {0x6b| ((^k|1:65), ^d|1:64, [1: ^v|1:66])},
 {0x76| ((^v|1:66), {0| (1, 0, 0, 1, [5: 0, 0, 0, 0, 0], [1: ^j|1:71])})},
 
 
 {0x6f| ((^o|1:72), 0, [1: ^v|1:73])},
 {0x76| ((^v|1:73), {0| (1, 0, 0, 1, [5: 0, 0, 0, 0, 0], [1: ^j|1:78])})},
 {0x6a| ((^j|1:74), ^k|1:16, [0])},       <== First device of ioservice
 
 {0x64| ((^d|1:15), 0, 2, 1, 0x1000, 0x271000000, 0, 0, "/dev/loop0")},
 {0x6b| ((^k|1:16), ^d|1:15, [2: ^v|1:48, ^v|1:73])},
 
 
 Now,
 
 {0x6f| ((^o|1:65), 0, [1: ^v|1:66])},
 {0x76| ((^v|1:66), {0| (1, 0, 0, 1, [5: 0, 0, 0, 0, 0], [1: ^j|1:71])})},
 {0x6a| ((^j|1:67), ^k|1:16, [0])},
 
 
 {0x76| ((^v|1:73), {0| (1, 0, 0, 1, [5: 0, 0, 0, 0, 0], [1: ^j|1:78])})},
 {0x6a| ((^j|1:74), ^k|1:16, [0])},
 {0x6a| ((^j|1:67), ^k|1:16, [0])},
 
 Now both the mdpool and the dix pool point to the device of the CAS service, which won't work:
  {0x73| ((^s|1:14), @M0_CST_CAS, [1: "inet:tcp:10.230.242.196@3002"], [0], [1: ^d|1:15])},
 {0x64| ((^d|1:15), 0, 2, 1, 1, 0x400, 0, 0, "/dev/null")},
 {0x6b| ((^k|1:16), ^d|1:15, [2: ^v|1:66, ^v|1:73])},

@mssawant : I have checked confd.xc; it looks like this pattern has changed with the recent changes, causing the above issue.
This needs to be fixed ASAP in hare; otherwise, motr clients won't work.
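The invariant described above (the metadata pool needs one IOS-owned device per ioservice in the cluster, not a CAS-owned one) can be stated as a small sketch. All names here are illustrative; only the service-type names and the one-device-per-ioservice rule come from the discussion.

```python
# Hypothetical sketch of the mdpool invariant: every device referenced
# by the metadata pool must belong to an IOS, one per ioservice.
from dataclasses import dataclass

@dataclass
class Drive:
    path: str
    owner_svc: str      # "M0_CST_IOS" or "M0_CST_CAS"

def mdpool_ok(mdpool_drives, nr_ios):
    """mdpool needs exactly one IOS-owned device per ioservice."""
    ios_drives = [d for d in mdpool_drives if d.owner_svc == "M0_CST_IOS"]
    return len(ios_drives) == nr_ios == len(mdpool_drives)

# Old cfgen: mdpool referenced the first (IOS) device of each ioservice.
old = [Drive("/dev/loop0", "M0_CST_IOS")]
# New cfgen: mdpool ended up pointing at the CAS metadata drive.
new = [Drive("/dev/null", "M0_CST_CAS")]

print(mdpool_ok(old, nr_ios=1))   # True
print(mdpool_ok(new, nr_ios=1))   # False
```

Under this rule, the older confd.xc excerpt (mdpool objv pointing at the first IOS device) passes, while the newer one (mdpool pointing at the CAS drive) fails.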

@stale stale bot removed the needs-attention label Jan 31, 2022
While generating configuration, Hare assigns metadata device to CAS
service type, M0_CST_CAS. But m0_reqh_mdpool_service_index_to_session()
expects it to be M0_CST_IOS and asserts the same.

Solution:
Expect metadata device service type to be M0_CST_CAS instead
of M0_CST_IOS.

Signed-off-by: Mandar Sawant <mandar.sawant@seagate.com>
@mssawant (Author)

@madhavemuri, as discussed, we will use this patch and refine the fix in a separate patch to use M0_CST_BE as the service type for the metadata devices used in CAS and IOS m0ds. Presently, Motr clients work fine with this patch; tested with the latest Hare main and Motr main + PR 1388:

[root@ssc-vm-g4-rhev4-0554 ~]# hctl status
Byte_count:
    critical_byte_count : 0
    damaged_byte_count : 0
    degraded_byte_count : 0
    healthy_byte_count : 0
Data pool:
    # fid name
    0x6f00000000000001:0x38 'sns-pool'
Profile:
    # fid name: pool(s)
    0x7000000000000001:0x57 'default': 'sns-pool' None None
Services:
    ssc-vm-g4-rhev4-0554.colo.seagate.com (RC)
    [started]  hax        0x7200000000000001:0x7   inet:tcp:10.230.240.235@2001
    [started]  confd      0x7200000000000001:0xa   inet:tcp:10.230.240.235@3001
    [started]  ioservice  0x7200000000000001:0xd   inet:tcp:10.230.240.235@3002
    [started]  ioservice  0x7200000000000001:0x1e  inet:tcp:10.230.240.235@3003
    [unknown]  s3server   0x7200000000000001:0x2f  inet:tcp:10.230.240.235@4001
    [unknown]  m0_client  0x7200000000000001:0x32  inet:tcp:10.230.240.235@5001
    [unknown]  m0_client  0x7200000000000001:0x35  inet:tcp:10.230.240.235@5002
[root@ssc-vm-g4-rhev4-0554 ~]# m0client -l 'inet:tcp:10.230.240.235@5001' -H 'inet:tcp:10.230.240.235@2001' -p '<0x7000000000000001:0x57>' -P '<0x7200000000000001:0x32>'
m0client >>write 1048680 /tmp/128M 4096 200 50

m0client >>read 1048680 /tmp/read_1048680 4096 200 50
m0client >>quit
[root@ssc-vm-g4-rhev4-0554 ~]# m0cp -l 'inet:tcp:10.230.240.235@5001' -H 'inet:tcp:10.230.240.235@2001' -p 0x7000000000000001:0x57 -P 0x7200000000000001:0x32 -o 20:20 -s 4k -c 32 /tmp/128M
[root@ssc-vm-g4-rhev4-0554 ~]#

I have updated the patch with the comment as discussed.

@madhavemuri madhavemuri left a comment

The device associated with the BE service of the ios m0d needs to be added to the mdpool; otherwise, if we have a separate m0d without an ioservice, it will fail.
A TODO has been added for it.

@madhavemuri madhavemuri merged commit 6d6c31c into Seagate:main Feb 1, 2022

welcome bot commented Feb 1, 2022

Thanks for your contribution to CORTX! 🎉

atulsdeshmukh2312 pushed a commit that referenced this pull request Feb 1, 2022
While generating configuration, Hare assigns metadata device (BE seg1) to CAS
service type, M0_CST_CAS. But m0_reqh_mdpool_service_index_to_session()
expects it to be M0_CST_IOS and asserts the same.

Solution:
Expect metadata device service type to be M0_CST_CAS instead
of M0_CST_IOS. And in future this will be updated to M0_CST_BE.

Signed-off-by: Mandar Sawant <mandar.sawant@seagate.com>
Signed-off-by: Atul Deshmukh <atul.deshmukh@seagate.com>

nkommuri commented Feb 1, 2022

@mssawant, m0cp is dumping core (similar assert) with the latest Motr (1e9feed) and Hare (b617d0875ca45a2fed493891ca4a09d03808001f).

(gdb) bt
#0  0x00007fa1283c3387 in raise () from /lib64/libc.so.6
#1  0x00007fa1283c4a78 in abort () from /lib64/libc.so.6
#2  0x00007fa12a3be885 in m0_arch_panic (c=c@entry=0x7fa12a8a3800 <__pctx.24724>, ap=ap@entry=0x7ffe854a3f38) at lib/user_space/uassert.c:131
#3  0x00007fa12a3ac864 in m0_panic (ctx=ctx@entry=0x7fa12a8a3800 <__pctx.24724>) at lib/assert.c:52
#4  0x00007fa12a3eef3f in m0_obj_container_id_to_session (pver=pver@entry=0xbbae60, container_id=<optimized out>) at motr/cob.c:937
#5  0x00007fa12a3e3078 in target_session (tfid=..., ioo=0xcf9010) at motr/io_nw_xfer.c:743
#6  nw_xfer_tioreq_map (xfer=0xcf9650, src=<optimized out>, tgt=0x7ffe854a43f0, tio=0x7ffe854a43d8) at motr/io_nw_xfer.c:2110
#7  0x00007fa12a3e24d0 in nw_xfer_io_distribute (xfer=0xcf9650) at motr/io_nw_xfer.c:1558
#8  0x00007fa12a3ebbf0 in obj_io_cb_launch (oc=0xcf9010) at motr/io.c:259
#9  0x00007fa12a3d4d3b in m0_op_launch_one (op=0xcf9010) at motr/client.c:719
#10 0x00007fa12a3d4e84 in m0_op_launch (op=op@entry=0x7ffe854a4730, nr=nr@entry=1) at motr/client.c:733
#11 0x000000000040213d in write_data_to_object (obj=obj@entry=0x7ffe854a49a0, ext=ext@entry=0x7ffe854a4790, data=data@entry=0x7ffe854a47b0,
    attr=0x0) at motr/st/utils/helper.c:298
#12 0x000000000040262d in m0_write (container=container@entry=0x6065e0, src=<optimized out>, id=..., block_size=4096,
    block_count=786, update_offset=0, blks_per_io=<optimized out>, take_locks=false, update_mode=false) at motr/st/utils/helper.c:421
#13 0x0000000000401bcd in main (argc=<optimized out>, argv=0x7ffe854a4d78) at motr/st/utils/copy.c:112
(gdb) f 4
#4  0x00007fa12a3eef3f in m0_obj_container_id_to_session (pver=pver@entry=0xbbae60, container_id=<optimized out>) at motr/cob.c:937
937         M0_ASSERT(ios_ctx->sc_type == M0_CST_IOS);
(gdb) p ios_ctx->sc_type
$1 = M0_CST_CAS
(gdb)

madhavemuri added a commit that referenced this pull request Feb 1, 2022
madhavemuri added a commit that referenced this pull request Feb 1, 2022
This reverts commit 6d6c31c.

Motr STs still use the first device from the IOS; until those changes are in place, motr tests won't work, so reverting this change.
atulsdeshmukh2312 pushed a commit that referenced this pull request Feb 2, 2022
This reverts commit 6d6c31c.

Motr STs still use the first device from the IOS; until those changes are in place, motr tests won't work, so reverting this change.

Signed-off-by: Atul Deshmukh <atul.deshmukh@seagate.com>