This repository has been archived by the owner on May 3, 2024. It is now read-only.

Can/should we create Doxygen? #22

Closed
johnbent opened this issue Aug 26, 2020 · 29 comments
Labels
needs-attention, rose, Status: Fix Scheduled (Will be fixed in an upcoming release or sooner), Triage: DevTeam (Triage owner is on the dev team)

Comments

@johnbent
Contributor

Hello @nikitadanilov ,

In https://github.com/Seagate/cortx-motr/pull/233/files/ba835da44d825a329c7c4694087f9e3b7ebea369..dd549c1eec91f9c8bae2d4be430547aea3957c08, I saw that you mentioned Doxygen. Are the motr source files set up for Doxygen? If so, are the developers currently using Doxygen auto-generated servers? If so, where? If it is not currently set up, or if it is set up but only internal, what do you think about having someone follow this guide and set up public Doxygen for us?

https://gist.github.com/francesco-romano/351a6ae457860c14ee7e907f2b0fc1a5

Thanks,

John

@johnbent
Contributor Author

Plus @justinzw as an FYI.

@yanqingfu
Contributor

Seagate/cortx#160 related

@stale

stale bot commented Sep 9, 2020

This issue/pull request has been marked as needs attention as it has been left pending without new activity for 4 days. Tagging @huanghua78 @mukundkanekar for appropriate assignment. Sorry for the delay & Thank you for contributing to CORTX. We will get back to you as soon as possible.

@stale stale bot closed this as completed Sep 16, 2020
@gauravchaudhari02 gauravchaudhari02 transferred this issue from another repository Sep 22, 2020

@huanghua78

I think we need someone to check the Doxygen building in Motr.

@stale stale bot removed the needs-attention label Nov 26, 2020

@nkommuri

nkommuri commented Oct 4, 2021

Verified that Motr Doxygen building works.
@johnbent , Is this ticket intended for creation of public Doxygen using Github pages? If not, we can close this ticket.

@stale stale bot removed the needs-attention label Oct 4, 2021
@johnbent
Contributor Author

johnbent commented Oct 4, 2021

@nkommuri yes, this is about creating public doxygen if that would be easy to do and people think it is useful

@stale

stale bot commented Oct 9, 2021

This issue/pull request has been marked as needs attention as it has been left pending without new activity for 4 days. Tagging @nkommuri @mehjoshi @huanghua78 for appropriate assignment. Sorry for the delay & Thank you for contributing to CORTX. We will get back to you as soon as possible.

@stale stale bot added the needs-attention label Oct 9, 2021
@r-wambui r-wambui added rose Triage: DevAd Triage owned by DevAd labels Nov 17, 2021
@stale stale bot removed the needs-attention label Nov 17, 2021
@r-wambui
Contributor

Hi @johnbent, what are the next steps on this? Does it need to be added to the documentation or the product? If product, has this been planned for future sprints?

@johnbent
Contributor Author

johnbent commented Nov 18, 2021 via email


@truptiatseagate

@johnbent, @chandradharraval and team will review the request here to set up public Doxygen for the Motr repo and will get back shortly on it.

@stale stale bot removed the needs-attention label Nov 23, 2021
@r-wambui r-wambui added Triage: DevTeam Triage owner is on the dev team Status: Fix Scheduled Will be fixed in an upcoming release or sooner and removed Triage: DevAd Triage owned by DevAd labels Nov 24, 2021


Chandradhar Raval commented in Jira Server:

We plan to pick this up in PI7

@stale stale bot removed the needs-attention label Apr 11, 2022


Nagakishore Kommuri commented in Jira Server:

Total number of doc files generated after running doxygen...

root@ssc-vm-g2-rhev4-3141 ~/doxygen/cortx-motr/doc (gh-pages) > find . -type f | wc
104591 104604 7124485
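
With no options, `wc` prints lines, words, and bytes, so the first number above (104591) is the file count. To see which kinds of files make up that total, something like the following could be run against the generated `doc` directory. This is only a rough sketch: the function name `count_files_by_extension` is made up for illustration, and unlike `find -type f` it also counts symlinks that appear as file names.

```python
import os
import sys
from collections import Counter


def count_files_by_extension(root):
    """Walk `root` and count files per (lower-cased) extension."""
    counts = Counter()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Files without an extension are grouped under "(none)".
            ext = os.path.splitext(name)[1].lower() or "(none)"
            counts[ext] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: python count_docs.py ~/doxygen/cortx-motr/doc
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    counts = count_files_by_extension(root)
    print(f"total files: {sum(counts.values())}")
    for ext, n in counts.most_common(10):
        print(f"{n:8d}  {ext}")
```

If one or two extensions dominate the count, that may hint at which Doxygen outputs are responsible for most of the files.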

 

@stale stale bot removed the needs-attention label Apr 27, 2022
andriytk added a commit to andriytk/cortx-motr that referenced this issue Apr 27, 2022
All m0d-ios processes may crash if one of them crashes during write
i/o. Here is the scenario, for example:

setup: 1 node, 5 ios processes configured with 2 disks each.

1) Start write:

    m0cp -l <local-endpoint> -H <hax-endpoint> -p <profile-fid> -P <process-fid> -s 32k -c 1024 -L 2 /dev/zero -o 0x12345678:0x678900202

2) After about 5 secs (to allow m0_client_init() to finish and start the i/o)
   kill -9 <PID> one of m0d-ios processes.

Panic msg: `((cr->tc_balance[cu]) != 0) at btree_save() (be/btree.c:1393)`

Stack:

    #3  0x00007f9454ccbe37 in m0_panic (ctx=ctx@entry=0x7f94550e15a0 <__pctx.8664>) at lib/assert.c:52
    #4  0x00007f9454c0747c in btree_save (tree=tree@entry=0x40000010ed80, tx=tx@entry=0x7f94200a7f90, op=op@entry=0x7f944ca77ec0,
        val=val@entry=0x7f944ca77e70, anchor=0x0, optype=BTREE_SAVE_UPDATE, zonemask=2, key=<optimized out>, key=<optimized out>)
        at be/btree.c:339
    #5  0x00007f9454c086f7 in m0_be_btree_update (tree=tree@entry=0x40000010ed80, tx=tx@entry=0x7f94200a7f90,
        op=op@entry=0x7f944ca77ec0, key=key@entry=0x7f944ca77e60, val=val@entry=0x7f944ca77e70) at be/btree.c:1952
    #6  0x00007f9454bfd62b in btree_update_sync (val=0x7f944ca77e70, key=0x7f944ca77e60, tx=0x7f94200a7f90, tree=0x40000010ed80)
        at balloc/balloc.c:95
    #7  balloc_gi_sync (cb=cb@entry=0x40000010eb40, tx=tx@entry=0x7f94200a7f90, gi=gi@entry=0x13ff860) at balloc/balloc.c:928
    #8  0x00007f9454bfe36e in balloc_free_db_update (motr=motr@entry=0x40000010eb40, tx=tx@entry=0x7f94200a7f90,
        grp=grp@entry=0x13ff860, tgt=tgt@entry=0x7f944ca78470, alloc_flag=<optimized out>) at balloc/balloc.c:1934
    #9  0x00007f9454bff9c6 in balloc_free_internal (req=<synthetic pointer>, req=<synthetic pointer>, tx=0x7f94200a7f90,
        ctx=0x40000010eb40) at balloc/balloc.c:2716
    #10 balloc_free (ballroom=0x40000010ec68, tx=0x7f94200a7f88, ext=0x7f944ca78560) at balloc/balloc.c:2929
    #11 0x00007f9454d97681 in stob_ad_bfree (adom=<optimized out>, adom=<optimized out>, ext=0x7f944ca78530, ext=0x7f944ca78530,
        tx=0x7f94200a7f88) at stob/ad.c:1098
    #12 stob_ad_seg_free (tx=0x7f94200a7f88, adom=<optimized out>, ext=ext@entry=0x7f944ca79160, val=1594, seg=<optimized out>)
        at stob/ad.c:1647
    #13 0x00007f9454d9783d in __lambda (seg=0x7f944ca79150) at stob/ad.c:1719
    #14 0x00007f9454c10802 in m0_be_emap_paste (it=it@entry=0x7f944ca79140, tx=0x7f94200a7f90, ext=ext@entry=0x7f944ca78a90,
        val=1794, del=del@entry=0x7f944ca78b1c, cut_left=cut_left@entry=0x7f944ca78b38, cut_right=0x7f944ca78b54)
        at be/extmap.c:628
    #15 0x00007f9454d9a546 in stob_ad_write_map_ext (orig=<optimized out>, off=464, adom=0x4000001120d8) at stob/ad.c:1731
    #16 stob_ad_write_map (map=0x7f944ca78900, frags=18, wc=0x7f944ca78920, dst=0x7f944ca789b0, adom=0x4000001120d8,
        io=0x7f94340b4298) at stob/ad.c:1858
    #17 stob_ad_write_prepare (map=0x7f944ca78900, src=0x7f944ca78970, adom=0x4000001120d8, io=<optimized out>) at stob/ad.c:2006
    #18 stob_ad_io_launch_prepare (io=<optimized out>) at stob/ad.c:2052
    #19 0x00007f9454d9ca47 in m0_stob_io_prepare (io=io@entry=0x7f94340b4298, obj=obj@entry=0x7f94341170a0,
        tx=tx@entry=0x7f94200a7f88, scope=scope@entry=0x0) at stob/io.c:178
    #20 0x00007f9454d9ce92 in m0_stob_io_prepare_and_launch (io=io@entry=0x7f94340b4298, obj=0x7f94341170a0,
        tx=tx@entry=0x7f94200a7f88, scope=scope@entry=0x0) at stob/io.c:226
    #21 0x00007f9454cb702c in io_launch (fom=0x7f94200a7ec0) at ioservice/io_foms.c:1837
    #22 0x00007f9454cb47a0 in m0_io_fom_cob_rw_tick (fom=0x7f94200a7ec0) at ioservice/io_foms.c:2333
    #23 0x00007f9454c9edf1 in fom_exec (fom=0x7f94200a7ec0) at fop/fom.c:791
    #24 loc_handler_thread (th=0x11ed150) at fop/fom.c:931

RCA: the regression of BE credit calculation in
stob/ad.c:stob_ad_write_credit() code was introduced at
commit ab22d23.

Solution: rollback the change in stob/ad.c:stob_ad_write_credit()
introduced at commit ab22d23.

Signed-off-by: Andriy Tkachuk <andriy.tkachuk@seagate.com>
andriytk added a commit to andriytk/cortx-motr that referenced this issue Apr 27, 2022
Panic: ((cr->tc_balance[cu]) != 0) at btree_save() (be/btree.c:1393)

Stack:

    #3  0x00007f9454ccbe37 in m0_panic (ctx=ctx@entry=0x7f94550e15a0 <__pctx.8664>) at lib/assert.c:52
    #4  0x00007f9454c0747c in btree_save (tree=tree@entry=0x40000010ed80, tx=tx@entry=0x7f94200a7f90, op=op@entry=0x7f944ca77ec0,
        val=val@entry=0x7f944ca77e70, anchor=0x0, optype=BTREE_SAVE_UPDATE, zonemask=2, key=<optimized out>, key=<optimized out>)
        at be/btree.c:339
    #5  0x00007f9454c086f7 in m0_be_btree_update (tree=tree@entry=0x40000010ed80, tx=tx@entry=0x7f94200a7f90,
        op=op@entry=0x7f944ca77ec0, key=key@entry=0x7f944ca77e60, val=val@entry=0x7f944ca77e70) at be/btree.c:1952
    #6  0x00007f9454bfd62b in btree_update_sync (val=0x7f944ca77e70, key=0x7f944ca77e60, tx=0x7f94200a7f90, tree=0x40000010ed80)
        at balloc/balloc.c:95
    #7  balloc_gi_sync (cb=cb@entry=0x40000010eb40, tx=tx@entry=0x7f94200a7f90, gi=gi@entry=0x13ff860) at balloc/balloc.c:928
    #8  0x00007f9454bfe36e in balloc_free_db_update (motr=motr@entry=0x40000010eb40, tx=tx@entry=0x7f94200a7f90,
        grp=grp@entry=0x13ff860, tgt=tgt@entry=0x7f944ca78470, alloc_flag=<optimized out>) at balloc/balloc.c:1934
    #9  0x00007f9454bff9c6 in balloc_free_internal (req=<synthetic pointer>, req=<synthetic pointer>, tx=0x7f94200a7f90,
        ctx=0x40000010eb40) at balloc/balloc.c:2716
    #10 balloc_free (ballroom=0x40000010ec68, tx=0x7f94200a7f88, ext=0x7f944ca78560) at balloc/balloc.c:2929
    #11 0x00007f9454d97681 in stob_ad_bfree (adom=<optimized out>, adom=<optimized out>, ext=0x7f944ca78530, ext=0x7f944ca78530,
        tx=0x7f94200a7f88) at stob/ad.c:1098
    #12 stob_ad_seg_free (tx=0x7f94200a7f88, adom=<optimized out>, ext=ext@entry=0x7f944ca79160, val=1594, seg=<optimized out>)
        at stob/ad.c:1647
    #13 0x00007f9454d9783d in __lambda (seg=0x7f944ca79150) at stob/ad.c:1719
    #14 0x00007f9454c10802 in m0_be_emap_paste (it=it@entry=0x7f944ca79140, tx=0x7f94200a7f90, ext=ext@entry=0x7f944ca78a90,
        val=1794, del=del@entry=0x7f944ca78b1c, cut_left=cut_left@entry=0x7f944ca78b38, cut_right=0x7f944ca78b54)
        at be/extmap.c:628
    #15 0x00007f9454d9a546 in stob_ad_write_map_ext (orig=<optimized out>, off=464, adom=0x4000001120d8) at stob/ad.c:1731
    #16 stob_ad_write_map (map=0x7f944ca78900, frags=18, wc=0x7f944ca78920, dst=0x7f944ca789b0, adom=0x4000001120d8,
        io=0x7f94340b4298) at stob/ad.c:1858
    #17 stob_ad_write_prepare (map=0x7f944ca78900, src=0x7f944ca78970, adom=0x4000001120d8, io=<optimized out>) at stob/ad.c:2006
    #18 stob_ad_io_launch_prepare (io=<optimized out>) at stob/ad.c:2052
    #19 0x00007f9454d9ca47 in m0_stob_io_prepare (io=io@entry=0x7f94340b4298, obj=obj@entry=0x7f94341170a0,
        tx=tx@entry=0x7f94200a7f88, scope=scope@entry=0x0) at stob/io.c:178
    #20 0x00007f9454d9ce92 in m0_stob_io_prepare_and_launch (io=io@entry=0x7f94340b4298, obj=0x7f94341170a0,
        tx=tx@entry=0x7f94200a7f88, scope=scope@entry=0x0) at stob/io.c:226
    #21 0x00007f9454cb702c in io_launch (fom=0x7f94200a7ec0) at ioservice/io_foms.c:1837
    #22 0x00007f9454cb47a0 in m0_io_fom_cob_rw_tick (fom=0x7f94200a7ec0) at ioservice/io_foms.c:2333
    #23 0x00007f9454c9edf1 in fom_exec (fom=0x7f94200a7ec0) at fop/fom.c:791
    #24 loc_handler_thread (th=0x11ed150) at fop/fom.c:931

Setup: 1-node, 5 ios processes configured with 2 disks each,
       4+2+0 EC data pool.

Scenario: write the same object twice like this:

    $ m0cp <motr-conn-params> -s 1m -c 40 -L 4 /dev/zero -o 0x12345678:0x678900207
    $ m0cp <motr-conn-params> -s 1m -c 40 -L 4 /dev/zero -o 0x12345678:0x678900207 -u

RCA: regression of BE credit calculation in stob_ad_write_credit()
code was introduced at commit ab22d23.

Solution: rollback the regression change.

Signed-off-by: Andriy Tkachuk <andriy.tkachuk@seagate.com>
andriytk added a commit to andriytk/cortx-motr that referenced this issue Apr 27, 2022
Panic: ((cr->tc_balance[cu]) != 0) at btree_save() (be/btree.c:1393)

Stack:

    #3  0x00007f9454ccbe37 in m0_panic (ctx=ctx@entry=0x7f94550e15a0 <__pctx.8664>) at lib/assert.c:52
    #4  0x00007f9454c0747c in btree_save (tree=tree@entry=0x40000010ed80, tx=tx@entry=0x7f94200a7f90, op=op@entry=0x7f944ca77ec0,
        val=val@entry=0x7f944ca77e70, anchor=0x0, optype=BTREE_SAVE_UPDATE, zonemask=2, key=<optimized out>, key=<optimized out>)
        at be/btree.c:339
    #5  0x00007f9454c086f7 in m0_be_btree_update (tree=tree@entry=0x40000010ed80, tx=tx@entry=0x7f94200a7f90,
        op=op@entry=0x7f944ca77ec0, key=key@entry=0x7f944ca77e60, val=val@entry=0x7f944ca77e70) at be/btree.c:1952
    #6  0x00007f9454bfd62b in btree_update_sync (val=0x7f944ca77e70, key=0x7f944ca77e60, tx=0x7f94200a7f90, tree=0x40000010ed80)
        at balloc/balloc.c:95
    #7  balloc_gi_sync (cb=cb@entry=0x40000010eb40, tx=tx@entry=0x7f94200a7f90, gi=gi@entry=0x13ff860) at balloc/balloc.c:928
    #8  0x00007f9454bfe36e in balloc_free_db_update (motr=motr@entry=0x40000010eb40, tx=tx@entry=0x7f94200a7f90,
        grp=grp@entry=0x13ff860, tgt=tgt@entry=0x7f944ca78470, alloc_flag=<optimized out>) at balloc/balloc.c:1934
    #9  0x00007f9454bff9c6 in balloc_free_internal (req=<synthetic pointer>, req=<synthetic pointer>, tx=0x7f94200a7f90,
        ctx=0x40000010eb40) at balloc/balloc.c:2716
    #10 balloc_free (ballroom=0x40000010ec68, tx=0x7f94200a7f88, ext=0x7f944ca78560) at balloc/balloc.c:2929
    #11 0x00007f9454d97681 in stob_ad_bfree (adom=<optimized out>, adom=<optimized out>, ext=0x7f944ca78530, ext=0x7f944ca78530,
        tx=0x7f94200a7f88) at stob/ad.c:1098
    #12 stob_ad_seg_free (tx=0x7f94200a7f88, adom=<optimized out>, ext=ext@entry=0x7f944ca79160, val=1594, seg=<optimized out>)
        at stob/ad.c:1647
    #13 0x00007f9454d9783d in __lambda (seg=0x7f944ca79150) at stob/ad.c:1719
    #14 0x00007f9454c10802 in m0_be_emap_paste (it=it@entry=0x7f944ca79140, tx=0x7f94200a7f90, ext=ext@entry=0x7f944ca78a90,
        val=1794, del=del@entry=0x7f944ca78b1c, cut_left=cut_left@entry=0x7f944ca78b38, cut_right=0x7f944ca78b54)
        at be/extmap.c:628
    #15 0x00007f9454d9a546 in stob_ad_write_map_ext (orig=<optimized out>, off=464, adom=0x4000001120d8) at stob/ad.c:1731
    #16 stob_ad_write_map (map=0x7f944ca78900, frags=18, wc=0x7f944ca78920, dst=0x7f944ca789b0, adom=0x4000001120d8,
        io=0x7f94340b4298) at stob/ad.c:1858
    #17 stob_ad_write_prepare (map=0x7f944ca78900, src=0x7f944ca78970, adom=0x4000001120d8, io=<optimized out>) at stob/ad.c:2006
    #18 stob_ad_io_launch_prepare (io=<optimized out>) at stob/ad.c:2052
    #19 0x00007f9454d9ca47 in m0_stob_io_prepare (io=io@entry=0x7f94340b4298, obj=obj@entry=0x7f94341170a0,
        tx=tx@entry=0x7f94200a7f88, scope=scope@entry=0x0) at stob/io.c:178
    #20 0x00007f9454d9ce92 in m0_stob_io_prepare_and_launch (io=io@entry=0x7f94340b4298, obj=0x7f94341170a0,
        tx=tx@entry=0x7f94200a7f88, scope=scope@entry=0x0) at stob/io.c:226
    #21 0x00007f9454cb702c in io_launch (fom=0x7f94200a7ec0) at ioservice/io_foms.c:1837
    #22 0x00007f9454cb47a0 in m0_io_fom_cob_rw_tick (fom=0x7f94200a7ec0) at ioservice/io_foms.c:2333
    #23 0x00007f9454c9edf1 in fom_exec (fom=0x7f94200a7ec0) at fop/fom.c:791
    #24 loc_handler_thread (th=0x11ed150) at fop/fom.c:931

Setup: 1-node, 4+2+0 EC data pool with 10 disks.
Scenario: write the same object twice like this:

    $ m0cp <motr-conn-params> -s 1m -c 40 -L 4 /dev/zero -o 0x12345678:0x678900207
    $ m0cp <motr-conn-params> -s 1m -c 40 -L 4 /dev/zero -o 0x12345678:0x678900207 -u

RCA: regression of BE credit calculation in stob_ad_write_credit()
code was introduced at commit ab22d23.

Solution: rollback the regression change.

Signed-off-by: Andriy Tkachuk <andriy.tkachuk@seagate.com>
andriytk added a commit to andriytk/cortx-motr that referenced this issue Apr 27, 2022
Panic: ((cr->tc_balance[cu]) != 0) at btree_save() (be/btree.c:1393)

Stack:

    #3  m0_panic() at lib/assert.c:52
    #4  btree_save() at be/btree.c:339
    #5  m0_be_btree_update() at be/btree.c:1952
    #6  btree_update_sync() at balloc/balloc.c:95
    #7  balloc_gi_sync() at balloc/balloc.c:928
    #8  balloc_free_db_update() at balloc/balloc.c:1934
    #9  balloc_free_internal() at balloc/balloc.c:2716
    #10 balloc_free() at balloc/balloc.c:2929
    #11 stob_ad_bfree() at stob/ad.c:1098
    #12 stob_ad_seg_free (val=1594) at stob/ad.c:1647
    #13 __lambda() at stob/ad.c:1719
    #14 m0_be_emap_paste(val=1794) at be/extmap.c:628
    #15 stob_ad_write_map_ext(off=464) at stob/ad.c:1731
    #16 stob_ad_write_map(frags=18) at stob/ad.c:1858
    #17 stob_ad_write_prepare() at stob/ad.c:2006
    #18 stob_ad_io_launch_prepare() at stob/ad.c:2052
    #19 m0_stob_io_prepare() at stob/io.c:178
    #20 m0_stob_io_prepare_and_launch() at stob/io.c:226
    #21 io_launch() at ioservice/io_foms.c:1837
    #22 m0_io_fom_cob_rw_tick() at ioservice/io_foms.c:2333
    #23 fom_exec() at fop/fom.c:791
    #24 loc_handler_thread() at fop/fom.c:931

Setup: 1 node, 4+2+0 EC data pool with 10 disks.
Scenario: write the same object twice like this:

    $ m0cp <motr-conn-params> -s 1m -c 40 -L 4 /dev/zero -o 0x12345678:0x678900207
    $ m0cp <motr-conn-params> -s 1m -c 40 -L 4 /dev/zero -o 0x12345678:0x678900207 -u

RCA: regression of BE credit calculation in stob_ad_write_credit()
code was introduced at commit ab22d23.

Solution: rollback the regression change.

Co-authored-by: Madhavrao Vemuri <madhav.vemuri@seagate.com>
Signed-off-by: Andriy Tkachuk <andriy.tkachuk@seagate.com>
@nkommuri

The requirement is to host the Seagate/cortx-motr documentation via GitHub Pages (a free GitHub service for hosting static websites from a repository).
cortx-motr uses the Doxygen tool to generate documentation. The idea is to host the Seagate/cortx-motr documentation on GitHub Pages and regenerate it automatically on every commit, so that the published pages always reflect the latest code.

To achieve this, we need to commit the documentation files generated by Doxygen into the Seagate/cortx-motr repository. The problem is that running Doxygen on the cortx-motr repo produces about 100K new files, whereas the repo currently contains only about 4K files.
We may run into maintenance issues if we commit 100K new files into the cortx-motr repository.

Alternatives:
• Create a separate repository, e.g. Seagate/cortx-motr-doc, for storing and maintaining the doc files produced by Doxygen. This new repository would need to be updated periodically.
• Instead of hosting the Seagate/cortx-motr documentation on GitHub Pages, create an RPM containing the Doxygen documentation and deliver it with every release.

@johnbent please provide your suggestion/feedback.

@johnbent
Contributor Author

I think the github pages approach will be best. Will it still be a problem if the 100K pages go into a doc branch instead of main? I definitely don't like the RPM approach because we want to host it somewhere. The separate repo might be OK but it's disappointing that it doesn't get updated automatically on each commit.

@osowski , thoughts on this?
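
For reference, the "doc branch instead of main" idea could look roughly like the sketch below. It only builds and prints the list of git commands that would be run inside the generated HTML directory; it does not run them. The function name `plan_docs_publish`, the remote URL, and the commit message are placeholders, and `gh-pages` is assumed as the branch name only because that is the branch shown in the shell prompt earlier in the thread. The sketch creates a fresh one-commit repository in the output directory and force-pushes it, so the doc branch would hold a single snapshot rather than a growing history of 100K-file commits.

```python
import shlex


def plan_docs_publish(remote_url, branch="gh-pages", message="Update Doxygen docs"):
    """Return git command lines to publish the current directory as one snapshot.

    Assumption: the commands are run from inside the Doxygen HTML output
    directory, and git user.name / user.email are already configured.
    """
    return [
        ["git", "init", "-q"],
        # Point the unborn HEAD at the docs branch before the first commit.
        ["git", "symbolic-ref", "HEAD", f"refs/heads/{branch}"],
        ["git", "add", "-A"],
        ["git", "commit", "-q", "-m", message],
        # Force-push so the remote branch is replaced by the new snapshot.
        ["git", "push", "--force", remote_url, f"{branch}:{branch}"],
    ]


if __name__ == "__main__":
    # Placeholder remote; substitute the real repository URL.
    for argv in plan_docs_publish("git@github.com:example/example-docs.git"):
        print(shlex.join(argv))
```

Whether this runs on every commit or periodically would depend on the CI setup, which is not described in this thread.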


@stale stale bot added the needs-attention label May 2, 2022
madhavemuri pushed a commit that referenced this issue May 4, 2022
Problem: Following panic occurs if update operation is done,
Panic: ((cr->tc_balance[cu]) != 0) at btree_save() (be/btree.c:1393)

Stack:

    #3  m0_panic() at lib/assert.c:52
    #4  btree_save() at be/btree.c:339
    #5  m0_be_btree_update() at be/btree.c:1952
    #6  btree_update_sync() at balloc/balloc.c:95
    #7  balloc_gi_sync() at balloc/balloc.c:928
    #8  balloc_free_db_update() at balloc/balloc.c:1934
    #9  balloc_free_internal() at balloc/balloc.c:2716
    #10 balloc_free() at balloc/balloc.c:2929
    #11 stob_ad_bfree() at stob/ad.c:1098
    #12 stob_ad_seg_free (val=1594) at stob/ad.c:1647
    #13 __lambda() at stob/ad.c:1719
    #14 m0_be_emap_paste(val=1794) at be/extmap.c:628
    #15 stob_ad_write_map_ext(off=464) at stob/ad.c:1731
    #16 stob_ad_write_map(frags=18) at stob/ad.c:1858
    #17 stob_ad_write_prepare() at stob/ad.c:2006
    #18 stob_ad_io_launch_prepare() at stob/ad.c:2052
    #19 m0_stob_io_prepare() at stob/io.c:178
    #20 m0_stob_io_prepare_and_launch() at stob/io.c:226
    #21 io_launch() at ioservice/io_foms.c:1837
    #22 m0_io_fom_cob_rw_tick() at ioservice/io_foms.c:2333
    #23 fom_exec() at fop/fom.c:791
    #24 loc_handler_thread() at fop/fom.c:931

Setup: 1 node, 4+2+0 EC data pool with 10 disks.
Scenario: write the same object twice like this:

    $ m0cp <motr-conn-params> -s 1m -c 40 -L 4 /dev/zero -o 0x12345678:0x678900207
    $ m0cp <motr-conn-params> -s 1m -c 40 -L 4 /dev/zero -o 0x12345678:0x678900207 -u

RCA: regression of BE credit calculation in stob_ad_write_credit()
code was introduced at commit ab22d23.

Solution: rollback the regression change.

Signed-off-by: Andriy Tkachuk <andriy.tkachuk@seagate.com>
andriytk added a commit to andriytk/cortx-motr that referenced this issue May 10, 2022
Panic: ((cr->tc_balance[cu]) != 0) at btree_save() (be/btree.c:1393)

Stack:

    #3  m0_panic() at lib/assert.c:52
    #4  btree_save() at be/btree.c:339
    #5  m0_be_btree_update() at be/btree.c:1952
    #6  btree_update_sync() at balloc/balloc.c:95
    #7  balloc_gi_sync() at balloc/balloc.c:928
    #8  balloc_free_db_update() at balloc/balloc.c:1934
    #9  balloc_free_internal() at balloc/balloc.c:2716
    #10 balloc_free() at balloc/balloc.c:2929
    #11 stob_ad_bfree() at stob/ad.c:1098
    #12 stob_ad_seg_free (val=1594) at stob/ad.c:1647
    #13 __lambda() at stob/ad.c:1719
    #14 m0_be_emap_paste(val=1794) at be/extmap.c:628
    #15 stob_ad_write_map_ext(off=464) at stob/ad.c:1731
    #16 stob_ad_write_map(frags=18) at stob/ad.c:1858
    #17 stob_ad_write_prepare() at stob/ad.c:2006
    #18 stob_ad_io_launch_prepare() at stob/ad.c:2052
    #19 m0_stob_io_prepare() at stob/io.c:178
    #20 m0_stob_io_prepare_and_launch() at stob/io.c:226
    #21 io_launch() at ioservice/io_foms.c:1837
    #22 m0_io_fom_cob_rw_tick() at ioservice/io_foms.c:2333
    #23 fom_exec() at fop/fom.c:791
    #24 loc_handler_thread() at fop/fom.c:931

Setup: 1 node, 4+2+0 EC data pool with 10 disks.
Scenario: write the same object twice like this:

    $ m0cp <motr-conn-params> -s 1m -c 40 -L 4 /dev/zero -o 0x12345678:0x678900207
    $ m0cp <motr-conn-params> -s 1m -c 40 -L 4 /dev/zero -o 0x12345678:0x678900207 -u

RCA: regression of BE credit calculation in stob_ad_write_credit()
code was introduced at commit ab22d23.

Solution: rollback the regression change.

Co-authored-by: Madhavrao Vemuri <madhav.vemuri@seagate.com>
Signed-off-by: Andriy Tkachuk <andriy.tkachuk@seagate.com>
mehjoshi pushed a commit that referenced this issue Jun 7, 2022
… gc callback of (#1820)

Problem: In m0_be_op_fini, when bos_tlink_fini is performed, it is expected that bo_set_link is no longer linked into the parent's m0_be_op::bo_children.

State seen at the time of crash:
Two gft_pd_io are in the in-progress state, with the two corresponding bio in the sched queue; the crash is hit while performing gc callback processing for the gft whose gft_pd_io is in progress and whose bio is queued behind an active io.

Panic:
2022-04-24 11:19:15,672 - motr[00107]: e2e0 FATAL [lib/assert.c:50:m0_panic] panic: (!m0_list_link_is_in(link)) at m0_list_link_fini() (lib/list.c:178) [git: 2.0.0-670-27-g0012fe90] /etc/cortx/log/motr/0696b1d9e4744c59a92cb2bdded112ac/trace/m0d-0x7200000000000001:0x2e/m0trace.107
2022-04-24 11:19:15,672 - Motr panic: (!m0_list_link_is_in(link)) at m0_list_link_fini() lib/list.c:178 (errno: 0) (last failed: none) [git: 2.0.0-670-27-g0012fe90] pid: 107 /etc/cortx/log/motr/0696b1d9e4744c59a92cb2bdded112ac/trace/m0d-0x7200000000000001:0x2e/m0trace.107
2022-04-24 11:19:15,706 - /lib64/libmotr.so.2(m0_arch_backtrace+0x33)[0x7f7514e79c83]
2022-04-24 11:19:15,706 - /lib64/libmotr.so.2(m0_arch_panic+0xe9)[0x7f7514e79e59]
2022-04-24 11:19:15,706 - /lib64/libmotr.so.2(m0_panic+0x13d)[0x7f7514e6890d]
2022-04-24 11:19:15,706 - /lib64/libmotr.so.2(+0x3895f6)[0x7f7514e6c5f6]
2022-04-24 11:19:15,706 - /lib64/libmotr.so.2(m0_be_op_fini+0x1f)[0x7f7514dae66f]
2022-04-24 11:19:15,706 - /lib64/libmotr.so.2(+0x2cb826)[0x7f7514dae826]
2022-04-24 11:19:15,707 - /lib64/libmotr.so.2(+0x2c4c5b)[0x7f7514da7c5b]
2022-04-24 11:19:15,707 - /lib64/libmotr.so.2(+0x2cb826)[0x7f7514dae826]
2022-04-24 11:19:15,707 - /lib64/libmotr.so.2(+0x2c300a)[0x7f7514da600a]
2022-04-24 11:19:15,707 - /lib64/libmotr.so.2(+0x2c3119)[0x7f7514da6119]
2022-04-24 11:19:15,707 - /lib64/libmotr.so.2(+0x386f7f)[0x7f7514e69f7f]
2022-04-24 11:19:15,707 - /lib64/libmotr.so.2(+0x386ffa)[0x7f7514e69ffa]
2022-04-24 11:19:15,707 - /lib64/libmotr.so.2(m0_chan_broadcast_lock+0x1d)[0x7f7514e6a08d]

Backtrace:
(gdb) bt
#0 0x00007f7512d8938f in raise () from /lib64/libc.so.6
#1 0x00007f7512d73dc5 in abort () from /lib64/libc.so.6
#2 0x00007f7514e79e63 in m0_arch_panic (c=c@entry=0x7f751531ade0 <__pctx.4611>, ap=ap@entry=0x7f74afffe390)
at lib/user_space/uassert.c:131
#3 0x00007f7514e6890d in m0_panic (ctx=ctx@entry=0x7f751531ade0 <__pctx.4611>) at lib/assert.c:52
#4 0x00007f7514e6c5f6 in m0_list_link_fini (link=) at lib/list.c:178
#5 0x00007f7514e70310 in m0_tlink_fini (d=d@entry=0x7f75152880a0 <bos_tl>, obj=obj@entry=0x56523e641a90) at lib/tlist.c:283
#6 0x00007f7514dae66f in bos_tlink_fini (amb=0x56523e641a90) at be/op.c:109
#7 m0_be_op_fini (op=0x56523e641a90) at be/op.c:109
#8 0x00007f7514dae826 in be_op_state_change (op=, state=state@entry=M0_BOS_DONE) at be/op.c:213
#9 0x00007f7514daea17 in m0_be_op_done (op=) at be/op.c:231
#10 0x00007f7514da7c5b in be_io_sched_cb (op=op@entry=0x56523e5f7870, param=param@entry=0x56523e5f7798) at be/io_sched.c:141
#11 0x00007f7514dae826 in be_op_state_change (op=op@entry=0x56523e5f7870, state=state@entry=M0_BOS_DONE) at be/op.c:213
#12 0x00007f7514daea17 in m0_be_op_done (op=op@entry=0x56523e5f7870) at be/op.c:231
#13 0x00007f7514da600a in be_io_finished (bio=bio@entry=0x56523e5f7798) at be/io.c:555
#14 0x00007f7514da6119 in be_io_cb (link=0x56523e61ac60) at be/io.c:587
#15 0x00007f7514e69f7f in clink_signal (clink=clink@entry=0x56523e61ac60) at lib/chan.c:135
#16 0x00007f7514e69ffa in chan_signal_nr (chan=chan@entry=0x56523e61ab58, nr=0) at lib/chan.c:154
#17 0x00007f7514e6a06c in m0_chan_broadcast (chan=chan@entry=0x56523e61ab58) at lib/chan.c:174
#18 0x00007f7514e6a08d in m0_chan_broadcast_lock (chan=chan@entry=0x56523e61ab58) at lib/chan.c:181
#19 0x00007f7514f4209a in ioq_complete (res2=, res=, qev=, ioq=0x56523e5de610)
at stob/ioq.c:587
#20 stob_ioq_thread (ioq=0x56523e5de610) at stob/ioq.c:669
#21 0x00007f7514e6f49e in m0_thread_trampoline (arg=arg@entry=0x56523e5de6e8) at lib/thread.c:117
#22 0x00007f7514e7ab11 in uthread_trampoline (arg=0x56523e5de6e8) at lib/user_space/uthread.c:98
#23 0x00007f751454915a in start_thread () from /lib64/libpthread.so.0
#24 0x00007f7512e4edd3 in clone () from /lib64/libc.so.6

RCA - Sequence of Events:

1. be_tx_group_format_seg_io_op_gc is invoked for gft_pd_io_op of tx_group_fom_1 (last_child is false).
(gdb) p &((struct m0_be_group_format *)cb_gc_param)->gft_pd_io_op
$29 = (struct m0_be_op *) 0x56523e641a90

2. While handling gft_pd_io_op, be_tx_group_format_seg_io_op_gc invokes m0_be_op_done for gft_tmp_op (which has no callbacks). last_child now becomes true for the parent, because m0_be_op_done has been invoked for both of its children (gft_tmp_op and gft_pd_io_op).

3. m0_be_op_done handling of gft_tmp_op invokes be_op_state_change with M0_BOS_DONE for the parent (tgf_op).

4. During be_op_state_change processing for the main parent tgf_op, m0_sm_state_set updates the bo_sm state and unblocks tx_group_fom_1 by signalling op->bo_sm.sm_chan.

5. This recursive callback processing happens in the context of stob_ioq_thread, one of the M0_STOB_IOQ_NR_THREADS threads initialized for the I/O queue.
Because the done processing of gft_tmp_op (the peer child) is invoked from within the gc processing of gft_pd_io_op, the parent's callback is invoked early.

Parent Callback Processing:
6. This unblocks tx_group_fom_1, which leads to m0_be_pd_io_put in m0_be_group_format_reset, and tx_group_fom_1 moves to TGS_OPEN.
So pd_io and tx_group_fom_1 are now ready for reuse.

Problem window:
7. The problem occurs if the pd_io and/or tx_group_fom_1 is reused with a new context while the remaining gc callback processing of gft_pd_io_op,
i.e.
m0_be_op_fini(&gft->gft_tmp_op);
m0_be_op_fini(op);
is still being done.

Solution:
Removing gft_tmp_op altogether ensures that parent callback processing is never invoked ahead of its child callback processing.
This way tx_group_fom is always notified of seg io completion only after all the relevant child callback processing is completed, which
avoids the crashes seen in the gc callback processing (be_tx_group_format_seg_io_op_gc) after m0_be_op_done(&gft->gft_tmp_op);
In the proposed solution, the main parent op is made active at the start, at the same place where gft_tmp_op was being activated in order to put this parent
into the active state, thereby making gft_tmp_op redundant and avoiding out-of-order execution of child/parent callbacks.
RCA: Due to recursive calls to be_op_state_change, the gc callback of gft_op (child 1) invokes the done callback of gft_tmp_op (child 2), which subsequently results in invocation of the parent's be_op_state_change. This results in the group fom getting completed ahead of child op callback processing, so a crash is subsequently observed when the group is reused before child callback processing is finished.

Signed-off-by: Vidyadhar Pinglikar <vidyadhar.pinglikar@seagate.com>
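
The ordering hazard described in the RCA above can be sketched in a few lines of C. This is only an illustrative model, not the Motr code: `toy_op`, `toy_op_done`, `toy_child_gc` and `toy_parent_completes_early` are hypothetical names, the logical clock stands in for "which happened first", and op activation is left out entirely. The one property it tries to mirror is that `last_child` is computed before the child's gc callback runs, so a peer op completed from inside that callback can complete the parent first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an op; not the Motr structure. */
struct toy_op {
    struct toy_op *parent;           /* NULL for the top-level op */
    int            pending;          /* children that are not yet done */
    void         (*gc)(struct toy_op *op); /* cleanup callback, may be NULL */
    struct toy_op *peer;             /* sibling completed from inside gc, if any */
};

static int toy_tick;                 /* logical clock */
static int toy_parent_done_at;       /* tick at which the waiter would be woken */
static int toy_child_gc_end_at;      /* tick at which the child's gc finished */

static void toy_op_done(struct toy_op *op)
{
    struct toy_op *parent = op->parent;
    bool           last_child = false;

    if (parent != NULL)
        last_child = (--parent->pending == 0); /* decided before gc runs */
    else
        toy_parent_done_at = ++toy_tick;       /* top-level op completes here */
    if (op->gc != NULL)
        op->gc(op);
    if (last_child)
        toy_op_done(parent);
}

static void toy_child_gc(struct toy_op *op)
{
    if (op->peer != NULL)
        toy_op_done(op->peer);       /* reentrant completion of the peer child */
    /* the remaining cleanup (the fini calls in the real code) would run here */
    toy_child_gc_end_at = ++toy_tick;
}

/* Returns true when the parent completed before the child's gc finished. */
static bool toy_parent_completes_early(bool use_tmp_op)
{
    struct toy_op parent = { .pending = use_tmp_op ? 2 : 1 };
    struct toy_op tmp    = { .parent = &parent };
    struct toy_op io     = { .parent = &parent, .gc = toy_child_gc,
                             .peer = use_tmp_op ? &tmp : NULL };

    toy_tick = toy_parent_done_at = toy_child_gc_end_at = 0;
    toy_op_done(&io);
    return toy_parent_done_at < toy_child_gc_end_at;
}
```

In this model `toy_parent_completes_early(true)` reports that the parent completes while the child's gc is still running, which is the window in which a reused group could be touched. With the temporary op removed (`false`), the parent completes only after the gc callback has returned.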

Papan Kumar Singh commented in Jira Server:

Created a docs directory in cortx-motr.

Moved all files generated by the "make docs" command from cortx-motr/doc/html to the cortx-motr/docs directory.

Created a GitHub page: https://papan-singh.github.io/cortx-motr-1/

mehjoshi pushed a commit to mehjoshi/cortx-motr that referenced this issue Jul 18, 2022
… gc callback of (Seagate#1820)

Problem: In m0_be_op_fini, when bos_tlink_fini is performed, it is expected that bo_set_link is no longer linked into the parent's m0_be_op::bo_children list.
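
As an illustration of the kind of invariant behind this panic, here is a minimal sketch of an intrusive list link whose finalisation requires that the link has already been removed from its list. The `toy_link` type and its helpers are hypothetical and are not the Motr definitions; the sketch assumes the common convention that an unlinked link points to itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical intrusive list link; not the Motr definition. */
struct toy_link {
    struct toy_link *prev;
    struct toy_link *next;
};

/* An unlinked link (and an empty list head) points to itself. */
static void toy_link_init(struct toy_link *l)
{
    l->prev = l->next = l;
}

static bool toy_link_is_in(const struct toy_link *l)
{
    return l->next != l;
}

/* Insert l right after head. */
static void toy_link_add(struct toy_link *head, struct toy_link *l)
{
    l->next = head->next;
    l->prev = head;
    head->next->prev = l;
    head->next = l;
}

/* Unlink l and return it to the self-pointing state. */
static void toy_link_del(struct toy_link *l)
{
    l->prev->next = l->next;
    l->next->prev = l->prev;
    toy_link_init(l);
}

/* Returns false in the situation where a real implementation would panic. */
static bool toy_link_fini_ok(const struct toy_link *l)
{
    return !toy_link_is_in(l);
}
```

Finalising a link that is still on a list fails the check, which corresponds to the `!m0_list_link_is_in(link)` condition reported in the panic message below.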

State seen at the time of crash:
Two gft_pd_io ops are in the in-progress state, with the corresponding two bios in the sched queue. The crash is hit while performing gc callback processing for the gft whose gft_pd_io is in progress and whose bio is queued behind an active io.

Panic:
2022-04-24 11:19:15,672 - motr[00107]: e2e0 FATAL [lib/assert.c:50:m0_panic] panic: (!m0_list_link_is_in(link)) at m0_list_link_fini() (lib/list.c:178) [git: 2.0.0-670-27-g0012fe90] /etc/cortx/log/motr/0696b1d9e4744c59a92cb2bdded112ac/trace/m0d-0x7200000000000001:0x2e/m0trace.107
2022-04-24 11:19:15,672 - Motr panic: (!m0_list_link_is_in(link)) at m0_list_link_fini() lib/list.c:178 (errno: 0) (last failed: none) [git: 2.0.0-670-27-g0012fe90] pid: 107 /etc/cortx/log/motr/0696b1d9e4744c59a92cb2bdded112ac/trace/m0d-0x7200000000000001:0x2e/m0trace.107
2022-04-24 11:19:15,706 - /lib64/libmotr.so.2(m0_arch_backtrace+0x33)[0x7f7514e79c83]
2022-04-24 11:19:15,706 - /lib64/libmotr.so.2(m0_arch_panic+0xe9)[0x7f7514e79e59]
2022-04-24 11:19:15,706 - /lib64/libmotr.so.2(m0_panic+0x13d)[0x7f7514e6890d]
2022-04-24 11:19:15,706 - /lib64/libmotr.so.2(+0x3895f6)[0x7f7514e6c5f6]
2022-04-24 11:19:15,706 - /lib64/libmotr.so.2(m0_be_op_fini+0x1f)[0x7f7514dae66f]
2022-04-24 11:19:15,706 - /lib64/libmotr.so.2(+0x2cb826)[0x7f7514dae826]
2022-04-24 11:19:15,707 - /lib64/libmotr.so.2(+0x2c4c5b)[0x7f7514da7c5b]
2022-04-24 11:19:15,707 - /lib64/libmotr.so.2(+0x2cb826)[0x7f7514dae826]
2022-04-24 11:19:15,707 - /lib64/libmotr.so.2(+0x2c300a)[0x7f7514da600a]
2022-04-24 11:19:15,707 - /lib64/libmotr.so.2(+0x2c3119)[0x7f7514da6119]
2022-04-24 11:19:15,707 - /lib64/libmotr.so.2(+0x386f7f)[0x7f7514e69f7f]
2022-04-24 11:19:15,707 - /lib64/libmotr.so.2(+0x386ffa)[0x7f7514e69ffa]
2022-04-24 11:19:15,707 - /lib64/libmotr.so.2(m0_chan_broadcast_lock+0x1d)[0x7f7514e6a08d]

Backtrace:
(gdb) bt
#0 0x00007f7512d8938f in raise () from /lib64/libc.so.6
#1 0x00007f7512d73dc5 in abort () from /lib64/libc.so.6
#2 0x00007f7514e79e63 in m0_arch_panic (c=c@entry=0x7f751531ade0 <__pctx.4611>, ap=ap@entry=0x7f74afffe390)
at lib/user_space/uassert.c:131
#3 0x00007f7514e6890d in m0_panic (ctx=ctx@entry=0x7f751531ade0 <__pctx.4611>) at lib/assert.c:52
#4 0x00007f7514e6c5f6 in m0_list_link_fini (link=) at lib/list.c:178
#5 0x00007f7514e70310 in m0_tlink_fini (d=d@entry=0x7f75152880a0 <bos_tl>, obj=obj@entry=0x56523e641a90) at lib/tlist.c:283
#6 0x00007f7514dae66f in bos_tlink_fini (amb=0x56523e641a90) at be/op.c:109
#7 m0_be_op_fini (op=0x56523e641a90) at be/op.c:109
#8 0x00007f7514dae826 in be_op_state_change (op=, state=state@entry=M0_BOS_DONE) at be/op.c:213
#9 0x00007f7514daea17 in m0_be_op_done (op=) at be/op.c:231
#10 0x00007f7514da7c5b in be_io_sched_cb (op=op@entry=0x56523e5f7870, param=param@entry=0x56523e5f7798) at be/io_sched.c:141
#11 0x00007f7514dae826 in be_op_state_change (op=op@entry=0x56523e5f7870, state=state@entry=M0_BOS_DONE) at be/op.c:213
#12 0x00007f7514daea17 in m0_be_op_done (op=op@entry=0x56523e5f7870) at be/op.c:231
#13 0x00007f7514da600a in be_io_finished (bio=bio@entry=0x56523e5f7798) at be/io.c:555
#14 0x00007f7514da6119 in be_io_cb (link=0x56523e61ac60) at be/io.c:587
#15 0x00007f7514e69f7f in clink_signal (clink=clink@entry=0x56523e61ac60) at lib/chan.c:135
#16 0x00007f7514e69ffa in chan_signal_nr (chan=chan@entry=0x56523e61ab58, nr=0) at lib/chan.c:154
#17 0x00007f7514e6a06c in m0_chan_broadcast (chan=chan@entry=0x56523e61ab58) at lib/chan.c:174
#18 0x00007f7514e6a08d in m0_chan_broadcast_lock (chan=chan@entry=0x56523e61ab58) at lib/chan.c:181
#19 0x00007f7514f4209a in ioq_complete (res2=, res=, qev=, ioq=0x56523e5de610)
at stob/ioq.c:587
#20 stob_ioq_thread (ioq=0x56523e5de610) at stob/ioq.c:669
#21 0x00007f7514e6f49e in m0_thread_trampoline (arg=arg@entry=0x56523e5de6e8) at lib/thread.c:117
#22 0x00007f7514e7ab11 in uthread_trampoline (arg=0x56523e5de6e8) at lib/user_space/uthread.c:98
#23 0x00007f751454915a in start_thread () from /lib64/libpthread.so.0
#24 0x00007f7512e4edd3 in clone () from /lib64/libc.so.6


nkommuri commented Aug 9, 2022

We tried to host the cortx-motr documentation on GitHub Pages by manually committing the roughly 100k generated doc files to the GitHub repo. The GitHub Pages link is provided in the above comment [ https://github.com//issues/22#issuecomment-1159988919 ].
However, GitHub Pages cannot serve our documentation reliably because we have too many files (more than 100,000). It was able to load some doc pages but failed to load others.

For the time being, any user can run "make doc" to generate the documentation and host it on their local machine. Instructions on how to generate the documentation are provided in the cortx-motr quick start guide (https://github.com/Seagate/cortx-motr/blob/main/doc/Quick-Start-Guide.rst) under the "Build the documentation" section.

Closing the ticket.

@stale stale bot removed the needs-attention label Aug 9, 2022

anaghadeshmukh commented Aug 9, 2022

Thanks Kishore.
@johnbent can we close this ticket, since its purpose, which was to set up Doxygen for the community, is completed?


stale bot commented Aug 16, 2022

This issue/pull request has been marked as needs attention as it has been left pending without new activity for 4 days. Tagging @nkommuri @mehjoshi @huanghua78 for appropriate assignment. Sorry for the delay & Thank you for contributing to CORTX. We will get back to you as soon as possible.

@anaghadeshmukh

@johnbent I am closing the issue as per the above comment.

@anaghadeshmukh

Closing, as the original purpose of the ticket is completed.


Chandradhar Raval commented in Jira Server:

Marking this issue as Closed; the corresponding GitHub issue is now closed.
