DAOS_Documentation_Integration #4

Closed
wants to merge 30 commits into from

Conversation

cdurf1
Contributor

@cdurf1 cdurf1 commented May 2, 2017

This pull request adds documentation to accompany the DAOS source. Changes include the HLD sections and linking.

GitHuaKuang added a commit that referenced this pull request Jan 20, 2021
For these directories and their sub_dirs:
daos/src/bio
daos/src/cart
daos/src/client
daos/src/common
daos/src/container

Skip-build: true
Skip-test: true

Change-Id: I319a445c69a540266d953d4eaa5f06ffcda6323a
Signed-off-by: Hua Kuang <hua.kuang@intel.com>
jolivier23 pushed a commit that referenced this pull request Jan 27, 2021
For these directories and their sub_dirs:
daos/src/bio
daos/src/cart
daos/src/client
daos/src/common
daos/src/container

Signed-off-by: Hua Kuang <hua.kuang@intel.com>
liw added a commit that referenced this pull request Jun 7, 2022
Makito and Samir observed the following assertion failure after
restarting engines.

  #0  raise () from /lib64/libc.so.6
  #1  abort () from /lib64/libc.so.6
  #2  __assert_fail_base () from /lib64/libc.so.6
  #3  __assert_fail () from /lib64/libc.so.6
  #4  pool_map_get_version (map=0x0) at src/common/pool_map.c:2852
  #5  ds_pool_get_version (pool=0x7f0ca063c690, pool=0x7f0ca063c690) at
      src/include/daos_srv/pool.h:296
  #6  pc=rpc@entry=0x7f0ca0998d30, p_rpt=p_rpt@entry=0x7f0ca83a77b0) at
      src/rebuild/srv.c:2101
  #7  rebuild_tgt_scan_handler (rpc=0x7f0ca0998d30) at
      src/rebuild/scan.c:954
  #8  crt_handle_rpc (arg=0x7f0ca0998d30) at src/cart/crt_rpc.c:1654
  #9  ABTD_ythread_func_wrapper (p_arg=0x7f0ca83a78a0) at
      arch/abtd_ythread.c:21
  #10 make_fcontext () from /usr/lib64/libabt.so.1
  #11 ?? ()

The ds_pool_get_version call passed a NULL map argument to
pool_map_get_version. The ds_pool.sp_map field may be NULL after the
pool is started but before the pool receives the initial pool map from
the pool service. This patch fixes ds_pool_get_version to return 0,
which is less than all valid pool map versions, when sp_map is NULL,
resulting in rebuild retries like this:

  Rebuild [queued] (pool=3bf68c9c ver=2) tgts=2
  Rebuild [started] (pool 3bf68c9c ver=2)
  Rebuild [failed] (pool 3bf68c9c ver=2 status=DER_BUSY(-1012): 'Device
    or resource busy')
  Rebuild [queued] (pool=3bf68c9c ver=2) tgts=2
  Rebuild [started] (pool 3bf68c9c ver=2)
  Rebuild [failed] (pool 3bf68c9c ver=2 status=DER_BUSY(-1012): 'Device
    or resource busy')
  Rebuild [queued] (pool=3bf68c9c ver=2) tgts=2
  Rebuild [started] (pool 3bf68c9c ver=2)
  Rebuild [scanning] (pool 3bf68c9c ver=2, toberb_obj=0, rb_obj=0, [...]
  Rebuild [scanning] (pool 3bf68c9c ver=2, toberb_obj=0, rb_obj=0, [...]
  Rebuild [completed] (pool 3bf68c9c ver=2, toberb_obj=0, rb_obj=0,[...]
  Target[2] (rank 2 idx 0 status 16 ver 1) is excluded.

Also, this patch removes some rebuild code that handles NULL
ds_pool.sp_group fields. Those cannot happen as we always initialize
sp_group (as well as sp_iv_ns) before putting a ds_pool object into the
LRU.

Signed-off-by: Li Wei <wei.g.li@intel.com>
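
The NULL-map guard described in this commit message can be sketched as below. This is a minimal illustration, not the actual DAOS code: the struct layouts, the `version` field, the `SKETCH_DER_BUSY` macro, and `sketch_check_map_version` are hypothetical stand-ins, and the "report busy when the local version is too old" comparison is an assumption about how the retry arises. Only the names `ds_pool_get_version`, `pool_map_get_version`, `sp_map`, and the -1012 status come from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-ins for the real DAOS types. */
struct pool_map {
	uint32_t version; /* hypothetical field name */
};

struct ds_pool {
	/* May be NULL after the pool starts but before the initial
	 * pool map arrives from the pool service. */
	struct pool_map *sp_map;
};

/* Mirrors the assertion seen in frame #4 of the backtrace. */
uint32_t
pool_map_get_version(struct pool_map *map)
{
	assert(map != NULL);
	return map->version;
}

/* The guard: return 0, which is less than every valid pool map
 * version, instead of passing a NULL map down. */
uint32_t
ds_pool_get_version(struct ds_pool *pool)
{
	if (pool->sp_map == NULL)
		return 0;
	return pool_map_get_version(pool->sp_map);
}

/* Assumed magnitude of DER_BUSY, taken from the log line
 * "status=DER_BUSY(-1012)". */
#define SKETCH_DER_BUSY 1012

/* Hypothetical caller: if the local map is older than the version the
 * request needs, report "busy" so that the request is retried later. */
int
sketch_check_map_version(struct ds_pool *pool, uint32_t required)
{
	if (ds_pool_get_version(pool) < required)
		return -SKETCH_DER_BUSY;
	return 0;
}
```

With `sp_map` still NULL, `sketch_check_map_version(&pool, 2)` returns -1012 rather than tripping the assertion; once a map with version 2 is installed, the same call returns 0, which matches the failed-then-completed sequence in the log.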
shimizukko pushed a commit that referenced this pull request Jun 28, 2022
Skip-test: true
Signed-off-by: Li Wei <wei.g.li@intel.com>
liw added a commit that referenced this pull request Jul 6, 2022
Signed-off-by: Li Wei <wei.g.li@intel.com>
jolivier23 pushed a commit that referenced this pull request Jul 19, 2022
Signed-off-by: Li Wei <wei.g.li@intel.com>
liw added a commit that referenced this pull request Jul 20, 2022
Signed-off-by: Li Wei <wei.g.li@intel.com>
jolivier23 pushed a commit that referenced this pull request Jul 21, 2022
Signed-off-by: Li Wei <wei.g.li@intel.com>
liw added a commit that referenced this pull request Mar 8, 2023
Commit 53d66d7 might have introduced the following "epi != NULL"
assertion failure in crt_context_req_untrack.

  #0  raise () from /usr/lib64/libc.so.6
  #1  abort () from /usr/lib64/libc.so.6
  #2  __assert_fail_base.cold.0 () from /usr/lib64/libc.so.6
  #3  __assert_fail () from /usr/lib64/libc.so.6
  #4  crt_context_req_untrack (rpc_priv=rpc_priv@entry=0x55a254b83150)
      at src/cart/crt_context.c:1358
          crt_ctx = 0x55a25465f500
          epi = 0x0
          submit_list = {next = 0x0, prev = 0x4}
          tmp_rpc = <optimized out>
          rc = <optimized out>
          __func__ = "crt_context_req_untrack"
          __PRETTY_FUNCTION__ = "crt_context_req_untrack"
  #5  crt_hg_req_send_cb (hg_cbinfo=<optimized out>) at
      src/cart/crt_hg.c:1287
          rpc_pub = <optimized out>
          rpc_priv = <optimized out>
          hg_ret = HG_SUCCESS
          rc = 0
          __func__ = "crt_hg_req_send_cb"
          __PRETTY_FUNCTION__ = "crt_hg_req_send_cb"
          __rc = <optimized out>

The RPC might be an outgoing URI_LOOKUP request, which we don't track.
Hence its crp_epi field was NULL as expected. Commit 53d66d7 should have
asserted "epi != NULL" _after_ making sure that the RPC is not a
URI_LOOKUP request. This patch does that and removes a few useless lines
from crt_context_req_untrack_internal.

Test-tag: pr dynamic_server_pool OSAOfflineExtend NvmePoolExtend
Signed-off-by: Li Wei <wei.g.li@intel.com>
Required-githooks: true
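
The ordering fix described in this commit message can be sketched as follows. It is an illustration under stated assumptions, not the CaRT implementation: the struct definitions, the `is_uri_lookup` flag, the `inflight` counter, and the function `sketch_req_untrack` are all hypothetical. Only the `crp_epi` field name and the "check for URI_LOOKUP first, then assert" ordering come from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for per-endpoint tracking state. */
struct sketch_ep_inflight {
	int inflight; /* hypothetical count of tracked requests */
};

/* Hypothetical, reduced stand-in for the private RPC structure. */
struct sketch_rpc_priv {
	bool                       is_uri_lookup; /* hypothetical flag */
	struct sketch_ep_inflight *crp_epi;       /* NULL for untracked RPCs */
};

/* Returns true if the RPC was tracked and has now been untracked,
 * false if it was never tracked and was skipped. */
bool
sketch_req_untrack(struct sketch_rpc_priv *rpc)
{
	/* URI_LOOKUP requests are not tracked, so a NULL crp_epi is
	 * expected for them. Return before asserting. */
	if (rpc->is_uri_lookup)
		return false;

	/* Only a tracked RPC must have endpoint state attached. */
	assert(rpc->crp_epi != NULL);
	rpc->crp_epi->inflight--;
	return true;
}
```

Placing the assertion before the early return would abort on every untracked URI_LOOKUP request, which is the failure mode shown in the backtrace.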
frostedcmos pushed a commit that referenced this pull request Mar 9, 2023
Signed-off-by: Li Wei <wei.g.li@intel.com>
mlawsonca pushed a commit that referenced this pull request May 16, 2023
Initial commit that installs the mpi file utils (mfu) and other
dependencies for the io500 workload into the DAOS client image.

Dependencies are installed into /usr/local/io500

Signed-off-by: Joel Rosenzweig <joel.b.rosenzweig@intel.com>
Co-authored-by: Johann Lombardi <johann.lombardi@intel.com>
Co-authored-by: lsitkiew <lukasz.sitkiewicz@intel.com>
osalyk pushed a commit to osalyk/daos that referenced this pull request Oct 18, 2023
dumps: refine dtx_update_abort/vos_ioc_create
jolivier23 added a commit that referenced this pull request Jan 12, 2025
Features: dfuse
Allow-unstable-test: true

Signed-off-by: Jeff Olivier <jeffolivier@google.com>