
[GIT PULL] Block driver updates for 5.16-rc1 #13

Merged · 226 commits · Nov 1, 2021

Commits on Oct 18, 2021

  1. blk-cgroup: blk_cgroup_bio_start() should use irq-safe operations on blkg->iostat_cpu
    
    c3df5fb ("cgroup: rstat: fix A-A deadlock on 32bit around
    u64_stats_sync") made u64_stats updates irq-safe to avoid A-A deadlocks.
    Unfortunately, the conversion missed one in blk_cgroup_bio_start(). Fix it.
    
    Fixes: 2d146aa ("mm: memcontrol: switch to rstat")
    Cc: stable@vger.kernel.org # v5.13+
    Reported-by: syzbot+9738c8815b375ce482a1@syzkaller.appspotmail.com
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Link: https://lore.kernel.org/r/YWi7NrQdVlxD6J9W@slm.duckdns.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    htejun authored and axboe committed Oct 18, 2021
    Commit: 3c08b09
  2. mm: don't include <linux/blk-cgroup.h> in <linux/writeback.h>

    blk-cgroup.h pulls in blkdev.h and thus pretty much all the block
    headers.  Break this dependency chain by turning wbc_blkcg_css into a
    macro and dropping the blk-cgroup.h include.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
    Link: https://lore.kernel.org/r/20210920123328.1399408-2-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit: 348332e
  3. mm: don't include <linux/blk-cgroup.h> in <linux/backing-dev.h>

    There is no need to pull blk-cgroup.h and thus blkdev.h in here, so
    break the include chain.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
    Link: https://lore.kernel.org/r/20210920123328.1399408-3-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit: e41d12f
  4. mm: don't include <linux/blkdev.h> in <linux/backing-dev.h>

    Move inode_to_bdi out of line to avoid having to include blkdev.h.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
    Link: https://lore.kernel.org/r/20210920123328.1399408-4-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit: ccdf774
  5. mm: remove spurious blkdev.h includes

    Various files have acquired spurious includes of <linux/blkdev.h> over
    time.  Remove them.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
    Link: https://lore.kernel.org/r/20210920123328.1399408-5-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit: 518d550
  6. arch: remove spurious blkdev.h includes

    Various files have acquired spurious includes of <linux/blkdev.h> over
    time.  Remove them.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
    Link: https://lore.kernel.org/r/20210920123328.1399408-6-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit: dcbfa22
  7. kernel: remove spurious blkdev.h includes

    Various files have acquired spurious includes of <linux/blkdev.h> over
    time.  Remove them.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
    Link: https://lore.kernel.org/r/20210920123328.1399408-7-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit: 545c664
  8. sched: move the <linux/blkdev.h> include out of kernel/sched/sched.h

    Only core.c needs blkdev.h, so move the #include statement there.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
    Link: https://lore.kernel.org/r/20210920123328.1399408-8-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit: 6a5850d
  9. block: remove the unused rq_end_sector macro

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
    Link: https://lore.kernel.org/r/20210920123328.1399408-9-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit: 1d9433c
  10. block: remove the unused blk_queue_state enum

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
    Link: https://lore.kernel.org/r/20210920123328.1399408-10-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit: 9013823
  11. block: remove the cmd_size field from struct request_queue

    Entirely unused.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
    Link: https://lore.kernel.org/r/20210920123328.1399408-11-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit: 713e4e1
  12. block: remove the struct blk_queue_ctx forward declaration

    This type doesn't exist at all, so no need to forward declare it.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
    Link: https://lore.kernel.org/r/20210920123328.1399408-12-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit: 9778ac7
  13. block: move elevator.h to block/

    Except for the features passed to blk_queue_required_elevator_features,
    elevator.h is only needed internally to the block layer.  Move the
    ELEVATOR_F_* definitions to blkdev.h, and then move elevator.h to
    block/, dropping all the spurious includes outside of that.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
    Link: https://lore.kernel.org/r/20210920123328.1399408-13-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit: 2e9bc34
  14. block: drop unused includes in <linux/blkdev.h>

    Drop various includes not actually used in blkdev.h itself.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
    Link: https://lore.kernel.org/r/20210920123328.1399408-14-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit: 3ab0bc7
  15. block: drop unused includes in <linux/genhd.h>

    Drop various includes not actually used in genhd.h itself, and
    move the remaining includes closer together.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
    Link: https://lore.kernel.org/r/20210920123328.1399408-15-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit: b81e0c2
  16. block: move a few merge helpers out of <linux/blkdev.h>

    These are block-layer internal helpers, so move them to block/blk.h and
    block/blk-merge.c.  Also update a comment a bit to use better grammar.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
    Link: https://lore.kernel.org/r/20210920123328.1399408-16-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit: badf7f6
  17. block: move integrity handling out of <linux/blkdev.h>

    Split the integrity/metadata handling definitions out into a new header.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
    Link: https://lore.kernel.org/r/20210920123328.1399408-17-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit: fe45e63
  18. block: move struct request to blk-mq.h

    struct request is only used by blk-mq drivers, so move it and all
    related declarations to blk-mq.h.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
    Link: https://lore.kernel.org/r/20210920123328.1399408-18-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit: 24b83de
  19. block/mq-deadline: Improve request accounting further

    The scheduler .insert_requests() callback is called when a request is
    queued for the first time and also when it is requeued. Only count a
    request the first time it is queued. Additionally, since the mq-deadline
    scheduler only performs zone locking for requests that have been
    inserted, skip the zone unlock code for requests that have not been
    inserted into the mq-deadline scheduler.
    
    Fixes: 38ba64d ("block/mq-deadline: Track I/O statistics")
    Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
    Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com>
    Cc: Hannes Reinecke <hare@suse.de>
    Signed-off-by: Bart Van Assche <bvanassche@acm.org>
    Link: https://lore.kernel.org/r/20210927220328.1410161-2-bvanassche@acm.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    bvanassche authored and axboe committed Oct 18, 2021
    Commit: e2c7275
  20. block/mq-deadline: Add an invariant check

    Check a statistics invariant at module unload time. When running
    blktests, the invariant is verified every time a request queue is
    removed and hence is verified at least once per test.
    
    Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
    Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com>
    Cc: Hannes Reinecke <hare@suse.de>
    Signed-off-by: Bart Van Assche <bvanassche@acm.org>
    Link: https://lore.kernel.org/r/20210927220328.1410161-3-bvanassche@acm.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    bvanassche authored and axboe committed Oct 18, 2021
    Commit: 32f64ca
  21. block/mq-deadline: Stop using per-CPU counters

    Calculating the sum over all CPUs of per-CPU counters frequently is
    inefficient. Hence switch from per-CPU to individual counters. Three
    counters are protected by the mq-deadline spinlock since these are
    only accessed from contexts that already hold that spinlock. The fourth
    counter is atomic because protecting it with the mq-deadline spinlock
    would trigger lock contention.
    
    Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
    Reviewed-by: Niklas Cassel <Niklas.Cassel@wdc.com>
    Cc: Hannes Reinecke <hare@suse.de>
    Signed-off-by: Bart Van Assche <bvanassche@acm.org>
    Link: https://lore.kernel.org/r/20210927220328.1410161-4-bvanassche@acm.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    bvanassche authored and axboe committed Oct 18, 2021
    Commit: bce0363
  22. block/mq-deadline: Prioritize high-priority requests

    In addition to reverting commit 7b05bf7 ("Revert "block/mq-deadline:
    Prioritize high-priority requests""), this patch uses 'jiffies' instead
    of ktime_get() in the code for aging lower priority requests.
    
    This patch has been tested as follows:
    
    Measured QD=1/jobs=1 IOPS for nullb with the mq-deadline scheduler.
    Result without and with this patch: 555 K IOPS.
    
    Measured QD=1/jobs=8 IOPS for nullb with the mq-deadline scheduler.
    Result without and with this patch: about 380 K IOPS.
    
    Ran the following script:
    
    set -e
    scriptdir=$(dirname "$0")
    if [ -e /sys/module/scsi_debug ]; then modprobe -r scsi_debug; fi
    modprobe scsi_debug ndelay=1000000 max_queue=16
    sd=''
    while [ -z "$sd" ]; do
      sd=$(basename /sys/bus/pseudo/drivers/scsi_debug/adapter*/host*/target*/*/block/*)
    done
    echo $((100*1000)) > "/sys/block/$sd/queue/iosched/prio_aging_expire"
    if [ -e /sys/fs/cgroup/io.prio.class ]; then
      cd /sys/fs/cgroup
      echo restrict-to-be >io.prio.class
      echo +io > cgroup.subtree_control
    else
      cd /sys/fs/cgroup/blkio/
      echo restrict-to-be >blkio.prio.class
    fi
    echo $$ >cgroup.procs
    mkdir -p hipri
    cd hipri
    if [ -e io.prio.class ]; then
      echo none-to-rt >io.prio.class
    else
      echo none-to-rt >blkio.prio.class
    fi
    { "${scriptdir}/max-iops" -a1 -d32 -j1 -e mq-deadline "/dev/$sd" >& ~/low-pri.txt & }
    echo $$ >cgroup.procs
    "${scriptdir}/max-iops" -a1 -d32 -j1 -e mq-deadline "/dev/$sd" >& ~/hi-pri.txt
    
    Result:
    * 11000 IOPS for the high-priority job
    *    40 IOPS for the low-priority job
    
    If the prio aging expiry time is changed from 100 s to 0, the IOPS results
    change to 6712 and 6796 IOPS.
    
    The max-iops script is a script that runs fio with the following arguments:
    --bs=4K --gtod_reduce=1 --ioengine=libaio --ioscheduler=${arg_e} --runtime=60
    --norandommap --rw=read --thread --buffered=0 --numjobs=${arg_j}
    --iodepth=${arg_d} --iodepth_batch_submit=${arg_a}
    --iodepth_batch_complete=$((arg_d / 2)) --name=${positional_argument_1}
    --filename=${positional_argument_1}
    
    Cc: Damien Le Moal <damien.lemoal@wdc.com>
    Cc: Niklas Cassel <Niklas.Cassel@wdc.com>
    Cc: Hannes Reinecke <hare@suse.de>
    Signed-off-by: Bart Van Assche <bvanassche@acm.org>
    Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
    Link: https://lore.kernel.org/r/20210927220328.1410161-5-bvanassche@acm.org
    [axboe: @latest -> @latest_start]
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    bvanassche authored and axboe committed Oct 18, 2021
    Commit: 322cff7
  23. block: print the current process in handle_bad_sector

    Make the bad sector information a little more useful by printing
    current->comm to identify the caller.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
    Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
    Link: https://lore.kernel.org/r/20210928052755.113016-1-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit: 8a3ee67
  24. blk-mq: Change rqs check in blk_mq_free_rqs()

    The original code in commit 24d2f90 ("blk-mq: split out tag
    initialization, support shared tags") would check tags->rqs is non-NULL and
    then dereference tags->rqs[].
    
    Then in commit 2af8cbe ("blk-mq: split tag ->rqs[] into two"), we
    started to dereference tags->static_rqs[], but continued to check non-NULL
    tags->rqs.
    
    Check tags->static_rqs as non-NULL instead, which is more logical.
    
    Signed-off-by: John Garry <john.garry@huawei.com>
    Reviewed-by: Ming Lei <ming.lei@redhat.com>
    Reviewed-by: Hannes Reinecke <hare@suse.de>
    Link: https://lore.kernel.org/r/1633429419-228500-2-git-send-email-john.garry@huawei.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    John Garry authored and axboe committed Oct 18, 2021
    Commit: 65de57b
  25. block: Rename BLKDEV_MAX_RQ -> BLKDEV_DEFAULT_RQ

    It is a bit confusing that there are both BLKDEV_MAX_RQ and MAX_SCHED_RQ,
    as the name BLKDEV_MAX_RQ would imply it is always the maximum number of
    requests, which it is not.
    
    Rename BLKDEV_MAX_RQ to BLKDEV_DEFAULT_RQ, matching its usage: the
    default number of requests assigned when allocating a request queue.
    
    Signed-off-by: John Garry <john.garry@huawei.com>
    Reviewed-by: Ming Lei <ming.lei@redhat.com>
    Reviewed-by: Hannes Reinecke <hare@suse.de>
    Link: https://lore.kernel.org/r/1633429419-228500-3-git-send-email-john.garry@huawei.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    John Garry authored and axboe committed Oct 18, 2021
    Commit: d2a2796
  26. blk-mq: Relocate shared sbitmap resize in blk_mq_update_nr_requests()

    For shared sbitmap, if the call to blk_mq_tag_update_depth() was
    successful for any hctx when hctx->sched_tags is not set, then it would be
    successful for all (due to nature in which blk_mq_tag_update_depth()
    fails).
    
    As such, there is no need to call blk_mq_tag_resize_shared_sbitmap() for
    each hctx. So relocate the call until after the hctx iteration under the
    !q->elevator check, which is equivalent (to !hctx->sched_tags).
    
    Signed-off-by: John Garry <john.garry@huawei.com>
    Reviewed-by: Ming Lei <ming.lei@redhat.com>
    Reviewed-by: Hannes Reinecke <hare@suse.de>
    Link: https://lore.kernel.org/r/1633429419-228500-4-git-send-email-john.garry@huawei.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    John Garry authored and axboe committed Oct 18, 2021
    Commit: 8fa0446
  27. blk-mq: Invert check in blk_mq_update_nr_requests()

    It's easier to read:
    
    if (x)
    	X;
    else
    	Y;
    
    over:
    
    if (!x)
    	Y;
    else
    	X;
    
    No functional change intended.
    
    Signed-off-by: John Garry <john.garry@huawei.com>
    Reviewed-by: Ming Lei <ming.lei@redhat.com>
    Reviewed-by: Hannes Reinecke <hare@suse.de>
    Link: https://lore.kernel.org/r/1633429419-228500-5-git-send-email-john.garry@huawei.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    John Garry authored and axboe committed Oct 18, 2021
    Commit: f6adcef
  28. blk-mq-sched: Rename blk_mq_sched_alloc_{tags -> map_and_rqs}()

    Function blk_mq_sched_alloc_tags() does the same as
    __blk_mq_alloc_map_and_request(), so give it a similar name for consistency.
    
    Similarly rename label err_free_tags -> err_free_map_and_rqs.
    
    Signed-off-by: John Garry <john.garry@huawei.com>
    Reviewed-by: Ming Lei <ming.lei@redhat.com>
    Reviewed-by: Hannes Reinecke <hare@suse.de>
    Link: https://lore.kernel.org/r/1633429419-228500-6-git-send-email-john.garry@huawei.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    John Garry authored and axboe committed Oct 18, 2021
    Commit: d99a6bb
  29. blk-mq-sched: Rename blk_mq_sched_free_{requests -> rqs}()

    To be more concise and consistent in naming, rename
    blk_mq_sched_free_requests() -> blk_mq_sched_free_rqs().
    
    Signed-off-by: John Garry <john.garry@huawei.com>
    Reviewed-by: Hannes Reinecke <hare@suse.de>
    Reviewed-by: Ming Lei <ming.lei@redhat.com>
    Link: https://lore.kernel.org/r/1633429419-228500-7-git-send-email-john.garry@huawei.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    John Garry authored and axboe committed Oct 18, 2021
    Commit: 1820f4f
  30. blk-mq: Pass driver tags to blk_mq_clear_rq_mapping()

    Function blk_mq_clear_rq_mapping() will be used for shared sbitmap tags
    in future, so pass a driver tags pointer instead of the tagset container
    and HW queue index.
    
    Signed-off-by: John Garry <john.garry@huawei.com>
    Reviewed-by: Hannes Reinecke <hare@suse.de>
    Reviewed-by: Ming Lei <ming.lei@redhat.com>
    Link: https://lore.kernel.org/r/1633429419-228500-8-git-send-email-john.garry@huawei.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    John Garry authored and axboe committed Oct 18, 2021
    Commit: f32e4ea
  31. blk-mq: Don't clear driver tags own mapping

    Function blk_mq_clear_rq_mapping() is required to clear the sched tags
    mappings in driver tags rqs[].
    
    But there is no need for driver tags to clear their own mapping, so
    skip clearing the mapping in this scenario.
    
    Signed-off-by: John Garry <john.garry@huawei.com>
    Reviewed-by: Hannes Reinecke <hare@suse.de>
    Reviewed-by: Ming Lei <ming.lei@redhat.com>
    Link: https://lore.kernel.org/r/1633429419-228500-9-git-send-email-john.garry@huawei.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    John Garry authored and axboe committed Oct 18, 2021
    Commit: 4f245d5
  32. blk-mq: Add blk_mq_tag_update_sched_shared_sbitmap()

    Put the functionality to update the sched shared sbitmap size in a common
    function.
    
    Since the same formula is always used to resize, and it can be
    derived from the request queue, just pass the request queue pointer.
    
    Signed-off-by: John Garry <john.garry@huawei.com>
    Reviewed-by: Ming Lei <ming.lei@redhat.com>
    Reviewed-by: Hannes Reinecke <hare@suse.de>
    Link: https://lore.kernel.org/r/1633429419-228500-10-git-send-email-john.garry@huawei.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    John Garry authored and axboe committed Oct 18, 2021
    Commit: a7e7388
  33. blk-mq: Add blk_mq_alloc_map_and_rqs()

    Add a function to combine allocating tags and the associated requests,
    and factor out common patterns to use this new function.
    
    Some functions only call blk_mq_alloc_map_and_rqs() now, but more
    functionality will be added later.
    
    Also make blk_mq_alloc_rq_map() and blk_mq_alloc_rqs() static since they
    are only used in blk-mq.c, and finally rename some functions for
    conciseness and consistency with other function names:
    - __blk_mq_alloc_map_and_{request -> rqs}()
    - blk_mq_alloc_{map_and_requests -> set_map_and_rqs}()
    
    Suggested-by: Ming Lei <ming.lei@redhat.com>
    Signed-off-by: John Garry <john.garry@huawei.com>
    Reviewed-by: Hannes Reinecke <hare@suse.de>
    Reviewed-by: Ming Lei <ming.lei@redhat.com>
    Link: https://lore.kernel.org/r/1633429419-228500-11-git-send-email-john.garry@huawei.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    John Garry authored and axboe committed Oct 18, 2021
    Commit: 63064be
  34. blk-mq: Refactor and rename blk_mq_free_map_and_{requests->rqs}()

    Refactor blk_mq_free_map_and_requests() such that it can be used at many
    sites at which the tag map and rqs are freed.
    
    Also rename to blk_mq_free_map_and_rqs(), which is shorter and matches the
    alloc equivalent.
    
    Suggested-by: Ming Lei <ming.lei@redhat.com>
    Signed-off-by: John Garry <john.garry@huawei.com>
    Reviewed-by: Hannes Reinecke <hare@suse.de>
    Link: https://lore.kernel.org/r/1633429419-228500-12-git-send-email-john.garry@huawei.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    John Garry authored and axboe committed Oct 18, 2021
    Commit: 645db34
  35. blk-mq: Use shared tags for shared sbitmap support

    Currently we use separate sbitmap pairs and active_queues atomic_t for
    shared sbitmap support.
    
    However, a full set of static requests is allocated per HW queue, which
    is quite wasteful, considering that the total number of requests usable
    at any given time across all HW queues is limited by the shared sbitmap depth.
    
    As such, it is considerably more memory efficient in the case of shared
    sbitmap to allocate a set of static rqs per tag set or request queue, and
    not per HW queue.
    
    So replace the sbitmap pairs and active_queues atomic_t with shared
    tags per tag set and request queue, which will hold a set of shared
    static rqs.
    
    Since there is now no valid HW queue index to be passed to the blk_mq_ops
    .init and .exit_request callbacks, pass an invalid index token. This
    changes the semantics of the APIs, such that the callback would need to
    validate the HW queue index before using it. Currently no user of shared
    sbitmap actually uses the HW queue index (as would be expected).
    
    Signed-off-by: John Garry <john.garry@huawei.com>
    Reviewed-by: Ming Lei <ming.lei@redhat.com>
    Link: https://lore.kernel.org/r/1633429419-228500-13-git-send-email-john.garry@huawei.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    John Garry authored and axboe committed Oct 18, 2021
    Commit: e155b0c
  36. blk-mq: Stop using pointers for blk_mq_tags bitmap tags

    Now that we use shared tags for shared sbitmap support, we don't require
    the tags sbitmap pointers, so drop them.
    
    This essentially reverts commit 222a5ae ("blk-mq: Use pointers for
    blk_mq_tags bitmap tags").
    
    Function blk_mq_init_bitmap_tags() is also removed, since it would only
    be a wrapper for blk_mq_init_bitmaps().
    
    Reviewed-by: Ming Lei <ming.lei@redhat.com>
    Reviewed-by: Hannes Reinecke <hare@suse.de>
    Signed-off-by: John Garry <john.garry@huawei.com>
    Link: https://lore.kernel.org/r/1633429419-228500-14-git-send-email-john.garry@huawei.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    John Garry authored and axboe committed Oct 18, 2021
    Commit: ae0f1a7
  37. blk-mq: Change shared sbitmap naming to shared tags

    Now that shared sbitmap support really means shared tags, rename symbols
    to match that.
    
    Signed-off-by: John Garry <john.garry@huawei.com>
    Link: https://lore.kernel.org/r/1633429419-228500-15-git-send-email-john.garry@huawei.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    John Garry authored and axboe committed Oct 18, 2021
    Commit: 079a2e3
  38. block: move blk-throtl fast path inline

    Even if no policies are defined, we spend ~2% of the total IO time
    checking for them. Move the fast path inline.
    
    Acked-by: Tejun Heo <tj@kernel.org>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Oct 18, 2021
    Commit: a7b36ee
  39. block: inherit request start time from bio for BLK_CGROUP

    Doing high IOPS testing with blk-cgroups enabled spends ~15-20% of the
    time just doing ktime_get_ns() -> readtsc. We essentially read and
    set the start time twice, once for the bio and then again when that bio
    is mapped to a request.
    
    Given that the time between the two is very short, inherit the bio
    start time instead of reading it again. This cuts 1/3rd of the overhead
    of the time keeping.
    
    Acked-by: Tejun Heo <tj@kernel.org>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Oct 18, 2021
    Commit: 0006707
  40. block: bump max plugged deferred size from 16 to 32

    Particularly for NVMe with efficient deferred submission for many
    requests, there are nice benefits to be seen by bumping the default max
    plug count from 16 to 32. This is especially true for virtualized setups,
    where the submit part is more expensive, but it can be noticed even on
    native hardware.
    
    Reduce the multiple queue factor from 4 to 2, since we're changing the
    default size.
    
    While changing it, move the defines into the block layer private header.
    These aren't values that anyone outside of the block layer uses, or
    should use.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Oct 18, 2021
    Commit: ba0ffdd
  41. block: pre-allocate requests if plug is started and is a batch

    The caller typically has a good (or even exact) idea of how many requests
    it needs to submit. We can make the request/tag allocation a lot more
    efficient if we just allocate N requests/tags upfront when we queue the
    first bio from the batch.
    
    Provide a new plug start helper that allows the caller to specify how many
    IOs are expected. This sets plug->nr_ios, and we can use that for smarter
    request allocation. The plug provides a holding spot for requests, and
    request allocation will check it before calling into the normal request
    allocation path.
    
    When blk_finish_plug() is called, check if there are unused requests and
    free them. This should not happen in normal operations. The exception is
    if we get merging, then we may be left with requests that need freeing
    when done.
    
    This raises the per-core performance on my setup from ~5.8M to ~6.1M
    IOPS.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Oct 18, 2021
    Commit 47c122e
  42. blk-mq: cleanup and rename __blk_mq_alloc_request

    The newly added loop for the cached requests in __blk_mq_alloc_request
    is a little too convoluted for my taste, so unwind it a bit.  Also
    rename the function to __blk_mq_alloc_requests now that it can allocate
    more than a single request.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/20211012104045.658051-2-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit b90cfae
  43. blk-mq: cleanup blk_mq_submit_bio

    Move the blk_mq_alloc_data stack allocation only into the branch
    that actually needs it, and use rq->mq_hctx instead of data.hctx
    to refer to the hctx.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/20211012104045.658051-3-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit 0f38d76
  44. block: don't dereference request after flush insertion

    We could have a race here, where the request gets freed before we call
    into blk_mq_run_hw_queue(). If this happens, we cannot rely on the state
    of the request.
    
    Grab the hardware context before inserting the flush.
    
    Fixes: 0f38d76 ("blk-mq: cleanup blk_mq_submit_bio")
    Reviewed-by: Ming Lei <ming.lei@redhat.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Oct 18, 2021
    Commit 4a60f36
  45. block: unexport blkdev_ioctl

    With the raw driver gone, there is no modular user left.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/20211012104450.659013-2-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit fea349b
  46. block: move the *blkdev_ioctl declarations out of blkdev.h

    These are only used inside of block/.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/20211012104450.659013-3-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit 84b8514
  47. block: merge block_ioctl into blkdev_ioctl

    Simplify the ioctl path and match the code structure on the compat side.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/20211012104450.659013-4-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit 8a70951
  48. block: inline hot paths of blk_account_io_*()

    Extract hot paths of __blk_account_io_start() and
    __blk_account_io_done() into inline functions, so we don't always pay
    for function calls.
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/b0662a636bd4cc7b4f84c9d0a41efa46a688ef13.1633781740.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    isilence authored and axboe committed Oct 18, 2021
    Commit be6bfe3
  49. blk-mq: inline hot part of __blk_mq_sched_restart

    Extract a fast check out of __blk_mq_sched_restart() and inline it for
    performance reasons.
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/894abaa0998e5999f2fe18f271e5efdfc2c32bd2.1633781740.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    isilence authored and axboe committed Oct 18, 2021
    Commit e9ea159
  50. block: remove BIO_BUG_ON

    BIO_DEBUG is always defined, so just switch the two instances to use
    BUG_ON directly.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/20211012161804.991559-2-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit 9e8c0d0
  51. block: don't include <linux/ioprio.h> in <linux/bio.h>

    bio.h doesn't need any of the definitions from ioprio.h.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/20211012161804.991559-3-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit 11d9cab
  52. block: move bio_mergeable out of bio.h

    bio_mergeable is only needed by I/O schedulers, so move it to
    blk-mq-sched.h.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/20211012161804.991559-4-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit 8addffd
  53. block: fold bio_cur_bytes into blk_rq_cur_bytes

    Fold bio_cur_bytes into the only caller.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/20211012161804.991559-5-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit b6559d8
  54. block: move bio_full out of bio.h

    bio_full is only used in bio.c, so move it there.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/20211012161804.991559-6-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit 9a6083b
  55. block: mark __bio_try_merge_page static

    Mark __bio_try_merge_page static and move it up a bit to avoid the need
    for a forward declaration.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/20211012161804.991559-7-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit 9774b39
  56. block: move bio_get_{first,last}_bvec out of bio.h

    bio_get_first_bvec and bio_get_last_bvec are only used in blk-merge.c,
    so move them there.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/20211012161804.991559-8-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit ff18d77
  57. block: mark bio_truncate static

    bio_truncate is only used in bio.c, so mark it static.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/20211012161804.991559-9-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit 4f7ab09
  58. blk-mq: optimise *end_request non-stat path

    We already have a blk_mq_need_time_stamp() check in
    __blk_mq_end_request() to get a timestamp, hide all the statistics
    accounting under it. It cuts some cycles for requests that don't need
    stats, and is free otherwise.
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/e0f2ea812e93a8adcd07101212e7d7e70ca304e7.1634115360.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    isilence authored and axboe committed Oct 18, 2021
    Commit 8971a3b
  59. sbitmap: add __sbitmap_queue_get_batch()

    The block layer tag allocation batching still calls into sbitmap to get
    each tag, but we can improve on that. Add __sbitmap_queue_get_batch(),
    which returns a mask of tags all at once, along with an offset for
    those tags.
    
    An example return would be 0xff, where bits 0..7 are set, with
    tag_offset == 128. The valid tags in this case would be 128..135.
    
    A batch is specific to an individual sbitmap_map, hence it cannot be
    larger than that. The requested number of tags is automatically reduced
    to the max that can be satisfied with a single map.
    
    On failure, 0 is returned. The caller should fall back to single tag
    allocation at that point.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Oct 18, 2021
    Commit 9672b0d
  60. block: improve batched tag allocation

    Add a blk_mq_get_tags() helper, which uses the new sbitmap API for
    allocating a batch of tags all at once. This both simplifies the block
    code for batched allocation, and it is also more efficient than just
    doing repeated calls into __sbitmap_queue_get().
    
    This reduces the sbitmap overhead in peak runs from ~3% to ~1% and
    yields a performance increase from 6.6M IOPS to 6.8M IOPS for a single
    CPU core.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Oct 18, 2021
    Commit 349302d
  61. block: remove redundant =y from BLK_CGROUP dependency

    CONFIG_BLK_CGROUP is a boolean option, that is, its value is 'y' or 'n'.
    The comparison to 'y' is redundant.
    
    Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/20210927140000.866249-2-masahiroy@kernel.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    masahir0y authored and axboe committed Oct 18, 2021
    Commit df252bd
  62. block: simplify Kconfig files

    Everything under block/ depends on BLOCK. BLOCK_HOLDER_DEPRECATED is
    selected from drivers/md/Kconfig, which is entirely dependent on BLOCK.
    
    Extend the 'if BLOCK' ... 'endif' so it covers the whole block/Kconfig.
    
    Also, clean up the definition of BLOCK_COMPAT and BLK_MQ_PCI because
    COMPAT and PCI are boolean.
    
    Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/20210927140000.866249-3-masahiroy@kernel.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    masahir0y authored and axboe committed Oct 18, 2021
    Commit c50fca5
  63. block: move menu "Partition type" to block/partitions/Kconfig

    Move the menu to the relevant place.
    
    Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/20210927140000.866249-4-masahiroy@kernel.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    masahir0y authored and axboe committed Oct 18, 2021
    Commit b8b98a6
  64. block: move CONFIG_BLOCK guard to top Makefile

    Every object under block/ depends on CONFIG_BLOCK.
    
    Move the guard to the top Makefile since there is no point to
    descend into block/ if CONFIG_BLOCK=n.
    
    Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/20210927140000.866249-5-masahiroy@kernel.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    masahir0y authored and axboe committed Oct 18, 2021
    Commit 4c92890
  65. block: only check previous entry for plug merge attempt

    Currently we scan the entire plug list, which is potentially very
    expensive. In an IOPS bound workload, we can drive about 5.6M IOPS with
    merging enabled, and profiling shows that the plug merge check is the
    (by far) most expensive thing we're doing:
    
      Overhead  Command   Shared Object     Symbol
      +   20.89%  io_uring  [kernel.vmlinux]  [k] blk_attempt_plug_merge
      +    4.98%  io_uring  [kernel.vmlinux]  [k] io_submit_sqes
      +    4.78%  io_uring  [kernel.vmlinux]  [k] blkdev_direct_IO
      +    4.61%  io_uring  [kernel.vmlinux]  [k] blk_mq_submit_bio
    
    Instead of browsing the whole list, just check the previously inserted
    entry. That is enough for a naive merge check and will catch most cases,
    and for devices that need full merging, the IO scheduler attached to
    such devices will do that anyway. The plug merge is meant to be an
    inexpensive check to avoid getting a request, but if we repeatedly
    scan the list for every single insert, it is very much not a cheap
    check.
    
    With this patch, the workload instead runs at ~7.0M IOPS, providing
    a 25% improvement. Disabling merging entirely yields another 5%
    improvement.
    
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Oct 18, 2021
    Commit d38a9c0
  66. direct-io: remove blk_poll support

    The polling support in the legacy direct-io support is a little crufty.
    It already doesn't support the asynchronous polling needed for io_uring
    polling, and is hard to adopt to upcoming changes in the polling
    interfaces.  Given that all the major file systems already use the iomap
    direct I/O code, just drop the polling support.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
    Link: https://lore.kernel.org/r/20211012111226.760968-2-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit 94c2ed5
  67. block: don't try to poll multi-bio I/Os in __blkdev_direct_IO

    If an iocb is split into multiple bios we can't poll for both.  So don't
    even bother to try to poll in that case.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/20211012111226.760968-3-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit 71fc3f5
  68. iomap: don't try to poll multi-bio I/Os in __iomap_dio_rw

    If an iocb is split into multiple bios we can't poll for both.  So don't
    bother to even try to poll in that case.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
    Link: https://lore.kernel.org/r/20211012111226.760968-4-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit f79d474
  69. io_uring: fix a layering violation in io_iopoll_req_issued

    syscall-level code can't just poke into the details of the poll cookie,
    which is private information of the block layer.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/20211012111226.760968-5-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit 30da1b4
  70. blk-mq: factor out a blk_qc_to_hctx helper

    Add a helper to get the hctx from a request_queue and cookie, and fold
    the blk_qc_t_to_queue_num helper into it as no other callers are left.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
    Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
    Link: https://lore.kernel.org/r/20211012111226.760968-6-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit f70299f
  71. blk-mq: factor out a "classic" poll helper

    Factor the code to do the classic full metal polling out of blk_poll into
    a separate blk_mq_poll_classic helper.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
    Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
    Link: https://lore.kernel.org/r/20211012111226.760968-7-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit c6699d6
  72. blk-mq: remove blk_qc_t_to_tag and blk_qc_t_is_internal

    Merge both functions into their only caller to keep the blk-mq tag to
    blk_qc_t mapping as private as possible in blk-mq.c.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
    Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
    Link: https://lore.kernel.org/r/20211012111226.760968-8-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit efbabbe
  73. blk-mq: remove blk_qc_t_valid

    Move the trivial check into the only caller.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
    Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
    Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
    Link: https://lore.kernel.org/r/20211012111226.760968-9-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit 28a1ae6
  74. block: replace the spin argument to blk_iopoll with a flags argument

    Switch the boolean spin argument to blk_poll to passing a set of flags
    instead.  This will allow to control polling behavior in a more fine
    grained way.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
    Link: https://lore.kernel.org/r/20211012111226.760968-10-hch@lst.de
    [axboe: adapt to changed io_uring iopoll]
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit ef99b2d
  75. io_uring: don't sleep when polling for I/O

    There is no point in sleeping for the expected I/O completion timeout
    in the io_uring async polling model as we never poll for a specific
    I/O.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
    Link: https://lore.kernel.org/r/20211012111226.760968-11-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit d729cf9
  76. block: rename REQ_HIPRI to REQ_POLLED

    Unlike the RWF_HIPRI userspace ABI which is intentionally kept vague,
    the bio flag is specific to the polling implementation, so rename and
    document it properly.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
    Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
    Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
    Link: https://lore.kernel.org/r/20211012111226.760968-12-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit 6ce913f
  77. block: use SLAB_TYPESAFE_BY_RCU for the bio slab

    This flag ensures that the pages will not be reused for non-bio
    allocations before the end of an RCU grace period.  With that we can
    safely use a RCU lookup for bio polling as long as we are fine with
    occasionally polling the wrong device.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
    Link: https://lore.kernel.org/r/20211012111226.760968-13-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit 1a7e76e
  78. block: define 'struct bvec_iter' as packed

    'struct bvec_iter' is embedded into 'struct bio', define it as packed
    so that we can get an extra 4 bytes for other uses without expanding
    bio.
    
    'struct bvec_iter' is often allocated on the stack, so making it packed
    doesn't affect performance. I have also run io_uring on both nvme and
    null_blk and did not observe any performance effect from this change.
    
    Suggested-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
    Reviewed-by: Hannes Reinecke <hare@suse.de>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
    Link: https://lore.kernel.org/r/20211012111226.760968-14-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Ming Lei authored and axboe committed Oct 18, 2021
    Commit 1941612
  79. block: switch polling to be bio based

    Replace the blk_poll interface that requires the caller to keep a queue
    and cookie from the submissions with polling based on the bio.
    
    Polling for the bio itself leads to a few advantages:
    
     - the cookie construction can be made entirely private in blk-mq.c
     - the caller does not need to remember the request_queue and cookie
       separately and thus sidesteps their lifetime issues
     - keeping the device and the cookie inside the bio allows trivial
       support for polling BIOs remapped by stacking drivers
     - a lot of code to propagate the cookie back up the submission path can
       be removed entirely.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
    Link: https://lore.kernel.org/r/20211012111226.760968-15-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit 3e08773
  80. block: don't allow writing to the poll queue attribute

    The poll attribute is a historic artefact from before when we had
    explicit poll queues that require driver specific configuration.
    Just print a warning when writing to the attribute.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
    Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
    Link: https://lore.kernel.org/r/20211012111226.760968-16-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit a614dd2
  81. nvme-multipath: enable polled I/O

    Set the poll queue flag to enable polling, given that the multipath
    node just dispatches the bios to a lower queue.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
    Link: https://lore.kernel.org/r/20211012111226.760968-17-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit c712dcc
  82. block: cache bdev in struct file for raw bdev IO

    bdev = &BDEV_I(file->f_mapping->host)->bdev
    
    Getting struct block_device from a file requires 2 memory dereferences
    as illustrated above, which takes a toll on performance, so cache it in
    the as-yet-unused file->private_data. That gives a noticeable peak performance
    improvement.
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/8415f9fe12e544b9da89593dfbca8de2b52efe03.1634115360.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    isilence authored and axboe committed Oct 18, 2021
    Commit fac7c6d
  83. block: use flags instead of bit fields for blkdev_dio

    This generates a lot better code for me, and bumps performance from
    7650K IOPS to 7750K IOPS. Looking at profiles for the run and running
    perf diff, it confirms that we're now spending a lot less time there:
    
         6.38%     -2.80%  [kernel.vmlinux]  [k] blkdev_direct_IO
    
    This takes it from the 2nd most cycle-consuming function to only the
    9th, at 3.35% of the CPU time.
    
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Oct 18, 2021
    Commit 09ce874
  84. block: handle fast path of bio splitting inline

    The fast path is no splitting needed. Separate the handling into a
    check part we can inline, and an out-of-line handling path if we do
    need to split.
    
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Oct 18, 2021
    Commit abd45c1
  85. block: cache request queue in bdev

    There are tons of places where we need to get a request_queue only
    having bdev, which turns into bdev->bd_disk->queue. There are probably a
    hundred such places considering inline helpers, and enough of them
    are in hot paths.
    
    Cache queue pointer in struct block_device and make use of it in
    bdev_get_queue().
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/a3bfaecdd28956f03629d0ca5c63ebc096e1c809.1634219547.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    isilence authored and axboe committed Oct 18, 2021
    Commit 17220ca
  86. block: use bdev_get_queue() in bdev.c

    Convert bdev->bd_disk->queue to bdev_get_queue(); it uses a cached
    queue pointer and so is faster.
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/a352936ce5d9ac719645b1e29b173d931ebcdc02.1634219547.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    isilence authored and axboe committed Oct 18, 2021
    Commit 025a386
  87. block: use bdev_get_queue() in bio.c

    Convert bdev->bd_disk->queue to bdev_get_queue(); it uses a cached
    queue pointer and so is faster.
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/85c36ea784d285a5075baa10049e6b59e15fb484.1634219547.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    isilence authored and axboe committed Oct 18, 2021
    Commit 3caee46
  88. block: use bdev_get_queue() in blk-core.c

    Convert bdev->bd_disk->queue to bdev_get_queue(); it uses a cached
    queue pointer and so is faster.
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/efc41f880262517c8dc32f932f1b23112f21b255.1634219547.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    isilence authored and axboe committed Oct 18, 2021
    Commit eab4e02
  89. block: convert the rest of block to bdev_get_queue

    Convert bdev->bd_disk->queue to bdev_get_queue(), which uses a cached
    queue pointer and so is faster.
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/addf6ea988c04213697ba3684c853e4ed7642a39.1634219547.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    isilence authored and axboe committed Oct 18, 2021
    Commit ed6cdde
  90. block: don't bother iter advancing a fully done bio

    If we're completing nbytes and nbytes is the size of the bio, don't bother
    with calling into the iterator increment helpers. Just clear the bio
    size and we're done.
    
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Oct 18, 2021
    Commit d4aa57a
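    The short-circuit described above can be sketched as follows; the structures
    and the bio_advance() fallback are simplified stand-ins for illustration, not
    the real kernel definitions:

```c
#include <assert.h>

struct bvec_iter { unsigned int bi_size; /* bytes remaining in the bio */ };
struct bio { struct bvec_iter bi_iter; };

/* Stand-in for the full iterator-advancing slow path. */
static void bio_advance(struct bio *bio, unsigned int nbytes)
{
	bio->bi_iter.bi_size -= nbytes;
	/* the real helper also walks the bio_vec segments here */
}

static void bio_complete_bytes(struct bio *bio, unsigned int nbytes)
{
	/* Fully done: skip the iterator helpers, just zero the size. */
	if (nbytes == bio->bi_iter.bi_size) {
		bio->bi_iter.bi_size = 0;
		return;
	}
	bio_advance(bio, nbytes);
}
```

    Only partial completions pay for the segment walk; the common full-completion
    case becomes a single store.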
  91. block: remove useless caller argument to print_req_error()

    We have exactly one caller of this, just get rid of adding the useless
    function name to the output.
    
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Oct 18, 2021
    Commit c477b79
  92. block: move update request helpers into blk-mq.c

    For some reason we still have them in blk-core, with the rest of the
    request completion being in blk-mq. That causes an out-of-line call
    for each completion.
    
    Move them into blk-mq.c instead, where they belong.
    
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Oct 18, 2021
    Commit 9be3e06
  93. block: improve layout of struct request

    It's been a while since this was analyzed, move some members around to
    better flow with the use case. Initial state up top, and queued state
    after that. This improves my peak case by about 1.5%, from 7750K to
    7900K IOPS.
    
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Oct 18, 2021
    Commit b608762
  94. block: only mark bio as tracked if it really is tracked

    We set BIO_TRACKED unconditionally when rq_qos_throttle() is called, even
    though we may not even have an rq_qos handler. Only mark it as TRACKED if
    it really is potentially tracked.
    
    This saves considerable time for the case where the bio isn't tracked:
    
         2.64%     -1.65%  [kernel.vmlinux]  [k] bio_endio
    
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Oct 18, 2021
    Commit 90b8faa
  95. block: store elevator state in request

    Add an rq private RQF_ELV flag, which tells the block layer that this
    request was initialized on a queue that has an IO scheduler attached.
    This allows for faster checking in the fast path, rather than having to
    dereference rq->q later on.
    
    Elevator switching does full quiesce of the queue before detaching an
    IO scheduler, so it's safe to cache this in the request itself.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Oct 18, 2021
    Commit 2ff0682
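    The idea is to snapshot "does this queue have an elevator" into a request
    flag at allocation time, so hot-path checks test one bit in the request
    instead of chasing rq->q->elevator. A hedged sketch (the flag bit and the
    structure layouts are illustrative, not the kernel's actual values):

```c
#include <assert.h>
#include <stddef.h>

#define RQF_ELV (1u << 0) /* illustrative bit, not the real kernel value */

struct elevator_queue { int dummy; };
struct request_queue { struct elevator_queue *elevator; };
struct request {
	struct request_queue *q;
	unsigned int rq_flags;
};

/* Set once at request initialisation. Elevator switching fully quiesces
 * the queue first, which is what makes caching this state safe. */
static void rq_init(struct request *rq, struct request_queue *q)
{
	rq->q = q;
	rq->rq_flags = q->elevator ? RQF_ELV : 0;
}

/* Fast-path check: one flag test, no rq->q dereference. */
static int rq_has_elevator(const struct request *rq)
{
	return rq->rq_flags & RQF_ELV;
}
```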
  96. block: skip elevator fields init for non-elv queue

    Don't init rq->hash and rq->rb_node in blk_mq_rq_ctx_init() if there is
    no elevator. Also, move some other initialisers that imply barriers to
    the end, so the compiler is free to rearrange and optimise the rest
    of them.
    
    note: fold in a change from Jens leaving queue_list unconditional, as
    it might lead to problems otherwise.
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    isilence authored and axboe committed Oct 18, 2021
    Commit 4f266f2
  97. block: blk_mq_rq_ctx_init cache ctx/q/hctx

    We should have enough registers in blk_mq_rq_ctx_init(); store them
    in local vars so we don't keep reloading them.
    
    note: keeping q->elevator may look unnecessary, but it's also used
    inside inlined blk_mq_tags_from_data().
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    isilence authored and axboe committed Oct 18, 2021
    Commit 605f784
  98. block: cache rq_flags inside blk_mq_rq_ctx_init()

    Add a local variable for rq_flags, it helps to compile out some of
    rq_flags reloads.
    
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    isilence authored and axboe committed Oct 18, 2021
    Commit 1284590
  99. block: remove debugfs blk_mq_ctx dispatched/merged/completed attributes

    These were added as part of early days debugging for blk-mq, and they
    are not really useful anymore. Rather than spend cycles updating them,
    just get rid of them.
    
    As a bonus, this shrinks the per-cpu software queue size from 256b
    to 192b. That's a whole cacheline less.
    
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Oct 18, 2021
    Commit 9a14d6c
  100. block: remove some blk_mq_hw_ctx debugfs entries

    Just like the blk_mq_ctx counterparts, we've got a bunch of counters
    in here that are only for debugfs and are of questionable value. They
    are:
    
    - dispatched, index of how many requests were dispatched in one go
    
    - poll_{considered,invoked,success}, which track poll success rates. We're
      confident in the iopoll implementation at this point, don't bother
      tracking these.
    
    As a bonus, this shrinks each hardware queue from 576 bytes to 512 bytes,
    dropping a whole cacheline.
    
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Oct 18, 2021
    Commit afd7de0
  101. block: provide helpers for rq_list manipulation

    Instead of open-coding the list additions, traversal, and removal,
    provide a basic set of helpers.
    
    Suggested-by: Christoph Hellwig <hch@infradead.org>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Oct 18, 2021
    Commit 013a7f9
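    The helpers replace open-coded singly linked list manipulation of requests
    chained through their next pointer. A simplified sketch of what such helpers
    look like (the names mirror the commit; the field layout is illustrative):

```c
#include <assert.h>
#include <stddef.h>

struct request {
	struct request *rq_next;
	int tag;
};

/* Push a request onto the front of the list. */
static void rq_list_add(struct request **listptr, struct request *rq)
{
	rq->rq_next = *listptr;
	*listptr = rq;
}

/* Pop the first request off the list, or NULL if empty. */
static struct request *rq_list_pop(struct request **listptr)
{
	struct request *rq = *listptr;

	if (rq)
		*listptr = rq->rq_next;
	return rq;
}

/* Iterate without removing. */
#define rq_list_for_each(listptr, pos) \
	for (pos = *(listptr); pos; pos = pos->rq_next)
```

    Since additions go to the front, the list behaves LIFO, which is fine for
    batching completions where ordering doesn't matter.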
  102. block: add a struct io_comp_batch argument to fops->iopoll()

    struct io_comp_batch contains a list head and a completion handler, which
    will allow batches of IO to be completed more efficiently.
    
    For now, no functional changes in this patch, we just define the
    io_comp_batch structure and add the argument to the file_operations iopoll
    handler.
    
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Oct 18, 2021
    Commit 5a72e89
  103. sbitmap: add helper to clear a batch of tags

    sbitmap currently only supports clearing tags one-by-one, add a helper
    that allows the caller to pass in an array of tags to clear.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Oct 18, 2021
    Commit 1aec5e4
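    Conceptually the new helper walks an array of tag numbers and clears each
    bit, letting the caller pay for wakeups and barriers once per batch instead
    of once per tag. A minimal userspace sketch over a flat bitmap (the real
    sbitmap is sharded into words with finer-grained synchronisation):

```c
#include <assert.h>

#define BITS_PER_LONG (8 * sizeof(unsigned long))
#define TAG_WORDS 4

static unsigned long tag_map[TAG_WORDS]; /* toy flat bitmap of in-use tags */

static void set_tag(unsigned int tag)
{
	tag_map[tag / BITS_PER_LONG] |= 1ul << (tag % BITS_PER_LONG);
}

static int test_tag(unsigned int tag)
{
	return !!(tag_map[tag / BITS_PER_LONG] & (1ul << (tag % BITS_PER_LONG)));
}

/* Batch helper: clear a whole array of tags in one call. */
static void sbitmap_clear_batch(const unsigned int *tags, int nr_tags)
{
	for (int i = 0; i < nr_tags; i++)
		tag_map[tags[i] / BITS_PER_LONG] &=
			~(1ul << (tags[i] % BITS_PER_LONG));
	/* the real code can issue one wakeup here for the whole batch */
}
```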
  104. block: add support for blk_mq_end_request_batch()

    Instead of calling blk_mq_end_request() on a single request, add a helper
    that takes the new struct io_comp_batch and completes any request stored
    in there.
    
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Oct 18, 2021
    Commit f794f33
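    Put together, the batch-completion flow looks roughly like this: the poll
    path stashes finished requests on an io_comp_batch and one call at the end
    drains the whole list. A sketch with simplified types (the real structure
    also carries per-request state such as completion timestamps):

```c
#include <assert.h>
#include <stddef.h>

struct request { struct request *rq_next; int tag; int completed; };

struct io_comp_batch {
	struct request *req_list;
	void (*complete)(struct io_comp_batch *);
};

/* Driver side: queue a finished request instead of completing it inline. */
static void add_to_batch(struct io_comp_batch *iob, struct request *rq)
{
	rq->rq_next = iob->req_list;
	iob->req_list = rq;
}

/* Completion handler: drain the list in one pass. */
static void batch_complete(struct io_comp_batch *iob)
{
	struct request *rq;

	while ((rq = iob->req_list)) {
		iob->req_list = rq->rq_next;
		rq->completed = 1; /* the real code ends the request here */
	}
}
```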
  105. nvme: add support for batched completion of polled IO

    Take advantage of struct io_comp_batch, if passed in to the nvme poll
    handler. If it's set, rather than complete each request individually
    inline, store them in the io_comp_batch list. We only do so for requests
    that will complete successfully, anything else will be completed inline as
    before.
    
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Oct 18, 2021
    Commit c234a65
  106. io_uring: utilize the io batching infrastructure for more efficient polled IO

    Wire up using an io_comp_batch for f_op->iopoll(). If the lower stack
    supports it, we can handle high rates of polled IO more efficiently.
    
    This raises the single core efficiency on my system from ~6.1M IOPS to
    ~6.6M IOPS running a random read workload at depth 128 on two gen2
    Optane drives.
    
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Oct 18, 2021
    Commit b688f11
  107. nvme: wire up completion batching for the IRQ path

    Trivial to do now, just need our own io_comp_batch on the stack and pass
    that in to the usual command completion handling.
    
    I pondered making this dependent on how many entries we had to process,
    but even for a single entry there's no discernible difference in
    performance or latency. Running a sync workload over io_uring:
    
    t/io_uring -b512 -d1 -s1 -c1 -p0 -F1 -B1 -n2 /dev/nvme1n1 /dev/nvme2n1
    
    yields the below performance before the patch:
    
    IOPS=254820, BW=124MiB/s, IOS/call=1/1, inflight=(1 1)
    IOPS=251174, BW=122MiB/s, IOS/call=1/1, inflight=(1 1)
    IOPS=250806, BW=122MiB/s, IOS/call=1/1, inflight=(1 1)
    
    and the following after:
    
    IOPS=255972, BW=124MiB/s, IOS/call=1/1, inflight=(1 1)
    IOPS=251920, BW=123MiB/s, IOS/call=1/1, inflight=(1 1)
    IOPS=251794, BW=122MiB/s, IOS/call=1/1, inflight=(1 1)
    
    which definitely isn't slower, about the same if you factor in a bit of
    variance. For peak performance workloads, benchmarking shows a 2%
    improvement.
    
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Oct 18, 2021
    Commit 4f50224
  108. null_blk: poll queue support

    There's currently no way to experiment with polled IO with null_blk,
    which seems like an oversight. This patch adds support for polled IO.
    We keep a list of issued IOs on submit, and then process that list
    when mq_ops->poll() is invoked.
    
    A new parameter is added, poll_queues. It defaults to 1 like the
    submit queues, meaning we'll have 1 poll queue available.
    
    Fixes-by: Bart Van Assche <bvanassche@acm.org>
    Fixes-by: Pavel Begunkov <asml.silence@gmail.com>
    Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
    Link: https://lore.kernel.org/r/baca710d-0f2a-16e2-60bd-b105b854e0ae@kernel.dk
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Oct 18, 2021
    Commit 0a593fb
  109. loop: add error handling support for add_disk()

    We never checked for errors on add_disk() as this function
    returned void. Now that this is fixed, use the shiny new
    error handling.
    
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Hannes Reinecke <hare@suse.de>
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    mcgrof authored and axboe committed Oct 18, 2021
    Commit 905705f
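    The long run of driver conversions that follows all uses the same pattern:
    add_disk() now returns an int, so the probe path checks it and unwinds
    through the existing error labels instead of assuming success. Schematically
    (everything except the add_disk() call itself is an illustrative stand-in):

```c
#include <assert.h>
#include <stddef.h>

struct gendisk { int broken; };

/* Stand-in for the now int-returning add_disk(). */
static int add_disk(struct gendisk *disk)
{
	return disk->broken ? -5 /* illustrative errno */ : 0;
}

static void blk_cleanup_disk(struct gendisk *disk) { (void)disk; }

/* Typical converted probe path. */
static int driver_probe(struct gendisk *disk)
{
	int err;

	err = add_disk(disk);
	if (err)
		goto out_cleanup_disk; /* reuse the existing unwind label */
	return 0;

out_cleanup_disk:
	blk_cleanup_disk(disk);
	return err;
}
```

    The per-driver commits differ only in which existing error label the new
    check jumps to.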
  110. nbd: add error handling support for add_disk()

    We never checked for errors on add_disk() as this function
    returned void. Now that this is fixed, use the shiny new
    error handling.
    
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Hannes Reinecke <hare@suse.de>
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    mcgrof authored and axboe committed Oct 18, 2021
    Commit e1654f4
  111. aoe: add error handling support for add_disk()

    We never checked for errors on add_disk() as this function
    returned void. Now that this is fixed, use the shiny new
    error handling.
    
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    mcgrof authored and axboe committed Oct 18, 2021
    Commit d9c2bd2
  112. drbd: add error handling support for add_disk()

    We never checked for errors on add_disk() as this function
    returned void. Now that this is fixed, use the shiny new
    error handling.
    
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    mcgrof authored and axboe committed Oct 18, 2021
    Commit e92ab4e
  113. n64cart: add error handling support for add_disk()

    We never checked for errors on add_disk() as this function
    returned void. Now that this is fixed, use the shiny new
    error handling.
    
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    mcgrof authored and axboe committed Oct 18, 2021
    Commit d1df602
  114. pcd: move the identify buffer into pcd_identify

    No need to pass it through a bunch of functions.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit 7d8b72a
  115. pcd: cleanup initialization

    Refactor the pcd initialization to have a dedicated helper to initialize
    a single disk.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit af761f2
  116. pf: cleanup initialization

    Refactor the pf initialization to have a dedicated helper to initialize
    a single disk.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit fb367e6
  117. pd: cleanup initialization

    Refactor the pd initialization to have a dedicated helper to initialize
    a single disk.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit 1ad392a
  118. pcd: add error handling support for add_disk()

    We never checked for errors on add_disk() as this function
    returned void. Now that this is fixed, use the shiny new
    error handling.
    
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    mcgrof authored and axboe committed Oct 18, 2021
    Commit 4dfbd13
  119. pcd: fix ordering of unregister_cdrom()

    We first register the cdrom and then call add_disk(), so we should
    likewise unregister the cdrom first and then call del_gendisk().
    
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    mcgrof authored and axboe committed Oct 18, 2021
    Commit 2b6cabc
  120. pcd: capture errors on cdrom_register()

    No errors were being captured when cdrom_register() fails;
    capture the error and return it.
    
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    mcgrof authored and axboe committed Oct 18, 2021
    Commit b6fa069
  121. pd: add error handling support for add_disk()

    We never checked for errors on add_disk() as this function
    returned void. Now that this is fixed, use the shiny new
    error handling.
    
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    mcgrof authored and axboe committed Oct 18, 2021
    Commit 3dfdd5f
  122. mtip32xx: add error handling support for add_disk()

    We never checked for errors on add_disk() as this function
    returned void. Now that this is fixed, use the shiny new
    error handling.
    
    The read_capacity_error error label already does what we need,
    so just re-use that.
    
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    mcgrof authored and axboe committed Oct 18, 2021
    Commit 4a32e1c
  123. pktcdvd: add error handling support for add_disk()

    We never checked for errors on add_disk() as this function
    returned void. Now that this is fixed, use the shiny new
    error handling.
    
    The out_mem2 error label already does what we need so
    re-use that.
    
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    mcgrof authored and axboe committed Oct 18, 2021
    Commit 7b50562
  124. block/rsxx: add error handling support for add_disk()

    We never checked for errors on add_disk() as this function
    returned void. Now that this is fixed, use the shiny new
    error handling.
    
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    mcgrof authored and axboe committed Oct 18, 2021
    Commit 54494d1
  125. block/sx8: add error handling support for add_disk()

    We never checked for errors on add_disk() as this function
    returned void. Now that this is fixed, use the shiny new
    error handling.
    
    A completion is used to notify the initial probe what is
    happening and so we must defer error handling on completion.
    Do this by remembering the error and using the shared cleanup
    function.
    
    The tags are shared and so are already handled later by the driver.
    
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    mcgrof authored and axboe committed Oct 18, 2021
    Commit 637208e
  126. pf: add error handling support for add_disk()

    We never checked for errors on add_disk() as this function
    returned void. Now that this is fixed, use the shiny new
    error handling.
    
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    mcgrof authored and axboe committed Oct 18, 2021
    Commit 4fac63f
  127. cdrom/gdrom: add error handling support for add_disk()

    We never checked for errors on add_disk() as this function
    returned void. Now that this is fixed, use the shiny new
    error handling.
    
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    mcgrof authored and axboe committed Oct 18, 2021
    Commit d6ac27c
  128. rbd: add add_disk() error handling

    We never checked for errors on add_disk() as this function
    returned void. Now that this is fixed, use the shiny new
    error handling.
    
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    mcgrof authored and axboe committed Oct 18, 2021
    Commit 27c97ab
  129. block/swim3: add error handling support for add_disk()

    We never checked for errors on add_disk() as this function
    returned void. Now that this is fixed, use the shiny new
    error handling.
    
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
    Link: https://lore.kernel.org/r/20210927220302.1073499-2-mcgrof@kernel.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    mcgrof authored and axboe committed Oct 18, 2021
    Commit 2d4bcf7
  130. floppy: fix add_disk() assumption on exit due to new developments

    After the patch titled "floppy: use blk_mq_alloc_disk and
    blk_cleanup_disk" the floppy driver was modified to use
    blk_mq_alloc_disk(), which allocates the disk together with the
    queue. This is further clarified later with the patch titled
    "block: remove alloc_disk and alloc_disk_node", which states:
    
       Most drivers should use and have been converted to use
       blk_alloc_disk and blk_mq_alloc_disk.  Only the scsi
       ULPs and dasd still allocate a disk separately from the
       request_queue so don't bother with convenience macros for
       something that should not see significant new users and
       remove these wrappers.
    
    And then we have the patch titled, "block: hold a request_queue
    reference for the lifetime of struct gendisk" which ensures
    that a queue is *always* present for sure during the entire
    lifetime of a disk.
    
    In the floppy driver's case, then, the disk always comes with the
    queue. So even if the queue was cleaned up on exit, putting
    the disk *is* still required; likewise, blk_cleanup_queue() on
    a null queue should not happen now, as disk->queue is valid from
    disk allocation time on.
    
    Automatic backport code scrapers should hopefully not cherry pick
    this patch as a stable fix candidate without full due diligence to
    ensure all the work done on the block layer to make this happen is
    merged first.
    
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
    Link: https://lore.kernel.org/r/20210927220302.1073499-3-mcgrof@kernel.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    mcgrof authored and axboe committed Oct 18, 2021
    Commit 2598a2b
  131. floppy: use blk_cleanup_disk()

    The blk_cleanup_queue() followed by put_disk() combination can be
    replaced with blk_cleanup_disk(). No need for two separate loops.
    
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
    Link: https://lore.kernel.org/r/20210927220302.1073499-4-mcgrof@kernel.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    mcgrof authored and axboe committed Oct 18, 2021
    Commit 3776339
  132. floppy: fix calling platform_device_unregister() on invalid drives

    platform_device_unregister() should only be called when a respective
    platform_device_register() call was made. However the floppy driver
    currently allows failures when registering a drive, and a bail out
    could easily cause an invalid call to platform_device_unregister()
    where it was not intended.
    
    Fix this by adding a bool to keep track of when the platform
    device was registered for a drive.
    
    This does not fix any known panic / bug. This issue was found
    through code inspection while preparing the driver to use the
    up and coming support for device_add_disk() error handling.
    From what I can tell from code inspection, chances of this
    ever happening should be insanely small, perhaps only under OOM.
    
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
    Link: https://lore.kernel.org/r/20210927220302.1073499-5-mcgrof@kernel.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    mcgrof authored and axboe committed Oct 18, 2021
    Commit 662167e
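    The fix follows a common driver pattern: record whether each per-drive
    registration actually succeeded, and on teardown only undo the ones that
    happened. A simplified sketch (the names and types here are illustrative,
    not the floppy driver's own):

```c
#include <assert.h>
#include <stdbool.h>

#define N_DRIVES 4

static bool registered[N_DRIVES]; /* set only on successful registration */

static int platform_device_register_drive(int drive)
{
	if (drive == 2)     /* pretend drive 2 fails to register */
		return -1;
	registered[drive] = true;
	return 0;
}

static void platform_device_unregister_drive(int drive)
{
	registered[drive] = false;
}

/* Teardown only touches drives that really registered. */
static void unregister_all(void)
{
	for (int d = 0; d < N_DRIVES; d++)
		if (registered[d])
			platform_device_unregister_drive(d);
}
```

    Without the bool, an error bail-out would unregister devices that were
    never registered, which is exactly the invalid call the commit closes off.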
  133. floppy: add error handling support for add_disk()

    We never checked for errors on add_disk() as this function
    returned void. Now that this is fixed, use the shiny new
    error handling.
    
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
    Link: https://lore.kernel.org/r/20210927220302.1073499-6-mcgrof@kernel.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    mcgrof authored and axboe committed Oct 18, 2021
    Commit 47d34aa
  134. amiflop: add error handling support for add_disk()

    We never checked for errors on add_disk() as this function
    returned void. Now that this is fixed, use the shiny new
    error handling. The caller for fd_alloc_disk() deals with
    the rest of the cleanup like the tag.
    
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
    Link: https://lore.kernel.org/r/20210927220302.1073499-7-mcgrof@kernel.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    mcgrof authored and axboe committed Oct 18, 2021
    Commit a237942
  135. swim: simplify using blk_cleanup_disk() on swim_remove()

    We can simplify swim_remove() by using one call instead of two,
    just as other drivers do. Use that pattern.
    
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
    Link: https://lore.kernel.org/r/20210927220302.1073499-8-mcgrof@kernel.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    mcgrof authored and axboe committed Oct 18, 2021
    Commit b76a30c
  136. swim: add helper for disk cleanup

    Disk cleanup can be shared between exit and bringup. Use a
    helper to do the work required. The only functional change at
    this point is that we're being overly paranoid on exit to check for
    a null disk as well now, and this should be safe.
    
    We'll later expand on this, this change just makes subsequent
    changes easier to read.
    
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
    Link: https://lore.kernel.org/r/20210927220302.1073499-9-mcgrof@kernel.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    mcgrof authored and axboe committed Oct 18, 2021
    Commit 4e9abe7
  137. swim: add a floppy registration bool which triggers del_gendisk()

    Instead of calling del_gendisk() on exit alone, let's add
    a registration bool to the floppy disk state, this way this can
    be done on the shared caller, swim_cleanup_floppy_disk().
    
    This will be more useful in subsequent patches. Right now, this
    just shuffles functionality out to a helper in a safe way.
    
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
    Link: https://lore.kernel.org/r/20210927220302.1073499-10-mcgrof@kernel.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    mcgrof authored and axboe committed Oct 18, 2021
    Commit 9ef41ef
  138. swim: add error handling support for add_disk()

    We never checked for errors on add_disk() as this function
    returned void. Now that this is fixed, use the shiny new
    error handling.
    
    Since we have a caller to do our unwinding for the disk,
    and this is already dealt with safely we can re-use our
    existing error path goto label which already deals with
    the cleanup.
    
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
    Link: https://lore.kernel.org/r/20210927220302.1073499-11-mcgrof@kernel.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    mcgrof authored and axboe committed Oct 18, 2021
    Commit 625a28a
  139. block/ataflop: use the blk_cleanup_disk() helper

    Use the helper to replace two lines with one.
    
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
    Link: https://lore.kernel.org/r/20210927220302.1073499-12-mcgrof@kernel.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    mcgrof authored and axboe committed Oct 18, 2021
    Commit 44a469b
  140. block/ataflop: add registration bool before calling del_gendisk()

    The ataflop driver assumes del_gendisk() is safe to call; this is
    only true because add_disk() does not return a failure, but that
    will change soon. So, before we get to adding error handling for
    that case, let's make sure we keep track of which disks actually
    get registered, and then use this to call del_gendisk() only for them.
    
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
    Link: https://lore.kernel.org/r/20210927220302.1073499-13-mcgrof@kernel.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    mcgrof authored and axboe committed Oct 18, 2021
    Commit 573effb
  141. block/ataflop: provide a helper for cleanup up an atari disk

    Instead of using two separate code paths for cleaning up an atari disk,
    use one. We take the more careful approach to check for *all* disk
    types, as is done on exit. The init path didn't have that check, as
    the alternative disk types are only probed for later; they are not
    initialized by default.
    
    Yes, there is a shared tag for all disks.
    
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
    Link: https://lore.kernel.org/r/20210927220302.1073499-14-mcgrof@kernel.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    mcgrof authored and axboe committed Oct 18, 2021
    Commit deae113
  142. block/ataflop: add error handling support for add_disk()

    We never checked for errors on add_disk() as this function
    returned void. Now that this is fixed, use the shiny new
    error handling.
    
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
    Link: https://lore.kernel.org/r/20210927220302.1073499-15-mcgrof@kernel.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    mcgrof authored and axboe committed Oct 18, 2021
    Commit 2f15107
  143. xtensa/platforms/iss/simdisk: add error handling support for add_disk()

    We never checked for errors on add_disk() as this function
    returned void. Now that this is fixed, use the shiny new
    error handling.
    
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
    Acked-by: Max Filippov <jcmvbkbc@gmail.com>
    Link: https://lore.kernel.org/r/20210927220110.1066271-7-mcgrof@kernel.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    mcgrof authored and axboe committed Oct 18, 2021
    Commit db8eda9
  144. pcd: fix error codes in pcd_init_unit()

    Return -ENODEV on these error paths instead of returning success.
    
    Fixes: af761f2 ("pcd: cleanup initialization")
    Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
    Link: https://lore.kernel.org/r/20211001122623.GA2283@kili
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Dan Carpenter authored and axboe committed Oct 18, 2021
    Commit d0ac7a3
  145. pf: fix error codes in pf_init_unit()

    Return a negative error code instead of success on these error paths.
    
    Fixes: fb367e6 ("pf: cleanup initialization")
    Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
    Link: https://lore.kernel.org/r/20211001122654.GB2283@kili
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Dan Carpenter authored and axboe committed Oct 18, 2021
    Commit cfc03ea
  146. sx8: fix an error code in carm_init_one()

    Return a negative error code here on this error path instead of
    returning success.
    
    Fixes: 637208e ("block/sx8: add error handling support for add_disk()")
    Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
    Link: https://lore.kernel.org/r/20211001122722.GC2283@kili
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Dan Carpenter authored and axboe committed Oct 18, 2021
    Commit 5deae20
  147. swim3: add missing major.h include

    swim3 got this through blkdev.h previously, but blkdev.h is not including
    it anymore. Include it specifically for the driver, otherwise FLOPPY_MAJOR
    is undefined and breaks the compile on PPC if swim3 is configured.
    
    Fixes: b81e0c2 ("block: drop unused includes in <linux/genhd.h>")
    Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
    Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Oct 18, 2021
    Commit 1f0a258
  148. md: add error handling support for add_disk()

    We never checked for errors on add_disk() as this function
    returned void. Now that this is fixed, use the shiny new
    error handling.
    
    We just do the unwinding of what was not done before, and make
    sure to unlock prior to bailing.
    
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Song Liu <songliubraving@fb.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    mcgrof authored and axboe committed Oct 18, 2021
    Commit 9be68dd
  149. md: add the bitmap group to the default groups for the md kobject

    Replace the deprecated default_attrs with the default_groups mechanism,
    and add the always-visible bitmap group to the groups created at
    kobject_add time.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Song Liu <songliubraving@fb.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit 51238e7
  150. md: extend disks_mutex coverage

    disks_mutex is intended to serialize md_alloc.  Extend it to also cover
    the kobject_uevent call and getting the sysfs dirent, to help reduce
    error handling complexity.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Song Liu <songliubraving@fb.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit 94f3cd7
  151. md: properly unwind when failing to add the kobject in md_alloc

    Add proper error handling to delete the gendisk when failing to add
    the md kobject and clean up the error unwinding in general.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Song Liu <songliubraving@fb.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 18, 2021
    Commit 7ad1069
  152. md/raid1: only allocate write behind bio for WriteMostly device

    Commit 6607cd3 ("raid1: ensure write
    behind bio has less than BIO_MAX_VECS sectors") tried to guarantee the
    size of behind bio is not bigger than BIO_MAX_VECS sectors.
    
    Unfortunately the same calltrace could still happen, since an array
    could enable write-behind without a write-mostly device.
    
    To match the manpage of mdadm (which says "write-behind is only attempted
    on drives marked as write-mostly"), we need to check WriteMostly flag to
    avoid such unexpected behavior.
    
    [1]. https://bugzilla.kernel.org/show_bug.cgi?id=213181#c25
    
    Cc: stable@vger.kernel.org # v5.12+
    Cc: Jens Stutte <jens@chianterastutte.eu>
    Reported-by: Jens Stutte <jens@chianterastutte.eu>
    Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
    Signed-off-by: Song Liu <songliubraving@fb.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Guoqing Jiang authored and axboe committed Oct 18, 2021
    Commit fd3b697
  153. md/raid1: use rdev in raid1_write_request directly

    We already get rdev from conf->mirrors[i].rdev at the beginning of the
    loop, so just use it.
    
    Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
    Signed-off-by: Song Liu <songliubraving@fb.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Guoqing Jiang authored and axboe committed Oct 18, 2021
    Commit 2e94275
  154. md/raid5: call roundup_pow_of_two in raid5_run

    Let's call roundup_pow_of_two here instead of open-coding it.
    
    Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
    Signed-off-by: Song Liu <songliubraving@fb.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Guoqing Jiang authored and axboe committed Oct 18, 2021
    Commit c6efe43
  155. md: remove unused argument from md_new_event

    Actually, mddev is not used by md_new_event.
    
    Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
    Signed-off-by: Song Liu <songliubraving@fb.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Guoqing Jiang authored and axboe committed Oct 18, 2021
    Commit 5467948
  156. md: update superblock after changing rdev flags in state_store

    When the in memory flag is changed, we need to persist the change in the
    rdev superblock flags. This is needed for "writemostly" and "failfast".
    
    Reviewed-by: Li Feng <fengli@smartx.com>
    Signed-off-by: Xiao Ni <xni@redhat.com>
    Signed-off-by: Song Liu <songliubraving@fb.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    XiaoNi87 authored and axboe committed Oct 18, 2021
    Commit 8b9e229
  157. mtip32xx: Remove redundant 'flush_workqueue()' calls

    'destroy_workqueue()' already drains the queue before destroying it, so
    there is no need to flush it explicitly.
    
    Remove the redundant 'flush_workqueue()' calls.
    
    This was generated with coccinelle:
    
    @@
    expression E;
    @@
    - 	flush_workqueue(E);
    	destroy_workqueue(E);
    
    Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/0fea349c808c6cfbf549b0e33701320c7860c8b7.1634234221.git.christophe.jaillet@wanadoo.fr
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    tititiou36 authored and axboe committed Oct 18, 2021
    Commit c573d58
  158. nbd: don't handle response without a corresponding request message

    While handling a response message from the server, nbd_read_stat() will
    try to get the request by tag and then complete the request. However,
    this is problematic if nbd hasn't sent a corresponding request
    message:
    
    t1                      t2
                            submit_bio
                             nbd_queue_rq
                              blk_mq_start_request
    recv_work
     nbd_read_stat
      blk_mq_tag_to_rq
     blk_mq_complete_request
                              nbd_send_cmd
    
    Thus add a new cmd flag, 'NBD_CMD_INFLIGHT'; it is set in
    nbd_send_cmd() and checked in nbd_read_stat().
    
    Note that this patch can't fix that blk_mq_tag_to_rq() might
    return a freed request; this will be fixed in the following
    patches.
    
    Signed-off-by: Yu Kuai <yukuai3@huawei.com>
    Reviewed-by: Ming Lei <ming.lei@redhat.com>
    Reviewed-by: Josef Bacik <josef@toxicpanda.com>
    Link: https://lore.kernel.org/r/20210916093350.1410403-2-yukuai3@huawei.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Yu Kuai authored and axboe committed Oct 18, 2021
    Commit 4e6eef5
  159. nbd: make sure request completion won't concurrent

    commit cddce01 ("nbd: Aovid double completion of a request")
    tried to fix that nbd_clear_que() and recv_work() can complete a
    request concurrently. However, the problem still exists:
    
    t1                    t2                     t3
    
    nbd_disconnect_and_put
     flush_workqueue
                          recv_work
                           blk_mq_complete_request
                            blk_mq_complete_request_remote -> this is true
                             WRITE_ONCE(rq->state, MQ_RQ_COMPLETE)
                              blk_mq_raise_softirq
                                                 blk_done_softirq
                                                  blk_complete_reqs
                                                   nbd_complete_rq
                                                    blk_mq_end_request
                                                     blk_mq_free_request
                                                      WRITE_ONCE(rq->state, MQ_RQ_IDLE)
      nbd_clear_que
       blk_mq_tagset_busy_iter
        nbd_clear_req
                                                       __blk_mq_free_request
                                                        blk_mq_put_tag
         blk_mq_complete_request -> complete again
    
    There are three places where request can be completed in nbd:
    recv_work(), nbd_clear_que() and nbd_xmit_timeout(). Since they
    all hold cmd->lock before completing the request, it's easy to
    avoid the problem by setting and checking a cmd flag.
    
    Signed-off-by: Yu Kuai <yukuai3@huawei.com>
    Reviewed-by: Ming Lei <ming.lei@redhat.com>
    Reviewed-by: Josef Bacik <josef@toxicpanda.com>
    Link: https://lore.kernel.org/r/20210916093350.1410403-3-yukuai3@huawei.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Yu Kuai authored and axboe committed Oct 18, 2021
    Commit 07175cb
  160. nbd: check sock index in nbd_read_stat()

    The sock on which the client sends a request in nbd_send_cmd() and
    receives the reply in nbd_read_stat() should be the same.
    
    Signed-off-by: Yu Kuai <yukuai3@huawei.com>
    Reviewed-by: Ming Lei <ming.lei@redhat.com>
    Reviewed-by: Josef Bacik <josef@toxicpanda.com>
    Link: https://lore.kernel.org/r/20210916093350.1410403-4-yukuai3@huawei.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Yu Kuai authored and axboe committed Oct 18, 2021
    Commit fcf3d63
  161. nbd: don't start request if nbd_queue_rq() failed

    commit 6a468d5 ("nbd: don't start req until after the dead
    connection logic") moved blk_mq_start_request() from nbd_queue_rq()
    to nbd_handle_cmd() to skip starting the request if the connection is
    dead. However, the request is still started in other error paths.
    
    Currently, blk_mq_end_request() will be called immediately if
    nbd_queue_rq() fails, thus starting the request in such situations is
    useless. So remove blk_mq_start_request() from the error paths in
    nbd_handle_cmd().
    
    Signed-off-by: Yu Kuai <yukuai3@huawei.com>
    Reviewed-by: Ming Lei <ming.lei@redhat.com>
    Reviewed-by: Josef Bacik <josef@toxicpanda.com>
    Link: https://lore.kernel.org/r/20210916093350.1410403-5-yukuai3@huawei.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Yu Kuai authored and axboe committed Oct 18, 2021
    Commit 0de2b7a
  162. nbd: clean up return value checking of sock_xmit()

    Checking whether sock_xmit() returns 0 is useless because it never
    returns 0; document that and remove such checks.
    
    Signed-off-by: Yu Kuai <yukuai3@huawei.com>
    Reviewed-by: Ming Lei <ming.lei@redhat.com>
    Reviewed-by: Josef Bacik <josef@toxicpanda.com>
    Link: https://lore.kernel.org/r/20210916093350.1410403-6-yukuai3@huawei.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Yu Kuai authored and axboe committed Oct 18, 2021
    Commit f52c0e0
  163. nbd: partition nbd_read_stat() into nbd_read_reply() and nbd_handle_r…

    …eply()
    
    Prepare to fix a use-after-free in nbd_read_stat(); no functional changes.
    
    Signed-off-by: Yu Kuai <yukuai3@huawei.com>
    Reviewed-by: Ming Lei <ming.lei@redhat.com>
    Reviewed-by: Josef Bacik <josef@toxicpanda.com>
    Link: https://lore.kernel.org/r/20210916093350.1410403-7-yukuai3@huawei.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Yu Kuai authored and axboe committed Oct 18, 2021
    Commit 3fe1db6
  164. nbd: fix uaf in nbd_handle_reply()

    There is a problem that nbd_handle_reply() might access a freed request:
    
    1) At first, a normal io is submitted and completed with scheduler:
    
    internal_tag = blk_mq_get_tag -> get tag from sched_tags
     blk_mq_rq_ctx_init
      sched_tags->rq[internal_tag] = sched_tags->static_rq[internal_tag]
    ...
    blk_mq_get_driver_tag
     __blk_mq_get_driver_tag -> get tag from tags
     tags->rq[tag] = sched_tags->static_rq[internal_tag]
    
    So both tags->rq[tag] and sched_tags->rq[internal_tag] point to
    the same request, sched_tags->static_rq[internal_tag], even after
    the io is finished.
    
    2) The nbd server sends a reply with a random tag directly:
    
    recv_work
     nbd_handle_reply
      blk_mq_tag_to_rq(tags, tag)
       rq = tags->rq[tag]
    
    3) if the sched_tags->static_rq is freed:
    
    blk_mq_sched_free_requests
     blk_mq_free_rqs(q->tag_set, hctx->sched_tags, i)
      -> step 2) access rq before clearing rq mapping
      blk_mq_clear_rq_mapping(set, tags, hctx_idx);
      __free_pages() -> rq is freed here
    
    4) Then, nbd continues to use the freed request in nbd_handle_reply().
    
    Fix the problem by grabbing 'q_usage_counter' before blk_mq_tag_to_rq();
    the request is then guaranteed not to be freed, because requests cannot
    be freed while 'q_usage_counter' is non-zero.
    
    Signed-off-by: Yu Kuai <yukuai3@huawei.com>
    Reviewed-by: Ming Lei <ming.lei@redhat.com>
    Link: https://lore.kernel.org/r/20210916141810.2325276-1-yukuai3@huawei.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Yu Kuai authored and axboe committed Oct 18, 2021
    Commit 8663b21

Commits on Oct 19, 2021

  1. block: ataflop: fix breakage introduced at blk-mq refactoring

    Refactoring of the Atari floppy driver when converting to blk-mq
    has broken the state machine in not-so-subtle ways:
    
    finish_fdc() must be called when operations on the floppy device
    have completed. This is crucial in order to release the ST-DMA
    lock, which protects against concurrent access to the ST-DMA
    controller by other drivers (some DMA related, most just related
    to device register access - broken beyond compare, I know).
    
    When rewriting the driver's old do_request() function, the fact
    that finish_fdc() was called only when all queued requests had
    completed appears to have been overlooked. Instead, the new
    request function calls finish_fdc() immediately after the last
    request has been queued. finish_fdc() executes a dummy seek after
    most requests, and this overwrites the state machine's interrupt
    handler that was set up to wait for completion of the read/write
    request just prior. To make matters worse, finish_fdc() is called
    before device interrupts are re-enabled, making certain that the
    read/write interrupt is missed.
    
    Shifting the finish_fdc() call into the read/write request
    completion handler ensures the driver waits for the request to
    actually complete. With a queue depth of 2, we won't see long
    request sequences, so calling finish_fdc() unconditionally just
    adds a little overhead for the dummy seeks, and keeps the code
    simple.
    
    While we're at it, kill ataflop_commit_rqs() which does nothing
    but run finish_fdc() unconditionally, again likely wiping out an
    in-flight request.
    
    Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>
    Fixes: 6ec3938 ("ataflop: convert to blk-mq")
    CC: linux-block@vger.kernel.org
    CC: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
    Link: https://lore.kernel.org/r/20211019061321.26425-1-schmitzmic@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Michael Schmitz authored and axboe committed Oct 19, 2021
    Commit 86d46fd
  2. nvme: move command clear into the various setup helpers

    We don't have to worry about doing extra memsets by moving it outside
    the protection of RQF_DONTPREP, as nvme doesn't do partial completions.
    
    This is in preparation for making the read/write fast path not do a full
    memset of the command.
    
    Reviewed-by: Keith Busch <kbusch@kernel.org>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Oct 19, 2021
    Configuration menu
    Copy the full SHA
    9c3d292 View commit details
    Browse the repository at this point in the history
  3. nvme: don't memset() the normal read/write command

    This memset in the fast path costs a lot of cycles on my setup. Here's a
    top-of-profile of doing ~6.7M IOPS:
    
    +    5.90%  io_uring  [nvme]            [k] nvme_queue_rq
    +    5.32%  io_uring  [nvme_core]       [k] nvme_setup_cmd
    +    5.17%  io_uring  [kernel.vmlinux]  [k] io_submit_sqes
    +    4.97%  io_uring  [kernel.vmlinux]  [k] blkdev_direct_IO
    
    and a perf diff with this patch:
    
         0.92%     +4.40%  [nvme_core]       [k] nvme_setup_cmd
    
    reducing it from 5.3% to only 0.9%. This takes it from the 2nd most
    cycle consumer to something that's mostly irrelevant.
    
    Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
    Reviewed-by: Keith Busch <kbusch@kernel.org>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    axboe committed Oct 19, 2021
    Configuration menu
    Copy the full SHA
    a9a7e30 View commit details
    Browse the repository at this point in the history

Commits on Oct 20, 2021

  1. nbd: Fix use-after-free in pid_show

    I hit the following issue:
    [  263.886511] BUG: KASAN: use-after-free in pid_show+0x11f/0x13f
    [  263.888359] Read of size 4 at addr ffff8880bf0648c0 by task cat/746
    [  263.890479] CPU: 0 PID: 746 Comm: cat Not tainted 4.19.90-dirty #140
    [  263.893162] Call Trace:
    [  263.893509]  dump_stack+0x108/0x15f
    [  263.893999]  print_address_description+0xa5/0x372
    [  263.894641]  kasan_report.cold+0x236/0x2a8
    [  263.895696]  __asan_report_load4_noabort+0x25/0x30
    [  263.896365]  pid_show+0x11f/0x13f
    [  263.897422]  dev_attr_show+0x48/0x90
    [  263.898361]  sysfs_kf_seq_show+0x24d/0x4b0
    [  263.899479]  kernfs_seq_show+0x14e/0x1b0
    [  263.900029]  seq_read+0x43f/0x1150
    [  263.900499]  kernfs_fop_read+0xc7/0x5a0
    [  263.903764]  vfs_read+0x113/0x350
    [  263.904231]  ksys_read+0x103/0x270
    [  263.905230]  __x64_sys_read+0x77/0xc0
    [  263.906284]  do_syscall_64+0x106/0x360
    [  263.906797]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
    
    Reproduce this issue as follows:
    1. nbd-server 8000 /tmp/disk
    2. nbd-client localhost 8000 /dev/nbd1
    3. cat /sys/block/nbd1/pid
    This triggers the use-after-free in pid_show.
    
    The reason is that after step '2', the nbd-client process has already
    exited, so its task_struct has already been freed.
    To solve this issue, revert part of 6521d39's changes and remove the
    useless 'recv_task' member of nbd_device.
    
    Fixes: 6521d39 ("nbd: Remove variable 'pid'")
    Signed-off-by: Ye Bin <yebin10@huawei.com>
    Reviewed-by: Josef Bacik <josef@toxicpanda.com>
    Link: https://lore.kernel.org/r/20211020073959.2679255-1-yebin10@huawei.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Ye Bin authored and axboe committed Oct 20, 2021
    Configuration menu
    Copy the full SHA
    0c98057 View commit details
    Browse the repository at this point in the history
  2. s390/dasd: handle request magic consistently as unsigned int

    Get rid of the rather odd casts to character pointer of the
    dasd_ccw_req magic member and simply use the unsigned int value
    unmodified everywhere.
    
    Acked-by: Jan Höppner <hoeppner@linux.ibm.com>
    Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
    Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
    Link: https://lore.kernel.org/r/20211020115124.1735254-2-sth@linux.ibm.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    hcahca authored and axboe committed Oct 20, 2021
    Configuration menu
    Copy the full SHA
    169bbda View commit details
    Browse the repository at this point in the history
  3. s390/dasd: fix kernel doc comment

    Fix this:
    
    drivers/s390/block/dasd_ioctl.c:666: warning:
     Function parameter or member 'disk' not described in 'dasd_biodasdinfo'
    drivers/s390/block/dasd_ioctl.c:666: warning:
     Function parameter or member 'info' not described in 'dasd_biodasdinfo'
    
    Acked-by: Jan Höppner <hoeppner@linux.ibm.com>
    Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
    Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
    Link: https://lore.kernel.org/r/20211020115124.1735254-3-sth@linux.ibm.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    hcahca authored and axboe committed Oct 20, 2021
    Configuration menu
    Copy the full SHA
    10c78e5 View commit details
    Browse the repository at this point in the history
  4. s390/dasd: split up dasd_eckd_read_conf

    Move the cabling check out of dasd_eckd_read_conf and split it up into
    separate functions to improve readability and re-use functions.
    
    Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
    Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com>
    Link: https://lore.kernel.org/r/20211020115124.1735254-4-sth@linux.ibm.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Stefan Haberland authored and axboe committed Oct 20, 2021
    Configuration menu
    Copy the full SHA
    2359696 View commit details
    Browse the repository at this point in the history
  5. s390/dasd: move dasd_eckd_read_fc_security

    dasd_eckd_read_conf is called multiple times during device setup but the
    fc_security feature needs to be read only once. So move it into the calling
    function.
    
    Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
    Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com>
    Link: https://lore.kernel.org/r/20211020115124.1735254-5-sth@linux.ibm.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Stefan Haberland authored and axboe committed Oct 20, 2021
    Configuration menu
    Copy the full SHA
    74e2f21 View commit details
    Browse the repository at this point in the history
  6. s390/dasd: summarize dasd configuration data in a separate structure

    Summarize the dasd configuration data in a separate structure so that
    functions that need temporary config data do not need to allocate the
    whole eckd_private structure.
    
    Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
    Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com>
    Link: https://lore.kernel.org/r/20211020115124.1735254-6-sth@linux.ibm.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Stefan Haberland authored and axboe committed Oct 20, 2021
    Configuration menu
    Copy the full SHA
    542e30c View commit details
    Browse the repository at this point in the history
  7. s390/dasd: fix missing path conf_data after failed allocation

    dasd_eckd_path_available_action() does a memory allocation to store
    the per path configuration data permanently.
    In the unlikely case that this allocation fails there is no conf_data
    stored for the corresponding path.
    
    This is OK, since the conf_data is not necessary for an operational
    path, but some features, like control unit initiated reconfiguration
    (CUIR), will not work.
    
    To fix this add the path to the 'to be verified pathmask' again and
    schedule the handler again.
    
    Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
    Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com>
    Link: https://lore.kernel.org/r/20211020115124.1735254-7-sth@linux.ibm.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Stefan Haberland authored and axboe committed Oct 20, 2021
    Configuration menu
    Copy the full SHA
    9dffede View commit details
    Browse the repository at this point in the history
  8. s390/dasd: fix possibly missed path verification

    __dasd_device_check_path_events() calls the discipline path event handler.
    This handler can leave the 'to be verified pathmask' populated for an
    additional verification.
    
    There is a race window where the worker has finished before
    dasd_path_clear_all_verify() is called which resets the tbvpm.
    
    Due to this, outstanding path verifications could be missed.
    
    Fix this by clearing the pathmasks before calling the handler and
    adding them again in case of an error.
    
    Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
    Reviewed-by: Jan Hoeppner <hoeppner@linux.ibm.com>
    Link: https://lore.kernel.org/r/20211020115124.1735254-8-sth@linux.ibm.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Stefan Haberland authored and axboe committed Oct 20, 2021
    Configuration menu
    Copy the full SHA
    a8e5d49 View commit details
    Browse the repository at this point in the history
  9. md: bcache: Fix spelling of 'acquire'

    acqurie -> acquire
    
    Signed-off-by: Ding Senjie <dingsenjie@yulong.com>
    Reviewed-by: Hannes Reinecke <hare@suse.de>
    Signed-off-by: Coly Li <colyli@suse.de>
    Link: https://lore.kernel.org/r/20211020143812.6403-2-colyli@suse.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Ding Senjie authored and axboe committed Oct 20, 2021
    Configuration menu
    Copy the full SHA
    a307e2a View commit details
    Browse the repository at this point in the history
  10. bcache: reserve never used bits from bkey.high

    There are 3 bits in the 'high' member of struct bkey that are never
    used, with no plan to support them in future:
    - HEADER_SIZE, start at bit 58, length 2 bits
    - KEY_PINNED,  start at bit 55, length 1 bit
    
    No kernel code or user space tool references or accesses these three
    bits. Therefore it is possible and feasible to reserve these valuable
    bits from bkey.high. They can be used for other purposes in the future.
    
    Signed-off-by: Coly Li <colyli@suse.de>
    Link: https://lore.kernel.org/r/20211020143812.6403-3-colyli@suse.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Coly Li authored and axboe committed Oct 20, 2021
    Commit 0a2b3e3
  11. bcache: fix error info in register_bcache()

    In register_bcache(), there are several cases where we don't set
    the correct error info (return value and/or error message):
    - if kzalloc() fails, it needs to return -ENOMEM and print
    "cannot allocate memory";
    - if register_cache() fails, it's better to propagate its
    return value rather than using the default -EINVAL.
    
    Signed-off-by: Chao Yu <yuchao0@huawei.com>
    Reviewed-by: Hannes Reinecke <hare@suse.de>
    Signed-off-by: Coly Li <colyli@suse.de>
    Link: https://lore.kernel.org/r/20211020143812.6403-4-colyli@suse.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    chaseyu authored and axboe committed Oct 20, 2021
    Commit d55f7cb
  12. bcache: move calc_cached_dev_sectors to proper place on backing devic…

    …e detach
    
    Calculation of a cache_set's cached sectors is done by traversing the
    cached_devs list, as shown below:
    
    static void calc_cached_dev_sectors(struct cache_set *c)
    {
    ...
            list_for_each_entry(dc, &c->cached_devs, list)
                    sectors += bdev_sectors(dc->bdev);
    
            c->cached_dev_sectors = sectors;
    }
    
    But the cached_dev won't be unlinked from the c->cached_devs list until
    we call the subsequent list_move(&dc->list, &uncached_devices),
    so the previous fix in 'commit 4601014
    ("bcache: recal cached_dev_sectors on detach")' is wrong; now we move
    the calculation to its proper place.
    
    Signed-off-by: Lin Feng <linf@wangsu.com>
    Signed-off-by: Coly Li <colyli@suse.de>
    Link: https://lore.kernel.org/r/20211020143812.6403-5-colyli@suse.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    linfeng2999 authored and axboe committed Oct 20, 2021
    Commit 0259d44
  13. bcache: remove the cache_dev_name field from struct cache

    Just use the %pg format specifier to print the name directly.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Coly Li <colyli@suse.de>
    Link: https://lore.kernel.org/r/20211020143812.6403-6-colyli@suse.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 20, 2021
    Commit 7e84c21
  14. bcache: remove the backing_dev_name field from struct cached_dev

    Just use the %pg format specifier to print the name directly.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Coly Li <colyli@suse.de>
    Link: https://lore.kernel.org/r/20211020143812.6403-7-colyli@suse.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 20, 2021
    Commit 0f5cd78
  15. bcache: use bvec_kmap_local in bch_data_verify

    Using local kmaps slightly reduces the chances of stray writes, and
    the bvec interface cleans up the code a little bit.
    
    Also switch from page_address to bvec_kmap_local for cbv to be on the
    safe side and to avoid pointlessly poking into bvec internals.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Coly Li <colyli@suse.de>
    Link: https://lore.kernel.org/r/20211020143812.6403-8-colyli@suse.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 20, 2021
    Commit 00387bd
  16. bcache: remove bch_crc64_update

    bch_crc64_update is an entirely pointless wrapper around crc64_be.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Coly Li <colyli@suse.de>
    Link: https://lore.kernel.org/r/20211020143812.6403-9-colyli@suse.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 20, 2021
    Commit 39fa7a9
  17. nvme: generate uevent once a multipath namespace is operational again

    When fast_io_fail_tmo is set, I/O will be aborted while recovery is
    still ongoing. This causes MD to set the namespace to failed, and
    no further I/O will be submitted to that namespace.
    
    However, once the recovery succeeds and the namespace becomes
    operational again the NVMe subsystem doesn't send a notification,
    so MD cannot automatically reinstate operation and requires
    manual interaction.
    
    This patch will send a KOBJ_CHANGE uevent per multipathed namespace
    once the underlying controller transitions to LIVE, allowing an automatic
    MD reassembly with these udev rules:
    
    /etc/udev/rules.d/65-md-auto-re-add.rules:
    SUBSYSTEM!="block", GOTO="md_end"
    
    ACTION!="change", GOTO="md_end"
    ENV{ID_FS_TYPE}!="linux_raid_member", GOTO="md_end"
    PROGRAM="/sbin/md_raid_auto_readd.sh $devnode"
    LABEL="md_end"
    
    /sbin/md_raid_auto_readd.sh:
    
    MDADM=/sbin/mdadm
    DEVNAME=$1
    
    export $(${MDADM} --examine --export ${DEVNAME})
    
    if [ -z "${MD_UUID}" ]; then
        exit 1
    fi
    
    UUID_LINK=$(readlink /dev/disk/by-id/md-uuid-${MD_UUID})
    MD_DEVNAME=${UUID_LINK##*/}
    export $(${MDADM} --detail --export /dev/${MD_DEVNAME})
    if [ -z "${MD_METADATA}" ] ; then
        exit 1
    fi
    if [ $(cat /sys/block/${MD_DEVNAME}/md/degraded) != 1 ]; then
        echo "${MD_DEVNAME}: array not degraded, nothing to do"
        exit 0
    fi
    MD_STATE=$(cat /sys/block/${MD_DEVNAME}/md/array_state)
    if [ ${MD_STATE} != "clean" ] ; then
        echo "${MD_DEVNAME}: array state ${MD_STATE}, cannot re-add"
        exit 1
    fi
    MD_VARNAME="MD_DEVICE_dev_${DEVNAME##*/}_ROLE"
    if [ ${!MD_VARNAME} = "spare" ] ; then
        ${MDADM} --manage /dev/${MD_DEVNAME} --re-add ${DEVNAME}
    fi
    
    Changes to v2:
    - Add udev rules example to description
    Changes to v1:
    - use disk_uevent() as suggested by hch
    
    Signed-off-by: Hannes Reinecke <hare@suse.de>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    hreinecke authored and Christoph Hellwig committed Oct 20, 2021
    Commit f6f09c1
  18. nvme-fc: add support for ->map_queues

    NVMe over FC doesn't support ->map_queues, unlike the PCI, RDMA and TCP
    transports.  Add a ->map_queues callout for the LLDDs to provide such
    functionality.
    
    Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
    Signed-off-by: Nilesh Javali <njavali@marvell.com>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Saurav Kashyap authored and Christoph Hellwig committed Oct 20, 2021
    Commit 01d8381
  19. qla2xxx: add ->map_queues support for nvme

    Implement ->map_queues and use the block layer blk_mq_pci_map_queues
    helper for mapping queues to CPUs.
    
    With this mapping, a minimum 10% increase in performance was observed.
    
    Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
    Signed-off-by: Nilesh Javali <njavali@marvell.com>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Saurav Kashyap authored and Christoph Hellwig committed Oct 20, 2021
    Commit 2b2af50
  20. nvmet: fix use-after-free when a port is removed

    When a port is removed through configfs, any connected controllers
    start their teardown flow asynchronously and can still send commands.
    This causes a use-after-free bug for any command that dereferences
    req->port (like in nvmet_parse_io_cmd).
    
    To fix this, wait for all the teardown scheduled works to complete
    (like release_work at rdma/tcp drivers). This ensures there are no
    active controllers when the port is eventually removed.
    
    Signed-off-by: Israel Rukshin <israelr@nvidia.com>
    Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    israelru authored and Christoph Hellwig committed Oct 20, 2021
    Commit e3e19dc
  21. nvmet-rdma: fix use-after-free when a port is removed

    When removing a port, all its controllers are removed, but there
    are queues on the port that don't belong to any controller (during
    connection time). This causes a use-after-free bug for any command
    that dereferences req->port (as in nvmet_alloc_ctrl). Those queues
    should be destroyed before freeing the port via configfs. Destroying
    the remaining queues after the RDMA-CM was destroyed guarantees that
    no new queue will be created.
    
    Signed-off-by: Israel Rukshin <israelr@nvidia.com>
    Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    israelru authored and Christoph Hellwig committed Oct 20, 2021
    Commit fcf73a8
  22. nvmet-tcp: fix use-after-free when a port is removed

    When removing a port, all its controllers are removed, but there
    are queues on the port that don't belong to any controller (during
    connection time). This causes a use-after-free bug for any command
    that dereferences req->port (as in nvmet_alloc_ctrl). Those queues
    should be destroyed before freeing the port via configfs. Destroying
    the remaining queues after accept_work has been cancelled guarantees
    that no new queue will be created.
    
    Signed-off-by: Israel Rukshin <israelr@nvidia.com>
    Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    israelru authored and Christoph Hellwig committed Oct 20, 2021
    Commit 2351ead
  23. nvme-rdma: limit the maximal queue size for RDMA controllers

    The current limit of 1024 isn't valid for some of the RDMA based ctrls.
    In case the target exposes a cap with a larger number of entries
    (e.g. 1024), the initiator may fail to create a QP with this size. Thus
    limit it to a value that works for all RDMA adapters.
    
    Future general solution should use RDMA/core API to calculate this size
    according to device capabilities and number of WRs needed per NVMe IO
    request.
    
    Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    mgurtovoy authored and Christoph Hellwig committed Oct 20, 2021
    Commit 44c3c62
  24. nvmet: add get_max_queue_size op for controllers

    Some transports, such as RDMA, would like to set the queue size
    according to device/port/ctrl characteristics. Add a new nvmet transport
    op that is called during ctrl initialization. This will not affect
    transports that don't implement this option.
    
    Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
    Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    mgurtovoy authored and Christoph Hellwig committed Oct 20, 2021
    Commit 6d1555c
  25. nvmet-rdma: implement get_max_queue_size controller op

    Limit the maximal queue size for RDMA controllers. Today, the target
    reports a limit of 1024 and this limit isn't valid for some of the RDMA
    based controllers. For now, limit RDMA transport to 128 entries (the
    max queue depth configured for Linux NVMe/RDMA host).
    
    Future general solution should use RDMA/core API to calculate this size
    according to device capabilities and number of WRs needed per NVMe IO
    request.
    
    Reported-by: Mark Ruijter <mruijter@primelogic.nl>
    Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
    Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    mgurtovoy authored and Christoph Hellwig committed Oct 20, 2021
    Commit c7d792f
  26. nvmet: make discovery NQN configurable

    TPAR8013 allows for unique discovery NQNs, so make the discovery
    controller NQN configurable by exposing a subsys attribute
    'discovery_nqn'.
    
    Signed-off-by: Hannes Reinecke <hare@suse.de>
    Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
    Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    hreinecke authored and Christoph Hellwig committed Oct 20, 2021
    Commit 626851e
  27. nvme: add CNTRLTYPE definitions for 'identify controller'

    Update the 'identify controller' structure to define the newly added
    CNTRLTYPE field.
    
    Signed-off-by: Hannes Reinecke <hare@suse.de>
    Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
    Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    hreinecke authored and Christoph Hellwig committed Oct 20, 2021
    Commit e15a8a9
  28. nvmet: add nvmet_is_disc_subsys() helper

    Add a helper function to determine if a given subsystem is a discovery
    subsystem.
    
    Signed-off-by: Hannes Reinecke <hare@suse.de>
    Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
    Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    hreinecke authored and Christoph Hellwig committed Oct 20, 2021
    Commit a294711
  29. nvmet: set 'CNTRLTYPE' in the identify controller data

    Set the correct 'CNTRLTYPE' field in the identify controller data.
    
    Signed-off-by: Hannes Reinecke <hare@suse.de>
    Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    hreinecke authored and Christoph Hellwig committed Oct 20, 2021
    Commit d3aef70
  30. nvme: expose subsystem type in sysfs attribute 'subsystype'

    With unique discovery controller NQNs we cannot distinguish the
    subsystem type by the NQN alone, but need to check the subsystem
    type, too.
    So expose the subsystem type in a new sysfs attribute 'subsystype'.
    
    Signed-off-by: Hannes Reinecke <hare@suse.de>
    Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    hreinecke authored and Christoph Hellwig committed Oct 20, 2021
    Commit 954ae16
  31. nvme: Add connect option 'discovery'

    Add a connect option 'discovery' to specify that the connection
    should be made to a discovery controller, not a normal I/O controller.
    With discovery controllers supporting unique subsystem NQNs we
    cannot easily distinguish by the subsystem NQN if this should be
    a discovery connection, but we need this information to blank out
    options not supported by discovery controllers.
    
    Signed-off-by: Hannes Reinecke <hare@suse.de>
    Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    hreinecke authored and Christoph Hellwig committed Oct 20, 2021
    Commit 20e8b68
  32. nvme: display correct subsystem NQN

    With discovery controllers supporting unique subsystem NQNs the
    actual subsystem NQN might be different from that one passed in
    via the connect args. So add a helper to display the resulting
    subsystem NQN.
    
    Signed-off-by: Hannes Reinecke <hare@suse.de>
    Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    hreinecke authored and Christoph Hellwig committed Oct 20, 2021
    Commit e5ea42f
  33. nvmet: use macro definition for setting nmic value

    This makes the code more readable.
    
    Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
    Reviewed-by: Keith Busch <kbusch@kernel.org>
    Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
    Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    mgurtovoy authored and Christoph Hellwig committed Oct 20, 2021
    Commit 571b544
  34. nvmet: use macro definitions for setting cmic value

    This makes the code more readable.
    
    Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
    Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    mgurtovoy authored and Christoph Hellwig committed Oct 20, 2021
    Commit d56ae18
  35. nvme-multipath: add error handling support for add_disk()

    We never checked for errors on add_disk() as this function
    returned void. Now that this is fixed, use the shiny new
    error handling.
    
    Since we can now tell for sure when a disk was added, set the
    NVME_NSHEAD_DISK_LIVE bit only when we did add the disk
    successfully.
    
    Nothing to do here as the cleanup is done elsewhere. We take
    care and use test_and_set_bit() because it protects against
    two nvme paths simultaneously calling device_add_disk() on the
    same namespace head.
    
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
    Reviewed-by: Keith Busch <kbusch@kernel.org>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    mcgrof authored and Christoph Hellwig committed Oct 20, 2021
    Commit 1138458
  36. nvme-rdma: fix error code in nvme_rdma_setup_ctrl

    In case that icdoff is not zero or mandatory keyed sgls are not
    supported by the NVMe/RDMA target, we'll go to error flow but we'll
    return 0 to the caller. Fix it by returning an appropriate error code.
    
    Fixes: c66e299 ("nvme-rdma: centralize controller setup sequence")
    Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
    Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    mgurtovoy authored and Christoph Hellwig committed Oct 20, 2021
    Commit 0974812
  37. nvme-pci: clear shadow doorbell memory on resets

    The host memory doorbell and event buffers need to be initialized on
    each reset so the driver doesn't observe stale values from the previous
    instantiation.
    
    Signed-off-by: Keith Busch <kbusch@kernel.org>
    Tested-by: John Levon <john.levon@nutanix.com>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    keithbusch authored and Christoph Hellwig committed Oct 20, 2021
    Commit 58847f1
  38. nvme: drop scan_lock and always kick requeue list when removing names…

    …paces
    
    When reading the partition table on initial scan hits an I/O error the
    I/O will hang with the scan_mutex held:
    
    [<0>] do_read_cache_page+0x49b/0x790
    [<0>] read_part_sector+0x39/0xe0
    [<0>] read_lba+0xf9/0x1d0
    [<0>] efi_partition+0xf1/0x7f0
    [<0>] bdev_disk_changed+0x1ee/0x550
    [<0>] blkdev_get_whole+0x81/0x90
    [<0>] blkdev_get_by_dev+0x128/0x2e0
    [<0>] device_add_disk+0x377/0x3c0
    [<0>] nvme_mpath_set_live+0x130/0x1b0 [nvme_core]
    [<0>] nvme_mpath_add_disk+0x150/0x160 [nvme_core]
    [<0>] nvme_alloc_ns+0x417/0x950 [nvme_core]
    [<0>] nvme_validate_or_alloc_ns+0xe9/0x1e0 [nvme_core]
    [<0>] nvme_scan_work+0x168/0x310 [nvme_core]
    [<0>] process_one_work+0x231/0x420
    
    and trying to delete the controller will deadlock as it tries to grab
    the scan mutex:
    
    [<0>] nvme_mpath_clear_ctrl_paths+0x25/0x80 [nvme_core]
    [<0>] nvme_remove_namespaces+0x31/0xf0 [nvme_core]
    [<0>] nvme_do_delete_ctrl+0x4b/0x80 [nvme_core]
    
    As we're now properly ordering the namespace list there is no need to
    hold the scan_mutex in nvme_mpath_clear_ctrl_paths() anymore.
    And we always need to kick the requeue list as the path will be marked
    as unusable and I/O will be requeued _without_ a current path.
    
    Signed-off-by: Hannes Reinecke <hare@suse.de>
    Reviewed-by: Keith Busch <kbusch@kernel.org>
    Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    hreinecke authored and Christoph Hellwig committed Oct 20, 2021
    Commit 2b81a5f
  39. nvmet: use struct_size over open coded arithmetic

    As noted in the "Deprecated Interfaces, Language Features, Attributes,
    and Conventions" documentation [1], size calculations (especially
    multiplication) should not be performed in memory allocator (or similar)
    function arguments due to the risk of them overflowing. This could lead
    to values wrapping around and a smaller allocation being made than the
    caller was expecting. Using those allocations could lead to linear
    overflows of heap memory and other misbehaviors.
    
    In this case this is not actually a dynamic size: all the operands
    involved in the calculation are constant values. However it is better
    to refactor this anyway, just to keep the open-coded math idiom out of
    the code.
    
    So, use the struct_size() helper to do the arithmetic instead of the
    argument "size + count * size" in the kmalloc() function.
    
    This code was detected with the help of Coccinelle and audited and fixed
    manually.
    
    [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments
    
    Signed-off-by: Len Baker <len.baker@gmx.com>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Len Baker authored and Christoph Hellwig committed Oct 20, 2021
    Commit 117d5b6

Commits on Oct 21, 2021

  1. Merge tag 'nvme-5.16-2021-10-21' of git://git.infradead.org/nvme into…

    … for-5.16/drivers
    
    Pull NVMe updates from Christoph:
    
    "nvme updates for Linux 5.16
    
     - fix a multipath partition scanning deadlock (Hannes Reinecke)
     - generate uevent once a multipath namespace is operational again
       (Hannes Reinecke)
     - support unique discovery controller NQNs (Hannes Reinecke)
     - fix use-after-free when a port is removed (Israel Rukshin)
     - clear shadow doorbell memory on resets (Keith Busch)
     - use struct_size (Len Baker)
     - add error handling support for add_disk (Luis Chamberlain)
     - limit the maximal queue size for RDMA controllers (Max Gurtovoy)
     - use a few more symbolic names (Max Gurtovoy)
     - fix error code in nvme_rdma_setup_ctrl (Max Gurtovoy)
     - add support for ->map_queues on FC (Saurav Kashyap)"
    
    * tag 'nvme-5.16-2021-10-21' of git://git.infradead.org/nvme: (23 commits)
      nvmet: use struct_size over open coded arithmetic
      nvme: drop scan_lock and always kick requeue list when removing namespaces
      nvme-pci: clear shadow doorbell memory on resets
      nvme-rdma: fix error code in nvme_rdma_setup_ctrl
      nvme-multipath: add error handling support for add_disk()
      nvmet: use macro definitions for setting cmic value
      nvmet: use macro definition for setting nmic value
      nvme: display correct subsystem NQN
      nvme: Add connect option 'discovery'
      nvme: expose subsystem type in sysfs attribute 'subsystype'
      nvmet: set 'CNTRLTYPE' in the identify controller data
      nvmet: add nvmet_is_disc_subsys() helper
      nvme: add CNTRLTYPE definitions for 'identify controller'
      nvmet: make discovery NQN configurable
      nvmet-rdma: implement get_max_queue_size controller op
      nvmet: add get_max_queue_size op for controllers
      nvme-rdma: limit the maximal queue size for RDMA controllers
      nvmet-tcp: fix use-after-free when a port is removed
      nvmet-rdma: fix use-after-free when a port is removed
      nvmet: fix use-after-free when a port is removed
      ...
    axboe committed Oct 21, 2021
    Commit cbab6ae
  2. block: aoe: fixup coccinelle warnings

    coccicheck complains about the use of snprintf() in sysfs show
    functions:
    WARNING  use scnprintf or sprintf
    
    Using sysfs_emit() instead of scnprintf() or sprintf() makes more sense.
    
    Reported-by: Zeal Robot <zealci@zte.com.cn>
    Signed-off-by: Ye Guojin <ye.guojin@zte.com.cn>
    Link: https://lore.kernel.org/r/20211021064931.1047687-1-ye.guojin@zte.com.cn
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Ye Guojin authored and axboe committed Oct 21, 2021
    Commit ff06ed7
  3. dm: add add_disk() error handling

    We never checked for errors on add_disk() as this function
    returned void. Now that this is fixed, use the shiny new
    error handling.
    
    There are two calls to dm_setup_md_queue() which can then fail:
    one in dm_early_create(), where we can easily see that the error
    path calls dm_destroy(), and the other in the ioctl table_load
    case. If that fails, userspace needs to call DM_DEV_REMOVE_CMD to
    clean up the state - similar to any other failure.
    
    Reviewed-by: Hannes Reinecke <hare@suse.de>
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
    Link: https://lore.kernel.org/r/20211015233028.2167651-4-mcgrof@kernel.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    mcgrof authored and axboe committed Oct 21, 2021
    Commit e7089f6
  4. bcache: add error handling support for add_disk()

    We never checked for errors on add_disk() as this function
    returned void. Now that this is fixed, use the shiny new
    error handling.
    
    This driver doesn't do any unwinding with blk_cleanup_disk()
    even on errors after add_disk() and so we follow that
    tradition.
    
    Acked-by: Coly Li <colyli@suse.de>
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
    Link: https://lore.kernel.org/r/20211015233028.2167651-5-mcgrof@kernel.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    mcgrof authored and axboe committed Oct 21, 2021
    Commit 2961c3b
  5. xen-blkfront: add error handling support for add_disk()

    We never checked for errors on device_add_disk() as this function
    returned void. Now that this is fixed, use the shiny new error
    handling. The function xlvbd_alloc_gendisk() typically does the
    unwinding on error when allocating the disk and creating the tag,
    but since all that error handling was stuffed inside
    xlvbd_alloc_gendisk() we must repeat the tag freeing as well.
    
    We set the info->rq to NULL to ensure blkif_free() doesn't crash
    on blk_mq_stop_hw_queues() on device_add_disk() error as the queue
    will be long gone by then.
    
    Reviewed-by: Juergen Gross <jgross@suse.com>
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
    Link: https://lore.kernel.org/r/20211015233028.2167651-6-mcgrof@kernel.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    mcgrof authored and axboe committed Oct 21, 2021
    Commit 293a7c5
  6. m68k/emu/nfblock: add error handling support for add_disk()

    We never checked for errors on add_disk() as this function
    returned void. Now that this is fixed, use the shiny new
    error handling.
    
    Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
    Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
    Link: https://lore.kernel.org/r/20211015233028.2167651-7-mcgrof@kernel.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    mcgrof authored and axboe committed Oct 21, 2021
    Commit 21fd880
  7. um/drivers/ubd_kern: add error handling support for add_disk()

    We never checked for errors on add_disk() as this function
    returned void. Now that this is fixed, use the shiny new
    error handling.
    
    ubd_disk_register() never returned an error, so just fix
    that now and let the caller handle the error condition.
    
    Reviewed-by: Gabriel Krisman Bertazi <krisman@collabora.com>
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
    Link: https://lore.kernel.org/r/20211015233028.2167651-8-mcgrof@kernel.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    mcgrof authored and axboe committed Oct 21, 2021
    Commit 66638f1
  8. rnbd: add error handling support for add_disk()

    We never checked for errors on add_disk() as this function
    returned void. Now that this is fixed, use the shiny new
    error handling.
    
    Acked-by: Jack Wang <jinpu.wang@ionos.com>
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
    Link: https://lore.kernel.org/r/20211015233028.2167651-9-mcgrof@kernel.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    mcgrof authored and axboe committed Oct 21, 2021
    Commit 2e9e31b
  9. mtd: add add_disk() error handling

    We never checked for errors on add_disk() as this function
    returned void. Now that this is fixed, use the shiny new
    error handling.
    
    Acked-by: Miquel Raynal <miquel.raynal@bootlin.com>
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
    Link: https://lore.kernel.org/r/20211015233028.2167651-10-mcgrof@kernel.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    mcgrof authored and axboe committed Oct 21, 2021
    83b863f

Commits on Oct 22, 2021

  1. block: remove support for cryptoloop and the xor transfer

    Support for cryptoloop has been officially marked broken and deprecated
    in favor of dm-crypt (which supports the same broken algorithms if
    needed) in Linux 2.6.4 (released in March 2004), and support for it has
    been entirely removed from losetup in util-linux 2.23 (released in April
    2013).  The XOR transfer has never been more than a toy to demonstrate
    the transfer in the bad old times of crypto export restrictions.
    Remove them as they have some nasty interactions with loop device
    lifetimes due to the iteration over all loop devices in
    loop_unregister_transfer.
    
    Suggested-by: Milan Broz <gmazyland@gmail.com>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/20211019075639.2333969-1-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Christoph Hellwig authored and axboe committed Oct 22, 2021
    47e9624

Commits on Oct 25, 2021

  1. block: ataflop: more blk-mq refactoring fixes

    As it turns out, my earlier patch in commit 86d46fd (block:
    ataflop: fix breakage introduced at blk-mq refactoring) was
    incomplete. This patch fixes any remaining issues found during
    more testing and code review.
    
    Requests exceeding 4k are handled in 4k segments, but
    __blk_mq_end_request() is never called on these (sectors are still
    outstanding on the request). With redo_fd_request() removed, there is
    no provision to kick off processing of the next segment, causing
    requests exceeding 4k to hang. (As a workaround, this behaviour can
    be avoided by setting /sys/block/fd0/queue/max_sectors_kb <= 4.)
    
    Instead of reintroducing redo_fd_request(), requeue the remainder
    of the request by calling blk_mq_requeue_request() on incomplete
    requests (i.e. when blk_update_request() still returns true), and
    rely on the block layer to queue the residual as a new request.
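
    The requeue-on-residual logic can be illustrated with a small
    userspace sketch; the blk-mq calls are stubbed out and all names are
    hypothetical, this is not the ataflop code itself:

```c
#include <assert.h>
#include <stdbool.h>

/* Stub for blk_update_request(): consumes 'done' sectors and returns
 * true while the request still has residual sectors outstanding. */
static bool blk_update_request_stub(int *sectors_left, int done)
{
	*sectors_left -= done;
	return *sectors_left > 0;
}

static int requeues;
static int completions;
static void blk_mq_requeue_request_stub(void) { requeues++; }
static void blk_mq_end_request_stub(void) { completions++; }

/* Pattern from the commit: after finishing one 4k segment, either
 * requeue the incomplete request (block layer re-issues the residual
 * as a new request) or end it, instead of looping inside the driver. */
static void finish_segment(int *sectors_left, int done)
{
	if (blk_update_request_stub(sectors_left, done))
		blk_mq_requeue_request_stub();
	else
		blk_mq_end_request_stub();
}
```

    A 16-sector request processed in two 8-sector segments is requeued
    once and completed once, with no driver-internal retry loop.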
    
    Both error handling and formatting need to release the
    ST-DMA lock, so call finish_fdc() on these (this was previously
    handled by redo_fd_request()). finish_fdc() may be called
    legitimately without the ST-DMA lock held - make sure we only
    release the lock if we actually held it. In a similar way,
    early exit due to errors in ataflop_queue_rq() must release
    the lock.
    
    After minor errors, fd_error sets up to recalibrate the drive
    but never re-runs the current operation (another task handled by
    redo_fd_request() before). Call do_fd_action() to get the next
    steps (seek, retry read/write) underway.
    
    Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>
    Fixes: 6ec3938 (ataflop: convert to blk-mq)
    CC: linux-block@vger.kernel.org
    Link: https://lore.kernel.org/r/20211024002013.9332-1-schmitzmic@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Michael Schmitz authored and axboe committed Oct 25, 2021
    d28e4df

Commits on Oct 27, 2021

  1. nvme: add new discovery log page entry definitions

    TP8014 adds a new SUBTYPE value and a new field EFLAGS for the
    discovery log page entry.
    
    Signed-off-by: Hannes Reinecke <hare@suse.de>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    hreinecke authored and Christoph Hellwig committed Oct 27, 2021
    785d584
  2. nvmet: switch check for subsystem type

    Invert the check for discovery subsystem type to allow for additional
    discovery subsystem types.
    
    Signed-off-by: Hannes Reinecke <hare@suse.de>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    hreinecke authored and Christoph Hellwig committed Oct 27, 2021
    598e759
  3. nvmet: register discovery subsystem as 'current'

    Register the discovery subsystem as the 'current' discovery subsystem,
    and add a new discovery log page entry for it.
    
    Signed-off-by: Hannes Reinecke <hare@suse.de>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    hreinecke authored and Christoph Hellwig committed Oct 27, 2021
    2953b30
  4. nvmet: use flex_array_size and struct_size

    In an effort to avoid open-coded arithmetic in the kernel [1], use the
    flex_array_size() and struct_size() helpers instead of an open-coded
    calculation.
    
    [1] KSPP/linux#160
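
    As a rough userspace illustration of what the helpers compute
    (the real kernel versions in <linux/overflow.h> additionally
    saturate on arithmetic overflow, and the struct below is a made-up
    example, not the actual nvmet log page layout):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel helpers; the real ones also
 * guard against integer overflow by saturating to SIZE_MAX. */
#define struct_size_sketch(p, member, count) \
	(sizeof(*(p)) + sizeof((p)->member[0]) * (count))
#define flex_array_size_sketch(p, member, count) \
	(sizeof((p)->member[0]) * (count))

/* Hypothetical header followed by a flexible array member. */
struct log_page {
	uint64_t genctr;
	uint64_t numrec;
	unsigned char entries[];
};
```

    struct_size() covers the header plus 'count' trailing elements;
    flex_array_size() covers only the trailing array, replacing
    open-coded `sizeof(x) + n * sizeof(elem)` arithmetic.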
    
    Signed-off-by: Len Baker <len.baker@gmx.com>
    Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
    Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
    Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Len Baker authored and Christoph Hellwig committed Oct 27, 2021
    d156cfc

Commits on Oct 28, 2021

  1. Merge tag 'nvme-5.16-2021-10-28' of git://git.infradead.org/nvme into…

    … for-5.16/drivers
    
    Pull NVMe updates from Christoph:
    
    "nvme updates for Linux 5.16
    
     - support the current discovery subsystem entry (Hannes Reinecke)
     - use flex_array_size and struct_size (Len Baker)"
    
    * tag 'nvme-5.16-2021-10-28' of git://git.infradead.org/nvme:
      nvmet: use flex_array_size and struct_size
      nvmet: register discovery subsystem as 'current'
      nvmet: switch check for subsystem type
      nvme: add new discovery log page entry definitions
    axboe committed Oct 28, 2021
    ca77879

Commits on Oct 29, 2021

  1. bcache: move uapi header bcache.h to bcache code directory

    The header file include/uapi/linux/bcache.h is not really a user space
    API header. This file defines the on-disk format of bcache's internal
    metadata, but no one includes it from user space; bcache-tools has its
    own copy of this header with minor modifications.
    
    Therefore, this patch moves include/uapi/linux/bcache.h to bcache code
    directory as drivers/md/bcache/bcache_ondisk.h.
    
    Suggested-by: Arnd Bergmann <arnd@kernel.org>
    Suggested-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Coly Li <colyli@suse.de>
    Link: https://lore.kernel.org/r/20211029060930.119923-2-colyli@suse.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Coly Li authored and axboe committed Oct 29, 2021
    cf2197c
  2. bcache: replace snprintf in show functions with sysfs_emit

    coccicheck complains about the use of snprintf() in sysfs show functions.
    
    Fix the following coccicheck warning:
    drivers/md/bcache/sysfs.h:54:12-20: WARNING: use scnprintf or sprintf.
    
    Implement sysfs_print() with sysfs_emit() and remove snprint() since
    no one uses it any more.
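
    A userspace approximation of the idea behind sysfs_emit(): format
    into a PAGE_SIZE buffer and return the bytes actually written, never
    the would-be length snprintf() reports on truncation. The names are
    hypothetical and the real kernel helper also warns if the buffer is
    not page-aligned:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PAGE_SIZE_SKETCH 4096

/* Sketch: sysfs show buffers are one page; clamp the return value so
 * a truncated result can never claim more bytes than were emitted. */
static int sysfs_emit_sketch(char *buf, const char *fmt, int val)
{
	int len = snprintf(buf, PAGE_SIZE_SKETCH, fmt, val);

	return len < PAGE_SIZE_SKETCH ? len : PAGE_SIZE_SKETCH - 1;
}
```

    This is why coccicheck flags raw snprintf() in show functions: its
    return value can exceed the buffer size, which sysfs callers would
    then trust.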
    
    Suggested-by: Coly Li <colyli@suse.de>
    Signed-off-by: Qing Wang <wangqing@vivo.com>
    Signed-off-by: Coly Li <colyli@suse.de>
    Link: https://lore.kernel.org/r/20211029060930.119923-3-colyli@suse.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Qing Wang authored and axboe committed Oct 29, 2021
    1b86db5
  3. block: ataflop: Fix warning comparing pointer to 0

    Fix the following coccicheck warning:
    
    ./drivers/block/ataflop.c:1464:20-21: WARNING comparing pointer to 0.
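
    The style fix itself is mechanical, as this hypothetical
    before/after sketch shows (not the actual ataflop code):

```c
#include <assert.h>
#include <stddef.h>

/* Kernel style: compare pointers against NULL (or test truthiness),
 * never against the literal 0.  Was: return p != 0; */
static int style_fixed(const char *p)
{
	return p != NULL;
}
```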
    
    Reported-by: Abaci Robot <abaci@linux.alibaba.com>
    Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
    Link: https://lore.kernel.org/r/1635501029-81391-1-git-send-email-jiapeng.chong@linux.alibaba.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Jiapeng Chong authored and axboe committed Oct 29, 2021
    df75db1
  4. null_blk: Fix handling of submit_queues and poll_queues attributes

    Commit 0a593fb ("null_blk: poll queue support") introduced the poll
    queue feature to null_blk. After this change, null_blk device has both
    submit queues and poll queues, and null_map_queues() callback maps the
    both queues for corresponding hardware contexts. The commit also added
    the device configuration attribute 'poll_queues' in the same manner
    as the existing attribute 'submit_queues'. These attributes allow
    modifying the number of queues. However, when a new value is stored
    to one of these attributes, it is handled only for the corresponding
    queue type: when the number of submit queues is updated, the number
    of poll queues is not taken into account, and vice versa. This caused
    an inconsistent number of queues and queue mapping, and resulted in a
    null-ptr-dereference. This failure was observed in blktests block/029
    and block/030.
    
    To avoid the inconsistency, fix the attribute updates to take both
    submit_queues and poll_queues into account. Introduce the helper
    function nullb_update_nr_hw_queues() to handle stores to both
    attributes. Add a poll_queues field to the struct nullb_device to
    track the number in the same manner as submit_queues. Add two more
    fields, prev_submit_queues and prev_poll_queues, to keep the previous
    values before the change. In case the block layer fails to update
    nr_hw_queues, refer to the previous values in null_map_queues() to
    map queues in the same manner as before the change.
    
    Also add poll_queues value checks in nullb_update_nr_hw_queues() and
    null_validate_conf(). They ensure the poll_queues value of each
    device is within the range from 1 to the module parameter value of
    poll_queues.
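
    The range check described above can be sketched as follows; the
    names and the parameter value are hypothetical stand-ins, not the
    actual null_blk code:

```c
#include <assert.h>

/* Hypothetical module parameter: upper bound on per-device
 * poll_queues values. */
#define MOD_PARAM_POLL_QUEUES_SKETCH 8

/* Reject values outside [1, module parameter], so a store that would
 * produce an inconsistent queue mapping keeps the previous value. */
static int validate_poll_queues_sketch(int requested)
{
	if (requested < 1 || requested > MOD_PARAM_POLL_QUEUES_SKETCH)
		return -1;
	return 0;
}
```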
    
    Fixes: 0a593fb ("null_blk: poll queue support")
    Reported-by: Yi Zhang <yi.zhang@redhat.com>
    Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
    Link: https://lore.kernel.org/r/20211029103926.845635-1-shinichiro.kawasaki@wdc.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    kawasaki authored and axboe committed Oct 29, 2021
    15dfc66