
Fix: __asm_copy_to-from_user() having overrun copy #31

Closed
wants to merge 3 commits into from

Commits on Jul 19, 2021

  1. f9e73a0
  2. Revert "riscv: __asm_copy_to-from_user: Optimize unaligned memory access and pipeline stall"
    
    This reverts commit ca6eaaa.
    mcd500 committed Jul 19, 2021
    7117327
  3. riscv: __asm_copy_to-from_user: Optimize unaligned memory access and pipeline stall
    
    This patch dramatically reduces CPU usage in kernel space, especially
    for applications that make system calls with large buffer sizes, such
    as network applications. The main reason is that every unaligned
    memory access raises an exception and switches between S-mode and
    M-mode, causing large overhead.
    
    First, copy byte by byte until the destination address reaches the
    first word-aligned boundary. This prepares for the bulk aligned word
    copy.
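    The head-alignment step could be sketched in C like this (a
    hypothetical illustration, not the kernel assembly; the helper name
    `head_bytes` and the fixed RV64 word size are assumptions for the
    sketch):

```c
#include <stddef.h>
#include <stdint.h>

#define WORD_SIZE 8u   /* RV64 machine word, assumed for this sketch */

/* Number of leading bytes to copy one at a time so that the
 * destination pointer lands on a word-aligned boundary,
 * capped at n when the whole copy is shorter than that. */
static size_t head_bytes(uintptr_t dst, size_t n)
{
    size_t head = (size_t)(-dst & (WORD_SIZE - 1));
    return head < n ? head : n;
}
```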
    
    The destination address is now aligned, but the source address often
    is not. To reduce unaligned memory accesses, the data is read from
    the source on aligned boundaries, which leaves it at an offset; the
    next iteration then combines it with the following word, shifting to
    fix the offset before writing to the destination. The majority of
    the copy-speed improvement comes from this shift copy.
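    A rough C illustration of the shift-copy idea (hypothetical, not the
    actual assembly: it assumes little-endian byte order, a 4-byte word
    for brevity, an aligned destination, and a source buffer padded so
    the final aligned load stays in bounds; the real routine handles
    those edge cases itself):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy nwords 32-bit words to an aligned dst from a possibly
 * unaligned src using only aligned loads: each destination word is
 * assembled from two neighbouring aligned source words by shifting
 * away the misalignment offset (little-endian assumed). */
static void shift_copy(uint32_t *dst, const uint8_t *src, size_t nwords)
{
    size_t off = (uintptr_t)src & 3;          /* src misalignment */
    if (off == 0) {                           /* already aligned */
        memcpy(dst, src, nwords * 4);
        return;
    }
    const uint32_t *s = (const uint32_t *)(src - off);  /* aligned base */
    unsigned rsh = 8 * (unsigned)off, lsh = 32 - rsh;
    uint32_t cur = s[0];
    for (size_t i = 0; i < nwords; i++) {
        uint32_t next = s[i + 1];             /* aligned load */
        dst[i] = (cur >> rsh) | (next << lsh);/* fix the offset */
        cur = next;                           /* reuse next iteration */
    }
}
```

    Each source word is loaded exactly once and reused on the next
    iteration, so the inner loop does one aligned load, two shifts, an
    OR, and one aligned store per destination word.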
    
    In the lucky case where both the source and destination addresses
    are on an aligned boundary, perform register-sized loads and stores
    to copy the data. Without unrolling this is slower, because a store
    that immediately uses the register just filled by a load stalls the
    pipeline. If the copy is too small for the unrolled copy, perform a
    regular word copy.
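    The aligned path might look like this in C (a hypothetical sketch of
    the unrolling idea only; the actual patch unrolls with explicit
    registers in assembly):

```c
#include <stddef.h>

/* Copy nwords machine words between aligned buffers. The 4-way
 * unrolled body loads into four independent temporaries before any
 * store, so a store never immediately consumes the value a load just
 * produced, avoiding a load-use pipeline stall. */
static void word_copy(unsigned long *dst, const unsigned long *src,
                      size_t nwords)
{
    size_t i = 0;
    for (; i + 4 <= nwords; i += 4) {
        unsigned long a = src[i],     b = src[i + 1];
        unsigned long c = src[i + 2], d = src[i + 3];
        dst[i] = a;     dst[i + 1] = b;
        dst[i + 2] = c; dst[i + 3] = d;
    }
    for (; i < nwords; i++)     /* too small for the unrolled copy */
        dst[i] = src[i];
}
```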
    
    Finally, copy the remainder one byte at a time.
    
    The motivation for this patch was to improve network performance on
    the BeagleV beta board. Observation with perf showed that memcpy and
    __asm_copy_to_user had heavy CPU usage, and the network speed was
    limited to around 680 Mbps on a 1 Gbps LAN.
    
    Typical network applications optimize by passing large buffers to
    the send/recv() and sendto/recvfrom() system calls.
    
    The benchmark results below were taken with only copy_user patched.
    memcpy is measured without Matteo's patches, but both are listed
    since they are the two largest sources of overhead.
    
    All results are from the same base kernel, same rootfs and same BeagleV
    beta board.
    
    The iperf3 results show a speedup on UDP with the copy_user patch
    alone (left column: before, right column: with the patch).
    
    --- UDP send ---
    306 Mbits/sec      362 Mbits/sec
    305 Mbits/sec      362 Mbits/sec
    
    --- UDP recv ---
    772 Mbits/sec      787 Mbits/sec
    773 Mbits/sec      784 Mbits/sec
    
    Comparison by "perf top -Ue task-clock" while running iperf3.
    
    --- TCP recv ---
     * Before
      40.40%  [kernel]  [k] memcpy
      33.09%  [kernel]  [k] __asm_copy_to_user
     * With patch
      50.35%  [kernel]  [k] memcpy
      13.76%  [kernel]  [k] __asm_copy_to_user
    
    --- TCP send ---
     * Before
      19.96%  [kernel]  [k] memcpy
       9.84%  [kernel]  [k] __asm_copy_to_user
     * With patch
      14.27%  [kernel]  [k] memcpy
       7.37%  [kernel]  [k] __asm_copy_to_user
    
    --- UDP recv ---
     * Before
      44.45%  [kernel]  [k] memcpy
      31.04%  [kernel]  [k] __asm_copy_to_user
     * With patch
      55.62%  [kernel]  [k] memcpy
      11.22%  [kernel]  [k] __asm_copy_to_user
    
    --- UDP send ---
     * Before
      25.18%  [kernel]  [k] memcpy
      22.50%  [kernel]  [k] __asm_copy_to_user
     * With patch
      28.90%  [kernel]  [k] memcpy
       9.49%  [kernel]  [k] __asm_copy_to_user
    
    Signed-off-by: Akira Tsukamoto <akira.tsukamoto@gmail.com>
    mcd500 committed Jul 19, 2021
    ecc572a