Speculative page fault #359

Closed
wants to merge 9 commits into from

Commits on Oct 25, 2016

  1. UBUNTU: SAUCE: (no-up) disable -pie when gcc has it enabled by default

    In Ubuntu 16.10, gcc's defaults have been set to build Position
    Independent Executables (PIE) on amd64 and ppc64le (gcc was configured
    this way for s390x in Ubuntu 16.04 LTS). This breaks the kernel build on
    amd64. The following patch disables PIE for x86 builds (though not yet
    verified to work with gcc configured to build PIE by default on i386;
    we're not planning to enable it for that architecture).
    
    The intent is for this patch to go upstream after expanding it to
    additional architectures where needed, but I wanted to ensure that
    we could build 16.10 kernels first. I've successfully built kernels
    and booted them with this patch applied using the 16.10 compiler.
    
    Patch is against yakkety.git, but also applies with minor movement
    (no fuzz) against current linus.git.
    
    Signed-off-by: Steve Beattie <steve.beattie@canonical.com>
    [apw@canonical.com: shifted up so it works in arch/<arch>/Makefile.]
    BugLink: http://bugs.launchpad.net/bugs/1574982
    Signed-off-by: Andy Whitcroft <apw@canonical.com>
    Acked-by: Tim Gardner <tim.gardner@canonical.com>
    Acked-by: Stefan Bader <stefan.bader@canonical.com>
    Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    Signed-off-by: Andy Whitcroft <apw@canonical.com>
    
    Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
    stevebeattie authored and Laurent Dufour committed Oct 25, 2016
    Commit 9e85adc

Commits on Nov 17, 2016

  1. mm: Don't assume page-table invariance during faults

    One of the side effects of speculating on faults (without holding
    mmap_sem) is that we can race with free_pgtables() and therefore we
    cannot assume the page-tables will stick around.
    
    Remove the reliance on the pte pointer.
    
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Peter Zijlstra authored and Laurent Dufour committed Nov 17, 2016
    Commit fb8d946
  2. mm: Prepare for FAULT_FLAG_SPECULATIVE

    When speculating faults (without holding mmap_sem) we need to validate
    that the vma against which we loaded pages is still valid when we're
    ready to install the new PTE.
    
    Therefore, replace the pte_offset_map_lock() calls that (re)take the
    PTL with pte_map_lock(), which can fail if we find the VMA has changed
    since we started the fault.
    
    Instead of passing around the endless list of function arguments,
    replace the lot with a single structure so we can change context
    without endless function signature changes.
    
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    [port to 4.8 kernel]
    Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
    Peter Zijlstra authored and Laurent Dufour committed Nov 17, 2016
    Commit a07d2d6
  3. mm: Introduce pte_spinlock

    This is needed because in handle_pte_fault() pte_offset_map() is called
    and then fe->ptl is fetched and spin_locked.
    
    This was previously embedded in the call to pte_offset_map_lock().
    Laurent Dufour committed Nov 17, 2016
    Commit b8a92f6
  4. mm: VMA sequence count

    Wrap the VMA modifications (vma_adjust/unmap_page_range) with sequence
    counts such that we can easily test if a VMA is changed.
    
    The unmap_page_range() one allows us to make assumptions about
    page-tables; when we find the seqcount hasn't changed we can assume
    page-tables are still valid.
    
    The flip side is that we cannot distinguish between a vma_adjust() and
    the unmap_page_range() -- where with the former we could have
    re-checked the vma bounds against the address.
    
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Peter Zijlstra authored and Laurent Dufour committed Nov 17, 2016
    Commit 964906c
  5. SRCU free VMAs

    Manage the VMAs with SRCU such that we can do a lockless VMA lookup.
    
    We put the fput(vma->vm_file) in the SRCU callback; this keeps files
    valid during speculative faults. This is possible due to the delayed
    fput work by Al Viro. Do we need srcu_barrier() in unmount
    someplace?
    
    We guard the mm_rb tree with a seqlock (XXX could be a seqcount but
    we'd have to disable preemption around the write side in order to make
    the retry loop in __read_seqcount_begin() work) such that we can know
    if the rb tree walk was correct. We cannot trust the result of a
    lockless tree walk in the face of concurrent tree rotations, although
    we can rely on the termination of such walks: tree rotations
    guarantee the end result is a tree again after all.
    
    Furthermore, we rely on the WMB implied by the
    write_seqlock/count_begin() to separate the VMA initialization and the
    publishing stores, analogous to the RELEASE in rcu_assign_pointer().
    We also rely on the RMB from read_seqretry() to separate the vma load
    from further loads like the smp_read_barrier_depends() in regular
    RCU.
    
    We must not touch the vmacache while doing SRCU lookups as that is not
    properly serialized against changes. We update gap information after
    publishing the VMA, but A) we don't use that and B) the seqlock
    read side would fix that anyhow.
    
    We clear vma->vm_rb for nodes removed from the vma tree such that we
    can easily detect such 'dead' nodes; we rely on the WMB from
    write_sequnlock() to separate the tree removal and clearing the node.
    
    Provide find_vma_srcu() which wraps the required magic.
    
    XXX: mmap()/munmap() heavy workloads might suffer from the global lock
    in call_srcu() -- this is fixable with a 'better' SRCU implementation.
    
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Peter Zijlstra authored and Laurent Dufour committed Nov 17, 2016
    Commit 538abc3
  6. mm: Provide speculative fault infrastructure

    Provide infrastructure to do a speculative fault (not holding
    mmap_sem).
    
    Not holding mmap_sem means we can race against VMA
    change/removal and page-table destruction. We use the SRCU VMA freeing
    to keep the VMA around. We use the VMA seqcount to detect change
    (including unmapping / page-table deletion) and we use gup_fast() style
    page-table walking to deal with page-table races.
    
    Once we've obtained the page and are ready to update the PTE, we
    validate whether the state we started the fault with is still valid. If
    not, we fail the fault with VM_FAULT_RETRY; otherwise we update the
    PTE and we're done.
    
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Peter Zijlstra authored and Laurent Dufour committed Nov 17, 2016
    Commit b7b7cc5
  7. mm: Fix pte_spinlock for speculative page fault

    Laurent Dufour committed Nov 17, 2016
    Commit 02983e3
  8. mm,x86: Add speculative pagefault handling

    Try a speculative fault before acquiring mmap_sem. If it returns
    VM_FAULT_RETRY, continue with the mmap_sem acquisition and do the
    traditional fault.
    
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Peter Zijlstra authored and Laurent Dufour committed Nov 17, 2016
    Commit cb682a5
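The "mm: Prepare for FAULT_FLAG_SPECULATIVE" commit replaces pte_offset_map_lock() with a pte_map_lock() that can fail, and bundles the fault arguments into a single structure. The following is a minimal userspace sketch of that idea, not the kernel code: the page-table lock is a pthread mutex, the VMA is reduced to a sequence counter, and struct vma_model, the fault_env field names and the FAULT_FLAG_SPECULATIVE value are assumptions made for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag value, for this sketch only. */
#define FAULT_FLAG_SPECULATIVE 0x1u

/* Hypothetical stand-in for a VMA: only its sequence count matters here. */
struct vma_model {
    atomic_uint seq; /* changes whenever the VMA is modified */
};

/* One context structure instead of a long argument list.
 * Field names are illustrative, not the patch's actual layout. */
struct fault_env {
    struct vma_model *vma;
    unsigned long address;
    unsigned int flags;
    unsigned int sequence; /* VMA sequence sampled when the fault started */
    pthread_mutex_t *ptl;  /* page-table lock */
};

/* Take the page-table lock. For a speculative fault, also check that the
 * VMA has not changed since the fault started; if it has, drop the lock
 * again and report failure so the caller can retry the fault. */
static bool pte_map_lock(struct fault_env *fe)
{
    assert(fe->ptl != NULL);
    pthread_mutex_lock(fe->ptl);
    if (!(fe->flags & FAULT_FLAG_SPECULATIVE))
        return true; /* regular fault: mmap_sem keeps the VMA stable */
    if (atomic_load_explicit(&fe->vma->seq, memory_order_acquire) !=
        fe->sequence) {
        pthread_mutex_unlock(fe->ptl);
        return false;
    }
    return true;
}

static void pte_unmap_unlock(struct fault_env *fe)
{
    pthread_mutex_unlock(fe->ptl);
}
```

On failure the lock is released before returning, so a caller only has to bail out with a retry rather than unwind partially held state.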
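The "mm: VMA sequence count" commit wraps VMA modifications in a sequence count so a reader can cheaply test whether a VMA changed. A sketch of both sides in C11 atomics follows. It assumes a single writer at a time (in the kernel, mmap_sem held for write serializes writers); struct vma_model and the function names are hypothetical, and the fence placement follows the usual C11 seqlock recipe rather than the kernel's seqcount_t implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical VMA whose bounds are guarded by a sequence count.
 * Even = stable, odd = modification in progress. The bounds are relaxed
 * atomics so concurrent readers are race-free under C11 rules. */
struct vma_model {
    atomic_uint seq;
    atomic_ulong vm_start, vm_end;
};

/* Write side, assumed serialized by the caller. The same bracketing
 * would wrap an unmap_page_range()-style teardown. */
static void vma_adjust_model(struct vma_model *v, unsigned long start,
                             unsigned long end)
{
    unsigned int s = atomic_load_explicit(&v->seq, memory_order_relaxed);
    assert((s & 1u) == 0); /* no nested writers in this sketch */
    atomic_store_explicit(&v->seq, s + 1, memory_order_relaxed); /* odd */
    atomic_thread_fence(memory_order_release); /* bump before data stores */
    atomic_store_explicit(&v->vm_start, start, memory_order_relaxed);
    atomic_store_explicit(&v->vm_end, end, memory_order_relaxed);
    atomic_store_explicit(&v->seq, s + 2, memory_order_release); /* even */
}

/* Read side: wait out an in-progress write and return the snapshot. */
static unsigned int vma_read_begin(struct vma_model *v)
{
    unsigned int s;
    while ((s = atomic_load_explicit(&v->seq, memory_order_acquire)) & 1u)
        ; /* writer active: spin */
    return s;
}

/* True if the VMA changed since vma_read_begin() returned @start. */
static bool vma_read_retry(struct vma_model *v, unsigned int start)
{
    atomic_thread_fence(memory_order_acquire); /* data loads before re-check */
    return atomic_load_explicit(&v->seq, memory_order_relaxed) != start;
}

/* Example reader: is @addr inside the VMA? Retries if a write raced. */
static bool vma_contains(struct vma_model *v, unsigned long addr)
{
    unsigned int s;
    bool hit;
    do {
        s = vma_read_begin(v);
        hit = addr >= atomic_load_explicit(&v->vm_start, memory_order_relaxed) &&
              addr < atomic_load_explicit(&v->vm_end, memory_order_relaxed);
    } while (vma_read_retry(v, s));
    return hit;
}
```

The counter only says that something changed, so a reader cannot tell a bounds change from an unmap; that is the limitation the commit message points out.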
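The "SRCU free VMAs" commit pairs a lockless lookup with a seqlock on the VMA tree and marks unlinked nodes as dead. The sketch below models only those two parts: a small address-sorted array stands in for mm_rb, and the SRCU guarantee that nodes stay allocated during a lookup is simply assumed, not implemented. All names (struct mm_model, find_vma_model and so on) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_VMAS 8

/* Hypothetical VMA node; bounds never change once published. */
struct vma_node {
    unsigned long vm_start, vm_end;
    atomic_bool dead; /* models clearing vma->vm_rb on removal */
};

/* Hypothetical mm: an address-sorted array stands in for mm_rb. */
struct mm_model {
    atomic_uint seq; /* seqlock sequence guarding vmas[] and nr */
    _Atomic(struct vma_node *) vmas[MAX_VMAS];
    atomic_int nr;
};

static void vma_init(struct vma_node *v, unsigned long start, unsigned long end)
{
    v->vm_start = start;
    v->vm_end = end;
    atomic_init(&v->dead, false);
}

static void mm_init(struct mm_model *mm)
{
    atomic_init(&mm->seq, 0);
    atomic_init(&mm->nr, 0);
    for (int i = 0; i < MAX_VMAS; i++)
        atomic_init(&mm->vmas[i], NULL);
}

/* Writers are assumed to be serialized (mmap_sem in the kernel). */
static void mm_write_begin(struct mm_model *mm)
{
    unsigned int s = atomic_load_explicit(&mm->seq, memory_order_relaxed);
    atomic_store_explicit(&mm->seq, s + 1, memory_order_relaxed); /* odd */
    atomic_thread_fence(memory_order_release);
}

static void mm_write_end(struct mm_model *mm)
{
    unsigned int s = atomic_load_explicit(&mm->seq, memory_order_relaxed);
    atomic_store_explicit(&mm->seq, s + 1, memory_order_release); /* even */
}

/* Append-only insert: assumes VMAs are added in address order. */
static void insert_vma_model(struct mm_model *mm, struct vma_node *v)
{
    mm_write_begin(mm);
    int n = atomic_load_explicit(&mm->nr, memory_order_relaxed);
    assert(n < MAX_VMAS);
    atomic_store_explicit(&mm->vmas[n], v, memory_order_release); /* publish */
    atomic_store_explicit(&mm->nr, n + 1, memory_order_relaxed);
    mm_write_end(mm);
}

static void remove_vma_model(struct mm_model *mm, struct vma_node *v)
{
    mm_write_begin(mm);
    int n = atomic_load_explicit(&mm->nr, memory_order_relaxed), i = 0;
    while (i < n && atomic_load_explicit(&mm->vmas[i], memory_order_relaxed) != v)
        i++;
    assert(i < n);
    for (; i + 1 < n; i++) /* close the gap */
        atomic_store_explicit(&mm->vmas[i], atomic_load_explicit(&mm->vmas[i + 1],
                              memory_order_relaxed), memory_order_release);
    atomic_store_explicit(&mm->vmas[n - 1], NULL, memory_order_release);
    atomic_store_explicit(&mm->nr, n - 1, memory_order_relaxed);
    mm_write_end(mm);
    /* Mark dead only after the unlink is published, as the commit describes. */
    atomic_store_explicit(&v->dead, true, memory_order_release);
}

/* Lockless lookup, find_vma() semantics: first VMA with vm_end > addr.
 * Retries when a writer raced with the walk. NULL means "no VMA" or
 * "the node died under us; fall back to the locked path". */
static struct vma_node *find_vma_model(struct mm_model *mm, unsigned long addr)
{
    struct vma_node *found;
    unsigned int s;
    do {
        while ((s = atomic_load_explicit(&mm->seq, memory_order_acquire)) & 1u)
            ; /* writer in progress */
        found = NULL;
        int n = atomic_load_explicit(&mm->nr, memory_order_relaxed);
        for (int i = 0; i < n && i < MAX_VMAS; i++) {
            struct vma_node *v = atomic_load_explicit(&mm->vmas[i], memory_order_acquire);
            if (v && v->vm_end > addr) { found = v; break; }
        }
        atomic_thread_fence(memory_order_acquire);
    } while (atomic_load_explicit(&mm->seq, memory_order_relaxed) != s);
    if (found && atomic_load_explicit(&found->dead, memory_order_acquire))
        return NULL;
    return found;
}
```

The seqlock retry catches a walk that overlapped a writer, while the dead flag catches a node unlinked after an otherwise consistent walk; the commit relies on SRCU so that such a node is still safe to dereference at that point.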