Speculative page fault #359
Commits on Oct 25, 2016
UBUNTU: SAUCE: (no-up) disable -pie when gcc has it enabled by default
In Ubuntu 16.10, gcc's defaults have been set to build Position Independent Executables (PIE) on amd64 and ppc64le (gcc was configured this way for s390x in Ubuntu 16.04 LTS). This breaks the kernel build on amd64. The following patch disables PIE for x86 builds (though it is not yet verified to work with gcc configured to build PIE by default on i386 -- we're not planning to enable it for that architecture). The intent is for this patch to go upstream after being expanded to additional architectures where needed, but I wanted to ensure that we could build 16.10 kernels first.

I've successfully built kernels and booted them with this patch applied using the 16.10 compiler. The patch is against yakkety.git, but also applies with minor movement (no fuzz) against current linus.git.

Signed-off-by: Steve Beattie <steve.beattie@canonical.com>
[apw@canonical.com: shifted up so it works in arch/<arch>/Makefile]
BugLink: http://bugs.launchpad.net/bugs/1574982
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
SHA: 9e85adc
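The patch hunk itself isn't shown on this page, but the change amounts to a few compiler-flag lines in the kernel Makefiles. A minimal sketch of the idea (flag placement here is illustrative, not the patch's exact hunk; `cc-option` is kbuild's probe for compiler support, which keeps older compilers that lack these flags working):

```make
# Sketch: force non-PIE code generation when the distro compiler
# defaults to PIE. cc-option drops any flag the compiler rejects.
KBUILD_CFLAGS   += $(call cc-option,-fno-pie)
KBUILD_CFLAGS   += $(call cc-option,-no-pie)
KBUILD_AFLAGS   += $(call cc-option,-fno-pie)
KBUILD_CPPFLAGS += $(call cc-option,-fno-pie)
```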
Commits on Nov 17, 2016
mm: Don't assume page-table invariance during faults
One of the side effects of speculating on faults (without holding mmap_sem) is that we can race with free_pgtables() and therefore we cannot assume the page-tables will stick around. Remove the reliance on the pte pointer.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Peter Zijlstra authored and Laurent Dufour committed Nov 17, 2016
SHA: fb8d946
mm: Prepare for FAULT_FLAG_SPECULATIVE
When speculating faults (without holding mmap_sem) we need to validate that the vma against which we loaded pages is still valid when we're ready to install the new PTE. Therefore, replace the pte_offset_map_lock() calls that (re)take the PTL with pte_map_lock(), which can fail in case we find the VMA changed since we started the fault.

Instead of passing around the endless list of function arguments, replace the lot with a single structure so we can change context without endless function signature changes.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[port to 4.8 kernel]
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Peter Zijlstra authored and Laurent Dufour committed Nov 17, 2016
SHA: a07d2d6
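A userspace sketch of the two ideas in this commit: a single `fault_env`-style context structure, and a `pte_map_lock()` that can fail when the VMA changed. The struct layout and the sequence-based failure check are stand-ins chosen for illustration, not the kernel's actual types:

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel's VMA; only the sequence matters here. */
struct vm_area_struct { unsigned int vm_sequence; };

/* Single structure replacing the long fault-handler argument lists. */
struct fault_env {
    struct vm_area_struct *vma;
    unsigned long address;
    unsigned int  sequence;   /* VMA sequence sampled when the fault began */
    int          *pte;        /* stand-in for the mapped pte pointer */
    int           ptl_locked; /* stand-in for holding the page-table lock */
};

static int page_table;        /* stand-in for a pte slot */

/*
 * Like pte_offset_map_lock(), but may fail: if the VMA changed since the
 * speculative fault started, the caller must abort and retry the fault.
 */
static bool pte_map_lock(struct fault_env *fe)
{
    if (fe->vma->vm_sequence != fe->sequence)
        return false;         /* VMA changed under us: bail out */
    fe->pte = &page_table;    /* "map" the pte */
    fe->ptl_locked = 1;       /* "take" the page-table lock */
    return true;
}
```

The point of the bool return is that every former `pte_offset_map_lock()` call site gains an error path instead of assuming the lock always succeeds.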
This is needed because in handle_pte_fault() pte_offset_map() is called and then fe->ptl is fetched and spin_locked. This was previously embedded in the call to pte_offset_map_lock().
Laurent Dufour committed Nov 17, 2016
SHA: b8a92f6
Wrap the VMA modifications (vma_adjust/unmap_page_range) with sequence counts such that we can easily test if a VMA has changed. The unmap_page_range() one allows us to make assumptions about page-tables: when we find the seqcount hasn't changed, we can assume the page-tables are still valid.

The flip side is that we cannot distinguish between a vma_adjust() and an unmap_page_range() -- with the former we could have re-checked the vma bounds against the address.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Peter Zijlstra authored and Laurent Dufour committed Nov 17, 2016
SHA: 964906c
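The write-side bracketing and read-side retry described above can be sketched in userspace C. The function names are hypothetical stand-ins (the real kernel seqcount API also issues memory barriers, omitted here); the odd/even convention is the same one seqcounts use: the count is odd while a writer is inside the critical section.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace sketch of a per-VMA sequence count: odd => writer active. */
struct vm_area_struct {
    unsigned int  vm_sequence;
    unsigned long vm_start, vm_end;
};

/* Writer side: bracket every VMA modification, as the patch does for
 * vma_adjust() and unmap_page_range(). */
static void vma_write_begin(struct vm_area_struct *vma) { vma->vm_sequence++; }
static void vma_write_end(struct vm_area_struct *vma)   { vma->vm_sequence++; }

/* Reader side: sample the count, do the work, then check it moved not. */
static unsigned int vma_read_begin(const struct vm_area_struct *vma)
{
    return vma->vm_sequence;
}

static bool vma_read_retry(const struct vm_area_struct *vma, unsigned int seq)
{
    /* odd = writer in progress; changed = a write completed meanwhile */
    return (seq & 1) || vma->vm_sequence != seq;
}
```

A speculative fault samples the count at entry and calls the retry check just before installing the PTE; any intervening vma_adjust() or unmap_page_range() forces a retry.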
Manage the VMAs with SRCU such that we can do a lockless VMA lookup. We put the fput(vma->vm_file) in the SRCU callback; this keeps files valid during speculative faults and is possible due to the delayed fput work by Al Viro -- do we need srcu_barrier() in unmount someplace?

We guard the mm_rb tree with a seqlock (XXX it could be a seqcount, but we'd have to disable preemption around the write side in order to make the retry loop in __read_seqcount_begin() work) such that we can know if the rb tree walk was correct. We cannot trust the result of a lockless tree walk in the face of concurrent tree rotations; we can, however, trust that such walks terminate -- tree rotations guarantee the end result is a tree again, after all.

Furthermore, we rely on the WMB implied by write_seqlock/count_begin() to separate the VMA initialization and the publishing stores, analogous to the RELEASE in rcu_assign_pointer(). We also rely on the RMB from read_seqretry() to separate the vma load from further loads, like the smp_read_barrier_depends() in regular RCU.

We must not touch the vmacache while doing SRCU lookups, as that is not properly serialized against changes. We update gap information after publishing the VMA, but A) we don't use that and B) the seqlock read side would fix that anyhow.

We clear vma->vm_rb for nodes removed from the vma tree such that we can easily detect such 'dead' nodes; we rely on the WMB from write_sequnlock() to separate the tree removal and the clearing of the node.

Provide find_vma_srcu() which wraps the required magic.

XXX: mmap()/munmap()-heavy workloads might suffer from the global lock in call_srcu() -- this is fixable with a 'better' SRCU implementation.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Peter Zijlstra authored and Laurent Dufour committed Nov 17, 2016
SHA: 538abc3
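The seqlock-guarded lookup can be illustrated with a single-threaded userspace sketch. Everything here is a stand-in (a flat array instead of the rb tree, plain counters instead of a real seqlock with barriers); the shape of the retry loop is the point, a `find_vma_srcu()`-style wrapper that only trusts a walk when no writer ran during it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static unsigned int mm_seq;   /* stand-in for the mm_rb seqlock counter */

static unsigned int read_seqbegin_stub(void)       { return mm_seq; }
static bool read_seqretry_stub(unsigned int seq)   { return mm_seq != seq; }

struct vm_area_struct { unsigned long vm_start, vm_end; };

static struct vm_area_struct vmas[2] = {
    { 0x1000, 0x2000 }, { 0x2000, 0x3000 },
};

/* Stand-in for the rb-tree walk; a real lockless walk can be led astray
 * by concurrent rotations, which is exactly why the seqlock check exists. */
static struct vm_area_struct *walk_tree(unsigned long addr)
{
    for (size_t i = 0; i < 2; i++)
        if (addr >= vmas[i].vm_start && addr < vmas[i].vm_end)
            return &vmas[i];
    return NULL;
}

/* find_vma_srcu()-style wrapper: the walk result is only trusted if no
 * writer ran between seqbegin and seqretry; otherwise redo the walk. */
static struct vm_area_struct *find_vma_lockless(unsigned long addr)
{
    struct vm_area_struct *vma;
    unsigned int seq;

    do {
        seq = read_seqbegin_stub();
        vma = walk_tree(addr);
    } while (read_seqretry_stub(seq));
    return vma;
}
```

In the kernel the caller must additionally be inside an SRCU read-side section so the returned VMA cannot be freed while it is being used.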
mm: Provide speculative fault infrastructure
Provide infrastructure to do a speculative fault (not holding mmap_sem). Not holding mmap_sem means we can race against VMA change/removal and page-table destruction. We use the SRCU VMA freeing to keep the VMA around. We use the VMA seqcount to detect change (including unmapping / page-table deletion), and we use gup_fast()-style page-table walking to deal with page-table races.

Once we've obtained the page and are ready to update the PTE, we validate whether the state we started the fault with is still valid; if not, we fail the fault with VM_FAULT_RETRY, otherwise we update the PTE and we're done.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Peter Zijlstra authored and Laurent Dufour committed Nov 17, 2016
SHA: b7b7cc5
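The snapshot / do-work / revalidate shape of the speculative path can be sketched as follows. The `_stub` names and the callback parameter are hypothetical (the callback stands in for the page-table walk and page allocation, so a test can mutate the VMA "concurrently"); only `VM_FAULT_RETRY` echoes the real kernel flag:

```c
#include <assert.h>

#define VM_FAULT_RETRY 0x0400   /* stand-in for the kernel's retry flag */

struct vm_area_struct { unsigned int vm_sequence; };

/*
 * Sketch of handle_speculative_fault(): snapshot the VMA sequence, do the
 * fault work without mmap_sem, then revalidate before installing the PTE.
 */
static int
handle_speculative_fault_stub(struct vm_area_struct *vma,
                              void (*fault_work)(struct vm_area_struct *))
{
    unsigned int seq = vma->vm_sequence;  /* snapshot at entry */

    fault_work(vma);  /* gup_fast()-style walk, page allocation, ... */

    if (vma->vm_sequence != seq)
        return VM_FAULT_RETRY;  /* state changed: redo under mmap_sem */

    /* ... install the new PTE under pte_map_lock() ... */
    return 0;
}
```

Failing with VM_FAULT_RETRY rather than trying to repair the race keeps the speculative path simple: the caller just falls back to the classic mmap_sem-protected fault.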
mm: Fix pte_spinlock for speculative page fault
Laurent Dufour committed Nov 17, 2016
SHA: 02983e3
mm,x86: Add speculative pagefault handling
Try a speculative fault before acquiring mmap_sem; if it returns VM_FAULT_RETRY, continue with the mmap_sem acquisition and do the traditional fault.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Peter Zijlstra authored and Laurent Dufour committed Nov 17, 2016
SHA: cb682a5
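The fallback logic this commit adds to the x86 fault handler can be sketched as below. All `_stub` names and the globals are hypothetical test scaffolding; the control flow is what matters: only a VM_FAULT_RETRY from the speculative path sends us down the classic locked path.

```c
#include <assert.h>

#define VM_FAULT_RETRY 0x0400   /* stand-in for the kernel's retry flag */

static int speculative_result;  /* what the speculative path will return */
static int traditional_faults;  /* how often we fell back */

static int handle_speculative_fault_stub(unsigned long address)
{
    (void)address;
    return speculative_result;
}

static int handle_mm_fault_stub(unsigned long address)
{
    (void)address;
    traditional_faults++;       /* mmap_sem is held here in the kernel */
    return 0;
}

/* Sketch of the __do_page_fault() change: try the speculative path
 * first; only on VM_FAULT_RETRY take mmap_sem and fault normally. */
static int do_page_fault_stub(unsigned long address)
{
    int fault = handle_speculative_fault_stub(address);

    if (!(fault & VM_FAULT_RETRY))
        return fault;           /* speculative path handled it */

    /* down_read(&mm->mmap_sem) in the real code */
    return handle_mm_fault_stub(address);
}
```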