Sync up with Linus #88

dabrace · 2015-08-05T15:19:48Z

No description provided.

Previously there was a trivial panic unshare -n /bin/bash <<EOF ip addr add dev lo face::1/128 ipvsadm -A -t [face::1]:15213 ipvsadm -a -t [face::1]:15213 -r b00c::1 echo boom | nc face::1 15213 EOF This patch allows us to replicate the net logic above and simply capture the skb_dst(skb)->dev and use that for the purpose of the invocation. Signed-off-by: Alex Gartrell <agartrell@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>

Michael Vallaly reports about wrong source address used in rare cases for tunneled traffic. Looks like __ip_vs_get_out_rt in 3.10+ is providing uninitialized dest_dst->dst_saddr.ip because ip_vs_dest_dst_alloc uses kmalloc. While we retry after seeing EINVAL from routing for data that does not look like valid local address, it still succeeded when this memory was previously used from other dests and with different local addresses. As result, we can use valid local address that is not suitable for our real server. Fix it by providing 0.0.0.0 every time our cache is refreshed. By this way we will get preferred source address from routing. Reported-by: Michael Vallaly <lvs@nolatency.com> Fixes: 026ace0 ("ipvs: optimize dst usage for real server") Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>

I overlooked the svc->sched_data usage from schedulers when the services were converted to RCU in 3.10. Now the rare ipvsadm -E command can change the scheduler but due to the reverse order of ip_vs_bind_scheduler and ip_vs_unbind_scheduler we provide new sched_data to the old scheduler resulting in a crash. To fix it without changing the scheduler methods we have to use synchronize_rcu() only for the editing case. It means all svc->scheduler readers should expect a NULL value. To avoid breakage for the service listing and ipvsadm -R we can use the "none" name to indicate that scheduler is not assigned, a state when we drop new connections. Reported-by: Alexander Vasiliev <a.vasylev@404-group.com> Fixes: ceec4c3 ("ipvs: convert services to rcu") Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>

It is possible that we bind against a local socket in early_demux when we are actually going to want to forward it. In this case, the socket serves no purpose and only serves to confuse things (particularly functions which implicitly expect sk_fullsock to be true, like ip_local_out). Additionally, skb_set_owner_w is totally broken for non full-socks. Signed-off-by: Alex Gartrell <agartrell@fb.com> Fixes: 41063e9 ("ipv4: Early TCP socket demux.") Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>

Fix crash in 3.5+ if FTP is used after switching sync_version to 0. Fixes: 749c42b ("ipvs: reduce sync rate with time thresholds") Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>

Reset XPS's sender_cpu on forwarding. Signed-off-by: Julian Anastasov <ja@ssi.bg> Fixes: 2bd8248 ("xps: fix xps for stacked devices") Signed-off-by: Simon Horman <horms@verge.net.au>

Mixer control for widgets can't be created if the info is NULL. So assign the correct info for this. Signed-off-by: Jeeja KP <jeeja.kp@intel.com> Signed-off-by: Subhransu S. Prusty <subhransu.s.prusty@intel.com> Acked-by: Liam Girdwood <liam.r.girdwood@linux.intel.com> Signed-off-by: Mark Brown <broonie@kernel.org>

09a2c73 ("PCI: Remove unused PCI_MSIX_FLAGS_BIRMASK definition") removed PCI_MSIX_FLAGS_BIRMASK from an exported header because it was unused in the kernel. But that breaks user programs that were using it (QEMU in particular). Restore the PCI_MSIX_FLAGS_BIRMASK definition. [bhelgaas: changelog] Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> CC: stable@vger.kernel.org # v3.13+

The usage_count variable was read before it was set to the correct value, due to which the firmware load was failing. Because of this IPC messages sent to the firmware were timing out causing a delay of about 1 second while playing audio from the internal speakers. With this patch the usage_count is read after the function call pm_runtime_get_sync which will increment the usage_count variable and the firmware load is successful and all the IPC messages are processed correctly. Signed-off-by: Shilpa Sreeramalu <shilpa.sreeramalu@intel.com> Signed-off-by: Fang, Yang A <yang.a.fang@intel.com> Signed-off-by: Mark Brown <broonie@kernel.org> Cc: stable@vger.kernel.org

GPMC smart idle is not really broken but it does not support smart idle with wakeup. Fixes: 556708f ("ARM: OMAP: DRA7: hwmod: Make gpmc software supervised as the smart idle is broken") Signed-off-by: Roger Quadros <rogerq@ti.com> Reviewed-by: Paul Walmsley <paul@pwsan.com> Signed-off-by: Paul Walmsley <paul@pwsan.com>

This fixes kernel panic on boot, if rt5645->codec is NULL when rt5645_jack_detect_work is first called. rt5645_jack_detect_work needs rt5645->codec to be initialized to setup dapm pins. Also, reporting jack events is useless, as the jacks cannot be set before the codec is ready. Since we manually call the interrupt handler in rt5645_set_jack_detect, the initial jack state will be reported correctly, and dapm pins will be setup at that time. Signed-off-by: Nicolas Boichat <drinkcat@chromium.org> Signed-off-by: Mark Brown <broonie@kernel.org>

When dom0 is being ballooned balloon_process() will hold the balloon mutex until it is finished. This will block e.g. creation of new domains as the device backends for the new domain need some autoballooned pages for the ring buffers. Avoid this by releasing the balloon mutex from time to time during ballooning. Adjust the comment above balloon_process() regarding multiple instances of balloon_process(). Instead of open coding it, just use cond_resched(). Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>

Quoting Daniel Borkmann: "When adding connection tracking template rules to a netns, f.e. to configure netfilter zones, the kernel will endlessly busy-loop as soon as we try to delete the given netns in case there's at least one template present, which is problematic i.e. if there is such bravery that the priviledged user inside the netns is assumed untrusted. Minimal example: ip netns add foo ip netns exec foo iptables -t raw -A PREROUTING -d 1.2.3.4 -j CT --zone 1 ip netns del foo What happens is that when nf_ct_iterate_cleanup() is being called from nf_conntrack_cleanup_net_list() for a provided netns, we always end up with a net->ct.count > 0 and thus jump back to i_see_dead_people. We don't get a soft-lockup as we still have a schedule() point, but the serving CPU spins on 100% from that point onwards. Since templates are normally allocated with nf_conntrack_alloc(), we also bump net->ct.count. The issue why they are not yet nf_ct_put() is because the per netns .exit() handler from x_tables (which would eventually invoke xt_CT's xt_ct_tg_destroy() that drops reference on info->ct) is called in the dependency chain at a *later* point in time than the per netns .exit() handler for the connection tracker. This is clearly a chicken'n'egg problem: after the connection tracker .exit() handler, we've teared down all the connection tracking infrastructure already, so rightfully, xt_ct_tg_destroy() cannot be invoked at a later point in time during the netns cleanup, as that would lead to a use-after-free. At the same time, we cannot make x_tables depend on the connection tracker module, so that the xt_ct_tg_destroy() would be invoked earlier in the cleanup chain." Daniel confirms this has to do with the order in which modules are loaded or having compiled nf_conntrack as modules while x_tables built-in. So we have no guarantees regarding the order in which netns callbacks are executed. Fix this by allocating the templates through kmalloc() from the respective SYNPROXY and CT targets, so they don't depend on the conntrack kmem cache. Then, release then via nf_ct_tmpl_free() from destroy_conntrack(). This branch is marked as unlikely since conntrack templates are rarely allocated and only from the configuration plane path. Note that templates are not kept in any list to avoid further dependencies with nf_conntrack anymore, thus, the tmpl larval list is removed. Reported-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Tested-by: Daniel Borkmann <daniel@iogearbox.net>

…ux/kernel/git/horms/ipvs Simon Horman says: ==================== IPVS Fixes for v4.2 please consider this fix for v4.2. For reasons that are not clear to me it is a bumper crop. It seems to me that they are all relevant to stable. Please let me know if you need my help to get the fixes into stable. * ipvs: fix ipv6 route unreach panic This problem appears to be present since IPv6 support was added to IPVS in v2.6.28. * ipvs: skb_orphan in case of forwarding This appears to resolve a problem resulting from a side effect of 41063e9 ("ipv4: Early TCP socket demux.") which was included in v3.6. * ipvs: do not use random local source address for tunnels This appears to resolve a problem introduced by 026ace0 ("ipvs: optimize dst usage for real server") in v3.10. * ipvs: fix crash if scheduler is changed This appears to resolve a problem introduced by ceec4c3 ("ipvs: convert services to rcu") in v3.10. Julian has provided backports of the fix: * [PATCHv2 3.10.81] ipvs: fix crash if scheduler is changed http://www.spinics.net/lists/lvs-devel/msg04008.html * [PATCHv2 3.12.44,3.14.45,3.18.16,4.0.6] ipvs: fix crash if scheduler is changed http://www.spinics.net/lists/lvs-devel/msg04007.html Please let me know how you would like to handle guiding these backports into stable. * ipvs: fix crash with sync protocol v0 and FTP This appears to resolve a problem introduced by 749c42b ("ipvs: reduce sync rate with time thresholds") in v3.5 ==================== Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

On an absent filesystem (one served by another server), we need to be able to handle requests for certain attributest (like fs_locations, so the client can find out which server does have the filesystem), but others we can't. We forgot to take that into account when adding another attribute bitmask work for the SECURITY_LABEL attribute. There an export entry with the "refer" option can result in: [ 88.414272] kernel BUG at fs/nfsd/nfs4xdr.c:2249! [ 88.414828] invalid opcode: 0000 [#1] SMP [ 88.415368] Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache nfsd xfs libcrc32c iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi iosf_mbi ppdev btrfs coretemp crct10dif_pclmul crc32_pclmul crc32c_intel xor ghash_clmulni_intel raid6_pq vmw_balloon parport_pc parport i2c_piix4 shpchp vmw_vmci acpi_cpufreq auth_rpcgss nfs_acl lockd grace sunrpc vmwgfx drm_kms_helper ttm drm mptspi mptscsih serio_raw mptbase e1000 scsi_transport_spi ata_generic pata_acpi [last unloaded: nfsd] [ 88.417827] CPU: 0 PID: 2116 Comm: nfsd Not tainted 4.0.7-300.fc22.x86_64 #1 [ 88.418448] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014 [ 88.419093] task: ffff880079146d50 ti: ffff8800785d8000 task.ti: ffff8800785d8000 [ 88.419729] RIP: 0010:[<ffffffffa04b3c10>] [<ffffffffa04b3c10>] nfsd4_encode_fattr+0x820/0x1f00 [nfsd] [ 88.420376] RSP: 0000:ffff8800785db998 EFLAGS: 00010206 [ 88.421027] RAX: 0000000000000001 RBX: 000000000018091a RCX: ffff88006668b980 [ 88.421676] RDX: 00000000fffef7fc RSI: 0000000000000000 RDI: ffff880078d05000 [ 88.422315] RBP: ffff8800785dbb58 R08: ffff880078d043f8 R09: ffff880078d4a000 [ 88.422968] R10: 0000000000010000 R11: 0000000000000002 R12: 0000000000b0a23a [ 88.423612] R13: ffff880078d05000 R14: ffff880078683100 R15: ffff88006668b980 [ 88.424295] FS: 0000000000000000(0000) GS:ffff88007c600000(0000) knlGS:0000000000000000 [ 88.424944] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 88.425597] CR2: 00007f40bc370f90 CR3: 0000000035af5000 CR4: 00000000001407f0 [ 88.426285] Stack: [ 88.426921] ffff8800785dbaa8 ffffffffa049e4af ffff8800785dba08 ffffffff813298f0 [ 88.427585] ffff880078683300 ffff8800769b0de8 0000089d00000001 0000000087f805e0 [ 88.428228] ffff880000000000 ffff880079434a00 0000000000000000 ffff88006668b980 [ 88.428877] Call Trace: [ 88.429527] [<ffffffffa049e4af>] ? exp_get_by_name+0x7f/0xb0 [nfsd] [ 88.430168] [<ffffffff813298f0>] ? inode_doinit_with_dentry+0x210/0x6a0 [ 88.430807] [<ffffffff8123833e>] ? d_lookup+0x2e/0x60 [ 88.431449] [<ffffffff81236133>] ? dput+0x33/0x230 [ 88.432097] [<ffffffff8123f214>] ? mntput+0x24/0x40 [ 88.432719] [<ffffffff812272b2>] ? path_put+0x22/0x30 [ 88.433340] [<ffffffffa049ac87>] ? nfsd_cross_mnt+0xb7/0x1c0 [nfsd] [ 88.433954] [<ffffffffa04b54e0>] nfsd4_encode_dirent+0x1b0/0x3d0 [nfsd] [ 88.434601] [<ffffffffa04b5330>] ? nfsd4_encode_getattr+0x40/0x40 [nfsd] [ 88.435172] [<ffffffffa049c991>] nfsd_readdir+0x1c1/0x2a0 [nfsd] [ 88.435710] [<ffffffffa049a530>] ? nfsd_direct_splice_actor+0x20/0x20 [nfsd] [ 88.436447] [<ffffffffa04abf30>] nfsd4_encode_readdir+0x120/0x220 [nfsd] [ 88.437011] [<ffffffffa04b58cd>] nfsd4_encode_operation+0x7d/0x190 [nfsd] [ 88.437566] [<ffffffffa04aa6dd>] nfsd4_proc_compound+0x24d/0x6f0 [nfsd] [ 88.438157] [<ffffffffa0496103>] nfsd_dispatch+0xc3/0x220 [nfsd] [ 88.438680] [<ffffffffa006f0cb>] svc_process_common+0x43b/0x690 [sunrpc] [ 88.439192] [<ffffffffa0070493>] svc_process+0x103/0x1b0 [sunrpc] [ 88.439694] [<ffffffffa0495a57>] nfsd+0x117/0x190 [nfsd] [ 88.440194] [<ffffffffa0495940>] ? nfsd_destroy+0x90/0x90 [nfsd] [ 88.440697] [<ffffffff810bb728>] kthread+0xd8/0xf0 [ 88.441260] [<ffffffff810bb650>] ? kthread_worker_fn+0x180/0x180 [ 88.441762] [<ffffffff81789e58>] ret_from_fork+0x58/0x90 [ 88.442322] [<ffffffff810bb650>] ? kthread_worker_fn+0x180/0x180 [ 88.442879] Code: 0f 84 93 05 00 00 83 f8 ea c7 85 a0 fe ff ff 00 00 27 30 0f 84 ba fe ff ff 85 c0 0f 85 a5 fe ff ff e9 e3 f9 ff ff 0f 1f 44 00 00 <0f> 0b 66 0f 1f 44 00 00 be 04 00 00 00 4c 89 ef 4c 89 8d 68 fe [ 88.444052] RIP [<ffffffffa04b3c10>] nfsd4_encode_fattr+0x820/0x1f00 [nfsd] [ 88.444658] RSP <ffff8800785db998> [ 88.445232] ---[ end trace 6cb9d0487d94a29f ]--- Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>

If nfsd4_layout_setlease fails, nfsd will not put ls->ls_file. Fix commit c5c707f "nfsd: implement pNFS layout recalls". Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>

As a follow-up to recent changes to Exynos mipi video phy driver, introducing support for PMU regmap in commit e4b3d38 ("phy: exynos-video-mipi: Fix regression by adding support for PMU regmap") add a syscon phandle to video-phy node to bring back to life both MIPI DSI display and MIPI CSIS-2 camera sensor on Exynos3250. Signed-off-by: Beata Michalska <b.michalska@samsung.com> Reviewed-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk> Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Kukjin Kim <kgene@kernel.org>

For Exynos4210 platforms, add CPU operating points and CPU regulator supply properties for migrating from Exynos specific cpufreq driver to using generic cpufreq driver. Cc: Doug Anderson <dianders@chromium.org> Cc: Andreas Faerber <afaerber@suse.de> Cc: Sachin Kamat <sachin.kamat@linaro.org> Cc: Andreas Farber <afaerber@suse.de> Cc: Javier Martinez Canillas <javier.martinez@collabora.co.uk> Signed-off-by: Thomas Abraham <thomas.ab@samsung.com> [b.zolnierkie: removed exynos5250 and exynos5420 support for now] Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> [k.kozlowski: Rebased, moved cpu nodes to alphabetical position] Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Kukjin Kim <kgene@kernel.org>

Commit fdb6eb0 ("ASoC: dapm: Modify widget stream name according to prefix") fixed the case where a DAPM route between a DAI widget and a DAC/ADC/AIF widget with a matching stream name was not created when the DAPM context was using a prefix. Unfortunately the patch introduced a few issues on its own like leaking the dynamically allocated stream name memory and also not checking whether the allocation succeeded in the first place. It is also incomplete in that it still does not handle the case where stream name of the widget is a substring of the stream name of the DAI, which is explicitly allowed and works fine if no DAPM prefix is used. Revert the commit and take a slightly different approach to solving the issue. Instead of comparing the widget's stream name to the name of the DAI widget compare it to the stream name of the DAI widget. The stream name of the DAI widget is identical to the name of the DAI widget except that it wont have the DAPM prefix added. So this approach behaves identical regardless to whether the DAPM context uses a prefix or not. We don't have to worry about potentially matching with a widget with the same stream name, but from a different DAPM context with a different prefix, since the code already makes sure that both the DAI widget and the matched widget are from the same DAPM context. Fixes: fdb6eb0 ("ASoC: dapm: Modify widget stream name according to prefix") Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Mark Brown <broonie@kernel.org> Cc: stable@vger.kernel.org

…nel/git/kgene/linux-samsung into fixes Merge "Samsung fixes for v4.2" from Kukjin Kim: From Krzysztof Kozlowski: 1. Fix exynos3250 MIPI DSI display and MIPI CSIS-2 camera sensorx after adding support for PMU regmap in exynos-video-mipi driver (issue introduced in v4.0). 2. Bring back cpufreq for exynos4210 after incomplete switch to cpufreq-dt driver in 4.2 merge window. The necessary DT changes for exynos4210 cpufreq was not applied to the same tree as rest of patchset because of multiple conflicts between clk and arm-soc trees. Unfortunately without the change the exynos4210 boards loose cpufreq feature. * tag 'samsung-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung: ARM: dts: add CPU OPP and regulator supply property for exynos4210 ARM: dts: Update video-phy node with syscon phandle for exynos3250 Signed-off-by: Olof Johansson <olof@lixom.net>

Currently, below code actually does not update any bit because SGTL5000_SMALL_POP is 0. snd_soc_update_bits(codec, SGTL5000_CHIP_REF_CTRL, SGTL5000_SMALL_POP, 1); The SGTL5000_SMALL_POP should be BIT(0) rather than 0, fix it. Signed-off-by: Axel Lin <axel.lin@ingics.com> Acked-By: Alexander Stein <alexander.stein@systec-electronic.com> Signed-off-by: Mark Brown <broonie@kernel.org>

The regmap_write in ssm4567_set_dai_fmt accidentally clears the TDM_BCLKS field which was set earlier by ssm4567_set_tdm_slot. This patch fixes it by using regmap_update_bits with proper mask. Signed-off-by: Ben Zhang <benzh@chromium.org> Acked-by: Lars-Peter Clausen <lars@metafoo.de> Acked-by: Anatol Pomozov <anatol.pomozov@gmail.com> Signed-off-by: Mark Brown <broonie@kernel.org> Cc: stable@vger.kernel.org

When zones were originally introduced, the expectation functions were all extended to perform lookup using the zone. However, insertion was not modified to check the zone. This means that two expectations which are intended to apply for different connections that have the same tuple but exist in different zones cannot both be tracked. Fixes: 5d0aa2c (netfilter: nf_conntrack: add support for "conntrack zones") Signed-off-by: Joe Stringer <joestringer@nicira.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Initialize the ARCH_* overrides before including the arch Makefile, to avoid picking up the values from the environment. The variables can still be overriden on the make command line, but this won't happen by accident. Signed-off-by: Michal Marek <mmarek@suse.com>

Running `make modules_install` ordinarily will overwrite existing modules. This is the desired behavior, and is how pretty much every other `make install` target works. However, if CONFIG_MODULE_COMPRESS is enabled, modules are passed through gzip and xz which then do the file writing. Both gzip and xz will error out if the file already exists, unless -f is passed. This patch adds -f so that the behavior is uniform. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Michal Marek <mmarek@suse.com>

Liu Bo <bo.li.liu@oracle.com> reported a lockdep warning of delayed_iput_sem in xfstests generic/241: [ 2061.345955] ============================================= [ 2061.346027] [ INFO: possible recursive locking detected ] [ 2061.346027] 4.1.0+ #268 Tainted: G W [ 2061.346027] --------------------------------------------- [ 2061.346027] btrfs-cleaner/3045 is trying to acquire lock: [ 2061.346027] (&fs_info->delayed_iput_sem){++++..}, at: [<ffffffff814063ab>] btrfs_run_delayed_iputs+0x6b/0x100 [ 2061.346027] but task is already holding lock: [ 2061.346027] (&fs_info->delayed_iput_sem){++++..}, at: [<ffffffff814063ab>] btrfs_run_delayed_iputs+0x6b/0x100 [ 2061.346027] other info that might help us debug this: [ 2061.346027] Possible unsafe locking scenario: [ 2061.346027] CPU0 [ 2061.346027] ---- [ 2061.346027] lock(&fs_info->delayed_iput_sem); [ 2061.346027] lock(&fs_info->delayed_iput_sem); [ 2061.346027] *** DEADLOCK *** It is rarely happened, about 1/400 in my test env. The reason is recursion of btrfs_run_delayed_iputs(): cleaner_kthread -> btrfs_run_delayed_iputs() *1 -> get delayed_iput_sem lock *2 -> iput() -> ... -> btrfs_commit_transaction() -> btrfs_run_delayed_iputs() *1 -> get delayed_iput_sem lock (dead lock) *2 *1: recursion of btrfs_run_delayed_iputs() *2: warning of lockdep about delayed_iput_sem When fs is in high stress, new iputs may added into fs_info->delayed_iputs list when btrfs_run_delayed_iputs() is running, which cause second btrfs_run_delayed_iputs() run into down_read(&fs_info->delayed_iput_sem) again, and cause above lockdep warning. Actually, it will not cause real problem because both locks are read lock, but to avoid lockdep warning, we can do a fix. Fix: Don't do btrfs_run_delayed_iputs() in btrfs_commit_transaction() for cleaner_kthread thread to break above recursion path. cleaner_kthread is calling btrfs_run_delayed_iputs() explicitly in code, and don't need to call btrfs_run_delayed_iputs() again in btrfs_commit_transaction(), it also give us a bonus to avoid stack overflow. Test: No above lockdep warning after patch in 1200 generic/241 tests. Reported-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>

…_tree_block() fail When read_tree_block() failed, we can see following dmesg: [ 134.371389] BUG: unable to handle kernel NULL pointer dereference at 0000000000000063 [ 134.372236] IP: [<ffffffff813a4a51>] free_extent_buffer+0x21/0x90 [ 134.372236] PGD 0 [ 134.372236] Oops: 0000 [#1] SMP [ 134.372236] Modules linked in: [ 134.372236] CPU: 0 PID: 2289 Comm: mount Not tainted 4.2.0-rc1_HEAD_c65b99f046843d2455aa231747b5a07a999a9f3d_+ #115 [ 134.372236] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014 [ 134.372236] task: ffff88003b6e1a00 ti: ffff880011e60000 task.ti: ffff880011e60000 [ 134.372236] RIP: 0010:[<ffffffff813a4a51>] [<ffffffff813a4a51>] free_extent_buffer+0x21/0x90 ... [ 134.372236] Call Trace: [ 134.372236] [<ffffffff81379aa1>] free_root_extent_buffers+0x91/0xb0 [ 134.372236] [<ffffffff81379c3d>] free_root_pointers+0x17d/0x190 [ 134.372236] [<ffffffff813801b0>] open_ctree+0x1ca0/0x25b0 [ 134.372236] [<ffffffff8144d017>] ? disk_name+0x97/0xb0 [ 134.372236] [<ffffffff813558aa>] btrfs_mount+0x8fa/0xab0 ... Reason: read_tree_block() changed to return error number on fail, and this value(not NULL) is set to tree_root->node, then subsequent code will run to: free_root_pointers() ->free_root_extent_buffers() ->free_extent_buffer() ->atomic_read((extent_buffer *)(-E_XXX)->refs); and trigger above error. Fix: Set tree_root->node to NULL on fail to make error_handle code happy. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>

sorry I indented to use btrfs_err() and I have no idea how btrfs_error() got there. infact I was thinking about these kind of oversights since these two func are too closely named. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com>

Omar reported that after commit 4fbcdf6 ("Btrfs: fix -ENOSPC when finishing block group creation"), introduced in 4.2-rc1, the following test was failing due to exhaustion of the system array in the superblock: #!/bin/bash truncate -s 100T big.img mkfs.btrfs big.img mount -o loop big.img /mnt/loop num=5 sz=10T for ((i = 0; i < $num; i++)); do echo fallocate $i $sz fallocate -l $sz /mnt/loop/testfile$i done btrfs filesystem sync /mnt/loop for ((i = 0; i < $num; i++)); do echo rm $i rm /mnt/loop/testfile$i btrfs filesystem sync /mnt/loop done umount /mnt/loop This made btrfs_add_system_chunk() fail with -EFBIG due to excessive allocation of system block groups. This happened because the test creates a large number of data block groups per transaction and when committing the transaction we start the writeout of the block group caches for all the new new (dirty) block groups, which results in pre-allocating space for each block group's free space cache using the same transaction handle. That in turn often leads to creation of more block groups, and all get attached to the new_bgs list of the same transaction handle to the point of getting a list with over 1500 elements, and creation of new block groups leads to the need of reserving space in the chunk block reserve and often creating a new system block group too. So that made us quickly exhaust the chunk block reserve/system space info, because as of the commit mentioned before, we do reserve space for each new block group in the chunk block reserve, unlike before where we would not and would at most allocate one new system block group and therefore would only ensure that there was enough space in the system space info to allocate 1 new block group even if we ended up allocating thousands of new block groups using the same transaction handle. That worked most of the time because the computed required space at check_system_chunk() is very pessimistic (assumes a chunk tree height of BTRFS_MAX_LEVEL/8 and that all nodes/leafs in a path will be COWed and split) and since the updates to the chunk tree all happen at btrfs_create_pending_block_groups it is unlikely that a path needs to be COWed more than once (unless writepages() for the btree inode is called by mm in between) and that compensated for the need of creating any new nodes/leads in the chunk tree. So fix this by ensuring we don't accumulate a too large list of new block groups in a transaction's handles new_bgs list, inserting/updating the chunk tree for all accumulated new block groups and releasing the unused space from the chunk block reserve whenever the list becomes sufficiently large. This is a generic solution even though the problem currently can only happen when starting the writeout of the free space caches for all dirty block groups (btrfs_start_dirty_block_groups()). Reported-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Tested-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Chris Mason <clm@fb.com>

@offset

…able The existing code stores the amount of memory allocated for a TCE table. At the moment it uses @offset which is a virtual offset in the TCE table which is only correct for a one level tables and it does not include memory allocated for intermediate levels. When multilevel TCE table is requested, WARN_ON in tce_iommu_create_table() prints a warning. This adds an additional counter to pnv_pci_ioda2_table_do_alloc_pages() to count actually allocated memory. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

…nconsistencies raid1_end_read_request() assumes that the In_sync bits are consistent with the ->degaded count. raid1_spare_active updates the In_sync bit before the ->degraded count and so exposes an inconsistency, as does error() So extend the spinlock in raid1_spare_active() and error() to hide those inconsistencies. This should probably be part of Commit: 34cab6f ("md/raid1: fix test for 'was read error from last working device'.") as it addresses the same issue. It fixes the same bug and should go to -stable for same reasons. Fixes: 7607305 ("md/raid1: clean up read_balance.") Cc: stable@vger.kernel.org (v3.0+) Signed-off-by: NeilBrown <neilb@suse.com>

In drivers/md/md.c get_bitmap_file() uses kmalloc() for creating a mdu_bitmap_file_t called "file". 5769 file = kmalloc(sizeof(*file), GFP_NOIO); 5770 if (!file) 5771 return -ENOMEM; This structure is copied to user space at the end of the function. 5786 if (err == 0 && 5787 copy_to_user(arg, file, sizeof(*file))) 5788 err = -EFAULT But if bitmap is disabled only the first byte of "file" is initialized with zero, so it's possible to read some bytes (up to 4095) of kernel space memory from user space. This is an information leak. 5775 /* bitmap disabled, zero the first byte and copy out */ 5776 if (!mddev->bitmap_info.file) 5777 file->pathname[0] = '\0'; Signed-off-by: Benjamin Randazzo <benjamin@randazzo.fr> Signed-off-by: NeilBrown <neilb@suse.com>

I have a report of drop_one_stripe() called from raid5_cache_scan() apparently finding ->max_nr_stripes == 0. This should not be allowed. So add a test to keep max_nr_stripes above min_nr_stripes. Also use a 'mask' rather than a 'mod' in drop_one_stripe to ensure 'hash' is valid even if max_nr_stripes does reach zero. Fixes: edbe83a ("md/raid5: allow the stripe_cache to grow and shrink.") Cc: stable@vger.kernel.org (4.1 - please release with 2d5b569) Reported-by: Tomas Papan <tomas.papan@gmail.com> Signed-off-by: NeilBrown <neilb@suse.com>

Without this patch YAMA will not work at all if it is chosen as the primary LSM instead of being "stacked". Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: James Morris <james.l.morris@oracle.com>

…nel/git/rusty/linux Pull module fix from Rusty Russell: "Single overzealous locking assertion fix" * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: module: weaken locking assertion for oops path.

Pull crypto fixes from Herbert Xu: "This fixes the following issues: - a bogus BUG_ON in ixp4xx that can be triggered by a dst buffer that is an SG list. - the error handling in hwrngd may cause a crash in case of an error. - fix a race condition in qat registration when multiple devices are present" * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: hwrng: core - correct error check of kthread_run call crypto: ixp4xx - Remove bogus BUG_ON on scattered dst buffer crypto: qat - Fix invalid synchronization between register/unregister sym algs

…/git/jmorris/linux-security Pull security layer fix from James Morris: "Yama initialization fix" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: Adding YAMA hooks also when YAMA is not stacked.

…/git/sage/ceph-client Pull Ceph fixes from Sage Weil: "There are two critical regression fixes for CephFS from Zheng, and an RBD completion fix for layered images from Ilya" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: rbd: fix copyup completion race ceph: always re-send cap flushes when MDS recovers ceph: fix ceph_encode_locks_to_buffer()

…kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "A refcounting bugfix for the i2c-core, bugfixes for the generic bus recovery algorithm and for its omap-user, making binary file attributes for EEPROMs behave POSIX compliant, and a small typo fix while we are here" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: fix leaked device refcount on of_find_i2c_* error path i2c: Fix typo in i2c-bfin-twi.c i2c: omap: fix bus recovery setup i2c: core: only use set_scl for bus recovery after calling prepare_recovery misc: eeprom: at24: clean up at24_bin_write() i2c: slave eeprom: clean up sysfs bin attribute read()/write()

With legacy helpers all the routing was already set up when calling best_encoder and so could be inspected. But with atomic it's staged, hence we need a new atomic compliant callback for drivers which need to inspect the requested state and can't just decided the best encoder statically. This is needed to fix up i915 dp mst where we need to pick the right encoder depending upon the requested CRTC for the connector. v2: Don't forget to amend the kerneldoc Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Theodore Ts'o <tytso@mit.edu> Acked-by: Thierry Reding <treding@nvidia.com> Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>

In commit 8c7b5cc Author: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com> Date: Tue Apr 21 17:13:19 2015 +0300 drm/i915: Use atomic helpers for computing changed flags we've switched over to the atomic version to compute the crtc->encoder->connector routing from the i915 variant. That one relies upon the ->best_encoder callback, but the i915-private version relied upon intel_find_encoder. Which didn't matter except for dp mst, where the encoder depends upon the selected crtc. Fix this functional bug by implemented a correct atomic-state based encoder selector for dp mst. Note that we can't get rid of the legacy best_encoder callback since the fbdev emulation uses that still. That means it's incorrect there still, but that's been the case ever since i915 dp mst support was merged so not a regression. Best to fix that by converting fbdev over to atomic too. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>

Apparently been in there since forever and fairly easy to hit when hotplugging really fast. I can do that since my mst hub has a manual button to flick the hpd line for reprobing. The resulting WARNING spam isn't pretty. Cc: Dave Airlie <airlied@gmail.com> Cc: stable@vger.kernel.org Reviewed-by: Thierry Reding <treding@nvidia.com> Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>

We've had a few issues with atomic where subtle bugs in the encoder picking logic lead to accidental self-stealing of the encoder, resulting in a NULL connector_state->crtc in update_connector_routing and subsequent. Linus applied some duct-tape for an mst regression in commit 27667f4 Author: Linus Torvalds <torvalds@linux-foundation.org> Date: Wed Jul 29 22:18:16 2015 -0700 i915: temporary fix for DP MST docking station NULL pointer dereference But that was incomplete (the code will still oops when debuggin is enabled) and mangled the state even further. So instead WARN and bail out as the more future-proof option. Cc: Theodore Ts'o <tytso@mit.edu> Cc: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Thierry Reding <treding@nvidia.com> Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>

…git/mmarek/kbuild Pull kbuild fixes from Michal Marek: "Two fixes for kbuild: - The new ARCH_{CPP,A,C}FLAGS variables are reset before including the arch Makefile - Fix calling make modules_install twice when module compression is enabled" * 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: Makefile: Force gzip and xz on module install kbuild: Do not pick up ARCH_{CPP,A,C}FLAGS from the environment

An event channel bound to a CPU that was offlined may still be linked on that CPU's queue. If this event channel is closed and reused, subsequent events will be lost because the event channel is never unlinked and thus cannot be linked onto the correct queue. When a channel is closed and the event is still linked into a queue, ensure that it is unlinked before completing. If the CPU to which the event channel bound is online, spin until the event is handled by that CPU. If that CPU is offline, it can't handle the event, so clear the event queue during the close, dropping the events. This fixes the missing interrupts (and subsequent disk stalls etc.) when offlining a CPU. Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Cc: <stable@vger.kernel.org> Signed-off-by: David Vrabel <david.vrabel@citrix.com>

…ux/kernel/git/xen/tip Pull xen bug fixes from David Vrabel: - don't lose interrupts when offlining CPUs - fix gntdev oops during unmap - drop the balloon lock occasionally to allow domain create/destroy * tag 'for-linus-4.2-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/events/fifo: Handle linked events when closing a port xen: release lock occasionally during ballooning xen/gntdevt: Fix race condition in gntdev_release()

…rg/drm-intel Pull drm mst fixes from Daniel Vetter: "Special pull request for mst fixes since most of the patches touch code outside of i915 proper. DRM parts have also been reviewed by Thierry (nvidia) since Dave's enjoying vacations" * tag 'topic/mst-fixes-2015-08-04' of git://anongit.freedesktop.org/drm-intel: drm/atomic-helpers: Make encoder picking more robust drm/dp-mst: Remove debug WARN_ON drm/i915: Fixup dp mst encoder selection drm/atomic-helper: Add an atomice best_encoder callback

…rnel/git/helgaas/pci Pull PCI fix from Bjorn Helgaas: "This is a trivial fix for a change that broke user program compilation (QEMU in this case)" * tag 'pci-v4.2-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI: Restore PCI_MSIX_FLAGS_BIRMASK definition

Nikolay has reported a hang when a memcg reclaim got stuck with the following backtrace: PID: 18308 TASK: ffff883d7c9b0a30 CPU: 1 COMMAND: "rsync" #0 __schedule at ffffffff815ab152 #1 schedule at ffffffff815ab76e #2 schedule_timeout at ffffffff815ae5e5 #3 io_schedule_timeout at ffffffff815aad6a #4 bit_wait_io at ffffffff815abfc6 #5 __wait_on_bit at ffffffff815abda5 #6 wait_on_page_bit at ffffffff8111fd4f #7 shrink_page_list at ffffffff81135445 #8 shrink_inactive_list at ffffffff81135845 #9 shrink_lruvec at ffffffff81135ead #10 shrink_zone at ffffffff811360c3 #11 shrink_zones at ffffffff81136eff #12 do_try_to_free_pages at ffffffff8113712f #13 try_to_free_mem_cgroup_pages at ffffffff811372be #14 try_charge at ffffffff81189423 #15 mem_cgroup_try_charge at ffffffff8118c6f5 #16 __add_to_page_cache_locked at ffffffff8112137d #17 add_to_page_cache_lru at ffffffff81121618 #18 pagecache_get_page at ffffffff8112170b #19 grow_dev_page at ffffffff811c8297 #20 __getblk_slow at ffffffff811c91d6 #21 __getblk_gfp at ffffffff811c92c1 #22 ext4_ext_grow_indepth at ffffffff8124565c #23 ext4_ext_create_new_leaf at ffffffff81246ca8 #24 ext4_ext_insert_extent at ffffffff81246f09 #25 ext4_ext_map_blocks at ffffffff8124a848 #26 ext4_map_blocks at ffffffff8121a5b7 #27 mpage_map_one_extent at ffffffff8121b1fa #28 mpage_map_and_submit_extent at ffffffff8121f07b #29 ext4_writepages at ffffffff8121f6d5 #30 do_writepages at ffffffff8112c490 #31 __filemap_fdatawrite_range at ffffffff81120199 #32 filemap_flush at ffffffff8112041c #33 ext4_alloc_da_blocks at ffffffff81219da1 #34 ext4_rename at ffffffff81229b91 #35 ext4_rename2 at ffffffff81229e32 #36 vfs_rename at ffffffff811a08a5 #37 SYSC_renameat2 at ffffffff811a3ffc #38 sys_renameat2 at ffffffff811a408e #39 sys_rename at ffffffff8119e51e #40 system_call_fastpath at ffffffff815afa89 Dave Chinner has properly pointed out that this is a deadlock in the reclaim code because ext4 doesn't submit pages which are marked by PG_writeback right away. The heuristic was introduced by commit e62e384 ("memcg: prevent OOM with too many dirty pages") and it was applied only when may_enter_fs was specified. The code has been changed by c3b94f4 ("memcg: further prevent OOM with too many dirty pages") which has removed the __GFP_FS restriction with a reasoning that we do not get into the fs code. But this is not sufficient apparently because the fs doesn't necessarily submit pages marked PG_writeback for IO right away. ext4_bio_write_page calls io_submit_add_bh but that doesn't necessarily submit the bio. Instead it tries to map more pages into the bio and mpage_map_one_extent might trigger memcg charge which might end up waiting on a page which is marked PG_writeback but hasn't been submitted yet so we would end up waiting for something that never finishes. Fix this issue by replacing __GFP_IO by may_enter_fs check (for case 2) before we go to wait on the writeback. The page fault path, which is the only path that triggers memcg oom killer since 3.12, shouldn't require GFP_NOFS and so we shouldn't reintroduce the premature OOM killer issue which was originally addressed by the heuristic. As per David Chinner the xfs is doing similar thing since 2.6.15 already so ext4 is not the only affected filesystem. Moreover he notes: : For example: IO completion might require unwritten extent conversion : which executes filesystem transactions and GFP_NOFS allocations. The : writeback flag on the pages can not be cleared until unwritten : extent conversion completes. Hence memory reclaim cannot wait on : page writeback to complete in GFP_NOFS context because it is not : safe to do so, memcg reclaim or otherwise. Cc: stable@vger.kernel.org # 3.9+ [tytso@mit.edu: corrected the control flow] Fixes: c3b94f4 ("memcg: further prevent OOM with too many dirty pages") Reported-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Pull nfsd fixes from Bruce Fields. * 'for-4.2' of git://linux-nfs.org/~bfields/linux: nfsd: do nfs4_check_fh in nfs4_check_file instead of nfs4_check_olstateid nfsd: Fix a file leak on nfsd4_layout_setlease failure nfsd: Drop BUG_ON and ignore SECLABEL on absent filesystem

Pull md fixes from Neil Brown: "Three more fixes for md in 4.2 Mostly corner-case stuff. One of these patches is for a CVE: CVE-2015-5697 I'm not convinced it is serious (data leak from CAP_SYS_ADMIN ioctl) but as people seem to want to back-port it, I've included a minimal version here. The remainder of that patch from Benjamin is code-cleanup and will arrive in the 4.3 merge window" * tag 'md/4.2-rc5-fixes' of git://neil.brown.name/md: md/raid5: don't let shrink_slab shrink too far. md: use kzalloc() when bitmap is disabled md/raid1: extend spinlock to protect raid1_end_read_request against inconsistencies

Sync up with Linus

fdo#70354 - comment #88. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

There is at least one Chelsio 10Gb card which uses VPD area to store some non-standard blocks (example below). However pci_vpd_size() returns the length of the first block only assuming that there can be only one VPD "End Tag". Since 4e1a635 ("vfio/pci: Use kernel VPD access functions"), VFIO blocks access beyond that offset, which prevents the guest "cxgb3" driver from probing the device. The host system does not have this problem as its driver accesses the config space directly without pci_read_vpd(). Add a quirk to override the VPD size to a bigger value. The maximum size is taken from EEPROMSIZE in drivers/net/ethernet/chelsio/cxgb3/common.h. We do not read the tag as the cxgb3 driver does as the driver supports writing to EEPROM/VPD and when it writes, it only checks for 8192 bytes boundary. The quirk is registered for all devices supported by the cxgb3 driver. This adds a quirk to the PCI layer (not to the cxgb3 driver) as the cxgb3 driver itself accesses VPD directly and the problem only exists with the vfio-pci driver (when cxgb3 is not running on the host and may not be even loaded) which blocks accesses beyond the first block of VPD data. However vfio-pci itself does not have quirks mechanism so we add it to PCI. This is the controller: Ethernet controller [0200]: Chelsio Communications Inc T310 10GbE Single Port Adapter [1425:0030] This is what I parsed from its VPD: === b'\x82*\x0010 Gigabit Ethernet-SR PCI Express Adapter\x90J\x00EC\x07D76809 FN\x0746K' 0000 Large item 42 bytes; name 0x2 Identifier String b'10 Gigabit Ethernet-SR PCI Express Adapter' 002d Large item 74 bytes; name 0x10 #00 [EC] len=7: b'D76809 ' #0a [FN] len=7: b'46K7897' #14 [PN] len=7: b'46K7897' #1e [MN] len=4: b'1037' #25 [FC] len=4: b'5769' #2c [SN] len=12: b'YL102035603V' #3b [NA] len=12: b'00145E992ED1' 007a Small item 1 bytes; name 0xf End Tag 0c00 Large item 16 bytes; name 0x2 Identifier String b'S310E-SR-X ' 0c13 Large item 234 bytes; name 0x10 #00 [PN] len=16: b'TBD ' #13 [EC] len=16: b'110107730D2 ' #26 [SN] len=16: b'97YL102035603V ' #39 [NA] len=12: b'00145E992ED1' #48 [V0] len=6: b'175000' #51 [V1] len=6: b'266666' #5a [V2] len=6: b'266666' #63 [V3] len=6: b'2000 ' #6c [V4] len=2: b'1 ' #71 [V5] len=6: b'c2 ' #7a [V6] len=6: b'0 ' #83 [V7] len=2: b'1 ' #88 [V8] len=2: b'0 ' #8d [V9] len=2: b'0 ' #92 [VA] len=2: b'0 ' #97 [RV] len=80: b's\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'... 0d00 Large item 252 bytes; name 0x11 #00 [VC] len=16: b'122310_1222 dp ' #13 [VD] len=16: b'610-0001-00 H1\x00\x00' #26 [VE] len=16: b'122310_1353 fp ' #39 [VF] len=16: b'610-0001-00 H1\x00\x00' #4c [RW] len=173: b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'... 0dff Small item 0 bytes; name 0xf End Tag 10f3 Large item 13315 bytes; name 0x62 !!! unknown item name 98: b'\xd0\x03\x00@`\x0c\x08\x00\x00\x00\x00\x00\x00\x00\x00\x00' === Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

sysbot/KMSAN reported an uninit-value in recvmsg() that I tracked down to tipc_sk_set_orig_addr(), missing srcaddr->member.scope initialization. This patches moves srcaddr->sock.scope init to follow fields order and ease future verifications. BUG: KMSAN: uninit-value in copy_to_user include/linux/uaccess.h:184 [inline] BUG: KMSAN: uninit-value in move_addr_to_user+0x32e/0x530 net/socket.c:226 CPU: 0 PID: 4549 Comm: syz-executor287 Not tainted 4.17.0-rc3+ #88 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x185/0x1d0 lib/dump_stack.c:113 kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067 kmsan_internal_check_memory+0x135/0x1e0 mm/kmsan/kmsan.c:1157 kmsan_copy_to_user+0x69/0x160 mm/kmsan/kmsan.c:1199 copy_to_user include/linux/uaccess.h:184 [inline] move_addr_to_user+0x32e/0x530 net/socket.c:226 ___sys_recvmsg+0x4e2/0x810 net/socket.c:2285 __sys_recvmsg net/socket.c:2328 [inline] __do_sys_recvmsg net/socket.c:2338 [inline] __se_sys_recvmsg net/socket.c:2335 [inline] __x64_sys_recvmsg+0x325/0x460 net/socket.c:2335 do_syscall_64+0x154/0x220 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x4455e9 RSP: 002b:00007fe3bd36ddb8 EFLAGS: 00000246 ORIG_RAX: 000000000000002f RAX: ffffffffffffffda RBX: 00000000006dac24 RCX: 00000000004455e9 RDX: 0000000000002002 RSI: 0000000020000400 RDI: 0000000000000003 RBP: 00000000006dac20 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007fff98ce4b6f R14: 00007fe3bd36e9c0 R15: 0000000000000003 Local variable description: ----addr@___sys_recvmsg Variable was created at: ___sys_recvmsg+0xd5/0x810 net/socket.c:2246 __sys_recvmsg net/socket.c:2328 [inline] __do_sys_recvmsg net/socket.c:2338 [inline] __se_sys_recvmsg net/socket.c:2335 [inline] __x64_sys_recvmsg+0x325/0x460 net/socket.c:2335 Byte 19 of 32 is uninitialized Fixes: 31c82a2 ("tipc: add second source address to recvmsg()/recvfrom()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Jon Maloy <jon.maloy@ericsson.com> Cc: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>

brd_free() may be called in failure path on one brd instance which disk isn't added yet, so release handler of gendisk may free the associated request_queue early and causes the following use-after-free[1]. This patch fixes this issue by associating gendisk with request_queue just before adding disk. [1] KASAN: use-after-free Read in del_timer_syncNon-volatile memory driver v1.3 Linux agpgart interface v0.103 [drm] Initialized vgem 1.0.0 20120112 for virtual device on minor 0 usbcore: registered new interface driver udl ================================================================== BUG: KASAN: use-after-free in __lock_acquire+0x36d9/0x4c20 kernel/locking/lockdep.c:3218 Read of size 8 at addr ffff8801d1b6b540 by task swapper/0/1 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.0+ #88 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x244/0x39d lib/dump_stack.c:113 print_address_description.cold.7+0x9/0x1ff mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report.cold.8+0x242/0x309 mm/kasan/report.c:412 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433 __lock_acquire+0x36d9/0x4c20 kernel/locking/lockdep.c:3218 lock_acquire+0x1ed/0x520 kernel/locking/lockdep.c:3844 del_timer_sync+0xb7/0x270 kernel/time/timer.c:1283 blk_cleanup_queue+0x413/0x710 block/blk-core.c:809 brd_free+0x5d/0x71 drivers/block/brd.c:422 brd_init+0x2eb/0x393 drivers/block/brd.c:518 do_one_initcall+0x145/0x957 init/main.c:890 do_initcall_level init/main.c:958 [inline] do_initcalls init/main.c:966 [inline] do_basic_setup init/main.c:984 [inline] kernel_init_freeable+0x5c6/0x6b9 init/main.c:1148 kernel_init+0x11/0x1ae init/main.c:1068 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:350 Reported-by: syzbot+3701447012fe951dabb2@syzkaller.appspotmail.com Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>

When vb2_buffer_done is called the buffer is unbound from the request and put. The media_request_object_put also 'put's the request reference. If the application has already closed the request fd, then that means that the request reference at that point goes to 0 and the whole request is released. This means that the control handler associated with the request is also freed and that causes this kernel oops: [174705.995401] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:908 [174705.995411] in_atomic(): 1, irqs_disabled(): 1, pid: 28071, name: vivid-000-vid-o [174705.995416] 2 locks held by vivid-000-vid-o/28071: [174705.995420] #0: 000000001ea3a232 (&dev->mutex#3){....}, at: vivid_thread_vid_out+0x3f5/0x550 [vivid] [174705.995447] #1: 00000000e30a0d1e (&(&q->done_lock)->rlock){....}, at: vb2_buffer_done+0x92/0x1d0 [videobuf2_common] [174705.995460] Preemption disabled at: [174705.995461] [<0000000000000000>] (null) [174705.995472] CPU: 11 PID: 28071 Comm: vivid-000-vid-o Tainted: G W 4.20.0-rc1-test-no #88 [174705.995476] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017 [174705.995481] Call Trace: [174705.995500] dump_stack+0x46/0x60 [174705.995512] ___might_sleep.cold.79+0xe1/0xf1 [174705.995523] __mutex_lock+0x50/0x8f0 [174705.995531] ? find_held_lock+0x2d/0x90 [174705.995536] ? find_held_lock+0x2d/0x90 [174705.995542] ? find_held_lock+0x2d/0x90 [174705.995564] ? v4l2_ctrl_handler_free.part.13+0x44/0x1d0 [videodev] [174705.995576] v4l2_ctrl_handler_free.part.13+0x44/0x1d0 [videodev] [174705.995590] v4l2_ctrl_request_release+0x1c/0x30 [videodev] [174705.995600] media_request_clean+0x64/0xe0 [media] [174705.995609] media_request_release+0x19/0x40 [media] [174705.995617] vb2_buffer_done+0xef/0x1d0 [videobuf2_common] [174705.995630] vivid_thread_vid_out+0x2c1/0x550 [vivid] [174705.995645] ? vivid_stop_generating_vid_cap+0x1c0/0x1c0 [vivid] [174705.995653] kthread+0x113/0x130 [174705.995659] ? kthread_park+0x80/0x80 [174705.995667] ret_from_fork+0x35/0x40 The vb2_buffer_done function can be called from interrupt context, so anything that sleeps is not allowed. The solution is to increment the request refcount when the buffer is queued and decrement it when the buffer is dequeued. Releasing the request is fine if that happens from VIDIOC_DQBUF. Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>

syzbot reported: BUG: KMSAN: uninit-value in tipc_conn_rcv_sub+0x184/0x950 net/tipc/topsrv.c:373 CPU: 0 PID: 66 Comm: kworker/u4:4 Not tainted 4.17.0-rc3+ #88 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: tipc_rcv tipc_conn_recv_work Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x185/0x1d0 lib/dump_stack.c:113 kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067 __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:683 tipc_conn_rcv_sub+0x184/0x950 net/tipc/topsrv.c:373 tipc_conn_rcv_from_sock net/tipc/topsrv.c:409 [inline] tipc_conn_recv_work+0x3cd/0x560 net/tipc/topsrv.c:424 process_one_work+0x12c6/0x1f60 kernel/workqueue.c:2145 worker_thread+0x113c/0x24f0 kernel/workqueue.c:2279 kthread+0x539/0x720 kernel/kthread.c:239 ret_from_fork+0x35/0x40 arch/x86/entry/entry_64.S:412 Local variable description: ----s.i@tipc_conn_recv_work Variable was created at: tipc_conn_recv_work+0x65/0x560 net/tipc/topsrv.c:419 process_one_work+0x12c6/0x1f60 kernel/workqueue.c:2145 In tipc_conn_rcv_from_sock(), it always supposes the length of message received from sock_recvmsg() is not smaller than the size of struct tipc_subscr. However, this assumption is false. Especially when the length of received message is shorter than struct tipc_subscr size, we will end up touching uninitialized fields in tipc_conn_rcv_sub(). Reported-by: syzbot+8951a3065ee7fd6d6e23@syzkaller.appspotmail.com Reported-by: syzbot+75e6e042c5bbf691fc82@syzkaller.appspotmail.com Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>

syzbot was able to trigger another soft lockup [1] I first thought it was the O(N^2) issue I mentioned in my prior fix (f657d22ee1f "net/x25: do not hold the cpu too long in x25_new_lci()"), but I eventually found that x25_bind() was not checking SOCK_ZAPPED state under socket lock protection. This means that multiple threads can end up calling x25_insert_socket() for the same socket, and corrupt x25_list [1] watchdog: BUG: soft lockup - CPU#0 stuck for 123s! [syz-executor.2:10492] Modules linked in: irq event stamp: 27515 hardirqs last enabled at (27514): [<ffffffff81006673>] trace_hardirqs_on_thunk+0x1a/0x1c hardirqs last disabled at (27515): [<ffffffff8100668f>] trace_hardirqs_off_thunk+0x1a/0x1c softirqs last enabled at (32): [<ffffffff8632ee73>] x25_get_neigh+0xa3/0xd0 net/x25/x25_link.c:336 softirqs last disabled at (34): [<ffffffff86324bc3>] x25_find_socket+0x23/0x140 net/x25/af_x25.c:341 CPU: 0 PID: 10492 Comm: syz-executor.2 Not tainted 5.0.0-rc7+ #88 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__sanitizer_cov_trace_pc+0x4/0x50 kernel/kcov.c:97 Code: f4 ff ff ff e8 11 9f ea ff 48 c7 05 12 fb e5 08 00 00 00 00 e9 c8 e9 ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 55 48 89 e5 <48> 8b 75 08 65 48 8b 04 25 40 ee 01 00 65 8b 15 38 0c 92 7e 81 e2 RSP: 0018:ffff88806e94fc48 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff13 RAX: 1ffff1100d84dac5 RBX: 0000000000000001 RCX: ffffc90006197000 RDX: 0000000000040000 RSI: ffffffff86324bf3 RDI: ffff88806c26d628 RBP: ffff88806e94fc48 R08: ffff88806c1c6500 R09: fffffbfff1282561 R10: fffffbfff1282560 R11: ffffffff89412b03 R12: ffff88806c26d628 R13: ffff888090455200 R14: dffffc0000000000 R15: 0000000000000000 FS: 00007f3a107e4700(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f3a107e3db8 CR3: 00000000a5544000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __x25_find_socket net/x25/af_x25.c:327 [inline] x25_find_socket+0x7d/0x140 net/x25/af_x25.c:342 x25_new_lci net/x25/af_x25.c:355 [inline] x25_connect+0x380/0xde0 net/x25/af_x25.c:784 __sys_connect+0x266/0x330 net/socket.c:1662 __do_sys_connect net/socket.c:1673 [inline] __se_sys_connect net/socket.c:1670 [inline] __x64_sys_connect+0x73/0xb0 net/socket.c:1670 do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x457e29 Code: ad b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f3a107e3c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002a RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457e29 RDX: 0000000000000012 RSI: 0000000020000200 RDI: 0000000000000005 RBP: 000000000073c040 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f3a107e46d4 R13: 00000000004be362 R14: 00000000004ceb98 R15: 00000000ffffffff Sending NMI from CPU 0 to CPUs 1: NMI backtrace for cpu 1 CPU: 1 PID: 10493 Comm: syz-executor.3 Not tainted 5.0.0-rc7+ #88 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__read_once_size include/linux/compiler.h:193 [inline] RIP: 0010:queued_write_lock_slowpath+0x143/0x290 kernel/locking/qrwlock.c:86 Code: 4c 8d 2c 01 41 83 c7 03 41 0f b6 45 00 41 38 c7 7c 08 84 c0 0f 85 0c 01 00 00 8b 03 3d 00 01 00 00 74 1a f3 90 41 0f b6 55 00 <41> 38 d7 7c eb 84 d2 74 e7 48 89 df e8 cc aa 4e 00 eb dd be 04 00 RSP: 0018:ffff888085c47bd8 EFLAGS: 00000206 RAX: 0000000000000300 RBX: ffffffff89412b00 RCX: 1ffffffff1282560 RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff89412b00 RBP: ffff888085c47c70 R08: 1ffffffff1282560 R09: fffffbfff1282561 R10: fffffbfff1282560 R11: ffffffff89412b03 R12: 00000000000000ff R13: fffffbfff1282560 R14: 1ffff11010b88f7d R15: 0000000000000003 FS: 00007fdd04086700(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fdd04064db8 CR3: 0000000090be0000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: queued_write_lock include/asm-generic/qrwlock.h:104 [inline] do_raw_write_lock+0x1d6/0x290 kernel/locking/spinlock_debug.c:203 __raw_write_lock_bh include/linux/rwlock_api_smp.h:204 [inline] _raw_write_lock_bh+0x3b/0x50 kernel/locking/spinlock.c:312 x25_insert_socket+0x21/0xe0 net/x25/af_x25.c:267 x25_bind+0x273/0x340 net/x25/af_x25.c:703 __sys_bind+0x23f/0x290 net/socket.c:1481 __do_sys_bind net/socket.c:1492 [inline] __se_sys_bind net/socket.c:1490 [inline] __x64_sys_bind+0x73/0xb0 net/socket.c:1490 do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x457e29 Fixes: 90c2729 ("X.25 remove bkl in bind") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: andrew hendry <andrew.hendry@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>

Alex Gartrell and others added 30 commits July 14, 2015 16:41

ipvs: fix crash with sync protocol v0 and FTP

5618485

Fix crash in 3.5+ if FTP is used after switching sync_version to 0. Fixes: 749c42b ("ipvs: reduce sync rate with time thresholds") Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: call skb_sender_cpu_clear

e3895c0

Reset XPS's sender_cpu on forwarding. Signed-off-by: Julian Anastasov <ja@ssi.bg> Fixes: 2bd8248 ("xps: fix xps for stacked devices") Signed-off-by: Simon Horman <horms@verge.net.au>

nfsd: Fix a file leak on nfsd4_layout_setlease failure

1ca4b88

If nfsd4_layout_setlease fails, nfsd will not put ls->ls_file. Fix commit c5c707f "nfsd: implement pNFS layout recalls". Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>

NeilBrown and others added 21 commits August 3, 2015 12:29

dabrace added a commit that referenced this pull request Aug 5, 2015

Merge pull request #88 from torvalds/master

eef8ff9

Sync up with Linus

dabrace merged commit eef8ff9 into dabrace:master Aug 5, 2015

dabrace pushed a commit that referenced this pull request Dec 2, 2015

drm/nouveau/pci: enable c800 magic for some unknown Samsung laptop

c294a05

fdo#70354 - comment #88. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sync up with Linus #88

Sync up with Linus #88

dabrace commented Aug 5, 2015

Sync up with Linus #88

Sync up with Linus #88

Conversation

dabrace commented Aug 5, 2015