Uwe Kleine-König [Fri, 28 Oct 2016 00:47:10 +0000 (17:47 -0700)]
cris/arch-v32: cryptocop: print a hex number after a 0x prefix
It makes the result hard to interpret correctly if a base 10 number is
prefixed by 0x. So change to a hex number.
Link: http://lkml.kernel.org/r/20161026125658.25728-6-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Uwe Kleine-König [Fri, 28 Oct 2016 00:47:07 +0000 (17:47 -0700)]
ipack: print a hex number after a 0x prefix
It makes the result hard to interpret correctly if a base 10 number is
prefixed by 0x. So change to a hex number.
Link: http://lkml.kernel.org/r/20161026125658.25728-4-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Cc: Jens Taprogge <jens.taprogge@taprogge.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Uwe Kleine-König [Fri, 28 Oct 2016 00:47:04 +0000 (17:47 -0700)]
block: DAC960: print a hex number after a 0x prefix
It makes the message hard to interpret correctly if a base 10 number is
prefixed by 0x. So change to a hex number.
Link: http://lkml.kernel.org/r/20161026125658.25728-3-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Uwe Kleine-König [Fri, 28 Oct 2016 00:47:02 +0000 (17:47 -0700)]
fs: exofs: print a hex number after a 0x prefix
It makes the message hard to interpret correctly if a base 10 number is
prefixed by 0x. So change to a hex number.
Link: http://lkml.kernel.org/r/20161026125658.25728-2-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Boaz Harrosh <ooo@electrozaur.com>
Cc: Benny Halevy <bhalevy@primarydata.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Daniel Mentz [Fri, 28 Oct 2016 00:46:59 +0000 (17:46 -0700)]
lib/genalloc.c: start search from start of chunk
gen_pool_alloc_algo() iterates over the chunks of a pool trying to find
a contiguous block of memory that satisfies the allocation request.
The shortcut
if (size > atomic_read(&chunk->avail))
continue;
makes the loop skip over chunks that do not have enough bytes left to
fulfill the request. There are two situations, though, where an
allocation might still fail:
(1) The available memory is not contiguous, i.e. the request cannot
be fulfilled due to external fragmentation.
(2) A race condition. Another thread runs the same code concurrently
and is quicker to grab the available memory.
In those situations, the loop calls pool->algo() to search the entire
chunk, and pool->algo() returns some value that is >= end_bit to
indicate that the search failed. This return value is then assigned to
start_bit. The variables start_bit and end_bit describe the range that
should be searched, and this range should be reset for every chunk that
is searched. Today, the code fails to reset start_bit to 0. As a
result, prefixes of subsequent chunks are ignored. Memory allocations
might fail even though there is plenty of room left in these prefixes of
those other chunks.
Fixes: 7f184275aa30 ("lib, Make gen_pool memory allocator lockless")
Link: http://lkml.kernel.org/r/1477420604-28918-1-git-send-email-danielmentz@google.com
Signed-off-by: Daniel Mentz <danielmentz@google.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Fri, 28 Oct 2016 00:46:56 +0000 (17:46 -0700)]
mm: memcontrol: do not recurse in direct reclaim
On 4.0, we saw a stack corruption from a page fault entering direct
memory cgroup reclaim, calling into btrfs_releasepage(), which then
tried to allocate an extent and recursed back into a kmem charge ad
nauseam:
[...]
btrfs_releasepage+0x2c/0x30
try_to_release_page+0x32/0x50
shrink_page_list+0x6da/0x7a0
shrink_inactive_list+0x1e5/0x510
shrink_lruvec+0x605/0x7f0
shrink_zone+0xee/0x320
do_try_to_free_pages+0x174/0x440
try_to_free_mem_cgroup_pages+0xa7/0x130
try_charge+0x17b/0x830
memcg_charge_kmem+0x40/0x80
new_slab+0x2d9/0x5a0
__slab_alloc+0x2fd/0x44f
kmem_cache_alloc+0x193/0x1e0
alloc_extent_state+0x21/0xc0
__clear_extent_bit+0x2b5/0x400
try_release_extent_mapping+0x1a3/0x220
__btrfs_releasepage+0x31/0x70
btrfs_releasepage+0x2c/0x30
try_to_release_page+0x32/0x50
shrink_page_list+0x6da/0x7a0
shrink_inactive_list+0x1e5/0x510
shrink_lruvec+0x605/0x7f0
shrink_zone+0xee/0x320
do_try_to_free_pages+0x174/0x440
try_to_free_mem_cgroup_pages+0xa7/0x130
try_charge+0x17b/0x830
mem_cgroup_try_charge+0x65/0x1c0
handle_mm_fault+0x117f/0x1510
__do_page_fault+0x177/0x420
do_page_fault+0xc/0x10
page_fault+0x22/0x30
On later kernels, kmem charging is opt-in rather than opt-out, and that
particular kmem allocation in btrfs_releasepage() is no longer being
charged and won't recurse and overrun the stack anymore.
But it's not impossible for an accounted allocation to happen from the
memcg direct reclaim context, and we needed to reproduce this crash many
times before we even got a useful stack trace out of it.
Like other direct reclaimers, mark tasks in memcg reclaim PF_MEMALLOC to
avoid recursing into any other form of direct reclaim. Then let
recursive charges from PF_MEMALLOC contexts bypass the cgroup limit.
Link: http://lkml.kernel.org/r/20161025141050.GA13019@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Martin Kepplinger [Fri, 28 Oct 2016 00:46:53 +0000 (17:46 -0700)]
CREDITS: update credit information for Martin Kepplinger
Content and employer changed.
Link: http://lkml.kernel.org/r/1477304102-28830-1-git-send-email-martin.kepplinger@ginzinger.com
Signed-off-by: Martin Kepplinger <martink@posteo.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Leon Yu [Fri, 28 Oct 2016 00:46:50 +0000 (17:46 -0700)]
proc: fix NULL dereference when reading /proc/<pid>/auxv
Reading auxv of any kernel thread results in NULL pointer dereferencing
in auxv_read() where mm can be NULL. Fix that by checking for NULL mm
and bailing out early. This is also the original behavior changed by
recent commit
c5317167854e ("proc: switch auxv to use of __mem_open()").
# cat /proc/2/auxv
Unable to handle kernel NULL pointer dereference at virtual address
000000a8
Internal error: Oops: 17 [#1] PREEMPT SMP ARM
CPU: 3 PID: 113 Comm: cat Not tainted 4.9.0-rc1-ARCH+ #1
Hardware name: BCM2709
task:
ea3b0b00 task.stack:
e99b2000
PC is at auxv_read+0x24/0x4c
LR is at do_readv_writev+0x2fc/0x37c
Process cat (pid: 113, stack limit = 0xe99b2210)
Call chain:
auxv_read
do_readv_writev
vfs_readv
default_file_splice_read
splice_direct_to_actor
do_splice_direct
do_sendfile
SyS_sendfile64
ret_fast_syscall
Fixes: c5317167854e ("proc: switch auxv to use of __mem_open()")
Link: http://lkml.kernel.org/r/1476966200-14457-1-git-send-email-chianglungyu@gmail.com
Signed-off-by: Leon Yu <chianglungyu@gmail.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Mateusz Guzik <mguzik@redhat.com>
Cc: Janis Danisevskis <jdanis@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Catalin Marinas [Fri, 28 Oct 2016 00:46:47 +0000 (17:46 -0700)]
mm: kmemleak: ensure that the task stack is not freed during scanning
Commit
68f24b08ee89 ("sched/core: Free the stack early if
CONFIG_THREAD_INFO_IN_TASK") may cause the task->stack to be freed
during kmemleak_scan() execution, leading to either a NULL pointer fault
(if task->stack is NULL) or kmemleak accessing already freed memory.
This patch uses the new try_get_task_stack() API to ensure that the task
stack is not freed during kmemleak stack scanning.
Addresses https://bugzilla.kernel.org/show_bug.cgi?id=173901.
Fixes: 68f24b08ee89 ("sched/core: Free the stack early if CONFIG_THREAD_INFO_IN_TASK")
Link: http://lkml.kernel.org/r/1476266223-14325-1-git-send-email-catalin.marinas@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: CAI Qian <caiqian@redhat.com>
Tested-by: CAI Qian <caiqian@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: CAI Qian <caiqian@redhat.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dmitry Vyukov [Fri, 28 Oct 2016 00:46:44 +0000 (17:46 -0700)]
lib/stackdepot.c: bump stackdepot capacity from 16MB to 128MB
KASAN uses stackdepot to memorize stacks for all kmalloc/kfree calls.
Current stackdepot capacity is 16MB (1024 top level entries x 4 pages on
second level). Size of each stack is (num_frames + 3) * sizeof(long).
Which gives us ~84K stacks. This capacity was chosen empirically and it
is enough to run kernel normally.
However, when lots of configs are enabled and a fuzzer tries to maximize
code coverage, it easily hits the limit within tens of minutes. I've
tested for long a time with number of top level entries bumped 4x
(4096). And I think I've seen overflow only once. But I don't have all
configs enabled and code coverage has not reached maximum yet. So bump
it 8x to 8192.
Since we have two-level table, memory cost of this is very moderate --
currently the top-level table is 8KB, with this patch it is 64KB, which
is negligible under KASAN.
Here is some approx math.
128MB allows us to memorize ~670K stacks (assuming stack is ~200b).
I've grepped kernel for kmalloc|kfree|kmem_cache_alloc|kmem_cache_free|
kzalloc|kstrdup|kstrndup|kmemdup and it gives ~60K matches. Most of
alloc/free call sites are reachable with only one stack. But some
utility functions can have large fanout. Assuming average fanout is 5x,
total number of alloc/free stacks is ~300K.
Link: http://lkml.kernel.org/r/1476458416-122131-1-git-send-email-dvyukov@google.com
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Baozeng Ding <sploving1@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kees Cook [Fri, 28 Oct 2016 00:46:41 +0000 (17:46 -0700)]
latent_entropy: raise CONFIG_FRAME_WARN by default
When building with the latent_entropy plugin, set the default
CONFIG_FRAME_WARN to 2048, since some __init functions have many basic
blocks that, when instrumented by the latent_entropy plugin, grow beyond
1024 byte stack size on 32-bit builds.
Link: http://lkml.kernel.org/r/20161018211216.GA39687@beast
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Michal Marek <mmarek@suse.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Masahiro Yamada [Fri, 28 Oct 2016 00:46:38 +0000 (17:46 -0700)]
kconfig.h: remove config_enabled() macro
The use of config_enabled() is ambiguous. For config options,
IS_ENABLED(), IS_REACHABLE(), etc. will make intention clearer.
Sometimes config_enabled() has been used for non-config options because
it is useful to check whether the given symbol is defined or not.
I have been tackling on deprecating config_enabled(), and now is the
time to finish this work.
Some new users have appeared for v4.9-rc1, but it is trivial to replace
them:
- arch/x86/mm/kaslr.c
replace config_enabled() with IS_ENABLED() because
CONFIG_X86_ESPFIX64 and CONFIG_EFI are boolean.
- include/asm-generic/export.h
replace config_enabled() with __is_defined().
Then, config_enabled() can be removed now.
Going forward, please use IS_ENABLED(), IS_REACHABLE(), etc. for config
options, and __is_defined() for non-config symbols.
Link: http://lkml.kernel.org/r/1476616078-32252-1-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michal Marek <mmarek@suse.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Aristeu Rozanski [Fri, 28 Oct 2016 00:46:35 +0000 (17:46 -0700)]
ipc: account for kmem usage on mqueue and msg
When kmem accounting switched from account by default to only account if
flagged by __GFP_ACCOUNT, IPC mqueue and messages was left out.
The production use case at hand is that mqueues should be customizable
via sysctls in Docker containers in a Kubernetes cluster. This can only
be safely allowed to the users of the cluster (without the risk that
they can cause resource shortage on a node, influencing other users'
containers) if all resources they control are bounded, i.e. accounted
for.
Link: http://lkml.kernel.org/r/1476806075-1210-1-git-send-email-arozansk@redhat.com
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Reported-by: Stefan Schimanski <sttts@redhat.com>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Stefan Schimanski <sttts@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Aruna Ramakrishna [Fri, 28 Oct 2016 00:46:32 +0000 (17:46 -0700)]
mm/slab: improve performance of gathering slabinfo stats
On large systems, when some slab caches grow to millions of objects (and
many gigabytes), running 'cat /proc/slabinfo' can take up to 1-2
seconds. During this time, interrupts are disabled while walking the
slab lists (slabs_full, slabs_partial, and slabs_free) for each node,
and this sometimes causes timeouts in other drivers (for instance,
Infiniband).
This patch optimizes 'cat /proc/slabinfo' by maintaining a counter for
total number of allocated slabs per node, per cache. This counter is
updated when a slab is created or destroyed. This enables us to skip
traversing the slabs_full list while gathering slabinfo statistics, and
since slabs_full tends to be the biggest list when the cache is large,
it results in a dramatic performance improvement. Getting slabinfo
statistics now only requires walking the slabs_free and slabs_partial
lists, and those lists are usually much smaller than slabs_full.
We tested this after growing the dentry cache to 70GB, and the
performance improved from 2s to 5ms.
Link: http://lkml.kernel.org/r/1472517876-26814-1-git-send-email-aruna.ramakrishna@oracle.com
Signed-off-by: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Fri, 28 Oct 2016 00:46:29 +0000 (17:46 -0700)]
mm: page_alloc: use KERN_CONT where appropriate
Recent changes to printk require KERN_CONT uses to continue logging
messages. So add KERN_CONT where necessary.
[akpm@linux-foundation.org: coding-style fixes]
Fixes: 4bcc595ccd80 ("printk: reinstate KERN_CONT for printing continuation lines")
Link: http://lkml.kernel.org/r/c7df37c8665134654a17aaeb8b9f6ace1d6db58b.1476239034.git.joe@perches.com
Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexander Polakov [Fri, 28 Oct 2016 00:46:27 +0000 (17:46 -0700)]
mm/list_lru.c: avoid error-path NULL pointer deref
As described in https://bugzilla.kernel.org/show_bug.cgi?id=177821:
After some analysis it seems to be that the problem is in alloc_super().
In case list_lru_init_memcg() fails it goes into destroy_super(), which
calls list_lru_destroy().
And in list_lru_init() we see that in case memcg_init_list_lru() fails,
lru->node is freed, but not set NULL, which then leads list_lru_destroy()
to believe it is initialized and call memcg_destroy_list_lru().
memcg_destroy_list_lru() in turn can access lru->node[i].memcg_lrus,
which is NULL.
[akpm@linux-foundation.org: add comment]
Signed-off-by: Alexander Polakov <apolyakov@beget.ru>
Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mark Rutland [Fri, 28 Oct 2016 00:46:24 +0000 (17:46 -0700)]
h8300: fix syscall restarting
Back in commit
f56141e3e2d9 ("all arches, signal: move restart_block to
struct task_struct"), all architectures and core code were changed to
use task_struct::restart_block. However, when h8300 support was
subsequently restored in v4.2, it was not updated to account for this,
and maintains thread_info::restart_block, which is not kept in sync.
This patch drops the redundant restart_block from thread_info, and moves
h8300 to the common one in task_struct, ensuring that syscall restarting
always works as expected.
Fixes: f56141e3e2d9 ("all arches, signal: move restart_block to struct task_struct")
Link: http://lkml.kernel.org/r/1476714934-11635-1-git-send-email-mark.rutland@arm.com
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: uclinux-h8-devel@lists.sourceforge.jp
Cc: <stable@vger.kernel.org> [4.2+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrey Konovalov [Fri, 28 Oct 2016 00:46:21 +0000 (17:46 -0700)]
kcov: properly check if we are in an interrupt
in_interrupt() returns a nonzero value when we are either in an
interrupt or have bh disabled via local_bh_disable(). Since we are
interested in only ignoring coverage from actual interrupts, do a proper
check instead of just calling in_interrupt().
As a result of this change, kcov will start to collect coverage from
within local_bh_disable()/local_bh_enable() sections.
Link: http://lkml.kernel.org/r/1476115803-20712-1-git-send-email-andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Nicolai Stange <nicstange@gmail.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: James Morse <james.morse@arm.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joonsoo Kim [Fri, 28 Oct 2016 00:46:18 +0000 (17:46 -0700)]
mm/slab: fix kmemcg cache creation delayed issue
There is a bug report that SLAB makes extreme load average due to over
2000 kworker thread.
https://bugzilla.kernel.org/show_bug.cgi?id=172981
This issue is caused by kmemcg feature that try to create new set of
kmem_caches for each memcg. Recently, kmem_cache creation is slowed by
synchronize_sched() and futher kmem_cache creation is also delayed since
kmem_cache creation is synchronized by a global slab_mutex lock. So,
the number of kworker that try to create kmem_cache increases quietly.
synchronize_sched() is for lockless access to node's shared array but
it's not needed when a new kmem_cache is created. So, this patch rules
out that case.
Fixes: 801faf0db894 ("mm/slab: lockless decision to grow cache")
Link: http://lkml.kernel.org/r/1475734855-4837-1-git-send-email-iamjoonsoo.kim@lge.com
Reported-by: Doug Smythies <dsmythies@telus.net>
Tested-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 27 Oct 2016 17:08:58 +0000 (10:08 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two small fixes: one is a fatal section mismatch (reference to init
after it's discarded) and the other two are iscsi locking fixes"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: NCR5380: no longer mark irq probing as __init
scsi: be2iscsi: Replace _bh with _irqsave/irqrestore
scsi: libiscsi: Fix locking in __iscsi_conn_send_pdu
Linus Torvalds [Thu, 27 Oct 2016 17:07:13 +0000 (10:07 -0700)]
Merge branch 'for-4.9-fixes' of git://git./linux/kernel/git/tj/libata
Pull libata fixes from Tejun Heo:
"The AHCI MSI handling change in rc1 was a bit broken and caused disk
probing failures on some machines. These three patches should fix the
issues"
David Howells comments:
"My test machine fell foul of this using a PCIe M.2-attached SSD card.
The patches fix it for me"
* 'for-4.9-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
ahci: fix the single MSI-X case in ahci_init_one
ahci: fix nvec check
ahci: only try to use multi-MSI mode if there is more than 1 port
Linus Torvalds [Thu, 27 Oct 2016 17:05:31 +0000 (10:05 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"A set of fixes for this series, most notably the fix for the blk-mq
software queue regression in from this merge window.
Apart from that, a fix for an unlikely hang if a queue is flooded with
FUA requests from Ming, and a few small fixes for nbd and badblocks.
Lastly, a rename update for the proc softirq output, since the block
polling code was made generic"
* 'for-linus' of git://git.kernel.dk/linux-block:
blk-mq: update hardware and software queues for sleeping alloc
block: flush: fix IO hang in case of flood fua req
nbd: fix incorrect unlock of nbd->sock_lock in sock_shutdown
badblocks: badblocks_set/clear update unacked_exist
softirq: Display IRQ_POLL for irq-poll statistics
Linus Torvalds [Wed, 26 Oct 2016 17:15:30 +0000 (10:15 -0700)]
mm: remove per-zone hashtable of bitlock waitqueues
The per-zone waitqueues exist because of a scalability issue with the
page waitqueues on some NUMA machines, but it turns out that they hurt
normal loads, and now with the vmalloced stacks they also end up
breaking gfs2 that uses a bit_wait on a stack object:
wait_on_bit(&gh->gh_iflags, HIF_WAIT, TASK_UNINTERRUPTIBLE)
where 'gh' can be a reference to the local variable 'mount_gh' on the
stack of fill_super().
The reason the per-zone hash table breaks for this case is that there is
no "zone" for virtual allocations, and trying to look up the physical
page to get at it will fail (with a BUG_ON()).
It turns out that I actually complained to the mm people about the
per-zone hash table for another reason just a month ago: the zone lookup
also hurts the regular use of "unlock_page()" a lot, because the zone
lookup ends up forcing several unnecessary cache misses and generates
horrible code.
As part of that earlier discussion, we had a much better solution for
the NUMA scalability issue - by just making the page lock have a
separate contention bit, the waitqueue doesn't even have to be looked at
for the normal case.
Peter Zijlstra already has a patch for that, but let's see if anybody
even notices. In the meantime, let's fix the actual gfs2 breakage by
simplifying the bitlock waitqueues and removing the per-zone issue.
Reported-by: Andreas Gruenbacher <agruenba@redhat.com>
Tested-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jens Axboe [Thu, 27 Oct 2016 15:49:19 +0000 (09:49 -0600)]
blk-mq: update hardware and software queues for sleeping alloc
If we end up sleeping due to running out of requests, we should
update the hardware and software queues in the map ctx structure.
Otherwise we could end up having rq->mq_ctx point to the pre-sleep
context, and risk corrupting ctx->rq_list since we'll be
grabbing the wrong lock when inserting the request.
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Reported-by: Chris Mason <clm@fb.com>
Tested-by: Chris Mason <clm@fb.com>
Fixes: 63581af3f31e ("blk-mq: remove non-blocking pass in blk_mq_map_request")
Signed-off-by: Jens Axboe <axboe@fb.com>
Ming Lei [Wed, 26 Oct 2016 08:57:15 +0000 (16:57 +0800)]
block: flush: fix IO hang in case of flood fua req
This patch fixes one issue reported by Kent, which can
be triggered in bcachefs over sata disk. Actually it
is a generic issue in block flush vs. blk-tag.
Cc: Christoph Hellwig <hch@infradead.org>
Reported-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Christoph Hellwig [Tue, 25 Oct 2016 12:04:34 +0000 (14:04 +0200)]
ahci: fix the single MSI-X case in ahci_init_one
We need to make sure hpriv->irq is set properly if we don't use per-port
vectors, so switch from blindly assigning pdev->irq to using
pci_irq_vector, which handles all interrupt types correctly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Robert Richter <robert.richter@cavium.com>
Tested-by: Robert Richter <robert.richter@cavium.com>
Tested-by: David Daney <ddaney.cavm@gmail.com>
Fixes: 0b9e2988ab22 ("ahci: use pci_alloc_irq_vectors")
Signed-off-by: Tejun Heo <tj@kernel.org>
Linus Torvalds [Tue, 25 Oct 2016 04:34:13 +0000 (21:34 -0700)]
Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
"This fixes a regression caused by the stack vmalloc change"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
hwrng: core - Don't use a stack buffer in add_early_randomness()
Linus Torvalds [Tue, 25 Oct 2016 04:30:19 +0000 (21:30 -0700)]
Merge tag 'clk-fixes-for-linus' of git://git./linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"This is the first batch of clk driver fixes for this release.
We have a handful of fixes for the uniphier clk driver that was
introduced recently, as well as Kconfig option hiding, module
autoloading markings, and a few fixes for clk_hw based registration
patches that went in this merge window"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: at91: Fix a return value in case of error
clk: uniphier: rename MIO clock to SD clock for Pro5, PXs2, LD20 SoCs
clk: uniphier: fix memory overrun bug
clk: hi6220: use CLK_OF_DECLARE_DRIVER for sysctrl and mediactrl clock init
clk: mvebu: armada-37xx-periph: Fix the clock gate flag
clk: bcm2835: Clamp the PLL's requested rate to the hardware limits.
clk: max77686: fix number of clocks setup for clk_hw based registration
clk: mvebu: armada-37xx-periph: Fix the clock provider registration
clk: core: add __init decoration for CLK_OF_DECLARE_DRIVER function
clk: mediatek: Add hardware dependency
clk: samsung: clk-exynos-audss: Fix module autoload
clk: uniphier: fix type of variable passed to regmap_read()
clk: uniphier: add system clock support for sLD3 SoC
Linus Torvalds [Tue, 25 Oct 2016 04:19:07 +0000 (21:19 -0700)]
Merge tag 'gpio-v4.9-2' of git://git./linux/kernel/git/linusw/linux-gpio
Pull GPIO fixes from Linus Walleij:
"Here is a set of GPIO fixes for the v4.9 kernel series:
- Fix up off-by one and line offset validation, info leak to
userspace, and reject invalid flags. Those are especially valuable
hardening patches from Lars-Peter Clausen, all tagged for stable.
- Fix module autoload for TS4800 and ATH79.
- Correct the IRQ handler for MPC8xxx to use handle_level_irq() as it
(a) reacts to edges not levels and (b) even implements .irq_ack().
We were missing IRQs here.
- Fix the error path for acpi_dev_gpio_irq_get()
- Fix a memory leak in the MXS driver.
- Fix an annoying typo in the STMPE driver.
- Put a dependency on sysfs to the mockup driver"
* tag 'gpio-v4.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: mpc8xxx: Correct irq handler function
gpio: ath79: Fix module autoload
gpio: ts4800: Fix module autoload
gpio: GPIO_GET_LINEEVENT_IOCTL: Reject invalid line and event flags
gpio: GPIO_GET_LINEHANDLE_IOCTL: Reject invalid line flags
gpio: GPIOHANDLE_GET_LINE_VALUES_IOCTL: Fix information leak
gpio: GPIO_GET_LINEEVENT_IOCTL: Validate line offset
gpio: GPIOHANDLE_GET_LINE_VALUES_IOCTL: Fix information leak
gpio: GPIO_GET_LINEHANDLE_IOCTL: Validate line offset
gpio: GPIO_GET_CHIPINFO_IOCTL: Fix information leak
gpio: GPIO_GET_CHIPINFO_IOCTL: Fix line offset validation
gpio / ACPI: fix returned error from acpi_dev_gpio_irq_get()
gpio: mockup: add sysfs dependency
gpio: stmpe: || vs && typo
gpio: mxs: Unmap region obtained by of_iomap
gpio/board.txt: point to gpiod_set_value
Linus Torvalds [Tue, 25 Oct 2016 02:52:24 +0000 (19:52 -0700)]
Merge tag 'for-linus-4.9-rc2-tag' of git://git./linux/kernel/git/xen/tip
Pull xen fixes from David Vrabel:
- advertise control feature flags in xenstore
- fix x86 build when XEN_PVHVM is disabled
* tag 'for-linus-4.9-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xenbus: check return value of xenbus_scanf()
xenbus: prefer list_for_each()
x86: xen: move cpu_up functions out of ifdef
xenbus: advertise control feature flags
Lorenzo Stoakes [Mon, 24 Oct 2016 09:57:25 +0000 (10:57 +0100)]
mm: unexport __get_user_pages()
This patch unexports the low-level __get_user_pages() function.
Recent refactoring of the get_user_pages* functions allow flags to be
passed through get_user_pages() which eliminates the need for access to
this function from its one user, kvm.
We can see that the two calls to get_user_pages() which replace
__get_user_pages() in kvm_main.c are equivalent by examining their call
stacks:
get_user_page_nowait():
get_user_pages(start, 1, flags, page, NULL)
__get_user_pages_locked(current, current->mm, start, 1, page, NULL, NULL,
false, flags | FOLL_TOUCH)
__get_user_pages(current, current->mm, start, 1,
flags | FOLL_TOUCH | FOLL_GET, page, NULL, NULL)
check_user_page_hwpoison():
get_user_pages(addr, 1, flags, NULL, NULL)
__get_user_pages_locked(current, current->mm, addr, 1, NULL, NULL, NULL,
false, flags | FOLL_TOUCH)
__get_user_pages(current, current->mm, addr, 1, flags | FOLL_TOUCH, NULL,
NULL, NULL)
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 25 Oct 2016 02:00:44 +0000 (19:00 -0700)]
proc: don't use FOLL_FORCE for reading cmdline and environment
Now that Lorenzo cleaned things up and made the FOLL_FORCE users
explicit, it becomes obvious how some of them don't really need
FOLL_FORCE at all.
So remove FOLL_FORCE from the proc code that reads the command line and
arguments from user space.
The mem_rw() function actually does want FOLL_FORCE, because gdd (and
possibly many other debuggers) use it as a much more convenient version
of PTRACE_PEEKDATA, but we should consider making the FOLL_FORCE part
conditional on actually being a ptracer. This does not actually do
that, just moves adds a comment to that effect and moves the gup_flags
settings next to each other.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
John W. Linville [Mon, 24 Oct 2016 19:13:25 +0000 (15:13 -0400)]
nbd: fix incorrect unlock of nbd->sock_lock in sock_shutdown
Commit
0eadf37afc250 ("nbd: allow block mq to deal with timeouts")
changed normal usage of nbd->sock_lock to use spin_lock/spin_unlock
rather than the *_irq variants, but it missed this unlock in an
error path.
Found by Coverity, CID
1373871.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Markus Pargmann <mpa@pengutronix.de>
Fixes: 0eadf37afc250 ("nbd: allow block mq to deal with timeouts")
Signed-off-by: Jens Axboe <axboe@fb.com>
Jan Beulich [Mon, 24 Oct 2016 15:05:18 +0000 (09:05 -0600)]
xenbus: check return value of xenbus_scanf()
Don't ignore errors here: Set backend state to unknown when
unsuccessful.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Jan Beulich [Mon, 24 Oct 2016 15:03:49 +0000 (09:03 -0600)]
xenbus: prefer list_for_each()
This is more efficient than list_for_each_safe() when list modification
is accompanied by breaking out of the loop.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Arnd Bergmann [Wed, 12 Oct 2016 15:20:38 +0000 (17:20 +0200)]
x86: xen: move cpu_up functions out of ifdef
Three newly introduced functions are not defined when CONFIG_XEN_PVHVM is
disabled, but are still being used:
arch/x86/xen/enlighten.c:141:12: warning: ‘xen_cpu_up_prepare’ used but never defined
arch/x86/xen/enlighten.c:142:12: warning: ‘xen_cpu_up_online’ used but never defined
arch/x86/xen/enlighten.c:143:12: warning: ‘xen_cpu_dead’ used but never defined
Fixes: 4d737042d6c4 ("xen/x86: Convert to hotplug state machine")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Juergen Gross [Tue, 11 Oct 2016 11:34:16 +0000 (13:34 +0200)]
xenbus: advertise control feature flags
The Xen docs specify several flags which a guest can set to advertise
which values of the xenstore control/shutdown key it will recognize.
This patch adds code to write all the relevant feature-flag keys.
Based-on-patch-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Liu Gang [Fri, 21 Oct 2016 07:31:28 +0000 (15:31 +0800)]
gpio: mpc8xxx: Correct irq handler function
From the beginning of the gpio-mpc8xxx.c, the "handle_level_irq"
has being used to handle GPIO interrupts in the PowerPC/Layerscape
platforms. But actually, almost all PowerPC/Layerscape platforms
assert an interrupt request upon either a high-to-low change or
any change on the state of the signal.
So the "handle_level_irq" is not reasonable for PowerPC/Layerscape
GPIO interrupt, it should be "handle_edge_irq". Otherwise the system
may lost some interrupts from the PIN's state changes.
Signed-off-by: Liu Gang <Gang.Liu@nxp.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Linus Torvalds [Mon, 24 Oct 2016 00:10:14 +0000 (17:10 -0700)]
Linux 4.9-rc2
Linus Torvalds [Sun, 23 Oct 2016 23:58:55 +0000 (16:58 -0700)]
Merge tag 'upstream-4.9-rc2' of git://git.infradead.org/linux-ubifs
Pull UBI[FS] fixes from Richard Weinberger:
"This contains fixes for issues in both UBI and UBIFS:
- Fallout from the merge window, refactoring UBI code introduced some
issues.
- Fixes for an UBIFS readdir bug which can cause getdents() to busy
loop for ever and a bug in the UBIFS xattr code"
* tag 'upstream-4.9-rc2' of git://git.infradead.org/linux-ubifs:
ubifs: Abort readdir upon error
UBI: Fix crash in try_recover_peb()
ubi: fix swapped arguments to call to ubi_alloc_aeb
ubifs: Fix xattr_names length in exit paths
ubifs: Rename ubifs_rename2
Linus Torvalds [Sun, 23 Oct 2016 23:52:19 +0000 (16:52 -0700)]
Merge tag 'ext4_for_linus_stable' of git://git./linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"A few bug fixes and add some missing KERN_CONT annotations"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: add missing KERN_CONT to a few more debugging uses
fscrypto: lock inode while setting encryption policy
ext4: correct endianness conversion in __xattr_check_inode()
fscrypto: make XTS tweak initialization endian-independent
ext4: do not advertise encryption support when disabled
jbd2: fix incorrect unlock on j_list_lock
ext4: super.c: Update logging style using KERN_CONT
Linus Torvalds [Sun, 23 Oct 2016 23:37:58 +0000 (16:37 -0700)]
Merge git://git./linux/kernel/git/nab/target-pending
Pull SCSI target fixes from Nicholas Bellinger:
"Here are the outstanding target-pending fixes for v4.9-rc2.
This includes:
- Fix v4.1.y+ reference leak regression with concurrent TMR
ABORT_TASK + session shutdown. (Vaibhav Tandon)
- Enable tcm_fc w/ SCF_USE_CPUID to avoid host exchange timeouts
(Hannes)
- target/user error sense handling fixes. (Andy + MNC + HCH)
- Fix iscsi-target NOP_OUT error path iscsi_cmd descriptor leak
(Varun)
- Two EXTENDED_COPY SCSI status fixes for ESX VAAI (Dinesh Israni +
Nixon Vincent)
- Revert a v4.8 residual overflow change, that breaks sg_inq with
small allocation lengths.
There are a number of folks stress testing the v4.1.y regression fix
in their environments, and more folks doing iser-target I/O stress
testing atop recent v4.x.y code.
There is also one v4.2.y+ RCU conversion regression related to
explicit NodeACL configfs changes, that is still being tracked down"
* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
target/tcm_fc: use CPU affinity for responses
target/tcm_fc: Update debugging statements to match libfc usage
target/tcm_fc: return detailed error in ft_sess_create()
target/tcm_fc: print command pointer in debug message
target: fix potential race window in target_sess_cmd_list_waiting()
Revert "target: Fix residual overflow handling in target_complete_cmd_with_length"
target: Don't override EXTENDED_COPY xcopy_pt_cmd SCSI status code
target: Make EXTENDED_COPY 0xe4 failure return COPY TARGET DEVICE NOT REACHABLE
target: Re-add missing SCF_ACK_KREF assignment in v4.1.y
iscsi-target: fix iscsi cmd leak
iscsi-target: fix spelling mistake "Unsolicitied" -> "Unsolicited"
target/user: Fix comments to not refer to data ring
target/user: Return an error if cmd data size is too large
target/user: Use sense_reason_t in tcmu_queue_cmd_ring
Linus Torvalds [Sun, 23 Oct 2016 23:21:28 +0000 (16:21 -0700)]
Merge tag 'hwmon-for-linus-v4.9-rc2' of git://git./linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
"Couple of hwmon fixes:
Fix a potential ERR_PTR dereference in max31790 driver, and handle
temperature readings below 0 in adm9240 driver"
* tag 'hwmon-for-linus-v4.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (max31790) potential ERR_PTR dereference
hwmon: (adm9240) handle temperature readings below 0
Linus Torvalds [Sun, 23 Oct 2016 22:56:23 +0000 (15:56 -0700)]
Merge tag 'for-linus-4.9-2' of git://git.code.sf.net/p/openipmi/linux-ipmi
Pull IPMI updates from Corey Minyard:
"A small bug fix and a new driver for acting as an IPMI device.
I was on vacation during the merge window (a long vacation) but this
is a bug fix that should go in and a new driver that shouldn't hurt
anything.
This has been in linux-next for a month or so"
* tag 'for-linus-4.9-2' of git://git.code.sf.net/p/openipmi/linux-ipmi:
ipmi: fix crash on reading version from proc after unregisted bmc
ipmi/bt-bmc: remove redundant return value check of platform_get_resource()
ipmi/bt-bmc: add a dependency on ARCH_ASPEED
ipmi: Fix ioremap error handling in bt-bmc
ipmi: add an Aspeed BT IPMI BMC driver
Javier Martinez Canillas [Tue, 18 Oct 2016 20:44:01 +0000 (17:44 -0300)]
gpio: ath79: Fix module autoload
If the driver is built as a module, autoload won't work because the module
alias information is not filled. So user-space can't match the registered
device with the corresponding module.
Export the module alias information using the MODULE_DEVICE_TABLE() macro.
Before this patch:
$ modinfo drivers/gpio/gpio-ath79.ko | grep alias
$
After this patch:
$ modinfo drivers/gpio/gpio-ath79.ko | grep alias
alias: of:N*T*Cqca,ar9340-gpioC*
alias: of:N*T*Cqca,ar9340-gpio
alias: of:N*T*Cqca,ar7100-gpioC*
alias: of:N*T*Cqca,ar7100-gpio
Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Acked-by: Aban Bedel <albeu@free.fr>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Linus Torvalds [Sat, 22 Oct 2016 17:23:15 +0000 (10:23 -0700)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
"This updates contains:
- A revert which addresses a boot failure on ARM Sun5i platforms
- A new clocksource driver, which has been delayed beyond rc1 due to
an interrupt driver issue which was unearthed by this driver. The
debugging of that issue and the discussion about the proper
solution made this driver miss the merge window. There is no point
in delaying it for a full cycle as it completes the basic mainline
support for the new JCore platform and does not create any risk
outside of that platform"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert "clocksource/drivers/timer_sun5i: Replace code by clocksource_mmio_init"
clocksource: Add J-Core timer/clocksource driver
of: Add J-Core timer bindings
Linus Torvalds [Sat, 22 Oct 2016 16:58:49 +0000 (09:58 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Three fixes, a hw-enablement and a cross-arch fix/enablement change:
- SGI/UV fix for older platforms
- x32 signal handling fix
- older x86 platform bootup APIC fix
- AVX512-4VNNIW (Neural Network Instructions) and AVX512-4FMAPS
(Multiply Accumulation Single precision instructions) enablement.
- move thread_info back into x86 specific code, to make life easier
for other architectures trying to make use of
CONFIG_THREAD_INFO_IN_TASK_STRUCT=y"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot/smp: Don't try to poke disabled/non-existent APIC
sched/core, x86: Make struct thread_info arch specific again
x86/signal: Remove bogus user_64bit_mode() check from sigaction_compat_abi()
x86/platform/UV: Fix support for EFI_OLD_MEMMAP after BIOS callback updates
x86/cpufeature: Add AVX512_4VNNIW and AVX512_4FMAPS features
x86/vmware: Skip timer_irq_works() check on VMware
Linus Torvalds [Sat, 22 Oct 2016 16:39:10 +0000 (09:39 -0700)]
Merge branch 'mm-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull vmap stack fixes from Ingo Molnar:
"This is fallout from CONFIG_HAVE_ARCH_VMAP_STACK=y on x86: stack
accesses that used to be just somewhat questionable are now totally
buggy.
These changes try to do it without breaking the ABI: the fields are
left there, they are just reporting zero, or reporting narrower
information (the maps file change)"
* 'mm-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
mm: Change vm_is_stack_for_task() to vm_is_stack_for_current()
fs/proc: Stop trying to report thread stacks
fs/proc: Stop reporting eip and esp in /proc/PID/stat
mm/numa: Remove duplicated include from mprotect.c
Linus Torvalds [Sat, 22 Oct 2016 16:33:51 +0000 (09:33 -0700)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull irq fixes from Ingo Molnar:
"Mostly irqchip driver fixes, plus a symbol export"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
kernel/irq: Export irq_set_parent()
irqchip/gic: Add missing \n to CPU IF adjustment message
irqchip/jcore: Don't show Kconfig menu item for driver
irqchip/eznps: Drop pointless static qualifier in nps400_of_init()
irqchip/gic-v3-its: Fix entry size mask for GITS_BASER
irqchip/gic-v3-its: Fix 64bit GIC{R,ITS}_TYPER accesses
Linus Torvalds [Sat, 22 Oct 2016 16:32:10 +0000 (09:32 -0700)]
Merge branch 'efi-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull EFI fixes from Ingo Molnar:
"Add Ard Biesheuvel as EFI co-maintainer, plus fix an ARM build bug
with older toolchains"
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/arm: Fix absolute relocation detection for older toolchains
MAINTAINERS: Add myself as EFI maintainer
Ville Syrjälä [Sat, 22 Oct 2016 02:18:04 +0000 (05:18 +0300)]
x86/boot/smp: Don't try to poke disabled/non-existent APIC
Apparently trying to poke a disabled or non-existent APIC
leads to a box that doesn't even boot. Let's not do that.
No real clue if this is the right fix, but at least my
P3 machine boots again.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: dyoung@redhat.com
Cc: kexec@lists.infradead.org
Cc: stable@vger.kernel.org
Fixes: 2a51fe083eba ("arch/x86: Handle non enumerated CPU after physical hotplug")
Link: http://lkml.kernel.org/r/1477102684-5092-1-git-send-email-ville.syrjala@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Linus Torvalds [Sat, 22 Oct 2016 02:13:00 +0000 (19:13 -0700)]
Merge tag 'powerpc-4.9-3' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Fixes marked for stable:
- Prevent unlikely crash in copro_calculate_slb() (Frederic Barrat)
- cxl: Prevent adapter reset if an active context exists (Vaibhav Jain)
Fixes for code merged this cycle:
- Fix boot on systems with uncompressed kernel image (Heiner Kallweit)
- Drop dump_numa_memory_topology() (Michael Ellerman)
- Fix numa topology console print (Aneesh Kumar K.V)
- Ignore the pkey system calls for now (Stephen Rothwell)"
* tag 'powerpc-4.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc: Ignore the pkey system calls for now
powerpc: Fix numa topology console print
powerpc/mm: Drop dump_numa_memory_topology()
cxl: Prevent adapter reset if an active context exists
powerpc/boot: Fix boot on systems with uncompressed kernel image
powerpc/mm: Prevent unlikely crash in copro_calculate_slb()
Linus Torvalds [Sat, 22 Oct 2016 02:09:29 +0000 (19:09 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull KVM fixes from Radim Krčmář:
"ARM:
- avoid livelock when walking guest page tables
- fix HYP mode static keys without CC_HAVE_ASM_GOTO
MIPS:
- fix a build error without TRACEPOINTS_ENABLED
s390:
- reject a malformed userspace configuration
x86:
- suppress a warning without CONFIG_CPU_FREQ
- initialize whole irq_eoi array"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
arm/arm64: KVM: Map the BSS at HYP
arm64: KVM: Take S1 walks into account when determining S2 write faults
KVM: s390: reject invalid modes for runtime instrumentation
kvm: x86: memset whole irq_eoi
kvm/x86: Fix unused variable warning in kvm_timer_init()
KVM: MIPS: Add missing uaccess.h include
Linus Torvalds [Sat, 22 Oct 2016 02:06:59 +0000 (19:06 -0700)]
Merge tag 'nfs-for-4.9-2' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client bugfixes from Anna Schumaker:
"Just two bugfixes this time:
Stable bugfix:
- Fix last_write_offset incorrectly set to page boundary
Other bugfix:
- Fix missing-braces warning"
* tag 'nfs-for-4.9-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
nfs4: fix missing-braces warning
pnfs/blocklayout: fix last_write_offset incorrectly set to page boundary
Linus Torvalds [Fri, 21 Oct 2016 22:54:45 +0000 (15:54 -0700)]
Merge tag 'acpi-4.9-rc2' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These fix an issue related to system resume in the new WDAT-based
watchdog driver and a return value of a stub function in the ACPI CPPC
framework.
Specifics:
- Update the ACPI WDAT-based watchdog driver to ping the hardware
during system resume to prevent a reset from occurring after the
resume is complete (Mika Westerberg).
- Fix the return value of the pcc_mbox_request_channel() stub for
CONFIG_PCC unset (Hoan Tran)"
* tag 'acpi-4.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
watchdog: wdat_wdt: Ping the watchdog on resume
mailbox: PCC: Fix return value of pcc_mbox_request_channel()
Shaohua Li [Thu, 20 Oct 2016 21:40:06 +0000 (14:40 -0700)]
badblocks: badblocks_set/clear update unacked_exist
When bandblocks_set acknowledges a range or badblocks_clear a range,
it's possible all badblocks are acknowledged. We should update
unacked_exist if this occurs.
Signed-off-by: Shaohua Li <shli@fb.com>
Reviewed-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Tested-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Sagi Grimberg [Mon, 10 Oct 2016 12:10:51 +0000 (15:10 +0300)]
softirq: Display IRQ_POLL for irq-poll statistics
This library was moved to the generic area and was
renamed to irq-poll. Hence, update proc/softirqs output accordingly.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Rafael J. Wysocki [Fri, 21 Oct 2016 20:24:23 +0000 (22:24 +0200)]
Merge branches 'acpi-wdat' and 'acpi-cppc'
* acpi-wdat:
watchdog: wdat_wdt: Ping the watchdog on resume
* acpi-cppc:
mailbox: PCC: Fix return value of pcc_mbox_request_channel()
Thomas Gleixner [Fri, 21 Oct 2016 19:40:29 +0000 (21:40 +0200)]
Merge tag 'gic-fixes-for-4.9-rc2' of git://git./linux/kernel/git/maz/arm-platforms into irq/urgent
Pull GIC updates from Marc Zyngier:
- Fix for 32bit accesses that should be 64bit on 64bit machines
- Fix for a field decoding macro
- Beautify a warning message
Linus Torvalds [Fri, 21 Oct 2016 17:57:09 +0000 (10:57 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Five small fixes.
Some of these, like the nested spinlock overwriting saved flags and
the Kasan use after free look serious, but they seem not to have been
picked up in testing or seen in the field.
The biggest user visible issue is probably the wrong device handler
for Clariion, which means that alua doesn't bind to the array like it
should"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ipr: Fix async error WARN_ON
scsi: zfcp: spin_lock_irqsave() is not nestable
scsi: Remove one useless stack variable
scsi: Fix use-after-free
scsi: Replace wrong device handler name for CLARiiON arrays
Linus Torvalds [Fri, 21 Oct 2016 17:54:01 +0000 (10:54 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"A set of fixes that missed the merge window, mostly due to me being
away around that time.
Nothing major here, a mix of nvme cleanups and fixes, and one fix for
the badblocks handling"
* 'for-linus' of git://git.kernel.dk/linux-block:
nvmet: use symbolic constants for CNS values
nvme: use symbolic constants for CNS values
nvme.h: add an enum for cns values
nvme.h: don't use uuid_be
nvme.h: resync with nvme-cli
nvme: Add tertiary number to NVME_VS
nvme : Add sysfs entry for NVMe CMBs when appropriate
nvme: don't schedule multiple resets
nvme: Delete created IO queues on reset
nvme: Stop probing a removed device
badblocks: fix overlapping check for clearing
Linus Torvalds [Fri, 21 Oct 2016 17:48:58 +0000 (10:48 -0700)]
Merge tag 'pci-v4.9-fixes-1' of git://git./linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
"This includes:
- Fix for a Layerscape driver issue that causes a use-before-set
crash
- Maintainer update for the Synopsis prototyping device driver"
* tag 'pci-v4.9-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: designware-plat: Update author email address
PCI: layerscape: Fix drvdata usage before assignment
PCI: designware-plat: Change maintainer to Jose Abreu
Radim Krčmář [Fri, 21 Oct 2016 16:49:53 +0000 (18:49 +0200)]
Merge tag 'kvm-arm-for-4.9-rc2' of git://git./linux/kernel/git/kvmarm/kvmarm
KVM/ARM updates for 4.9-rc2
- Handle faults generated by the page table walker as being writes
- Map the BSS at EL2
James Bottomley [Fri, 21 Oct 2016 16:40:02 +0000 (12:40 -0400)]
Merge remote-tracking branch 'mkp-scsi/4.9/scsi-fixes' into fixes
Marc Zyngier [Thu, 20 Oct 2016 09:17:21 +0000 (10:17 +0100)]
arm/arm64: KVM: Map the BSS at HYP
When used with a compiler that doesn't implement "asm goto"
(such as the AArch64 port of GCC 4.8), jump labels generate a
memory access to find out about the value of the key (instead
of just patching the code). The key itself is likely to be
stored in the BSS.
This is perfectly fine, except that we don't map the BSS at HYP,
leading to an exploding kernel at the first access. The obvious
fix is simply to map the BSS there (which should have been done
a long while ago, but hey...).
Reported-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Will Deacon [Thu, 29 Sep 2016 11:37:01 +0000 (12:37 +0100)]
arm64: KVM: Take S1 walks into account when determining S2 write faults
The WnR bit in the HSR/ESR_EL2 indicates whether a data abort was
generated by a read or a write instruction. For stage 2 data aborts
generated by a stage 1 translation table walk (i.e. the actual page
table access faults at EL2), the WnR bit therefore reports whether the
instruction generating the walk was a load or a store, *not* whether the
page table walker was reading or writing the entry.
For page tables marked as read-only at stage 2 (e.g. due to KSM merging
them with the tables from another guest), this could result in livelock,
where a page table walk generated by a load instruction attempts to
set the access flag in the stage 1 descriptor, but fails to trigger
CoW in the host since only a read fault is reported.
This patch modifies the arm64 kvm_vcpu_dabt_iswrite function to
take into account stage 2 faults in stage 1 walks. Since DBM cannot be
disabled at EL2 for CPUs that implement it, we assume that these faults
are always causes by writes, avoiding the livelock situation at the
expense of occasional, spurious CoWs.
We could, in theory, do a bit better by checking the guest TCR
configuration and inspecting the page table to see why the PTE faulted.
However, I doubt this is measurable in practice, and the threat of
livelock is real.
Cc: <stable@vger.kernel.org>
Cc: Julien Grall <julien.grall@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Linus Torvalds [Fri, 21 Oct 2016 16:14:35 +0000 (09:14 -0700)]
Merge tag 'drm-fixes-for-v4.9-rc2-part2' of git://people.freedesktop.org/~airlied/linux
Pull more drm fixes from Dave Airlie:
"Mainly some vmwgfx fixes, but also some fixes for armada, etnaviv and
fsl-dcu"
* tag 'drm-fixes-for-v4.9-rc2-part2' of git://people.freedesktop.org/~airlied/linux:
drm/fsl-dcu: enable pixel clock when enabling CRTC
drm/fsl-dcu: do not transfer registers in mode_set_nofb
drm/fsl-dcu: do not transfer registers on plane init
drm/fsl-dcu: enable TCON bypass mode by default
drm/vmwgfx: Adjust checks for null pointers in 13 functions
drm/vmwgfx: Use memdup_user() rather than duplicating its implementation
drm/vmwgfx: Use kmalloc_array() in vmw_surface_define_ioctl()
drm/vmwgfx: Avoid validating views on view destruction
drm/vmwgfx: Limit the user-space command buffer size
drm/vmwgfx: Remove a leftover debug printout
drm/vmwgfx: Allow resource relocations on byte boundaries
drm/vmwgfx: Enable SVGA_3D_CMD_DX_TRANSFER_FROM_BUFFER command
drm/vmwgfx: Remove call to reservation_object_test_signaled_rcu before wait
drm/vmwgfx: Replace numeric parameter like 0444 with macro
drm/etnaviv: block 64K of address space behind each cmdstream
drm/etnaviv: ensure write caches are flushed at end of user cmdstream
drm/armada: fix clock counts
Joao Pinto [Fri, 21 Oct 2016 09:31:48 +0000 (10:31 +0100)]
PCI: designware-plat: Update author email address
Although I am leaving Synopsys, I would like to keep working with the linux
kernel community and help in what you might find useful. For that I am
sending this patch to change my contact e-mail.
Signed-off-by: Joao Pinto <jpinto@synopsys.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Javier Martinez Canillas [Tue, 18 Oct 2016 20:44:02 +0000 (17:44 -0300)]
gpio: ts4800: Fix module autoload
If the driver is built as a module, autoload won't work because the module
alias information is not filled. So user-space can't match the registered
device with the corresponding module.
Export the module alias information using the MODULE_DEVICE_TABLE() macro.
Before this patch:
$ modinfo drivers/gpio/gpio-ts4800.ko | grep alias
$
After this patch:
$ modinfo drivers/gpio/gpio-ts4800.ko | grep alias
alias: of:N*T*Ctechnologic,ts4800-gpioC*
alias: of:N*T*Ctechnologic,ts4800-gpio
Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Lars-Peter Clausen [Tue, 18 Oct 2016 14:54:06 +0000 (16:54 +0200)]
gpio: GPIO_GET_LINEEVENT_IOCTL: Reject invalid line and event flags
The GPIO_GET_LINEEVENT_IOCTL currently ignores unknown or undefined
linehandle and lineevent flags. From a backwards and forwards compatibility
viewpoint it is highly desirable to reject unknown flags though.
On one hand an application that is using newer flags and is running on
an older kernel has no way to detect if the new flags were handled
correctly if they are silently discarded.
On the other hand an application that (accidentally) passes undefined flags
will run fine on an older kernel, but may break on a newer kernel when
these flags get defined.
Ensure that requests that have undefined flags set are rejected with an
error, rather than silently discarding the undefined flags.
Cc: stable@vger.kernel.org
Fixes: 61f922db7221 ("gpio: userspace ABI for reading GPIO line events")
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Lars-Peter Clausen [Tue, 18 Oct 2016 14:54:05 +0000 (16:54 +0200)]
gpio: GPIO_GET_LINEHANDLE_IOCTL: Reject invalid line flags
The GPIO_GET_LINEHANDLE_IOCTL currently ignores unknown or undefined
linehandle flags. From a backwards and forwards compatibility viewpoint it
is highly desirable to reject unknown flags though.
On one hand an application that is using newer flags and is running on
an older kernel has no way to detect if the new flags were handled
correctly if they are silently discarded.
On the other hand an application that (accidentally) passes undefined flags
will run fine on an older kernel, but may break on a newer kernel when
these flags get defined.
Ensure that requests that have undefined flags set are rejected with an
error, rather than silently discarding the undefined flags.
Cc: stable@vger.kernel.org
Fixes: d7c51b47ac11 ("gpio: userspace ABI for reading/writing GPIO lines")
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Lars-Peter Clausen [Tue, 18 Oct 2016 14:54:02 +0000 (16:54 +0200)]
gpio: GPIOHANDLE_GET_LINE_VALUES_IOCTL: Fix information leak
The GPIOHANDLE_GET_LINE_VALUES_IOCTL handler allocates a gpiohandle_data
struct on the stack and then passes it to copy_to_user(). But depending on
the number of requested line handles the struct is only partially
initialized.
This exposes the previous, potentially sensitive, stack content to the
issuing userspace application. To avoid this make sure that the struct is
fully initialized.
Cc: stable@vger.kernel.org
Fixes: d7c51b47ac11 ("gpio: userspace ABI for reading/writing GPIO lines")
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Lars-Peter Clausen [Tue, 18 Oct 2016 14:54:03 +0000 (16:54 +0200)]
gpio: GPIO_GET_LINEEVENT_IOCTL: Validate line offset
The line offset that is used as an index into the descs array is provided
by userspace and might go beyond the bounds of the array. If that happens
undefined behavior will occur.
Make sure that the offset is within the bounds of the desc array and reject
any requests that specify a value outside of it.
Cc: stable@vger.kernel.org
Fixes: 61f922db7221 ("gpio: userspace ABI for reading GPIO line events")
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Lars-Peter Clausen [Tue, 18 Oct 2016 14:54:04 +0000 (16:54 +0200)]
gpio: GPIOHANDLE_GET_LINE_VALUES_IOCTL: Fix information leak
The GPIOHANDLE_GET_LINE_VALUES_IOCTL handler allocates a gpiohandle_data
struct on the stack and then passes it to copy_to_user(). But only the
first element of the values array in the struct is set, which leaves the
struct partially initialized.
This exposes the previous, potentially sensitive, stack content to the
issuing userspace application. To avoid this make sure that the struct is
fully initialized.
Cc: stable@vger.kernel.org
Fixes: 61f922db7221 ("gpio: userspace ABI for reading GPIO line events")
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Lars-Peter Clausen [Tue, 18 Oct 2016 14:54:01 +0000 (16:54 +0200)]
gpio: GPIO_GET_LINEHANDLE_IOCTL: Validate line offset
The line offset that is used as an index into the descs array is provided
by userspace and might go beyond the bounds of the array. If that happens
undefined behavior will occur.
Make sure that the offset is within the bounds of the desc array and reject
any requests that specify a value outside of it.
Cc: stable@vger.kernel.org
Fixes: d7c51b47ac11 ("gpio: userspace ABI for reading/writing GPIO lines")
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Lars-Peter Clausen [Tue, 18 Oct 2016 14:54:00 +0000 (16:54 +0200)]
gpio: GPIO_GET_CHIPINFO_IOCTL: Fix information leak
The GPIO_GET_CHIPINFO_IOCTL handler allocates a gpiochip_info struct on the
stack and then passes it to copy_to_user(). But depending on the length of
the GPIO chip name and label the struct is only partially initialized.
This exposes the previous, potentially sensitive, stack content to the
issuing userspace application. To avoid this make sure that the struct is
fully initialized.
Cc: stable@vger.kernel.org
Fixes: 521a2ad6f862 ("gpio: add userspace ABI for GPIO line information")
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Lars-Peter Clausen [Tue, 18 Oct 2016 14:53:59 +0000 (16:53 +0200)]
gpio: GPIO_GET_CHIPINFO_IOCTL: Fix line offset validation
The current line offset validation is off by one. Depending on the data
stored behind the descs array this can either cause undefined behavior or
disclose arbitrary, potentially sensitive, memory to the issuing userspace
application.
Make sure that offset is within the bounds of the desc array.
Cc: stable@vger.kernel.org
Fixes: 521a2ad6f862 ("gpio: add userspace ABI for GPIO line information")
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Sudip Mukherjee [Thu, 6 Oct 2016 17:36:43 +0000 (23:06 +0530)]
kernel/irq: Export irq_set_parent()
The TPS65217 driver grew interrupt support which uses
irq_set_parent(). While it's not yet clear why this is used in the first
place, building the driver as a module fails with:
ERROR: ".irq_set_parent" [drivers/mfd/tps65217.ko] undefined!
The correctness of the driver change is still investigated, but for now
it's less trouble to export irq_set_parent() than dealing with the build
wreckage.
[ tglx: Rewrote changelog and made the export GPL ]
Fixes: 6556bdacf646 ("mfd: tps65217: Add support for IRQs")
Signed-off-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Cc: Marcin Niestroj <m.niestroj@grinn-global.com>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Lee Jones <lee.jones@linaro.org>
Link: http://lkml.kernel.org/r/1475775403-27207-1-git-send-email-sudipm.mukherjee@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Hannes Reinecke [Mon, 22 Aug 2016 08:54:11 +0000 (10:54 +0200)]
target/tcm_fc: use CPU affinity for responses
The libfc stack assigns exchange IDs based on the CPU the request
was received on, so we need to send the responses via the same CPU.
Otherwise the send logic gets confuses and responses will be delayed,
causing exchange timeouts on the initiator side.
Signed-off-by: Hannes Reinecke <hare@suse.com>
Cc: stable@vger.kernel.org # 4.5+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Hannes Reinecke [Mon, 22 Aug 2016 08:54:10 +0000 (10:54 +0200)]
target/tcm_fc: Update debugging statements to match libfc usage
Update the debug statements to match those from libfc.
Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Hannes Reinecke [Mon, 22 Aug 2016 08:54:09 +0000 (10:54 +0200)]
target/tcm_fc: return detailed error in ft_sess_create()
Not every failure is due to out-of-memory; the ACLs might not be
set, too. So return a detailed error code in ft_sess_create()
instead of just a NULL pointer.
Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Hannes Reinecke [Mon, 22 Aug 2016 08:54:08 +0000 (10:54 +0200)]
target/tcm_fc: print command pointer in debug message
When allocating a new command we should add the pointer to the
debug statements; that allows us to match this with other debug
statements for handling data.
Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Hannes Reinecke [Mon, 22 Aug 2016 08:54:07 +0000 (10:54 +0200)]
target: fix potential race window in target_sess_cmd_list_waiting()
target_sess_cmd_list_waiting() might hit on a condition where
the kref for the command is already 0, but the destructor has
not been called yet (or is stuck in waiting for a spin lock).
Rather than leaving the command on the list we should explicitly
remove it to avoid race issues later on.
Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Dave Airlie [Fri, 21 Oct 2016 03:27:55 +0000 (13:27 +1000)]
Merge branch 'drm-etnaviv-fixes' of git://git.pengutronix.de/lst/linux into drm-fixes
2 more patches to stabilize the new MMUv2 support.
* 'drm-etnaviv-fixes' of git://git.pengutronix.de/lst/linux:
drm/etnaviv: block 64K of address space behind each cmdstream
drm/etnaviv: ensure write caches are flushed at end of user cmdstream
Dave Airlie [Fri, 21 Oct 2016 03:26:58 +0000 (13:26 +1000)]
Merge branch 'drm-vmwgfx-fixes' of ssh://people.freedesktop.org/~syeh/repos_linux into drm-fixes
vmwgfx cleanups and fixes.
* 'drm-vmwgfx-fixes' of ssh://people.freedesktop.org/~syeh/repos_linux:
drm/vmwgfx: Adjust checks for null pointers in 13 functions
drm/vmwgfx: Use memdup_user() rather than duplicating its implementation
drm/vmwgfx: Use kmalloc_array() in vmw_surface_define_ioctl()
drm/vmwgfx: Avoid validating views on view destruction
drm/vmwgfx: Limit the user-space command buffer size
drm/vmwgfx: Remove a leftover debug printout
drm/vmwgfx: Allow resource relocations on byte boundaries
drm/vmwgfx: Enable SVGA_3D_CMD_DX_TRANSFER_FROM_BUFFER command
drm/vmwgfx: Remove call to reservation_object_test_signaled_rcu before wait
drm/vmwgfx: Replace numeric parameter like 0444 with macro
Dave Airlie [Fri, 21 Oct 2016 03:26:15 +0000 (13:26 +1000)]
Merge branch 'drm-armada-fixes' of git://git.armlinux.org.uk/~rmk/linux-arm into drm-fixes
One small fix for Armada, where the clock prepare/enable counts were
going awry.
* 'drm-armada-fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
drm/armada: fix clock counts
Dave Airlie [Fri, 21 Oct 2016 03:25:28 +0000 (13:25 +1000)]
Merge branch 'fixes-for-v4.9-rc2' of git.agner.ch/git/linux-drm-fsl-dcu into drm-fixes
This are some fixes which I hoped to still get into v4.9. I used to
test them here since about 2 weeks and Meng came around to test it
on the second platform making use of this IP too, so they are well
tested now.
* 'fixes-for-v4.9-rc2' of http://git.agner.ch/git/linux-drm-fsl-dcu:
drm/fsl-dcu: enable pixel clock when enabling CRTC
drm/fsl-dcu: do not transfer registers in mode_set_nofb
drm/fsl-dcu: do not transfer registers on plane init
drm/fsl-dcu: enable TCON bypass mode by default
Christophe JAILLET [Sun, 25 Sep 2016 11:53:58 +0000 (13:53 +0200)]
clk: at91: Fix a return value in case of error
If 'clk_hw_register()' fails, it is likely that we expect to return an
error instead of a valid pointer (which would mean success).
Fix commit
f5644f10dcfb ("clk: at91: Migrate to clk_hw based registration
and OF APIs")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Mika Westerberg [Thu, 20 Oct 2016 15:03:36 +0000 (18:03 +0300)]
watchdog: wdat_wdt: Ping the watchdog on resume
It turns out we need to ping the watchdog hardware on resume when we
re-program it. Otherwise this causes inadvertent reset to trigger
right after the resume is complete.
Fixes: 058dfc767008 (ACPI / watchdog: Add support for WDAT hardware watchdog)
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Linus Torvalds [Thu, 20 Oct 2016 22:32:51 +0000 (15:32 -0700)]
Merge tag 'pm-4.9-rc2' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki:
"This fixes the pointer arithmetics mess-up in the cpufreq core
introduced by one of recent commits and leading to all kinds of
breakage from kernel crashes to incorrect governor decisions (Sergey
Senozhatsky)"
* tag 'pm-4.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: fix overflow in cpufreq_table_find_index_dl()
Rafael J. Wysocki [Thu, 20 Oct 2016 21:24:58 +0000 (23:24 +0200)]
Merge branch 'pm-cpufreq'
* pm-cpufreq:
cpufreq: fix overflow in cpufreq_table_find_index_dl()
Chen-Yu Tsai [Tue, 18 Oct 2016 05:49:18 +0000 (13:49 +0800)]
Revert "clocksource/drivers/timer_sun5i: Replace code by clocksource_mmio_init"
struct clocksource is also used by the clk notifier callback, to
unregister and re-register the clocksource with a different clock rate.
clocksource_mmio_init does not pass back a pointer to the struct used,
and the clk notifier callback assumes that the struct clocksource in
struct sun5i_timer_clksrc is valid. This results in a kernel NULL
pointer dereference when the hstimer clock is changed:
Unable to handle kernel NULL pointer dereference at virtual address
00000004
[<
c03a4678>] (clocksource_unbind) from [<
c03a46d4>] (clocksource_unregister+0x2c/0x44)
[<
c03a46d4>] (clocksource_unregister) from [<
c0a6f350>] (sun5i_rate_cb_clksrc+0x34/0x3c)
[<
c0a6f350>] (sun5i_rate_cb_clksrc) from [<
c035ea50>] (notifier_call_chain+0x44/0x84)
[<
c035ea50>] (notifier_call_chain) from [<
c035edc0>] (__srcu_notifier_call_chain+0x44/0x60)
[<
c035edc0>] (__srcu_notifier_call_chain) from [<
c035edf4>] (srcu_notifier_call_chain+0x18/0x20)
[<
c035edf4>] (srcu_notifier_call_chain) from [<
c0670174>] (__clk_notify+0x70/0x7c)
[<
c0670174>] (__clk_notify) from [<
c06702c0>] (clk_propagate_rate_change+0xa4/0xc4)
[<
c06702c0>] (clk_propagate_rate_change) from [<
c0670288>] (clk_propagate_rate_change+0x6c/0xc4)
Revert the commit for now. clocksource_mmio_init can be made to pass back
a pointer, but the code churn and usage of an inner struct might not be
worth it.
Fixes: 157dfadef832 ("clocksource/drivers/timer_sun5i: Replace code by clocksource_mmio_init")
Reported-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Cc: linux-sunxi@googlegroups.com
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/20161018054918.26855-1-wens@csie.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Radim Krčmář [Thu, 20 Oct 2016 18:31:01 +0000 (20:31 +0200)]
Merge tag 'kvm-s390-master-4.9-1' of git://git./linux/kernel/git/kvms390/linux
KVM: s390: Fix for user-triggerable WARN_ON
A malicious user space can provide an invalid mode for runtime
instrumentation via the interfaces that are normally used on
the target host during migration. This would trigger a WARN_ON
via validity intercept. Let's detect this special case.
Rich Felker [Thu, 13 Oct 2016 21:51:06 +0000 (21:51 +0000)]
clocksource: Add J-Core timer/clocksource driver
At the hardware level, the J-Core PIT is integrated with the interrupt
controller, but it is represented as its own device and has an
independent programming interface. It provides a 12-bit countdown
timer, which is not presently used, and a periodic timer. The interval
length for the latter is programmable via a 32-bit throttle register
whose units are determined by a bus-period register. The periodic
timer is used to implement both periodic and oneshot clock event
modes; in oneshot mode the interrupt handler simply disables the timer
as soon as it fires.
Despite its device tree node representing an interrupt for the PIT,
the actual irq generated is programmable, not hard-wired. The driver
is responsible for programming the PIT to generate the hardware irq
number that the DT assigns to it.
On SMP configurations, J-Core provides cpu-local instances of the PIT;
no broadcast timer is needed. This driver supports the creation of the
necessary per-cpu clock_event_device instances.
A nanosecond-resolution clocksource is provided using the J-Core "RTC"
registers, which give a 64-bit seconds count and 32-bit nanoseconds
that wrap every second. The driver converts these to a full-range
32-bit nanoseconds count.
Signed-off-by: Rich Felker <dalias@libc.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: devicetree@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Rob Herring <robh+dt@kernel.org>
Link: http://lkml.kernel.org/r/b591ff12cc5ebf63d1edc98da26046f95a233814.1476393790.git.dalias@libc.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Rich Felker [Thu, 13 Oct 2016 21:51:06 +0000 (21:51 +0000)]
of: Add J-Core timer bindings
Signed-off-by: Rich Felker <dalias@libc.org>
Acked-by: Rob Herring <robh@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: devicetree@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Rob Herring <robh+dt@kernel.org>
Link: http://lkml.kernel.org/r/8b107c292ed8cf8eed0fa283071fc8a930098628.1476393790.git.dalias@libc.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Christian Borntraeger [Wed, 28 Sep 2016 14:18:47 +0000 (16:18 +0200)]
KVM: s390: reject invalid modes for runtime instrumentation
Usually a validity intercept is a programming error of the host
because of invalid entries in the state description.
We can get a validity intercept if the mode of the runtime
instrumentation control block is wrong. As the host does not know
which modes are valid, this can be used by userspace to trigger
a WARN.
Instead of printing a WARN let's return an error to userspace as
this can only happen if userspace provides a malformed initial
value (e.g. on migration). The kernel should never warn on bogus
input. Instead let's log it into the s390 debug feature.
While at it, let's return -EINVAL for all validity intercepts as
this will trigger an error in QEMU like
error: kvm run failed Invalid argument
PSW=mask
0404c00180000000 addr
000000000063c226 cc 00
R00=
000000000000004f R01=
0000000000000004 R02=
0000000000760005 R03=
000000007fe0a000
R04=
000000000064ba2a R05=
000000049db73dd0 R06=
000000000082c4b0 R07=
0000000000000041
R08=
0000000000000002 R09=
000003e0804042a8 R10=
0000000496152c42 R11=
000000007fe0afb0
[...]
This will avoid an endless loop of validity intercepts.
Cc: stable@vger.kernel.org # v4.5+
Fixes: c6e5f166373a ("KVM: s390: implement the RI support of guest")
Acked-by: Fan Zhang <zhangfan@linux.vnet.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Christoph Hellwig [Thu, 20 Oct 2016 15:15:41 +0000 (17:15 +0200)]
ahci: fix nvec check
commit
17a51f12 ("ahci: only try to use multi-MSI mode if there is more
than 1 port") lead to a case where nvec isn't initialized before it's
used. Fix this by moving the check into the n_ports conditional.
Reported-and-reviewed-by Colin Ian King <colin.king@canonical.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
Linus Torvalds [Thu, 20 Oct 2016 17:17:13 +0000 (10:17 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"Most of these are CC'd for stable, but there are a few fixing issues
introduced during the recent merge window too.
There's also a fix for the xgene PMU driver, but it seemed daft to
send as a separate pull request, so I've included it here with the
rest of the fixes.
- Fix ACPI boot due to recent broken NUMA changes
- Fix remote enabling of CPU features requiring PSTATE bit manipulation
- Add address range check when emulating user cache maintenance
- Fix LL/SC loops that allow compiler to introduce memory accesses
- Fix recently added write_sysreg_s macro
- Ensure MDCR_EL2 is initialised on qemu targets without a PMU
- Avoid kaslr breakage due to MODVERSIONs and DYNAMIC_FTRACE
- Correctly drive recent ld when building relocatable Image
- Remove junk IS_ERR check from xgene PMU driver added during merge window
- pr_cont fixes after core changes in the merge window"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: remove pr_cont abuse from mem_init
arm64: fix show_regs fallout from KERN_CONT changes
arm64: kernel: force ET_DYN ELF type for CONFIG_RELOCATABLE=y
arm64: suspend: Reconfigure PSTATE after resume from idle
arm64: mm: Set PSTATE.PAN from the cpu_enable_pan() call
arm64: cpufeature: Schedule enable() calls instead of calling them via IPI
arm64: Cortex-A53 errata workaround: check for kernel addresses
arm64: percpu: rewrite ll/sc loops in assembly
arm64: swp emulation: bound LL/SC retries before rescheduling
arm64: sysreg: Fix use of XZR in write_sysreg_s
arm64: kaslr: keep modules close to the kernel when DYNAMIC_FTRACE=y
arm64: kernel: Init MDCR_EL2 even in the absence of a PMU
perf: xgene: Remove bogus IS_ERR() check
arm64: kernel: numa: fix ACPI boot cpu numa node mapping
arm64: kaslr: fix breakage with CONFIG_MODVERSIONS=y
Linus Torvalds [Thu, 20 Oct 2016 16:57:51 +0000 (09:57 -0700)]
Merge tag 'ceph-for-4.9-rc2' of git://github.com/ceph/ceph-client
Pull Ceph fixes from Ilya Dryomov:
"An rbd exclusive-lock edge case fix and several filesystem fixups.
Nikolay's error path patch is tagged for stable, everything else but
readdir vs frags race was introduced in this merge window"
* tag 'ceph-for-4.9-rc2' of git://github.com/ceph/ceph-client:
ceph: fix non static symbol warning
ceph: fix uninitialized dentry pointer in ceph_real_mount()
ceph: fix readdir vs fragmentation race
ceph: fix error handling in ceph_read_iter
rbd: don't retry watch reregistration if header object is gone
rbd: don't wait for the lock forever if blacklisted
Linus Torvalds [Thu, 20 Oct 2016 15:59:12 +0000 (08:59 -0700)]
Merge tag 'mmc-v4.9-rc1' of git://git./linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"Here are some mmc fixes intended for v4.9 rc2.
This time I have also included a few changes for a memstick driver
which has a corresponding mmc driver. They use the same USB device as
parent, hence both needs to play nice with runtime PM, which they
didn't.
MMC core:
- Update MAINTAINERS as the mmc tree moved to kernel.org
- A few fixes for HS400es mode
- A few other minor fixes
MMC host:
- sdhci: Fix an issue when dealing with stop commands
- sdhci-pci: Fix a bus power failure issue
- sdhci-esdhc-imx: Correct two register accesses
- sdhci-of-arasan: Fix the 1.8V I/O signal switch behaviour
- rtsx_usb_sdmmc: Fix runtime PM issues
Other: (Because of no maintainer)
- memstick: rtsx_usb_ms: Fix runtime PM issues"
* tag 'mmc-v4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
MAINTAINERS: mmc: Move the mmc tree to kernel.org
memstick: rtsx_usb_ms: Manage runtime PM when accessing the device
memstick: rtsx_usb_ms: Runtime resume the device when polling for cards
mmc: rtsx_usb_sdmmc: Handle runtime PM while changing the led
mmc: rtsx_usb_sdmmc: Avoid keeping the device runtime resumed when unused
mmc: sdhci: cast unsigned int to unsigned long long to avoid unexpeted error
mmc: sdhci-esdhc-imx: Correct two register accesses
mmc: sdhci-pci: Fix bus power failing to enable for some Intel controllers
mmc: sdhci-pci: Let devices define their own sdhci_ops
mmc: sdhci: Rename sdhci_set_power() to sdhci_set_power_noreg()
mmc: sdhci: Fix SDHCI_QUIRK2_STOP_WITH_TC
mmc: core: Annotate cmd_hdr as __le32
mmc: sdhci-of-arasan: add sdhci_arasan_voltage_switch for arasan, 5.1
mmc: core: changes frequency to hs_max_dtr when selecting hs400es
mmc: core: switch to 1V8 or 1V2 for hs400es mode
mmc: block: add missing header dependencies
mmc: sdhci-of-arasan: Fix non static symbol warning