Rasmus Villemoes [Fri, 8 Mar 2019 00:27:52 +0000 (16:27 -0800)]
dynamic_debug: add static inline stub for ddebug_add_module
For symmetry with ddebug_remove_module, and to avoid a bit of ifdeffery
in module.c, move the declaration of ddebug_add_module inside #if
defined(CONFIG_DYNAMIC_DEBUG) and add a corresponding no-op stub in the
#else branch.
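
The stub pattern described above can be sketched in user space roughly as follows. This is an illustration only, not the kernel header: the MY_CONFIG_DYNAMIC_DEBUG switch, struct ddebug_entry, the my_ prefixed functions and load_module_debug_info() are hypothetical stand-ins.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct _ddebug. */
struct ddebug_entry {
	const char *modname;
};

#if defined(MY_CONFIG_DYNAMIC_DEBUG)
/* With the feature enabled, real implementations live in a .c file. */
int my_ddebug_add_module(struct ddebug_entry *tab, unsigned int n,
			 const char *modname);
int my_ddebug_remove_module(const char *modname);
#else
/* With the feature disabled, no-op stubs keep callers compiling. */
static inline int my_ddebug_add_module(struct ddebug_entry *tab,
				       unsigned int n, const char *modname)
{
	(void)tab;
	(void)n;
	(void)modname;
	return 0;
}

static inline int my_ddebug_remove_module(const char *modname)
{
	(void)modname;
	return 0;
}
#endif

/* A caller in the style of module.c: no #ifdef needed at the call site. */
static int load_module_debug_info(struct ddebug_entry *tab, unsigned int n,
				  const char *name)
{
	return my_ddebug_add_module(tab, n, name);
}
```

Because both branches declare the same signatures, the caller is identical whether or not the feature is configured in.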
Link: http://lkml.kernel.org/r/20190212214150.4807-10-linux@rasmusvillemoes.dk
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Jason Baron <jbaron@akamai.com>
Cc: David Sterba <dsterba@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: "Rafael J . Wysocki" <rafael.j.wysocki@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rasmus Villemoes [Fri, 8 Mar 2019 00:27:48 +0000 (16:27 -0800)]
dynamic_debug: move pr_err from module.c to ddebug_add_module
This serves two purposes: First, we get a diagnostic if (though
extremely unlikely), any of the calls of ddebug_add_module for built-in
code fails, effectively disabling dynamic_debug. Second, I want to make
struct _ddebug opaque, and avoid accessing any of its members outside
dynamic_debug.[ch].
Link: http://lkml.kernel.org/r/20190212214150.4807-9-linux@rasmusvillemoes.dk
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Jason Baron <jbaron@akamai.com>
Cc: David Sterba <dsterba@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: "Rafael J . Wysocki" <rafael.j.wysocki@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rasmus Villemoes [Fri, 8 Mar 2019 00:27:45 +0000 (16:27 -0800)]
dynamic_debug: remove unused EXPORT_SYMBOLs
The only caller of ddebug_{add,remove}_module outside dynamic_debug.c is
kernel/module.c, which is obviously not itself modular (though it would
be an interesting exercise to make that happen...). I also fail to see
how these interfaces can be used by modules, in-tree or not.
Link: http://lkml.kernel.org/r/20190212214150.4807-8-linux@rasmusvillemoes.dk
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Jason Baron <jbaron@akamai.com>
Cc: David Sterba <dsterba@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: "Rafael J . Wysocki" <rafael.j.wysocki@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rasmus Villemoes [Fri, 8 Mar 2019 00:27:41 +0000 (16:27 -0800)]
dynamic_debug: use pointer comparison in ddebug_remove_module
Now that we store the passed-in string directly in ddebug_add_module, we
can use pointer equality instead of strcmp. This is a little more
efficient, but more importantly, this also makes the code somewhat more
correct:
Currently, if one loads and then unloads a module whose name happens to
match the KBUILD_MODNAME of some built-in functionality (which need not
even be modular at all), all of their dynamic debug entries vanish along
with those of the actual module. For example, loading and unloading a
core.ko hides all pr_debugs from drivers/base/core.c and other built-in
files called core.c (incidentally, there is an in-tree module whose name
is core, but I just tested this with an out-of-tree trivial one).
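
The difference between the two comparisons can be sketched in user space as below. The struct, the function names and the two "core" buffers are hypothetical; the real struct ddebug_table has more fields and the real code walks a linked list.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified table entry that stores the caller's pointer. */
struct table_sketch {
	const char *mod_name;
};

/* Two distinct buffers that happen to hold the same name. */
static char builtin_name[] = "core";
static char module_name[] = "core";

/* Matching by contents: every table named "core" is hit. */
static size_t count_matches_strcmp(const struct table_sketch *t, size_t n,
				   const char *name)
{
	size_t i, hits = 0;

	for (i = 0; i < n; i++)
		if (strcmp(t[i].mod_name, name) == 0)
			hits++;
	return hits;
}

/* Matching by identity: only the table registered with this string is hit. */
static size_t count_matches_ptr(const struct table_sketch *t, size_t n,
				const char *name)
{
	size_t i, hits = 0;

	for (i = 0; i < n; i++)
		if (t[i].mod_name == name)
			hits++;
	return hits;
}

/* Counts how many tables would be removed when unloading the module. */
static size_t removed_on_unload(int use_pointer)
{
	struct table_sketch t[2] = { { builtin_name }, { module_name } };

	return use_pointer ? count_matches_ptr(t, 2, module_name)
			   : count_matches_strcmp(t, 2, module_name);
}
```

With strcmp both tables match, which is the vanishing built-in entries described above; with pointer equality only the module's own table matches.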
Link: http://lkml.kernel.org/r/20190212214150.4807-7-linux@rasmusvillemoes.dk
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Jason Baron <jbaron@akamai.com>
Cc: David Sterba <dsterba@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: "Rafael J . Wysocki" <rafael.j.wysocki@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rasmus Villemoes [Fri, 8 Mar 2019 00:27:37 +0000 (16:27 -0800)]
dynamic_debug: don't duplicate modname in ddebug_add_module
For built-in modules, we're already reusing the passed-in string via
kstrdup_const(). But for actual modules (i.e. when we're called from
dynamic_debug_setup in module.c), the passed-in string (which points at
the name[] array inside struct module) is also guaranteed to live at
least as long as the struct ddebug_table, since free_module() calls
ddebug_remove_module().
Link: http://lkml.kernel.org/r/20190212214150.4807-6-linux@rasmusvillemoes.dk
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Jason Baron <jbaron@akamai.com>
Cc: David Sterba <dsterba@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: "Rafael J . Wysocki" <rafael.j.wysocki@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rasmus Villemoes [Fri, 8 Mar 2019 00:27:33 +0000 (16:27 -0800)]
dynamic_debug: consolidate DEFINE_DYNAMIC_DEBUG_METADATA definitions
Instead of defining DEFINE_DYNAMIC_DEBUG_METADATA in terms of a helper
DEFINE_DYNAMIC_DEBUG_METADATA_KEY, that needs another helper dd_key_init
to be properly defined, just make the various #ifdef branches define a
_DPRINTK_KEY_INIT that can be used directly, similar to
_DPRINTK_FLAGS_DEFAULT.
Link: http://lkml.kernel.org/r/20190212214150.4807-5-linux@rasmusvillemoes.dk
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Jason Baron <jbaron@akamai.com>
Cc: David Sterba <dsterba@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: "Rafael J . Wysocki" <rafael.j.wysocki@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rasmus Villemoes [Fri, 8 Mar 2019 00:27:29 +0000 (16:27 -0800)]
linux/printk.h: use DYNAMIC_DEBUG_BRANCH in pr_debug_ratelimited
pr_debug_ratelimited tests the dynamic debug descriptor the
old-fashioned way, and doesn't utilize the static key/jump label
implementation when CONFIG_JUMP_LABEL is set. Use the
DYNAMIC_DEBUG_BRANCH which is defined appropriately.
Link: http://lkml.kernel.org/r/20190212214150.4807-4-linux@rasmusvillemoes.dk
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Petr Mladek <pmladek@suse.com>
Acked-by: Jason Baron <jbaron@akamai.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: David Sterba <dsterba@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: "Rafael J . Wysocki" <rafael.j.wysocki@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rasmus Villemoes [Fri, 8 Mar 2019 00:27:25 +0000 (16:27 -0800)]
linux/net.h: use DYNAMIC_DEBUG_BRANCH in net_dbg_ratelimited
net_dbg_ratelimited tests the dynamic debug descriptor the old-fashioned
way, and doesn't utilize the static key/jump label implementation when
CONFIG_JUMP_LABEL is set. Use the DYNAMIC_DEBUG_BRANCH which is defined
appropriately.
Link: http://lkml.kernel.org/r/20190212214150.4807-3-linux@rasmusvillemoes.dk
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Jason Baron <jbaron@akamai.com>
Cc: David Sterba <dsterba@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: "Rafael J . Wysocki" <rafael.j.wysocki@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rasmus Villemoes [Fri, 8 Mar 2019 00:27:21 +0000 (16:27 -0800)]
linux/device.h: use DYNAMIC_DEBUG_BRANCH in dev_dbg_ratelimited
Patch series "various dynamic_debug patches", v4.
This started as an experiment to see how hard it would be to change the
four pointers in struct _ddebug into relative offsets, a la
CONFIG_GENERIC_BUG_RELATIVE_POINTERS, thus saving 16 bytes per pr_debug
site (and thus exactly making up for the extra space used by the
introduction of jump labels in 9049fc74). I stumbled on a few things
that are probably worth fixing regardless of whether that goal is deemed
worthwhile.
Back at v3 (in November), I redid the implementation on top of the fancy
new asm-macros stuff. Luckily enough, v3 didn't get picked up, since
the asm-macros were backed out again. I still want to do the
relative-pointers thing eventually, but we're close to the merge window
opening, so here's just most of the "incidental" patches, some of which
also serve as preparation for the relative pointers.
This patch (of 9):
dev_dbg_ratelimited tests the dynamic debug descriptor the old-fashioned
way, and doesn't utilize the static key/jump label implementation when
CONFIG_JUMP_LABEL is set. Use the DYNAMIC_DEBUG_BRANCH which is defined
appropriately.
Link: http://lkml.kernel.org/r/20190212214150.4807-2-linux@rasmusvillemoes.dk
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Jason Baron <jbaron@akamai.com>
Cc: David Sterba <dsterba@suse.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: "Rafael J . Wysocki" <rafael.j.wysocki@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nadav Amit [Fri, 8 Mar 2019 00:27:18 +0000 (16:27 -0800)]
include/linux/pid.h: remove next_pidmap() declaration
Commit 95846ecf9dac ("pid: replace pid bitmap implementation with IDR
API") removed next_pidmap() but left its declaration.
Remove it. No functional change.
Link: http://lkml.kernel.org/r/20190213113736.21922-1-namit@vmware.com
Signed-off-by: Nadav Amit <namit@vmware.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Gargi Sharma <gs051095@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Masahiro Yamada [Fri, 8 Mar 2019 00:27:14 +0000 (16:27 -0800)]
linux/kernel.h: split *_MAX and *_MIN macros into <linux/limits.h>
<linux/kernel.h> tends to be cluttered because we often put various sorts
of unrelated stuff in it. So, we have split out a sensible chunk of
code into a separate header from time to time.
This commit splits out the *_MAX and *_MIN defines.
The standard header <limits.h> contains various MAX, MIN constants
including numerical limits. [1]
I think it makes sense to move in-kernel MAX, MIN constants into
include/linux/limits.h.
We already have include/uapi/linux/limits.h to contain some user-space
constants. I changed its include guard to _UAPI_LINUX_LIMITS_H. This
change has no impact on user space because
scripts/headers_install.sh rips off the '_UAPI' prefix from the include
guards of exported headers.
[1] http://pubs.opengroup.org/onlinepubs/009604499/basedefs/limits.h.html
Link: http://lkml.kernel.org/r/1549156242-20806-2-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Alex Elder <elder@linaro.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Zhang Yanmin <yanmin.zhang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Masahiro Yamada [Fri, 8 Mar 2019 00:27:11 +0000 (16:27 -0800)]
linux/kernel.h: use 'short' to define USHRT_MAX, SHRT_MAX, SHRT_MIN
The commit log of 44f564a4bf6a ("ipc: add definitions of USHORT_MAX and
others") did not explain why it used (s16) and (u16) instead of (short)
and (unsigned short).
Let's use (short) and (unsigned short), which is more sensible, and more
consistent with the other MAX/MIN defines.
As you see in include/uapi/asm-generic/int-ll64.h, s16/u16 are
typedef'ed as signed/unsigned short. So, this commit does not have a
functional change.
Remove the unneeded parentheses around ~0U while we are here.
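
As a rough user-space check of the claim that nothing changes functionally, the definitions can be written with (short) and (unsigned short) casts and compared against <limits.h>. The K_ prefix is hypothetical and only avoids clashing with the standard macros; the bodies are assumed to mirror the style this commit describes.

```c
#include <limits.h>

/* Hypothetical K_ names; the casts use short/unsigned short, not s16/u16. */
#define K_USHRT_MAX	((unsigned short)~0U)
#define K_SHRT_MAX	((short)(K_USHRT_MAX >> 1))
#define K_SHRT_MIN	((short)(-K_SHRT_MAX - 1))

/* Compile-time comparison against the C library's values. */
_Static_assert(K_USHRT_MAX == USHRT_MAX, "unsigned short maximum");
_Static_assert(K_SHRT_MAX == SHRT_MAX, "short maximum");
_Static_assert(K_SHRT_MIN == SHRT_MIN, "short minimum");
```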
Link: http://lkml.kernel.org/r/1549156242-20806-1-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Zhang Yanmin <yanmin.zhang@intel.com>
Cc: Alex Elder <elder@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rasmus Villemoes [Fri, 8 Mar 2019 00:27:07 +0000 (16:27 -0800)]
linux/fs.h: move member alignment check next to definition of struct filename
Instead of doing this compile-time check in some slightly arbitrary user
of struct filename, put it next to the definition.
Link: http://lkml.kernel.org/r/20190208203015.29702-3-linux@rasmusvillemoes.dk
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rasmus Villemoes [Fri, 8 Mar 2019 00:27:03 +0000 (16:27 -0800)]
lib/vsprintf.c: move sizeof(struct printf_spec) next to its definition
At the time of commit d048419311ff ("lib/vsprintf.c: expand field_width
to 24 bits"), there was no compiletime_assert/BUILD_BUG/.... variant
that could be used outside function scope. Now we have static_assert(),
so move the assertion next to the definition instead of hiding it in
some arbitrary function.
Also add the appropriate #include to avoid relying on build_bug.h being
pulled in via some arbitrary chain of includes.
Link: http://lkml.kernel.org/r/20190208203015.29702-2-linux@rasmusvillemoes.dk
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rasmus Villemoes [Fri, 8 Mar 2019 00:27:00 +0000 (16:27 -0800)]
build_bug.h: add wrapper for _Static_assert
BUILD_BUG_ON() is a little annoying, since it cannot be used outside
function scope. So one cannot put assertions about the sizeof() a
struct next to the struct definition, but has to hide that in some more
or less arbitrary function.
Since gcc 4.6 (which is now also the required minimum), there is support
for the C11 _Static_assert in all C modes, including gnu89. So add a
simple wrapper for that.
_Static_assert() requires a message argument, which is usually quite
redundant (and I believe that bug got fixed at least in newer C++
standards), but we can easily work around that with a little macro
magic, making it optional.
For example, adding
static_assert(sizeof(struct printf_spec) == 8);
in vsprintf.c and modifying that struct to violate it, one gets
./include/linux/build_bug.h:78:41: error: static assertion failed: "sizeof(struct printf_spec) == 8"
#define __static_assert(expr, msg, ...) _Static_assert(expr, "" msg "")
godbolt.org suggests that _Static_assert() has been supported by clang
since at least 3.0.0.
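
A minimal sketch of how such a wrapper with an optional message might look, assuming the GNU `##__VA_ARGS__` extension accepted by gcc and clang. The my_ prefixed macro names and struct spec_sketch are hypothetical and chosen to avoid clashing with <assert.h>; this is not necessarily the exact form that was merged.

```c
/*
 * If no message is given, the stringified expression becomes the message:
 * the trailing #expr slides into the msg slot of the helper macro.
 */
#define my_static_assert(expr, ...) \
	my_static_assert_impl(expr, ##__VA_ARGS__, #expr)
#define my_static_assert_impl(expr, msg, ...) _Static_assert(expr, msg)

/* Hypothetical struct whose size is to be pinned next to its definition. */
struct spec_sketch {
	unsigned char bytes[8];
};

/* File scope, no enclosing function needed. */
my_static_assert(sizeof(struct spec_sketch) == 8);
my_static_assert(sizeof(struct spec_sketch) == 8, "spec_sketch must stay 8 bytes");
```

Changing the array length makes the build fail at the assertion, right beside the struct definition.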
Link: http://lkml.kernel.org/r/20190208203015.29702-1-linux@rasmusvillemoes.dk
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Colin Ian King [Fri, 8 Mar 2019 00:26:56 +0000 (16:26 -0800)]
scripts/spelling.txt: add more spellings to spelling.txt
Here are some of the more common spelling mistakes and typos that I've
found while fixing up spelling mistakes in the kernel over the past 4
months.
Link: http://lkml.kernel.org/r/20190114110215.1986-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mathieu Malaterre [Fri, 8 Mar 2019 00:26:53 +0000 (16:26 -0800)]
kernel/sys: annotate implicit fall through
There is a plan to build the kernel with -Wimplicit-fallthrough and this
place in the code produced a warning (W=1).
This commit removes the following warning:
kernel/sys.c:1748:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
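
For illustration, an intentional fall through annotated with a comment that gcc recognizes might look like the sketch below. The enum, the function and its arguments are hypothetical and do not reproduce the code in kernel/sys.c.

```c
/* Hypothetical selector, loosely modelled on a "self/children/both" choice. */
enum who_sketch { WHO_SELF, WHO_CHILDREN, WHO_BOTH };

static long usage_sketch(enum who_sketch who, long self, long children)
{
	long total = 0;

	switch (who) {
	case WHO_BOTH:
	case WHO_CHILDREN:
		total += children;
		if (who == WHO_CHILDREN)
			break;
		/* fall through */
	case WHO_SELF:
		total += self;
		break;
	}
	return total;
}
```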
Link: http://lkml.kernel.org/r/20190114203347.17530-1-malat@debian.org
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tetsuo Handa [Fri, 8 Mar 2019 00:26:50 +0000 (16:26 -0800)]
kernel/hung_task.c: Use continuously blocked time when reporting.
Since commit a2e514453861 ("kernel/hung_task.c: allow to set checking
interval separately from timeout") added hung_task_check_interval_secs,
setting a value different from hung_task_timeout_secs
echo 0 > /proc/sys/kernel/hung_task_panic
echo 120 > /proc/sys/kernel/hung_task_timeout_secs
echo 5 > /proc/sys/kernel/hung_task_check_interval_secs
causes confusing output as if the task was blocked for
hung_task_timeout_secs seconds from the previous report.
[ 399.395930] INFO: task kswapd0:75 blocked for more than 120 seconds.
[ 405.027637] INFO: task kswapd0:75 blocked for more than 120 seconds.
[ 410.659725] INFO: task kswapd0:75 blocked for more than 120 seconds.
[ 416.292860] INFO: task kswapd0:75 blocked for more than 120 seconds.
[ 421.932305] INFO: task kswapd0:75 blocked for more than 120 seconds.
Although we could update t->last_switch_time after sched_show_task(t) if
we want to report only every 120 seconds, reporting every 5 seconds
might not be very bad for monitoring after a problematic situation has
started. Thus, let's use continuously blocked time instead of updating
previously reported time.
[ 677.985011] INFO: task kswapd0:80 blocked for more than 122 seconds.
[ 693.856126] INFO: task kswapd0:80 blocked for more than 138 seconds.
[ 709.728075] INFO: task kswapd0:80 blocked for more than 154 seconds.
[ 725.600018] INFO: task kswapd0:80 blocked for more than 170 seconds.
[ 741.473133] INFO: task kswapd0:80 blocked for more than 186 seconds.
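
The reported value can be thought of as the tick difference since the task last switched, converted to seconds. A minimal sketch, assuming a hypothetical tick rate SKETCH_HZ and a hypothetical helper name:

```c
/* Hypothetical tick rate standing in for the kernel's HZ. */
#define SKETCH_HZ 250UL

/*
 * Seconds the task has been continuously blocked. Unsigned subtraction
 * keeps the result correct even if the tick counter has wrapped.
 */
static unsigned long blocked_seconds(unsigned long last_switch_time,
				     unsigned long now)
{
	return (now - last_switch_time) / SKETCH_HZ;
}
```

Since last_switch_time is left alone between reports, each report shows a growing duration rather than repeating the configured timeout.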
Link: http://lkml.kernel.org/r/1551175083-10669-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Valdis Kletnieks [Fri, 8 Mar 2019 00:26:46 +0000 (16:26 -0800)]
kernel/hung_task.c - fix sparse warnings
sparse complains:
CHECK kernel/hung_task.c
kernel/hung_task.c:28:19: warning: symbol 'sysctl_hung_task_check_count' was not declared. Should it be static?
kernel/hung_task.c:42:29: warning: symbol 'sysctl_hung_task_timeout_secs' was not declared. Should it be static?
kernel/hung_task.c:47:29: warning: symbol 'sysctl_hung_task_check_interval_secs' was not declared. Should it be static?
kernel/hung_task.c:49:19: warning: symbol 'sysctl_hung_task_warnings' was not declared. Should it be static?
kernel/hung_task.c:61:28: warning: symbol 'sysctl_hung_task_panic' was not declared. Should it be static?
kernel/hung_task.c:219:5: warning: symbol 'proc_dohung_task_timeout_secs' was not declared. Should it be static?
Add the appropriate header file to provide declarations.
Link: http://lkml.kernel.org/r/467.1548649525@turing-police.cc.vt.edu
Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Cc: "Paul E. McKenney" <paulmck@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
WangBo [Fri, 8 Mar 2019 00:26:43 +0000 (16:26 -0800)]
include/linux/types.h: use "unsigned int" instead of "unsigned"
Use "unsigned int" instead of "unsigned", to make the code clearer.
Link: http://lkml.kernel.org/r/1551354739-6648-1-git-send-email-wdjjwb@163.com
Signed-off-by: WangBo <wang.bo116@zte.com.cn>
Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap [Fri, 8 Mar 2019 00:26:39 +0000 (16:26 -0800)]
<linux/kernel.h>: drop the gcc-3.3 'const' hack in roundup()
The single quotation marks around "const" were causing a documentation
markup warning with reST. Instead of fixing that warning, just delete
that comment line and the gcc-3.3 hack of using "const" in the roundup()
macro since gcc-3.3 is no longer supported for kernel builds.
I did around 20 different $arch builds with no problems, but we'll just
have to see if this causes problems for anyone else out there.
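
A sketch of a roundup-style macro without the const qualifier on its temporary, assuming GNU C statement expressions and __typeof__ as accepted by gcc and clang. The names roundup_sketch and round_len are hypothetical.

```c
/* Evaluates y once, then rounds x up to the next multiple of y. */
#define roundup_sketch(x, y) (				\
{							\
	__typeof__(y) y_ = (y);				\
	(((x) + (y_ - 1)) / y_) * y_;			\
}							\
)

/* Statement expressions need function scope, so wrap the macro. */
static unsigned int round_len(unsigned int len, unsigned int align)
{
	return roundup_sketch(len, align);
}
```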
Link: http://lkml.kernel.org/r/ec5dcf72-7c3e-3513-af0c-4003ed598854@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
YueHaibing [Fri, 8 Mar 2019 00:26:36 +0000 (16:26 -0800)]
kernel/panic.c: taint: fix debugfs_simple_attr.cocci warnings
Use DEFINE_DEBUGFS_ATTRIBUTE rather than DEFINE_SIMPLE_ATTRIBUTE for
debugfs files.
Semantic patch information:
Rationale: DEFINE_SIMPLE_ATTRIBUTE + debugfs_create_file()
imposes some significant overhead as compared to
DEFINE_DEBUGFS_ATTRIBUTE + debugfs_create_file_unsafe().
Generated by: scripts/coccinelle/api/debugfs/debugfs_simple_attr.cocci
The _unsafe() part suggests that some of the "safeness
responsibilities" are now panic.c responsibilities. The patch is OK
since panic's clear_warn_once_fops struct file_operations is safe
against removal, so we don't have to use otherwise necessary
debugfs_file_get()/debugfs_file_put().
[sergey.senozhatsky.work@gmail.com: changelog addition]
Link: http://lkml.kernel.org/r/1545990861-158097-1-git-send-email-yuehaibing@huawei.com
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Petr Mladek <pmladek@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jani Nikula [Fri, 8 Mar 2019 00:26:32 +0000 (16:26 -0800)]
kernel.h: unconditionally include asm/div64.h for do_div()
Include asm/div64.h for do_div() usage in DIV_ROUND_DOWN_ULL() and
DIV_ROUND_CLOSEST_ULL(). Remove the old CONFIG_LBDAF=y conditional
include.
Link: http://lkml.kernel.org/r/20181228153430.23763-1-jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 7 Mar 2019 00:48:27 +0000 (16:48 -0800)]
Merge tag 'usb-5.1-rc1' of git://git./linux/kernel/git/gregkh/usb
Pull USB/PHY updates from Greg KH:
"Here is the big USB/PHY driver pull request for 5.1-rc1.
The usual set of gadget driver updates, phy driver updates, xhci
updates, and typec additions. Also included in here are a lot of small
cleanups and fixes and driver updates where needed.
All of these have been in linux-next for a while with no reported
issues"
* tag 'usb-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (167 commits)
wusb: Remove unnecessary static function ckhdid_printf
usb: core: make default autosuspend delay configurable
usb: core: Fix typo in description of "authorized_default"
usb: chipidea: Refactor USB PHY selection and keep a single PHY
usb: chipidea: Grab the (legacy) USB PHY by phandle first
usb: chipidea: imx: set power polarity
dt-bindings: usb: ci-hdrc-usb2: add property power-active-high
usb: chipidea: imx: remove unused header files
usb: chipidea: tegra: Fix missed ci_hdrc_remove_device()
usb: core: add option of only authorizing internal devices
usb: typec: tps6598x: handle block writes separately with plain-I2C adapters
usb: xhci: Fix for Enabling USB ROLE SWITCH QUIRK on INTEL_SUNRISEPOINT_LP_XHCI
usb: xhci: fix build warning - missing prototype
usb: xhci: dbc: Fixing typo error.
usb: xhci: remove unused member 'parent' in xhci_regset struct
xhci: tegra: Prevent error pointer dereference
USB: serial: option: add Telit ME910 ECM composition
usb: core: Replace hardcoded check with inline function from usb.h
usb: core: skip interfaces disabled in devicetree
usb: typec: mux: remove redundant check on variable match
...
Linus Torvalds [Thu, 7 Mar 2019 00:35:12 +0000 (16:35 -0800)]
Merge tag 'tty-5.1-rc1' of git://git./linux/kernel/git/gregkh/tty
Pull tty/serial updates from Greg KH:
"Here is the "big" patchset for the tty/serial driver layer for
5.1-rc1.
It's really not all that big, nothing major here.
There are a lot of tiny driver fixes and updates, combined with other
cleanups for different serial drivers and the vt layer. Full details
are in the shortlog.
All of these have been in linux-next with no reported issues"
* tag 'tty-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (70 commits)
tty: xilinx_uartps: Correct return value in probe
serial: sprd: Modify the baud rate calculation formula
dt-bindings: serial: Add Milbeaut serial driver description
serial: 8250_of: assume reg-shift of 2 for mrvl,mmp-uart
serial: 8250_pxa: honor the port number from devicetree
tty: hvc_xen: Mark expected switch fall-through
tty: n_gsm: Mark expected switch fall-throughs
tty: serial: msm_serial: Remove __init from msm_console_setup()
tty: serial: samsung: Enable baud clock during initialisation
serial: uartps: Fix stuck ISR if RX disabled with non-empty FIFO
tty: serial: remove redundant likely annotation
tty/n_hdlc: mark expected switch fall-through
serial: 8250_pci: Have ACCES cards that use the four port Pericom PI7C9X7954 chip use the pci_pericom_setup()
serial: 8250_pci: Fix number of ports for ACCES serial cards
vt: perform safe console erase in the right order
tty/nozomi: use pci_iomap instead of ioremap_nocache
tty/synclink: remove ISA support
serial: 8250_pci: Replace custom code with pci_match_id()
serial: max310x: Correction of the initial setting of the MODE1 bits for various supported ICs.
serial: mps2-uart: Add parentheses around conditional in mps2_uart_shutdown
...
Linus Torvalds [Thu, 7 Mar 2019 00:29:27 +0000 (16:29 -0800)]
Merge tag 'staging-5.1-rc1' of git://git./linux/kernel/git/gregkh/staging
Pull staging/IIO updates from Greg KH:
"Here is the big staging/iio driver pull request for 5.1-rc1.
Lots of good IIO driver updates and cleanups in here as always.
Combined with the removal of the xgifb driver, we have a net "loss" of
over 9000 lines in the pull request, always a nice thing.
As the outreachy application process is currently happening, there are
loads of tiny checkpatch cleanup fixes all over the staging tree,
which accounts for the majority of the fixups"
* tag 'staging-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (341 commits)
staging: mt7621-dma: remove license boilerplate text
staging: mt7621-dma: add SPDX GPL-2.0+ license identifier
Staging: ks7010: Replace typecast to int
Staging: vt6655: Align a static function declaration
staging: speakup: fix line over 80 characters.
staging: mt7621-eth: Remove license boilerplate text
staging: mt7621-eth: Add SPDX license identifier
staging: ks7010: removed custom Michael MIC implementation.
staging: rtl8192e: Fix space and suspect issue
Staging: vt6655: Modify comment style of SPDX License Identifier
Staging: vt6655: Modify comment style for SPDX-License-Identifier
Staging: vt6655: Align a function declaration
Staging: vt6655: Alignment of function declaration
staging: rtl8712: Fix indentation issue
staging: wilc1000: fix incorrent type in initializer
staging: rtl8188eu: remove unused P2P_PRIVATE_IOCTL_SET_LEN
staging: rtl8188eu: remove unused enum P2P_PROTO_WK_ID
staging: rtl8723bs: Remove duplicated include from drv_types.h
Staging: vt6655: Alignment should match open parenthesis
staging: erofs: fix mis-acted TAIL merging behavior
...
Linus Torvalds [Wed, 6 Mar 2019 23:41:29 +0000 (15:41 -0800)]
iio: adc: fix warning in Qualcomm PM8xxx HK/XOADC driver
The pm8xxx_get_channel() implementation is unclear, and causes gcc to
suddenly generate odd warnings. The trigger for the warning (at least
for me) was the entirely unrelated commit 79a4e91d1bb2 ("device.h: Add
__cold to dev_<level> logging functions"), which apparently changes gcc
code generation in the caller function enough to cause this:
drivers/iio/adc/qcom-pm8xxx-xoadc.c: In function ‘pm8xxx_xoadc_probe’:
drivers/iio/adc/qcom-pm8xxx-xoadc.c:633:8: warning: ‘ch’ may be used uninitialized in this function [-Wmaybe-uninitialized]
ret = pm8xxx_read_channel_rsv(adc, ch, AMUX_RSV4,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
&read_nomux_rsv4, true);
~~~~~~~~~~~~~~~~~~~~~~~
drivers/iio/adc/qcom-pm8xxx-xoadc.c:426:27: note: ‘ch’ was declared here
struct pm8xxx_chan_info *ch;
^~
because gcc for some reason then isn't able to see that the termination
condition for the "for( )" loop in that function is also the condition
for returning NULL.
So it's not _actually_ uninitialized, but the function is admittedly
just unnecessarily oddly written.
Simplify and clarify the function, making gcc also see that it always
returns a valid initialized value.
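
The general shape of such a simplification can be sketched as a lookup that returns from inside the loop and falls through to NULL, so no variable is ever read outside the path that assigned it. The struct, the array and the function name are hypothetical and do not reproduce the driver's code.

```c
#include <stddef.h>

/* Hypothetical channel descriptor. */
struct chan_sketch {
	unsigned char amux_channel;
	int value;
};

static struct chan_sketch demo_chans[] = {
	{ 1, 10 }, { 4, 40 }, { 7, 70 },
};

/* Returns the matching channel, or NULL when the loop runs to completion. */
static struct chan_sketch *find_channel(struct chan_sketch *chans, size_t n,
					unsigned char chan)
{
	size_t i;

	for (i = 0; i < n; i++) {
		struct chan_sketch *ch = &chans[i];

		if (ch->amux_channel == chan)
			return ch;
	}
	return NULL;
}
```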
Cc: Joe Perches <joe@perches.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Andy Gross <andy.gross@linaro.org>
Cc: David Brown <david.brown@linaro.org>
Cc: Jonathan Cameron <jic23@kernel.org>
Cc: Hartmut Knaack <knaack.h@gmx.de>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: Peter Meerwald-Stadler <pmeerw@pmeerw.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Wed, 6 Mar 2019 22:52:48 +0000 (14:52 -0800)]
Merge tag 'driver-core-5.1-rc1' of git://git./linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the big driver core patchset for 5.1-rc1
More patches than "normal" here this merge window, due to some work in
the driver core by Alexander Duyck to rework the async probe
functionality to work better for a number of devices, and independent
work from Rafael for the device link functionality to make it work
"correctly".
Also in here is:
- lots of BUS_ATTR() removals, the macro is about to go away
- firmware test fixups
- ihex fixups and simplification
- component additions (also includes i915 patches)
- lots of minor coding style fixups and cleanups.
All of these have been in linux-next for a while with no reported
issues"
* tag 'driver-core-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (65 commits)
driver core: platform: remove misleading err_alloc label
platform: set of_node in platform_device_register_full()
firmware: hardcode the debug message for -ENOENT
driver core: Add missing description of new struct device_link field
driver core: Fix PM-runtime for links added during consumer probe
drivers/component: kerneldoc polish
async: Add cmdline option to specify drivers to be async probed
driver core: Fix possible supplier PM-usage counter imbalance
PM-runtime: Fix __pm_runtime_set_status() race with runtime resume
driver: platform: Support parsing GpioInt 0 in platform_get_irq()
selftests: firmware: fix verify_reqs() return value
Revert "selftests: firmware: remove use of non-standard diff -Z option"
Revert "selftests: firmware: add CONFIG_FW_LOADER_USER_HELPER_FALLBACK to config"
device: Fix comment for driver_data in struct device
kernfs: Allocating memory for kernfs_iattrs with kmem_cache.
sysfs: remove unused include of kernfs-internal.h
driver core: Postpone DMA tear-down until after devres release
driver core: Document limitation related to DL_FLAG_RPM_ACTIVE
PM-runtime: Take suppliers into account in __pm_runtime_set_status()
device.h: Add __cold to dev_<level> logging functions
...
Linus Torvalds [Wed, 6 Mar 2019 22:18:59 +0000 (14:18 -0800)]
Merge tag 'char-misc-5.1-rc1' of git://git./linux/kernel/git/gregkh/char-misc
Pull char/misc driver updates from Greg KH:
"Here is the big char/misc driver patch pull request for 5.1-rc1.
The largest thing by far is the new habanalabs driver for their AI
accelerator chip. For now it is in the drivers/misc directory but will
probably move to a new directory soon along with other drivers of this
type.
Other than that, just the usual set of individual driver updates and
fixes. There's an "odd" merge in here from the DRM tree that they
asked me to do as the MEI driver is starting to interact with the i915
driver, and it needed some coordination. All of those patches have
been properly acked by the relevant subsystem maintainers.
All of these have been in linux-next with no reported issues, most for
quite some time"
* tag 'char-misc-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (219 commits)
habanalabs: adjust Kconfig to fix build errors
habanalabs: use %px instead of %p in error print
habanalabs: use do_div for 64-bit divisions
intel_th: gth: Fix an off-by-one in output unassigning
habanalabs: fix little-endian<->cpu conversion warnings
habanalabs: use NULL to initialize array of pointers
habanalabs: fix little-endian<->cpu conversion warnings
habanalabs: soft-reset device if context-switch fails
habanalabs: print pointer using %p
habanalabs: fix memory leak with CBs with unaligned size
habanalabs: return correct error code on MMU mapping failure
habanalabs: add comments in uapi/misc/habanalabs.h
habanalabs: extend QMAN0 job timeout
habanalabs: set DMA0 completion to SOB 1007
habanalabs: fix validation of WREG32 to DMA completion
habanalabs: fix mmu cache registers init
habanalabs: disable CPU access on timeouts
habanalabs: add MMU DRAM default page mapping
habanalabs: Dissociate RAZWI info from event types
misc/habanalabs: adjust Kconfig to fix build errors
...
Linus Torvalds [Wed, 6 Mar 2019 22:10:46 +0000 (14:10 -0800)]
Merge tag 'sound-5.1-rc1' of git://git./linux/kernel/git/tiwai/sound
Pull sound updates from Takashi Iwai:
"We had again a busy development cycle with many new drivers as well as
lots of core improvements / cleanups. Let's go for highlights:
ALSA core:
- PCM locking scheme was refactored for reducing a global rwlock
- PCM suspend is handled in the device type PM ops now; lots of
explicit calls were reduced by this action
- Cleanups about PCM buffer preallocation calls
- Kill NULL device object in memory allocations
- Lots of procfs API cleanups
ASoC core:
- Support for only powering up channels that are actively being used
- Cleanups / fixes of topology API
ASoC drivers:
- MediaTek BTCVSD for a Bluetooth radio chip, which is the first such
driver we've had upstream!
- Quite a few improvements to simplify the generic card drivers,
especially the merge of the SCU cards into the main generic drivers
- Lots of fixes for probing on Intel systems to follow more standard
styles
- A big refresh and cleanup of the Samsung drivers
- New drivers: Asahi Kasei Microdevices AK4497, Cirrus Logic CS4341
and CS35L26, Google ChromeOS embedded controllers, Ingenic JZ4725B,
MediaTek BTCVSD, MT8183 and MT6358, NXP MICFIL, Rockchip RK3328,
Spreadtrum DMA controllers, Qualcomm WCD9335, Xilinx S/PDIF and PCM
formatters
ALSA drivers:
- Improvements of Tegra HD-audio controller driver for supporting new
chips
- HD-audio codec quirks for ALC294 S4 resume, ASUS laptop, Chrome
headset button support and Dell workstations
- Improved DSD support on USB-audio
- Quirk for MOTU MicroBook II USB-audio
- Support for Fireface UCX and Solid State Logic Duende
Classic/Mini"
* tag 'sound-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (461 commits)
ALSA: usb-audio: Add quirk for MOTU MicroBook II
ASoC: stm32: i2s: skip useless write in slave mode
ASoC: stm32: i2s: fix race condition in irq handler
ASoC: stm32: i2s: remove useless callback
ASoC: stm32: i2s: fix dma configuration
ASoC: stm32: i2s: fix stream count management
ASoC: stm32: i2s: fix 16 bit format support
ASoC: stm32: i2s: fix IRQ clearing
ASoC: qcom: Kconfig: fix dependency for sdm845
ASoC: Intel: Boards: Add Maxim98373 support
ASoC: rsnd: gen: fix SSI9 4/5/6/7 busif related register address
ALSA: firewire-motu: fix construction of PCM frame for capture direction
ALSA: bebob: use more identical mod_alias for Saffire Pro 10 I/O against Liquid Saffire 56
ALSA: hda: Extend i915 component bind timeout
ASoC: wm_adsp: Improve logging messages
ASoC: wm_adsp: Add support for multiple compressed buffers
ASoC: wm_adsp: Refactor compress stream initialisation
ASoC: wm_adsp: Reorder some functions for improved clarity
ASoC: wm_adsp: Factor out stripping padding from ADSP data
ASoC: cs35l36: Fix an IS_ERR() vs NULL checking bug
...
Linus Torvalds [Wed, 6 Mar 2019 21:49:54 +0000 (13:49 -0800)]
Merge tag 'devprop-5.1-rc1' of git://git./linux/kernel/git/rafael/linux-pm
Pull device properties framework updates from Rafael Wysocki:
"Fix the length value used in the PROPERTY_ENTRY_STRING() macro and
make software nodes use the get_named_child_node() fwnode callback
(Heikki Krogerus)"
* tag 'devprop-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
software node: Implement get_named_child_node fwnode callback
device property: Fix the length used in PROPERTY_ENTRY_STRING()
Linus Torvalds [Wed, 6 Mar 2019 21:33:11 +0000 (13:33 -0800)]
Merge tag 'acpi-5.1-rc1' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI updates from Rafael Wysocki:
"These are ACPICA updates including ACPI 6.3 support among other
things, APEI updates including the ARM Software Delegated Exception
Interface (SDEI) support, ACPI EC driver fixes and cleanups and other
assorted improvements.
Specifics:
- Update the ACPICA code in the kernel to upstream revision 20190215
including ACPI 6.3 support and more:
* New predefined methods: _NBS, _NCH, _NIC, _NIH, and _NIG (Erik
Schmauss).
* Update of the PCC Identifier structure in PDTT (Erik Schmauss).
* Support for new Generic Affinity Structure subtable in SRAT
(Erik Schmauss).
* New PCC operation region support (Erik Schmauss).
* Support for GICC statistical profiling for MADT (Erik Schmauss).
* New Error Disconnect Recover notification support (Erik
Schmauss).
* New PPTT Processor Structure Flags fields support (Erik
Schmauss).
* ACPI 6.3 HMAT updates (Erik Schmauss).
* GTDT Revision 3 support (Erik Schmauss).
* Legacy module-level code (MLC) support removal (Erik Schmauss).
* Update/clarification of messages for control method failures
(Bob Moore).
* Warning on creation of a zero-length opregion (Bob Moore).
* acpiexec option to dump extra info for memory leaks (Bob Moore).
* More ACPI error to firmware error conversions (Bob Moore).
* Debugger fix (Bob Moore).
* Copyrights update (Bob Moore)
- Clean up sleep states support code in ACPICA (Christoph Hellwig)
- Rework in_nmi() handling in the APEI code and add support for the
ARM Software Delegated Exception Interface (SDEI) to it (James
Morse)
- Fix possible out-of-bounds accesses in BERT-related core (Ross
Lagerwall)
- Fix the APEI code parsing HEST that includes a Deferred Machine
Check subtable (Yazen Ghannam)
- Use DEFINE_DEBUGFS_ATTRIBUTE for APEI-related debugfs files
(YueHaibing)
- Switch the APEI ERST code to the new generic UUID API (Andy
Shevchenko)
- Update the MAINTAINERS entry for APEI (Borislav Petkov)
- Fix and clean up the ACPI EC driver (Rafael Wysocki, Zhang Rui)
- Fix DMI checks handling in the ACPI backlight driver and add the
"Lunch Box" chassis-type check to it (Hans de Goede)
- Add support for using ACPI table overrides included in built-in
initrd images (Shunyong Yang)
- Update ACPI device enumeration to treat the PWM2 device as "always
present" on Lenovo Yoga Book (Yauhen Kharuzhy)
- Fix up the enumeration of device objects with the PRP0001 device ID
(Andy Shevchenko)
- Clean up PPTT parsing error messages (John Garry)
- Clean up debugfs files creation handling (Greg Kroah-Hartman,
Rafael Wysocki)
- Clean up the ACPI DPTF Makefile (Masahiro Yamada)"
* tag 'acpi-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (65 commits)
ACPI / bus: Respect PRP0001 when retrieving device match data
ACPICA: Update version to 20190215
ACPI/ACPICA: Trivial: fix spelling mistakes and fix whitespace formatting
ACPICA: ACPI 6.3: add GTDT Revision 3 support
ACPICA: ACPI 6.3: HMAT updates
ACPICA: ACPI 6.3: PPTT add additional fields in Processor Structure Flags
ACPICA: ACPI 6.3: add Error Disconnect Recover Notification value
ACPICA: ACPI 6.3: MADT: add support for statistical profiling in GICC
ACPICA: ACPI 6.3: add PCC operation region support for AML interpreter
efi: cper: Fix possible out-of-bounds access
ACPI: APEI: Fix possible out-of-bounds access to BERT region
ACPICA: ACPI 6.3: SRAT: add Generic Affinity Structure subtable
ACPICA: ACPI 6.3: Add Trigger order to PCC Identifier structure in PDTT
ACPICA: ACPI 6.3: Adding predefined methods _NBS, _NCH, _NIC, _NIH, and _NIG
ACPICA: Update/clarify messages for control method failures
ACPICA: Debugger: Fix possible fault with the "test objects" command
ACPICA: Interpreter: Emit warning for creation of a zero-length op region
ACPICA: Remove legacy module-level code support
ACPI / x86: Make PWM2 device always present at Lenovo Yoga Book
ACPI / video: Extend chassis-type detection with a "Lunch Box" check
...
Linus Torvalds [Wed, 6 Mar 2019 20:59:46 +0000 (12:59 -0800)]
Merge tag 'pm-5.1-rc1' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These are PM-runtime framework changes to use ktime instead of jiffies
for accounting, new PM core flag to mark devices that don't need any
form of power management, cpuidle updates including driver API
documentation and a new governor, cpufreq updates including a new
driver for Armada 8K, thermal cleanups and more, some energy-aware
scheduling (EAS) enabling changes, new chips support in the intel_idle
and RAPL drivers and assorted cleanups in some other places.
Specifics:
- Update the PM-runtime framework to use ktime instead of jiffies for
accounting (Thara Gopinath, Vincent Guittot)
- Optimize the autosuspend code in the PM-runtime framework somewhat
(Ladislav Michl)
- Add a PM core flag to mark devices that don't need any form of
power management (Sudeep Holla)
- Introduce driver API documentation for cpuidle and add a new
cpuidle governor for tickless systems (Rafael Wysocki)
- Add Jacobsville support to the intel_idle driver (Zhang Rui)
- Clean up a cpuidle core header file and the cpuidle-dt and ACPI
processor-idle drivers (Yangtao Li, Joseph Lo, Yazen Ghannam)
- Add new cpufreq driver for Armada 8K (Gregory Clement)
- Fix and clean up cpufreq core (Rafael Wysocki, Viresh Kumar, Amit
Kucheria)
- Add support for light-weight tear-down and bring-up of CPUs to the
cpufreq core and use it in the cpufreq-dt driver (Viresh Kumar)
- Fix cpu_cooling Kconfig dependencies, add support for CPU cooling
auto-registration to the cpufreq core and use it in multiple
cpufreq drivers (Amit Kucheria)
- Fix some minor issues and do some cleanups in the davinci,
e_powersaver, ap806, s5pv210, qcom and kryo cpufreq drivers
(Bartosz Golaszewski, Gustavo Silva, Julia Lawall, Paweł Chmiel,
Taniya Das, Viresh Kumar)
- Add a Hisilicon CPPC quirk to the cppc_cpufreq driver (Xiongfeng
Wang)
- Clean up the intel_pstate and acpi-cpufreq drivers (Erwan Velu,
Rafael Wysocki)
- Clean up multiple cpufreq drivers (Yangtao Li)
- Update cpufreq-related MAINTAINERS entries (Baruch Siach, Lukas
Bulwahn)
- Add support for exposing the Energy Model via debugfs and make
multiple cpufreq drivers register an Energy Model to support
energy-aware scheduling (Quentin Perret, Dietmar Eggemann, Matthias
Kaehlcke)
- Add Ice Lake mobile and Jacobsville support to the Intel RAPL
power-capping driver (Gayatri Kammela, Zhang Rui)
- Add a power estimation helper to the operating performance points
(OPP) framework and clean up a core function in it (Quentin Perret,
Viresh Kumar)
- Make minor improvements in the generic power domains (genpd), OPP
and system suspend frameworks and in the PM core (Aditya Pakki,
Douglas Anderson, Greg Kroah-Hartman, Rafael Wysocki, Yangtao Li)"
* tag 'pm-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (80 commits)
cpufreq: kryo: Release OPP tables on module removal
cpufreq: ap806: add missing of_node_put after of_device_is_available
cpufreq: acpi-cpufreq: Report if CPU doesn't support boost technologies
cpufreq: Pass updated policy to driver ->setpolicy() callback
cpufreq: Fix two debug messages in cpufreq_set_policy()
cpufreq: Reorder and simplify cpufreq_update_policy()
cpufreq: Add kerneldoc comments for two core functions
PM / core: Add support to skip power management in device/driver model
cpufreq: intel_pstate: Rework iowait boosting to be less aggressive
cpufreq: intel_pstate: Eliminate intel_pstate_get_base_pstate()
cpufreq: intel_pstate: Avoid redundant initialization of local vars
powercap/intel_rapl: add Ice Lake mobile
ACPI / processor: Set P_LVL{2,3} idle state descriptions
cpufreq / cppc: Work around for Hisilicon CPPC cpufreq
ACPI / CPPC: Add a helper to get desired performance
cpufreq: davinci: move configuration to include/linux/platform_data
cpufreq: speedstep: convert BUG() to BUG_ON()
cpufreq: powernv: fix missing check of return value in init_powernv_pstates()
cpufreq: longhaul: remove unneeded semicolon
cpufreq: pcc-cpufreq: remove unneeded semicolon
...
Linus Torvalds [Wed, 6 Mar 2019 18:31:36 +0000 (10:31 -0800)]
Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:
- a few misc things
- ocfs2 updates
- most of MM
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (159 commits)
tools/testing/selftests/proc/proc-self-syscall.c: remove duplicate include
proc: more robust bulk read test
proc: test /proc/*/maps, smaps, smaps_rollup, statm
proc: use seq_puts() everywhere
proc: read kernel cpu stat pointer once
proc: remove unused argument in proc_pid_lookup()
fs/proc/thread_self.c: code cleanup for proc_setup_thread_self()
fs/proc/self.c: code cleanup for proc_setup_self()
proc: return exit code 4 for skipped tests
mm,mremap: bail out earlier in mremap_to under map pressure
mm/sparse: fix a bad comparison
mm/memory.c: do_fault: avoid usage of stale vm_area_struct
writeback: fix inode cgroup switching comment
mm/huge_memory.c: fix "orig_pud" set but not used
mm/hotplug: fix an imbalance with DEBUG_PAGEALLOC
mm/memcontrol.c: fix bad line in comment
mm/cma.c: cma_declare_contiguous: correct err handling
mm/page_ext.c: fix an imbalance with kmemleak
mm/compaction: pass pgdat to too_many_isolated() instead of zone
mm: remove zone_lru_lock() function, access ->lru_lock directly
...
Linus Torvalds [Wed, 6 Mar 2019 18:22:26 +0000 (10:22 -0800)]
Merge tag 'armsoc-late' of git://git./linux/kernel/git/soc/soc
Pull ARM SoC late updates from Arnd Bergmann:
"Here are two branches that came relatively late during the linux-5.0
development cycle and have dependencies on the other branches:
- On the TI OMAP platform, the CPSW Ethernet PHY mode selection
driver is being replaced, this puts the final pieces in place
- On the DaVinci platform, the interrupt handling code in arch/arm
gets moved into a regular device driver in drivers/irqchip.
Since they both had some time in linux-next after the 5.0-rc8 release,
I'm sending them along with the other updates"
* tag 'armsoc-late' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (38 commits)
net: ethernet: ti: cpsw: deprecate cpsw-phy-sel driver
ARM: davinci: remove intc related fields from davinci_soc_info
irqchip: davinci-cp-intc: move the driver to drivers/irqchip
ARM: davinci: cp-intc: remove redundant comments
ARM: davinci: cp-intc: drop GPL license boilerplate
ARM: davinci: cp-intc: use readl/writel_relaxed()
ARM: davinci: cp-intc: unify error handling
ARM: davinci: cp-intc: improve coding style
ARM: davinci: cp-intc: request the memory region before remapping it
ARM: davinci: cp-intc: use the new-style config structure
ARM: davinci: cp-intc: convert all hex numbers to lowercase
ARM: davinci: cp-intc: use a common prefix for all symbols
ARM: davinci: cp-intc: add the new config structures for da8xx SoCs
irqchip: davinci-cp-intc: add a new config structure
ARM: davinci: cp-intc: add a wrapper around cp_intc_init()
ARM: davinci: cp-intc: remove cp_intc.h
irqchip: davinci-aintc: move the driver to drivers/irqchip
ARM: davinci: aintc: remove unnecessary includes
ARM: davinci: aintc: remove the timer-specific irq_set_handler()
ARM: davinci: aintc: request memory region before remapping it
...
Linus Torvalds [Wed, 6 Mar 2019 18:15:42 +0000 (10:15 -0800)]
Merge tag 'armsoc-newsoc' of git://git./linux/kernel/git/soc/soc
Pull ARM new SoC family support from Arnd Bergmann:
"Two new SoC families are added this time.
Sugaya Taichi submitted support for the Milbeaut SoC family from
Socionext and explains:
"SC2000 is a SoC of the Milbeaut series, equipped with a DSP
optimized for computer vision. It also features advanced
functionalities such as 360-degree, real-time spherical stitching
with multiple cameras, image stabilization without mechanical
gimbals, and rolling shutter correction. More detail is below:
https://www.socionext.com/en/products/assp/milbeaut/SC2000.html"
Interestingly, this one has a history dating back to older chips made
by Socionext and previously Matsushita/Panasonic based on their own
mn10300 CPU architecture that was removed from the kernel last year.
Manivannan Sadhasivam adds support for another SoC family, this is the
Bitmain BM1880 chip used in the Sophon Edge TPU developer board.
The chip is intended for Deep Learning applications, and comes with
dual-core Arm Cortex-A53 to run Linux as well as a RISC-V
microcontroller core to control the tensor unit. For the moment, the
TPU is not accessible in mainline Linux, so we treat it as a generic
Arm SoC.
More information is available at
https://www.sophon.ai/"
* tag 'armsoc-newsoc' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
ARM: multi_v7_defconfig: add ARCH_MILBEAUT and ARCH_MILBEAUT_M10V
ARM: configs: Add Milbeaut M10V defconfig
ARM: dts: milbeaut: Add device tree set for the Milbeaut M10V board
clocksource/drivers/timer-milbeaut: Introduce timer for Milbeaut SoCs
dt-bindings: timer: Add Milbeaut M10V timer description
ARM: milbeaut: Add basic support for Milbeaut m10v SoC
dt-bindings: Add documentation for Milbeaut SoCs
dt-bindings: arm: Add SMP enable-method for Milbeaut
dt-bindings: sram: milbeaut: Add binding for Milbeaut smp-sram
MAINTAINERS: Add entry for Bitmain SoC platform
arm64: dts: bitmain: Add Sophon Egde board support
arm64: dts: bitmain: Add BM1880 SoC support
arm64: Add ARCH_BITMAIN platform
dt-bindings: arm: Document Bitmain BM1880 SoC
Linus Torvalds [Wed, 6 Mar 2019 18:09:50 +0000 (10:09 -0800)]
Merge tag 'armsoc-defconfig' of git://git./linux/kernel/git/soc/soc
Pull ARM SoC defconfig updates from Arnd Bergmann:
"We regenerated the defconfig files for samsung, shmobile, lpc18xx,
lpc32xx, omap2, and nhk8815.
Lots of additional drivers added on samsung and nhk8815, as well as
the new pl110 driver on all machines that have it.
The remaining changes are mostly to enable newly added drivers, and in
case of imx8mq together with the SoC getting merged"
* tag 'armsoc-defconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (47 commits)
ARM: spear3xx_defconfig: Activate PL111 DRM driver
ARM: nhk8815_defconfig: Add new options
ARM: nhk8815_defconfig: Update defconfig
ARM: pxa: remove CONFIG_SND_PXA2XX_AC97 in pxa_defconfig
ARM: defconfig: integrator: Switch to DRM
arm64: defconfig: Add IMX2+ watchdog
arm64: defconfig: Enable PFUZE100 regulator
arm64: defconfig: enable NXP FlexSPI driver
arm64: defconfig: Add i.MX8MQ boot necessary configs
arm64: defconfig: add imx8qxp support
arm64: defconfig: add i.MX system controller RTC support
arm64: defconfig: Enable Tegra TCU
arm64: defconfig: Enable MAX8973 regulator
ARM: socfpga_defconfig: enable BLK_DEV_LOOP config option
ARM: defconfig: lpc32xx: enable DRM simple panel driver
ARM: defconfig: lpc32xx: enable fixed voltage regulator support
arm64: defconfig: Enable SUN6I Camera sensor interface
arm64: defconfig: Enable I2C_GPIO
ARM: omap2plus_defconfig: Update for moved options
ARM: omap2plus_defconfig: Update for dropped options
...
Linus Torvalds [Wed, 6 Mar 2019 17:41:12 +0000 (09:41 -0800)]
Merge tag 'armsoc-drivers' of git://git./linux/kernel/git/soc/soc
Pull ARM SoC driver updates from Arnd Bergmann:
"As usual, the drivers/tee and drivers/reset subsystems get merged
here, with the expected set of smaller updates and some new hardware
support. The tee subsystem now supports device drivers to be attached
to a tee, the first example here is a random number driver with its
implementation in the secure world.
Three new power domain drivers get added for specific chip families:
- Broadcom BCM283x chips (used in Raspberry Pi)
- Qualcomm Snapdragon phone chips
- Xilinx ZynqMP FPGA SoCs
One new driver is added to talk to the BPMP firmware on NVIDIA
Tegra210
Existing drivers are extended for new SoC variants from NXP, NVIDIA,
Amlogic and Qualcomm"
* tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (113 commits)
tee: optee: update optee_msg.h and optee_smc.h to dual license
tee: add cancellation support to client interface
dpaa2-eth: configure the cache stashing amount on a queue
soc: fsl: dpio: configure cache stashing destination
soc: fsl: dpio: enable frame data cache stashing per software portal
soc: fsl: guts: make fsl_guts_get_svr() static
hwrng: make symbol 'optee_rng_id_table' static
tee: optee: Fix unsigned comparison with less than zero
hwrng: Fix unsigned comparison with less than zero
tee: fix possible error pointer ctx dereferencing
hwrng: optee: Initialize some structs using memset instead of braces
tee: optee: Initialize some structs using memset instead of braces
soc: fsl: dpio: fix memory leak of a struct qbman on error exit path
clk: tegra: dfll: Make symbol 'tegra210_cpu_cvb_tables' static
soc: qcom: llcc-slice: Fix typos
qcom: soc: llcc-slice: Consolidate some code
qcom: soc: llcc-slice: Clear the global drv_data pointer on error
drivers: soc: xilinx: Add ZynqMP power domain driver
firmware: xilinx: Add APIs to control node status/power
dt-bindings: power: Add ZynqMP power domain bindings
...
Linus Torvalds [Wed, 6 Mar 2019 17:36:37 +0000 (09:36 -0800)]
Merge tag 'armsoc-dt' of git://git./linux/kernel/git/soc/soc
Pull ARM SoC device tree updates from Arnd Bergmann:
"This is a smaller update than the past few times, but with just over
500 non-merge changesets still dwarfs the rest of the SoC tree.
Three new SoC platforms get added, each one a follow-up to an existing
product, and added here in combination with a reference platform:
- Renesas RZ/A2M (R7S9210) 32-bit Cortex-A9 Real-time imaging
processor:
https://www.renesas.com/eu/en/products/microcontrollers-microprocessors/rz/rza/rza2m.html
- Renesas RZ/G2E (r8a774c0) 64-bit Cortex-A53 SoC "for Rich Graphics
Applications":
https://www.renesas.com/eu/en/products/microcontrollers-microprocessors/rz/rzg/rzg2e.html
- NXP i.MX8QuadXPlus 64-bit Cortex-A35 SoC:
https://www.nxp.com/products/processors-and-microcontrollers/arm-based-processors-and-mcus/i.mx-applications-processors/i.mx-8-processors/i.mx-8x-family-arm-cortex-a35-3d-graphics-4k-video-dsp-error-correcting-code-on-ddr:i.MX8X
These are actual commercial products we now support with an in-kernel
device tree source file:
- Bosch Guardian is a product made by Bosch Power Tools GmbH, based
on the Texas Instruments AM335x chip
- Winterland IceBoard is a Texas Instruments AM3874 based machine
used in telescopes at the south pole and elsewhere, see commit
d031773169df2 for some pointers:
- Inspur on5263m5 is an x86 server platform with an Aspeed ast2500
baseboard management controller. This is for running on the BMC.
- Zodiac Digital Tapping Unit, apparently a kind of ethernet switch
used in airplanes.
- Phicomm K3 is a WiFi router based on Broadcom bcm47094
- Methode Electronics uDPU FTTdp distribution point unit
- X96 Max, a generic TV box based on Amlogic G12a (S905X2)
- NVIDIA Shield TV (Darcy) based on Tegra210
And then there are several new SBC, evaluation, development or modular
systems that we add:
- Three new Rockchips rk3399 based boards:
- FriendlyElec NanoPC-T4 and NanoPi M4
- Radxa ROCK Pi 4
- Five new i.MX6 family SoM modules and boards for industrial
products:
- Logic PD i.MX6QD SoM and evaluation baseboard
- Y Soft IOTA Draco/Hydra/Ursa family boards based on i.MX6DL
- Phytec phyCORE i.MX6 UltraLite SoM and evaluation module
- MYIR Tech MYD-LPC4357 development board based on the NXP lpc4357
microcontroller
- Chameleon96, an Intel/Altera Cyclone5 based FPGA development system
in 96boards form factor
- Arm Fixed Virtual Platforms (FVP) Base RevC, a purely virtual
platform corresponding to the latest "fast model"
- Another Raspberry Pi variant: Model 3 A+, supported both in 32-bit
and 64-bit mode.
- Oxalis Evalkit V100 based on NXP Layerscape LS1012a, in 96Boards
enterprise form factor
- Elgin RV1108 R1 development board based on 32-bit Rockchips RV1108
For already supported boards and SoCs, we often add support for new
devices after merging the drivers. This time, the largest changes
include updates for
- STMicroelectronics stm32mp1, which was formally launched last
week
- Qualcomm Snapdragon 845, a high-end phone and low-end laptop chip
- Action Semi S700
- TI AM654x, their recently merged 64-bit SoC from the OMAP family
- Various Amlogic Meson SoCs
- Mediatek MT2712
- NVIDIA Tegra186 and Tegra210
- The ancient NXP lpc32xx family
- Samsung s5pv210, used in some older mobile phones
Many other chips see smaller updates and bugfixes beyond that"
* tag 'armsoc-dt' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (506 commits)
ARM: dts: exynos: Fix max voltage for buck8 regulator on Odroid XU3/XU4
dt-bindings: net: ti: deprecate cpsw-phy-sel bindings
ARM: dts: am335x: switch to use phy-gmii-sel
ARM: dts: am4372: switch to use phy-gmii-sel
ARM: dts: dm814x: switch to use phy-gmii-sel
ARM: dts: dra7: switch to use phy-gmii-sel
arch: arm: dts: kirkwood-rd88f6281: Remove disabled marvell,dsa reference
ARM: dts: exynos: Add support for secondary DAI to Odroid XU4
ARM: dts: exynos: Add support for secondary DAI to Odroid XU3
ARM: dts: exynos: Disable ARM PMU on Odroid XU3-lite
ARM: dts: exynos: Add stdout path property to Arndale board
ARM: dts: exynos: Add minimal clkout parameters to Exynos3250 PMU
ARM: dts: exynos: Enable ADC on Odroid HC1
arm64: dts: sprd: Remove wildcard compatible string
arm64: dts: sprd: Add SC27XX fuel gauge device
arm64: dts: sprd: Add SC2731 charger device
arm64: dts: sprd: Add ADC calibration support
arm64: dts: sprd: Remove PMIC INTC irq trigger type
arm64: dts: rockchip: Enable tsadc device on rock960
ARM: dts: rockchip: add chosen node on veyron devices
...
Linus Torvalds [Wed, 6 Mar 2019 17:33:05 +0000 (09:33 -0800)]
Merge tag 'armsoc-soc' of git://git./linux/kernel/git/soc/soc
Pull ARM SoC platform updates from Arnd Bergmann:
"The APM X-Gene platform is now maintained by folks from Ampere
Computing who took over the product line a while ago; this gets
reflected in the MAINTAINERS file.
Cleanups continue on the older mach-davinci and mach-pxa platforms, to
get them to be more like the modern ones. For pxa, we now remove the
Raumfeld platform code as it now works with device tree based booting.
i.MX adds a couple new features for the i.MX7ULP SoC
Mediatek gains support for a new SoC: MT7629 is a new wireless router
platform, following MT7623.
Aside from those, there are the usual minor cleanups and bugfixes
across several platforms"
* tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (49 commits)
MAINTAINERS: Update Ampere email address
usb: ohci-da8xx: remove unused callbacks from platform data
ARM: davinci: da830-evm: remove legacy usb helpers
ARM: davinci: omapl138-hawk: remove legacy usb helpers
usb: ohci-da8xx: add vbus and overcurrent gpios
ARM: davinci: da830-evm: use gpio lookup entries for usb gpios
ARM: davinci: omapl138-hawk: use gpio lookup entries for usb gpios
usb: ohci-da8xx: add a helper pointer to &pdev->dev
usb: ohci-da8xx: add a new line after local variables
arm64: meson: enable g12a clock controller
MAINTAINERS: Add entry for uDPU board
ARM: davinci: da850-evm: use GPIO hogs instead of the legacy API
arm: mediatek: add MT7629 smp bring up code
Revert "ARM: mediatek: add MT7623a smp bringup code"
dt-bindings: soc: fix typo of MT8173 power dt-bindings
ARM: meson: remove COMMON_CLK_AMLOGIC selection
arm64: meson: remove COMMON_CLK_AMLOGIC selection
ARM: lpc32xx: remove platform data of ARM PL111 LCD controller
ARM: lpc32xx: remove platform data of ARM PL180 SD/MMC controller
ARM: lpc32xx: Use kmemdup to replace duplicating its implementation
...
Linus Torvalds [Wed, 6 Mar 2019 17:18:43 +0000 (09:18 -0800)]
Merge tag 'asm-generic-5.1' of git://git./linux/kernel/git/arnd/asm-generic
Pull asm-generic updates from Arnd Bergmann:
"Only a few small changes this time:
- Michael S. Tsirkin cleans up linux/mman.h
- Mike Rapoport found a typo
I had originally merged another cleanup series for I/O accessors from
Hugo Lefeuvre as well, but dropped it after the discussion of the
barrier semantics and some conflicts. I expect this series to get
merged for a later release though"
* tag 'asm-generic-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
asm-generic/page.h: fix typo in #error text requiring a real asm/page.h
arch: move common mmap flags to linux/mman.h
drm: tweak header name
x86/mpx: tweak header name
Linus Torvalds [Wed, 6 Mar 2019 17:07:08 +0000 (09:07 -0800)]
Merge tag 'y2038-fix' of git://git./linux/kernel/git/arnd/playground
Pull y2038 build fix for compat mode from Arnd Bergmann:
"Here is one more patch on top of the y2038 changes already pulled for
linux-5.1; for some reason this had escaped all testing"
* tag 'y2038-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground:
ipc: Fix building compat mode without sysvipc
Linus Torvalds [Wed, 6 Mar 2019 16:45:46 +0000 (08:45 -0800)]
Merge branch 'x86-alternatives-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 alternative instruction updates from Ingo Molnar:
"Small RDTSCP optimization, enabled by the newly added ALTERNATIVE_3(),
and other small improvements"
* 'x86-alternatives-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/TSC: Use RDTSCP
x86/alternatives: Add an ALTERNATIVE_3() macro
x86/alternatives: Print containing function
x86/alternatives: Add macro comments
Linus Torvalds [Wed, 6 Mar 2019 16:14:05 +0000 (08:14 -0800)]
Merge branch 'sched-core-for-linus' of git://git./linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
"The main changes in this cycle were:
- refcount conversions
- Solve the rq->leaf_cfs_rq_list can of worms for real.
- improve power-aware scheduling
- add sysctl knob for Energy Aware Scheduling
- documentation updates
- misc other changes"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits)
kthread: Do not use TIMER_IRQSAFE
kthread: Convert worker lock to raw spinlock
sched/fair: Use non-atomic cpumask_{set,clear}_cpu()
sched/fair: Remove unused 'sd' parameter from select_idle_smt()
sched/wait: Use freezable_schedule() when possible
sched/fair: Prune, fix and simplify the nohz_balancer_kick() comment block
sched/fair: Explain LLC nohz kick condition
sched/fair: Simplify nohz_balancer_kick()
sched/topology: Fix percpu data types in struct sd_data & struct s_data
sched/fair: Simplify post_init_entity_util_avg() by calling it with a task_struct pointer argument
sched/fair: Fix O(nr_cgroups) in the load balancing path
sched/fair: Optimize update_blocked_averages()
sched/fair: Fix insertion in rq->leaf_cfs_rq_list
sched/fair: Add tmp_alone_branch assertion
sched/core: Use READ_ONCE()/WRITE_ONCE() in move_queued_task()/task_rq_lock()
sched/debug: Initialize sd_sysctl_cpus if !CONFIG_CPUMASK_OFFSTACK
sched/pelt: Skip updating util_est when utilization is higher than CPU's capacity
sched/fair: Update scale invariance of PELT
sched/fair: Move the rq_of() helper function
sched/core: Convert task_struct.stack_refcount to refcount_t
...
Linus Torvalds [Wed, 6 Mar 2019 15:59:36 +0000 (07:59 -0800)]
Merge branch 'perf-core-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
"Lots of tooling updates - too many to list, here's a few highlights:
- Various subcommand updates to 'perf trace', 'perf report', 'perf
record', 'perf annotate', 'perf script', 'perf test', etc.
- CPU and NUMA topology and affinity handling improvements,
- HW tracing and HW support updates:
- Intel PT updates
- ARM CoreSight updates
- vendor HW event updates
- BPF updates
- Tons of infrastructure updates, both on the build system and the
library support side
- Documentation updates.
- ... and lots of other changes, see the changelog for details.
Kernel side updates:
- Tighten up kprobes blacklist handling, reduce the number of places
where developers can install a kprobe and hang/crash the system.
- Fix/enhance vma address filter handling.
- Various PMU driver updates, small fixes and additions.
- refcount_t conversions
- BPF updates
- error code propagation enhancements
- misc other changes"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (238 commits)
perf script python: Add Python3 support to syscall-counts-by-pid.py
perf script python: Add Python3 support to syscall-counts.py
perf script python: Add Python3 support to stat-cpi.py
perf script python: Add Python3 support to stackcollapse.py
perf script python: Add Python3 support to sctop.py
perf script python: Add Python3 support to powerpc-hcalls.py
perf script python: Add Python3 support to net_dropmonitor.py
perf script python: Add Python3 support to mem-phys-addr.py
perf script python: Add Python3 support to failed-syscalls-by-pid.py
perf script python: Add Python3 support to netdev-times.py
perf tools: Add perf_exe() helper to find perf binary
perf script: Handle missing fields with -F +..
perf data: Add perf_data__open_dir_data function
perf data: Add perf_data__(create_dir|close_dir) functions
perf data: Fail check_backup in case of error
perf data: Make check_backup work over directories
perf tools: Add rm_rf_perf_data function
perf tools: Add pattern name checking to rm_rf
perf tools: Add depth checking to rm_rf
perf data: Add global path holder
...
Linus Torvalds [Wed, 6 Mar 2019 15:17:17 +0000 (07:17 -0800)]
Merge branch 'locking-core-for-linus' of git://git./linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
"The biggest part of this tree is the new auto-generated atomics API
wrappers by Mark Rutland.
The primary motivation was to allow instrumentation without uglifying
the primary source code.
The linecount increase comes from adding the auto-generated files to
the Git space as well:
include/asm-generic/atomic-instrumented.h | 1689 ++++++++++++++++--
include/asm-generic/atomic-long.h | 1174 ++++++++++---
include/linux/atomic-fallback.h | 2295 +++++++++++++++++++++++++
include/linux/atomic.h | 1241 +------------
I preferred this approach, so that the full call stack of the (already
complex) locking APIs is still fully visible in 'git grep'.
But if this is excessive we could certainly hide them.
There's a separate build-time mechanism to determine whether the
headers are out of date (they should never be stale if we do our job
right).
Anyway, nothing from this should be visible to regular kernel
developers.
Other changes:
- Add support for dynamic keys, which removes a source of false
positives in the workqueue code, among other things (Bart Van
Assche)
- Updates to tools/memory-model (Andrea Parri, Paul E. McKenney)
- qspinlock, wake_q and lockdep micro-optimizations (Waiman Long)
- misc other updates and enhancements"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (48 commits)
locking/lockdep: Shrink struct lock_class_key
locking/lockdep: Add module_param to enable consistency checks
lockdep/lib/tests: Test dynamic key registration
lockdep/lib/tests: Fix run_tests.sh
kernel/workqueue: Use dynamic lockdep keys for workqueues
locking/lockdep: Add support for dynamic keys
locking/lockdep: Verify whether lock objects are small enough to be used as class keys
locking/lockdep: Check data structure consistency
locking/lockdep: Reuse lock chains that have been freed
locking/lockdep: Fix a comment in add_chain_cache()
locking/lockdep: Introduce lockdep_next_lockchain() and lock_chain_count()
locking/lockdep: Reuse list entries that are no longer in use
locking/lockdep: Free lock classes that are no longer in use
locking/lockdep: Update two outdated comments
locking/lockdep: Make it easy to detect whether or not inside a selftest
locking/lockdep: Split lockdep_free_key_range() and lockdep_reset_lock()
locking/lockdep: Initialize the locks_before and locks_after lists earlier
locking/lockdep: Make zap_class() remove all matching lock order entries
locking/lockdep: Reorder struct lock_class members
locking/lockdep: Avoid that add_chain_cache() adds an invalid chain to the cache
...
Linus Torvalds [Wed, 6 Mar 2019 15:13:56 +0000 (07:13 -0800)]
Merge branch 'efi-core-for-linus' of git://git./linux/kernel/git/tip/tip
Pull EFI updates from Ingo Molnar:
"The main EFI changes in this cycle were:
- Use 32-bit alignment for efi_guid_t
- Allow the SetVirtualAddressMap() call to be omitted
- Implement earlycon=efifb based on existing earlyprintk code
- Various minor fixes and code cleanups from Sai, Ard and me"
* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi: Fix build error due to enum collision between efi.h and ima.h
efi/x86: Convert x86 EFI earlyprintk into generic earlycon implementation
x86: Make ARCH_USE_MEMREMAP_PROT a generic Kconfig symbol
efi/arm/arm64: Allow SetVirtualAddressMap() to be omitted
efi: Replace GPL license boilerplate with SPDX headers
efi/fdt: Apply more cleanups
efi: Use 32-bit alignment for efi_guid_t
efi/memattr: Don't bail on zero VA if it equals the region's PA
x86/efi: Mark can_free_region() as an __init function
Arnd Bergmann [Thu, 28 Feb 2019 14:22:53 +0000 (15:22 +0100)]
ipc: Fix building compat mode without sysvipc
As John Stultz noticed, my y2038 syscall series caused a link
failure when CONFIG_SYSVIPC is disabled but CONFIG_COMPAT is
enabled:
arch/arm64/kernel/sys32.o:(.rodata+0x960): undefined reference to `__arm64_compat_sys_old_semctl'
arch/arm64/kernel/sys32.o:(.rodata+0x980): undefined reference to `__arm64_compat_sys_old_msgctl'
arch/arm64/kernel/sys32.o:(.rodata+0x9a0): undefined reference to `__arm64_compat_sys_old_shmctl'
Add the missing entries in kernel/sys_ni.c for the new system
calls.
Cc: Laura Abbott <labbott@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Souptick Joarder [Tue, 5 Mar 2019 23:50:45 +0000 (15:50 -0800)]
tools/testing/selftests/proc/proc-self-syscall.c: remove duplicate include
Remove a header which is included twice.
Link: http://lkml.kernel.org/r/20190304182719.GA6606@jordon-HP-15-Notebook-PC
Signed-off-by: Sabyasachi Gupta <sabyasachi.linux@gmail.com>
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Tue, 5 Mar 2019 23:50:42 +0000 (15:50 -0800)]
proc: more robust bulk read test
/proc may not be mounted, in which case the test will still exit
successfully. Ensure proc is mounted at /proc.
Link: http://lkml.kernel.org/r/20190209105613.GA10384@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Tue, 5 Mar 2019 23:50:39 +0000 (15:50 -0800)]
proc: test /proc/*/maps, smaps, smaps_rollup, statm
Start testing VM-related fields found in per-process files.
Do it by JITing a small executable which brings its address space to a
precisely known state, then comparing /proc/*/maps, smaps, smaps_rollup,
and statm files to expected values.
Currently only x86_64 is supported.
[adobriyan@gmail.com: exit correctly in /proc/*/maps test]
Link: http://lkml.kernel.org/r/20190206073659.GB15311@avx2
Link: http://lkml.kernel.org/r/20190203165806.GA14568@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Tue, 5 Mar 2019 23:50:35 +0000 (15:50 -0800)]
proc: use seq_puts() everywhere
seq_printf() without format specifiers == faster seq_puts()
Link: http://lkml.kernel.org/r/20190114200545.GC9680@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Tue, 5 Mar 2019 23:50:32 +0000 (15:50 -0800)]
proc: read kernel cpu stat pointer once
Help gcc generate better code:
$ ./scripts/bloat-o-meter ../vmlinux-000 ../vmlinux-001
add/remove: 2/2 grow/shrink: 0/1 up/down: 92/-142 (-50)
Function old new delta
get_iowait_time.isra - 46 +46
get_idle_time.isra - 46 +46
show_stat 1489 1477 -12
get_iowait_time 65 - -65
get_idle_time 65 - -65
Link: http://lkml.kernel.org/r/20190114195907.GA9680@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Zhikang Zhang [Tue, 5 Mar 2019 23:50:29 +0000 (15:50 -0800)]
proc: remove unused argument in proc_pid_lookup()
[adobriyan@gmail.com: delete "extern" from prototype]
Link: http://lkml.kernel.org/r/20190114195635.GA9372@avx2
Signed-off-by: Zhikang Zhang <zhangzhikang1@huawei.com>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Chengguang Xu [Tue, 5 Mar 2019 23:50:25 +0000 (15:50 -0800)]
fs/proc/thread_self.c: code cleanup for proc_setup_thread_self()
Remove unnecessary ERR_PTR()/PTR_ERR() cast in proc_setup_thread_self().
Link: http://lkml.kernel.org/r/20190124030150.8472-2-cgxu519@gmx.com
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Chengguang Xu [Tue, 5 Mar 2019 23:50:22 +0000 (15:50 -0800)]
fs/proc/self.c: code cleanup for proc_setup_self()
Remove unnecessary ERR_PTR()/PTR_ERR() cast in proc_setup_self().
Link: http://lkml.kernel.org/r/20190124030150.8472-1-cgxu519@gmx.com
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Tue, 5 Mar 2019 23:50:18 +0000 (15:50 -0800)]
proc: return exit code 4 for skipped tests
Test harness uses 4 for SKIP, not 2.
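As an illustration only (not code from this patch), the convention can be
sketched as follows. The helper and macro names are hypothetical; the only
value taken from the text above is that 4 means SKIP.

```c
#include <assert.h>

/* Mirrors the SKIP value named in the commit message; the name is made up. */
#define SKETCH_EXIT_SKIP 4

/* Hypothetical helper: map a test outcome to a process exit code. */
int sketch_exit_code(int precondition_met, int checks_passed)
{
	if (!precondition_met)
		return SKETCH_EXIT_SKIP;	/* skipped, not failed */
	return checks_passed ? 0 : 1;
}

/* Small self-check showing the three possible outcomes. */
void sketch_exit_code_demo(void)
{
	assert(sketch_exit_code(0, 0) == SKETCH_EXIT_SKIP);
	assert(sketch_exit_code(1, 1) == 0);
	assert(sketch_exit_code(1, 0) == 1);
}
```

A test would return sketch_exit_code(...) from main() so that a missing
precondition is reported as a skip rather than as a failure.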
Link: http://lkml.kernel.org/r/20190108193108.GA12259@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oscar Salvador [Tue, 5 Mar 2019 23:50:14 +0000 (15:50 -0800)]
mm,mremap: bail out earlier in mremap_to under map pressure
When the mremap() syscall is used with the MREMAP_FIXED flag, mremap()
calls mremap_to() which does the following:
1) unmaps the destination region where we are going to move the map
2) If the new region is going to be smaller, we unmap the last part
of the old region
Then, we will eventually call move_vma() to do the actual move.
move_vma() checks whether we are at least 4 maps below max_map_count
before going further, otherwise it bails out with -ENOMEM. The problem
is that we might have already unmapped the vma's in steps 1) and 2), so
it is not possible for userspace to figure out the state of the vmas
after it gets -ENOMEM, and it gets tricky for userspace to clean up
properly on error path.
While it is true that we can return -ENOMEM for more reasons (e.g: see
may_expand_vm() or move_page_tables()), I think that we can avoid this
scenario if we check early in mremap_to() if the operation has high
chances to succeed map-wise.
Should that not be the case, we can bail out before we even try to unmap
anything, so we make sure the vma's are left untouched in case we are
likely to be short of maps.
The rule of thumb now is to rely on the worst-case scenario we can have.
That is when both vma's (old region and new region) are going to be
split in 3, so we get two more maps on top of the ones we already hold
(one per each). If current map count + 2 maps still leaves us 4 maps below
the threshold, we are going to pass the check in move_vma().
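The arithmetic above can be sketched in userspace C. This is a minimal
sketch under stated assumptions, not the kernel code: the function names
are hypothetical, and the exact comparison used in mm/mremap.c may be
written differently. It only encodes "at least 4 maps below the limit"
and "assume 2 extra maps in the worst case".

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical stand-in for the check described for move_vma():
 * succeed only if map_count is at least 4 below max_map_count.
 */
int sketch_move_vma_check(int map_count, int max_map_count)
{
	return (map_count >= max_map_count - 3) ? -ENOMEM : 0;
}

/*
 * Hypothetical early check for mremap_to(): assume the worst case,
 * where splitting the old and new regions adds two maps, and bail
 * out before anything has been unmapped.
 */
int sketch_mremap_to_early_check(int map_count, int max_map_count)
{
	return sketch_move_vma_check(map_count + 2, max_map_count);
}

/* Self-check with a made-up limit of 10 maps. */
void sketch_mremap_demo(void)
{
	assert(sketch_move_vma_check(6, 10) == 0);
	assert(sketch_move_vma_check(7, 10) == -ENOMEM);
	assert(sketch_mremap_to_early_check(4, 10) == 0);
	assert(sketch_mremap_to_early_check(5, 10) == -ENOMEM);
}
```

With a limit of 10, a map count of 5 or 6 would pass the later check on
its own but is rejected early, which is the conservative trade-off the
message describes.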
Of course, this is not free, as it might generate false positives when
it is true that we are tight map-wise, but the unmap operation can
release several vma's leading us to a good state.
Another approach was also investigated [1], but it may be too much
hassle for what it brings.
[1] https://lore.kernel.org/lkml/20190219155320.tkfkwvqk53tfdojt@d104.suse.de/
Link: http://lkml.kernel.org/r/20190226091314.18446-1-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Cyril Hrubis <chrubis@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Qian Cai [Tue, 5 Mar 2019 23:50:11 +0000 (15:50 -0800)]
mm/sparse: fix a bad comparison
next_present_section_nr() could only return an unsigned number -1, so
just check it specifically where compilers will convert -1 to unsigned
if needed.
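A userspace sketch of the pattern, assuming a made-up presence table and
hypothetical helper names (this is not the mm/sparse.c code): an unsigned
iterator can never fail a ">= 0" test, so the loop has to compare against
the -1 sentinel explicitly.

```c
#include <assert.h>

#define SKETCH_NR_SECTIONS 8UL	/* made-up table size */

/* Made-up presence map: sections 1, 4 and 5 are "present". */
static const unsigned char sketch_present[SKETCH_NR_SECTIONS] = {
	0, 1, 0, 0, 1, 1, 0, 0
};

/*
 * Return the next present section after nr, or (unsigned long)-1 when
 * there is none. Passing -1 starts the search at section 0, because
 * the increment wraps around to zero.
 */
unsigned long sketch_next_present(unsigned long nr)
{
	while (++nr < SKETCH_NR_SECTIONS)
		if (sketch_present[nr])
			return nr;
	return -1;
}

/* Count present sections by comparing against the sentinel. */
unsigned long sketch_count_present(void)
{
	unsigned long nr, n = 0;

	/* "nr >= 0" would always be true here; test for -1 instead. */
	for (nr = sketch_next_present(-1); nr != -1;
	     nr = sketch_next_present(nr))
		n++;
	return n;
}

void sketch_sparse_demo(void)
{
	assert(sketch_next_present(1) == 4);
	assert(sketch_next_present(5) == (unsigned long)-1);
	assert(sketch_count_present() == 3);
}
```

In the comparison "nr != -1" the int constant is converted to unsigned
long, which is the implicit conversion the message relies on.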
mm/sparse.c: In function 'sparse_init_nid':
mm/sparse.c:200:20: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
((section_nr >= 0) && \
^~
mm/sparse.c:478:2: note: in expansion of macro 'for_each_present_section_nr'
for_each_present_section_nr(pnum_begin, pnum) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~
mm/sparse.c:200:20: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
((section_nr >= 0) && \
^~
mm/sparse.c:497:2: note: in expansion of macro 'for_each_present_section_nr'
for_each_present_section_nr(pnum_begin, pnum) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~
mm/sparse.c: In function 'sparse_init':
mm/sparse.c:200:20: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
((section_nr >= 0) && \
^~
mm/sparse.c:520:2: note: in expansion of macro 'for_each_present_section_nr'
for_each_present_section_nr(pnum_begin + 1, pnum_end) {
^~~~~~~~~~~~~~~~~~~~~~~~~~~
Link: http://lkml.kernel.org/r/20190228181839.86504-1-cai@lca.pw
Fixes: c4e1be9ec113 ("mm, sparsemem: break out of loops early")
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Stancek [Tue, 5 Mar 2019 23:50:08 +0000 (15:50 -0800)]
mm/memory.c: do_fault: avoid usage of stale vm_area_struct
LTP testcase mtest06 [1] can trigger a crash on s390x running 5.0.0-rc8.
This is a stress test, where one thread mmaps/writes/munmaps a memory area
and another thread tries to read from it:
CPU: 0 PID: 2611 Comm: mmap1 Not tainted 5.0.0-rc8+ #51
Hardware name: IBM 2964 N63 400 (z/VM 6.4.0)
Krnl PSW : 0404e00180000000 00000000001ac8d8 (__lock_acquire+0x7/0x7a8)
Call Trace:
([<0000000000000000>] (null))
 [<00000000001adae4>] lock_acquire+0xec/0x258
 [<000000000080d1ac>] _raw_spin_lock_bh+0x5c/0x98
 [<000000000012a780>] page_table_free+0x48/0x1a8
 [<00000000002f6e54>] do_fault+0xdc/0x670
 [<00000000002fadae>] __handle_mm_fault+0x416/0x5f0
 [<00000000002fb138>] handle_mm_fault+0x1b0/0x320
 [<00000000001248cc>] do_dat_exception+0x19c/0x2c8
 [<000000000080e5ee>] pgm_check_handler+0x19e/0x200
page_table_free() is called with NULL mm parameter, but because "0" is a
valid address on s390 (see S390_lowcore), it keeps going until it
eventually crashes in lockdep's lock_acquire. This crash is
reproducible at least since 4.14.
The problem is that "vmf->vma" used in do_fault() can become stale. Because
mmap_sem may be released, other threads can come in, call munmap() and
cause "vma" to be returned to the kmem cache, where it gets
zeroed/re-initialized and re-used:
handle_mm_fault                             |
  __handle_mm_fault                         |
    do_fault                                |
      vma = vmf->vma                        |
      do_read_fault                         |
        __do_fault                          |
          vma->vm_ops->fault(vmf);          |
            mmap_sem is released            |
                                            |
                                            | do_munmap()
                                            |   remove_vma_list()
                                            |     remove_vma()
                                            |       vm_area_free()
                                            |         # vma is released
                                            | ...
                                            | # same vma is allocated
                                            | # from kmem cache
                                            | do_mmap()
                                            |   vm_area_alloc()
                                            |     memset(vma, 0, ...)
                                            |
      pte_free(vma->vm_mm, ...);            |
        page_table_free                     |
          spin_lock_bh(&mm->context.lock);  |
            <crash>                         |
Cache mm_struct to avoid using potentially stale "vma".
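The idea can be sketched with toy userspace types. Everything below is
hypothetical illustration rather than the patched kernel code: the
"fault" callback simply zeroes the vma to mimic it being freed and
recycled while the lock is dropped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_mm { int id; };
struct toy_vma { struct toy_mm *vm_mm; };

/* Mimics ->fault() dropping the lock and the vma being recycled. */
void toy_fault_recycles_vma(struct toy_vma *vma)
{
	memset(vma, 0, sizeof(*vma));
}

/* Buggy pattern: dereference the vma again after the callback. */
struct toy_mm *toy_mm_after_fault_stale(struct toy_vma *vma)
{
	toy_fault_recycles_vma(vma);
	return vma->vm_mm;		/* reads the recycled contents */
}

/* Fixed pattern: cache the mm pointer before calling out. */
struct toy_mm *toy_mm_after_fault_cached(struct toy_vma *vma)
{
	struct toy_mm *mm = vma->vm_mm;	/* taken while vma is still valid */

	toy_fault_recycles_vma(vma);
	return mm;
}

void toy_stale_vma_demo(void)
{
	struct toy_mm mm = { .id = 1 };
	struct toy_vma a = { .vm_mm = &mm };
	struct toy_vma b = { .vm_mm = &mm };

	assert(toy_mm_after_fault_stale(&a) == NULL);
	assert(toy_mm_after_fault_cached(&b) == &mm);
}
```

The stale variant ends up with a NULL mm, matching the NULL mm parameter
seen in page_table_free() above, while the cached variant keeps the
pointer that was valid before the callback ran.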
[1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/mem/mtest06/mmap1.c
Link: http://lkml.kernel.org/r/5b3fdf19e2a5be460a384b936f5b56e13733f1b8.1551595137.git.jstancek@redhat.com
Signed-off-by: Jan Stancek <jstancek@redhat.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Rafael Aquini <aquini@redhat.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Greg Thelen [Tue, 5 Mar 2019 23:50:03 +0000 (15:50 -0800)]
writeback: fix inode cgroup switching comment
Commit 682aa8e1a6a1 ("writeback: implement unlocked_inode_to_wb
transaction and use it for stat updates") refers to
inode_switch_wb_work_fn() which never got merged.
Switch the comments to inode_switch_wbs_work_fn().
Link: http://lkml.kernel.org/r/20190305004617.142590-1-gthelen@google.com
Fixes: 682aa8e1a6a1 ("writeback: implement unlocked_inode_to_wb transaction and use it for stat updates")
Signed-off-by: Greg Thelen <gthelen@google.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Qian Cai [Tue, 5 Mar 2019 23:50:00 +0000 (15:50 -0800)]
mm/huge_memory.c: fix "orig_pud" set but not used
Commit a00cc7d9dd93 ("mm, x86: add support for PUD-sized transparent
hugepages") introduced pudp_huge_get_and_clear_full() but no one uses
its return code.
In order to not diverge from pmdp_huge_get_and_clear_full(), just change
zap_huge_pud() to not assign the return value from
pudp_huge_get_and_clear_full().
mm/huge_memory.c: In function 'zap_huge_pud':
mm/huge_memory.c:1982:8: warning: variable 'orig_pud' set but not used [-Wunused-but-set-variable]
pud_t orig_pud;
^~~~~~~~
Link: http://lkml.kernel.org/r/20190301221956.97493-1-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Qian Cai [Tue, 5 Mar 2019 23:49:57 +0000 (15:49 -0800)]
mm/hotplug: fix an imbalance with DEBUG_PAGEALLOC
When onlining a memory block with DEBUG_PAGEALLOC, it unmaps the pages
in the block from the kernel. However, it does not map those pages while
offlining at the beginning. As a result, it triggers the panic below
while onlining on ppc64le, as it checks whether the pages are mapped
before unmapping. However, the imbalance exists for all arches where
double-unmappings could happen. Therefore, let the kernel map those pages
in generic_online_page() before they are freed into the page
allocator for the first time, where it will set the page count to one.
On the other hand, it works fine during boot, because at least for
IBM POWER8, it does,
early_setup
early_init_mmu
hash__early_init_mmu
htab_initialize [1]
htab_bolt_mapping [2]
where it effectively maps all memblock regions just like
kernel_map_linear_page(), so later mem_init() -> memblock_free_all()
will unmap them just fine without any imbalance. On other arches
without this imbalance checking, it still unmaps them at most once.
[1]
for_each_memblock(memory, reg) {
base = (unsigned long)__va(reg->base);
size = reg->size;
DBG("creating mapping for region: %lx..%lx (prot: %lx)\n",
base, size, prot);
BUG_ON(htab_bolt_mapping(base, base + size, __pa(base),
prot, mmu_linear_psize, mmu_kernel_ssize));
}
[2] linear_map_hash_slots[paddr >> PAGE_SHIFT] = ret | 0x80;
kernel BUG at arch/powerpc/mm/hash_utils_64.c:1815!
Oops: Exception in kernel mode, sig: 5 [#1]
LE SMP NR_CPUS=256 DEBUG_PAGEALLOC NUMA pSeries
CPU: 2 PID: 4298 Comm: bash Not tainted 5.0.0-rc7+ #15
NIP:  c000000000062670 LR: c00000000006265c CTR: 0000000000000000
REGS: c0000005bf8a75b0 TRAP: 0700 Not tainted (5.0.0-rc7+)
MSR:  800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 28422842 XER: 00000000
CFAR: c000000000804f44 IRQMASK: 1
NIP [c000000000062670] __kernel_map_pages+0x2e0/0x4f0
LR [c00000000006265c] __kernel_map_pages+0x2cc/0x4f0
Call Trace:
__kernel_map_pages+0x2cc/0x4f0
free_unref_page_prepare+0x2f0/0x4d0
free_unref_page+0x44/0x90
__online_page_free+0x84/0x110
online_pages_range+0xc0/0x150
walk_system_ram_range+0xc8/0x120
online_pages+0x280/0x5a0
memory_subsys_online+0x1b4/0x270
device_online+0xc0/0xf0
state_store+0xc0/0x180
dev_attr_store+0x3c/0x60
sysfs_kf_write+0x70/0xb0
kernfs_fop_write+0x10c/0x250
__vfs_write+0x48/0x240
vfs_write+0xd8/0x210
ksys_write+0x70/0x120
system_call+0x5c/0x70
Link: http://lkml.kernel.org/r/20190301220814.97339-1-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au> [powerpc]
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Qian Cai [Tue, 5 Mar 2019 23:49:53 +0000 (15:49 -0800)]
mm/memcontrol.c: fix bad line in comment
Commit 230671533d64 ("mm: memory.low hierarchical behavior") missed an
asterisk in one of the comments.
mm/memcontrol.c:5774: warning: bad line: | 0, otherwise.
Link: http://lkml.kernel.org/r/20190301143734.94393-1-cai@lca.pw
Acked-by: Souptick Joarder <jrdr.linux@gmail.com>
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Peng Fan [Tue, 5 Mar 2019 23:49:50 +0000 (15:49 -0800)]
mm/cma.c: cma_declare_contiguous: correct err handling
If cma_init_reserved_mem() fails, the memblock allocated by
memblock_reserve() or memblock_alloc_range() needs to be freed.
Quote Catalin's comments:
https://lkml.org/lkml/2019/2/26/482
Kmemleak is supposed to work with the memblock_{alloc,free} pair and it
ignores the memblock_reserve() as a memblock_alloc() implementation
detail. It is, however, tolerant to memblock_free() being called on
a sub-range or just a different range from a previous memblock_alloc().
So the original patch looks fine to me. FWIW:
Link: http://lkml.kernel.org/r/20190227144631.16708-1-peng.fan@nxp.com
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Qian Cai [Tue, 5 Mar 2019 23:49:46 +0000 (15:49 -0800)]
mm/page_ext.c: fix an imbalance with kmemleak
After offlining a memory block, kmemleak scan will trigger a crash, as
it encounters a page ext address that has already been freed during
memory offlining. At the beginning in alloc_page_ext(), it calls
kmemleak_alloc(), but it does not call kmemleak_free() in
free_page_ext().
BUG: unable to handle kernel paging request at ffff888453d00000
PGD 128a01067 P4D 128a01067 PUD 128a04067 PMD 47e09e067 PTE 800ffffbac2ff060
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
CPU: 1 PID: 1594 Comm: bash Not tainted 5.0.0-rc8+ #15
Hardware name: HP ProLiant DL180 Gen9/ProLiant DL180 Gen9, BIOS U20 10/25/2017
RIP: 0010:scan_block+0xb5/0x290
Code: 85 6e 01 00 00 48 b8 00 00 30 f5 81 88 ff ff 48 39 c3 0f 84 5b 01 00 00 48 89 d8 48 c1 e8 03 42 80 3c 20 00 0f 85 87 01 00 00 <4c> 8b 3b e8 f3 0c fa ff 4c 39 3d 0c 6b 4c 01 0f 87 08 01 00 00 4c
RSP: 0018:ffff8881ec57f8e0 EFLAGS: 00010082
RAX: 0000000000000000 RBX: ffff888453d00000 RCX: ffffffffa61e5a54
RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff888453d00000
RBP: ffff8881ec57f920 R08: fffffbfff4ed588d R09: fffffbfff4ed588c
R10: fffffbfff4ed588c R11: ffffffffa76ac463 R12: dffffc0000000000
R13: ffff888453d00ff9 R14: ffff8881f80cef48 R15: ffff8881f80cef48
FS:  00007f6c0e3f8740(0000) GS:ffff8881f7680000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff888453d00000 CR3: 00000001c4244003 CR4: 00000000001606a0
Call Trace:
scan_gray_list+0x269/0x430
kmemleak_scan+0x5a8/0x10f0
kmemleak_write+0x541/0x6ca
full_proxy_write+0xf8/0x190
__vfs_write+0xeb/0x980
vfs_write+0x15a/0x4f0
ksys_write+0xd2/0x1b0
__x64_sys_write+0x73/0xb0
do_syscall_64+0xeb/0xaaa
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f6c0dad73b8
Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 65 63 2d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
RSP: 002b:00007ffd5b863cb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f6c0dad73b8
RDX: 0000000000000005 RSI: 000055a9216e1710 RDI: 0000000000000001
RBP: 000055a9216e1710 R08: 000000000000000a R09: 00007ffd5b863840
R10: 000000000000000a R11: 0000000000000246 R12: 00007f6c0dda9780
R13: 0000000000000005 R14: 00007f6c0dda4740 R15: 0000000000000005
Modules linked in: nls_iso8859_1 nls_cp437 vfat fat kvm_intel kvm irqbypass efivars ip_tables x_tables xfs sd_mod ahci libahci igb i2c_algo_bit libata i2c_core dm_mirror dm_region_hash dm_log dm_mod efivarfs
CR2: ffff888453d00000
---[ end trace ccf646c7456717c5 ]---
Kernel panic - not syncing: Fatal exception
Shutting down cpus with NMI
Kernel Offset: 0x24c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
---[ end Kernel panic - not syncing: Fatal exception ]---
Link: http://lkml.kernel.org/r/20190227173147.75650-1-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrey Ryabinin [Tue, 5 Mar 2019 23:49:42 +0000 (15:49 -0800)]
mm/compaction: pass pgdat to too_many_isolated() instead of zone
too_many_isolated() in mm/compaction.c looks only at node state, so it
makes more sense to change argument to pgdat instead of zone.
Link: http://lkml.kernel.org/r/20190228083329.31892-3-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Rik van Riel <riel@surriel.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrey Ryabinin [Tue, 5 Mar 2019 23:49:39 +0000 (15:49 -0800)]
mm: remove zone_lru_lock() function, access ->lru_lock directly
We have a common pattern to access lru_lock from a page pointer:
zone_lru_lock(page_zone(page))
Which is silly, because it unfolds to this:
&NODE_DATA(page_to_nid(page))->node_zones[page_zonenum(page)]->zone_pgdat->lru_lock
while we can simply do
&NODE_DATA(page_to_nid(page))->lru_lock
Remove the zone_lru_lock() function, since it only complicates things. Use
the 'page_pgdat(page)->lru_lock' pattern instead.
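To see why the two access paths are equivalent, here is a toy userspace
model. The struct layouts and helper names are hypothetical and greatly
simplified; the only property taken from the text is that each zone
points back to the node that also holds the lock.

```c
#include <assert.h>

#define TOY_MAX_ZONES 3

struct toy_pgdat;

struct toy_zone {
	struct toy_pgdat *zone_pgdat;	/* back-pointer to the owning node */
};

struct toy_pgdat {
	struct toy_zone node_zones[TOY_MAX_ZONES];
	int lru_lock;			/* stand-in for the real spinlock */
};

void toy_pgdat_init(struct toy_pgdat *pgdat)
{
	for (int i = 0; i < TOY_MAX_ZONES; i++)
		pgdat->node_zones[i].zone_pgdat = pgdat;
	pgdat->lru_lock = 0;
}

/* Long way round: node -> zone -> back to node -> lock. */
int *toy_lock_via_zone(struct toy_pgdat *pgdat, int zonenum)
{
	return &pgdat->node_zones[zonenum].zone_pgdat->lru_lock;
}

/* Direct way: node -> lock. */
int *toy_lock_via_pgdat(struct toy_pgdat *pgdat)
{
	return &pgdat->lru_lock;
}

void toy_lru_lock_demo(void)
{
	struct toy_pgdat node;

	toy_pgdat_init(&node);
	for (int i = 0; i < TOY_MAX_ZONES; i++)
		assert(toy_lock_via_zone(&node, i) == toy_lock_via_pgdat(&node));
}
```

Whichever zone is picked, both helpers return the address of the same
per-node lock, so the detour through the zone adds nothing.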
[aryabinin@virtuozzo.com: a slightly better version of __split_huge_page()]
Link: http://lkml.kernel.org/r/20190301121651.7741-1-aryabinin@virtuozzo.com
Link: http://lkml.kernel.org/r/20190228083329.31892-2-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrey Ryabinin [Tue, 5 Mar 2019 23:49:35 +0000 (15:49 -0800)]
mm/workingset: remove unused @mapping argument in workingset_eviction()
workingset_eviction() doesn't use and never did use the @mapping
argument. Remove it.
Link: http://lkml.kernel.org/r/20190228083329.31892-1-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Rik van Riel <riel@surriel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Gustavo A. R. Silva [Tue, 5 Mar 2019 23:49:31 +0000 (15:49 -0800)]
mm/swapfile.c: use struct_size() in kvzalloc()
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:
struct foo {
int stuff;
struct boo entry[];
};
size = sizeof(struct foo) + count * sizeof(struct boo);
instance = kvzalloc(size, GFP_KERNEL);
Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:
instance = kvzalloc(struct_size(instance, entry, count), GFP_KERNEL);
Notice that, in this case, variable size is not necessary, hence it is
removed.
This code was detected with the help of Coccinelle.
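As an illustration of why a helper is safer than the open-coded sum, here is a minimal userspace sketch. flex_size() and foo_alloc() are hypothetical names invented for this example and are not the kernel's struct_size(); the sketch assumes that saturating to SIZE_MAX on overflow is an acceptable way to make the allocation fail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct boo {
	int value;
};

struct foo {
	int stuff;
	struct boo entry[];	/* flexible array member */
};

/*
 * Hypothetical stand-in for a struct_size()-style helper: compute
 * header + count * elem, saturating to SIZE_MAX if the sum would overflow.
 */
static size_t flex_size(size_t header, size_t elem, size_t count)
{
	assert(elem != 0);	/* the element type must have a size */
	if (count > (SIZE_MAX - header) / elem)
		return SIZE_MAX;
	return header + count * elem;
}

/* Allocate a zeroed struct foo with room for count entries. */
static struct foo *foo_alloc(size_t count)
{
	size_t size = flex_size(sizeof(struct foo), sizeof(struct boo), count);

	if (size == SIZE_MAX)
		return NULL;	/* overflow: refuse instead of under-allocating */
	return calloc(1, size);
}
```

With the open-coded form, a huge count silently wraps around and yields a buffer that is too small; with the checked form, the allocation simply fails.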
Link: http://lkml.kernel.org/r/20190221154622.GA19599@embeddedor
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yue Hu [Tue, 5 Mar 2019 23:49:27 +0000 (15:49 -0800)]
mm/cma_debug.c: remove static scoped cma_debugfs_root
Currently cma_debugfs_root has static storage. That is unnecessary since
it is only used by the subsequent cma_debugfs_add_one() calls. We can just
pass it to those calls to save this space. Also remove the useless idx
parameter.
Link: http://lkml.kernel.org/r/20190221040130.8940-1-zbestahu@gmail.com
Signed-off-by: Yue Hu <huyue2@yulong.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Tue, 5 Mar 2019 23:49:24 +0000 (15:49 -0800)]
tmpfs: test link accounting with O_TMPFILE
Mount tmpfs with "nr_inodes=3" for easy check.
Link: http://lkml.kernel.org/r/20190219215016.GA20084@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matej Kupljen <matej.kupljen@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Rapoport [Tue, 5 Mar 2019 23:49:20 +0000 (15:49 -0800)]
MAINTAINERS: add entry for memblock
Add entry for memblock in MAINTAINERS file
Link: http://lkml.kernel.org/r/20190214093630.GC9063@rapoport-lnx
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yu Zhao [Tue, 5 Mar 2019 23:49:17 +0000 (15:49 -0800)]
mm/shmem: make find_get_pages_range() work for huge page
find_get_pages_range() and find_get_pages_range_tag() already correctly
increment reference count on head when seeing compound page, but they
may still use page index from tail. Page index from tail is always
zero, so these functions don't work on huge shmem. This hasn't been a
problem because, AFAIK, nobody calls these functions on (huge) shmem.
Fix them anyway just in case.
Link: http://lkml.kernel.org/r/20190110030838.84446-1-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: "Darrick J . Wong" <darrick.wong@oracle.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Hellwig [Tue, 5 Mar 2019 23:49:13 +0000 (15:49 -0800)]
mm: unexport free_reserved_area
This function is only used by built-in code, which makes perfect sense
given the purpose of it.
Link: http://lkml.kernel.org/r/20190213174621.29297-2-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tobin C. Harding [Tue, 5 Mar 2019 23:49:10 +0000 (15:49 -0800)]
tools/vm/slabinfo: clean up usage menu debug items
Attempt to make the usage comment for debug options a little cleaner.
Link: http://lkml.kernel.org/r/20190212001219.27769-5-tobin@kernel.org
Signed-off-by: Tobin C. Harding <tobin@kernel.org>
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tobin C. Harding [Tue, 5 Mar 2019 23:49:07 +0000 (15:49 -0800)]
tools/vm/slabinfo: align usage output columns
The usage message uses spaces, not tabs, but a few tabs have snuck in,
so the columns do not align correctly when output.
Align usage output columns using spaces instead of tabs.
Link: http://lkml.kernel.org/r/20190212001219.27769-4-tobin@kernel.org
Signed-off-by: Tobin C. Harding <tobin@kernel.org>
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tobin C. Harding [Tue, 5 Mar 2019 23:49:03 +0000 (15:49 -0800)]
tools/vm/slabinfo: put options in alphabetic order
The usage message mostly lists options in alphabetic order; however,
a number of the options are out of order.
Put options in alphabetic order.
Link: http://lkml.kernel.org/r/20190212001219.27769-3-tobin@kernel.org
Signed-off-by: Tobin C. Harding <tobin@kernel.org>
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tobin C. Harding [Tue, 5 Mar 2019 23:48:59 +0000 (15:48 -0800)]
tools/vm/slabinfo: update options in usage message
Currently the usage message lists only a subset of the available options.
It should list them all.
Update options in the usage message to include all available options.
Link: http://lkml.kernel.org/r/20190212001219.27769-2-tobin@kernel.org
Signed-off-by: Tobin C. Harding <tobin@kernel.org>
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yu Zhao [Tue, 5 Mar 2019 23:48:56 +0000 (15:48 -0800)]
include/linux/compaction.h: fix potential build error
Declaration of struct node is required regardless. On UMA systems,
including compaction.h without preceding node.h shouldn't cause a build
error.
Link: http://lkml.kernel.org/r/20190208080437.253322-1-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oscar Salvador [Tue, 5 Mar 2019 23:48:53 +0000 (15:48 -0800)]
mm,memory_hotplug: explicitly pass the head to isolate_huge_page
isolate_huge_page() expects to be passed the head of a hugetlb page:
bool isolate_huge_page(...)
{
...
VM_BUG_ON_PAGE(!PageHead(page), page);
...
}
While I really cannot think of any situation where we end up with a
non-head page in hand in do_migrate_range(), let us make sure the
code is as sane as possible by explicitly passing the head. Since we
already have the pointer, it takes no extra effort.
Link: http://lkml.kernel.org/r/20190208090604.975-1-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Anthony Yznaga <anthony.yznaga@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
john.hubbard@gmail.com [Tue, 5 Mar 2019 23:48:49 +0000 (15:48 -0800)]
mm: page_cache_add_speculative(): refactor out some code duplication
From: John Hubbard <jhubbard@nvidia.com>
This combines the common elements of these routines:
page_cache_get_speculative()
page_cache_add_speculative()
This was anticipated by the original author, as shown by the comment in
commit ce0ad7f095258 ("powerpc/mm: Lockless get_user_pages_fast() for
64-bit (v3)"):
"Same as above, but add instead of inc (could just be merged)"
There is no intention to introduce any behavioral change, but there is a
small risk of that, due to slightly differing ways of expressing the
TINY_RCU and related configurations.
This also removes the VM_BUG_ON(in_interrupt()) that was in
page_cache_add_speculative(), but not in page_cache_get_speculative().
This provides slightly less detection of such bugs, but given that it
was only there on the "add" path anyway, we can likely do without it
just fine.
And it removes the
VM_BUG_ON_PAGE(PageCompound(page) && page != compound_head(page), page);
that page_cache_add_speculative() had.
Link: http://lkml.kernel.org/r/20190206231016.22734-2-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Kara [Tue, 5 Mar 2019 23:48:46 +0000 (15:48 -0800)]
mm/migrate.c: cleanup expected_page_refs()
Andrea has noted that page migration code propagates page_mapping(page)
through the whole migration stack down to migrate_page() function so it
seems stupid to then use page_mapping(page) in expected_page_refs()
instead of passed down 'mapping' argument. I agree so let's make
expected_page_refs() more in line with the rest of the migration stack.
Link: http://lkml.kernel.org/r/20190207112314.24872-1-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Rapoport [Tue, 5 Mar 2019 23:48:42 +0000 (15:48 -0800)]
docs/core-api/mm: fix return value descriptions in mm/
Many kernel-doc comments in mm/ have the return value descriptions
either misformatted or omitted altogether, which makes the kernel-doc
script unhappy:
$ make V=1 htmldocs
...
./mm/util.c:36: info: Scanning doc for kstrdup
./mm/util.c:41: warning: No description found for return value of 'kstrdup'
./mm/util.c:57: info: Scanning doc for kstrdup_const
./mm/util.c:66: warning: No description found for return value of 'kstrdup_const'
./mm/util.c:75: info: Scanning doc for kstrndup
./mm/util.c:83: warning: No description found for return value of 'kstrndup'
...
Fixing the formatting and adding the missing return value descriptions
eliminates ~100 such warnings.
Link: http://lkml.kernel.org/r/1549549644-4903-4-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Rapoport [Tue, 5 Mar 2019 23:48:39 +0000 (15:48 -0800)]
docs/core-api/mm: fix user memory accessors formatting
The descriptions of userspace memory access functions had minor issues
with formatting that made kernel-doc unable to properly detect the
function/macro names and the return value sections:
./arch/x86/include/asm/uaccess.h:80: info: Scanning doc for
./arch/x86/include/asm/uaccess.h:139: info: Scanning doc for
./arch/x86/include/asm/uaccess.h:231: info: Scanning doc for
./arch/x86/include/asm/uaccess.h:505: info: Scanning doc for
./arch/x86/include/asm/uaccess.h:530: info: Scanning doc for
./arch/x86/lib/usercopy_32.c:58: info: Scanning doc for
./arch/x86/lib/usercopy_32.c:69: warning: No description found for return
value of 'clear_user'
./arch/x86/lib/usercopy_32.c:78: info: Scanning doc for
./arch/x86/lib/usercopy_32.c:90: warning: No description found for return
value of '__clear_user'
Fix the formatting.
Link: http://lkml.kernel.org/r/1549549644-4903-3-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Rapoport [Tue, 5 Mar 2019 23:48:36 +0000 (15:48 -0800)]
docs/mm: vmalloc: re-indent kernel-doc comments
Some kernel-doc comments in mm/vmalloc.c have a leading tab in their
indentation. This leads to excessive indentation in the generated HTML
and to inconsistency in its layout ([1] vs [2]).
Besides, multi-line Note: sections are not handled properly with extra
indentation.
[1] https://www.kernel.org/doc/html/v4.20/core-api/mm-api.html?#c.vm_map_ram
[2] https://www.kernel.org/doc/html/v4.20/core-api/mm-api.html?#c.vfree
Link: http://lkml.kernel.org/r/1549549644-4903-2-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michael S. Tsirkin [Tue, 5 Mar 2019 23:48:33 +0000 (15:48 -0800)]
mm/page_poison: update comment after code moved
mm/debug-pagealloc.c is no more, so of course header now needs to be
updated. This seems like something checkpatch should be able to catch -
worth looking into?
Link: http://lkml.kernel.org/r/20190207191113.14039-1-mst@redhat.com
Fixes: 8823b1dbc05f ("mm/page_poison.c: enable PAGE_POISONING as a separate option")
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Tue, 5 Mar 2019 23:48:29 +0000 (15:48 -0800)]
numa: make "nr_online_nodes" unsigned int
The number of online NUMA nodes can't be negative either. This doesn't
save space as the variable is used only in 32-bit context, but do it
anyway for consistency.
Link: http://lkml.kernel.org/r/20190201223151.GB15820@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Tue, 5 Mar 2019 23:48:26 +0000 (15:48 -0800)]
numa: make "nr_node_ids" unsigned int
Number of NUMA nodes can't be negative.
This saves a few bytes on x86_64:
add/remove: 0/0 grow/shrink: 4/21 up/down: 27/-265 (-238)
Function old new delta
hv_synic_alloc.cold 88 110 +22
prealloc_shrinker 260 262 +2
bootstrap 249 251 +2
sched_init_numa 1566 1567 +1
show_slab_objects 778 777 -1
s_show 1201 1200 -1
kmem_cache_init 346 345 -1
__alloc_workqueue_key 1146 1145 -1
mem_cgroup_css_alloc 1614 1612 -2
__do_sys_swapon 4702 4699 -3
__list_lru_init 655 651 -4
nic_probe 2379 2374 -5
store_user_store 118 111 -7
red_zone_store 106 99 -7
poison_store 106 99 -7
wq_numa_init 348 338 -10
__kmem_cache_empty 75 65 -10
task_numa_free 186 173 -13
merge_across_nodes_store 351 336 -15
irq_create_affinity_masks 1261 1246 -15
do_numa_crng_init 343 321 -22
task_numa_fault 4760 4737 -23
swapfile_init 179 156 -23
hv_synic_alloc 536 492 -44
apply_wqattrs_prepare 746 695 -51
Link: http://lkml.kernel.org/r/20190201223029.GA15820@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tetsuo Handa [Tue, 5 Mar 2019 23:48:22 +0000 (15:48 -0800)]
mm,oom: don't kill global init via memory.oom.group
Since setting global init process to some memory cgroup is technically
possible, oom_kill_memcg_member() must check it.
Tasks in /test1 are going to be killed due to memory.oom.group set
Memory cgroup out of memory: Killed process 1 (systemd) total-vm:43400kB, anon-rss:1228kB, file-rss:3992kB, shmem-rss:0kB
oom_reaper: reaped process 1 (systemd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000008b
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int main(int argc, char *argv[])
{
static char buffer[10485760];
static int pipe_fd[2] = { EOF, EOF };
unsigned int i;
int fd;
char buf[64] = { };
if (pipe(pipe_fd))
return 1;
if (chdir("/sys/fs/cgroup/"))
return 1;
fd = open("cgroup.subtree_control", O_WRONLY);
write(fd, "+memory", 7);
close(fd);
mkdir("test1", 0755);
fd = open("test1/memory.oom.group", O_WRONLY);
write(fd, "1", 1);
close(fd);
fd = open("test1/cgroup.procs", O_WRONLY);
write(fd, "1", 1);
snprintf(buf, sizeof(buf) - 1, "%d", getpid());
write(fd, buf, strlen(buf));
close(fd);
snprintf(buf, sizeof(buf) - 1, "%lu", sizeof(buffer) * 5);
fd = open("test1/memory.max", O_WRONLY);
write(fd, buf, strlen(buf));
close(fd);
for (i = 0; i < 10; i++)
if (fork() == 0) {
char c;
close(pipe_fd[1]);
read(pipe_fd[0], &c, 1);
memset(buffer, 0, sizeof(buffer));
sleep(3);
_exit(0);
}
close(pipe_fd[0]);
close(pipe_fd[1]);
sleep(3);
return 0;
}
[ 37.052923][ T9185] a.out invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
[ 37.056169][ T9185] CPU: 4 PID: 9185 Comm: a.out Kdump: loaded Not tainted 5.0.0-rc4-next-20190131 #280
[ 37.059205][ T9185] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
[ 37.062954][ T9185] Call Trace:
[ 37.063976][ T9185] dump_stack+0x67/0x95
[ 37.065263][ T9185] dump_header+0x51/0x570
[ 37.066619][ T9185] ? trace_hardirqs_on+0x3f/0x110
[ 37.068171][ T9185] ? _raw_spin_unlock_irqrestore+0x3d/0x70
[ 37.069967][ T9185] oom_kill_process+0x18d/0x210
[ 37.071515][ T9185] out_of_memory+0x11b/0x380
[ 37.072936][ T9185] mem_cgroup_out_of_memory+0xb6/0xd0
[ 37.074601][ T9185] try_charge+0x790/0x820
[ 37.076021][ T9185] mem_cgroup_try_charge+0x42/0x1d0
[ 37.077629][ T9185] mem_cgroup_try_charge_delay+0x11/0x30
[ 37.079370][ T9185] do_anonymous_page+0x105/0x5e0
[ 37.080939][ T9185] __handle_mm_fault+0x9cb/0x1070
[ 37.082485][ T9185] handle_mm_fault+0x1b2/0x3a0
[ 37.083819][ T9185] ? handle_mm_fault+0x47/0x3a0
[ 37.085181][ T9185] __do_page_fault+0x255/0x4c0
[ 37.086529][ T9185] do_page_fault+0x28/0x260
[ 37.087788][ T9185] ? page_fault+0x8/0x30
[ 37.088978][ T9185] page_fault+0x1e/0x30
[ 37.090142][ T9185] RIP: 0033:0x7f8b183aefe0
[ 37.091433][ T9185] Code: 20 f3 44 0f 7f 44 17 d0 f3 44 0f 7f 47 30 f3 44 0f 7f 44 17 c0 48 01 fa 48 83 e2 c0 48 39 d1 74 a3 66 0f 1f 84 00 00 00 00 00 <66> 44 0f 7f 01 66 44 0f 7f 41 10 66 44 0f 7f 41 20 66 44 0f 7f 41
[ 37.096917][ T9185] RSP: 002b:00007fffc5d329e8 EFLAGS: 00010206
[ 37.098615][ T9185] RAX: 00000000006010e0 RBX: 0000000000000008 RCX: 0000000000c30000
[ 37.100905][ T9185] RDX: 00000000010010c0 RSI: 0000000000000000 RDI: 00000000006010e0
[ 37.103349][ T9185] RBP: 0000000000000000 R08: 00007f8b188f4740 R09: 0000000000000000
[ 37.105797][ T9185] R10: 00007fffc5d32420 R11: 00007f8b183aef40 R12: 0000000000000005
[ 37.108228][ T9185] R13: 0000000000000000 R14: ffffffffffffffff R15: 0000000000000000
[ 37.110840][ T9185] memory: usage 51200kB, limit 51200kB, failcnt 125
[ 37.113045][ T9185] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
[ 37.115808][ T9185] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
[ 37.117660][ T9185] Memory cgroup stats for /test1: cache:0KB rss:49484KB rss_huge:30720KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:0KB active_anon:49700KB inactive_file:0KB active_file:0KB unevictable:0KB
[ 37.123371][ T9185] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/test1,task_memcg=/test1,task=a.out,pid=9188,uid=0
[ 37.128158][ T9185] Memory cgroup out of memory: Killed process 9188 (a.out) total-vm:14456kB, anon-rss:10324kB, file-rss:504kB, shmem-rss:0kB
[ 37.132710][ T9185] Tasks in /test1 are going to be killed due to memory.oom.group set
[ 37.132833][ T54] oom_reaper: reaped process 9188 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[ 37.135498][ T9185] Memory cgroup out of memory: Killed process 1 (systemd) total-vm:43400kB, anon-rss:1228kB, file-rss:3992kB, shmem-rss:0kB
[ 37.143434][ T9185] Memory cgroup out of memory: Killed process 9182 (a.out) total-vm:14456kB, anon-rss:76kB, file-rss:588kB, shmem-rss:0kB
[ 37.144328][ T54] oom_reaper: reaped process 1 (systemd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[ 37.147585][ T9185] Memory cgroup out of memory: Killed process 9183 (a.out) total-vm:14456kB, anon-rss:6228kB, file-rss:512kB, shmem-rss:0kB
[ 37.157222][ T9185] Memory cgroup out of memory: Killed process 9184 (a.out) total-vm:14456kB, anon-rss:6228kB, file-rss:508kB, shmem-rss:0kB
[ 37.157259][ T9185] Memory cgroup out of memory: Killed process 9185 (a.out) total-vm:14456kB, anon-rss:6228kB, file-rss:512kB, shmem-rss:0kB
[ 37.157291][ T9185] Memory cgroup out of memory: Killed process 9186 (a.out) total-vm:14456kB, anon-rss:4180kB, file-rss:508kB, shmem-rss:0kB
[ 37.157306][ T54] oom_reaper: reaped process 9183 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[ 37.157328][ T9185] Memory cgroup out of memory: Killed process 9187 (a.out) total-vm:14456kB, anon-rss:4180kB, file-rss:512kB, shmem-rss:0kB
[ 37.157452][ T9185] Memory cgroup out of memory: Killed process 9189 (a.out) total-vm:14456kB, anon-rss:6228kB, file-rss:512kB, shmem-rss:0kB
[ 37.158733][ T9185] Memory cgroup out of memory: Killed process 9190 (a.out) total-vm:14456kB, anon-rss:552kB, file-rss:512kB, shmem-rss:0kB
[ 37.160083][ T54] oom_reaper: reaped process 9186 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[ 37.160187][ T54] oom_reaper: reaped process 9189 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[ 37.206941][ T54] oom_reaper: reaped process 9185 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[ 37.212300][ T9185] Memory cgroup out of memory: Killed process 9191 (a.out) total-vm:14456kB, anon-rss:4180kB, file-rss:512kB, shmem-rss:0kB
[ 37.212317][ T54] oom_reaper: reaped process 9190 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[ 37.218860][ T9185] Memory cgroup out of memory: Killed process 9192 (a.out) total-vm:14456kB, anon-rss:1080kB, file-rss:512kB, shmem-rss:0kB
[ 37.227667][ T54] oom_reaper: reaped process 9192 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[ 37.292323][ T9193] abrt-hook-ccpp (9193) used greatest stack depth: 10480 bytes left
[ 37.351843][ T1] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000008b
[ 37.354833][ T1] CPU: 7 PID: 1 Comm: systemd Kdump: loaded Not tainted 5.0.0-rc4-next-20190131 #280
[ 37.357876][ T1] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
[ 37.361685][ T1] Call Trace:
[ 37.363239][ T1] dump_stack+0x67/0x95
[ 37.365010][ T1] panic+0xfc/0x2b0
[ 37.366853][ T1] do_exit+0xd55/0xd60
[ 37.368595][ T1] do_group_exit+0x47/0xc0
[ 37.370415][ T1] get_signal+0x32a/0x920
[ 37.372449][ T1] ? _raw_spin_unlock_irqrestore+0x3d/0x70
[ 37.374596][ T1] do_signal+0x32/0x6e0
[ 37.376430][ T1] ? exit_to_usermode_loop+0x26/0x9b
[ 37.378418][ T1] ? prepare_exit_to_usermode+0xa8/0xd0
[ 37.380571][ T1] exit_to_usermode_loop+0x3e/0x9b
[ 37.382588][ T1] prepare_exit_to_usermode+0xa8/0xd0
[ 37.384594][ T1] ? page_fault+0x8/0x30
[ 37.386453][ T1] retint_user+0x8/0x18
[ 37.388160][ T1] RIP: 0033:0x7f42c06974a8
[ 37.389922][ T1] Code: Bad RIP value.
[ 37.391788][ T1] RSP: 002b:00007ffc3effd388 EFLAGS: 00010213
[ 37.394075][ T1] RAX: 000000000000000e RBX: 00007ffc3effd390 RCX: 0000000000000000
[ 37.396963][ T1] RDX: 000000000000002a RSI: 00007ffc3effd390 RDI: 0000000000000004
[ 37.399550][ T1] RBP: 00007ffc3effd680 R08: 0000000000000000 R09: 0000000000000000
[ 37.402334][ T1] R10: 00000000ffffffff R11: 0000000000000246 R12: 0000000000000001
[ 37.404890][ T1] R13: ffffffffffffffff R14: 0000000000000884 R15: 000056460b1ac3b0
Link: http://lkml.kernel.org/r/201902010336.x113a4EO027170@www262.sakura.ne.jp
Fixes: 3d8b38eb81cac813 ("mm, oom: introduce memory.oom.group")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Daniel Jordan [Tue, 5 Mar 2019 23:48:19 +0000 (15:48 -0800)]
mm, swap: bounds check swap_info array accesses to avoid NULL derefs
Dan Carpenter reports a potential NULL dereference in
get_swap_page_of_type:
Smatch complains that the NULL checks on "si" aren't consistent. This
seems like a real bug because we have not ensured that the type is
valid and so "si" can be NULL.
Add the missing check for NULL, taking care to use a read barrier to
ensure CPU1 observes CPU0's updates in the correct order:
CPU0                                CPU1
alloc_swap_info()                   if (type >= nr_swapfiles)
  swap_info[type] = p                 /* handle invalid entry */
  smp_wmb()                         smp_rmb()
  ++nr_swapfiles                    p = swap_info[type]
Without smp_rmb, CPU1 might observe CPU0's write to nr_swapfiles before
CPU0's write to swap_info[type] and read NULL from swap_info[type].
Ying Huang noticed other places in swapfile.c don't order these reads
properly. Introduce swap_type_to_swap_info to encourage correct usage.
Use READ_ONCE and WRITE_ONCE to follow the Linux Kernel Memory Model
(see tools/memory-model/Documentation/explanation.txt).
This ordering need not be enforced in places where swap_lock is held
(e.g. si_swapinfo) because swap_lock serializes updates to nr_swapfiles
and the swap_info array.
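The publish/lookup ordering described above can be illustrated in userspace with C11 atomics. This is only a sketch under stated assumptions: slot_publish(), slot_lookup(), struct slot_info and MAX_SLOTS are made-up names, the release and acquire fences stand in for smp_wmb() and smp_rmb(), and a single writer is assumed (the kernel serializes writers with swap_lock).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_SLOTS 8	/* hypothetical capacity, standing in for MAX_SWAPFILES */

struct slot_info {
	int prio;
};

static _Atomic(struct slot_info *) slots[MAX_SLOTS];
static atomic_uint nr_slots;

/* Writer side: store the pointer first, then make the new count visible. */
static int slot_publish(struct slot_info *p)
{
	unsigned int n = atomic_load_explicit(&nr_slots, memory_order_relaxed);

	assert(p != NULL);
	if (n >= MAX_SLOTS)
		return -1;
	atomic_store_explicit(&slots[n], p, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);	/* role of smp_wmb() */
	atomic_store_explicit(&nr_slots, n + 1, memory_order_relaxed);
	return (int)n;
}

/* Reader side: bounds check first, then read the pointer. */
static struct slot_info *slot_lookup(unsigned int type)
{
	if (type >= atomic_load_explicit(&nr_slots, memory_order_relaxed))
		return NULL;	/* handle invalid entry */
	atomic_thread_fence(memory_order_acquire);	/* role of smp_rmb() */
	return atomic_load_explicit(&slots[type], memory_order_relaxed);
}
```

Keeping the bounds check and the barrier together in one lookup helper is the same idea as swap_type_to_swap_info: callers cannot forget the ordering.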
Link: http://lkml.kernel.org/r/20190131024410.29859-1-daniel.m.jordan@oracle.com
Fixes: ec8acf20afb8 ("swap: add per-partition lock for swapfile")
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Suggested-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Andrea Parri <andrea.parri@amarulasolutions.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Shaohua Li <shli@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill Tkhai [Tue, 5 Mar 2019 23:48:15 +0000 (15:48 -0800)]
mm/vmscan.c: do not allocate duplicate stack variables in shrink_page_list()
On the path shrink_inactive_list() ---> shrink_page_list() we allocate stack
variables for the statistics twice. This is completely useless, and
it consumes much more stack than we really need.
The patch removes the duplicate stack variables from shrink_page_list(),
which reduces stack usage and object file size significantly:
Stack usage:
Before: vmscan.c:1122:22:shrink_page_list 648 static
After: vmscan.c:1122:22:shrink_page_list 616 static
Size of vmscan.o:
text data bss dec hex filename
Before: 56866 4720 128 61714 f112 mm/vmscan.o
After: 56770 4720 128 61618 f0b2 mm/vmscan.o
Link: http://lkml.kernel.org/r/154894900030.5211.12104993874109647641.stgit@localhost.localdomain
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yang Shi [Tue, 5 Mar 2019 23:48:12 +0000 (15:48 -0800)]
mm: ksm: do not block on page lock when searching stable tree
ksmd needs to search the stable tree to look for a suitable KSM page,
but the KSM page might be locked for a while due to, e.g., a KSM page rmap
walk. This has not been a big deal since commit 2c653d0ee2ae ("ksm:
introduce ksm_max_page_sharing per page deduplication limit"), because
max_page_sharing limits the number of shared KSM pages.
But it still does not seem worth waiting for the lock: the page can be
skipped, and ksmd can try to merge it in the next scan, if its content is
still intact, to avoid a potential stall.
Introduce a trylock mode to get_ksm_page() so that it does not block on
the page lock, like what try_to_merge_one_page() does. Also define the
three possible operations (nolock, lock and trylock) as an enum type to
avoid stacking up bools and to make the code more readable.
Return -EBUSY if trylock fails, since NULL means that no suitable KSM
page was found, which is a valid case.
With the default max_page_sharing setting (256), there is almost no
observed change comparing lock vs trylock.
However, with ksm02 of LTP, which sets max_page_sharing to 786432, a
reduced ksmd full scan time can be observed. With the lock version,
ksmd may take 10s - 11s to run two full scans; with the trylock version,
ksmd may take 8s - 11s to run two full scans. The numbers of
pages_sharing and pages_to_scan stay the same. Basically, this change does
no harm.
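The shape of such an interface can be sketched in userspace as follows. All names here (struct item, enum item_lock_mode, item_get(), item_unlock()) are hypothetical and are not the kernel's; a C11 atomic_flag stands in for the page lock, and a plain int return code replaces the kernel's pointer return, with -ENOENT playing the role of the "NULL, nothing suitable" case.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* One enum instead of a stack of bool parameters. */
enum item_lock_mode {
	ITEM_NOLOCK,
	ITEM_LOCK,
	ITEM_TRYLOCK,
};

struct item {
	atomic_flag locked;	/* stand-in for the page lock */
	int stale;		/* non-zero: no longer a suitable item */
};

static int item_trylock(struct item *it)
{
	return !atomic_flag_test_and_set_explicit(&it->locked,
						  memory_order_acquire);
}

static void item_unlock(struct item *it)
{
	atomic_flag_clear_explicit(&it->locked, memory_order_release);
}

/*
 * Returns 0 on success (with the lock held unless ITEM_NOLOCK),
 * -ENOENT if the item is not suitable, and -EBUSY if ITEM_TRYLOCK
 * found the lock taken, so the caller can skip and retry later.
 */
static int item_get(struct item *it, enum item_lock_mode mode)
{
	assert(it != NULL);
	if (it->stale)
		return -ENOENT;
	switch (mode) {
	case ITEM_NOLOCK:
		break;
	case ITEM_LOCK:
		while (!item_trylock(it))
			;	/* spin; a real lock would sleep */
		break;
	case ITEM_TRYLOCK:
		if (!item_trylock(it))
			return -EBUSY;
		break;
	}
	return 0;
}
```

Keeping "busy" distinct from "not found" is what lets a scanner skip a contended item now and revisit it on the next pass instead of treating it as gone.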
[hughd@google.com: fix BUG_ON()]
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1902182122280.6914@eggly.anvils
Link: http://lkml.kernel.org/r/1548793753-62377-1-git-send-email-yang.shi@linux.alibaba.com
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Suggested-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Chris Down [Tue, 5 Mar 2019 23:48:09 +0000 (15:48 -0800)]
mm: memcontrol: expose THP events on a per-memcg basis
Currently THP allocation events data is fairly opaque, since you can
only get it system-wide. This patch makes it easier to reason about
transparent hugepage behaviour on a per-memcg basis.
For anonymous THP-backed pages, we already have MEMCG_RSS_HUGE in v1,
which is used for v1's rss_huge [sic]. This is reused here as it's
fairly involved to untangle NR_ANON_THPS right now to make it per-memcg,
since right now some of this is delegated to rmap before we have any
memcg actually assigned to the page. It's a good idea to rework that,
but let's leave untangling THP allocation for a future patch.
[akpm@linux-foundation.org: fix build]
[chris@chrisdown.name: fix memcontrol build when THP is disabled]
Link: http://lkml.kernel.org/r/20190131160802.GA5777@chrisdown.name
Link: http://lkml.kernel.org/r/20190129205852.GA7310@chrisdown.name
Signed-off-by: Chris Down <chris@chrisdown.name>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yang Shi [Tue, 5 Mar 2019 23:48:05 +0000 (15:48 -0800)]
mm: vmscan: do not iterate all mem cgroups for global direct reclaim
In the current implementation, both kswapd and direct reclaim have to
iterate all mem cgroups. This was not a problem before offline mem
cgroups could be iterated. But now that offline mem cgroups are iterated
too, it can be very time consuming. In our workloads, we saw over 400K
mem cgroups accumulated in some cases, of which only a few hundred were
online memcgs. Although kswapd could help to reduce the number of
memcgs, direct reclaim still gets hit with iterating a number of offline
memcgs in some cases. We occasionally experienced responsiveness
problems due to this.
A simple test with perf shows it may take around 220ms to iterate 8K
memcgs in direct reclaim:
dd 13873 [011] 578.542919: vmscan:mm_vmscan_direct_reclaim_begin
dd 13873 [011] 578.758689: vmscan:mm_vmscan_direct_reclaim_end
So for 400K, it may take around 11 seconds to iterate all memcgs.
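The estimate can be reproduced from the two trace timestamps. A minimal
sketch of the arithmetic, assuming the cost grows linearly with the
number of memcgs iterated (the helper names are made up for
illustration):

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: extrapolate a measured direct reclaim duration
 * to a larger memcg count, assuming linear scaling.
 */
double extrapolate_reclaim_seconds(double begin_s, double end_s,
				   double measured_memcgs,
				   double target_memcgs)
{
	assert(measured_memcgs > 0.0);
	return (end_s - begin_s) * (target_memcgs / measured_memcgs);
}

/* Print the measured and projected times for the trace quoted above. */
void print_reclaim_estimate(void)
{
	const double begin_s = 578.542919;	/* direct_reclaim_begin */
	const double end_s = 578.758689;	/* direct_reclaim_end */
	double projected = extrapolate_reclaim_seconds(begin_s, end_s,
						       8e3, 400e3);

	printf("measured: %.0f ms for 8K memcgs\n",
	       (end_s - begin_s) * 1e3);
	printf("projected: %.1f s for 400K memcgs\n", projected);
}
```

The traced interval is about 216ms, and 400K / 8K = 50, so the
projection lands at roughly 10.8 seconds, which is where the "around 11
seconds" figure comes from.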
Here, just break the iteration once enough pages have been reclaimed, as
memcg direct reclaim does. This may hurt the fairness among memcgs.
But the cached iterator cookie could help to achieve fairness, more
or less.
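The idea can be sketched with a toy model in userspace C. This is not
the vmscan code: toy_memcg, toy_iter and toy_direct_reclaim() are
hypothetical, and the saved index only imitates the role of the cached
iterator cookie by letting the next walk resume where the last one
stopped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a memcg; not the kernel's struct mem_cgroup. */
struct toy_memcg {
	unsigned long reclaimable;	/* pages this group can give back */
};

/* Hypothetical iterator state, playing the role of the cached cookie. */
struct toy_iter {
	size_t next;			/* index where the next walk resumes */
};

/*
 * Walk the groups starting at the saved position and stop as soon as
 * nr_to_reclaim pages have been reclaimed. Returns the number of pages
 * reclaimed and stores the number of groups visited in *visited.
 */
unsigned long toy_direct_reclaim(struct toy_memcg *groups, size_t n,
				 struct toy_iter *iter,
				 unsigned long nr_to_reclaim,
				 size_t *visited)
{
	unsigned long nr_reclaimed = 0;
	size_t i;

	assert(groups && iter && visited);
	*visited = 0;
	for (i = 0; i < n; i++) {
		struct toy_memcg *memcg = &groups[(iter->next + i) % n];
		unsigned long want = nr_to_reclaim - nr_reclaimed;
		unsigned long got = memcg->reclaimable < want ?
				    memcg->reclaimable : want;

		memcg->reclaimable -= got;
		nr_reclaimed += got;
		(*visited)++;

		if (nr_reclaimed >= nr_to_reclaim) {
			/* Resume after this group next time. */
			iter->next = (iter->next + i + 1) % n;
			break;
		}
	}
	return nr_reclaimed;
}
```

Because the walk resumes from the saved position, repeated calls do not
keep hitting the same leading groups, which is the "more or less" fair
behaviour the changelog refers to.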
Link: http://lkml.kernel.org/r/1548799877-10949-1-git-send-email-yang.shi@linux.alibaba.com
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yang Shi [Tue, 5 Mar 2019 23:48:02 +0000 (15:48 -0800)]
mm: swap: use mem_cgroup_is_root() instead of deferencing css->parent
mem_cgroup_is_root() is the preferred API to check whether a memcg is
the root or not. Use it instead of dereferencing css->parent.
Link: http://lkml.kernel.org/r/1547232913-118148-1-git-send-email-yang.shi@linux.alibaba.com
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Tim Chen <tim.c.chen@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joel Fernandes (Google) [Tue, 5 Mar 2019 23:47:58 +0000 (15:47 -0800)]
selftests/memfd: add tests for F_SEAL_FUTURE_WRITE seal
Add tests to verify that sealing memfds with F_SEAL_FUTURE_WRITE works
as expected.
Link: http://lkml.kernel.org/r/20190112203816.85534-3-joel@joelfernandes.org
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Shuah Khan <shuah@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: J. Bruce Fields <bfields@fieldses.org>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joel Fernandes (Google) [Tue, 5 Mar 2019 23:47:54 +0000 (15:47 -0800)]
mm/memfd: add an F_SEAL_FUTURE_WRITE seal to memfd
Android uses ashmem for sharing memory regions. We are looking forward
to migrating all usecases of ashmem to memfd so that we can possibly
remove the ashmem driver in the future from staging while also
benefiting from using memfd and contributing to it. Note that staging
drivers are also not ABI and generally can be removed at any time.
One of the main usecases Android has is the ability to create a region
and mmap it as writeable, then add protection against making any
"future" writes while keeping the existing already mmap'ed
writeable-region active. This allows us to implement a usecase where
receivers of the shared memory buffer can get a read-only view, while
the sender continues to write to the buffer. See CursorWindow
documentation in Android for more details:
https://developer.android.com/reference/android/database/CursorWindow
This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE seal
which prevents any future mmap and write syscalls from succeeding while
keeping the existing mmap active.
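From userspace, the intended semantics can be exercised roughly as
follows. This is a minimal sketch, not the selftest from this series:
demo_future_write_seal() and its return codes are made up for the
example, and the fallback #define assumes the seal's uapi value is
0x0010 for libc headers that predate it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef F_SEAL_FUTURE_WRITE
#define F_SEAL_FUTURE_WRITE 0x0010	/* assumed uapi value, see lead-in */
#endif

/*
 * Returns 0 if the seal behaved as described, -1 if the running kernel
 * or libc lacks support, or a positive step number if a check failed.
 */
int demo_future_write_seal(size_t len)
{
	int ret = 0;
	int fd;
	char *w, *r;
	void *w2;

	assert(len >= 6);	/* room for "hello" plus the NUL */

	fd = memfd_create("demo", MFD_ALLOW_SEALING);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, (off_t)len) < 0) {
		close(fd);
		return -1;
	}

	/* The sender maps the region writable before sealing. */
	w = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (w == MAP_FAILED) {
		close(fd);
		return -1;
	}

	if (fcntl(fd, F_ADD_SEALS, F_SEAL_FUTURE_WRITE) < 0) {
		ret = -1;	/* kernel without this seal */
		goto out;
	}

	/* The existing writable mapping stays usable. */
	strcpy(w, "hello");

	/* A new writable shared mapping must be refused. */
	w2 = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (w2 != MAP_FAILED) {
		munmap(w2, len);
		ret = 1;
		goto out;
	}

	/* write(2) on the fd must be refused as well. */
	if (write(fd, "x", 1) >= 0) {
		ret = 2;
		goto out;
	}

	/* A receiver can still get a read-only view of the data. */
	r = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	if (r == MAP_FAILED) {
		ret = 3;
		goto out;
	}
	if (strcmp(r, "hello") != 0)
		ret = 4;
	munmap(r, len);
out:
	munmap(w, len);
	close(fd);
	return ret;
}
```

Calling demo_future_write_seal(4096) on a kernel with this patch is
expected to return 0; on older kernels the fcntl() call fails and the
function returns -1.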
A better way to do the F_SEAL_FUTURE_WRITE seal was discussed [1] last
week, where we don't need to modify core VFS structures to get the same
behavior of the seal. This solves several side-effects pointed out by
Andy. Self-tests are provided in a later patch to verify the expected
semantics.
[1] https://lore.kernel.org/lkml/20181111173650.GA256781@google.com/
Thanks a lot to Andy for suggestions to improve code.
Link: http://lkml.kernel.org/r/20190112203816.85534-2-joel@joelfernandes.org
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Acked-by: John Stultz <john.stultz@linaro.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: J. Bruce Fields <bfields@fieldses.org>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Aneesh Kumar K.V [Tue, 5 Mar 2019 23:47:51 +0000 (15:47 -0800)]
powerpc/mm/iommu: allow large IOMMU page size only for hugetlb backing
THP pages can get split during different code paths. An incremented
reference count does imply we will not split the compound page. But the
pmd entry can be converted to level 4 pte entries. Keep the code
simpler by allowing large IOMMU page size only if the guest ram is
backed by hugetlb pages.
Link: http://lkml.kernel.org/r/20190114095438.32470-6-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Aneesh Kumar K.V [Tue, 5 Mar 2019 23:47:47 +0000 (15:47 -0800)]
powerpc/mm/iommu: allow migration of cma allocated pages during mm_iommu_do_alloc
The current code doesn't do page migration if the page allocated is a
compound page. With HugeTLB migration support, we can end up allocating
hugetlb pages from the CMA region. Also, THP pages can be allocated from
the CMA region. This patch updates the code to handle compound pages
correctly. The patch also switches to a single get_user_pages with the
right count, instead of doing one get_user_pages per page. That avoids
reading the page table multiple times. This is done by using
get_user_pages_longterm, because that also takes care of DAX backed
pages.
DAX page lifetime is dictated by file system rules and as such, we need
to make sure that we free these pages on operations like truncate and
punch hole. If we have a long term pin on these pages, which are mostly
returned to userspace with an elevated page count, the entity holding
the long term pin may not be aware of the fact that the file got
truncated and the file system blocks possibly got reused. That can
result in corruption.
The patch also converts the hpas member of mm_iommu_table_group_mem_t to
a union. We use the same storage location to store pointers to struct
page. We cannot update all the code paths to use struct page *, because
we access hpas in real mode and we can't do the struct page * to pfn
conversion in real mode.
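The shared-storage idea can be illustrated with a toy model in userspace
C. Everything here is hypothetical (toy_page, toy_mem and the helper
functions are not the powerpc code); the sketch only shows one array
being filled with page pointers first and then rewritten in place into
addresses, so that later readers never need a struct page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_PAGE_SHIFT 12	/* assumed 4K pages for the example */

/* Hypothetical stand-in for struct page: only knows its frame number. */
struct toy_page {
	uintptr_t pfn;
};

/* Hypothetical stand-in for mm_iommu_table_group_mem_t. */
struct toy_mem {
	size_t entries;
	union {
		struct toy_page **hpages;	/* filled while pinning */
		uintptr_t *hpas;		/* same array, read later */
	};
};

/* Allocate one array that backs both views. Returns 0 on success. */
int toy_mem_init(struct toy_mem *mem, size_t entries)
{
	mem->entries = entries;
	mem->hpages = calloc(entries, sizeof(*mem->hpages));
	return mem->hpages ? 0 : -1;
}

/* Rewrite each slot in place from a page pointer to an address. */
void toy_pages_to_hpas(struct toy_mem *mem)
{
	size_t i;

	/* Both views must have the same element size to share storage. */
	assert(sizeof(struct toy_page *) == sizeof(uintptr_t));
	for (i = 0; i < mem->entries; i++) {
		struct toy_page *page = mem->hpages[i];

		mem->hpas[i] = page->pfn << TOY_PAGE_SHIFT;
	}
}
```

After toy_pages_to_hpas() runs, only the hpas view is meaningful, which
mirrors the constraint that real mode code can read addresses but cannot
do the page to pfn conversion itself.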
[aneesh.kumar@linux.ibm.com: address review feedback, update changelog]
Link: http://lkml.kernel.org/r/20190227144736.5872-4-aneesh.kumar@linux.ibm.com
Link: http://lkml.kernel.org/r/20190114095438.32470-5-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>