Rafael J. Wysocki [Mon, 4 Jun 2018 08:41:07 +0000 (10:41 +0200)]
Merge branches 'pm-cpufreq-sched' and 'pm-cpuidle'
* pm-cpufreq-sched:
cpufreq: schedutil: Avoid missing updates for one-CPU policies
schedutil: Allow cpufreq requests to be made even when kthread kicked
cpufreq: Rename cpufreq_can_do_remote_dvfs()
cpufreq: schedutil: Cleanup and document iowait boost
cpufreq: schedutil: Fix iowait boost reset
cpufreq: schedutil: Don't set next_freq to UINT_MAX
Revert "cpufreq: schedutil: Don't restrict kthread to related_cpus unnecessarily"
* pm-cpuidle:
cpuidle: governors: Consolidate PM QoS handling
cpuidle: governors: Drop redundant checks related to PM QoS
Rafael J. Wysocki [Mon, 4 Jun 2018 08:40:57 +0000 (10:40 +0200)]
Merge branch 'pm-cpufreq'
* pm-cpufreq: (25 commits)
dt-bindings: cpufreq: Document operating-points-v2-kryo-cpu
cpufreq: Add Kryo CPU scaling driver
cpufreq: Use static SRCU initializer
kernel/SRCU: provide a static initializer
cpufreq: Fix new policy initialization during limits updates via sysfs
cpufreq: tegra20: Wrap cpufreq into platform driver
cpufreq: tegra20: Allow cpufreq driver to be built as loadable module
cpufreq: tegra20: Check if this is Tegra20 machine
cpufreq: tegra20: Remove unneeded variable initialization
cpufreq: tegra20: Remove unnecessary parentheses
cpufreq: tegra20: Remove unneeded check in tegra_cpu_init
cpufreq: tegra20: Release clocks properly
cpufreq: tegra20: Remove EMC clock usage
cpufreq: tegra20: Clean up included headers
cpufreq: tegra20: Clean up whitespaces in the code
cpufreq: tegra20: Change module description
Revert "cpufreq: rcar: Add support for R8A7795 SoC"
Revert "cpufreq: dt: Add r8a7796 support to to use generic cpufreq driver"
cpufreq: intel_pstate: allow trace in passive mode
cpufreq: optimize cpufreq_notify_transition()
...
Rafael J. Wysocki [Mon, 4 Jun 2018 08:40:41 +0000 (10:40 +0200)]
Merge branch 'pm-opp'
* pm-opp: (24 commits)
PM / Domains: Drop unused parameter in genpd_allocate_dev_data()
PM / Domains: Drop genpd as in-param for pm_genpd_remove_device()
PM / Domains: Drop __pm_genpd_add_device()
PM / Domains: Drop extern declarations of functions in pm_domain.h
PM / domains: Add perf_state attribute to genpd debugfs
OPP: Allow same OPP table to be used for multiple genpd
PM / Domain: Return 0 on error from of_genpd_opp_to_performance_state()
PM / OPP: Fix shared OPP table support in dev_pm_opp_register_set_opp_helper()
PM / OPP: Fix shared OPP table support in dev_pm_opp_set_regulators()
PM / OPP: Fix shared OPP table support in dev_pm_opp_set_prop_name()
PM / OPP: Fix shared OPP table support in dev_pm_opp_set_supported_hw()
PM / OPP: silence an uninitialized variable warning
PM / OPP: Remove dev_pm_opp_{un}register_get_pstate_helper()
PM / OPP: Get performance state using genpd helper
PM / Domain: Implement of_genpd_opp_to_performance_state()
PM / Domain: Add support to parse domain's OPP table
PM / Domain: Add struct device to genpd
PM / OPP: Implement dev_pm_opp_get_of_node()
PM / OPP: Implement of_dev_pm_opp_find_required_opp()
PM / OPP: Implement dev_pm_opp_of_add_table_indexed()
...
Rafael J. Wysocki [Mon, 4 Jun 2018 08:40:33 +0000 (10:40 +0200)]
Merge branch 'pm-domains'
* pm-domains:
PM / domains: Improve wording of dev_pm_domain_attach() comment
PM / Domains: Don't return -EEXIST at attach when PM domain exists
spi: Respect all error codes from dev_pm_domain_attach()
soundwire: Respect all error codes from dev_pm_domain_attach()
mmc: sdio: Respect all error codes from dev_pm_domain_attach()
i2c: Respect all error codes from dev_pm_domain_attach()
driver core: Respect all error codes from dev_pm_domain_attach()
amba: Respect all error codes from dev_pm_domain_attach()
PM / Domains: Allow a better error handling of dev_pm_domain_attach()
PM / Domains: Check for existing PM domain in dev_pm_domain_attach()
PM / Domains: Drop redundant code in genpd while attaching devices
PM / Domains: Drop comment in genpd about legacy Samsung DT binding
PM / Domains: Fix error path during attach in genpd
Rafael J. Wysocki [Mon, 4 Jun 2018 08:40:20 +0000 (10:40 +0200)]
Merge branches 'pm-qos' and 'pm-core'
* pm-qos:
PM / QoS: Drop redundant declaration of pm_qos_get_value()
* pm-core:
PM / runtime: Drop usage count for suppliers at device link removal
PM / runtime: Fixup reference counting of device link suppliers at probe
PM: wakeup: Use pr_debug() for the "aborting suspend" message
PM / core: Drop unused internal inline functions for sysfs
PM / core: Drop unused internal functions for pm_qos sysfs
PM / core: Drop unused internal inline functions for wakeirqs
PM / core: Drop internal unused inline functions for wakeups
PM / wakeup: Only update last time for active wakeup sources
PM / wakeup: Use seq_open() to show wakeup stats
PM / core: Use dev_printk() and symbols in suspend/resume diagnostics
PM / core: Simplify initcall_debug_report() timing
PM / core: Remove unused initcall_debug_report() arguments
PM / core: fix deferred probe breaking suspend resume order
Rafael J. Wysocki [Sun, 3 Jun 2018 08:12:30 +0000 (10:12 +0200)]
Merge back earlier PM tools material for v4.18.
Rafael J. Wysocki [Sun, 3 Jun 2018 08:11:50 +0000 (10:11 +0200)]
Merge branch 'turbostat' of git://git./linux/kernel/git/lenb/linux
Pull turbostat utility updates for v4.18 from Len Brown.
* 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (65 commits)
tools/power turbostat: update version number
tools/power turbostat: Add Node in output
tools/power turbostat: add node information into turbostat calculations
tools/power turbostat: remove num_ from cpu_topology struct
tools/power turbostat: rename num_cores_per_pkg to num_cores_per_node
tools/power turbostat: track thread ID in cpu_topology
tools/power turbostat: Calculate additional node information for a package
tools/power turbostat: Fix node and siblings lookup data
tools/power turbostat: set max_num_cpus equal to the cpumask length
tools/power turbostat: if --num_iterations, print for specific number of iterations
tools/power turbostat: Add Cannon Lake support
tools/power turbostat: delete duplicate #defines
x86: msr-index.h: Correct SNB_C1/C3_AUTO_UNDEMOTE defines
tools/power turbostat: Correct SNB_C1/C3_AUTO_UNDEMOTE defines
tools/power turbostat: add POLL and POLL% column
tools/power turbostat: Fix --hide Pk%pc10
tools/power turbostat: Build-in "Low Power Idle" counters support
tools/power turbostat: Don't make man pages executable
tools/power turbostat: remove blank lines
tools/power turbostat: a small C-states dump readability immprovement
...
Len Brown [Mon, 29 Jan 2018 02:54:28 +0000 (21:54 -0500)]
tools/power turbostat: update version number
Signed-off-by: Len Brown <len.brown@intel.com>
Prarit Bhargava [Fri, 1 Jun 2018 14:04:35 +0000 (10:04 -0400)]
tools/power turbostat: Add Node in output
Output a Node column if there is more than one node/socket.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Prarit Bhargava [Fri, 1 Jun 2018 14:04:34 +0000 (10:04 -0400)]
tools/power turbostat: add node information into turbostat calculations
The previous patches have added node information to turbostat, but the
counters code does not take it into account.
Add node information from cpu_topology calculations to turbostat
counters.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Prarit Bhargava [Fri, 1 Jun 2018 14:04:33 +0000 (10:04 -0400)]
tools/power turbostat: remove num_ from cpu_topology struct
Cleanup, remove num_ from num_nodes_per_pkg, num_cores_per_node, and
num_threads_per_node.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Prarit Bhargava [Fri, 1 Jun 2018 14:04:32 +0000 (10:04 -0400)]
tools/power turbostat: rename num_cores_per_pkg to num_cores_per_node
turbostat incorrectly assumes that there is one node per package. As a
result num_cores_per_pkg is not correctly named and is actually
num_cores_per_node.
Rename num_cores_per_pkg to num_cores_per_node.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Prarit Bhargava [Fri, 1 Jun 2018 14:04:31 +0000 (10:04 -0400)]
tools/power turbostat: track thread ID in cpu_topology
The code can be simplified if the cpu_topology *cpus tracks the thread
IDs. This removes an additional file lookup and simplifies the counter
initialization code.
Add thread ID to cpu_topology information and cleanup the counter
initialization code.
v2: prevent thread_id from being overwritten
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Prarit Bhargava [Fri, 1 Jun 2018 14:04:30 +0000 (10:04 -0400)]
tools/power turbostat: Calculate additional node information for a package
The code currently assumes each package has exactly one node. This is not
the case for AMD systems and Intel systems with COD. AMD systems also
may re-enumerate each node's core IDs starting at 0 (for example, an AMD
processor may have two nodes, each with core IDs from 0 to 7). In order
to properly enumerate the cores we need to track both the physical and
logical node IDs.
Add physical_node_id to track the node ID assigned by the kernel, and
logical_node_id used by turbostat to track the nodes per package ie) a
0-based count within the package.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Sat, 2 Jun 2018 02:08:58 +0000 (22:08 -0400)]
tools/power turbostat: Fix node and siblings lookup data
The turbostat code only looks at thread_siblings_list to determine if
processing units/threads are on the same the core. This works well on
Intel systems which have a shared L1 instruction and data cache. This
does not work on AMD systems which have shared L1 instruction cache but
separate L1 data caches. Other utilities also check sibling's core ID
to determine if the processing unit shares the same core.
Additionally, the cpu_topology *cpus list used in topology_probe() can
be used elsewhere in the code to simplify things.
Export *cpus to the entire turbostat code, and add Processing Unit/Thread
IDs information to each cpu_topology struct. Confirm that the thread
is on the same core as indicated by thread_siblings_list.
[v2]: Fixup CPU_* usage that caused gcc malloc error.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Prarit Bhargava [Fri, 1 Jun 2018 14:04:28 +0000 (10:04 -0400)]
tools/power turbostat: set max_num_cpus equal to the cpumask length
Future fixes will use sysfs files that contain cpumask output. The code
needs to know the length of the cpumask in order to determine which cpus
are set in a cpumask. Currently topo.max_cpu_num is the maximum cpu
number. It can be increased the the maximum value of cpus represented in
cpumasks.
Set max_num_cpus to the length of a cpumask.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Chen Yu [Thu, 26 Apr 2018 00:41:03 +0000 (08:41 +0800)]
tools/power turbostat: if --num_iterations, print for specific number of iterations
There's a use case during test to only print specific round of iterations
if --num_iterations is specified, for example, with this patch applied:
turbostat -i 5 -n 4
will capture 4 samples with 5 seconds interval.
[lenb: renamed to --num_iterations from --iterations]
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Srinivas Pandruvada [Thu, 31 May 2018 17:39:07 +0000 (10:39 -0700)]
tools/power turbostat: Add Cannon Lake support
All MSRs related to turbostat are same as Kabylake.
Even though SDM claims that core C3 residency can be read from MSR 0x662,
the read on this MSR fails on CNL platform. Hence disabled C3 MSR read
and display.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Fri, 1 Jun 2018 17:04:18 +0000 (13:04 -0400)]
tools/power turbostat: delete duplicate #defines
The SNB_C1_AUTO_UNDEMOTE definition should have been deleted once
it was copied into msr-index.h. One copy of the truth is better --
particularly when Matt needs to fix it:-)
Signed-off-by: Len Brown <len.brown@intel.com>
Matt Turner [Tue, 13 Feb 2018 19:12:05 +0000 (11:12 -0800)]
x86: msr-index.h: Correct SNB_C1/C3_AUTO_UNDEMOTE defines
According to the Intel Software Developers' Manual, Vol. 4, Order No.
335592, these macros have been reversed since they were added in the
initial turbostat commit. The reversed definitions were presumably
copied from turbostat.c to this file.
Fixes: 9c63a650bb10 ("tools/power/x86/turbostat: share kernel MSR #defines")
Signed-off-by: Matt Turner <mattst88@gmail.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Matt Turner [Tue, 13 Feb 2018 19:12:04 +0000 (11:12 -0800)]
tools/power turbostat: Correct SNB_C1/C3_AUTO_UNDEMOTE defines
According to the Intel Software Developers' Manual, Vol. 4, Order No.
335592, these macros have been reversed since they were added.
Fixes: 889facbee3e6 ("tools/power turbostat: v3.0: monitor Watts and Temperature")
Signed-off-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Fri, 1 Jun 2018 16:38:29 +0000 (12:38 -0400)]
tools/power turbostat: add POLL and POLL% column
Like the "C1" and "C1%" column, the new POLL and POLL% columns
show invocations and residency% during the measurement interval.
While it didn't seem important to track in the past,
we've recently found some Linux cpuidle bugs related to POLL%.
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Mon, 29 Jan 2018 04:42:42 +0000 (23:42 -0500)]
tools/power turbostat: Fix --hide Pk%pc10
The column header for PC10 residency is "Pk%pc10"
This is missing the 'g' that others have, eg Pkg%pc6,
to allow tab-delimited columns to fit into 8-columns.
However, --hide Pk%pc10 did not work, it was still looking for the 'g'.
This was confusing, because --list shows the correct "Pk%pc10"
Reported-by: Wendy Wang <wendy.wang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Fri, 1 Jun 2018 16:35:53 +0000 (12:35 -0400)]
tools/power turbostat: Build-in "Low Power Idle" counters support
Linux 4.15 exports the ACPI Low Power Idle Table's
counters in /sys/devices/system/cpu/cpuidle/
low_power_idle_cpu_residency_us
Show this in the "CPU%LPI" column.
Today this reflects the "North Complex"
residency in PC10, so expect it to
closely follow "Pk%pc10".
low_power_idle_system_residency_us
Show this in the "SYS%LPI" column.
Today, this reflects the North is in PC10,
plus the PCH is sufficiently quiescent
to save additional power via the "S0ix"
system state, as measured by the
PCH SLP_S0 counter.
Signed-off-by: Len Brown <len.brown@intel.com>
Laura Abbott [Wed, 20 Dec 2017 04:02:31 +0000 (20:02 -0800)]
tools/power turbostat: Don't make man pages executable
rpm-lint flagged these as being executable:
kernel-tools.x86_64: W: spurious-executable-perm /usr/share/man/man8/turbostat.8.gz
kernel-tools.x86_64: W: spurious-executable-perm /usr/share/man/man8/x86_energy_perf_policy.8.gz
Fix this
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Sun, 28 Jan 2018 03:39:21 +0000 (22:39 -0500)]
tools/power turbostat: remove blank lines
When the user reuests to collect and show columns
that are not present on every row (eg. for every CPU)
turbostat still prints an (empty) line for every CPU.
Update so no blank lines are printed.
old:
# turbostat --quiet --show Pkg%pc6
Pkg%pc6
9.12
9.12
Pkg%pc6
9.12
9.12
new:
# turbostat --quiet --show Pkg%pc6
Pkg%pc6
9.12
9.12
Pkg%pc6
9.12
9.12
Reported-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Artem Bityutskiy [Tue, 5 Sep 2017 12:25:42 +0000 (15:25 +0300)]
tools/power turbostat: a small C-states dump readability immprovement
Improve readability a little bit by changing this output:
MSR_PKG_CST_CONFIG_CONTROL: 0x00008407 (locked: pkg-cstate-limit=7: unlimited, automatic-c-state-conversion=off)
with this output:
MSR_PKG_CST_CONFIG_CONTROL: 0x00008407 (locked, pkg-cstate-limit=7 (unlimited), automatic-c-state-conversion=off)
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Artem Bityutskiy [Tue, 5 Sep 2017 12:14:08 +0000 (15:14 +0300)]
tools/power turbostat: dump BDX, SKX automatic C-state conversion bit
BDX and SKX have a bit that tells them to PROMOTE shallow
C-states requests to MWAIT(C6). It is generally a BIOS bug
if this bit is set. As we have encountered that BIOS bug,
let's print this bit in turbostat debug output.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Tue, 2 Jan 2018 21:17:23 +0000 (16:17 -0500)]
tools/power turbostat: do not hard-code 25MHz crystal on SKX
Some SKX use a 24 MHz crystal, so do not hard code 25 MHz.
Also, SKX crystal is not exact, because SKX uses an EMI reduction
circuit that costs a fraction of a percent.
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Fri, 8 Dec 2017 22:38:17 +0000 (17:38 -0500)]
tools/power turbostat: fix possible sprintf buffer overflow
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Thu, 9 Nov 2017 04:15:42 +0000 (23:15 -0500)]
tools/power turbostat: fix MSR_IA32_MISC_ENABLE MWAIT printout
MSR_IA32_MISC_ENABLE[18] is the MWAIT ENABLE bit, not DISABLE bit...
so
MSR_IA32_MISC_ENABLE: 0x00850089 (TCC EIST No-MWAIT PREFETCH TURBO)
should print as:
MSR_IA32_MISC_ENABLE: 0x00850089 (TCC EIST MWAIT PREFETCH TURBO)
Signed-off-by: Len Brown <len.brown@intel.com>
Artem Bityutskiy [Wed, 4 Oct 2017 12:01:47 +0000 (15:01 +0300)]
tools/power turbostat: fix printing on input
The recent patch that implements table printing on a keypress introduced a
regression - turbostat prints the table almost continuously if it is run from a
daemon program.
The problem is also easy to reproduce like this:
echo | turbostat
The reason is that we cannot assume that stdin is always a TTY. It can be many
things.
This patch adds fixes the problem by limiting the new keypress functionality to
TTYs only. If stdin is not a TTY, we just sleep for the full interval time.
While on it, clean-up 'do_sleep()' to return no value, as callers do not expect
that anyway.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Wed, 19 Jul 2017 23:28:37 +0000 (19:28 -0400)]
tools/power turbostat: end current interval upon newline input
In turbostat interval mode, a newline typed on standard input
will now conclude the current interval. Data will immediately
be collected and printed for that interval, and the next interval
will be started.
This is similar to the recently added SIGUSR1 feature.
But that is for use by programs, while this is for interactive use.
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Sat, 15 Jul 2017 19:51:26 +0000 (15:51 -0400)]
tools/power turbostat: on SIGUSR1: sample, print and continue
Interval-mode turbostat now catches and discards SIGUSR1.
Thus, SIGUSR1 can be used to tell turbostat to cut short
the current measurement interval. Turbostat will then start
the next measurement interval using the regular interval length.
This can be used to give turbostat variable intervals.
Invoke turbostat with --interval LARGE_NUMBER_SEC
and have a program that has permission to send it a SIGUSR1
always before LARGE_NUMBER_SEC expires.
It may also be useful to use "--enable Time_Of_Day_Seconds"
to observe the actual interval length.
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Sat, 15 Jul 2017 18:57:37 +0000 (14:57 -0400)]
tools/power turbostat: on SIGINT: sample, print and exit
When running in interval-mode, catch interrupts
and print a final data record before exiting.
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Tue, 17 Oct 2017 19:42:56 +0000 (15:42 -0400)]
tools/power turbostat: add --enable Time_Of_Day_Seconds
Add a Time_Of_Day_Seconds column showing when measurement
for each row was completed. Units are [sec.subsec] since Epoch,
as reported by gettimeofday(2).
While useful to correlate turbostat output with other tools,
this built-in column is disabled, by default.
Add the "--enable" option to enable such disabled-by-default
built-in columns:
"--enable Time_Of_Day_Seconds"
"--enable usec"
"--enable all", will enable all disabled-by-defauilt built-in counters.
When "--debug" is used, all disabled-by-default columns are enabled,
unless explicitly skipped using "--hide"
Signed-off-by: Len Brown <len.brown@intel.com>
Artem Bityutskiy [Fri, 4 Aug 2017 15:42:23 +0000 (18:42 +0300)]
tools/power turbostat: fix Skylake Xeon package C-state display
Turbostat neglects to display all package C-states for some Skylake Xeon BIOS configurations.
This is due to a typo in the table decoding MSR_PKG_CST_CONFIG_CONTROL (0x000000e2)
Here we fix that typo, according to Intel SDM, vol 4, Table 2-41 -
"MSRs Supported by Intel® Xeon® Processor Scalable Family with DisplayFamily_DisplayModel 06_55H".
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Sun, 28 Jan 2018 01:09:16 +0000 (20:09 -0500)]
MAINTAINERS: add turbostat utility
Signed-off-by: Len Brown <len.brown@intel.com>
Linus Torvalds [Thu, 31 May 2018 21:23:07 +0000 (16:23 -0500)]
Merge tag 'xfs-4.17-fixes-3' of git://git./fs/xfs/xfs-linux
Pull xfs fix from Darrick Wong:
"Clear out i_mapping error state when we're reinitializing inodes.
This last minute fix prevents writeback error state from persisting
past the end of the in-core inode lifecycle and causing EIO errors to
be reported to userspace when no error has occurred.
This fix for the behavioral regression has been soaking in for-next
for a while, but various fs developers persuaded me to try to get it
upstream for 4.17 because the patch that broke things was introduced
in 4.17-rc4"
* tag 'xfs-4.17-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
fs: clear writeback errors in inode_init_always
Linus Torvalds [Thu, 31 May 2018 14:39:57 +0000 (09:39 -0500)]
Merge tag 'platform-drivers-x86-v4.17-4' of git://git.infradead.org/linux-platform-drivers-x86
Pull x86 platform driver fix from Andy Shevchenko:
"Fix NULL pointer dereference in asus-wmi on rfkill cleanup.
The effective change is just one new condition - two lines of code.
But it required moving one static helper function, which is why the
diff looks a bit bigger"
* tag 'platform-drivers-x86-v4.17-4' of git://git.infradead.org/linux-platform-drivers-x86:
platform/x86: asus-wmi: Fix NULL pointer dereference
João Paulo Rechi Vita [Tue, 22 May 2018 21:30:15 +0000 (14:30 -0700)]
platform/x86: asus-wmi: Fix NULL pointer dereference
Do not perform the rfkill cleanup routine when
(asus->driver->wlan_ctrl_by_user && ashs_present()) is true, since
nothing is registered with the rfkill subsystem in that case. Doing so
leads to the following kernel NULL pointer dereference:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<
ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120
PGD
1a3aa8067
PUD
1a3b3d067
PMD 0
Oops: 0002 [#1] PREEMPT SMP
Modules linked in: bnep ccm binfmt_misc uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core hid_a4tech videodev x86_pkg_temp_thermal intel_powerclamp coretemp ath3k btusb btrtl btintel bluetooth kvm_intel snd_hda_codec_hdmi kvm snd_hda_codec_realtek snd_hda_codec_generic irqbypass crc32c_intel arc4 i915 snd_hda_intel snd_hda_codec ath9k ath9k_common ath9k_hw ath i2c_algo_bit snd_hwdep mac80211 ghash_clmulni_intel snd_hda_core snd_pcm snd_timer cfg80211 ehci_pci xhci_pci drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm xhci_hcd ehci_hcd asus_nb_wmi(-) asus_wmi sparse_keymap r8169 rfkill mxm_wmi serio_raw snd mii mei_me lpc_ich i2c_i801 video soundcore mei i2c_smbus wmi i2c_core mfd_core
CPU: 3 PID: 3275 Comm: modprobe Not tainted 4.9.34-gentoo #34
Hardware name: ASUSTeK COMPUTER INC. K56CM/K56CM, BIOS K56CM.206 08/21/2012
task:
ffff8801a639ba00 task.stack:
ffffc900014cc000
RIP: 0010:[<
ffffffff816c7348>] [<
ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120
RSP: 0018:
ffffc900014cfce0 EFLAGS:
00010282
RAX:
0000000000000000 RBX:
ffff8801a54315b0 RCX:
00000000c0000100
RDX:
0000000000000001 RSI:
0000000000000000 RDI:
ffff8801a54315b4
RBP:
ffffc900014cfd30 R08:
0000000000000000 R09:
0000000000000002
R10:
0000000000000000 R11:
0000000000000000 R12:
ffff8801a54315b4
R13:
ffff8801a639ba00 R14:
00000000ffffffff R15:
ffff8801a54315b8
FS:
00007faa254fb700(0000) GS:
ffff8801aef80000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
0000000000000000 CR3:
00000001a3b1b000 CR4:
00000000001406e0
Stack:
ffff8801a54315b8 0000000000000000 ffffffff814733ae ffffc900014cfd28
ffffffff8146a28c ffff8801a54315b0 0000000000000000 ffff8801a54315b0
ffff8801a66f3820 0000000000000000 ffffc900014cfd48 ffffffff816c73e7
Call Trace:
[<
ffffffff814733ae>] ? acpi_ut_release_mutex+0x5d/0x61
[<
ffffffff8146a28c>] ? acpi_ns_get_node+0x49/0x52
[<
ffffffff816c73e7>] mutex_lock+0x17/0x30
[<
ffffffffa00a3bb4>] asus_rfkill_hotplug+0x24/0x1a0 [asus_wmi]
[<
ffffffffa00a4421>] asus_wmi_rfkill_exit+0x61/0x150 [asus_wmi]
[<
ffffffffa00a49f1>] asus_wmi_remove+0x61/0xb0 [asus_wmi]
[<
ffffffff814a5128>] platform_drv_remove+0x28/0x40
[<
ffffffff814a2901>] __device_release_driver+0xa1/0x160
[<
ffffffff814a29e3>] device_release_driver+0x23/0x30
[<
ffffffff814a1ffd>] bus_remove_device+0xfd/0x170
[<
ffffffff8149e5a9>] device_del+0x139/0x270
[<
ffffffff814a5028>] platform_device_del+0x28/0x90
[<
ffffffff814a50a2>] platform_device_unregister+0x12/0x30
[<
ffffffffa00a4209>] asus_wmi_unregister_driver+0x19/0x30 [asus_wmi]
[<
ffffffffa00da0ea>] asus_nb_wmi_exit+0x10/0xf26 [asus_nb_wmi]
[<
ffffffff8110c692>] SyS_delete_module+0x192/0x270
[<
ffffffff810022b2>] ? exit_to_usermode_loop+0x92/0xa0
[<
ffffffff816ca560>] entry_SYSCALL_64_fastpath+0x13/0x94
Code: e8 5e 30 00 00 8b 03 83 f8 01 0f 84 93 00 00 00 48 8b 43 10 4c 8d 7b 08 48 89 63 10 41 be ff ff ff ff 4c 89 3c 24 48 89 44 24 08 <48> 89 20 4c 89 6c 24 10 eb 1d 4c 89 e7 49 c7 45 08 02 00 00 00
RIP [<
ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120
RSP <
ffffc900014cfce0>
CR2:
0000000000000000
---[ end trace
8d484233fa7cb512 ]---
note: modprobe[3275] exited with preempt_count 2
https://bugzilla.kernel.org/show_bug.cgi?id=196467
Reported-by: red.f0xyz@gmail.com
Signed-off-by: João Paulo Rechi Vita <jprvita@endlessm.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Darrick J. Wong [Thu, 31 May 2018 02:43:53 +0000 (19:43 -0700)]
fs: clear writeback errors in inode_init_always
In inode_init_always(), we clear the inode mapping flags, which clears
any retained error (AS_EIO, AS_ENOSPC) bits. Unfortunately, we do not
also clear wb_err, which means that old mapping errors can leak through
to new inodes.
This is crucial for the XFS inode allocation path because we recycle old
in-core inodes and we do not want error state from an old file to leak
into the new file. This bug was discovered by running generic/036 and
generic/047 in a loop and noticing that the EIOs generated by the
collision of direct and buffered writes in generic/036 would survive the
remount between 036 and 047, and get reported to the fsyncs (on
different files!) in generic/047.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Linus Torvalds [Wed, 30 May 2018 21:37:59 +0000 (16:37 -0500)]
Merge tag 'for-linus-
20180530' of git://git.kernel.dk/linux-block
Pull block fix from Jens Axboe:
"Just a single fix that should make it into this release, fixing a
regression with T10-DIF on NVMe"
* tag 'for-linus-
20180530' of git://git.kernel.dk/linux-block:
nvme: fix extended data LBA supported setting
Linus Torvalds [Wed, 30 May 2018 21:35:07 +0000 (16:35 -0500)]
Merge tag 'selinux-pr-
20180530' of git://git./linux/kernel/git/pcmoore/selinux
Pull SELinux fix from Paul Moore:
"One more small fix for SELinux: a small string length fix found by
KASAN.
I dislike sending patches this late in the release cycle, but this
patch fixes a legitimate problem, is very small, limited in scope, and
well understood.
There are two threads with more information on the problem, the latest
is linked below:
https://marc.info/?t=
152723737400001&r=1&w=2
Stephen points out in the thread linked above:
'Such a setxattr() call can only be performed by a process with
CAP_MAC_ADMIN that is also allowed mac_admin permission in SELinux
policy. Consequently, this is never possible on Android (no process
is allowed mac_admin permission, always enforcing) and is only
possible in Fedora/RHEL for a few domains (if enforcing)'"
* tag 'selinux-pr-
20180530' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: KASAN: slab-out-of-bounds in xattr_getsecurity
Linus Torvalds [Wed, 30 May 2018 21:33:22 +0000 (16:33 -0500)]
Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
"This fixes a potential kernel panic in the inside-secure driver"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: inside-secure - do not use memset on MMIO
Rafael J. Wysocki [Wed, 30 May 2018 11:43:01 +0000 (13:43 +0200)]
cpuidle: governors: Consolidate PM QoS handling
There is some code duplication related to the PM QoS handling between
the existing cpuidle governors, so move that code to a common helper
function and call that from the governors.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Rafael J. Wysocki [Wed, 30 May 2018 11:41:09 +0000 (13:41 +0200)]
cpuidle: governors: Drop redundant checks related to PM QoS
PM_QOS_RESUME_LATENCY_NO_CONSTRAINT is defined as the 32-bit integer
maximum, so it is not necessary to test the return value of
dev_pm_qos_raw_read_value() against it directly in the menu and
ladder cpuidle governors.
Drop these redundant checks.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Linus Torvalds [Wed, 30 May 2018 15:30:30 +0000 (10:30 -0500)]
Merge branch 'for-linus' of git://git./linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
- a missing -msoft-float for the compile of the kexec purgatory
- a fix for the dasd driver to avoid the double use of a field in the
'struct request'
[ That latter one is being discussed, and Christoph asked for something
cleaner, but for now it's a fix ]
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/dasd: use blk_mq_rq_from_pdu for per request data
s390/purgatory: Fix endless interrupt loop
Ulf Hansson [Tue, 29 May 2018 10:04:16 +0000 (12:04 +0200)]
PM / Domains: Drop unused parameter in genpd_allocate_dev_data()
The in-parameter struct generic_pm_domain *genpd to
genpd_allocate_dev_data() is unused, so let's drop it.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Ulf Hansson [Tue, 29 May 2018 10:04:15 +0000 (12:04 +0200)]
PM / Domains: Drop genpd as in-param for pm_genpd_remove_device()
There is no need to pass a genpd struct to pm_genpd_remove_device(), as we
already have the information about the PM domain (genpd) through the device
structure.
Additionally, we don't allow to remove a PM domain from a device, other
than the one it may have assigned to it, so really it does not make sense
to have a separate in-param for it.
For these reason, drop it and update the current only call to
pm_genpd_remove_device() from amdgpu_acp.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Ulf Hansson [Tue, 29 May 2018 10:04:14 +0000 (12:04 +0200)]
PM / Domains: Drop __pm_genpd_add_device()
There are still a few non-DT existing users of genpd, however neither of
them uses __pm_genpd_add_device(), hence let's drop it.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Ulf Hansson [Tue, 29 May 2018 10:04:13 +0000 (12:04 +0200)]
PM / Domains: Drop extern declarations of functions in pm_domain.h
Using "extern" to declare a function in a public header file is somewhat
pointless, but also doesn't hurt. However, to make all the function
declarations in pm_domain.h to be consistent, let's drop the use of
"extern".
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Rafael J. Wysocki [Wed, 30 May 2018 11:51:01 +0000 (13:51 +0200)]
Merge branch 'pm-domains' into pm-opp
Rajendra Nayak [Wed, 30 May 2018 09:45:17 +0000 (15:15 +0530)]
PM / domains: Add perf_state attribute to genpd debugfs
Now that genpd supports performance states, add this additional
attribute as part of the power domains debugfs entry, to display
the current performance state for the Power domain.
Suggested-by: David Collins <collinsd@codeaurora.org>
Signed-off-by: Rajendra Nayak <rnayak@codeaurora.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Rafael J. Wysocki [Wed, 30 May 2018 10:54:57 +0000 (12:54 +0200)]
Merge branch 'opp/linux-next' of git://git./linux/kernel/git/vireshk/pm into pm-opp
Pull more OPP updates for v4.18 from Viresh Kumar.
* 'opp/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
OPP: Allow same OPP table to be used for multiple genpd
PM / OPP: Fix shared OPP table support in dev_pm_opp_register_set_opp_helper()
PM / OPP: Fix shared OPP table support in dev_pm_opp_set_regulators()
PM / OPP: Fix shared OPP table support in dev_pm_opp_set_prop_name()
PM / OPP: Fix shared OPP table support in dev_pm_opp_set_supported_hw()
Ilia Lin [Wed, 30 May 2018 02:39:29 +0000 (05:39 +0300)]
dt-bindings: cpufreq: Document operating-points-v2-kryo-cpu
The qcom-cpufreq-kryo driver reads the msm-id and efuse value from the SoC
to provide the OPP framework with required information.
This is used to determine the voltage and frequency value for each OPP of
operating-points-v2 table when it is parsed by the OPP framework.
This change adds documentation for the DT bindings.
The "operating-points-v2-kryo-cpu" DT extends the "operating-points-v2"
with following parameters:
- nvmem-cells (NVMEM area containig the speedbin information)
- opp-supported-hw: A single 32 bit bitmap value,
representing compatible HW:
0: MSM8996 V3, speedbin 0
1: MSM8996 V3, speedbin 1
2: MSM8996 V3, speedbin 2
3: unused
4: MSM8996 SG, speedbin 0
5: MSM8996 SG, speedbin 1
6: MSM8996 SG, speedbin 2
7-31: unused
Signed-off-by: Ilia Lin <ilialin@codeaurora.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Tested-by: Amit Kucheria <amit.kucheria@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Ilia Lin [Wed, 30 May 2018 02:39:28 +0000 (05:39 +0300)]
cpufreq: Add Kryo CPU scaling driver
In Certain QCOM SoCs like apq8096 and msm8996 that have KRYO processors,
the CPU frequency subset and voltage value of each OPP varies
based on the silicon variant in use. Qualcomm Process Voltage Scaling Tables
defines the voltage and frequency value based on the msm-id in SMEM
and speedbin blown in the efuse combination.
The qcom-cpufreq-kryo driver reads the msm-id and efuse value from the SoC
to provide the OPP framework with required information.
This is used to determine the voltage and frequency value for each OPP of
operating-points-v2 table when it is parsed by the OPP framework.
Signed-off-by: Ilia Lin <ilialin@codeaurora.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Tested-by: Amit Kucheria <amit.kucheria@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Viresh Kumar [Wed, 30 May 2018 09:49:38 +0000 (15:19 +0530)]
OPP: Allow same OPP table to be used for multiple genpd
The OPP binding says:
Property: operating-points-v2
...
This can contain more than one phandle for power domain
providers that provide multiple power domains. That is, one
phandle for each power domain. If only one phandle is available,
then the same OPP table will be used for all power domains
provided by the power domain provider.
But the OPP core isn't allowing the same OPP table to be used for
multiple domains. Update dev_pm_opp_of_add_table_indexed() to allow
that.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Rajendra Nayak <rnayak@codeaurora.org>
Sebastian Andrzej Siewior [Fri, 25 May 2018 10:19:58 +0000 (12:19 +0200)]
cpufreq: Use static SRCU initializer
Use the static SRCU initializer for `cpufreq_transition_notifier_list'.
This avoids the init_cpufreq_transition_notifier_list() initcall. Its
only purpose is to initialize the SRCU notifier once during boot and set
another variable which is used as an indicator whether the init was
perfromed before cpufreq_register_notifier() was used.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Sebastian Andrzej Siewior [Fri, 25 May 2018 10:19:57 +0000 (12:19 +0200)]
kernel/SRCU: provide a static initializer
There are macros for static initializer for the three out of four
possible notifier types, that are:
ATOMIC_NOTIFIER_HEAD()
BLOCKING_NOTIFIER_HEAD()
RAW_NOTIFIER_HEAD()
This patch provides a static initilizer for the forth type to make it
complete.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tao Wang [Sat, 26 May 2018 07:16:48 +0000 (15:16 +0800)]
cpufreq: Fix new policy initialization during limits updates via sysfs
If the policy limits are updated via cpufreq_update_policy() and
subsequently via sysfs, the limits stored in user_policy may be
set incorrectly.
For example, if both min and max are set via sysfs to the maximum
available frequency, user_policy.min and user_policy.max will also
be the maximum. If a policy notifier triggered by
cpufreq_update_policy() lowers both the min and the max at this
point, that change is not reflected by the user_policy limits, so
if the max is updated again via sysfs to the same lower value,
then user_policy.max will be lower than user_policy.min which
shouldn't happen. In particular, if one of the policy CPUs is
then taken offline and back online, cpufreq_set_policy() will
fail for it due to a failing limits check.
To prevent that from happening, initialize the min and max fields
of the new_policy object to the ones stored in user_policy that
were previously set via sysfs.
Signed-off-by: Kevin Wangtao <kevin.wangtao@hisilicon.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Subject & changelog ]
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Linus Torvalds [Wed, 30 May 2018 03:22:15 +0000 (22:22 -0500)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
"We are switching a bunch of Lenovo devices with Synaptics touchpads
from PS/2 emulation over to native RMI/SMbus.
Given that all commits are marked for stable there is no point
delaying them till next release"
[ Also fix a too-small stack array for i2c communication in elan driver ]
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: elan_i2c_smbus - fix corrupted stack
Input: synaptics - add Lenovo 80 series ids to SMBus
Input: synaptics - add Intertouch support on X1 Carbon 6th and X280
Input: synaptics - Lenovo Thinkpad X1 Carbon G5 (2017) with Elantech trackpoints should use RMI
Input: synaptics - Lenovo Carbon X1 Gen5 (2017) devices should use RMI
Sachin Grover [Fri, 25 May 2018 08:31:39 +0000 (14:01 +0530)]
selinux: KASAN: slab-out-of-bounds in xattr_getsecurity
Call trace:
[<
ffffff9203a8d7a8>] dump_backtrace+0x0/0x428
[<
ffffff9203a8dbf8>] show_stack+0x28/0x38
[<
ffffff920409bfb8>] dump_stack+0xd4/0x124
[<
ffffff9203d187e8>] print_address_description+0x68/0x258
[<
ffffff9203d18c00>] kasan_report.part.2+0x228/0x2f0
[<
ffffff9203d1927c>] kasan_report+0x5c/0x70
[<
ffffff9203d1776c>] check_memory_region+0x12c/0x1c0
[<
ffffff9203d17cdc>] memcpy+0x34/0x68
[<
ffffff9203d75348>] xattr_getsecurity+0xe0/0x160
[<
ffffff9203d75490>] vfs_getxattr+0xc8/0x120
[<
ffffff9203d75d68>] getxattr+0x100/0x2c8
[<
ffffff9203d76fb4>] SyS_fgetxattr+0x64/0xa0
[<
ffffff9203a83f70>] el0_svc_naked+0x24/0x28
If user get root access and calls security.selinux setxattr() with an
embedded NUL on a file and then if some process performs a getxattr()
on that file with a length greater than the actual length of the string,
it would result in a panic.
To fix this, add the actual length of the string to the security context
instead of the length passed by the userspace process.
Signed-off-by: Sachin Grover <sgrover@codeaurora.org>
Cc: stable@vger.kernel.org
Signed-off-by: Paul Moore <paul@paul-moore.com>
Linus Torvalds [Tue, 29 May 2018 20:30:16 +0000 (15:30 -0500)]
Merge tag 'afs-fixes-
20180529' of git://git./linux/kernel/git/dhowells/linux-fs
Pull AFS fixes from David Howells:
- fix a BUG triggerable from faccessat()
- fix the mounting of backup volumes
* tag 'afs-fixes-
20180529' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
afs: Fix mounting of backup volumes
afs: Fix directory permissions check
Jens Axboe [Tue, 29 May 2018 18:54:12 +0000 (12:54 -0600)]
Merge branch 'nvme-4.17' of git://git.infradead.org/nvme into for-linus
Pull NVMe fix from Christoph:
"Below is a one-liner fix from Max that unbreaks T10-DIF support, which
got broken in 4.15."
* 'nvme-4.17' of git://git.infradead.org/nvme:
nvme: fix extended data LBA supported setting
Max Gurtovoy [Sun, 27 May 2018 15:50:10 +0000 (18:50 +0300)]
nvme: fix extended data LBA supported setting
This value depands on the metadata support value, so reorder the
initialization to fit.
Fixes: b5be3b392 ("nvme: always unregister the integrity profile in __nvme_revalidate_disk")
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Linus Torvalds [Tue, 29 May 2018 12:28:48 +0000 (07:28 -0500)]
Merge tag 'trace-v4.17-rc4-3' of git://git./linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"While writing selftests for a new feature, I triggered two existing
bugs that deal with triggers and instances.
- a generic trigger bug where the triggers are not removed from a
linked list properly when deleting an instance.
- a bug specific to snapshots, where the snapshot is done in the top
level buffer, when it is supposed to snapshot the buffer associated
to the instance the snapshot trigger exists in"
* tag 'trace-v4.17-rc4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Make the snapshot trigger work with instances
tracing: Fix crash when freeing instances with event triggers
Rafael J. Wysocki [Fri, 25 May 2018 10:33:10 +0000 (12:33 +0200)]
PM / QoS: Drop redundant declaration of pm_qos_get_value()
The extra forward declaration of pm_qos_get_value() is redundant, so
drop it.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Steven Rostedt (VMware) [Mon, 28 May 2018 14:56:36 +0000 (10:56 -0400)]
tracing: Make the snapshot trigger work with instances
The snapshot trigger currently only affects the main ring buffer, even when
it is used by the instances. This can be confusing as the snapshot trigger
is listed in the instance.
> # cd /sys/kernel/tracing
> # mkdir instances/foo
> # echo snapshot > instances/foo/events/syscalls/sys_enter_fchownat/trigger
> # echo top buffer > trace_marker
> # echo foo buffer > instances/foo/trace_marker
> # touch /tmp/bar
> # chown rostedt /tmp/bar
> # cat instances/foo/snapshot
# tracer: nop
#
#
# * Snapshot is freed *
#
# Snapshot commands:
# echo 0 > snapshot : Clears and frees snapshot buffer
# echo 1 > snapshot : Allocates snapshot buffer, if not already allocated.
# Takes a snapshot of the main buffer.
# echo 2 > snapshot : Clears snapshot buffer (but does not allocate or free)
# (Doesn't have to be '2' works with any number that
# is not a '0' or '1')
> # cat snapshot
# tracer: nop
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
bash-1189 [000] .... 111.488323: tracing_mark_write: top buffer
Not only did the snapshot occur in the top level buffer, but the instance
snapshot buffer should have been allocated, and it is still free.
Cc: stable@vger.kernel.org
Fixes: 85f2b08268c01 ("tracing: Add basic event trigger framework")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Linus Torvalds [Mon, 28 May 2018 12:25:57 +0000 (05:25 -0700)]
Merge tag 'nds32-for-linus-4.17-fixes' of git://git./linux/kernel/git/greentime/linux
Pull nds32 fixes from Greentime Hu:
"Bug fixes and build error fixes for nds32"
* tag 'nds32-for-linus-4.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/greentime/linux:
nds32: Fix compiler warning, Wstringop-overflow, in vdso.c
nds32: Disable local irq before calling cpu_dcache_wb_page in copy_user_highpage
nds32: Flush the cache of the page at vmaddr instead of kaddr in flush_anon_page
nds32: Correct flush_dcache_page function
nds32: Fix the unaligned access handler
nds32: Renaming the file for unaligned access
nds32: To fix a cache inconsistency issue by setting correct cacheability of NTC
nds32: To refine readability of INT_MASK_INITAIAL_VAL
nds32: Fix the virtual address may map too much range by tlbop issue.
nds32: Fix the allmodconfig build. To make sure CONFIG_CPU_LITTLE_ENDIAN is default y
nds32: Fix build failed because arch_trace_hardirqs_off is changed to trace_hardirqs_off.
nds32: Fix the unknown type u8 issue.
nds32: Fix the symbols undefined issue by exporting them.
nds32: Fix xfs_buf built failed by export invalidate_kernel_vmap_range and flush_kernel_vmap_range
nds32: Fix drivers/gpu/drm/udl/udl_fb.c building error by defining PAGE_SHARED
nds32: Fix building error of crypto/xor.c by adding xor.h
nds32: Fix building error when CONFIG_FREEZE is enabled.
nds32: lib: To use generic lib instead of libgcc to prevent the symbol undefined issue.
Steven Rostedt (VMware) [Mon, 28 May 2018 00:54:44 +0000 (20:54 -0400)]
tracing: Fix crash when freeing instances with event triggers
If a instance has an event trigger enabled when it is freed, it could cause
an access of free memory. Here's the case that crashes:
# cd /sys/kernel/tracing
# mkdir instances/foo
# echo snapshot > instances/foo/events/initcall/initcall_start/trigger
# rmdir instances/foo
Would produce:
general protection fault: 0000 [#1] PREEMPT SMP PTI
Modules linked in: tun bridge ...
CPU: 5 PID: 6203 Comm: rmdir Tainted: G W 4.17.0-rc4-test+ #933
Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
RIP: 0010:clear_event_triggers+0x3b/0x70
RSP: 0018:
ffffc90003783de0 EFLAGS:
00010286
RAX:
0000000000000000 RBX:
6b6b6b6b6b6b6b2b RCX:
0000000000000000
RDX:
0000000000000000 RSI:
0000000000000000 RDI:
ffff8800c7130ba0
RBP:
ffffc90003783e00 R08:
ffff8801131993f8 R09:
0000000100230016
R10:
ffffc90003783d80 R11:
0000000000000000 R12:
ffff8800c7130ba0
R13:
ffff8800c7130bd8 R14:
ffff8800cc093768 R15:
00000000ffffff9c
FS:
00007f6f4aa86700(0000) GS:
ffff88011eb40000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
00007f6f4a5aed60 CR3:
00000000cd552001 CR4:
00000000001606e0
Call Trace:
event_trace_del_tracer+0x2a/0xc5
instance_rmdir+0x15c/0x200
tracefs_syscall_rmdir+0x52/0x90
vfs_rmdir+0xdb/0x160
do_rmdir+0x16d/0x1c0
__x64_sys_rmdir+0x17/0x20
do_syscall_64+0x55/0x1a0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
This was due to the call the clears out the triggers when an instance is
being deleted not removing the trigger from the link list.
Cc: stable@vger.kernel.org
Fixes: 85f2b08268c01 ("tracing: Add basic event trigger framework")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Linus Torvalds [Sun, 27 May 2018 20:01:47 +0000 (13:01 -0700)]
Linux 4.17-rc7
Linus Torvalds [Sun, 27 May 2018 16:27:27 +0000 (09:27 -0700)]
Merge tag 'kbuild-fixes-v4.17-2' of git://git./linux/kernel/git/masahiroy/linux-kbuild
Pull more Kbuild fixes from Masahiro Yamada:
- enable '-fno-tree-loop-im' only when supported
- add '-fno-PIE' option before the asm-goto test
* tag 'kbuild-fixes-v4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
Makefile: disable PIE before testing asm goto
kbuild: gcov: enable -fno-tree-loop-im if supported
Ulf Hansson [Thu, 24 May 2018 08:33:36 +0000 (10:33 +0200)]
PM / runtime: Drop usage count for suppliers at device link removal
In the case consumer device is runtime resumed, while the link to the
supplier is removed, the earlier call to pm_runtime_get_sync() made from
rpm_get_suppliers() does not get properly balanced with a corresponding
call to pm_runtime_put(). This leads to that suppliers remains to be
runtime resumed forever, while they don't need to.
Let's fix the behaviour by calling rpm_put_suppliers() when dropping a
device link. Not that, since rpm_put_suppliers() checks the
link->rpm_active flag, we can correctly avoid to call pm_runtime_put() in
cases when we shouldn't.
Reported-by: Todor Tomov <todor.tomov@linaro.org>
Fixes: 21d5c57b3726 (PM / runtime: Use device links)
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Ulf Hansson [Fri, 18 May 2018 08:48:49 +0000 (10:48 +0200)]
PM / runtime: Fixup reference counting of device link suppliers at probe
In the driver core, before it invokes really_probe() it runtime resumes the
suppliers for the device via calling pm_runtime_get_suppliers(), which also
increases the runtime PM usage count for each of the available supplier.
This makes sense, as to be able to allow the consumer device to be probed
by its driver. However, if the driver decides to add a new supplier link
during ->probe(), hence updating the list of suppliers, the following call
to pm_runtime_put_suppliers(), invoked after really_probe() in the driver
core, we get into trouble.
More precisely, pm_runtime_put() gets called also for the new supplier(s),
which is wrong as the driver core, didn't trigger pm_runtime_get_sync() to
be called for it in the first place. In other words, the new supplier may
be runtime suspended even in cases when it shouldn't.
Fix this behaviour, by runtime resume suppliers according to the same
conditions as managed by the runtime PM core, when runtime resume callbacks
are being invoked.
Additionally, don't try to runtime suspend any of the suppliers after
really_probe(), but instead rely on that to happen via the consumer device,
when it becomes runtime suspended.
Fixes: 21d5c57b3726 (PM / runtime: Use device links)
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Todd E Brandt [Thu, 24 May 2018 16:36:28 +0000 (09:36 -0700)]
PM / tools: pm-graph: upgrade to v5.1
general changes:
- make python dependent on version2 to enable clearlinux
- upgrade dmesg error/warning extraction to be more detailed
- enable logs generated from -cmd runs to be processed in gzip form
- add notification on power mode entry failure into the timeline
- add -battery option to show if battery is connected and its charge
summary changes (output of -summary):
- add -genhtml option to regenerate missing timelines from logs found
- add min/max/median/avg data to the summary page with links to the data
- add highlight to minimum, maximum, and median tests
- add result column to summary (pass or fail) with red highlight on fail
- add issues column to summary with a list of dmesg err/warn/bugs
Signed-off-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Linus Torvalds [Sat, 26 May 2018 21:05:16 +0000 (14:05 -0700)]
Merge tag 'armsoc-fixes' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
"A few more fixes for v4.17:
- a fix for a crash in scm_call_atomic on qcom platforms
- display fix for Allwinner A10
- a fix that re-enables ethernet on Allwinner H3 (C.H.I.P et al)
- a fix for eMMC corruption on hikey
- i2c-gpio descriptor tables for ixp4xx
... plus a small typo fix"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: Fix i2c-gpio GPIO descriptor tables
arm64: dts: hikey: Fix eMMC corruption regression
firmware: qcom: scm: Fix crash in qcom_scm_call_atomic1()
ARM: sun8i: v3s: fix spelling mistake: "disbaled" -> "disabled"
ARM: dts: sun4i: Fix incorrect clocks for displays
ARM: dts: sun8i: h3: Re-enable EMAC on Orange Pi One
Linus Torvalds [Sat, 26 May 2018 20:24:16 +0000 (13:24 -0700)]
Merge branch 'x86-pti-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 store buffer fixes from Thomas Gleixner:
"Two fixes for the SSBD mitigation code:
- expose SSBD properly to guests. This got broken when the CPU
feature flags got reshuffled.
- simplify the CPU detection logic to avoid duplicate entries in the
tables"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/speculation: Simplify the CPU bug detection logic
KVM/VMX: Expose SSBD properly to guests
Linus Torvalds [Sat, 26 May 2018 20:10:16 +0000 (13:10 -0700)]
Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull scheduler fixes from Thomas Gleixner:
"Three fixes for scheduler and kthread code:
- allow calling kthread_park() on an already parked thread
- restore the sched_pi_setprio() tracepoint behaviour
- clarify the unclear string for the scheduling domain debug output"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched, tracing: Fix trace_sched_pi_setprio() for deboosting
kthread: Allow kthread_park() on a parked kthread
sched/topology: Clarify root domain(s) debug string
Olof Johansson [Sat, 26 May 2018 19:12:44 +0000 (12:12 -0700)]
Merge tag 'hisi-fixes-for-4.17v2' of git://github.com/hisilicon/linux-hisi into fixes
ARM64: hisi fixes for 4.17
- Remove eMMC max-frequency property to fix eMMC corruption on hikey board
* tag 'hisi-fixes-for-4.17v2' of git://github.com/hisilicon/linux-hisi:
arm64: dts: hikey: Fix eMMC corruption regression
Signed-off-by: Olof Johansson <olof@lixom.net>
Linus Walleij [Sat, 26 May 2018 16:37:34 +0000 (18:37 +0200)]
ARM: Fix i2c-gpio GPIO descriptor tables
I used bad names in my clumsiness when rewriting many board
files to use GPIO descriptors instead of platform data. A few
had the platform_device ID set to -1 which would indeed give
the device name "i2c-gpio".
But several had it set to >=0 which gives the names
"i2c-gpio.0", "i2c-gpio.1" ...
Fix the offending instances in the ARM tree. Sorry for the
mess.
Fixes: b2e63555592f ("i2c: gpio: Convert to use descriptors")
Cc: Wolfram Sang <wsa@the-dreams.de>
Cc: Simon Guinot <simon.guinot@sequanux.org>
Reported-by: Simon Guinot <simon.guinot@sequanux.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Olof Johansson <olof@lixom.net>
Linus Torvalds [Sat, 26 May 2018 17:46:57 +0000 (10:46 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull KVM fixes from Radim Krčmář:
"PPC:
- Close a hole which could possibly lead to the host timebase getting
out of sync.
- Three fixes relating to PTEs and TLB entries for radix guests.
- Fix a bug which could lead to an interrupt never getting delivered
to the guest, if it is pending for a guest vCPU when the vCPU gets
offlined.
s390:
- Fix false negatives in VSIE validity check (Cc stable)
x86:
- Fix time drift of VMX preemption timer when a guest uses LAPIC
timer in periodic mode (Cc stable)
- Unconditionally expose CPUID.IA32_ARCH_CAPABILITIES to allow
migration from hosts that don't need retpoline mitigation (Cc
stable)
- Fix guest crashes on reboot by properly coupling CR4.OSXSAVE and
CPUID.OSXSAVE (Cc stable)
- Report correct RIP after Hyper-V hypercall #UD (introduced in
-rc6)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: fix #UD address of failed Hyper-V hypercalls
kvm: x86: IA32_ARCH_CAPABILITIES is always supported
KVM: x86: Update cpuid properly when CR4.OSXAVE or CR4.PKE is changed
x86/kvm: fix LAPIC timer drift when guest uses periodic mode
KVM: s390: vsie: fix < 8k check for the itdba
KVM: PPC: Book 3S HV: Do ptesync in radix guest exit path
KVM: PPC: Book3S HV: XIVE: Resend re-routed interrupts on CPU priority change
KVM: PPC: Book3S HV: Make radix clear pte when unmapping
KVM: PPC: Book3S HV: Make radix use correct tlbie sequence in kvmppc_radix_tlbie_page
KVM: PPC: Book3S HV: Snapshot timebase offset on guest entry
John Stultz [Sat, 26 May 2018 03:10:47 +0000 (20:10 -0700)]
arm64: dts: hikey: Fix eMMC corruption regression
This patch is a partial revert of
commit
abd7d0972a19 ("arm64: dts: hikey: Enable HS200 mode on eMMC")
which has been causing eMMC corruption on my HiKey board.
Symptoms usually looked like:
mmc_host mmc0: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31)
...
mmc_host mmc0: Bus speed (slot 0) = 148800000Hz (slot req 150000000Hz, actual 148800000HZ div = 0)
mmc0: new HS200 MMC card at address 0001
...
dwmmc_k3
f723d000.dwmmc0: Unexpected command timeout, state 3
mmc_host mmc0: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31)
mmc_host mmc0: Bus speed (slot 0) = 148800000Hz (slot req 150000000Hz, actual 148800000HZ div = 0)
mmc_host mmc0: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31)
mmc_host mmc0: Bus speed (slot 0) = 148800000Hz (slot req 150000000Hz, actual 148800000HZ div = 0)
mmc_host mmc0: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31)
mmc_host mmc0: Bus speed (slot 0) = 148800000Hz (slot req 150000000Hz, actual 148800000HZ div = 0)
print_req_error: I/O error, dev mmcblk0, sector
8810504
Aborting journal on device mmcblk0p10-8.
mmc_host mmc0: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31)
mmc_host mmc0: Bus speed (slot 0) = 148800000Hz (slot req 150000000Hz, actual 148800000HZ div = 0)
mmc_host mmc0: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31)
mmc_host mmc0: Bus speed (slot 0) = 148800000Hz (slot req 150000000Hz, actual 148800000HZ div = 0)
mmc_host mmc0: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31)
mmc_host mmc0: Bus speed (slot 0) = 148800000Hz (slot req 150000000Hz, actual 148800000HZ div = 0)
mmc_host mmc0: Bus speed (slot 0) = 24800000Hz (slot req 400000Hz, actual 400000HZ div = 31)
mmc_host mmc0: Bus speed (slot 0) = 148800000Hz (slot req 150000000Hz, actual 148800000HZ div = 0)
EXT4-fs error (device mmcblk0p10): ext4_journal_check_start:61: Detected aborted journal
EXT4-fs (mmcblk0p10): Remounting filesystem read-only
And quite often this would result in a disk that wouldn't properly
boot even with older kernels.
It seems the max-frequency property added by the above patch is
causing the problem, so remove it.
Cc: Ryan Grachek <ryan@edited.us>
Cc: Wei Xu <xuwei5@hisilicon.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Cc: YongQin Liu <yongqin.liu@linaro.org>
Cc: Leo Yan <leo.yan@linaro.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Tested-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Wei Xu <xuwei04@gmail.com>
Antoine Tenart [Thu, 17 May 2018 13:22:14 +0000 (15:22 +0200)]
crypto: inside-secure - do not use memset on MMIO
This patch fixes the Inside Secure driver which uses a memtset() call to
set an MMIO area from the cryptographic engine to 0. This is wrong as
memset() isn't guaranteed to work on MMIO for many reasons. This led to
kernel paging request panics in certain cases. Use memset_io() instead.
Fixes: 1b44c5a60c13 ("crypto: inside-secure - add SafeXcel EIP197 crypto engine driver")
Reported-by: Ofer Heifetz <oferh@marvell.com>
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Linus Torvalds [Sat, 26 May 2018 03:24:28 +0000 (20:24 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"16 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
kasan: fix memory hotplug during boot
kasan: free allocated shadow memory on MEM_CANCEL_ONLINE
checkpatch: fix macro argument precedence test
init/main.c: include <linux/mem_encrypt.h>
kernel/sys.c: fix potential Spectre v1 issue
mm/memory_hotplug: fix leftover use of struct page during hotplug
proc: fix smaps and meminfo alignment
mm: do not warn on offline nodes unless the specific node is explicitly requested
mm, memory_hotplug: make has_unmovable_pages more robust
mm/kasan: don't vfree() nonexistent vm_area
MAINTAINERS: change hugetlbfs maintainer and update files
ipc/shm: fix shmat() nil address after round-down when remapping
Revert "ipc/shm: Fix shmat mmap nil-page protection"
idr: fix invalid ptr dereference on item delete
ocfs2: revert "ocfs2/o2hb: check len for bio_add_page() to avoid getting incorrect bio"
mm: fix nr_rotate_swap leak in swapon() error case
Linus Torvalds [Sat, 26 May 2018 02:54:42 +0000 (19:54 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
"Let's begin the holiday weekend with some networking fixes:
1) Whoops need to restrict cfg80211 wiphy names even more to 64
bytes. From Eric Biggers.
2) Fix flags being ignored when using kernel_connect() with SCTP,
from Xin Long.
3) Use after free in DCCP, from Alexey Kodanev.
4) Need to check rhltable_init() return value in ipmr code, from Eric
Dumazet.
5) XDP handling fixes in virtio_net from Jason Wang.
6) Missing RTA_TABLE in rtm_ipv4_policy[], from Roopa Prabhu.
7) Need to use IRQ disabling spinlocks in mlx4_qp_lookup(), from Jack
Morgenstein.
8) Prevent out-of-bounds speculation using indexes in BPF, from
Daniel Borkmann.
9) Fix regression added by AF_PACKET link layer cure, from Willem de
Bruijn.
10) Correct ENIC dma mask, from Govindarajulu Varadarajan.
11) Missing config options for PMTU tests, from Stefano Brivio"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (48 commits)
ibmvnic: Fix partial success login retries
selftests/net: Add missing config options for PMTU tests
mlx4_core: allocate ICM memory in page size chunks
enic: set DMA mask to 47 bit
ppp: remove the PPPIOCDETACH ioctl
ipv4: remove warning in ip_recv_error
net : sched: cls_api: deal with egdev path only if needed
vhost: synchronize IOTLB message with dev cleanup
packet: fix reserve calculation
net/mlx5: IPSec, Fix a race between concurrent sandbox QP commands
net/mlx5e: When RXFCS is set, add FCS data into checksum calculation
bpf: properly enforce index mask to prevent out-of-bounds speculation
net/mlx4: Fix irq-unsafe spinlock usage
net: phy: broadcom: Fix bcm_write_exp()
net: phy: broadcom: Fix auxiliary control register reads
net: ipv4: add missing RTA_TABLE to rtm_ipv4_policy
net/mlx4: fix spelling mistake: "Inrerface" -> "Interface" and rephrase message
ibmvnic: Only do H_EOI for mobility events
tuntap: correctly set SOCKWQ_ASYNC_NOSPACE
virtio-net: fix leaking page for gso packet during mergeable XDP
...
David Hildenbrand [Fri, 25 May 2018 21:48:11 +0000 (14:48 -0700)]
kasan: fix memory hotplug during boot
Using module_init() is wrong. E.g. ACPI adds and onlines memory before
our memory notifier gets registered.
This makes sure that ACPI memory detected during boot up will not result
in a kernel crash.
Easily reproducible with QEMU, just specify a DIMM when starting up.
Link: http://lkml.kernel.org/r/20180522100756.18478-3-david@redhat.com
Fixes: 786a8959912e ("kasan: disable memory hotplug")
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Hildenbrand [Fri, 25 May 2018 21:48:08 +0000 (14:48 -0700)]
kasan: free allocated shadow memory on MEM_CANCEL_ONLINE
We have to free memory again when we cancel onlining, otherwise a later
onlining attempt will fail.
Link: http://lkml.kernel.org/r/20180522100756.18478-2-david@redhat.com
Fixes: fa69b5989bb0 ("mm/kasan: add support for memory hotplug")
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Fri, 25 May 2018 21:48:04 +0000 (14:48 -0700)]
checkpatch: fix macro argument precedence test
checkpatch's macro argument precedence test is broken so fix it.
Link: http://lkml.kernel.org/r/5dd900e9197febc1995604bb33c23c136d8b33ce.camel@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mathieu Malaterre [Fri, 25 May 2018 21:48:00 +0000 (14:48 -0700)]
init/main.c: include <linux/mem_encrypt.h>
In commit
c7753208a94c ("x86, swiotlb: Add memory encryption support") a
call to function `mem_encrypt_init' was added. Include prototype
defined in header <linux/mem_encrypt.h> to prevent a warning reported
during compilation with W=1:
init/main.c:494:20: warning: no previous prototype for `mem_encrypt_init' [-Wmissing-prototypes]
Link: http://lkml.kernel.org/r/20180522195533.31415-1-malat@debian.org
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Laura Abbott <lauraa@codeaurora.org>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Gargi Sharma <gs051095@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Gustavo A. R. Silva [Fri, 25 May 2018 21:47:57 +0000 (14:47 -0700)]
kernel/sys.c: fix potential Spectre v1 issue
`resource' can be controlled by user-space, hence leading to a potential
exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
kernel/sys.c:1474 __do_compat_sys_old_getrlimit() warn: potential spectre issue 'get_current()->signal->rlim' (local cap)
kernel/sys.c:1455 __do_sys_old_getrlimit() warn: potential spectre issue 'get_current()->signal->rlim' (local cap)
Fix this by sanitizing *resource* before using it to index
current->signal->rlim
Notice that given that speculation windows are large, the policy is to
kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=
152449131114778&w=2
Link: http://lkml.kernel.org/r/20180515030038.GA11822@embeddedor.com
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jonathan Cameron [Fri, 25 May 2018 21:47:53 +0000 (14:47 -0700)]
mm/memory_hotplug: fix leftover use of struct page during hotplug
The case of a new numa node got missed in avoiding using the node info
from page_struct during hotplug. In this path we have a call to
register_mem_sect_under_node (which allows us to specify it is hotplug
so don't change the node), via link_mem_sections which unfortunately
does not.
Fix is to pass check_nid through link_mem_sections as well and disable
it in the new numa node path.
Note the bug only 'sometimes' manifests depending on what happens to be
in the struct page structures - there are lots of them and it only needs
to match one of them.
The result of the bug is that (with a new memory only node) we never
successfully call register_mem_sect_under_node so don't get the memory
associated with the node in sysfs and meminfo for the node doesn't
report it.
It came up whilst testing some arm64 hotplug patches, but appears to be
universal. Whilst I'm triggering it by removing then reinserting memory
to a node with no other elements (thus making the node disappear then
appear again), it appears it would happen on hotplugging memory where
there was none before and it doesn't seem to be related the arm64
patches.
These patches call __add_pages (where most of the issue was fixed by
Pavel's patch). If there is a node at the time of the __add_pages call
then all is well as it calls register_mem_sect_under_node from there
with check_nid set to false. Without a node that function returns
having not done the sysfs related stuff as there is no node to use.
This is expected but it is the resulting path that fails...
Exact path to the problem is as follows:
mm/memory_hotplug.c: add_memory_resource()
The node is not online so we enter the 'if (new_node)' twice, on the
second such block there is a call to link_mem_sections which calls
into
drivers/node.c: link_mem_sections() which calls
drivers/node.c: register_mem_sect_under_node() which calls
get_nid_for_pfn and keeps trying until the output of that matches
the expected node (passed all the way down from
add_memory_resource)
It is effectively the same fix as the one referred to in the fixes tag
just in the code path for a new node where the comments point out we
have to rerun the link creation because it will have failed in
register_new_memory (as there was no node at the time). (actually that
comment is wrong now as we don't have register_new_memory any more it
got renamed to hotplug_memory_register in Pavel's patch).
Link: http://lkml.kernel.org/r/20180504085311.1240-1-Jonathan.Cameron@huawei.com
Fixes: fc44f7f9231a ("mm/memory_hotplug: don't read nid from struct page during hotplug")
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Fri, 25 May 2018 21:47:50 +0000 (14:47 -0700)]
proc: fix smaps and meminfo alignment
The 4.17-rc /proc/meminfo and /proc/<pid>/smaps look ugly: single-digit
numbers (commonly 0) are misaligned.
Remove seq_put_decimal_ull_width()'s leftover optimization for single
digits: it's wrong now that num_to_str() takes care of the width.
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1805241554210.1326@eggly.anvils
Fixes: d1be35cb6f96 ("proc: add seq_put_decimal_ull_width to speed up /proc/pid/smaps")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Andrei Vagin <avagin@openvz.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Fri, 25 May 2018 21:47:46 +0000 (14:47 -0700)]
mm: do not warn on offline nodes unless the specific node is explicitly requested
Oscar has noticed that we splat
WARNING: CPU: 0 PID: 64 at ./include/linux/gfp.h:467 vmemmap_alloc_block+0x4e/0xc9
[...]
CPU: 0 PID: 64 Comm: kworker/u4:1 Tainted: G W E 4.17.0-rc5-next-
20180517-1-default+ #66
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
Workqueue: kacpi_hotplug acpi_hotplug_work_fn
Call Trace:
vmemmap_populate+0xf2/0x2ae
sparse_mem_map_populate+0x28/0x35
sparse_add_one_section+0x4c/0x187
__add_pages+0xe7/0x1a0
add_pages+0x16/0x70
add_memory_resource+0xa3/0x1d0
add_memory+0xe4/0x110
acpi_memory_device_add+0x134/0x2e0
acpi_bus_attach+0xd9/0x190
acpi_bus_scan+0x37/0x70
acpi_device_hotplug+0x389/0x4e0
acpi_hotplug_work_fn+0x1a/0x30
process_one_work+0x146/0x340
worker_thread+0x47/0x3e0
kthread+0xf5/0x130
ret_from_fork+0x35/0x40
when adding memory to a node that is currently offline.
The VM_WARN_ON is just too loud without a good reason. In this
particular case we are doing
alloc_pages_node(node, GFP_KERNEL|__GFP_RETRY_MAYFAIL|__GFP_NOWARN, order)
so we do not insist on allocating from the given node (it is more a
hint) so we can fall back to any other populated node and moreover we
explicitly ask to not warn for the allocation failure.
Soften the warning only to cases when somebody asks for the given node
explicitly by __GFP_THISNODE.
Link: http://lkml.kernel.org/r/20180523125555.30039-3-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Oscar Salvador <osalvador@techadventures.net>
Tested-by: Oscar Salvador <osalvador@techadventures.net>
Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Fri, 25 May 2018 21:47:42 +0000 (14:47 -0700)]
mm, memory_hotplug: make has_unmovable_pages more robust
Oscar has reported:
: Due to an unfortunate setting with movablecore, memblocks containing bootmem
: memory (pages marked by get_page_bootmem()) ended up marked in zone_movable.
: So while trying to remove that memory, the system failed in do_migrate_range
: and __offline_pages never returned.
:
: This can be reproduced by running
: qemu-system-x86_64 -m 6G,slots=8,maxmem=8G -numa node,mem=4096M -numa node,mem=2048M
: and movablecore=4G kernel command line
:
: linux kernel: BIOS-provided physical RAM map:
: linux kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
: linux kernel: BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
: linux kernel: BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
: linux kernel: BIOS-e820: [mem 0x0000000000100000-0x00000000bffdffff] usable
: linux kernel: BIOS-e820: [mem 0x00000000bffe0000-0x00000000bfffffff] reserved
: linux kernel: BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
: linux kernel: BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
: linux kernel: BIOS-e820: [mem 0x0000000100000000-0x00000001bfffffff] usable
: linux kernel: NX (Execute Disable) protection: active
: linux kernel: SMBIOS 2.8 present.
: linux kernel: DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org
: linux kernel: Hypervisor detected: KVM
: linux kernel: e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
: linux kernel: e820: remove [mem 0x000a0000-0x000fffff] usable
: linux kernel: last_pfn = 0x1c0000 max_arch_pfn = 0x400000000
:
: linux kernel: SRAT: PXM 0 -> APIC 0x00 -> Node 0
: linux kernel: SRAT: PXM 1 -> APIC 0x01 -> Node 1
: linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff]
: linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0xbfffffff]
: linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x13fffffff]
: linux kernel: ACPI: SRAT: Node 1 PXM 1 [mem 0x140000000-0x1bfffffff]
: linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x1c0000000-0x43fffffff] hotplug
: linux kernel: NUMA: Node 0 [mem 0x00000000-0x0009ffff] + [mem 0x00100000-0xbfffffff] -> [mem 0x0
: linux kernel: NUMA: Node 0 [mem 0x00000000-0xbfffffff] + [mem 0x100000000-0x13fffffff] -> [mem 0
: linux kernel: NODE_DATA(0) allocated [mem 0x13ffd6000-0x13fffffff]
: linux kernel: NODE_DATA(1) allocated [mem 0x1bffd3000-0x1bfffcfff]
:
: zoneinfo shows that the zone movable is placed into both numa nodes:
: Node 0, zone Movable
: pages free 160140
: min 1823
: low 2278
: high 2733
: spanned 262144
: present 262144
: managed 245670
: Node 1, zone Movable
: pages free 448427
: min 3827
: low 4783
: high 5739
: spanned 524288
: present 524288
: managed 515766
Note how only Node 0 has a hutplugable memory region which would rule it
out from the early memblock allocations (most likely memmap). Node1
will surely contain memmaps on the same node and those would prevent
offlining to succeed. So this is arguably a configuration issue.
Although one could argue that we should be more clever and rule early
allocations from the zone movable. This would be correct but probably
not worth the effort considering what a hack movablecore is.
Anyway, We could do better for those cases though. We rely on
start_isolate_page_range resp. has_unmovable_pages to do their job.
The first one isolates the whole range to be offlined so that we do not
allocate from it anymore and the later makes sure we are not stumbling
over non-migrateable pages.
has_unmovable_pages is overly optimistic, however. It doesn't check all
the pages if we are withing zone_movable because we rely that those
pages will be always migrateable. As it turns out we are still not
perfect there. While bootmem pages in zonemovable sound like a clear
bug which should be fixed let's remove the optimization for now and warn
if we encounter unmovable pages in zone_movable in the meantime. That
should help for now at least.
Btw. this wasn't a real problem until commit
72b39cfc4d75 ("mm,
memory_hotplug: do not fail offlining too early") because we used to
have a small number of retries and then failed. This turned out to be
too fragile though.
Link: http://lkml.kernel.org/r/20180523125555.30039-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Oscar Salvador <osalvador@techadventures.net>
Tested-by: Oscar Salvador <osalvador@techadventures.net>
Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrey Ryabinin [Fri, 25 May 2018 21:47:38 +0000 (14:47 -0700)]
mm/kasan: don't vfree() nonexistent vm_area
KASAN uses different routines to map shadow for hot added memory and
memory obtained in boot process. Attempt to offline memory onlined by
normal boot process leads to this:
Trying to vfree() nonexistent vm area (
000000005d3b34b9)
WARNING: CPU: 2 PID: 13215 at mm/vmalloc.c:1525 __vunmap+0x147/0x190
Call Trace:
kasan_mem_notifier+0xad/0xb9
notifier_call_chain+0x166/0x260
__blocking_notifier_call_chain+0xdb/0x140
__offline_pages+0x96a/0xb10
memory_subsys_offline+0x76/0xc0
device_offline+0xb8/0x120
store_mem_state+0xfa/0x120
kernfs_fop_write+0x1d5/0x320
__vfs_write+0xd4/0x530
vfs_write+0x105/0x340
SyS_write+0xb0/0x140
Obviously we can't call vfree() to free memory that wasn't allocated via
vmalloc(). Use find_vm_area() to see if we can call vfree().
Unfortunately it's a bit tricky to properly unmap and free shadow
allocated during boot, so we'll have to keep it. If memory will come
online again that shadow will be reused.
Matthew asked: how can you call vfree() on something that isn't a
vmalloc address?
vfree() is able to free any address returned by
__vmalloc_node_range(). And __vmalloc_node_range() gives you any
address you ask. It doesn't have to be an address in [VMALLOC_START,
VMALLOC_END] range.
That's also how the module_alloc()/module_memfree() works on
architectures that have designated area for modules.
[aryabinin@virtuozzo.com: improve comments]
Link: http://lkml.kernel.org/r/dabee6ab-3a7a-51cd-3b86-5468718e0390@virtuozzo.com
[akpm@linux-foundation.org: fix typos, reflow comment]
Link: http://lkml.kernel.org/r/20180201163349.8700-1-aryabinin@virtuozzo.com
Fixes: fa69b5989bb0 ("mm/kasan: add support for memory hotplug")
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reported-by: Paul Menzel <pmenzel+linux-kasan-dev@molgen.mpg.de>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Kravetz [Fri, 25 May 2018 21:47:35 +0000 (14:47 -0700)]
MAINTAINERS: change hugetlbfs maintainer and update files
The current hugetlbfs maintainer has not been active for more than a few
years. I have been been active in this area for more than two years and
plan to remain active in the foreseeable future.
Also, update the hugetlbfs entry to include linux-mm mail list and
additional hugetlbfs related files. hugetlb.c and hugetlb.h are not
100% hugetlbfs, but a majority of their content is hugetlbfs related.
Link: http://lkml.kernel.org/r/20180518225236.19079-1-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Nadia Yvette Chambers <nyc@holomorphy.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Davidlohr Bueso [Fri, 25 May 2018 21:47:30 +0000 (14:47 -0700)]
ipc/shm: fix shmat() nil address after round-down when remapping
shmat()'s SHM_REMAP option forbids passing a nil address for; this is in
fact the very first thing we check for. Andrea reported that for
SHM_RND|SHM_REMAP cases we can end up bypassing the initial addr check,
but we need to check again if the address was rounded down to nil. As
of this patch, such cases will return -EINVAL.
Link: http://lkml.kernel.org/r/20180503204934.kk63josdu6u53fbd@linux-n805
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Davidlohr Bueso [Fri, 25 May 2018 21:47:27 +0000 (14:47 -0700)]
Revert "ipc/shm: Fix shmat mmap nil-page protection"
Patch series "ipc/shm: shmat() fixes around nil-page".
These patches fix two issues reported[1] a while back by Joe and Andrea
around how shmat(2) behaves with nil-page.
The first reverts a commit that it was incorrectly thought that mapping
nil-page (address=0) was a no no with MAP_FIXED. This is not the case,
with the exception of SHM_REMAP; which is address in the second patch.
I chose two patches because it is easier to backport and it explicitly
reverts bogus behaviour. Both patches ought to be in -stable and ltp
testcases need updated (the added testcase around the cve can be
modified to just test for SHM_RND|SHM_REMAP).
[1] lkml.kernel.org/r/
20180430172152.nfa564pvgpk3ut7p@linux-n805
This patch (of 2):
Commit
95e91b831f87 ("ipc/shm: Fix shmat mmap nil-page protection")
worked on the idea that we should not be mapping as root addr=0 and
MAP_FIXED. However, it was reported that this scenario is in fact
valid, thus making the patch both bogus and breaks userspace as well.
For example X11's libint10.so relies on shmat(1, SHM_RND) for lowmem
initialization[1].
[1] https://cgit.freedesktop.org/xorg/xserver/tree/hw/xfree86/os-support/linux/int10/linux.c#n347
Link: http://lkml.kernel.org/r/20180503203243.15045-2-dave@stgolabs.net
Fixes: 95e91b831f87 ("ipc/shm: Fix shmat mmap nil-page protection")
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reported-by: Joe Lawrence <joe.lawrence@redhat.com>
Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matthew Wilcox [Fri, 25 May 2018 21:47:24 +0000 (14:47 -0700)]
idr: fix invalid ptr dereference on item delete
If the radix tree underlying the IDR happens to be full and we attempt
to remove an id which is larger than any id in the IDR, we will call
__radix_tree_delete() with an uninitialised 'slot' pointer, at which
point anything could happen. This was easiest to hit with a single
entry at id 0 and attempting to remove a non-0 id, but it could have
happened with 64 entries and attempting to remove an id >= 64.
Roman said:
The syzcaller test boils down to opening /dev/kvm, creating an
eventfd, and calling a couple of KVM ioctls. None of this requires
superuser. And the result is dereferencing an uninitialized pointer
which is likely a crash. The specific path caught by syzbot is via
KVM_HYPERV_EVENTD ioctl which is new in 4.17. But I guess there are
other user-triggerable paths, so cc:stable is probably justified.
Matthew added:
We have around 250 calls to idr_remove() in the kernel today. Many of
them pass an ID which is embedded in the object they're removing, so
they're safe. Picking a few likely candidates:
drivers/firewire/core-cdev.c looks unsafe; the ID comes from an ioctl.
drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c is similar
drivers/atm/nicstar.c could be taken down by a handcrafted packet
Link: http://lkml.kernel.org/r/20180518175025.GD6361@bombadil.infradead.org
Fixes: 0a835c4f090a ("Reimplement IDR and IDA using the radix tree")
Reported-by: <syzbot+35666cba7f0a337e2e79@syzkaller.appspotmail.com>
Debugged-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>