Jiri Olsa [Sat, 4 May 2019 15:15:56 +0000 (17:15 +0200)]
perf/x86/intel: Fix race in intel_pmu_disable_event()
New race in x86_pmu_stop() was introduced by replacing the
atomic __test_and_clear_bit() of cpuc->active_mask by separate
test_bit() and __clear_bit() calls in the following commit:
3966c3feca3f ("x86/perf/amd: Remove need to check "running" bit in NMI handler")
The race causes panic for PEBS events with enabled callchains:
BUG: unable to handle kernel NULL pointer dereference at
0000000000000000
...
RIP: 0010:perf_prepare_sample+0x8c/0x530
Call Trace:
<NMI>
perf_event_output_forward+0x2a/0x80
__perf_event_overflow+0x51/0xe0
handle_pmi_common+0x19e/0x240
intel_pmu_handle_irq+0xad/0x170
perf_event_nmi_handler+0x2e/0x50
nmi_handle+0x69/0x110
default_do_nmi+0x3e/0x100
do_nmi+0x11a/0x180
end_repeat_nmi+0x16/0x1a
RIP: 0010:native_write_msr+0x6/0x20
...
</NMI>
intel_pmu_disable_event+0x98/0xf0
x86_pmu_stop+0x6e/0xb0
x86_pmu_del+0x46/0x140
event_sched_out.isra.97+0x7e/0x160
...
The event is configured to make samples from PEBS drain code,
but when it's disabled, we'll go through NMI path instead,
where data->callchain will not get allocated and we'll crash:
x86_pmu_stop
test_bit(hwc->idx, cpuc->active_mask)
intel_pmu_disable_event(event)
{
...
intel_pmu_pebs_disable(event);
...
EVENT OVERFLOW -> <NMI>
intel_pmu_handle_irq
handle_pmi_common
TEST PASSES -> test_bit(bit, cpuc->active_mask))
perf_event_overflow
perf_prepare_sample
{
...
if (!(sample_type & __PERF_SAMPLE_CALLCHAIN_EARLY))
data->callchain = perf_callchain(event, regs);
CRASH -> size += data->callchain->nr;
}
</NMI>
...
x86_pmu_disable_event(event)
}
__clear_bit(hwc->idx, cpuc->active_mask);
Fixing this by disabling the event itself before setting
off the PEBS bit.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Arcari <darcari@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Lendacky Thomas <Thomas.Lendacky@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: 3966c3feca3f ("x86/perf/amd: Remove need to check "running" bit in NMI handler")
Link: http://lkml.kernel.org/r/20190504151556.31031-1-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Alexander Shishkin [Fri, 3 May 2019 08:55:36 +0000 (11:55 +0300)]
perf/x86/intel/pt: Remove software double buffering PMU capability
Now that all AUX allocations are high-order by default, the software
double buffering PMU capability doesn't make sense any more, get rid
of it. In case some PMUs choose to opt out, we can re-introduce it.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: adrian.hunter@intel.com
Link: http://lkml.kernel.org/r/20190503085536.24119-3-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Alexander Shishkin [Fri, 3 May 2019 08:55:35 +0000 (11:55 +0300)]
perf/ring_buffer: Fix AUX software double buffering
This recent commit:
5768402fd9c6e87 ("perf/ring_buffer: Use high order allocations for AUX buffers optimistically")
overlooked the fact that the previous one page granularity of the AUX buffer
provided an implicit double buffering capability to the PMU driver, which
went away when the entire buffer became one high-order page.
Always make the full-trace mode AUX allocation at least two-part to preserve
the previous behavior and allow the implicit double buffering to continue.
Reported-by: Ammy Yi <ammy.yi@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: adrian.hunter@intel.com
Fixes: 5768402fd9c6e87 ("perf/ring_buffer: Use high order allocations for AUX buffers optimistically")
Link: http://lkml.kernel.org/r/20190503085536.24119-2-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Fri, 3 May 2019 05:48:18 +0000 (07:48 +0200)]
Merge tag 'perf-urgent-for-mingo-5.1-
20190502' of git://git./linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
tools UAPI:
Arnaldo Carvalho de Melo:
- Sync x86's vmx.h with the kernel.
- Copy missing unistd.h headers for arc, hexagon and riscv, fixing
a reported build regression on the ARC 32-bit architecture.
perf bench numa:
Arnaldo Carvalho de Melo:
- Add define for RUSAGE_THREAD if not present, fixing the build on the
ARC architecture when only zlib and libnuma are present.
perf BPF:
Arnaldo Carvalho de Melo:
- The disassembler-four-args feature test needs -ldl on distros such as
Mageia 7.
Bo YU:
- Fix unlocking on success in perf_env__find_btf(), detected with
the coverity tool.
libtraceevent:
Leo Yan:
- Change misleading hard coded 'trace-cmd' string in error messages.
ARM hardware tracing:
Leo Yan:
- Always allocate memory for cs_etm_queue::prev_packet, fixing a segfault
when processing CoreSight perf data.
perf annotate:
Thadeu Lima de Souza Cascardo:
- Fix build on 32 bit for BPF.
perf report:
Thomas Richter:
- Report OOM in status line in the GTK UI.
core libs:
- Remove needless asm/unistd.h that, used with sys/syscall.h ended
up redefining the syscalls defines in environments such as the
ARC arch when using uClibc.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Arnaldo Carvalho de Melo [Thu, 2 May 2019 13:26:23 +0000 (09:26 -0400)]
perf tools: Remove needless asm/unistd.h include fixing build in some places
We were including sys/syscall.h and asm/unistd.h, since sys/syscall.h
includes asm/unistd.h, sometimes this leads to the redefinition of
defines, breaking the build.
Noticed on ARC with uCLibc.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com>
Link: https://lkml.kernel.org/n/tip-xjpf80o64i2ko74aj2jih0qg@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Mon, 22 Apr 2019 18:21:35 +0000 (15:21 -0300)]
tools arch uapi: Copy missing unistd.h headers for arc, hexagon and riscv
Since those were introduced in:
c8ce48f06503 ("asm-generic: Make time32 syscall numbers optional")
But when the asm-generic/unistd.h was sync'ed with tools/ in:
1a787fc5ba18 ("tools headers uapi: Sync copy of asm-generic/unistd.h with the kernel sources")
I forgot to copy the files for the architectures that define
__ARCH_WANT_TIME32_SYSCALLS, so the perf build was breaking there, as
reported by Vineet Gupta for the ARC architecture.
After updating my ARC container to use the glibc based toolchain + cross
building libnuma, zlib and elfutils, I finally managed to reproduce the
problem and verify that this now is fixed and will not regress as will
be tested before each pull req sent upstream.
Reported-by: Vineet Gupta <Vineet.Gupta1@synopsys.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Jiri Olsa <jolsa@kernel.org>
CC: linux-snps-arc@lists.infradead.org
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/r/20190426193531.GC28586@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Wed, 1 May 2019 20:27:00 +0000 (16:27 -0400)]
tools build: Add -ldl to the disassembler-four-args feature test
Thomas Backlund reported that the perf build was failing on the Mageia 7
distro, that is because it uses:
cat /tmp/build/perf/feature/test-disassembler-four-args.make.output
/usr/bin/ld: /usr/lib64/libbfd.a(plugin.o): in function `try_load_plugin':
/home/iurt/rpmbuild/BUILD/binutils-2.32/objs/bfd/../../bfd/plugin.c:243:
undefined reference to `dlopen'
/usr/bin/ld:
/home/iurt/rpmbuild/BUILD/binutils-2.32/objs/bfd/../../bfd/plugin.c:271:
undefined reference to `dlsym'
/usr/bin/ld:
/home/iurt/rpmbuild/BUILD/binutils-2.32/objs/bfd/../../bfd/plugin.c:256:
undefined reference to `dlclose'
/usr/bin/ld:
/home/iurt/rpmbuild/BUILD/binutils-2.32/objs/bfd/../../bfd/plugin.c:246:
undefined reference to `dlerror'
as we allow dynamic linking and loading
Mageia 7 uses these linker flags:
$ rpm --eval %ldflags
-Wl,--as-needed -Wl,--no-undefined -Wl,-z,relro -Wl,-O1 -Wl,--build-id -Wl,--enable-new-dtags
So add -ldl to this feature LDFLAGS.
Reported-by: Thomas Backlund <tmb@mageia.org>
Tested-by: Thomas Backlund <tmb@mageia.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Link: https://lkml.kernel.org/r/20190501173158.GC21436@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Leo Yan [Sun, 28 Apr 2019 08:32:27 +0000 (16:32 +0800)]
perf cs-etm: Always allocate memory for cs_etm_queue::prev_packet
Robert Walker reported a segmentation fault is observed when process
CoreSight trace data; this issue can be easily reproduced by the command
'perf report --itrace=i1000i' for decoding tracing data.
If neither the 'b' flag (synthesize branches events) nor 'l' flag
(synthesize last branch entries) are specified to option '--itrace',
cs_etm_queue::prev_packet will not been initialised. After merging the
code to support exception packets and sample flags, there introduced a
number of uses of cs_etm_queue::prev_packet without checking whether it
is valid, for these cases any accessing to uninitialised prev_packet
will cause crash.
As cs_etm_queue::prev_packet is used more widely now and it's already
hard to follow which functions have been called in a context where the
validity of cs_etm_queue::prev_packet has been checked, this patch
always allocates memory for cs_etm_queue::prev_packet.
Reported-by: Robert Walker <robert.walker@arm.com>
Suggested-by: Robert Walker <robert.walker@arm.com>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Tested-by: Robert Walker <robert.walker@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Suzuki K Poulouse <suzuki.poulose@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Fixes: 7100b12cf474 ("perf cs-etm: Generate branch sample for exception packet")
Fixes: 24fff5eb2b93 ("perf cs-etm: Avoid stale branch samples when flush packet")
Link: http://lkml.kernel.org/r/20190428083228.20246-1-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Leo Yan [Sun, 28 Apr 2019 08:32:28 +0000 (16:32 +0800)]
perf cs-etm: Don't check cs_etm_queue::prev_packet validity
Since cs_etm_queue::prev_packet is allocated for all cases, it will
never be NULL pointer; now validity checking prev_packet is pointless,
remove all of them.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Tested-by: Robert Walker <robert.walker@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Suzuki K Poulouse <suzuki.poulose@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/20190428083228.20246-2-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Thomas Richter [Tue, 23 Apr 2019 10:53:03 +0000 (12:53 +0200)]
perf report: Report OOM in status line in the GTK UI
An -ENOMEM error is not reported in the GTK GUI. Instead this error
message pops up on the screen:
[root@m35lp76 perf]# ./perf report -i perf.data.error68-1
Processing events... [974K/3M]
Error:failed to process sample
0xf4198 [0x8]: failed to process type: 68
However when I use the same perf.data file with --stdio it works:
[root@m35lp76 perf]# ./perf report -i perf.data.error68-1 --stdio \
| head -12
# Total Lost Samples: 0
#
# Samples: 76K of event 'cycles'
# Event count (approx.):
99056160000
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. .........
#
8.81% find [kernel.kallsyms] [k] ftrace_likely_update
8.74% swapper [kernel.kallsyms] [k] ftrace_likely_update
8.34% sshd [kernel.kallsyms] [k] ftrace_likely_update
2.19% kworker/u512:1- [kernel.kallsyms] [k] ftrace_likely_update
The sample precentage is a bit low.....
The GUI always fails in the FINISHED_ROUND event (68) and does not
indicate the reason why.
When happened is the following. Perf report calls a lot of functions and
down deep when a FINISHED_ROUND event is processed, these functions are
called:
perf_session__process_event()
+ perf_session__process_user_event()
+ process_finished_round()
+ ordered_events__flush()
+ __ordered_events__flush()
+ do_flush()
+ ordered_events__deliver_event()
+ perf_session__deliver_event()
+ machine__deliver_event()
+ perf_evlist__deliver_event()
+ process_sample_event()
+ hist_entry_iter_add() --> only called in GUI case!!!
+ hist_iter__report__callback()
+ symbol__inc_addr_sample()
Now this functions runs out of memory and
returns -ENOMEM. This is reported all the way up
until function
perf_session__process_event() returns to its caller, where -ENOMEM is
changed to -EINVAL and processing stops:
if ((skip = perf_session__process_event(session, event, head)) < 0) {
pr_err("%#" PRIx64 " [%#x]: failed to process type: %d\n",
head, event->header.size, event->header.type);
err = -EINVAL;
goto out_err;
}
This occurred in the FINISHED_ROUND event when it has to process some
10000 entries and ran out of memory.
This patch indicates the root cause and displays it in the status line
of ther perf report GUI.
Output before (on GUI status line):
0xf4198 [0x8]: failed to process type: 68
Output after:
0xf4198 [0x8]: failed to process type: 68 [not enough memory]
Committer notes:
the 'skip' variable needs to be initialized to -EINVAL, so that when the
size is less than sizeof(struct perf_event_attr) we avoid this valid
compiler warning:
util/session.c: In function ‘perf_session__process_events’:
util/session.c:1936:7: error: ‘skip’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
err = skip;
~~~~^~~~~~
util/session.c:1874:6: note: ‘skip’ was declared here
s64 skip;
^~~~
cc1: all warnings being treated as errors
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Link: http://lkml.kernel.org/r/20190423105303.61683-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Thu, 25 Apr 2019 21:36:51 +0000 (18:36 -0300)]
perf bench numa: Add define for RUSAGE_THREAD if not present
While cross building perf to the ARC architecture on a fedora 30 host,
we were failing with:
CC /tmp/build/perf/bench/numa.o
bench/numa.c: In function ‘worker_thread’:
bench/numa.c:1261:12: error: ‘RUSAGE_THREAD’ undeclared (first use in this function); did you mean ‘SIGEV_THREAD’?
getrusage(RUSAGE_THREAD, &rusage);
^~~~~~~~~~~~~
SIGEV_THREAD
bench/numa.c:1261:12: note: each undeclared identifier is reported only once for each function it appears in
[perfbuilder@
60d5802468f6 perf]$ /arc_gnu_2019.03-rc1_prebuilt_uclibc_le_archs_linux_install/bin/arc-linux-gcc --version | head -1
arc-linux-gcc (ARCv2 ISA Linux uClibc toolchain 2019.03-rc1) 8.3.1
20190225
[perfbuilder@
60d5802468f6 perf]$
Trying to reproduce a report by Vineet, I noticed that, with just
cross-built zlib and numactl libraries, I ended up with the above
failure.
So, since RUSAGE_THREAD is available as a define, check for that and
numactl libraries, I ended up with the above failure.
So, since RUSAGE_THREAD is available as a define in the system headers,
check if it is defined in the 'perf bench numa' sources and define it if
not.
Now it builds and I have to figure out if the problem reported by Vineet
only takes place if we have libelf or some other library available.
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: linux-snps-arc@lists.infradead.org
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com>
Link: https://lkml.kernel.org/n/tip-2wb4r1gir9xrevbpq7qp0amk@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Leo Yan [Wed, 24 Apr 2019 01:38:02 +0000 (09:38 +0800)]
tools lib traceevent: Change tag string for error
The traceevent lib is used by the perf tool, and when executing
perf test -v 6
it outputs error log on the ARM64 platform:
running test 33 '*:*'trace-cmd: No such file or directory
[...]
trace-cmd: Invalid argument
The trace event parsing code originally came from trace-cmd so it keeps
the tag string "trace-cmd" for errors, this easily introduces the
impression that the perf tool launches trace-cmd command for trace event
parsing, but in fact the related parsing is accomplished by the
traceevent lib.
This patch changes the tag string to "libtraceevent" so that we can
avoid confusion and let users to more easily connect the error with
traceevent lib.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20190424013802.27569-1-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Thadeu Lima de Souza Cascardo [Wed, 3 Apr 2019 19:44:52 +0000 (16:44 -0300)]
perf annotate: Fix build on 32 bit for BPF annotation
Commit
6987561c9e86 ("perf annotate: Enable annotation of BPF programs") adds
support for BPF programs annotations but the new code does not build on 32-bit.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Acked-by: Song Liu <songliubraving@fb.com>
Fixes: 6987561c9e86 ("perf annotate: Enable annotation of BPF programs")
Link: http://lkml.kernel.org/r/20190403194452.10845-1-cascardo@canonical.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Mon, 22 Apr 2019 14:54:50 +0000 (11:54 -0300)]
tools uapi x86: Sync vmx.h with the kernel
To pick up the changes from:
2b27924bb1d4 ("KVM: nVMX: always use early vmcs check when EPT is disabled")
That causes this object in the tools/perf build process to be rebuilt:
CC /tmp/build/perf/arch/x86/util/kvm-stat.o
But it isn't using VMX_ABORT_ prefixed constants, so no change in
behaviour.
This silences this perf build warning:
Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/vmx.h' differs from latest version at 'arch/x86/include/uapi/asm/vmx.h'
diff -u tools/arch/x86/include/uapi/asm/vmx.h arch/x86/include/uapi/asm/vmx.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lkml.kernel.org/n/tip-bjbo3zc0r8i8oa0udpvftya6@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Bo YU [Mon, 22 Apr 2019 08:01:38 +0000 (04:01 -0400)]
perf bpf: Return value with unlocking in perf_env__find_btf()
In perf_env__find_btf(), we're returning without unlocking
"env->bpf_progs.lock". There may be cause lockdep issue.
Detected by CoversityScan, CID#
1444762:(program hangs(LOCK))
Signed-off-by: Bo YU <tsu.yubo@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org
Fixes: 2db7b1e0bd49d: (perf bpf: Return NULL when RB tree lookup fails in perf_env__find_btf())
Link: http://lkml.kernel.org/r/20190422080138.10088-1-tsu.yubo@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Kim Phillips [Thu, 2 May 2019 15:58:37 +0000 (15:58 +0000)]
MAINTAINERS: Include vendor specific files under arch/*/events/*
Add an explicit subdirectory specification for arch/x86/events/amd to
the MAINTAINERS file, to distinguish it from its parent. This will
produce the correct set of maintainers for the files found therein.
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Gary Hook <Gary.Hook@amd.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Liška <mliska@suse.cz>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Pu Wen <puwen@hygon.cn>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Lendacky <Thomas.Lendacky@amd.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Fixes: 39b0332a2158 ("perf/x86: Move perf_event_amd.c ........... => x86/events/amd/core.c")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Kim Phillips [Thu, 2 May 2019 15:29:47 +0000 (15:29 +0000)]
perf/x86/amd: Update generic hardware cache events for Family 17h
Add a new amd_hw_cache_event_ids_f17h assignment structure set
for AMD families 17h and above, since a lot has changed. Specifically:
L1 Data Cache
The data cache access counter remains the same on Family 17h.
For DC misses, PMCx041's definition changes with Family 17h,
so instead we use the L2 cache accesses from L1 data cache
misses counter (PMCx060,umask=0xc8).
For DC hardware prefetch events, Family 17h breaks compatibility
for PMCx067 "Data Prefetcher", so instead, we use PMCx05a "Hardware
Prefetch DC Fills."
L1 Instruction Cache
PMCs 0x80 and 0x81 (32-byte IC fetches and misses) are backward
compatible on Family 17h.
For prefetches, we remove the erroneous PMCx04B assignment which
counts how many software data cache prefetch load instructions were
dispatched.
LL - Last Level Cache
Removing PMCs 7D, 7E, and 7F assignments, as they do not exist
on Family 17h, where the last level cache is L3. L3 counters
can be accessed using the existing AMD Uncore driver.
Data TLB
On Intel machines, data TLB accesses ("dTLB-loads") are assigned
to counters that count load/store instructions retired. This
is inconsistent with instruction TLB accesses, where Intel
implementations report iTLB misses that hit in the STLB.
Ideally, dTLB-loads would count higher level dTLB misses that hit
in lower level TLBs, and dTLB-load-misses would report those
that also missed in those lower-level TLBs, therefore causing
a page table walk. That would be consistent with instruction
TLB operation, remove the redundancy between dTLB-loads and
L1-dcache-loads, and prevent perf from producing artificially
low percentage ratios, i.e. the "0.01%" below:
42,550,869 L1-dcache-loads
41,591,860 dTLB-loads
4,802 dTLB-load-misses # 0.01% of all dTLB cache hits
7,283,682 L1-dcache-stores
7,912,392 dTLB-stores
310 dTLB-store-misses
On AMD Families prior to 17h, the "Data Cache Accesses" counter is
used, which is slightly better than load/store instructions retired,
but still counts in terms of individual load/store operations
instead of TLB operations.
So, for AMD Families 17h and higher, this patch assigns "dTLB-loads"
to a counter for L1 dTLB misses that hit in the L2 dTLB, and
"dTLB-load-misses" to a counter for L1 DTLB misses that caused
L2 DTLB misses and therefore also caused page table walks. This
results in a much more accurate view of data TLB performance:
60,961,781 L1-dcache-loads
4,601 dTLB-loads
963 dTLB-load-misses # 20.93% of all dTLB cache hits
Note that for all AMD families, data loads and stores are combined
in a single accesses counter, so no 'L1-dcache-stores' are reported
separately, and stores are counted with loads in 'L1-dcache-loads'.
Also note that the "% of all dTLB cache hits" string is misleading
because (a) "dTLB cache": although TLBs can be considered caches for
page tables, in this context, it can be misinterpreted as data cache
hits because the figures are similar (at least on Intel), and (b) not
all those loads (technically accesses) technically "hit" at that
hardware level. "% of all dTLB accesses" would be more clear/accurate.
Instruction TLB
On Intel machines, 'iTLB-loads' measure iTLB misses that hit in the
STLB, and 'iTLB-load-misses' measure iTLB misses that also missed in
the STLB and completed a page table walk.
For AMD Family 17h and above, for 'iTLB-loads' we replace the
erroneous instruction cache fetches counter with PMCx084
"L1 ITLB Miss, L2 ITLB Hit".
For 'iTLB-load-misses' we still use PMCx085 "L1 ITLB Miss,
L2 ITLB Miss", but set a 0xff umask because without it the event
does not get counted.
Branch Predictor (BPU)
PMCs 0xc2 and 0xc3 continue to be valid across all AMD Families.
Node Level Events
Family 17h does not have a PMCx0e9 counter, and corresponding counters
have not been made available publicly, so for now, we mark them as
unsupported for Families 17h and above.
Reference:
"Open-Source Register Reference For AMD Family 17h Processors Models 00h-2Fh"
Released 7/17/2018, Publication #56255, Revision 3.03:
https://www.amd.com/system/files/TechDocs/56255_OSRR.pdf
[ mingo: tidied up the line breaks. ]
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Cc: <stable@vger.kernel.org> # v4.9+
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Liška <mliska@suse.cz>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Pu Wen <puwen@hygon.cn>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Lendacky <Thomas.Lendacky@amd.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Cc: linux-perf-users@vger.kernel.org
Fixes: e40ed1542dd7 ("perf/x86: Add perf support for AMD family-17h processors")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Linus Torvalds [Wed, 1 May 2019 21:57:23 +0000 (14:57 -0700)]
Merge tag 'for-v5.1-rc' of git://git./linux/kernel/git/sre/linux-power-supply
Pull power supply fixes from Sebastian Reichel:
"Two more fixes for the 5.1 cycle.
One division by zero fix in a specific driver and one core workaround
for bad userspace behaviour from systemd regarding uevents. IMHO this
can be considered to be a userspace bug, but the debug messages are
useless anyways
- cpcap-battery: fix a division by zero
- core: fix systemd issue due to log messages produced by uevent"
* tag 'for-v5.1-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply:
power: supply: sysfs: prevent endless uevent loop with CONFIG_POWER_SUPPLY_DEBUG
power: supply: cpcap-battery: Fix division by zero
Linus Torvalds [Wed, 1 May 2019 20:40:30 +0000 (13:40 -0700)]
Merge tag 'arc-5.1-final' of git://git./linux/kernel/git/vgupta/arc
Pull ARC fixes from Vineet Gupta:
"A few minor fixes for ARC.
- regression in memset if line size !64
- avoid panic if PAE and IOC"
* tag 'arc-5.1-final' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: memset: fix build with L1_CACHE_SHIFT != 6
ARC: [hsdk] Make it easier to add PAE40 region to DTB
ARC: PAE40: don't panic and instead turn off hw ioc
Linus Torvalds [Wed, 1 May 2019 20:03:39 +0000 (13:03 -0700)]
Merge tag 'acpi-5.1-rc8' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Revert a recent ACPICA change that caused initialization to fail on
systems with Thunderbolt docking stations connected at the init time"
* tag 'acpi-5.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Revert "ACPICA: Clear status of GPEs before enabling them"
Linus Torvalds [Wed, 1 May 2019 19:19:20 +0000 (12:19 -0700)]
gcc-9: don't warn about uninitialized btrfs extent_type variable
The 'extent_type' variable does seem to be reliably initialized, but
it's _very_ non-obvious, since there's a "goto next" case that jumps
over the normal initialization. That will then always trigger the
"start >= extent_end" test, which will end up never falling through to
the use of that variable.
But the code is certainly not obvious, and the compiler warning looks
reasonable. Make 'extent_type' an int, and initialize it to an invalid
negative value, which seems to be the common pattern in other places.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Wed, 1 May 2019 18:20:53 +0000 (11:20 -0700)]
gcc-9: properly declare the {pv,hv}clock_page storage
The pvlock_page and hvclock_page variables are (as the name implies)
addresses to pages, created by the linker script.
But we declared them as just "extern u8" variables, which _works_, but
now that gcc does some more bounds checking, it causes warnings like
warning: array subscript 1 is outside array bounds of ‘u8[1]’
when we then access more than one byte from those variables.
Fix this by simply making the declaration of the variables match
reality, which makes the compiler happy too.
Signed-off-by: Linus Torvalds <torvalds@-linux-foundation.org>
Linus Torvalds [Wed, 1 May 2019 18:07:40 +0000 (11:07 -0700)]
gcc-9: don't warn about uninitialized variable
I'm not sure what made gcc warn about this code now. The 'ret' variable
does end up initialized in all cases, but it's definitely not obvious,
so the compiler is quite reasonable to warn about this.
So just add initialization to make it all much more obvious both to
compilers and to humans.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Wed, 1 May 2019 18:05:41 +0000 (11:05 -0700)]
gcc-9: silence 'address-of-packed-member' warning
We already did this for clang, but now gcc has that warning too. Yes,
yes, the address may be unaligned. And that's kind of the point.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 30 Apr 2019 22:03:00 +0000 (15:03 -0700)]
Merge tag 'fsnotify_for_v5.1-rc8' of git://git./linux/kernel/git/jack/linux-fs
Pull fsnotify fix from Jan Kara:
"A fix of user trigerable NULL pointer dereference syzbot has recently
spotted.
The problem was introduced in this merge window so no CC stable is
needed"
* tag 'fsnotify_for_v5.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fsnotify: Fix NULL ptr deref in fanotify_get_fsid()
Rafael J. Wysocki [Tue, 30 Apr 2019 09:18:21 +0000 (11:18 +0200)]
Revert "ACPICA: Clear status of GPEs before enabling them"
Revert commit
c8b1917c8987 ("ACPICA: Clear status of GPEs before
enabling them") that causes problems with Thunderbolt controllers
to occur if a dock device is connected at init time (the xhci_hcd
and thunderbolt modules crash which prevents peripherals connected
through them from working).
Commit
c8b1917c8987 effectively causes commit
ecc1165b8b74 ("ACPICA:
Dispatch active GPEs at init time") to get undone, so the problem
addressed by commit
ecc1165b8b74 appears again as a result of it.
Fixes: c8b1917c8987 ("ACPICA: Clear status of GPEs before enabling them")
Link: https://lore.kernel.org/lkml/s5hy33siofw.wl-tiwai@suse.de/T/#u
Link: https://bugzilla.opensuse.org/show_bug.cgi?id=1132943
Reported-by: Michael Hirmke <opensuse@mike.franken.de>
Reported-by: Takashi Iwai <tiwai@suse.de>
Cc: 4.17+ <stable@vger.kernel.org> # 4.17+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Linus Torvalds [Tue, 30 Apr 2019 15:41:22 +0000 (08:41 -0700)]
Merge tag 'usb-5.1-rc8' of git://git./linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some small USB fixes for a bunch of warnings/errors that the
syzbot has been finding with it's new-found ability to stress-test the
USB layer.
All of these are tiny, but fix real issues, and are marked for stable
as well. All of these have had lots of testing in linux-next as well"
* tag 'usb-5.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
USB: w1 ds2490: Fix bug caused by improper use of altsetting array
USB: yurex: Fix protection fault after device removal
usb: usbip: fix isoc packet num validation in get_pipe
USB: core: Fix bug caused by duplicate interface PM usage counter
USB: dummy-hcd: Fix failure to give back unlinked URBs
USB: core: Fix unterminated string returned by usb_string()
Linus Torvalds [Tue, 30 Apr 2019 15:38:02 +0000 (08:38 -0700)]
Merge tag 'selinux-pr-
20190429' of git://git./linux/kernel/git/pcmoore/selinux
Pull selinux fix from Paul Moore:
"One small patch for the stable folks to fix a problem when building
against the latest glibc.
I'll be honest and say that I'm not really thrilled with the idea of
sending this up right now, but Greg is a little annoyed so here I
figured I would at least send this"
* tag 'selinux-pr-
20190429' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: use kernel linux/socket.h for genheaders and mdp
Linus Torvalds [Mon, 29 Apr 2019 20:24:34 +0000 (13:24 -0700)]
Merge tag 'seccomp-v5.1-rc8' of git://git./linux/kernel/git/kees/linux
Pull seccomp fixes from Kees Cook:
"Syzbot found a use-after-free bug in seccomp due to flags that should
not be allowed to be used together.
Tycho fixed this, I updated the self-tests, and the syzkaller PoC has
been running for several days without triggering KASan (before this
fix, it would reproduce). These patches have also been in -next for
almost a week, just to be sure.
- Add logic for making some seccomp flags exclusive (Tycho)
- Update selftests for exclusivity testing (Kees)"
* tag 'seccomp-v5.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
seccomp: Make NEW_LISTENER and TSYNC flags exclusive
selftests/seccomp: Prepare for exclusive seccomp flags
Linus Torvalds [Mon, 29 Apr 2019 16:51:29 +0000 (09:51 -0700)]
x86: make ZERO_PAGE() at least parse its argument
This doesn't really do anything, but at least we now parse teh
ZERO_PAGE() address argument so that we'll catch the most obvious errors
in usage next time they'll happen.
See commit
6a5c5d26c4c6 ("rdma: fix build errors on s390 and MIPS due to
bad ZERO_PAGE use") what happens when we don't have any use of the macro
argument at all.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 29 Apr 2019 16:48:53 +0000 (09:48 -0700)]
rdma: fix build errors on s390 and MIPS due to bad ZERO_PAGE use
The parameter to ZERO_PAGE() was wrong, but since all architectures
except for MIPS and s390 ignore it, it wasn't noticed until 0-day
reported the build error.
Fixes: 67f269b37f9b ("RDMA/ucontext: Fix regression with disassociate")
Cc: stable@vger.kernel.org
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paulo Alcantara [Mon, 25 Feb 2019 00:55:28 +0000 (21:55 -0300)]
selinux: use kernel linux/socket.h for genheaders and mdp
When compiling genheaders and mdp from a newer host kernel, the
following error happens:
In file included from scripts/selinux/genheaders/genheaders.c:18:
./security/selinux/include/classmap.h:238:2: error: #error New
address family defined, please update secclass_map. #error New
address family defined, please update secclass_map. ^~~~~
make[3]: *** [scripts/Makefile.host:107:
scripts/selinux/genheaders/genheaders] Error 1 make[2]: ***
[scripts/Makefile.build:599: scripts/selinux/genheaders] Error 2
make[1]: *** [scripts/Makefile.build:599: scripts/selinux] Error 2
make[1]: *** Waiting for unfinished jobs....
Instead of relying on the host definition, include linux/socket.h in
classmap.h to have PF_MAX.
Cc: stable@vger.kernel.org
Signed-off-by: Paulo Alcantara <paulo@paulo.ac>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
[PM: manually merge in mdp.c, subject line tweaks]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Linus Torvalds [Mon, 29 Apr 2019 00:04:13 +0000 (17:04 -0700)]
Linux 5.1-rc7
Jan Kara [Wed, 24 Apr 2019 16:39:57 +0000 (18:39 +0200)]
fsnotify: Fix NULL ptr deref in fanotify_get_fsid()
fanotify_get_fsid() is reading mark->connector->fsid under srcu. It can
happen that it sees mark not fully initialized or mark that is already
detached from the object list. In these cases mark->connector
can be NULL leading to NULL ptr dereference. Fix the problem by
being careful when reading mark->connector and check it for being NULL.
Also use WRITE_ONCE when writing the mark just to prevent compiler from
doing something stupid.
Reported-by: syzbot+15927486a4f1bfcbaf91@syzkaller.appspotmail.com
Fixes: 77115225acc6 ("fanotify: cache fsid in fsnotify_mark_connector")
Signed-off-by: Jan Kara <jack@suse.cz>
Linus Torvalds [Sun, 28 Apr 2019 17:50:57 +0000 (10:50 -0700)]
Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
"A small number of ARM fixes
- Fix function tracer and unwinder dependencies so that we don't end
up building kernels that will crash
- Fix ARMv7M nommu initialisation (missing register initialisation)
- Fix EFI decompressor entry (ensuring barrier instructions are
enabled prior to use)"
* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 8857/1: efi: enable CP15 DMB instructions before cleaning the cache
ARM: 8856/1: NOMMU: Fix CCR register faulty initialization when MPU is disabled
ARM: fix function graph tracer and unwinder dependencies
Linus Torvalds [Sun, 28 Apr 2019 17:43:15 +0000 (10:43 -0700)]
Merge tag 'powerpc-5.1-6' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"A one-liner to make our Radix MMU support depend on HUGETLB_PAGE. We
use some of the hugetlb inlines (eg. pud_huge()) when operating on the
linear mapping and if they're compiled into empty wrappers we can
corrupt memory.
Then two fixes to our VFIO IOMMU code. The first is not a regression
but fixes the locking to avoid a user-triggerable deadlock.
The second does fix a regression since rc1, and depends on the first
fix. It makes it possible to run guests with large amounts of memory
again (~256GB).
Thanks to Alexey Kardashevskiy"
* tag 'powerpc-5.1-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/mm_iommu: Allow pinning large regions
powerpc/mm_iommu: Fix potential deadlock
powerpc/mm/radix: Make Radix require HUGETLB_PAGE
Linus Torvalds [Sun, 28 Apr 2019 17:06:32 +0000 (10:06 -0700)]
Merge tag 'for-linus-
20190428' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"A set of io_uring fixes that should go into this release. In
particular, this contains:
- The mutex lock vs ctx ref count fix (me)
- Removal of a dead variable (me)
- Two race fixes (Stefan)
- Ring head/tail condition fix for poll full SQ detection (Stefan)"
* tag 'for-linus-
20190428' of git://git.kernel.dk/linux-block:
io_uring: remove 'state' argument from io_{read,write} path
io_uring: fix poll full SQ detection
io_uring: fix race condition when sq threads goes sleeping
io_uring: fix race condition reading SQ entries
io_uring: fail io_uring_register(2) on a dying io_uring instance
Linus Torvalds [Sun, 28 Apr 2019 17:00:45 +0000 (10:00 -0700)]
Merge tag 'for-linus' of git://git./linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"One core bug fix and a few driver ones
- FRWR memory registration for hfi1/qib didn't work with with some
iovas causing a NFSoRDMA failure regression due to a fix in the NFS
side
- A command flow error in mlx5 allowed user space to send a corrupt
command (and also smash the kernel stack we've since learned)
- Fix a regression and some bugs with device hot unplug that was
discovered while reviewing Andrea's patches
- hns has a failure if the user asks for certain QP configurations"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/hns: Bugfix for mapping user db
RDMA/ucontext: Fix regression with disassociate
RDMA/mlx5: Use rdma_user_map_io for mapping BAR pages
RDMA/mlx5: Do not allow the user to write to the clock page
IB/mlx5: Fix scatter to CQE in DCT QP creation
IB/rdmavt: Fix frwr memory registration
Linus Torvalds [Sun, 28 Apr 2019 16:45:18 +0000 (09:45 -0700)]
Merge tag 'dmaengine-fix-5.1-rc7' of git://git.infradead.org/users/vkoul/slave-dma
Pull dmaengine fixes from Vinod Koul:
- fix for wrong register use in mediatek driver
- fix in sh driver for glitch is tx_status and treating 0 a valid
residue for cyclic
- fix in bcm driver for using right memory allocation flag
* tag 'dmaengine-fix-5.1-rc7' of git://git.infradead.org/users/vkoul/slave-dma:
dmaengine: mediatek-cqdma: fix wrong register usage in mtk_cqdma_start
dmaengine: sh: rcar-dmac: Fix glitch in dmaengine_tx_status
dmaengine: sh: rcar-dmac: With cyclic DMA residue 0 is valid
dmaengine: bcm2835: Avoid GFP_KERNEL in device_prep_slave_sg
Linus Torvalds [Sat, 27 Apr 2019 23:27:02 +0000 (16:27 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
"Just a couple of fixups for Synaptics RMI4 driver and allowing
snvs_pwrkey to be selected on more boards"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: synaptics-rmi4 - write config register values to the right offset
Input: synaptics-rmi4 - fix possible double free
Input: snvs_pwrkey - make it depend on ARCH_MXC
Linus Torvalds [Sat, 27 Apr 2019 17:21:29 +0000 (10:21 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
- Fix an early boot crash in the RSDP parsing code by effectively
turning off the parsing call - we ran out of time but want to fix the
regression. The more involved fix is being worked on.
- Fix a crash that can trigger in the kmemlek code.
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Fix a crash with kmemleak_scan()
x86/boot: Disable RSDP parsing temporarily
Linus Torvalds [Sat, 27 Apr 2019 17:18:40 +0000 (10:18 -0700)]
Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull scheduler fix from Ingo Molnar:
"Fix a division by zero bug that can trigger in the NUMA placement
code"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/numa: Fix a possible divide-by-zero
Linus Torvalds [Sat, 27 Apr 2019 16:41:14 +0000 (09:41 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf fix from Ingo Molnar:
"A cstate event enumeration fix for Kaby/Coffee Lake CPUs"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Update KBL Package C-state events to also include PC8/PC9/PC10 counters
Linus Torvalds [Thu, 25 Apr 2019 23:13:58 +0000 (16:13 -0700)]
slip: make slhc_free() silently accept an error pointer
This way, slhc_free() accepts what slhc_init() returns, whether that is
an error or not.
In particular, the pattern in sl_alloc_bufs() is
slcomp = slhc_init(16, 16);
...
slhc_free(slcomp);
for the error handling path, and rather than complicate that code, just
make it ok to always free what was returned by the init function.
That's what the code used to do before commit
4ab42d78e37a ("ppp, slip:
Validate VJ compression slot parameters completely") when slhc_init()
just returned NULL for the error case, with no actual indication of the
details of the error.
Reported-by: syzbot+45474c076a4927533d2e@syzkaller.appspotmail.com
Fixes: 4ab42d78e37a ("ppp, slip: Validate VJ compression slot parameters completely")
Acked-by: Ben Hutchings <ben@decadent.org.uk>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 27 Apr 2019 01:15:33 +0000 (18:15 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"9 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
fs/proc/proc_sysctl.c: Fix a NULL pointer dereference
mm/page_alloc.c: fix never set ALLOC_NOFRAGMENT flag
mm/page_alloc.c: avoid potential NULL pointer dereference
mm, page_alloc: always use a captured page regardless of compaction result
mm: do not boost watermarks to avoid fragmentation for the DISCONTIG memory model
lib/test_vmalloc.c: do not create cpumask_t variable on stack
lib/Kconfig.debug: fix build error without CONFIG_BLOCK
zram: pass down the bvec we need to read into in the work struct
mm/memory_hotplug.c: drop memory device reference after find_memory_block()
Lucas Stach [Sat, 27 Apr 2019 00:22:01 +0000 (17:22 -0700)]
Input: synaptics-rmi4 - write config register values to the right offset
Currently any changed config register values don't take effect, as the
function to write them back is called with the wrong register offset.
Fixes: ff8f83708b3e (Input: synaptics-rmi4 - add support for 2D
sensors and F11)
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Linus Torvalds [Fri, 26 Apr 2019 18:26:53 +0000 (11:26 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- keep the tail of an unaligned initrd reserved
- adjust ftrace_make_call() to deal with the relative nature of PLTs
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64/module: ftrace: deal with place relative nature of PLTs
arm64: mm: Ensure tail of unaligned initrd is reserved
Linus Torvalds [Fri, 26 Apr 2019 18:09:55 +0000 (11:09 -0700)]
Merge tag 'trace-v5.1-rc6' of git://git./linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"Three tracing fixes:
- Use "nosteal" for ring buffer splice pages
- Memory leak fix in error path of trace_pid_write()
- Fix preempt_enable_no_resched() (use preempt_enable()) in ring
buffer code"
* tag 'trace-v5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
trace: Fix preempt_enable_no_resched() abuse
tracing: Fix a memory leak by early error exit in trace_pid_write()
tracing: Fix buffer_ref pipe ops
Linus Torvalds [Fri, 26 Apr 2019 17:46:22 +0000 (10:46 -0700)]
Merge tag 'gpio-v5.1-3' of git://git./linux/kernel/git/linusw/linux-gpio
Pull GPIO fixes from Linus Walleij:
"Not much to say about them, regular fixes:
- Fix a bug on the errorpath of gpiochip_add_data_with_key()
- IRQ type setting on the spreadtrum GPIO driver"
* tag 'gpio-v5.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: Fix gpiochip_add_data_with_key() error path
gpio: eic: sprd: Fix incorrect irq type setting for the sync EIC
Linus Torvalds [Fri, 26 Apr 2019 17:39:46 +0000 (10:39 -0700)]
Merge tag 'drm-fixes-2019-04-26' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Regular drm fixes, nothing too outstanding, I'm guessing Easter was
slowing people down.
i915:
- FEC enable fix
- BXT display lanes fix
ttm:
- fix reinit for reloading drivers regression
imx:
- DP CSC fix
sun4i:
- module unload/load fix
vc4:
- memory leak fix
- compile fix
dw-hdmi:
- rockchip scdc overflow fix
sched:
- docs fix
vmwgfx:
- dma api layering fix"
* tag 'drm-fixes-2019-04-26' of git://anongit.freedesktop.org/drm/drm:
drm/bridge: dw-hdmi: fix SCDC configuration for ddc-i2c-bus
drm/vmwgfx: Fix dma API layer violation
drm/vc4: Fix compilation error reported by kbuild test bot
drm/sun4i: Unbind components before releasing DRM and memory
drm/vc4: Fix memory leak during gpu reset.
drm/sched: Fix description of drm_sched_stop
drm/imx: don't skip DP channel disable for background plane
gpu: ipu-v3: dp: fix CSC handling
drm/ttm: fix re-init of global structures
drm/sun4i: Fix component unbinding and component master deletion
drm/sun4i: Set device driver data at bind time for use in unbind
drm/sun4i: Add missing drm_atomic_helper_shutdown at driver unbind
drm/i915: Restore correct bxt_ddi_phy_calc_lane_lat_optim_mask() calculation
drm/i915: Do not enable FEC without DSC
drm: bridge: dw-hdmi: Fix overflow workaround for Rockchip SoCs
Linus Torvalds [Fri, 26 Apr 2019 16:46:46 +0000 (09:46 -0700)]
Merge tag 'for-5.1-rc6-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fix from David Sterba:
"One patch to fix a crash in io submission path, due to memory
allocation errors.
In short, the multipage bio work that landed in 5.1 caused larger bios
that in turn require larger temporary memory for checksums. The patch
is a workaround, we're going to rework the allocation so it does not
require the vmalloc fallback.
It took a while to identify that it's caused by patches in 5.1 and not
a patchset that did some changes in error handling in the code. I've
tested it on various memory/cpu combinations, it could hit OOM but
does not crash.
The timestamp of the patch is less than a day due to updates in the
changelog, tests were running meanwhile"
* tag 'for-5.1-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: Switch memory allocations in async csum calculation path to kvmalloc
Linus Torvalds [Fri, 26 Apr 2019 16:45:39 +0000 (09:45 -0700)]
Merge tag '5.1-rc6-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"Three small SMB3 fixes (all for stable as well): two leaks and a
rename bug"
* tag '5.1-rc6-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: fix page reference leak with readv/writev
cifs: do not attempt cifs operation on smb2+ rename error
cifs: fix memory leak in SMB2_read
YueHaibing [Fri, 26 Apr 2019 05:24:05 +0000 (22:24 -0700)]
fs/proc/proc_sysctl.c: Fix a NULL pointer dereference
Syzkaller report this:
sysctl could not get directory: /net//bridge -12
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN PTI
CPU: 1 PID: 7027 Comm: syz-executor.0 Tainted: G C 5.1.0-rc3+ #8
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
RIP: 0010:__write_once_size include/linux/compiler.h:220 [inline]
RIP: 0010:__rb_change_child include/linux/rbtree_augmented.h:144 [inline]
RIP: 0010:__rb_erase_augmented include/linux/rbtree_augmented.h:186 [inline]
RIP: 0010:rb_erase+0x5f4/0x19f0 lib/rbtree.c:459
Code: 00 0f 85 60 13 00 00 48 89 1a 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 89 f2 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 75 0c 00 00 4d 85 ed 4c 89 2e 74 ce 4c 89 ea 48
RSP: 0018:
ffff8881bb507778 EFLAGS:
00010206
RAX:
dffffc0000000000 RBX:
ffff8881f224b5b8 RCX:
ffffffff818f3f6a
RDX:
000000000000000a RSI:
0000000000000050 RDI:
ffff8881f224b568
RBP:
0000000000000000 R08:
ffffed10376a0ef4 R09:
ffffed10376a0ef4
R10:
0000000000000001 R11:
ffffed10376a0ef4 R12:
ffff8881f224b558
R13:
0000000000000000 R14:
0000000000000000 R15:
0000000000000000
FS:
00007f3e7ce13700(0000) GS:
ffff8881f7300000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
00007fd60fbe9398 CR3:
00000001cb55c001 CR4:
00000000007606e0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
PKRU:
55555554
Call Trace:
erase_entry fs/proc/proc_sysctl.c:178 [inline]
erase_header+0xe3/0x160 fs/proc/proc_sysctl.c:207
start_unregistering fs/proc/proc_sysctl.c:331 [inline]
drop_sysctl_table+0x558/0x880 fs/proc/proc_sysctl.c:1631
get_subdir fs/proc/proc_sysctl.c:1022 [inline]
__register_sysctl_table+0xd65/0x1090 fs/proc/proc_sysctl.c:1335
br_netfilter_init+0x68/0x1000 [br_netfilter]
do_one_initcall+0xbc/0x47d init/main.c:901
do_init_module+0x1b5/0x547 kernel/module.c:3456
load_module+0x6405/0x8c10 kernel/module.c:3804
__do_sys_finit_module+0x162/0x190 kernel/module.c:3898
do_syscall_64+0x9f/0x450 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Modules linked in: br_netfilter(+) backlight comedi(C) hid_sensor_hub max3100 ti_ads8688 udc_core fddi snd_mona leds_gpio rc_streamzap mtd pata_netcell nf_log_common rc_winfast udp_tunnel snd_usbmidi_lib snd_usb_toneport snd_usb_line6 snd_rawmidi snd_seq_device snd_hwdep videobuf2_v4l2 videobuf2_common videodev media videobuf2_vmalloc videobuf2_memops rc_gadmei_rm008z 8250_of smm665 hid_tmff hid_saitek hwmon_vid rc_ati_tv_wonder_hd_600 rc_core pata_pdc202xx_old dn_rtmsg as3722 ad714x_i2c ad714x snd_soc_cs4265 hid_kensington panel_ilitek_ili9322 drm drm_panel_orientation_quirks ipack cdc_phonet usbcore phonet hid_jabra hid extcon_arizona can_dev industrialio_triggered_buffer kfifo_buf industrialio adm1031 i2c_mux_ltc4306 i2c_mux ipmi_msghandler mlxsw_core snd_soc_cs35l34 snd_soc_core snd_pcm_dmaengine snd_pcm snd_timer ac97_bus snd_compress snd soundcore gpio_da9055 uio ecdh_generic mdio_thunder of_mdio fixed_phy libphy mdio_cavium iptable_security iptable_raw iptable_mangle
iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bpfilter ip6_vti ip_vti ip_gre ipip sit tunnel4 ip_tunnel hsr veth netdevsim vxcan batman_adv cfg80211 rfkill chnl_net caif nlmon dummy team bonding vcan bridge stp llc ip6_gre gre ip6_tunnel tunnel6 tun joydev mousedev ppdev tpm kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel ide_pci_generic piix aes_x86_64 crypto_simd cryptd ide_core glue_helper input_leds psmouse intel_agp intel_gtt serio_raw ata_generic i2c_piix4 agpgart pata_acpi parport_pc parport floppy rtc_cmos sch_fq_codel ip_tables x_tables sha1_ssse3 sha1_generic ipv6 [last unloaded: br_netfilter]
Dumping ftrace buffer:
(ftrace buffer empty)
---[ end trace
68741688d5fbfe85 ]---
commit
23da9588037e ("fs/proc/proc_sysctl.c: fix NULL pointer
dereference in put_links") forgot to handle start_unregistering() case,
while header->parent is NULL, it calls erase_header() and as seen in the
above syzkaller call trace, accessing &header->parent->root will trigger
a NULL pointer dereference.
As that commit explained, there is also no need to call
start_unregistering() if header->parent is NULL.
Link: http://lkml.kernel.org/r/20190409153622.28112-1-yuehaibing@huawei.com
Fixes: 23da9588037e ("fs/proc/proc_sysctl.c: fix NULL pointer dereference in put_links")
Fixes: 0e47c99d7fe25 ("sysctl: Replace root_list with links between sysctl_table_sets")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrey Ryabinin [Fri, 26 Apr 2019 05:24:01 +0000 (22:24 -0700)]
mm/page_alloc.c: fix never set ALLOC_NOFRAGMENT flag
Commit
0a79cdad5eb2 ("mm: use alloc_flags to record if kswapd can wake")
removed setting of the ALLOC_NOFRAGMENT flag. Bring it back.
The runtime effect is that ALLOC_NOFRAGMENT behaviour is restored so
that allocations are spread across local zones to avoid fragmentation
due to mixing pageblocks as long as possible.
Link: http://lkml.kernel.org/r/20190423120806.3503-2-aryabinin@virtuozzo.com
Fixes: 0a79cdad5eb2 ("mm: use alloc_flags to record if kswapd can wake")
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrey Ryabinin [Fri, 26 Apr 2019 05:23:58 +0000 (22:23 -0700)]
mm/page_alloc.c: avoid potential NULL pointer dereference
ac.preferred_zoneref->zone passed to alloc_flags_nofragment() can be NULL.
'zone' pointer unconditionally derefernced in alloc_flags_nofragment().
Bail out on NULL zone to avoid potential crash. Currently we don't see
any crashes only because alloc_flags_nofragment() has another bug which
allows compiler to optimize away all accesses to 'zone'.
Link: http://lkml.kernel.org/r/20190423120806.3503-1-aryabinin@virtuozzo.com
Fixes: 6bb154504f8b ("mm, page_alloc: spread allocations across zones before introducing fragmentation")
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Fri, 26 Apr 2019 05:23:54 +0000 (22:23 -0700)]
mm, page_alloc: always use a captured page regardless of compaction result
During the development of commit
5e1f0f098b46 ("mm, compaction: capture
a page under direct compaction"), a paranoid check was added to ensure
that if a captured page was available after compaction that it was
consistent with the final state of compaction. The intent was to catch
serious programming bugs such as using a stale page pointer and causing
corruption problems.
However, it is possible to get a captured page even if compaction was
unsuccessful if an interrupt triggered and happened to free pages in
interrupt context that got merged into a suitable high-order page. It's
highly unlikely but Li Wang did report the following warning on s390
occuring when testing OOM handling. Note that the warning is slightly
edited for clarity.
WARNING: CPU: 0 PID: 9783 at mm/page_alloc.c:3777 __alloc_pages_direct_compact+0x182/0x190
Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs
lockd grace fscache sunrpc pkey ghash_s390 prng xts aes_s390
des_s390 des_generic sha512_s390 zcrypt_cex4 zcrypt vmur binfmt_misc
ip_tables xfs libcrc32c dasd_fba_mod qeth_l2 dasd_eckd_mod dasd_mod
qeth qdio lcs ctcm ccwgroup fsm dm_mirror dm_region_hash dm_log
dm_mod
CPU: 0 PID: 9783 Comm: copy.sh Kdump: loaded Not tainted 5.1.0-rc 5 #1
This patch simply removes the check entirely instead of trying to be
clever about pages freed from interrupt context. If a serious
programming error was introduced, it is highly likely to be caught by
prep_new_page() instead.
Link: http://lkml.kernel.org/r/20190419085133.GH18914@techsingularity.net
Fixes: 5e1f0f098b46 ("mm, compaction: capture a page under direct compaction")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: Li Wang <liwang@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Fri, 26 Apr 2019 05:23:51 +0000 (22:23 -0700)]
mm: do not boost watermarks to avoid fragmentation for the DISCONTIG memory model
Mikulas Patocka reported that commit
1c30844d2dfe ("mm: reclaim small
amounts of memory when an external fragmentation event occurs") "broke"
memory management on parisc.
The machine is not NUMA but the DISCONTIG model creates three pgdats
even though it's a UMA machine for the following ranges
0) Start 0x0000000000000000 End 0x000000003fffffff Size 1024 MB
1) Start 0x0000000100000000 End 0x00000001bfdfffff Size 3070 MB
2) Start 0x0000004040000000 End 0x00000040ffffffff Size 3072 MB
Mikulas reported:
With the patch
1c30844d2, the kernel will incorrectly reclaim the
first zone when it fills up, ignoring the fact that there are two
completely free zones. Basiscally, it limits cache size to 1GiB.
For example, if I run:
# dd if=/dev/sda of=/dev/null bs=1M count=2048
- with the proper kernel, there should be "Buffers - 2GiB"
when this command finishes. With the patch
1c30844d2, buffers
will consume just 1GiB or slightly more, because the kernel was
incorrectly reclaiming them.
The page allocator and reclaim makes assumptions that pgdats really
represent NUMA nodes and zones represent ranges and makes decisions on
that basis. Watermark boosting for small pgdats leads to unexpected
results even though this would have behaved reasonably on SPARSEMEM.
DISCONTIG is essentially deprecated and even parisc plans to move to
SPARSEMEM so there is no need to be fancy, this patch simply disables
watermark boosting by default on DISCONTIGMEM.
Link: http://lkml.kernel.org/r/20190419094335.GJ18914@techsingularity.net
Fixes: 1c30844d2dfe ("mm: reclaim small amounts of memory when an external fragmentation event occurs")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Tested-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Uladzislau Rezki (Sony) [Fri, 26 Apr 2019 05:23:47 +0000 (22:23 -0700)]
lib/test_vmalloc.c: do not create cpumask_t variable on stack
On my "Intel(R) Xeon(R) W-2135 CPU @ 3.70GHz" system(12 CPUs) i get the
warning from the compiler about frame size:
warning: the frame size of 1096 bytes is larger than 1024 bytes [-Wframe-larger-than=]
the size of cpumask_t depends on number of CPUs, therefore just make use
of cpumask_of() in set_cpus_allowed_ptr() as a second argument.
Link: http://lkml.kernel.org/r/20190418193925.9361-1-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Roman Gushchin <guro@fb.com>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
YueHaibing [Fri, 26 Apr 2019 05:23:44 +0000 (22:23 -0700)]
lib/Kconfig.debug: fix build error without CONFIG_BLOCK
If CONFIG_TEST_KMOD is set to M, while CONFIG_BLOCK is not set, XFS and
BTRFS can not be compiled successly.
Link: http://lkml.kernel.org/r/20190410075434.35220-1-yuehaibing@huawei.com
Fixes: d9c6a72d6fa2 ("kmod: add test driver to stress test the module loader")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jérôme Glisse [Fri, 26 Apr 2019 05:23:41 +0000 (22:23 -0700)]
zram: pass down the bvec we need to read into in the work struct
When scheduling work item to read page we need to pass down the proper
bvec struct which points to the page to read into. Before this patch it
uses a randomly initialized bvec (only if PAGE_SIZE != 4096) which is
wrong.
Note that without this patch on arch/kernel where PAGE_SIZE != 4096
userspace could read random memory through a zram block device (thought
userspace probably would have no control on the address being read).
Link: http://lkml.kernel.org/r/20190408183219.26377-1-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Hildenbrand [Fri, 26 Apr 2019 05:23:37 +0000 (22:23 -0700)]
mm/memory_hotplug.c: drop memory device reference after find_memory_block()
Right now we are using find_memory_block() to get the node id for the
pfn range to online. We are missing to drop a reference to the memory
block device. While the device still gets unregistered via
device_unregister(), resulting in no user visible problem, the device is
never released via device_release(), resulting in a memory leak. Fix
that by properly using a put_device().
Link: http://lkml.kernel.org/r/20190411110955.1430-1-david@redhat.com
Fixes: d0dc12e86b31 ("mm/memory_hotplug: optimize memory hotplug")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Pankaj Gupta <pagupta@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Peter Zijlstra [Tue, 23 Apr 2019 20:03:18 +0000 (22:03 +0200)]
trace: Fix preempt_enable_no_resched() abuse
Unless the very next line is schedule(), or implies it, one must not use
preempt_enable_no_resched(). It can cause a preemption to go missing and
thereby cause arbitrary delays, breaking the PREEMPT=y invariant.
Link: http://lkml.kernel.org/r/20190423200318.GY14281@hirez.programming.kicks-ass.net
Cc: Waiman Long <longman@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: the arch/x86 maintainers <x86@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: huang ying <huang.ying.caritas@gmail.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: stable@vger.kernel.org
Fixes: 2c2d7329d8af ("tracing/ftrace: use preempt_enable_no_resched_notrace in ring_buffer_time_stamp()")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Wenwen Wang [Sat, 20 Apr 2019 02:22:59 +0000 (21:22 -0500)]
tracing: Fix a memory leak by early error exit in trace_pid_write()
In trace_pid_write(), the buffer for trace parser is allocated through
kmalloc() in trace_parser_get_init(). Later on, after the buffer is used,
it is then freed through kfree() in trace_parser_put(). However, it is
possible that trace_pid_write() is terminated due to unexpected errors,
e.g., ENOMEM. In that case, the allocated buffer will not be freed, which
is a memory leak bug.
To fix this issue, free the allocated buffer when an error is encountered.
Link: http://lkml.kernel.org/r/1555726979-15633-1-git-send-email-wang6495@umn.edu
Fixes: f4d34a87e9c10 ("tracing: Use pid bitmap instead of a pid array for set_event_pid")
Cc: stable@vger.kernel.org
Signed-off-by: Wenwen Wang <wang6495@umn.edu>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Jann Horn [Thu, 4 Apr 2019 21:59:25 +0000 (23:59 +0200)]
tracing: Fix buffer_ref pipe ops
This fixes multiple issues in buffer_pipe_buf_ops:
- The ->steal() handler must not return zero unless the pipe buffer has
the only reference to the page. But generic_pipe_buf_steal() assumes
that every reference to the pipe is tracked by the page's refcount,
which isn't true for these buffers - buffer_pipe_buf_get(), which
duplicates a buffer, doesn't touch the page's refcount.
Fix it by using generic_pipe_buf_nosteal(), which refuses every
attempted theft. It should be easy to actually support ->steal, but the
only current users of pipe_buf_steal() are the virtio console and FUSE,
and they also only use it as an optimization. So it's probably not worth
the effort.
- The ->get() and ->release() handlers can be invoked concurrently on pipe
buffers backed by the same struct buffer_ref. Make them safe against
concurrency by using refcount_t.
- The pointers stored in ->private were only zeroed out when the last
reference to the buffer_ref was dropped. As far as I know, this
shouldn't be necessary anyway, but if we do it, let's always do it.
Link: http://lkml.kernel.org/r/20190404215925.253531-1-jannh@google.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@vger.kernel.org
Fixes: 73a757e63114d ("ring-buffer: Return reader page back into existing ring buffer")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Shun-Chih Yu [Thu, 25 Apr 2019 03:53:50 +0000 (11:53 +0800)]
dmaengine: mediatek-cqdma: fix wrong register usage in mtk_cqdma_start
This patch fixes wrong register usage in the mtk_cqdma_start. The
destination register should be MTK_CQDMA_DST2 instead.
Fixes: b1f01e48df5a ("dmaengine: mediatek: Add MediaTek Command-Queue DMA controller for MT6765 SoC")
Signed-off-by: Shun-Chih Yu <shun-chih.yu@mediatek.com>
Cc: stable@vger.kernel.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Dave Airlie [Fri, 26 Apr 2019 00:32:57 +0000 (10:32 +1000)]
Merge tag 'imx-drm-fixes-2019-04-25' of git://git.pengutronix.de/pza/linux into drm-fixes
drm/imx: fix DP CSC handling
- Fix the DP color space conversion matrix setup to avoid bugs where
disabling the overlay plane while both primary and overlay plane are
routed via the CSC unit would not reconfigure the CSC routing
properly, leaving the display in a nonworking state, or the CSC
setting from a previously set mode would be left behind, causing
wrong colors when reenabling the display in certain configurations.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Philipp Zabel <p.zabel@pengutronix.de>
Link: https://patchwork.freedesktop.org/patch/msgid/1556183136.2271.3.camel@pengutronix.de
Dave Airlie [Fri, 26 Apr 2019 00:30:17 +0000 (10:30 +1000)]
Merge branch 'vmwgfx-fixes-5.1' of git://people.freedesktop.org/~thomash/linux into drm-fixes
A single fix for a layer violation requested by Cristoph.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Hellstrom <thellstrom@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190425204100.3982-1-thellstrom@vmware.com
Dave Airlie [Fri, 26 Apr 2019 00:29:07 +0000 (10:29 +1000)]
Merge tag 'drm-misc-fixes-2019-04-25' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
- sun4i: Fix module loading / unloading
- vc4: Fix a compilation error and memory leak
- dw-hdmi: Fix an overflow on Rockchip and SCDC configuration
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <maxime.ripard@bootlin.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190425132739.pngmfiqucqmulxkz@flea
Dave Airlie [Fri, 26 Apr 2019 00:25:57 +0000 (10:25 +1000)]
Merge branch 'drm-fixes-5.1' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
- ttm regression fix
- sched documentation fix
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190424230120.3423-1-alexander.deucher@amd.com
Dave Airlie [Fri, 26 Apr 2019 00:13:49 +0000 (10:13 +1000)]
Merge tag 'drm-intel-fixes-2019-04-24' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
A fix for display lanes calculation for BXT and a protection
to avoid enabling FEC without DSC.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190424215359.GA26100@intel.com
Tycho Andersen [Wed, 6 Mar 2019 20:14:13 +0000 (13:14 -0700)]
seccomp: Make NEW_LISTENER and TSYNC flags exclusive
As the comment notes, the return codes for TSYNC and NEW_LISTENER
conflict, because they both return positive values, one in the case of
success and one in the case of error. So, let's disallow both of these
flags together.
While this is technically a userspace break, all the users I know
of are still waiting on me to land this feature in libseccomp, so I
think it'll be safe. Also, at present my use case doesn't require
TSYNC at all, so this isn't a big deal to disallow. If someone
wanted to support this, a path forward would be to add a new flag like
TSYNC_AND_LISTENER_YES_I_UNDERSTAND_THAT_TSYNC_WILL_JUST_RETURN_EAGAIN,
but the use cases are so different I don't see it really happening.
Finally, it's worth noting that this does actually fix a UAF issue: at the
end of seccomp_set_mode_filter(), we have:
if (flags & SECCOMP_FILTER_FLAG_NEW_LISTENER) {
if (ret < 0) {
listener_f->private_data = NULL;
fput(listener_f);
put_unused_fd(listener);
} else {
fd_install(listener, listener_f);
ret = listener;
}
}
out_free:
seccomp_filter_free(prepared);
But if ret > 0 because TSYNC raced, we'll install the listener fd and then
free the filter out from underneath it, causing a UAF when the task closes
it or dies. This patch also switches the condition to be simply if (ret),
so that if someone does add the flag mentioned above, they won't have to
remember to fix this too.
Reported-by: syzbot+b562969adb2e04af3442@syzkaller.appspotmail.com
Fixes: 6a21cc50f0c7 ("seccomp: add a return code to trap to userspace")
CC: stable@vger.kernel.org # v5.0+
Signed-off-by: Tycho Andersen <tycho@tycho.ws>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Kees Cook [Wed, 24 Apr 2019 16:32:55 +0000 (09:32 -0700)]
selftests/seccomp: Prepare for exclusive seccomp flags
Some seccomp flags will become exclusive, so the selftest needs to
be adjusted to mask those out and test them individually for the "all
flags" tests.
Cc: stable@vger.kernel.org # v5.0+
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Tycho Andersen <tycho@tycho.ws>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Andrey Smirnov [Wed, 24 Apr 2019 07:16:10 +0000 (00:16 -0700)]
power: supply: sysfs: prevent endless uevent loop with CONFIG_POWER_SUPPLY_DEBUG
Fix a similar endless event loop as was done in commit
8dcf32175b4e ("i2c: prevent endless uevent loop with
CONFIG_I2C_DEBUG_CORE"):
The culprit is the dev_dbg printk in the i2c uevent handler. If
this is activated (for instance by CONFIG_I2C_DEBUG_CORE) it results
in an endless loop with systemd-journald.
This happens if user-space scans the system log and reads the uevent
file to get information about a newly created device, which seems
fair use to me. Unfortunately reading the "uevent" file uses the
same function that runs for creating the uevent for a new device,
generating the next syslog entry
Both CONFIG_I2C_DEBUG_CORE and CONFIG_POWER_SUPPLY_DEBUG were reported
in https://bugs.freedesktop.org/show_bug.cgi?id=76886 but only former
seems to have been fixed. Drop debug prints as it was done in I2C
subsystem to resolve the issue.
Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com>
Cc: Chris Healy <cphealy@gmail.com>
Cc: linux-pm@vger.kernel.org
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Xie XiuQi [Sat, 20 Apr 2019 08:34:16 +0000 (16:34 +0800)]
sched/numa: Fix a possible divide-by-zero
sched_clock_cpu() may not be consistent between CPUs. If a task
migrates to another CPU, then se.exec_start is set to that CPU's
rq_clock_task() by update_stats_curr_start(). Specifically, the new
value might be before the old value due to clock skew.
So then if in numa_get_avg_runtime() the expression:
'now - p->last_task_numa_placement'
ends up as -1, then the divider '*period + 1' in task_numa_placement()
is 0 and things go bang. Similar to update_curr(), check if time goes
backwards to avoid this.
[ peterz: Wrote new changelog. ]
[ mingo: Tweaked the code comment. ]
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: cj.chengjian@huawei.com
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20190425080016.GX11158@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Linus Torvalds [Thu, 25 Apr 2019 17:48:50 +0000 (10:48 -0700)]
Merge tag 'ceph-for-5.1-rc7' of git://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"dentry name handling fixes from Jeff and a memory leak fix from Zheng.
Both are old issues, marked for stable"
* tag 'ceph-for-5.1-rc7' of git://github.com/ceph/ceph-client:
ceph: fix ci->i_head_snapc leak
ceph: handle the case where a dentry has been renamed on outstanding req
ceph: ensure d_name stability in ceph_dentry_hash()
ceph: only use d_name directly when parent is locked
Linus Torvalds [Thu, 25 Apr 2019 16:15:03 +0000 (09:15 -0700)]
Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
"This fixes a bug in xts and lrw where they may sleep in an atomic
context"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: lrw - Fix atomic sleep when walking skcipher
crypto: xts - Fix atomic sleep when walking skcipher
Lijun Ou [Tue, 23 Apr 2019 09:30:26 +0000 (17:30 +0800)]
RDMA/hns: Bugfix for mapping user db
When the maximum send wr delivered by the user is zero, the qp does not
have a sq.
When allocating the sq db buffer to store the user sq pi pointer and map
it to the kernel mode, max_send_wr is used as the trigger condition, while
the kernel does not consider the max_send_wr trigger condition when
mapmping db. It will cause sq record doorbell map fail and create qp fail.
The failed print information as follows:
hns3 0000:7d:00.1: Send cmd: tail - 418, opcode - 0x8504, flag - 0x0011, retval - 0x0000
hns3 0000:7d:00.1: Send cmd: 0xe59dc000 0x00000000 0x00000000 0x00000000 0x00000116 0x0000ffff
hns3 0000:7d:00.1: sq record doorbell map failed!
hns3 0000:7d:00.1: Create RC QP failed
Fixes: 0425e3e6e0c7 ("RDMA/hns: Support flush cqe for hip08 in kernel space")
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Nikolay Borisov [Mon, 1 Apr 2019 08:29:58 +0000 (11:29 +0300)]
btrfs: Switch memory allocations in async csum calculation path to kvmalloc
Recent multi-page biovec rework allowed creation of bios that can span
large regions - up to 128 megabytes in the case of btrfs. OTOH btrfs'
submission path currently allocates a contiguous array to store the
checksums for every bio submitted. This means we can request up to
(128mb / BTRFS_SECTOR_SIZE) * 4 bytes + 32bytes of memory from kmalloc.
On busy systems with possibly fragmented memory said kmalloc can fail
which will trigger BUG_ON due to improper error handling IO submission
context in btrfs.
Until error handling is improved or bios in btrfs limited to a more
manageable size (e.g. 1m) let's use kvmalloc to fallback to vmalloc for
such large allocations. There is no hard requirement that the memory
allocated for checksums during IO submission has to be contiguous, but
this is a simple fix that does not require several non-contiguous
allocations.
For small writes this is unlikely to have any visible effect since
kmalloc will still satisfy allocation requests as usual. For larger
requests the code will just fallback to vmalloc.
We've performed evaluation on several workload types and there was no
significant difference kmalloc vs kvmalloc.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Alan Stern [Mon, 22 Apr 2019 15:16:04 +0000 (11:16 -0400)]
USB: w1 ds2490: Fix bug caused by improper use of altsetting array
The syzkaller USB fuzzer spotted a slab-out-of-bounds bug in the
ds2490 driver. This bug is caused by improper use of the altsetting
array in the usb_interface structure (the array's entries are not
always stored in numerical order), combined with a naive assumption
that all interfaces probed by the driver will have the expected number
of altsettings.
The bug can be fixed by replacing references to the possibly
non-existent intf->altsetting[alt] entry with the guaranteed-to-exist
intf->cur_altsetting entry.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-and-tested-by: syzbot+d65f673b847a1a96cdba@syzkaller.appspotmail.com
CC: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alan Stern [Tue, 23 Apr 2019 18:48:29 +0000 (14:48 -0400)]
USB: yurex: Fix protection fault after device removal
The syzkaller USB fuzzer found a general-protection-fault bug in the
yurex driver. The fault occurs when a device has been unplugged; the
driver's interrupt-URB handler logs an error message referring to the
device by name, after the device has been unregistered and its name
deallocated.
This problem is caused by the fact that the interrupt URB isn't
cancelled until the driver's private data structure is released, which
can happen long after the device is gone. The cure is to make sure
that the interrupt URB is killed before yurex_disconnect() returns;
this is exactly the sort of thing that usb_poison_urb() was meant for.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-and-tested-by: syzbot+2eb9121678bdb36e6d57@syzkaller.appspotmail.com
CC: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Malte Leip [Sun, 14 Apr 2019 10:00:12 +0000 (12:00 +0200)]
usb: usbip: fix isoc packet num validation in get_pipe
Change the validation of number_of_packets in get_pipe to compare the
number of packets to a fixed maximum number of packets allowed, set to
be 1024. This number was chosen due to it being used by other drivers as
well, for example drivers/usb/host/uhci-q.c
Background/reason:
The get_pipe function in stub_rx.c validates the number of packets in
isochronous mode and aborts with an error if that number is too large,
in order to prevent malicious input from possibly triggering large
memory allocations. This was previously done by checking whether
pdu->u.cmd_submit.number_of_packets is bigger than the number of packets
that would be needed for pdu->u.cmd_submit.transfer_buffer_length bytes
if all except possibly the last packet had maximum length, given by
usb_endpoint_maxp(epd) * usb_endpoint_maxp_mult(epd). This leads to an
error if URBs with packets shorter than the maximum possible length are
submitted, which is allowed according to
Documentation/driver-api/usb/URB.rst and occurs for example with the
snd-usb-audio driver.
Fixes: c6688ef9f297 ("usbip: fix stub_rx: harden CMD_SUBMIT path to handle malicious input")
Signed-off-by: Malte Leip <malte@leip.net>
Cc: stable <stable@vger.kernel.org>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jonas Karlman [Sun, 21 Apr 2019 08:25:50 +0000 (08:25 +0000)]
drm/bridge: dw-hdmi: fix SCDC configuration for ddc-i2c-bus
When ddc-i2c-bus property is used, a NULL pointer dereference is reported:
[ 31.041669] Unable to handle kernel NULL pointer dereference at virtual address
00000008
[ 31.041671] pgd =
4d3c16f6
[ 31.041673] [
00000008] *pgd=
00000000
[ 31.041678] Internal error: Oops: 5 [#1] SMP ARM
[ 31.041711] Hardware name: Rockchip (Device Tree)
[ 31.041718] PC is at i2c_transfer+0x8/0xe4
[ 31.041721] LR is at drm_scdc_read+0x54/0x84
[ 31.041723] pc : [<
c073273c>] lr : [<
c05926c4>] psr:
280f0013
[ 31.041725] sp :
edffdad0 ip :
5ccb5511 fp :
00000058
[ 31.041727] r10:
00000780 r9 :
edf91608 r8 :
c11b0f48
[ 31.041728] r7 :
00000438 r6 :
00000000 r5 :
00000000 r4 :
00000000
[ 31.041730] r3 :
edffdae7 r2 :
00000002 r1 :
edffdaec r0 :
00000000
[ 31.041908] [<
c073273c>] (i2c_transfer) from [<
c05926c4>] (drm_scdc_read+0x54/0x84)
[ 31.041913] [<
c05926c4>] (drm_scdc_read) from [<
c0592858>] (drm_scdc_set_scrambling+0x30/0xbc)
[ 31.041919] [<
c0592858>] (drm_scdc_set_scrambling) from [<
c05cc0f4>] (dw_hdmi_update_power+0x1440/0x1610)
[ 31.041926] [<
c05cc0f4>] (dw_hdmi_update_power) from [<
c05cc574>] (dw_hdmi_bridge_enable+0x2c/0x70)
[ 31.041932] [<
c05cc574>] (dw_hdmi_bridge_enable) from [<
c05aed48>] (drm_bridge_enable+0x24/0x34)
[ 31.041938] [<
c05aed48>] (drm_bridge_enable) from [<
c0591060>] (drm_atomic_helper_commit_modeset_enables+0x114/0x220)
[ 31.041943] [<
c0591060>] (drm_atomic_helper_commit_modeset_enables) from [<
c05c3fe0>] (rockchip_atomic_helper_commit_tail_rpm+0x28/0x64)
hdmi->i2c may not be set when ddc-i2c-bus property is used in device tree.
Fix this by using hdmi->ddc as the i2c adapter when calling drm_scdc_*().
Also report that SCDC is not supported when there is no DDC bus.
Fixes: 264fce6cc2c1 ("drm/bridge: dw-hdmi: Add SCDC and TMDS Scrambling support")
Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Neil Armstrong <narmstrong@baylibre.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Link: https://patchwork.freedesktop.org/patch/msgid/VE1PR03MB59031814B5BCAB2152923BDAAC210@VE1PR03MB5903.eurprd03.prod.outlook.com
Geert Uytterhoeven [Wed, 24 Apr 2019 13:59:33 +0000 (15:59 +0200)]
gpio: Fix gpiochip_add_data_with_key() error path
The err_remove_chip block is too coarse, and may perform cleanup that
must not be done. E.g. if of_gpiochip_add() fails, of_gpiochip_remove()
is still called, causing:
OF: ERROR: Bad of_node_put() on /soc/gpio@
e6050000
CPU: 1 PID: 20 Comm: kworker/1:1 Not tainted 5.1.0-rc2-koelsch+ #407
Hardware name: Generic R-Car Gen2 (Flattened Device Tree)
Workqueue: events deferred_probe_work_func
[<
c020ec74>] (unwind_backtrace) from [<
c020ae58>] (show_stack+0x10/0x14)
[<
c020ae58>] (show_stack) from [<
c07c1224>] (dump_stack+0x7c/0x9c)
[<
c07c1224>] (dump_stack) from [<
c07c5a80>] (kobject_put+0x94/0xbc)
[<
c07c5a80>] (kobject_put) from [<
c0470420>] (gpiochip_add_data_with_key+0x8d8/0xa3c)
[<
c0470420>] (gpiochip_add_data_with_key) from [<
c0473738>] (gpio_rcar_probe+0x1d4/0x314)
[<
c0473738>] (gpio_rcar_probe) from [<
c052fca8>] (platform_drv_probe+0x48/0x94)
and later, if a GPIO consumer tries to use a GPIO from a failed
controller:
WARNING: CPU: 0 PID: 1 at lib/refcount.c:156 kobject_get+0x38/0x4c
refcount_t: increment on 0; use-after-free.
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.1.0-rc2-koelsch+ #407
Hardware name: Generic R-Car Gen2 (Flattened Device Tree)
[<
c020ec74>] (unwind_backtrace) from [<
c020ae58>] (show_stack+0x10/0x14)
[<
c020ae58>] (show_stack) from [<
c07c1224>] (dump_stack+0x7c/0x9c)
[<
c07c1224>] (dump_stack) from [<
c0221580>] (__warn+0xd0/0xec)
[<
c0221580>] (__warn) from [<
c02215e0>] (warn_slowpath_fmt+0x44/0x6c)
[<
c02215e0>] (warn_slowpath_fmt) from [<
c07c58fc>] (kobject_get+0x38/0x4c)
[<
c07c58fc>] (kobject_get) from [<
c068b3ec>] (of_node_get+0x14/0x1c)
[<
c068b3ec>] (of_node_get) from [<
c0686f24>] (of_find_node_by_phandle+0xc0/0xf0)
[<
c0686f24>] (of_find_node_by_phandle) from [<
c0686fbc>] (of_phandle_iterator_next+0x68/0x154)
[<
c0686fbc>] (of_phandle_iterator_next) from [<
c0687fe4>] (__of_parse_phandle_with_args+0x40/0xd0)
[<
c0687fe4>] (__of_parse_phandle_with_args) from [<
c0688204>] (of_parse_phandle_with_args_map+0x100/0x3ac)
[<
c0688204>] (of_parse_phandle_with_args_map) from [<
c0471240>] (of_get_named_gpiod_flags+0x38/0x380)
[<
c0471240>] (of_get_named_gpiod_flags) from [<
c046f864>] (gpiod_get_from_of_node+0x24/0xd8)
[<
c046f864>] (gpiod_get_from_of_node) from [<
c0470aa4>] (devm_fwnode_get_index_gpiod_from_child+0xa0/0x144)
[<
c0470aa4>] (devm_fwnode_get_index_gpiod_from_child) from [<
c05f425c>] (gpio_keys_probe+0x418/0x7bc)
[<
c05f425c>] (gpio_keys_probe) from [<
c052fca8>] (platform_drv_probe+0x48/0x94)
Fix this by splitting the cleanup block, and adding a missing call to
gpiochip_irqchip_remove().
Fixes: 28355f81969962cf ("gpio: defer probe if pinctrl cannot be found")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Thomas Hellstrom [Tue, 23 Apr 2019 12:02:57 +0000 (14:02 +0200)]
drm/vmwgfx: Fix dma API layer violation
Remove the check for IOMMU presence since it was considered a
layer violation.
This means we have no reliable way to destinguish between coherent
hardware IOMMU DMA address translations and incoherent SWIOTLB DMA
address translations, which we can't handle. So always presume the
former. This means that if anybody forces SWIOTLB without also setting
the vmw_force_coherent=1 vmwgfx option, driver operation will fail,
like it will on most other graphics drivers.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Harry Pan [Wed, 24 Apr 2019 14:50:33 +0000 (22:50 +0800)]
perf/x86/intel: Update KBL Package C-state events to also include PC8/PC9/PC10 counters
Kaby Lake (and Coffee Lake) has PC8/PC9/PC10 residency counters.
This patch updates the list of Kaby/Coffee Lake PMU event counters
from the snb_cstates[] list of events to the hswult_cstates[]
list of events, which keeps all previously supported events and
also adds the PKG_C8, PKG_C9 and PKG_C10 residency counters.
This allows user space tools to profile them through the perf interface.
Signed-off-by: Harry Pan <harry.pan@intel.com>
Cc: <stable@vger.kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: gs0622@gmail.com
Link: http://lkml.kernel.org/r/20190424145033.1924-1-harry.pan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Linus Torvalds [Wed, 24 Apr 2019 23:18:59 +0000 (16:18 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
"Just the usual assortment of small'ish fixes:
1) Conntrack timeout is sometimes not initialized properly, from
Alexander Potapenko.
2) Add a reasonable range limit to tcp_min_rtt_wlen to avoid
undefined behavior. From ZhangXiaoxu.
3) des1 field of descriptor in stmmac driver is initialized with the
wrong variable. From Yue Haibing.
4) Increase mlxsw pci sw reset timeout a little bit more, from Ido
Schimmel.
5) Match IOT2000 stmmac devices more accurately, from Su Bao Cheng.
6) Fallback refcount fix in TLS code, from Jakub Kicinski.
7) Fix max MTU check when using XDP in mlx5, from Maxim Mikityanskiy.
8) Fix recursive locking in team driver, from Hangbin Liu.
9) Fix tls_set_device_offload_Rx() deadlock, from Jakub Kicinski.
10) Don't use napi_alloc_frag() outside of softiq context of socionext
driver, from Ilias Apalodimas.
11) MAC address increment overflow in ncsi, from Tao Ren.
12) Fix a regression in 8K/1M pool switching of RDS, from Zhu Yanjun.
13) ipv4_link_failure has to validate the headers that are actually
there because RAW sockets can pass in arbitrary garbage, from Eric
Dumazet"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (43 commits)
ipv4: add sanity checks in ipv4_link_failure()
net/rose: fix unbound loop in rose_loopback_timer()
rxrpc: fix race condition in rxrpc_input_packet()
net: rds: exchange of 8K and 1M pool
net: vrf: Fix operation not supported when set vrf mac
net/ncsi: handle overflow when incrementing mac address
net: socionext: replace napi_alloc_frag with the netdev variant on init
net: atheros: fix spelling mistake "underun" -> "underrun"
spi: ST ST95HF NFC: declare missing of table
spi: Micrel eth switch: declare missing of table
net: stmmac: move stmmac_check_ether_addr() to driver probe
netfilter: fix nf_l4proto_log_invalid to log invalid packets
netfilter: never get/set skb->tstamp
netfilter: ebtables: CONFIG_COMPAT: drop a bogus WARN_ON
Documentation: decnet: remove reference to CONFIG_DECNET_ROUTE_FWMARK
dt-bindings: add an explanation for internal phy-mode
net/tls: don't leak IV and record seq when offload fails
net/tls: avoid potential deadlock in tls_set_device_offload_rx()
selftests/net: correct the return value for run_afpackettests
team: fix possible recursive locking when add slaves
...
Linus Torvalds [Wed, 24 Apr 2019 23:15:38 +0000 (16:15 -0700)]
Merge tag 'leds-for-5.1-rc7' of git://git./linux/kernel/git/j.anaszewski/linux-leds
Pull LED update from Jacek Anaszewski:
"A single change to MAINTAINERS:
We announce a new LED reviewer - Dan Murphy"
* tag 'leds-for-5.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds:
MAINTAINERS: LEDs: Add designated reviewer for LED subsystem
Eric Dumazet [Wed, 24 Apr 2019 15:04:05 +0000 (08:04 -0700)]
ipv4: add sanity checks in ipv4_link_failure()
Before calling __ip_options_compile(), we need to ensure the network
header is a an IPv4 one, and that it is already pulled in skb->head.
RAW sockets going through a tunnel can end up calling ipv4_link_failure()
with total garbage in the skb, or arbitrary lengthes.
syzbot report :
BUG: KASAN: stack-out-of-bounds in memcpy include/linux/string.h:355 [inline]
BUG: KASAN: stack-out-of-bounds in __ip_options_echo+0x294/0x1120 net/ipv4/ip_options.c:123
Write of size 69 at addr
ffff888096abf068 by task syz-executor.4/9204
CPU: 0 PID: 9204 Comm: syz-executor.4 Not tainted 5.1.0-rc5+ #77
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187
kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
check_memory_region_inline mm/kasan/generic.c:185 [inline]
check_memory_region+0x123/0x190 mm/kasan/generic.c:191
memcpy+0x38/0x50 mm/kasan/common.c:133
memcpy include/linux/string.h:355 [inline]
__ip_options_echo+0x294/0x1120 net/ipv4/ip_options.c:123
__icmp_send+0x725/0x1400 net/ipv4/icmp.c:695
ipv4_link_failure+0x29f/0x550 net/ipv4/route.c:1204
dst_link_failure include/net/dst.h:427 [inline]
vti6_xmit net/ipv6/ip6_vti.c:514 [inline]
vti6_tnl_xmit+0x10d4/0x1c0c net/ipv6/ip6_vti.c:553
__netdev_start_xmit include/linux/netdevice.h:4414 [inline]
netdev_start_xmit include/linux/netdevice.h:4423 [inline]
xmit_one net/core/dev.c:3292 [inline]
dev_hard_start_xmit+0x1b2/0x980 net/core/dev.c:3308
__dev_queue_xmit+0x271d/0x3060 net/core/dev.c:3878
dev_queue_xmit+0x18/0x20 net/core/dev.c:3911
neigh_direct_output+0x16/0x20 net/core/neighbour.c:1527
neigh_output include/net/neighbour.h:508 [inline]
ip_finish_output2+0x949/0x1740 net/ipv4/ip_output.c:229
ip_finish_output+0x73c/0xd50 net/ipv4/ip_output.c:317
NF_HOOK_COND include/linux/netfilter.h:278 [inline]
ip_output+0x21f/0x670 net/ipv4/ip_output.c:405
dst_output include/net/dst.h:444 [inline]
NF_HOOK include/linux/netfilter.h:289 [inline]
raw_send_hdrinc net/ipv4/raw.c:432 [inline]
raw_sendmsg+0x1d2b/0x2f20 net/ipv4/raw.c:663
inet_sendmsg+0x147/0x5d0 net/ipv4/af_inet.c:798
sock_sendmsg_nosec net/socket.c:651 [inline]
sock_sendmsg+0xdd/0x130 net/socket.c:661
sock_write_iter+0x27c/0x3e0 net/socket.c:988
call_write_iter include/linux/fs.h:1866 [inline]
new_sync_write+0x4c7/0x760 fs/read_write.c:474
__vfs_write+0xe4/0x110 fs/read_write.c:487
vfs_write+0x20c/0x580 fs/read_write.c:549
ksys_write+0x14f/0x2d0 fs/read_write.c:599
__do_sys_write fs/read_write.c:611 [inline]
__se_sys_write fs/read_write.c:608 [inline]
__x64_sys_write+0x73/0xb0 fs/read_write.c:608
do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x458c29
Code: ad b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:
00007f293b44bc78 EFLAGS:
00000246 ORIG_RAX:
0000000000000001
RAX:
ffffffffffffffda RBX:
0000000000000003 RCX:
0000000000458c29
RDX:
0000000000000014 RSI:
00000000200002c0 RDI:
0000000000000003
RBP:
000000000073bf00 R08:
0000000000000000 R09:
0000000000000000
R10:
0000000000000000 R11:
0000000000000246 R12:
00007f293b44c6d4
R13:
00000000004c8623 R14:
00000000004ded68 R15:
00000000ffffffff
The buggy address belongs to the page:
page:
ffffea00025aafc0 count:0 mapcount:0 mapping:
0000000000000000 index:0x0
flags: 0x1fffc0000000000()
raw:
01fffc0000000000 0000000000000000 ffffffff025a0101 0000000000000000
raw:
0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff888096abef80: 00 00 00 f2 f2 f2 f2 f2 00 00 00 00 00 00 00 f2
ffff888096abf000: f2 f2 f2 f2 00 00 00 00 00 00 00 00 00 00 00 00
>
ffff888096abf080: 00 00 f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00
^
ffff888096abf100: 00 00 00 00 f1 f1 f1 f1 00 00 f3 f3 00 00 00 00
ffff888096abf180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Fixes: ed0de45a1008 ("ipv4: recompile ip options in ipv4_link_failure")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Stephen Suryaputra <ssuryaextr@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 24 Apr 2019 12:35:00 +0000 (05:35 -0700)]
net/rose: fix unbound loop in rose_loopback_timer()
This patch adds a limit on the number of skbs that fuzzers can queue
into loopback_queue. 1000 packets for rose loopback seems more than enough.
Then, since we now have multiple cpus in most linux hosts,
we also need to limit the number of skbs rose_loopback_timer()
can dequeue at each round.
rose_loopback_queue() can be drop-monitor friendly, calling
consume_skb() or kfree_skb() appropriately.
Finally, use mod_timer() instead of del_timer() + add_timer()
syzbot report was :
rcu: INFO: rcu_preempt self-detected stall on CPU
rcu: 0-...!: (10499 ticks this GP) idle=536/1/0x4000000000000002 softirq=103291/103291 fqs=34
rcu: (t=10500 jiffies g=140321 q=323)
rcu: rcu_preempt kthread starved for 10426 jiffies! g140321 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=1
rcu: RCU grace-period kthread stack dump:
rcu_preempt I29168 10 2 0x80000000
Call Trace:
context_switch kernel/sched/core.c:2877 [inline]
__schedule+0x813/0x1cc0 kernel/sched/core.c:3518
schedule+0x92/0x180 kernel/sched/core.c:3562
schedule_timeout+0x4db/0xfd0 kernel/time/timer.c:1803
rcu_gp_fqs_loop kernel/rcu/tree.c:1971 [inline]
rcu_gp_kthread+0x962/0x17b0 kernel/rcu/tree.c:2128
kthread+0x357/0x430 kernel/kthread.c:253
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
NMI backtrace for cpu 0
CPU: 0 PID: 7632 Comm: kworker/0:4 Not tainted 5.1.0-rc5+ #172
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events iterate_cleanup_work
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
nmi_cpu_backtrace.cold+0x63/0xa4 lib/nmi_backtrace.c:101
nmi_trigger_cpumask_backtrace+0x1be/0x236 lib/nmi_backtrace.c:62
arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline]
rcu_dump_cpu_stacks+0x183/0x1cf kernel/rcu/tree.c:1223
print_cpu_stall kernel/rcu/tree.c:1360 [inline]
check_cpu_stall kernel/rcu/tree.c:1434 [inline]
rcu_pending kernel/rcu/tree.c:3103 [inline]
rcu_sched_clock_irq.cold+0x500/0xa4a kernel/rcu/tree.c:2544
update_process_times+0x32/0x80 kernel/time/timer.c:1635
tick_sched_handle+0xa2/0x190 kernel/time/tick-sched.c:161
tick_sched_timer+0x47/0x130 kernel/time/tick-sched.c:1271
__run_hrtimer kernel/time/hrtimer.c:1389 [inline]
__hrtimer_run_queues+0x33e/0xde0 kernel/time/hrtimer.c:1451
hrtimer_interrupt+0x314/0x770 kernel/time/hrtimer.c:1509
local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1035 [inline]
smp_apic_timer_interrupt+0x120/0x570 arch/x86/kernel/apic/apic.c:1060
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:807
RIP: 0010:__sanitizer_cov_trace_pc+0x0/0x50 kernel/kcov.c:95
Code: 89 25 b4 6e ec 08 41 bc f4 ff ff ff e8 cd 5d ea ff 48 c7 05 9e 6e ec 08 00 00 00 00 e9 a4 e9 ff ff 90 90 90 90 90 90 90 90 90 <55> 48 89 e5 48 8b 75 08 65 48 8b 04 25 00 ee 01 00 65 8b 15 c8 60
RSP: 0018:
ffff8880ae807ce0 EFLAGS:
00000286 ORIG_RAX:
ffffffffffffff13
RAX:
ffff88806fd40640 RBX:
dffffc0000000000 RCX:
ffffffff863fbc56
RDX:
0000000000000100 RSI:
ffffffff863fbc1d RDI:
ffff88808cf94228
RBP:
ffff8880ae807d10 R08:
ffff88806fd40640 R09:
ffffed1015d00f8b
R10:
ffffed1015d00f8a R11:
0000000000000003 R12:
ffff88808cf941c0
R13:
00000000fffff034 R14:
ffff8882166cd840 R15:
0000000000000000
rose_loopback_timer+0x30d/0x3f0 net/rose/rose_loopback.c:91
call_timer_fn+0x190/0x720 kernel/time/timer.c:1325
expire_timers kernel/time/timer.c:1362 [inline]
__run_timers kernel/time/timer.c:1681 [inline]
__run_timers kernel/time/timer.c:1649 [inline]
run_timer_softirq+0x652/0x1700 kernel/time/timer.c:1694
__do_softirq+0x266/0x95a kernel/softirq.c:293
do_softirq_own_stack+0x2a/0x40 arch/x86/entry/entry_64.S:1027
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 24 Apr 2019 16:44:11 +0000 (09:44 -0700)]
rxrpc: fix race condition in rxrpc_input_packet()
After commit
5271953cad31 ("rxrpc: Use the UDP encap_rcv hook"),
rxrpc_input_packet() is directly called from lockless UDP receive
path, under rcu_read_lock() protection.
It must therefore use RCU rules :
- udp_sk->sk_user_data can be cleared at any point in this function.
rcu_dereference_sk_user_data() is what we need here.
- Also, since sk_user_data might have been set in rxrpc_open_socket()
we must observe a proper RCU grace period before kfree(local) in
rxrpc_lookup_local()
v4: @local can be NULL in xrpc_lookup_local() as reported by kbuild test robot <lkp@intel.com>
and Julia Lawall <julia.lawall@lip6.fr>, thanks !
v3,v2 : addressed David Howells feedback, thanks !
syzbot reported :
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 19236 Comm: syz-executor703 Not tainted 5.1.0-rc6 #79
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:__lock_acquire+0xbef/0x3fb0 kernel/locking/lockdep.c:3573
Code: 00 0f 85 a5 1f 00 00 48 81 c4 10 01 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 48 b8 00 00 00 00 00 fc ff df 4c 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 4a 21 00 00 49 81 7d 00 20 54 9c 89 0f 84 cf f4
RSP: 0018:
ffff88809d7aef58 EFLAGS:
00010002
RAX:
dffffc0000000000 RBX:
0000000000000000 RCX:
0000000000000000
RDX:
0000000000000026 RSI:
0000000000000000 RDI:
0000000000000001
RBP:
ffff88809d7af090 R08:
0000000000000001 R09:
0000000000000001
R10:
ffffed1015d05bc7 R11:
ffff888089428600 R12:
0000000000000000
R13:
0000000000000130 R14:
0000000000000001 R15:
0000000000000001
FS:
00007f059044d700(0000) GS:
ffff8880ae800000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
00000000004b6040 CR3:
00000000955ca000 CR4:
00000000001406f0
Call Trace:
lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:4211
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x95/0xcd kernel/locking/spinlock.c:152
skb_queue_tail+0x26/0x150 net/core/skbuff.c:2972
rxrpc_reject_packet net/rxrpc/input.c:1126 [inline]
rxrpc_input_packet+0x4a0/0x5536 net/rxrpc/input.c:1414
udp_queue_rcv_one_skb+0xaf2/0x1780 net/ipv4/udp.c:2011
udp_queue_rcv_skb+0x128/0x730 net/ipv4/udp.c:2085
udp_unicast_rcv_skb.isra.0+0xb9/0x360 net/ipv4/udp.c:2245
__udp4_lib_rcv+0x701/0x2ca0 net/ipv4/udp.c:2301
udp_rcv+0x22/0x30 net/ipv4/udp.c:2482
ip_protocol_deliver_rcu+0x60/0x8f0 net/ipv4/ip_input.c:208
ip_local_deliver_finish+0x23b/0x390 net/ipv4/ip_input.c:234
NF_HOOK include/linux/netfilter.h:289 [inline]
NF_HOOK include/linux/netfilter.h:283 [inline]
ip_local_deliver+0x1e9/0x520 net/ipv4/ip_input.c:255
dst_input include/net/dst.h:450 [inline]
ip_rcv_finish+0x1e1/0x300 net/ipv4/ip_input.c:413
NF_HOOK include/linux/netfilter.h:289 [inline]
NF_HOOK include/linux/netfilter.h:283 [inline]
ip_rcv+0xe8/0x3f0 net/ipv4/ip_input.c:523
__netif_receive_skb_one_core+0x115/0x1a0 net/core/dev.c:4987
__netif_receive_skb+0x2c/0x1c0 net/core/dev.c:5099
netif_receive_skb_internal+0x117/0x660 net/core/dev.c:5202
napi_frags_finish net/core/dev.c:5769 [inline]
napi_gro_frags+0xade/0xd10 net/core/dev.c:5843
tun_get_user+0x2f24/0x3fb0 drivers/net/tun.c:1981
tun_chr_write_iter+0xbd/0x156 drivers/net/tun.c:2027
call_write_iter include/linux/fs.h:1866 [inline]
do_iter_readv_writev+0x5e1/0x8e0 fs/read_write.c:681
do_iter_write fs/read_write.c:957 [inline]
do_iter_write+0x184/0x610 fs/read_write.c:938
vfs_writev+0x1b3/0x2f0 fs/read_write.c:1002
do_writev+0x15e/0x370 fs/read_write.c:1037
__do_sys_writev fs/read_write.c:1110 [inline]
__se_sys_writev fs/read_write.c:1107 [inline]
__x64_sys_writev+0x75/0xb0 fs/read_write.c:1107
do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Fixes: 5271953cad31 ("rxrpc: Use the UDP encap_rcv hook")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Murphy [Tue, 23 Apr 2019 20:00:24 +0000 (15:00 -0500)]
MAINTAINERS: LEDs: Add designated reviewer for LED subsystem
Add a designated reviewer for the LED subsystem as there
are already two maintainers assigned.
Signed-off-by: Dan Murphy <dmurphy@ti.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Zhu Yanjun [Wed, 24 Apr 2019 06:56:42 +0000 (02:56 -0400)]
net: rds: exchange of 8K and 1M pool
Before the commit
490ea5967b0d ("RDS: IB: move FMR code to its own file"),
when the dirty_count is greater than 9/10 of max_items of 8K pool,
1M pool is used, Vice versa. After the commit
490ea5967b0d ("RDS: IB: move
FMR code to its own file"), the above is removed. When we make the
following tests.
Server:
rds-stress -r 1.1.1.16 -D 1M
Client:
rds-stress -r 1.1.1.14 -s 1.1.1.16 -D 1M
The following will appear.
"
connecting to 1.1.1.16:4000
negotiated options, tasks will start in 2 seconds
Starting up..header from 1.1.1.166:4001 to id 4001 bogus
..
tsks tx/s rx/s tx+rx K/s mbi K/s mbo K/s tx us/c rtt us
cpu %
1 0 0 0.00 0.00 0.00 0.00 0.00 -1.00
1 0 0 0.00 0.00 0.00 0.00 0.00 -1.00
1 0 0 0.00 0.00 0.00 0.00 0.00 -1.00
1 0 0 0.00 0.00 0.00 0.00 0.00 -1.00
1 0 0 0.00 0.00 0.00 0.00 0.00 -1.00
...
"
So this exchange between 8K and 1M pool is added back.
Fixes: commit 490ea5967b0d ("RDS: IB: move FMR code to its own file")
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Miaohe Lin [Sat, 20 Apr 2019 04:09:39 +0000 (12:09 +0800)]
net: vrf: Fix operation not supported when set vrf mac
Vrf device is not able to change mac address now because lack of
ndo_set_mac_address. Complete this in case some apps need to do
this.
Reported-by: Hui Wang <wanghui104@huawei.com>
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jérôme Glisse [Wed, 10 Apr 2019 19:37:47 +0000 (15:37 -0400)]
cifs: fix page reference leak with readv/writev
CIFS can leak pages reference gotten through GUP (get_user_pages*()
through iov_iter_get_pages()). This happen if cifs_send_async_read()
or cifs_write_from_iter() calls fail from within __cifs_readv() and
__cifs_writev() respectively. This patch move page unreference to
cifs_aio_ctx_release() which will happens on all code paths this is
all simpler to follow for correctness.
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Steve French <sfrench@samba.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Frank Sorenson [Tue, 16 Apr 2019 13:37:27 +0000 (08:37 -0500)]
cifs: do not attempt cifs operation on smb2+ rename error
A path-based rename returning EBUSY will incorrectly try opening
the file with a cifs (NT Create AndX) operation on an smb2+ mount,
which causes the server to force a session close.
If the mount is smb2+, skip the fallback.
Signed-off-by: Frank Sorenson <sorenson@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Ronnie Sahlberg [Tue, 23 Apr 2019 06:39:45 +0000 (16:39 +1000)]
cifs: fix memory leak in SMB2_read
Commit
088aaf17aa79300cab14dbee2569c58cfafd7d6e introduced a leak where
if SMB2_read() returned an error we would return without freeing the
request buffer.
Cc: Stable <stable@vger.kernel.org>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pan Bian [Fri, 19 Apr 2019 07:39:00 +0000 (07:39 +0000)]
Input: synaptics-rmi4 - fix possible double free
The RMI4 function structure has been released in rmi_register_function
if error occurs. However, it will be released again in the function
rmi_create_function, which may result in a double-free bug.
Signed-off-by: Pan Bian <bianpan2016@163.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Jason Gunthorpe [Tue, 16 Apr 2019 11:07:28 +0000 (14:07 +0300)]
RDMA/ucontext: Fix regression with disassociate
When this code was consolidated the intention was that the VMA would
become backed by anonymous zero pages after the zap_vma_pte - however this
very subtly relied on setting the vm_ops = NULL and clearing the VM_SHARED
bits to transform the VMA into an anonymous VMA. Since the vm_ops was
removed this broke.
Now userspace gets a SIGBUS if it touches the vma after disassociation.
Instead of converting the VMA to anonymous provide a fault handler that
puts a zero'd page into the VMA when user-space touches it after
disassociation.
Cc: stable@vger.kernel.org
Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Fixes: 5f9794dc94f5 ("RDMA/ucontext: Add a core API for mmaping driver IO memory")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Jacky Bai [Fri, 5 Apr 2019 17:31:09 +0000 (10:31 -0700)]
Input: snvs_pwrkey - make it depend on ARCH_MXC
The SNVS power key is not only used on i.MX6SX and i.MX7D, it is also
used by i.MX6UL and NXP's latest ARMv8 based i.MX8M series SOC. So
update the config dependency to use ARCH_MXC, and add the COMPILE_TEST
too.
Signed-off-by: Jacky Bai <ping.bai@nxp.com>
Reviewed-by: Dong Aisheng <aisheng.dong@nxp.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Jason Gunthorpe [Tue, 16 Apr 2019 11:07:26 +0000 (14:07 +0300)]
RDMA/mlx5: Use rdma_user_map_io for mapping BAR pages
Since mlx5 supports device disassociate it must use this API for all
BAR page mmaps, otherwise the pages can remain mapped after the device
is unplugged causing a system crash.
Cc: stable@vger.kernel.org
Fixes: 5f9794dc94f5 ("RDMA/ucontext: Add a core API for mmaping driver IO memory")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>