Arnaldo Carvalho de Melo [Tue, 3 Sep 2019 13:11:53 +0000 (10:11 -0300)]
perf tools: Remove debug.h from places where it is not needed
Pruning a bit more the includes dependency tree. Building this thing on
lots of containers takes time, we better reduce the time per build, each
container is doing 6 builds when clang and clang-devel are available,
and the plan is to do a 'make -C tools/perf build-test' that have many
more.
Also helps when doing normal development, as touching some random file
will have a much reduced chance of triggering lots of rebuilds.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-r889ur2cxe16m91m2a4pl15p@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Tue, 3 Sep 2019 12:57:50 +0000 (09:57 -0300)]
perf debug: No need to include ui/util.h
Nothing from that file is used in util/debug.h, it is only needed in
some places that get it indirectly via including util/debug.h, remove
that entanglement.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-hn9v4jdova2nt018fqsjyzun@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Tue, 3 Sep 2019 12:47:53 +0000 (09:47 -0300)]
perf tools: Remove needless builtin.h include directives
Now that builtin.h isn't included by any other header, we can check
where it is really needed, i.e. we can remove it and be sure that it
isn't being obtained indirectly.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-mn7jheex85iw9qo6tlv26hb2@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
James Clark [Mon, 2 Sep 2019 16:08:19 +0000 (16:08 +0000)]
perf tools: Add PMU event JSON files for ARM Cortex-A76 and, Neoverse N1.
The source of the event codes and description text was the Neoverse N1
technical reference manual at:
http://infocenter.arm.com/help/topic/com.arm.doc.100616_0301_01_en/neoverse_n1_trm_100616_0301_01_en.pdf
The Cortex-A76 shares the same event IDs as the Neoverse N1 and they
can be viewed at:
https://static.docs.arm.com/100798/0400/cortex_a76_trm_100798_0400_00_en.pdf
Signed-off-by: James Clark <james.clark@arm.com>
Cc: "linux-perf-users@vger.kernel.org" <linux-perf-users@vger.kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jeremy Linton <jeremy.linton@arm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: james clark <james.clark@arm.com>
Cc: nd <nd@arm.com>
Link: http://lore.kernel.org/lkml/20190902160713.1425-2-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Mon, 2 Sep 2019 18:37:21 +0000 (15:37 -0300)]
perf jvmti: Link against tools/lib/string.o to have weak strlcpy()
That is needed in systems such some S/390 distros.
$ readelf -s /tmp/build/perf/jvmti/jvmti-in.o | grep strlcpy
452:
0000000000002990 125 FUNC WEAK DEFAULT 119 strlcpy
$
Thanks to Jiri Olsa for fixing up my initial stab at this, I forgot how
Makefiles are picky about spaces versus tabs.
Reported-by: Thomas Richter <tmricht@linux.ibm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andreas Krebbel <krebbel@linux.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Sergey Melnikov <melnikov.sergey.v@gmail.com>
Link: https://lkml.kernel.org/n/tip-x8vg9sffgb2t1tzqmhkrulh7@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Mon, 2 Sep 2019 12:12:53 +0000 (14:12 +0200)]
libperf: Adopt perf_cpu_map__max() function
From 'perf stat', so that it can be used from multiple places.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20190902121255.536-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Sun, 1 Sep 2019 12:48:22 +0000 (14:48 +0200)]
libperf: Add missing event.h file to install rule
So that this development header is properly installed and can be found
by tools linking with libperf.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20190901124822.10132-5-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Sun, 1 Sep 2019 12:48:21 +0000 (14:48 +0200)]
perf tests: Add libperf automated test for 'make -C tools/perf build-test'
Add a libperf build test, that is triggered when one does:
$ make -C tools/perf build-test
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20190901124822.10132-4-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Sun, 1 Sep 2019 12:48:19 +0000 (14:48 +0200)]
perf python: Add missing python/perf.so dependency for libperf
The python/perf.so compilation needs libperf ready, otherwise it fails:
$ make python/perf.so JOBS=1
BUILD: Doing 'make -j1' parallel build
GEN python/perf.so
gcc: error: /home/jolsa/kernel/linux-perf/tools/perf/lib/libperf.a: No such file or directory
Fixing this with by adding libperf dependency.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20190901124822.10132-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Masami Hiramatsu [Tue, 3 Sep 2019 11:08:21 +0000 (20:08 +0900)]
kprobes: Prohibit probing on BUG() and WARN() address
Since BUG() and WARN() may use a trap (e.g. UD2 on x86) to
get the address where the BUG() has occurred, kprobes can not
do single-step out-of-line that instruction. So prohibit
probing on such address.
Without this fix, if someone put a kprobe on WARN(), the
kernel will crash with invalid opcode error instead of
outputing warning message, because kernel can not find
correct bug address.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Naveen N . Rao <naveen.n.rao@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/156750890133.19112.3393666300746167111.stgit@devnote2
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Valdis Klētnieks [Thu, 8 Aug 2019 17:44:02 +0000 (13:44 -0400)]
perf/x86: Make more stuff static
When building with C=2, sparse makes note of a number of things:
arch/x86/events/intel/rapl.c:637:30: warning: symbol 'rapl_attr_update' was not declared. Should it be static?
arch/x86/events/intel/cstate.c:449:30: warning: symbol 'core_attr_update' was not declared. Should it be static?
arch/x86/events/intel/cstate.c:457:30: warning: symbol 'pkg_attr_update' was not declared. Should it be static?
arch/x86/events/msr.c:170:30: warning: symbol 'attr_update' was not declared. Should it be static?
arch/x86/events/intel/lbr.c:276:1: warning: symbol 'lbr_from_quirk_key' was not declared. Should it be static?
And they can all indeed be static.
Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/128059.1565286242@turing-police
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Masami Hiramatsu [Sun, 1 Sep 2019 03:03:08 +0000 (12:03 +0900)]
x86, perf: Fix the dependency of the x86 insn decoder selftest
Since x86 instruction decoder is not only for kprobes,
it should be tested when the insn.c is compiled.
(e.g. perf is enabled but kprobes is disabled)
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: cbe5c34c8c1f ("x86: Compile insn.c and inat.c only for KPROBES")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Mon, 2 Sep 2019 07:12:40 +0000 (09:12 +0200)]
Merge tag 'perf-core-for-mingo-5.4-
20190901' of git://git./linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
objtool:
Josh Poimboeuf:
- Move x86 insn decoder to a common location.
Arnaldo Carvalho de Melo:
- Ignore intentional differences for the x86 insn decoder.
build:
Arnaldo Carvalho de Melo:
- Ignore intentional differences for the x86 insn decoder.
Intel PT:
Josh Poimboeuf:
- Use shared x86 insn decoder.
metric groups:
Jin Yao:
- Scale the metric result.
- Support multiple events.
perf c2c:
Jiri Olsa:
- Display proper cpu count in nodes column.
Miscellaneous:
Kyle Meyer:
- Replace MAX_NR_CPUS with perf_env::nr_cpus_online, i.e. with
the number of online CPUs as detected at tool start and/or
recorded in the perf.data file.
libtraceevent:
Tzvetomir Stoyanov:
- Simplify the tep_print_event_* APIs.
- Remove tep_register_trace_clock().
- Change users plugin directory.
Cleanups:
Arnaldo Carvalho de Melo:
- Continue taming the includes hell: remove needless include directives, fix
the fallout, rinse, repeat.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Mon, 2 Sep 2019 07:12:21 +0000 (09:12 +0200)]
Merge branch 'linus' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Linus Torvalds [Sun, 1 Sep 2019 18:21:57 +0000 (11:21 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A set of fixes for x86:
- Fix the bogus detection of 32bit user mode for uretprobes which
caused corruption of the user return address resulting in
application crashes. In the uprobes handler in_ia32_syscall() is
obviously always returning false on a 64bit kernel. Use
user_64bit_mode() instead which works correctly.
- Prevent large page splitting when ftrace flips RW/RO on the kernel
text which caused iTLB performance issues. Ftrace wants to be
converted to text_poke() which avoids the problem, but for now
allow large page preservation in the static protections check when
the change request spawns a full large page.
- Prevent arch_dynirq_lower_bound() from returning 0 when the IOAPIC
is configured via device tree. In the device tree case the GSI 1:1
mapping is meaningless therefore the lower bound which protects the
GSI range on ACPI machines is irrelevant. Return the lower bound
which the core hands to the function instead of blindly returning 0
which causes the core to allocate the invalid virtual interupt
number 0 which in turn prevents all drivers from allocating and
requesting an interrupt.
- Remove the bogus initialization of LDR and DFR in the 32bit bigsmp
APIC driver. That uses physical destination mode where LDR/DFR are
ignored, but the initialization and the missing clear of LDR caused
the APIC to be left in a inconsistent state on kexec/reboot.
- Clear LDR when clearing the APIC registers so the APIC is in a well
defined state.
- Initialize variables proper in the find_trampoline_placement()
code.
- Silence GCC( build warning for the real mode part of the build"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm/cpa: Prevent large page split when ftrace flips RW on kernel text
x86/build: Add -Wnoaddress-of-packed-member to REALMODE_CFLAGS, to silence GCC9 build warning
x86/boot/compressed/64: Fix missing initialization in find_trampoline_placement()
x86/apic: Include the LDR when clearing out APIC registers
x86/apic: Do not initialize LDR and DFR for bigsmp
uprobes/x86: Fix detection of 32-bit user mode
x86/apic: Fix arch_dynirq_lower_bound() bug for DT enabled machines
Linus Torvalds [Sun, 1 Sep 2019 18:09:42 +0000 (11:09 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
"Two fixes for perf x86 hardware implementations:
- Restrict the period on Nehalem machines to prevent perf from
hogging the CPU
- Prevent the AMD IBS driver from overwriting the hardwre controlled
and pre-seeded reserved bits (0-6) in the count register which
caused a sample bias for dispatched micro-ops"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/amd/ibs: Fix sample bias for dispatched micro-ops
perf/x86/intel: Restrict period on Nehalem
Linus Torvalds [Sun, 1 Sep 2019 17:39:25 +0000 (10:39 -0700)]
Merge branch 'turbostat' of git://git./linux/kernel/git/lenb/linux
Pull turbostat updates from Len Brown:
"User-space turbostat (and x86_energy_perf_policy) patches.
They are primarily bug fixes from users"
* 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
tools/power turbostat: update version number
tools/power turbostat: Add support for Hygon Fam 18h (Dhyana) RAPL
tools/power turbostat: Fix caller parameter of get_tdp_amd()
tools/power turbostat: Fix CPU%C1 display value
tools/power turbostat: do not enforce 1ms
tools/power turbostat: read from pipes too
tools/power turbostat: Add Ice Lake NNPI support
tools/power turbostat: rename has_hsw_msrs()
tools/power turbostat: Fix Haswell Core systems
tools/power turbostat: add Jacobsville support
tools/power turbostat: fix buffer overrun
tools/power turbostat: fix file descriptor leaks
tools/power turbostat: fix leak of file descriptor on error return path
tools/power turbostat: Make interval calculation per thread to reduce jitter
tools/power turbostat: remove duplicate pc10 column
tools/power x86_energy_perf_policy: Fix argument parsing
tools/power: Fix typo in man page
tools/power/x86: Enable compiler optimisations and Fortify by default
tools/power x86_energy_perf_policy: Fix "uninitialized variable" warnings at -O2
Arnaldo Carvalho de Melo [Sat, 31 Aug 2019 20:35:53 +0000 (17:35 -0300)]
objtool: Ignore intentional differences for the x86 insn decoder
Since we need to build this in !x86, we need to explicitely use the x86
files, not things like asm/insn.h, so we intentionally differ from the
master copy in the kernel sources, add -I diff directives to ignore just
these differences when checking for drift.
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: http://lore.kernel.org/lkml/20190830193109.p7jagidsrahoa4pn@treble
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/n/tip-j965m9b7xtdc83em3twfkh9o@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Sat, 31 Aug 2019 20:29:47 +0000 (17:29 -0300)]
objtool: Update sync-check.sh from perf's check-headers.sh
To allow using the -I trick that will be needed for checking the x86
insn decoder files.
Without the specific -I lines we still get the same warnings as before:
$ make -C tools/objtool/ clean ; make -C tools/objtool/
make: Entering directory '/home/acme/git/perf/tools/objtool'
CLEAN objtool
find -name '*.o' -delete -o -name '\.*.cmd' -delete -o -name '\.*.d' -delete
rm -f arch/x86/inat-tables.c fixdep
<SNIP>
LD objtool-in.o
make[1]: Leaving directory '/home/acme/git/perf/tools/objtool'
Warning: Kernel ABI header at 'tools/arch/x86/include/asm/inat.h' differs from latest version at 'arch/x86/include/asm/inat.h'
diff -u tools/arch/x86/include/asm/inat.h arch/x86/include/asm/inat.h
Warning: Kernel ABI header at 'tools/arch/x86/include/asm/insn.h' differs from latest version at 'arch/x86/include/asm/insn.h'
diff -u tools/arch/x86/include/asm/insn.h arch/x86/include/asm/insn.h
Warning: Kernel ABI header at 'tools/arch/x86/lib/inat.c' differs from latest version at 'arch/x86/lib/inat.c'
diff -u tools/arch/x86/lib/inat.c arch/x86/lib/inat.c
Warning: Kernel ABI header at 'tools/arch/x86/lib/insn.c' differs from latest version at 'arch/x86/lib/insn.c'
diff -u tools/arch/x86/lib/insn.c arch/x86/lib/insn.c
/home/acme/git/perf/tools/objtool
LINK objtool
make: Leaving directory '/home/acme/git/perf/tools/objtool'
$
The next patch will add the -I lines for those files.
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: http://lore.kernel.org/lkml/20190830193109.p7jagidsrahoa4pn@treble
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/n/tip-vu3p38mnxlwd80rlsnjkqcf2@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Sat, 31 Aug 2019 20:14:19 +0000 (17:14 -0300)]
perf build: Ignore intentional differences for the x86 insn decoder
Since we need to build this in !x86, we need to explicitely use the x86
files, not things like asm/insn.h, so we intentionally differ from the
master copy in the kernel sources, add -I diff directives to ignore just
these differences when checking for drift.
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/n/tip-9qziqjjt120mmz6kyepka9p7@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Josh Poimboeuf [Thu, 29 Aug 2019 22:41:21 +0000 (17:41 -0500)]
perf intel-pt: Use shared x86 insn decoder
Now that there's a common version of the decoder for all tools, use it
instead of the local copy.
Also use perf's check-headers.sh script to diff the decoder files to
make sure they remain in sync with the kernel version. Objtool has a
similar check.
Committer notes:
Had to keep this all pointing explicitely to x86 headers/files, i.e.
instead of asm/isnn.h we had to use ../include/asm/insn.h when the files
were in differemt dirs, or just replace "<asm/foo.h>" with "foo.h".
This way we continue to be able to process perf.data files with Intel PT
traces in distros other than x86.
Also fixed up the awk script paths to use $(srcdir)/tools/arch instead
or relative directories so that we keep detached tarballs (make help |
grep perf) working.
For now the include lines in these headers are being ignored so as not
to flag false reports of kernel/tools out of sync.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: x86@kernel.org
Link: http://lore.kernel.org/lkml/8a37e615d2880f039505d693d1e068a009358a2b.1567118001.git.jpoimboe@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Josh Poimboeuf [Thu, 29 Aug 2019 22:41:20 +0000 (17:41 -0500)]
perf intel-pt: Remove inat.c from build dependency list
intel-pt-insn-decoder.c includes inat.c directly, so it already has an
implicit dependency on inat.c. The Build file dependency is redundant.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: x86@kernel.org
Link: http://lore.kernel.org/lkml/53776d6d29bc9eceb571d52df8fa32250c58a0f3.1567118001.git.jpoimboe@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Josh Poimboeuf [Thu, 29 Aug 2019 22:41:19 +0000 (17:41 -0500)]
perf: Update .gitignore file
After a "make tools/perf", git reports the following untracked files:
tools/perf/feature/
tools/perf/fixdep
tools/perf/libtraceevent-dynamic-list
Add these generated files to perf's .gitignore file.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: x86@kernel.org
Link: http://lore.kernel.org/lkml/03acbc6c2fbc72054861f6c301875db75db33030.1567118001.git.jpoimboe@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Josh Poimboeuf [Thu, 29 Aug 2019 22:41:18 +0000 (17:41 -0500)]
objtool: Move x86 insn decoder to a common location
The kernel tree has three identical copies of the x86 instruction
decoder. Two of them are in the tools subdir.
The tools subdir is supposed to be completely standalone and separate
from the kernel. So having at least one copy of the kernel decoder in
the tools subdir is unavoidable. However, we don't need *two* of them.
Move objtool's copy of the decoder to a shared location, so that perf
will also be able to use it.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: x86@kernel.org
Link: http://lore.kernel.org/lkml/55b486b88f6bcd0c9a2a04b34f964860c8390ca8.1567118001.git.jpoimboe@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jin Yao [Wed, 28 Aug 2019 05:59:32 +0000 (13:59 +0800)]
perf metricgroup: Support multiple events for metricgroup
Some uncore metrics don't work as expected. For example, on
cascadelakex:
root@lkp-csl-2sp2:~# perf stat -M UNC_M_PMM_BANDWIDTH.TOTAL -a -- sleep 1
Performance counter stats for 'system wide':
1841092 unc_m_pmm_rpq_inserts
3680816 unc_m_pmm_wpq_inserts
1.
001775055 seconds time elapsed
root@lkp-csl-2sp2:~# perf stat -M UNC_M_PMM_READ_LATENCY -a -- sleep 1
Performance counter stats for 'system wide':
860649746 unc_m_pmm_rpq_occupancy.all
1840557 unc_m_pmm_rpq_inserts
12790627455 unc_m_clockticks
1.
001773348 seconds time elapsed
No metrics 'UNC_M_PMM_BANDWIDTH.TOTAL' or 'UNC_M_PMM_READ_LATENCY' are
reported.
The issue is, the case of an alias expanding to mulitple events is not
supported, typically the uncore events. (see comments in
find_evsel_group()).
For UNC_M_PMM_BANDWIDTH.TOTAL in above example, the expanded event group
is '{unc_m_pmm_rpq_inserts,unc_m_pmm_wpq_inserts}:W', but the actual
events passed to find_evsel_group are:
unc_m_pmm_rpq_inserts
unc_m_pmm_rpq_inserts
unc_m_pmm_rpq_inserts
unc_m_pmm_rpq_inserts
unc_m_pmm_rpq_inserts
unc_m_pmm_rpq_inserts
unc_m_pmm_wpq_inserts
unc_m_pmm_wpq_inserts
unc_m_pmm_wpq_inserts
unc_m_pmm_wpq_inserts
unc_m_pmm_wpq_inserts
unc_m_pmm_wpq_inserts
For this multiple events case, it's not supported well.
This patch introduces a new field 'metric_leader' in struct evsel. The
first event is considered as a metric leader. For the rest of same
events, they point to the first event via it's metric_leader field in
struct evsel.
This design is for adding the counting results of all same events to the
first event in group (the metric_leader).
With this patch,
root@lkp-csl-2sp2:~# perf stat -M UNC_M_PMM_BANDWIDTH.TOTAL -a -- sleep 1
Performance counter stats for 'system wide':
1842108 unc_m_pmm_rpq_inserts # 337.2 MB/sec UNC_M_PMM_BANDWIDTH.TOTAL
3682209 unc_m_pmm_wpq_inserts
1.
001819706 seconds time elapsed
root@lkp-csl-2sp2:~# perf stat -M UNC_M_PMM_READ_LATENCY -a -- sleep 1
Performance counter stats for 'system wide':
861970685 unc_m_pmm_rpq_occupancy.all # 219.4 ns UNC_M_PMM_READ_LATENCY
1842772 unc_m_pmm_rpq_inserts
12790196356 unc_m_clockticks
1.
001749103 seconds time elapsed
Now we can see the correct metrics 'UNC_M_PMM_BANDWIDTH.TOTAL' and
'UNC_M_PMM_READ_LATENCY'.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20190828055932.8269-5-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jin Yao [Wed, 28 Aug 2019 05:59:31 +0000 (13:59 +0800)]
perf metricgroup: Scale the metric result
Some metrics define the scale unit, such as
{
"BriefDescription": "Intel Optane DC persistent memory read latency (ns). Derived from unc_m_pmm_rpq_occupancy.all",
"Counter": "0,1,2,3",
"EventCode": "0xE0",
"EventName": "UNC_M_PMM_READ_LATENCY",
"MetricExpr": "UNC_M_PMM_RPQ_OCCUPANCY.ALL / UNC_M_PMM_RPQ_INSERTS / UNC_M_CLOCKTICKS",
"MetricName": "UNC_M_PMM_READ_LATENCY",
"PerPkg": "1",
"ScaleUnit": "6000000000ns",
"UMask": "0x1",
"Unit": "iMC"
},
For above example, the ratio should be,
ratio = (UNC_M_PMM_RPQ_OCCUPANCY.ALL / UNC_M_PMM_RPQ_INSERTS / UNC_M_CLOCKTICKS) *
6000000000
But in current code, the ratio is not scaled ( *
6000000000)
With this patch, the ratio is scaled and the unit (ns) is printed.
For example,
# 219.4 ns UNC_M_PMM_READ_LATENCY
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20190828055932.8269-4-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jin Yao [Wed, 28 Aug 2019 05:59:29 +0000 (13:59 +0800)]
perf pmu: Change convert_scale from static to global
The function convert_scale() can be used to convert string to unit and
scale. For example,
s = "6000000000ns";
convert_scale(s, &unit, &scale);
unit = "ns", scale =
6000000000.
Currently this function is static. This patch renames the function to
perf_pmu__convert_scale and changes the function to global. No
functional change.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20190828055932.8269-2-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Fri, 30 Aug 2019 18:09:54 +0000 (15:09 -0300)]
perf symbols: Move mem_info and branch_info out of symbol.h
The mem_info struct goes to mem-events.h and branch_info goes to
branch.h, where they belong, this way we can remove several headers from
symbols.h and trim the include dependency tree more.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-aupw71xnravcsu2xoabfmhpc@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Fri, 30 Aug 2019 17:45:20 +0000 (14:45 -0300)]
perf auxtrace: Uninline functions that touch perf_session
So that we don't carry the session.h include directive in auxtrace.h,
which in turn opens a can of worms of files that were getting all sorts
of things via that include, fix them all.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-d2d83aovpgri2z75wlitquni@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Fri, 30 Aug 2019 15:52:25 +0000 (12:52 -0300)]
perf tools: Remove needless evlist.h include directives
Remove the last unneeded use of cache.h in a header, we can check where
it is really needed, i.e. we can remove it and be sure that it isn't
being obtained indirectly.
This is an old file, used by now incorrectly in many places, so it was
providing includes needed indirectly, fixup this fallout.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-3x3l8gihoaeh7714os861ia7@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Fri, 30 Aug 2019 15:29:03 +0000 (12:29 -0300)]
perf tools: Remove needless evlist.h include directives
Now that evlist.h isn't included by any other header, we can check where
it is really needed, i.e. we can remove it and be sure that it isn't
being obtained indirectly.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-6d7kape36m94a266md0d3xbh@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Fri, 30 Aug 2019 15:18:50 +0000 (12:18 -0300)]
perf tools: Remove needless thread_map.h include directives
Now that thread_map.h isn't included by any other header, we can check where
it is really needed, i.e. we can remove it and be sure that it isn't
being obtained indirectly.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-fyzvg64cz1ikvyxp8d6nrhz1@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Fri, 30 Aug 2019 15:13:45 +0000 (12:13 -0300)]
perf tools: Remove needless thread.h include directives
Now that thread.h isn't included by any other header, we can check where
it is really needed, i.e. we can remove it and be sure that it isn't
being obtained indirectly.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-kh333ivjbw05wsggckpziu86@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Fri, 30 Aug 2019 15:07:23 +0000 (12:07 -0300)]
perf tools: Remove needless map.h include directives
Now that map.h isn't included by any other header, we can check where
it is really needed, i.e. we can remove it and be sure that it isn't
being obtained indirectly.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-iu8ylqky7g1i9i54v3y7qovw@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Fri, 30 Aug 2019 14:56:47 +0000 (11:56 -0300)]
perf probe: No need for symbol.h, symbol_conf is enough
Remove one more unneeded use of symbol.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-vrda1tuem1o8pk82t2kfjtun@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Fri, 30 Aug 2019 14:54:00 +0000 (11:54 -0300)]
perf tools: Remove needless sort.h include directives
Now that sort.h isn't included by any other header, we can check where
it is really needed, i.e. we can remove it and be sure that it isn't
being obtained indirectly.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-tom8k0lbsxd9joprr8zpu6w1@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Fri, 30 Aug 2019 14:44:32 +0000 (11:44 -0300)]
perf tools: Move 'struct events_stats' and prototypes to separate header
This will allow us to untangle the header dependency a bit more, as some
places will not need event.h anymore.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-enqncj29ovzaat3cd9203rwl@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Fri, 30 Aug 2019 14:28:14 +0000 (11:28 -0300)]
perf hist: Remove needless ui/progress.h from hist.h
We only need a forward declaration, add it and fixup all the files that
need ui_progress definitions but were wrongly getting it from hist.h.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-84a90o9jdxybffxo9jmouokw@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Fri, 30 Aug 2019 14:11:01 +0000 (11:11 -0300)]
perf dsos: Move the dsos struct and its methods to separate source files
So that we can reduce the header dependency tree further, in the process
noticed that lots of places were getting even things like build-id
routines and 'struct perf_tool' definition indirectly, so fix all those
too.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-ti0btma9ow5ndrytyoqdk62j@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Fri, 30 Aug 2019 13:26:37 +0000 (10:26 -0300)]
perf symbols: Move symsrc prototypes to a separate header
So that we can remove dso.h from symbol.h and reduce the header
dependency tree.
Fixup cases where struct dso guts are needed but were obtained via
symbol.h, indirectly.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-ip683cegt306ncu3gsz7ii21@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Fri, 30 Aug 2019 13:19:19 +0000 (10:19 -0300)]
perf symbols: Add missing linux/refcount.h to symbol.h
We use refcount_t there, so we need that header or else it'll break when
we remove dso.h, that is from where it is getting that definition now...
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-5albrk0uve6x9cf6x3ebwpae@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Fri, 30 Aug 2019 13:01:50 +0000 (10:01 -0300)]
perf symbol: Move C++ demangle defines to the only file using it
No need to have it generally available in such a critical header as
symbol.h.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-es1ufxv7bihiumytn5dm3drn@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Fri, 30 Aug 2019 12:43:25 +0000 (09:43 -0300)]
perf dso: Adopt DSO related macros from symbol.h
Reducing the size of symbol.h by removing things that are better placed
somewhere else.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-edenkmjt1oe5fks2s6umd30b@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tzvetomir Stoyanov [Mon, 5 Aug 2019 20:43:15 +0000 (16:43 -0400)]
libtraceevent: Change users plugin directory
To be compliant with XDG user directory layout, the user's plugin
directory is changed from ~/.traceevent/plugins to
~/.local/lib/traceevent/plugins/
Suggested-by: Patrick McLean <chutzpah@gentoo.org>
Signed-off-by: Tzvetomir Stoyanov <tstoyanov@vmware.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Patrick McLean <chutzpah@gentoo.org>
Cc: linux-trace-devel@vger.kernel.org
Link: https://lore.kernel.org/linux-trace-devel/20190313144206.41e75cf8@patrickm/
Link: http://lore.kernel.org/linux-trace-devel/20190801074959.22023-4-tz.stoyanov@gmail.com
Link: http://lore.kernel.org/lkml/20190805204355.344622683@goodmis.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tzvetomir Stoyanov [Mon, 5 Aug 2019 20:43:14 +0000 (16:43 -0400)]
libtraceevent: Remove tep_register_trace_clock()
The tep_register_trace_clock() API is used to instruct the traceevent
library how to print the event time stamps. As event print interface if
redesigned, this API is not needed any more. The new event print API is
flexible and the user can specify how the time stamps are printed.
Signed-off-by: Tzvetomir Stoyanov <tstoyanov@vmware.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Patrick McLean <chutzpah@gentoo.org>
Cc: linux-trace-devel@vger.kernel.org
Link: http://lore.kernel.org/linux-trace-devel/20190801074959.22023-3-tz.stoyanov@gmail.com
Link: http://lore.kernel.org/lkml/20190805204355.195042846@goodmis.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tzvetomir Stoyanov [Mon, 5 Aug 2019 20:43:13 +0000 (16:43 -0400)]
libtraceevent, perf tools: Changes in tep_print_event_* APIs
Libtraceevent APIs for printing various trace events information are
complicated, there are complex extra parameters. To control the way
event information is printed, the user should call a set of functions in
a specific sequence.
These APIs are reimplemented to provide a more simple interface for
printing event information.
Removed APIs:
tep_print_event_task()
tep_print_event_time()
tep_print_event_data()
tep_event_info()
tep_is_latency_format()
tep_set_latency_format()
tep_data_latency_format()
tep_set_print_raw()
A new API for printing event information is introduced:
void tep_print_event(struct tep_handle *tep, struct trace_seq *s,
struct tep_record *record, const char *fmt, ...);
where "fmt" is a printf-like format string, followed by the event
fields to be printed. Supported fields:
TEP_PRINT_PID, "%d" - event PID
TEP_PRINT_CPU, "%d" - event CPU
TEP_PRINT_COMM, "%s" - event command string
TEP_PRINT_NAME, "%s" - event name
TEP_PRINT_LATENCY, "%s" - event latency
TEP_PRINT_TIME, %d - event time stamp. A divisor and precision
can be specified as part of this format string:
"%precision.divisord". Example:
"%3.1000d" - divide the time by 1000 and print the first 3 digits
before the dot. Thus, the time stamp "
123456000" will be printed as
"123.456"
TEP_PRINT_INFO, "%s" - event information.
TEP_PRINT_INFO_RAW, "%s" - event information, in raw format.
Example:
tep_print_event(tep, s, record, "%16s-%-5d [%03d] %s %6.1000d %s %s",
TEP_PRINT_COMM, TEP_PRINT_PID, TEP_PRINT_CPU,
TEP_PRINT_LATENCY, TEP_PRINT_TIME, TEP_PRINT_NAME, TEP_PRINT_INFO);
Output:
ls-11314 [005] d.h. 185207.366383 function __wake_up
Signed-off-by: Tzvetomir Stoyanov <tstoyanov@vmware.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: linux-trace-devel@vger.kernel.org
Cc: Patrick McLean <chutzpah@gentoo.org>
Link: http://lore.kernel.org/linux-trace-devel/20190801074959.22023-2-tz.stoyanov@gmail.com
Link: http://lore.kernel.org/lkml/20190805204355.041132030@goodmis.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Thu, 29 Aug 2019 20:19:02 +0000 (17:19 -0300)]
perf event: Remove needless include directives from event.h
bpf.h and build-id.h are not needed at all in event.h, remove them.
And fixup the fallout of files that were getting needed stuff from this
now pruned include.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-rdm3dgtlrndmmnlc4bafsg3b@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Thu, 29 Aug 2019 20:10:59 +0000 (17:10 -0300)]
perf env: Remove env.h from other headers where just a fwd decl is needed
And fixup the fallout of c files not building due to now missing
headers.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-sw8k3kpla98pr3rqypbjk9hf@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Thu, 29 Aug 2019 19:18:59 +0000 (16:18 -0300)]
perf debug: Remove needless include directives from debug.h
All we need there is a forward declaration for 'union perf_event', so
remove it from there and add missing header directives in places using
things from this indirect include.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-7ftk0ztstqub1tirjj8o8xbl@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Len Brown [Sat, 31 Aug 2019 18:40:39 +0000 (14:40 -0400)]
tools/power turbostat: update version number
Today is 19.08.31, at least in some parts of the world.
Signed-off-by: Len Brown <len.brown@intel.com>
Pu Wen [Sat, 31 Aug 2019 02:20:31 +0000 (10:20 +0800)]
tools/power turbostat: Add support for Hygon Fam 18h (Dhyana) RAPL
Commit
9392bd98bba760be96ee ("tools/power turbostat: Add support for AMD
Fam 17h (Zen) RAPL") and the commit
3316f99a9f1b68c578c5 ("tools/power
turbostat: Also read package power on AMD F17h (Zen)") add AMD Fam 17h
RAPL support.
Hygon Family 18h(Dhyana) support RAPL in bit 14 of CPUID 0x80000007 EDX,
and has MSRs RAPL_PWR_UNIT/CORE_ENERGY_STAT/PKG_ENERGY_STAT. So add Hygon
Dhyana Family 18h support for RAPL.
Already tested on Hygon multi-node systems and it shows correct per-core
energy usage and the total package power.
Signed-off-by: Pu Wen <puwen@hygon.cn>
Reviewed-by: Calvin Walton <calvin.walton@kepstin.ca>
Signed-off-by: Len Brown <len.brown@intel.com>
Pu Wen [Sat, 31 Aug 2019 02:19:58 +0000 (10:19 +0800)]
tools/power turbostat: Fix caller parameter of get_tdp_amd()
Commit
9392bd98bba760be96ee ("tools/power turbostat: Add support for AMD
Fam 17h (Zen) RAPL") add a function get_tdp_amd(), the parameter is CPU
family. But the rapl_probe_amd() function use wrong model parameter.
Fix the wrong caller parameter of get_tdp_amd() to use family.
Cc: <stable@vger.kernel.org> # v5.1+
Signed-off-by: Pu Wen <puwen@hygon.cn>
Reviewed-by: Calvin Walton <calvin.walton@kepstin.ca>
Signed-off-by: Len Brown <len.brown@intel.com>
Srinivas Pandruvada [Tue, 27 Aug 2019 17:57:14 +0000 (10:57 -0700)]
tools/power turbostat: Fix CPU%C1 display value
In some case C1% will be wrong value, when platform doesn't have MSR for
C1 residency.
For example:
Core CPU CPU%c1
- - 100.00
0 0 100.00
0 2 100.00
1 1 100.00
1 3 100.00
But adding Busy% will fix this
Core CPU Busy% CPU%c1
- - 99.77 0.23
0 0 99.77 0.23
0 2 99.77 0.23
1 1 99.77 0.23
1 3 99.77 0.23
This issue can be reproduced on most of the recent systems including
Broadwell, Skylake and later.
This is because if we don't select Busy% or Avg_MHz or Bzy_MHz then
mperf value will not be read from MSR, so it will be 0. But this
is required for C1% calculation when MSR for C1 residency is not present.
Same is true for C3, C6 and C7 column selection.
So add another define DO_BIC_READ(), which doesn't depend on user
column selection and use for mperf, C3, C6 and C7 related counters.
So when there is no platform support for C1 residency counters,
we still read these counters, if the CPU has support and user selected
display of CPU%c1.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Artem Bityutskiy [Wed, 14 Aug 2019 17:12:56 +0000 (20:12 +0300)]
tools/power turbostat: do not enforce 1ms
Turbostat works by taking a snapshot of counters, sleeping, taking another
snapshot, calculating deltas, and printing out the table.
The sleep time is controlled via -i option or by user sending a signal or a
character to stdin. In the latter case, turbostat always adds 1 ms
sleep before it reads the counters, in order to avoid larger imprecisions
in the results in prints.
While the 1 ms delay may be a good idea for a "dumb" user, it is a
problem for an "aware" user. I do thousands and thousands of measurements
over a short period of time (like 2ms), and turbostat unconditionally adds
a 1ms to my interval, so I cannot get what I really need.
This patch removes the unconditional 1ms sleep. This is an expert user
tool, after all, and non-experts will unlikely ever use it in the non-fixed
interval mode anyway, so I think it is OK to remove the 1ms delay.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Artem Bityutskiy [Wed, 14 Aug 2019 17:12:55 +0000 (20:12 +0300)]
tools/power turbostat: read from pipes too
Commit '
47936f944e78 tools/power turbostat: fix printing on input' make
a valid fix, but it completely disabled piped stdin support, which is
a valuable use-case. Indeed, if stdin is a pipe, turbostat won't read
anything from it, so it becomes impossible to get turbostat output at
user-defined moments, instead of the regular intervals.
There is no reason why this should works for terminals, but not for
pipes. This patch improves the situation. Instead of ignoring pipes, we
read data from them but gracefully handle the EOF case.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Rajneesh Bhardwaj [Fri, 14 Jun 2019 07:39:46 +0000 (13:09 +0530)]
tools/power turbostat: Add Ice Lake NNPI support
This enables turbostat utility on Ice Lake NNPI SoC.
Link: https://lkml.org/lkml/2019/6/5/1034
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Sat, 31 Aug 2019 18:16:07 +0000 (14:16 -0400)]
tools/power turbostat: rename has_hsw_msrs()
Perhaps if this more descriptive name had been used,
then we wouldn't have had the HSW ULT vs HSW CORE bug,
fixed by the previous commit.
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Sat, 31 Aug 2019 18:09:29 +0000 (14:09 -0400)]
tools/power turbostat: Fix Haswell Core systems
turbostat: cpu0: msr offset 0x630 read failed: Input/output error
because Haswell Core does not have C8-C10.
Output C8-C10 only on Haswell ULT.
Fixes: f5a4c76ad7de ("tools/power turbostat: consolidate duplicate model numbers")
Reported-by: Prarit Bhargava <prarit@redhat.com>
Suggested-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Zhang Rui [Sun, 21 Apr 2019 08:30:22 +0000 (16:30 +0800)]
tools/power turbostat: add Jacobsville support
Jacobsville behaves like Denverton.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Naoya Horiguchi [Wed, 3 Apr 2019 07:02:14 +0000 (16:02 +0900)]
tools/power turbostat: fix buffer overrun
turbostat could be terminated by general protection fault on some latest
hardwares which (for example) support 9 levels of C-states and show 18
"tADDED" lines. That bloats the total output and finally causes buffer
overrun. So let's extend the buffer to avoid this.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Gustavo A. R. Silva [Mon, 8 Apr 2019 16:12:40 +0000 (11:12 -0500)]
tools/power turbostat: fix file descriptor leaks
Fix file descriptor leaks by closing fp before return.
Addresses-Coverity-ID:
1444591 ("Resource leak")
Addresses-Coverity-ID:
1444592 ("Resource leak")
Fixes: 5ea7647b333f ("tools/power turbostat: Warn on bad ACPI LPIT data")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Colin Ian King [Mon, 8 Apr 2019 09:00:44 +0000 (10:00 +0100)]
tools/power turbostat: fix leak of file descriptor on error return path
Currently the error return path does not close the file fp and leaks
a file descriptor. Fix this by closing the file.
Fixes: 5ea7647b333f ("tools/power turbostat: Warn on bad ACPI LPIT data")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Yazen Ghannam [Mon, 25 Mar 2019 17:32:42 +0000 (17:32 +0000)]
tools/power turbostat: Make interval calculation per thread to reduce jitter
Turbostat currently normalizes TSC and other values by dividing by an
interval. This interval is the delta between the start of one global
(all counters on all CPUs) sampling and the start of another. However,
this introduces a lot of jitter into the data.
In order to reduce jitter, the interval calculation should be based on
timestamps taken per thread and close to the start of the thread's
sampling.
Define a per thread time value to hold the delta between samples taken
on the thread.
Use the timestamp taken at the beginning of sampling to calculate the
delta.
Move the thread's beginning timestamp to after the CPU migration to
avoid jitter due to the migration.
Use the global time delta for the average time delta.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Sat, 31 Aug 2019 16:30:24 +0000 (12:30 -0400)]
tools/power turbostat: remove duplicate pc10 column
Remove the duplicate pc10 column.
Fixes: be0e54c4ebbf ("turbostat: Build-in "Low Power Idle" counters support")
Reported-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Zephaniah E. Loss-Cutler-Hull [Sat, 9 Feb 2019 13:25:48 +0000 (05:25 -0800)]
tools/power x86_energy_perf_policy: Fix argument parsing
The -w argument in x86_energy_perf_policy currently triggers an
unconditional segfault.
This is because the argument string reads: "+a:c:dD:E:e:f:m:M:rt:u:vw" and
yet the argument handler expects an argument.
When parse_optarg_string is called with a null argument, we then proceed to
crash in strncmp, not horribly friendly.
The man page describes -w as taking an argument, the long form
(--hwp-window) is correctly marked as taking a required argument, and the
code expects it.
As such, this patch simply marks the short form (-w) as requiring an
argument.
Signed-off-by: Zephaniah E. Loss-Cutler-Hull <zephaniah@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Matt Lupfer [Thu, 20 Sep 2018 14:31:44 +0000 (10:31 -0400)]
tools/power: Fix typo in man page
From context, we mean EPB (Enegry Performance Bias).
Signed-off-by: Matt Lupfer <mlupfer@ddn.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Ben Hutchings [Sun, 16 Sep 2018 15:06:10 +0000 (16:06 +0100)]
tools/power/x86: Enable compiler optimisations and Fortify by default
Compiling without optimisations is silly, especially since some
warnings depend on the optimiser. Use -O2.
Fortify adds warnings for unchecked I/O (among other things), which
seems to be a good idea for user-space code. Enable that too.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Len Brown <len.brown@intel.com>
Ben Hutchings [Sun, 16 Sep 2018 15:05:53 +0000 (16:05 +0100)]
tools/power x86_energy_perf_policy: Fix "uninitialized variable" warnings at -O2
x86_energy_perf_policy first uses __get_cpuid() to check the maximum
CPUID level and exits if it is too low. It then assumes that later
calls will succeed (which I think is architecturally guaranteed). It
also assumes that CPUID works at all (which is not guaranteed on
x86_32).
If optimisations are enabled, gcc warns about potentially
uninitialized variables. Fix this by adding an exit-on-error after
every call to __get_cpuid() instead of just checking the maximum
level.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Len Brown <len.brown@intel.com>
Linus Torvalds [Sat, 31 Aug 2019 17:27:42 +0000 (10:27 -0700)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"I2C has a bunch of driver fixes and a core improvement to make the
on-going API transition more robust"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: mediatek: disable zero-length transfers for mt8183
i2c: iproc: Stop advertising support of SMBUS quick cmd
MAINTAINERS: i2c mv64xxx: Update documentation path
i2c: piix4: Fix port selection for AMD Family 16h Model 30h
i2c: designware: Synchronize IRQs when unregistering slave client
i2c: i801: Avoid memory leak in check_acpi_smo88xx_device()
i2c: make i2c_unregister_device() ERR_PTR safe
Linus Torvalds [Sat, 31 Aug 2019 16:15:25 +0000 (09:15 -0700)]
Merge tag 'trace-v5.3-rc6' of git://git./linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"Small fixes and minor cleanups for tracing:
- Make exported ftrace function not static
- Fix NULL pointer dereference in reading probes as they are created
- Fix NULL pointer dereference in k/uprobe clean up path
- Various documentation fixes"
* tag 'trace-v5.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Correct kdoc formats
ftrace/x86: Remove mcount() declaration
tracing/probe: Fix null pointer dereference
tracing: Make exported ftrace_set_clr_event non-static
ftrace: Check for successful allocation of hash
ftrace: Check for empty hash and comment the race with registering probes
ftrace: Fix NULL pointer dereference in t_probe_next()
Linus Torvalds [Sat, 31 Aug 2019 16:02:36 +0000 (09:02 -0700)]
Merge tag 'riscv/for-v5.3-rc7' of git://git./linux/kernel/git/riscv/linux
Pull RISC-V fix from Paul Walmsley:
"One significant fix for 32-bit RISC-V systems:
Fix the RV32 memory map to prevent userspace from corrupting the
FIXMAP area. Without this patch, the system can crash very early
during the boot"
* tag 'riscv/for-v5.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
RISC-V: Fix FIXMAP area corruption on RV32 systems
Linus Torvalds [Sat, 31 Aug 2019 15:51:48 +0000 (08:51 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull KVM fixes from Radim Krčmář:
"PPC:
- Fix bug which could leave locks held in the host on return to a
guest.
x86:
- Prevent infinitely looping emulation of a failing syscall while
single stepping.
- Do not crash the host when nesting is disabled"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: Don't update RIP or do single-step on faulting emulation
KVM: x86: hyper-v: don't crash on KVM_GET_SUPPORTED_HV_CPUID when kvm_intel.nested is disabled
KVM: PPC: Book3S: Fix incorrect guest-to-user-translation error handling
Linus Torvalds [Sat, 31 Aug 2019 15:31:48 +0000 (08:31 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge misc mm fixes from Andrew Morton:
"7 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: memcontrol: fix percpu vmstats and vmevents flush
mm, memcg: do not set reclaim_state on soft limit reclaim
mailmap: add aliases for Dmitry Safonov
mm/z3fold.c: fix lock/unlock imbalance in z3fold_page_isolate
mm, memcg: partially revert "mm/memcontrol.c: keep local VM counters in sync with the hierarchical ones"
mm/zsmalloc.c: fix build when CONFIG_COMPACTION=n
mm: memcontrol: flush percpu slab vmstats on kmem offlining
Jakub Kicinski [Wed, 28 Aug 2019 05:25:47 +0000 (22:25 -0700)]
tracing: Correct kdoc formats
Fix the following kdoc warnings:
kernel/trace/trace.c:1579: warning: Function parameter or member 'tr' not described in 'update_max_tr_single'
kernel/trace/trace.c:1579: warning: Function parameter or member 'tsk' not described in 'update_max_tr_single'
kernel/trace/trace.c:1579: warning: Function parameter or member 'cpu' not described in 'update_max_tr_single'
kernel/trace/trace.c:1776: warning: Function parameter or member 'type' not described in 'register_tracer'
kernel/trace/trace.c:2239: warning: Function parameter or member 'task' not described in 'tracing_record_taskinfo'
kernel/trace/trace.c:2239: warning: Function parameter or member 'flags' not described in 'tracing_record_taskinfo'
kernel/trace/trace.c:2269: warning: Function parameter or member 'prev' not described in 'tracing_record_taskinfo_sched_switch'
kernel/trace/trace.c:2269: warning: Function parameter or member 'next' not described in 'tracing_record_taskinfo_sched_switch'
kernel/trace/trace.c:2269: warning: Function parameter or member 'flags' not described in 'tracing_record_taskinfo_sched_switch'
kernel/trace/trace.c:3078: warning: Function parameter or member 'ip' not described in 'trace_vbprintk'
kernel/trace/trace.c:3078: warning: Function parameter or member 'fmt' not described in 'trace_vbprintk'
kernel/trace/trace.c:3078: warning: Function parameter or member 'args' not described in 'trace_vbprintk'
Link: http://lkml.kernel.org/r/20190828052549.2472-2-jakub.kicinski@netronome.com
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Jisheng Zhang [Mon, 26 Aug 2019 09:13:12 +0000 (09:13 +0000)]
ftrace/x86: Remove mcount() declaration
Commit
562e14f72292 ("ftrace/x86: Remove mcount support") removed the
support for using mcount, so we could remove the mcount() declaration
to clean up.
Link: http://lkml.kernel.org/r/20190826170150.10f101ba@xhacker.debian
Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Xinpeng Liu [Wed, 7 Aug 2019 23:29:23 +0000 (07:29 +0800)]
tracing/probe: Fix null pointer dereference
BUG: KASAN: null-ptr-deref in trace_probe_cleanup+0x8d/0xd0
Read of size 8 at addr
0000000000000000 by task syz-executor.0/9746
trace_probe_cleanup+0x8d/0xd0
free_trace_kprobe.part.14+0x15/0x50
alloc_trace_kprobe+0x23e/0x250
Link: http://lkml.kernel.org/r/1565220563-980-1-git-send-email-danielliu861@gmail.com
Fixes: e3dc9f898ef9c ("tracing/probe: Add trace_event_call accesses APIs")
Signed-off-by: Xinpeng Liu <danielliu861@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Denis Efremov [Thu, 4 Jul 2019 17:21:10 +0000 (20:21 +0300)]
tracing: Make exported ftrace_set_clr_event non-static
The function ftrace_set_clr_event is declared static and marked
EXPORT_SYMBOL_GPL(), which is at best an odd combination. Because the
function was decided to be a part of API, this commit removes the static
attribute and adds the declaration to the header.
Link: http://lkml.kernel.org/r/20190704172110.27041-1-efremov@linux.com
Fixes: f45d1225adb04 ("tracing: Kernel access to Ftrace instances")
Reviewed-by: Joe Jin <joe.jin@oracle.com>
Signed-off-by: Denis Efremov <efremov@linux.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Linus Torvalds [Sat, 31 Aug 2019 01:56:08 +0000 (18:56 -0700)]
Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
"Fix a potential crash in the ccp driver"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: ccp - Ignore unconfigured CCP device on suspend/resume
Linus Torvalds [Sat, 31 Aug 2019 01:47:15 +0000 (18:47 -0700)]
Partially revert "kfifo: fix kfifo_alloc() and kfifo_init()"
Commit
dfe2a77fd243 ("kfifo: fix kfifo_alloc() and kfifo_init()") made
the kfifo code round the number of elements up. That was good for
__kfifo_alloc(), but it's actually wrong for __kfifo_init().
The difference? __kfifo_alloc() will allocate the rounded-up number of
elements, but __kfifo_init() uses an allocation done by the caller. We
can't just say "use more elements than the caller allocated", and have
to round down.
The good news? All the normal cases will be using power-of-two arrays
anyway, and most users of kfifo's don't use kfifo_init() at all, but one
of the helper macros to declare a KFIFO that enforce the proper
power-of-two behavior. But it looks like at least ibmvscsis might be
affected.
The bad news? Will Deacon refers to an old thread and points points out
that the memory ordering in kfifo's is questionable. See
https://lore.kernel.org/lkml/
20181211034032.32338-1-yuleixzhang@tencent.com/
for more.
Fixes: dfe2a77fd243 ("kfifo: fix kfifo_alloc() and kfifo_init()")
Reported-by: laokz <laokz@foxmail.com>
Cc: Stefani Seibold <stefani@seibold.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Greg KH <greg@kroah.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Shakeel Butt [Fri, 30 Aug 2019 23:04:53 +0000 (16:04 -0700)]
mm: memcontrol: fix percpu vmstats and vmevents flush
Instead of using raw_cpu_read() use per_cpu() to read the actual data of
the corresponding cpu otherwise we will be reading the data of the
current cpu for the number of online CPUs.
Link: http://lkml.kernel.org/r/20190829203110.129263-1-shakeelb@google.com
Fixes: bb65f89b7d3d ("mm: memcontrol: flush percpu vmevents before releasing memcg")
Fixes: c350a99ea2b1 ("mm: memcontrol: flush percpu vmstats before releasing memcg")
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Fri, 30 Aug 2019 23:04:50 +0000 (16:04 -0700)]
mm, memcg: do not set reclaim_state on soft limit reclaim
Adric Blake has noticed[1] the following warning:
WARNING: CPU: 7 PID: 175 at mm/vmscan.c:245 set_task_reclaim_state+0x1e/0x40
[...]
Call Trace:
mem_cgroup_shrink_node+0x9b/0x1d0
mem_cgroup_soft_limit_reclaim+0x10c/0x3a0
balance_pgdat+0x276/0x540
kswapd+0x200/0x3f0
? wait_woken+0x80/0x80
kthread+0xfd/0x130
? balance_pgdat+0x540/0x540
? kthread_park+0x80/0x80
ret_from_fork+0x35/0x40
---[ end trace
727343df67b2398a ]---
which tells us that soft limit reclaim is about to overwrite the
reclaim_state configured up in the call chain (kswapd in this case but
the direct reclaim is equally possible). This means that reclaim stats
would get misleading once the soft reclaim returns and another reclaim
is done.
Fix the warning by dropping set_task_reclaim_state from the soft reclaim
which is always called with reclaim_state set up.
[1] http://lkml.kernel.org/r/CAE1jjeePxYPvw1mw2B3v803xHVR_BNnz0hQUY_JDMN8ny29M6w@mail.gmail.com
Link: http://lkml.kernel.org/r/20190828071808.20410-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Adric Blake <promarbler14@gmail.com>
Acked-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hillf Danton <hdanton@sina.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dmitry Safonov [Fri, 30 Aug 2019 23:04:46 +0000 (16:04 -0700)]
mailmap: add aliases for Dmitry Safonov
I don't work for Virtuozzo or Samsung anymore and I've noticed that they
have started sending annoying html email-replies.
And I prioritize my personal emails over work email box, so while at it
add an entry for Arista too - so I can reply faster when needed.
Link: http://lkml.kernel.org/r/20190827220346.11123-1-dima@arista.com
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Gustavo A. R. Silva [Fri, 30 Aug 2019 23:04:43 +0000 (16:04 -0700)]
mm/z3fold.c: fix lock/unlock imbalance in z3fold_page_isolate
Fix lock/unlock imbalance by unlocking *zhdr* before return.
Addresses Coverity ID
1452811 ("Missing unlock")
Link: http://lkml.kernel.org/r/20190826030634.GA4379@embeddedor
Fixes: d776aaa9895e ("mm/z3fold.c: fix race between migration and destruction")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Henry Burns <henrywolfeburns@gmail.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roman Gushchin [Fri, 30 Aug 2019 23:04:39 +0000 (16:04 -0700)]
mm, memcg: partially revert "mm/memcontrol.c: keep local VM counters in sync with the hierarchical ones"
Commit
766a4c19d880 ("mm/memcontrol.c: keep local VM counters in sync
with the hierarchical ones") effectively decreased the precision of
per-memcg vmstats_local and per-memcg-per-node lruvec percpu counters.
That's good for displaying in memory.stat, but brings a serious
regression into the reclaim process.
One issue I've discovered and debugged is the following:
lruvec_lru_size() can return 0 instead of the actual number of pages in
the lru list, preventing the kernel to reclaim last remaining pages.
Result is yet another dying memory cgroups flooding. The opposite is
also happening: scanning an empty lru list is the waste of cpu time.
Also, inactive_list_is_low() can return incorrect values, preventing the
active lru from being scanned and freed. It can fail both because the
size of active and inactive lists are inaccurate, and because the number
of workingset refaults isn't precise. In other words, the result is
pretty random.
I'm not sure, if using the approximate number of slab pages in
count_shadow_number() is acceptable, but issues described above are
enough to partially revert the patch.
Let's keep per-memcg vmstat_local batched (they are only used for
displaying stats to the userspace), but keep lruvec stats precise. This
change fixes the dead memcg flooding on my setup.
Link: http://lkml.kernel.org/r/20190817004726.2530670-1-guro@fb.com
Fixes: 766a4c19d880 ("mm/memcontrol.c: keep local VM counters in sync with the hierarchical ones")
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Fri, 30 Aug 2019 23:04:35 +0000 (16:04 -0700)]
mm/zsmalloc.c: fix build when CONFIG_COMPACTION=n
Fixes: 701d678599d0c1 ("mm/zsmalloc.c: fix race condition in zs_destroy_pool")
Link: http://lkml.kernel.org/r/201908251039.5oSbEEUT%25lkp@intel.com
Reported-by: kbuild test robot <lkp@intel.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Henry Burns <henrywolfeburns@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Adams <jwadams@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roman Gushchin [Fri, 30 Aug 2019 23:04:32 +0000 (16:04 -0700)]
mm: memcontrol: flush percpu slab vmstats on kmem offlining
I've noticed that the "slab" value in memory.stat is sometimes 0, even
if some children memory cgroups have a non-zero "slab" value. The
following investigation showed that this is the result of the kmem_cache
reparenting in combination with the per-cpu batching of slab vmstats.
At the offlining some vmstat value may leave in the percpu cache, not
being propagated upwards by the cgroup hierarchy. It means that stats
on ancestor levels are lower than actual. Later when slab pages are
released, the precise number of pages is substracted on the parent
level, making the value negative. We don't show negative values, 0 is
printed instead.
To fix this issue, let's flush percpu slab memcg and lruvec stats on
memcg offlining. This guarantees that numbers on all ancestor levels
are accurate and match the actual number of outstanding slab pages.
Link: http://lkml.kernel.org/r/20190819202338.363363-3-guro@fb.com
Fixes: fb2f2b0adb98 ("mm: memcg/slab: reparent memcg kmem_caches on cgroup removal")
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Naveen N. Rao [Thu, 4 Jul 2019 14:34:42 +0000 (20:04 +0530)]
ftrace: Check for successful allocation of hash
In register_ftrace_function_probe(), we are not checking the return
value of alloc_and_copy_ftrace_hash(). The subsequent call to
ftrace_match_records() may end up dereferencing the same. Add a check to
ensure this doesn't happen.
Link: http://lkml.kernel.org/r/26e92574f25ad23e7cafa3cf5f7a819de1832cbe.1562249521.git.naveen.n.rao@linux.vnet.ibm.com
Cc: stable@vger.kernel.org
Fixes: 1ec3a81a0cf42 ("ftrace: Have each function probe use its own ftrace_ops")
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Steven Rostedt (VMware) [Fri, 30 Aug 2019 20:30:01 +0000 (16:30 -0400)]
ftrace: Check for empty hash and comment the race with registering probes
The race between adding a function probe and reading the probes that exist
is very subtle. It needs a comment. Also, the issue can also happen if the
probe has has the EMPTY_HASH as its func_hash.
Cc: stable@vger.kernel.org
Fixes: 7b60f3d876156 ("ftrace: Dynamically create the probe ftrace_ops for the trace_array")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Naveen N. Rao [Thu, 4 Jul 2019 14:34:41 +0000 (20:04 +0530)]
ftrace: Fix NULL pointer dereference in t_probe_next()
LTP testsuite on powerpc results in the below crash:
Unable to handle kernel paging request for data at address 0x00000000
Faulting instruction address: 0xc00000000029d800
Oops: Kernel access of bad area, sig: 11 [#1]
LE SMP NR_CPUS=2048 NUMA PowerNV
...
CPU: 68 PID: 96584 Comm: cat Kdump: loaded Tainted: G W
NIP:
c00000000029d800 LR:
c00000000029dac4 CTR:
c0000000001e6ad0
REGS:
c0002017fae8ba10 TRAP: 0300 Tainted: G W
MSR:
9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR:
28022422 XER:
20040000
CFAR:
c00000000029d90c DAR:
0000000000000000 DSISR:
40000000 IRQMASK: 0
...
NIP [
c00000000029d800] t_probe_next+0x60/0x180
LR [
c00000000029dac4] t_mod_start+0x1a4/0x1f0
Call Trace:
[
c0002017fae8bc90] [
c000000000cdbc40] _cond_resched+0x10/0xb0 (unreliable)
[
c0002017fae8bce0] [
c0000000002a15b0] t_start+0xf0/0x1c0
[
c0002017fae8bd30] [
c0000000004ec2b4] seq_read+0x184/0x640
[
c0002017fae8bdd0] [
c0000000004a57bc] sys_read+0x10c/0x300
[
c0002017fae8be30] [
c00000000000b388] system_call+0x5c/0x70
The test (ftrace_set_ftrace_filter.sh) is part of ftrace stress tests
and the crash happens when the test does 'cat
$TRACING_PATH/set_ftrace_filter'.
The address points to the second line below, in t_probe_next(), where
filter_hash is dereferenced:
hash = iter->probe->ops.func_hash->filter_hash;
size = 1 << hash->size_bits;
This happens due to a race with register_ftrace_function_probe(). A new
ftrace_func_probe is created and added into the func_probes list in
trace_array under ftrace_lock. However, before initializing the filter,
we drop ftrace_lock, and re-acquire it after acquiring regex_lock. If
another process is trying to read set_ftrace_filter, it will be able to
acquire ftrace_lock during this window and it will end up seeing a NULL
filter_hash.
Fix this by just checking for a NULL filter_hash in t_probe_next(). If
the filter_hash is NULL, then this probe is just being added and we can
simply return from here.
Link: http://lkml.kernel.org/r/05e021f757625cbbb006fad41380323dbe4e3b43.1562249521.git.naveen.n.rao@linux.vnet.ibm.com
Cc: stable@vger.kernel.org
Fixes: 7b60f3d876156 ("ftrace: Dynamically create the probe ftrace_ops for the trace_array")
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Linus Torvalds [Fri, 30 Aug 2019 18:58:02 +0000 (11:58 -0700)]
Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
"Three fixes for ARM this time around:
- A fix for update_sections_early() to cope with NULL ->mm pointers.
- A correction to the backtrace code to allow proper backtraces.
- Reinforcement of pfn_valid() with PFNs >= 4GiB"
* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 8901/1: add a criteria for pfn_valid of arm
ARM: 8897/1: check stmfd instruction using right shift
ARM: 8874/1: mm: only adjust sections of valid mm structures
Eric Biggers [Fri, 30 Aug 2019 15:52:26 +0000 (16:52 +0100)]
keys: ensure that ->match_free() is called in request_key_and_link()
If check_cached_key() returns a non-NULL value, we still need to call
key_type::match_free() to undo key_type::match_preparse().
Fixes: 7743c48e54ee ("keys: Cache result of request_key*() temporarily in task_struct")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 30 Aug 2019 17:53:12 +0000 (10:53 -0700)]
Merge tag 'armsoc-fixes' of git://git./linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
"The majority of the fixes this time are for OMAP hardware, here is a
breakdown of the significant changes:
Various device tree bug fixes:
- TI am57xx boards need a voltage level fix to avoid damaging SD
cards
- vf610-bk4 fails to detect its flash due to an incorrect description
- meson-g12a USB phy configuration fails
- meson-g12b reboot should not power off the SD card
- Some corrections for apparently harmless differences from the
documentation.
Regression fixes:
- ams-delta FIQ interrupts broke in 5.3
- TI am3/am4 mmc controllers broke in 5.2
The logic_pio driver (used on some Huawei ARM servers) got a few bug
fixes for reliability.
And a couple of compile-time warning fixes"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (26 commits)
soc: ixp4xx: Protect IXP4xx SoC drivers by ARCH_IXP4XX || COMPILE_TEST
soc: ti: pm33xx: Make two symbols static
soc: ti: pm33xx: Fix static checker warnings
ARM: OMAP: dma: Mark expected switch fall-throughs
ARM: dts: Fix incomplete dts data for am3 and am4 mmc
bus: ti-sysc: Simplify cleanup upon failures in sysc_probe()
ARM: OMAP1: ams-delta-fiq: Fix missing irq_ack
ARM: dts: dra74x: Fix iodelay configuration for mmc3
ARM: dts: am335x: Fix UARTs length
ARM: OMAP2+: Fix omap4 errata warning on other SoCs
bus: hisi_lpc: Add .remove method to avoid driver unbind crash
bus: hisi_lpc: Unregister logical PIO range to avoid potential use-after-free
lib: logic_pio: Add logic_pio_unregister_range()
lib: logic_pio: Avoid possible overlap for unregistering regions
lib: logic_pio: Fix RCU usage
arm64: dts: amlogic: odroid-n2: keep SD card regulator always on
arm64: dts: meson-g12a-sei510: enable IR controller
arm64: dts: meson-g12a: add missing dwc2 phy-names
ARM: dts: vf610-bk4: Fix qspi node description
ARM: dts: Fix incorrect dcan register mapping for am3, am4 and dra7
...
Linus Torvalds [Fri, 30 Aug 2019 16:23:45 +0000 (09:23 -0700)]
Merge tag 'for-linus' of git://git./linux/kernel/git/rdma/rdma
Pull rdma fix from Doug Ledford:
"Much calmer week this week. Just one patch queued up:
The way the siw driver was locking around the traversal of the list of
ipv6 addresses on a device was causing a scheduling while atomic
issue. Bernard straightened it out by using the rtnl_lock"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/siw: Fix IPv6 addr_list locking
Linus Torvalds [Fri, 30 Aug 2019 16:09:54 +0000 (09:09 -0700)]
Merge tag 'ceph-for-5.3-rc7' of git://github.com/ceph/ceph-client
Pull two ceph fixes from Ilya Dryomov:
"A fix for a -rc1 regression in rbd and a trivial static checker fix"
* tag 'ceph-for-5.3-rc7' of git://github.com/ceph/ceph-client:
rbd: restore zeroing past the overlap when reading from parent
libceph: don't call crypto_free_sync_skcipher() on a NULL tfm
Linus Torvalds [Fri, 30 Aug 2019 15:32:10 +0000 (08:32 -0700)]
Merge tag 'mmc-v5.3-rc5' of git://git./linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"MMC core:
- Fix init of SD cards reporting an invalid VDD range
MMC host:
- sprd: Fixes for clocks, card-detect, write-protect etc
- cadence: Fix ADMA 64-bit addressing
- tegra: Re-allow writing to SD card when GPIO pin is absent
- at91: Fix eMMC init by clearing HS200 cap as it's not supported"
* tag 'mmc-v5.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci-cadence: enable v4_mode to fix ADMA 64-bit addressing
mmc: sdhci-sprd: clear the UHS-I modes read from registers
mms: sdhci-sprd: add SDHCI_QUIRK_BROKEN_CARD_DETECTION
mmc: sdhci-sprd: add SDHCI_QUIRK2_PRESET_VALUE_BROKEN
mmc: sdhci-sprd: add get_ro hook function
mmc: sdhci-sprd: fixed incorrect clock divider
mmc: core: Fix init of SD cards reporting an invalid VDD range
mmc: sdhci-of-at91: add quirk for broken HS200
Revert "mmc: sdhci-tegra: drop ->get_ro() implementation"
Linus Torvalds [Fri, 30 Aug 2019 15:21:24 +0000 (08:21 -0700)]
Merge tag 'drm-fixes-2019-08-30' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Nothing too crazy, there's probably more patches than I'd like at this
stage, but they are all pretty self contained:
amdgpu:
- Fix GFXOFF regression for PCO and RV2
- Fix missing fence reference
- Fix VG20 power readings on certain SMU firmware versions
- Fix dpm level setup for VG20
- Add an ATPX laptop quirk
i915:
- Fix DP MST max BPC property creation after DRM register
- Fix unused ggtt deballooning and NULL dereference in guest
- Fix DSC eDP transcoder identification
- Fix WARN from DMA API debug by setting DMA max segment size
qxl:
- Make qxl reservel the vga ports using vgaargb to prevent switching to vga compatibility mode.
omap:
- Fix omap port lookup for SDI output
virtio:
- Use virtio_max_dma_size to fix an issue with swiotlb.
komeda:
- Compiler fixes to komeda.
- Add missing of_node_get() call in komeda.
- Reorder the komeda de-init functions"
* tag 'drm-fixes-2019-08-30' of git://anongit.freedesktop.org/drm/drm:
drm/komeda: Reordered the komeda's de-init functions
drm/amdgpu: fix GFXOFF on Picasso and Raven2
drm/amdgpu: Add APTX quirk for Dell Latitude 5495
drm/amd/powerplay: correct Vega20 dpm level related settings
drm/i915: Call dma_set_max_seg_size() in i915_driver_hw_probe()
drm/i915/dp: Fix DSC enable code to use cpu_transcoder instead of encoder->type
drm/i915: Don't deballoon unused ggtt drm_mm_node in linux guest
drm/i915: Do not create a new max_bpc prop for MST connectors
drm/powerplay: Fix Vega20 power reading again
drm/powerplay: Fix Vega20 Average Power value v4
drm/amdgpu: fix dma_fence_wait without reference
drm/komeda: Add missing of_node_get() call
drm/komeda: Clean warning 'komeda_component_add' might be a candidate for 'gnu_printf'
drm/komeda: Fix warning -Wunused-but-set-variable
drm/komeda: Fix error: not allocating enough data 1592 vs 1584
drm/virtio: use virtio_max_dma_size
drm/omap: Fix port lookup for SDI output
drm/qxl: get vga ioports
Hsin-Yi Wang [Thu, 22 Aug 2019 09:45:17 +0000 (17:45 +0800)]
i2c: mediatek: disable zero-length transfers for mt8183
Quoting from mt8183 datasheet, the number of transfers to be
transferred in one transaction should be set to bigger than 1,
so we should forbid zero-length transfer and update functionality.
Reported-by: Alexandru M Stan <amstan@chromium.org>
Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org>
Reviewed-by: Qii Wang <qii.wang@mediatek.com>
[wsa: shortened commit message a little]
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Lori Hikichi [Thu, 8 Aug 2019 03:37:52 +0000 (09:07 +0530)]
i2c: iproc: Stop advertising support of SMBUS quick cmd
The driver does not support the SMBUS Quick command so remove the
flag that indicates that level of support.
By default the i2c_detect tool uses the quick command to try and
detect devices at some bus addresses. If the quick command is used
then we will not detect the device, even though it is present.
Fixes: e6e5dd3566e0 (i2c: iproc: Add Broadcom iProc I2C Driver)
Signed-off-by: Lori Hikichi <lori.hikichi@broadcom.com>
Signed-off-by: Rayagonda Kokatanur <rayagonda.kokatanur@broadcom.com>
Reviewed-by: Ray Jui <ray.jui@broadcom.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Denis Efremov [Tue, 13 Aug 2019 06:09:13 +0000 (09:09 +0300)]
MAINTAINERS: i2c mv64xxx: Update documentation path
Update MAINTAINERS record to reflect the file move
from i2c-mv64xxx.txt to marvell,mv64xxx-i2c.yaml.
Fixes: f8bbde72ef44 ("dt-bindings: i2c: mv64xxx: Add YAML schemas")
Signed-off-by: Denis Efremov <efremov@linux.com>
Acked-by: Maxime Ripard <mripard@kernel.org>
Acked-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Kim Phillips [Mon, 26 Aug 2019 19:57:30 +0000 (14:57 -0500)]
perf/x86/amd/ibs: Fix sample bias for dispatched micro-ops
When counting dispatched micro-ops with cnt_ctl=1, in order to prevent
sample bias, IBS hardware preloads the least significant 7 bits of
current count (IbsOpCurCnt) with random values, such that, after the
interrupt is handled and counting resumes, the next sample taken
will be slightly perturbed.
The current count bitfield is in the IBS execution control h/w register,
alongside the maximum count field.
Currently, the IBS driver writes that register with the maximum count,
leaving zeroes to fill the current count field, thereby overwriting
the random bits the hardware preloaded for itself.
Fix the driver to actually retain and carry those random bits from the
read of the IBS control register, through to its write, instead of
overwriting the lower current count bits with zeroes.
Tested with:
perf record -c 100001 -e ibs_op/cnt_ctl=1/pp -a -C 0 taskset -c 0 <workload>
'perf annotate' output before:
15.70 65: addsd %xmm0,%xmm1
17.30 add $0x1,%rax
15.88 cmp %rdx,%rax
je 82
17.32 72: test $0x1,%al
jne 7c
7.52 movapd %xmm1,%xmm0
5.90 jmp 65
8.23 7c: sqrtsd %xmm1,%xmm0
12.15 jmp 65
'perf annotate' output after:
16.63 65: addsd %xmm0,%xmm1
16.82 add $0x1,%rax
16.81 cmp %rdx,%rax
je 82
16.69 72: test $0x1,%al
jne 7c
8.30 movapd %xmm1,%xmm0
8.13 jmp 65
8.24 7c: sqrtsd %xmm1,%xmm0
8.39 jmp 65
Tested on Family 15h and 17h machines.
Machines prior to family 10h Rev. C don't have the RDWROPCNT capability,
and have the IbsOpCurCnt bitfield reserved, so this patch shouldn't
affect their operation.
It is unknown why commit
db98c5faf8cb ("perf/x86: Implement 64-bit
counter support for IBS") ignored the lower 4 bits of the IbsOpCurCnt
field; the number of preloaded random bits has always been 7, AFAICT.
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: "Arnaldo Carvalho de Melo" <acme@kernel.org>
Cc: <x86@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Borislav Petkov" <bp@alien8.de>
Cc: Stephane Eranian <eranian@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: "Namhyung Kim" <namhyung@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lkml.kernel.org/r/20190826195730.30614-1-kim.phillips@amd.com