Arnaldo Carvalho de Melo [Thu, 6 Jun 2019 20:03:18 +0000 (17:03 -0300)]
perf data: Fix perf.data documentation for HEADER_CPU_TOPOLOGY
The 'die' info isn't in the same array as core and socket ids, and we
missed the 'dies' string list, that comes right after the 'core' +
'socket' id variable length array, followed by the VLA for the dies.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: c9cb12c5ba08 ("perf header: Add die information in CPU topology")
Link: https://lkml.kernel.org/n/tip-nubi6mxp2n8ofvlx7ph6k3h6@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Kan Liang [Tue, 4 Jun 2019 22:50:44 +0000 (15:50 -0700)]
perf tools: Apply new CPU topology sysfs attributes
The existing "thread_siblings" and "thread_siblings_list" attribute will
be deprecated.
Use the new CPU topology sysfs attributes, "core_cpus" and
"core_cpus_list", which are synonymous with the deprecated attributes.
Check the new name first. If not available, use the deprecated name to
be compatible with old kernel.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1559688644-106558-5-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Kan Liang [Tue, 4 Jun 2019 22:50:43 +0000 (15:50 -0700)]
perf header: Rename "sibling cores" to "sibling sockets"
The "sibling cores" actually shows the sibling CPUs of a socket. The
name "sibling cores" is very misleading.
Rename "sibling cores" to "sibling sockets"
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1559688644-106558-4-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Kan Liang [Tue, 4 Jun 2019 22:50:42 +0000 (15:50 -0700)]
perf stat: Support per-die aggregation
It is useful to aggregate counts per die. E.g. Uncore becomes die-scope
on Xeon Cascade Lake-AP.
Introduce a new option "--per-die" to support per-die aggregation.
The global id for each core has been changed to socket + die id + core
id. The global id for each die is socket + die id.
Add die information for per-core aggregation. The output of per-core
aggregation will be changed from "S0-C0" to "S0-D0-C0". Any scripts
which rely on the output format of per-core aggregation probably be
broken.
For 'perf stat record/report', there is no die information when
processing the old perf.data. The per-die result will be the same as
per-socket.
Committer notes:
Renamed 'die' variable to 'die_id' to fix the build in some systems:
CC /tmp/build/perf/builtin-script.o
cc1: warnings being treated as errors
builtin-stat.c: In function 'perf_env__get_die':
builtin-stat.c:963: error: declaration of 'die' shadows a global declaration
util/util.h:19: error: shadowed declaration is here
mv: cannot stat `/tmp/build/perf/.builtin-stat.o.tmp': No such file or directory
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/n/tip-bsnhx7vgsuu6ei307mw60mbj@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Kan Liang [Tue, 4 Jun 2019 22:50:41 +0000 (15:50 -0700)]
perf header: Add die information in CPU topology
With the new CPUID.1F, a new level type of CPU topology, 'die', is
introduced. The 'die' information in CPU topology should be added in
perf header.
To be compatible with old perf.data, the patch checks the section size
before reading the die information. The new info is added at the end of
the cpu_topology section, the old perf tool ignores the extra data. It
never reads data crossing the section boundary.
The new perf tool with the patch can be used on legacy kernel. Add a new
function has_die_topology() to check if die topology information is
supported by kernel. The function only check X86 and CPU 0. Assuming
other CPUs have same topology.
Use similar method for core and socket to support die id and sibling
dies string.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1559688644-106558-2-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Kan Liang [Tue, 4 Jun 2019 22:50:40 +0000 (15:50 -0700)]
perf cpumap: Retrieve die id information
There is no function to retrieve die id information of a given CPU.
Add cpu_map__get_die_id() to retrieve die id information.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1559688644-106558-1-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Mathieu Poirier [Fri, 24 May 2019 17:35:08 +0000 (11:35 -0600)]
perf cs-etm: Add support for CPU-wide trace scenarios
Add support for CPU-wide trace scenarios by correlating range packets
with timestamp packets. That way range packets received on different
ETMQ/traceID channels can be processed and synthesized in chronological
order.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Tested-by: Leo Yan <leo.yan@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/20190524173508.29044-18-mathieu.poirier@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Mathieu Poirier [Fri, 24 May 2019 17:35:07 +0000 (11:35 -0600)]
perf cs-etm: Add notion of time to decoding code
This patch deals with timestamp packets received from the decoding
library in order to give the front end packet processing loop a handle
on the time instruction conveyed by range packets have been executed at.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Tested-by: Leo Yan <leo.yan@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/20190524173508.29044-17-mathieu.poirier@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Mathieu Poirier [Fri, 24 May 2019 17:35:06 +0000 (11:35 -0600)]
perf cs-etm: Linking PE contextID with perf thread mechanic
Link contextID packets received from the decoder with the perf tool
thread mechanic so that we know the specifics of the process currently
executing.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Tested-by: Leo Yan <leo.yan@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/20190524173508.29044-16-mathieu.poirier@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Mathieu Poirier [Fri, 24 May 2019 17:35:05 +0000 (11:35 -0600)]
perf cs-etm: Add support for multiple traceID queues
When operating in CPU-wide trace mode with a source/sink topology of N:1
packets with multiple traceID will end up in the same cs_etm_queue. In
order to properly decode packets they need to be split in different
queues, i.e one queue per traceID.
As such add support for multiple traceID per cs_etm_queue by adding a
new cs_etm_traceid_queue every time a new traceID is discovered in the
trace stream.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Tested-by: Leo Yan <leo.yan@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/20190524173508.29044-15-mathieu.poirier@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Mathieu Poirier [Fri, 24 May 2019 17:35:04 +0000 (11:35 -0600)]
perf cs-etm: Use traceID aware memory callback API
When working with CPU-wide traces different traceID may be found in the
same stream. As such we need to use the decoder callback that provides
the traceID in order to know the thread context being decoded.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Tested-by: Leo Yan <leo.yan@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/20190524173508.29044-14-mathieu.poirier@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Mathieu Poirier [Fri, 24 May 2019 17:35:03 +0000 (11:35 -0600)]
perf cs-etm: Move tid/pid to traceid_queue
The tid/pid fields of structure cs_etm_queue are CPU dependent and as
such need to be part of the cs_etm_traceid_queue in order to support
CPU-wide trace scenarios.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Tested-by: Leo Yan <leo.yan@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/20190524173508.29044-13-mathieu.poirier@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Mathieu Poirier [Fri, 24 May 2019 17:35:02 +0000 (11:35 -0600)]
perf cs-etm: Move thread to traceid_queue
The thread field of structure cs_etm_queue is CPU dependent and as such
need to be part of the cs_etm_traceid_queue in order to support CPU-wide
trace scenarios.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Tested-by: Leo Yan <leo.yan@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/20190524173508.29044-12-mathieu.poirier@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Mathieu Poirier [Fri, 24 May 2019 17:35:01 +0000 (11:35 -0600)]
perf cs-etm: Get rid of unused cpu in struct cs_etm_queue
Nowadays the synthesize code is using the packet's cpu information,
making cs_etm_queue::cpu useless. As such simply remove it.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Tested-by: Leo Yan <leo.yan@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/20190524173508.29044-11-mathieu.poirier@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Mathieu Poirier [Fri, 24 May 2019 17:35:00 +0000 (11:35 -0600)]
perf cs-etm: Introduce the concept of trace ID queues
In an ideal world there is one CPU per cs_etm_queue and as such, one
trace ID per cs_etm_queue. In the real world CoreSight topologies allow
multiple CPUs to use the same sink, which translates to multiple trace
IDs per cs_etm_queue.
To deal with this a new cs_etm_traceid_queue structure is introduced to
enclose all the information related to a single trace ID, allowing a
cs_etm_queue to handle traces generated by any number of CPUs.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Tested-by: Leo Yan <leo.yan@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/20190524173508.29044-10-mathieu.poirier@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Mathieu Poirier [Fri, 24 May 2019 17:34:59 +0000 (11:34 -0600)]
perf cs-etm: Fix indentation in function cs_etm__process_decoder_queue()
Fixing wrong indentation of the while() loop - no change of
functionality.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Tested-by: Leo Yan <leo.yan@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Fixes: 3fa0e83e2948 ("perf cs-etm: Modularize main packet processing loop")
Link: http://lkml.kernel.org/r/20190524173508.29044-9-mathieu.poirier@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Mathieu Poirier [Fri, 24 May 2019 17:34:58 +0000 (11:34 -0600)]
perf cs-etm: Move packet queue out of decoder structure
The decoder needs to work with more than one traceID queue if we want to
support CPU-wide scenarios with N:1 source/sink topologies. As such
move the packet buffer and related fields out of the decoder structure
and into the cs_etm_queue structure.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Tested-by: Leo Yan <leo.yan@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/20190524173508.29044-8-mathieu.poirier@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Mathieu Poirier [Fri, 24 May 2019 17:34:57 +0000 (11:34 -0600)]
perf cs-etm: Refactor error path in cs_etm_decoder__new()
There is no point in having two different error goto statement since the
openCSD API to free a decoder handles NULL pointers. As such function
cs_etm_decoder__free() can be called to deal with all aspect of freeing
decoder memory.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Tested-by: Leo Yan <leo.yan@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/20190524173508.29044-7-mathieu.poirier@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Mathieu Poirier [Fri, 24 May 2019 17:34:56 +0000 (11:34 -0600)]
perf cs-etm: Add handling of switch-CPU-wide events
Add handling of SWITCH-CPU-WIDE events in order to add the tid/pid of
the incoming process to the perf tools machine infrastructure. This
information is later retrieved when a contextID packet is found in the
trace stream.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Tested-by: Leo Yan <leo.yan@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/20190524173508.29044-6-mathieu.poirier@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Mathieu Poirier [Fri, 24 May 2019 17:34:55 +0000 (11:34 -0600)]
perf cs-etm: Add handling of itrace start events
Add handling of ITRACE events in order to add the tid/pid of the
executing process to the perf tools machine infrastructure. This
information is later retrieved when a contextID packet is found in the
trace stream.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Tested-by: Leo Yan <leo.yan@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/20190524173508.29044-5-mathieu.poirier@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Mathieu Poirier [Fri, 24 May 2019 17:34:54 +0000 (11:34 -0600)]
perf cs-etm: Configure SWITCH_EVENTS in CPU-wide mode
Ask the perf core to generate an event when processes are swapped in/out
of context. That way proper action can be taken by the decoding code
when faced with such event.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Tested-by: Leo Yan <leo.yan@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/20190524173508.29044-4-mathieu.poirier@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Mathieu Poirier [Fri, 24 May 2019 17:34:53 +0000 (11:34 -0600)]
perf cs-etm: Configure timestamp generation in CPU-wide mode
When operating in CPU-wide mode tracers need to generate timestamps in
order to correlate the code being traced on one CPU with what is executed
on other CPUs.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Tested-by: Leo Yan <leo.yan@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/20190524173508.29044-3-mathieu.poirier@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Mathieu Poirier [Fri, 24 May 2019 17:34:52 +0000 (11:34 -0600)]
perf cs-etm: Configure contextID tracing in CPU-wide mode
When operating in CPU-wide mode being notified of contextID changes is
required so that the decoding mechanic is aware of the process context
switch.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Reviewed-by: Suzuki Poulouse <suzuki.poulose@arm.com>
Tested-by: Leo Yan <leo.yan@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/20190524173508.29044-2-mathieu.poirier@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Fri, 17 May 2019 11:33:47 +0000 (13:33 +0200)]
perf evsel: Remove superfluous nthreads system_wide setup in alloc_fd()
It's already setup in the only caller of this method in
perf_evsel__open(), right before calling perf_evsel__alloc_fd(), no need
to do it again.
Also it's better to have it out of the function before we move it to
libperf.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-1k8lhyjxfk7o8v4g3r7eyjc9@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
yuzhoujian [Thu, 30 May 2019 13:29:22 +0000 (14:29 +0100)]
perf record: Add support to collect callchains from kernel or user space only
One can just record callchains in the kernel or user space with this new
options.
We can use it together with "--all-kernel" options.
This two options is used just like print_stack(sys) or print_ustack(usr)
for systemtap.
Shown below is the usage of this new option combined with "--all-kernel"
options:
1. Configure all used events to run in kernel space and just collect
kernel callchains.
$ perf record -a -g --all-kernel --kernel-callchains
2. Configure all used events to run in kernel space and just collect
user callchains.
$ perf record -a -g --all-kernel --user-callchains
Committer notes:
Improved documentation to state that asking for kernel callchains really
is asking for excluding user callchains, and vice versa.
Further mentioned that using both won't get both, but nothing, as both
will be excluded.
Signed-off-by: yuzhoujian <yuzhoujian@didichuxing.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1559222962-22891-1-git-send-email-ufo19890607@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Thu, 6 Jun 2019 13:56:55 +0000 (10:56 -0300)]
perf config: Bail out when a handler returns failure for a key-value pair
So perf_config() uses:
int ret = 0;
perf_config_set__for_each_entry(config_set, section, item) {
...
ret = fn();
if (ret < 0)
break;
}
return ret;
Expecting that that break will imediatelly go to function exit to return
that error value (ret).
The problem is that perf_config_set__for_each_entry() expands into two
nested for() loops, one traversing the sections in a config and the
second the items in each of those sections, so we have to change that
'break' to a goto label right before that final 'return ret'.
With that, for instance 'perf trace' now correctly bails out when a
event that is requested to be added via its 'trace.add_events'
~/.perfconfig entry gets rejected by the kernel BPF verifier:
# perf trace ls
event syntax error: '/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o'
\___ Kernel verifier blocks program loading
(add -v to see detail)
Run 'perf list' for a list of valid events
Error: wrong config key-value pair trace.add_events=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
#
While before it would continue and explode later, when trying to find
maps that would have been in place had that augmented_raw_syscalls.o
precompiled BPF proggie been accepted by the, humm, bast... rigorous
kernel BPF verifier 8-)
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Taeung Song <treeze.taeung@gmail.com>
Cc: Yonghong Song <yhs@fb.com>
Fixes: 8a0a9c7e9146 ("perf config: Introduce new init() and exit()")
Link: https://lkml.kernel.org/n/tip-qvqxfk9d0rn1l7lcntwiezrr@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Leo Yan [Thu, 6 Jun 2019 13:38:59 +0000 (10:38 -0300)]
perf trace: Exit when failing to build eBPF program
On my Juno board with ARM64 CPUs, perf trace command reports the eBPF
program building failure but the command will not exit and continue to
run. If we define an eBPF event in config file, the event will be
parsed with below flow:
perf_config()
`> trace__config()
`> parse_events_option()
`> parse_events__scanner()
`-> parse_events_parse()
`> parse_events_load_bpf()
`> llvm__compile_bpf()
Though the low level functions return back error values when detect eBPF
building failure, but parse_events_option() returns 1 for this case and
trace__config() passes 1 to perf_config(); perf_config() doesn't treat
the returned value 1 as failure and it continues to parse other
configurations. Thus the perf command continues to run even without
enabling eBPF event successfully.
This patch changes error handling in trace__config(), when it detects
failure it will return -1 rather than directly pass error value (1);
finally, perf_config() will directly bail out and perf will exit for
this case.
Committer notes:
Simplified the patch to just check directly the return of
parse_events_option() and it it is non-zero, change err from its initial
zero value to -1.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Yonghong Song <yhs@fb.com>
Fixes: ac96287cae08 ("perf trace: Allow specifying a set of events to add in perfconfig")
Link: https://lkml.kernel.org/n/tip-x4i63f5kscykfok0hqim3zma@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Wed, 5 Jun 2019 13:53:06 +0000 (10:53 -0300)]
perf trace: Associate more argument names with the filename beautifier
For instance, the rename* family uses "oldname", "newname", so check if
"name" is at the end and treat it as a filename.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-wjy7j4bk06g7atzwoz1mid24@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Wed, 5 Jun 2019 13:34:47 +0000 (10:34 -0300)]
perf trace: Consume the augmented_raw_syscalls payload
To support the SCA_FILENAME beautifier in more than one syscall arg, as
needed for syscalls such as the rename* family, we need to, after
processing one such arg, bump the augmented pointers so that the next
augmented arg don't reuse data for the previous augmented arguments.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-4e4cmzyjxb3wkonfo1x9a27y@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Fri, 31 May 2019 13:13:21 +0000 (15:13 +0200)]
perf jvmti: Address gcc string overflow warning for strncpy()
We are getting false positive gcc warning when we compile with gcc9 (9.1.1):
CC jvmti/libjvmti.o
In file included from /usr/include/string.h:494,
from jvmti/libjvmti.c:5:
In function ‘strncpy’,
inlined from ‘copy_class_filename.constprop’ at jvmti/libjvmti.c:166:3:
/usr/include/bits/string_fortified.h:106:10: error: ‘__builtin_strncpy’ specified bound depends on the length of the source argument [-Werror=stringop-overflow=]
106 | return __builtin___strncpy_chk (__dest, __src, __len, __bos (__dest));
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
jvmti/libjvmti.c: In function ‘copy_class_filename.constprop’:
jvmti/libjvmti.c:165:26: note: length computed here
165 | size_t file_name_len = strlen(file_name);
| ^~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
As per Arnaldo's suggestion use strlcpy(), which does the same thing and keeps
gcc silent.
Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ben Gainey <ben.gainey@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20190531131321.GB1281@krava
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Tue, 4 Jun 2019 19:04:34 +0000 (16:04 -0300)]
perf augmented_raw_syscalls: Move reading filename to the loop
Almost there, next step is to copy more than one filename payload.
Probably to read syscall arg structs, etc we'll need just a variation of
this that will decide what to use, if probe_read_str() or plain
probe_read for structs, i.e. fixed size.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-uf6u0pld6xe4xuo16f04owlz@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Tue, 4 Jun 2019 18:57:28 +0000 (15:57 -0300)]
perf augmented_raw_syscalls: Change helper to consider just the augmented_filename part
So that we can use it for multiple args, baby steps not to step into the
verifier toes.
In the process make sure we handle -EFAULT from bpf_prog_read_str(), as
this really is needed now that we'll handle more than one augmented
argument, i.e. if there is failure, then we have the argument that fails
have:
(size = 0, err = -EFAULT, value = [] )
followed by the next, lets say that worked for a second pathname:
(size = 4, err = 0, value = "/tmp" )
So we can skip the first while telling the user about the problem and
then process the second.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-deyvqi39um6gp6hux6jovos8@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Tue, 4 Jun 2019 18:31:28 +0000 (15:31 -0300)]
perf augmented_raw_syscalls: Move the probe_read_str to a separate function
One more step into copying multiple filenames to support syscalls like
rename*.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-xdqtjexdyp81oomm1rkzeifl@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Mon, 3 Jun 2019 19:52:46 +0000 (16:52 -0300)]
perf augmented_raw_syscalls: Tell which args are filenames and how many bytes to copy
Since we know what args are strings from reading the syscall
descriptions in tracefs and also already mark such args to be beautified
using the syscall_arg__scnprintf_filename() helper, all we need is to
fill in this info in the 'syscalls' BPF map we were using to state which
syscalls the user is interested in, i.e. the syscall filter.
Right now just set that with PATH_MAX and unroll the syscall arg in the
BPF program, as the verifier isn't liking something clang generates when
unrolling the loop.
This also makes the augmented_raw_syscalls.c program support all arches,
since we removed that set of defines with the hard coded syscall
numbers, all should be automatically set for all arches, with the
syscall id mapping done correcly.
Doing baby steps here, i.e. just the first string arg for a syscall is
printed, syscalls with more than one, say, the various rename* syscalls,
need further work, but lets get first something that the BPF verifier
accepts before increasing the complexity
To test it, something like:
# perf trace -e string -e /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c
With:
# cat ~/.perfconfig
[llvm]
dump-obj = true
clang-opt = -g
[trace]
#add_events = /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c
show_zeros = yes
show_duration = no
no_inherit = yes
show_timestamp = no
show_arg_names = no
args_alignment = 40
show_prefix = yes
#
That commented add_events line is needed for developing this
augmented_raw_syscalls.c BPF program, as if we add it via the
'add_events' mechanism so as to shorten the 'perf trace' command lines,
then we end up not setting up the -v option which precludes us having
access to the bpf verifier log :-\
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Andrii Nakryiko <andriin@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Link: https://lkml.kernel.org/n/tip-dn863ya0cbsqycxuy0olvbt1@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Mon, 20 May 2019 11:37:28 +0000 (14:37 +0300)]
perf scripts python: exported-sql-viewer.py: Select find text when find bar is activated
The user probably wants to replace the find text, so select the find
text when the find bar is activated.
That is fairly standard behaviour for search text entry.
Entering text will replace the current text, but using edit keys
(arrows, home, end etc) cancels the selection and enables editing.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20190520113728.14389-23-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Mon, 20 May 2019 11:37:27 +0000 (14:37 +0300)]
perf scripts python: exported-sql-viewer.py: Add IPC information to Call Tree
Enhance the call tree to display IPC information if it is available.
Committer testing:
[acme@quaco adrian.hunter]$ python ~acme/libexec/perf-core/scripts/python/exported-sql-viewer.py ~/c/adrian.hunter/simple-retpoline.db
Reports -> Call Tree, then expand a few trees, then select with the
mouse and press control+C (copy):
Call Path Object Call Time Time Time(%) Insn Insn Cyc Cyc IPC Branch Branch
▼ simple-retpolin (ns) Cnt Cnt(%) Cnt Cnt(%) Count Count(%)
▼ 23003:23003
▼ _start ld-2.28.so
112195670 218295 100.0 127746 100.0 207320 100.0 0.62 13046 100.0
▶ unknown unknown
112195987 3202 1.5 0 0.0 0 0.0 0 1 0.0
▶ _dl_start ld-2.28.so
112199189 188471 86.3 123394 96.6 180007 86.8 0.69 12529 96.0
▼ _dl_init ld-2.28.so
112387660 13406 6.1 3207 2.5 14868 7.2 0.22 327 2.5
▶ call_init.part.0 ld-2.28.so
112387773 117 0.9 70 2.2 639 4.3 0.11 3 0.9
▶ call_init.part.0 ld-2.28.so
112387890 13129 97.9 3103 96.8 14100 94.8 0.22 315 96.3
▶ call_init.part.0 ld-2.28.so
112401020 0 0.0 0 0.0 0 0.0 0 2 0.6
▼ _start simple-retpol
112401066 12899 5.9 1142 0.9 11561 5.6 0.10 184 1.4
▶ unknown unknown
112401388 846 6.6 0 0.0 0 0.0 0 1 0.5
▼ __libc_start_main libc-2.28.so
112402344 11621 90.1 1129 98.9 10350 89.5 0.11 181 98.4
▶ __cxa_atexit libc-2.28.so
112402360 2302 19.8 101 8.9 1817 17.6 0.06 13 7.2
▶ __libc_csu_init simple-retpol
112404673 121 1.0 43 3.8 340 3.3 0.13 8 4.4
▶ _setjmp libc-2.28.so
112404794 74 0.6 46 4.1 206 2.0 0.22 4 2.2
▼ main simple-retpol
112404892 44 0.4 23 2.0 126 1.2 0.18 12 6.6
▼ foo simple-retpol
112404892 19 43.2 12 52.2 55 43.7 0.22 5 41.7
bar simple-retpol
112404896 12 63.2 3 25.0 34 61.8 0.09 1 20.0
▼ foo simple-retpol
112404911 25 56.8 11 47.8 71 56.3 0.15 5 41.7
▶ bar simple-retpol
112404924 10 40.0 3 27.3 27 38.0 0.11 1 20.0
▶ exit libc-2.28.so
112404936 9029 77.7 878 77.8 7765 75.0 0.11 139 76.8
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20190520113728.14389-22-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Mon, 20 May 2019 11:37:26 +0000 (14:37 +0300)]
perf scripts python: exported-sql-viewer.py: Add IPC information to Call Graph Graph
Enhance the call graph to display IPC information if it is available.
Committer testing:
[acme@quaco adrian.hunter]$ python ~acme/libexec/perf-core/scripts/python/exported-sql-viewer.py ~/c/adrian.hunter/simple-retpoline.db
Reports -> Context Sensitive Callgraph, then expand a few trees, then
select with the mouse and press control+C:
Call Path Object Count Time(ns) Time(%) Insn Insn Cyc Cyc IPC Branch Branch
▼ simple-retpolin Cnt Cnt(%) Cnt Cnt(%) Cnt Cnt(%)
▼ 23003:23003
▼ _start ld-2.28.so 1 218295 100.0 127746 100.0 207320 100.0 0.62 13046 100.0
▶ unknown unknown 1 3202 1.5 0 0.0 0 0.0 0 1 0.0
▶ _dl_start ld-2.28.so 1 188471 86.3 123394 96.6 180007 86.8 0.69 12529 96.0
▶ _dl_init ld-2.28.so 1 13406 6.1 3207 2.5 14868 7.2 0.22 327 2.5
▼ _start simple-retpoline 1 12899 5.9 1142 0.9 11561 5.6 0.10 184 1.4
▶ unknown unknown 1 846 6.6 0 0.0 0 0.0 0 1 0.5
▼ __libc_start_main libc-2.28.so 1 11621 90.1 1129 98.9 10350 89.5 0.11 181 98.4
▶ __cxa_atexit libc-2.28.so 1 2302 19.8 101 8.9 1817 17.6 0.06 13 7.2
▶ __libc_csu_init simple-retpoline 1 121 1.0 43 3.8 340 3.3 0.13 8 4.4
▼ _setjmp libc-2.28.so 1 74 0.6 46 4.1 206 2.0 0.22 4 2.2
▼ __sigsetjmp libc-2.28.so 1 74 100.0 46 100.0 206 100.0 0.22 3 75.0
▶ __sigjmp_save libc-2.28.so 1 0 0.0 0 0.0 0 0.0 0 1 33.3
▼ main simple-retpoline 1 44 0.4 23 2.0 126 1.2 0.18 12 6.6
▼ foo simple-retpoline 2 44 100.0 23 100.0 126 100.0 0.18 10 83.3
bar simple-retpoline 2 22 50.0 6 26.1 61 48.4 0.10 2 20.0
▶ exit libc-2.28.so 1 9029 77.7 878 77.8 7765 75.0 0.11 139 76.8
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20190520113728.14389-21-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Mon, 20 May 2019 11:37:25 +0000 (14:37 +0300)]
perf scripts python: exported-sql-viewer.py: Add CallGraphModelParams
Add a parameter to call graph and call tree, to determine whether IPC
information is available.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20190520113728.14389-20-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Mon, 20 May 2019 11:37:24 +0000 (14:37 +0300)]
perf scripts python: exported-sql-viewer.py: Add IPC information to the Branch reports
Enhance the "All branches" and "Selected branches" reports to display IPC
information if it is available.
Committer testing:
So, testing this I noticed that it all starts with the left arrow in every
line, that should mean there is some tree there, i.e. look at all those ▶
symbols:
Reports -> All Branches:
Time CPU Command PID TID Branch Type In Tx Insn Cnt Cyc Cnt IPC Branch
▶
187836112195670 7 simple-retpolin 23003 23003 trace begin No 0 0 0 0 unknown (unknown) ->
7f6f33d4f110
+_start (ld-2.28.so)
▶
187836112195987 7 simple-retpolin 23003 23003 trace end No 0 883 0
7f6f33d4f110 _start (ld-2.28.so) -> 0 unknown
+(unknown)
▶
187836112199189 7 simple-retpolin 23003 23003 trace begin No 0 0 0 0 unknown (unknown) ->
7f6f33d4f110
+_start (ld-2.28.so)
▶
187836112199189 7 simple-retpolin 23003 23003 call No 0 0 0
7f6f33d4f113 _start+0x3 (ld-2.28.so) ->
7f6f33d4ff50
+_dl_start (ld-2.28.so)
▶
187836112199544 7 simple-retpolin 23003 23003 trace end No 17 996 0.02
7f6f33d4ff73 _dl_start+0x23 (ld-2.28.so) -> 0
+unknown (unknown)
▶
187836112200939 7 simple-retpolin 23003 23003 trace begin No 0 0 0 0 unknown (unknown) ->
7f6f33d4ff73
+_dl_start+0x23 (ld-2.28.so)
▶
187836112201229 7 simple-retpolin 23003 23003 trace end No 1 816 0.00
7f6f33d4ff7a _dl_start+0x2a (ld-2.28.so) -> 0
+unknown (unknown)
▶
187836112203500 7 simple-retpolin 23003 23003 trace begin No 0 0 0 0 unknown (unknown) ->
7f6f33d4ff7a
+_dl_start+0x2a (ld-2.28.so)
But if you click on it, that ▶ disappears and a new click doesn't make
it reappear, looks buggy, minor oddity, reported to Adrian.
Reports -> Selected Branches, then ask for branches in the ld-2.28.so
DSO:
Time CPU Command PID TID Branch Type In Tx Insn Cnt Cyc Cnt IPC Branch
▶
187836112195987 7 simple-retpolin 23003 23003 trace end No 0 883 0
7f6f33d4f110 _start (ld-2.28.so) -> 0 unknown (unknown)
▶
187836112199189 7 simple-retpolin 23003 23003 trace begin No 0 0 0 0 unknown (unknown) ->
7f6f33d4f110 _start (ld-2.28.so)
▶
187836112199189 7 simple-retpolin 23003 23003 call No 0 0 0
7f6f33d4f113 _start+0x3 (ld-2.28.so) ->
7f6f33d4ff50 _dl_start (ld-2.28.so)
▶
187836112199544 7 simple-retpolin 23003 23003 trace end No 17 996 0.02
7f6f33d4ff73 _dl_start+0x23 (ld-2.28.so) -> 0 unknown (unknown)
▶
187836112200939 7 simple-retpolin 23003 23003 trace begin No 0 0 0 0 unknown (unknown) ->
7f6f33d4ff73 _dl_start+0x23 (ld-2.28.so)
▶
187836112201229 7 simple-retpolin 23003 23003 trace end No 1 816 0.00
7f6f33d4ff7a _dl_start+0x2a (ld-2.28.so) -> 0 unknown (unknown)
▶
187836112203500 7 simple-retpolin 23003 23003 trace begin No 0 0 0 0 unknown (unknown) ->
7f6f33d4ff7a _dl_start+0x2a (ld-2.28.so)
▶
187836112203528 7 simple-retpolin 23003 23003 unconditional jump No 0 0 0
7f6f33d4ffe7 _dl_start+0x97 (ld-2.28.so) ->
7f6f33d5000b _dl_start+0xbb (ld-2.28.so)
▶
187836112203528 7 simple-retpolin 23003 23003 conditional jump No 0 0 0
7f6f33d5000f _dl_start+0xbf (ld-2.28.so) ->
7f6f33d4fffb _dl_start+0xab (ld-2.28.so)
▶
187836112203528 7 simple-retpolin 23003 23003 conditional jump No 0 0 0
7f6f33d5000f _dl_start+0xbf (ld-2.28.so) ->
7f6f33d4fffb _dl_start+0xab (ld-2.28.so)
▶
187836112203539 7 simple-retpolin 23003 23003 conditional jump No 0 0 0
7f6f33d50025 _dl_start+0xd5 (ld-2.28.so) ->
7f6f33d50210 _dl_start+0x2c0 (ld-2.28.so)
▶
187836112203539 7 simple-retpolin 23003 23003 conditional jump No 0 0 0
7f6f33d5021a _dl_start+0x2ca (ld-2.28.so) ->
7f6f33d50360 _dl_start+0x410 (ld-2.28.so)
▶
187836112203539 7 simple-retpolin 23003 23003 unconditional jump No 0 0 0
7f6f33d50377 _dl_start+0x427 (ld-2.28.so) ->
7f6f33d4ffff _dl_start+0xaf (ld-2.28.so)
▶
187836112203539 7 simple-retpolin 23003 23003 conditional jump No 0 0 0
7f6f33d5000f _dl_start+0xbf (ld-2.28.so) ->
7f6f33d4fffb _dl_start+0xab (ld-2.28.so)
▶
187836112203562 7 simple-retpolin 23003 23003 conditional jump No 0 0 0
7f6f33d5000f _dl_start+0xbf (ld-2.28.so) ->
7f6f33d4fffb _dl_start+0xab (ld-2.28.so)
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20190520113728.14389-19-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Mon, 20 May 2019 11:37:23 +0000 (14:37 +0300)]
perf scripts python: export-to-postgresql.py: Export IPC information
Export cycle and instruction counts on samples and calls tables.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20190520113728.14389-18-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Mon, 20 May 2019 11:37:22 +0000 (14:37 +0300)]
perf scripts python: export-to-sqlite.py: Export IPC information
Export cycle and instruction counts on samples and calls tables.
Committer testing:
First runs some workload collecting intel_pt with the 'cyc' ter just for
userspace:
[root@quaco adrian.hunter]# perf record -o simple-retpoline.perf.data -e intel_pt/cyc/u ./simple-retpoline
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.035 MB simple-retpoline.perf.data ]
[root@quaco adrian.hunter]#
Then use the export-to-sqlite.py script to see if the changes in this
cset don't make it to break and if the changes in the db schema are the
ones expected:
[root@quaco adrian.hunter]# perf script -i simple-retpoline.perf.data --itrace=be -s ~acme/libexec/perf-core/scripts/python/export-to-sqlite.py simple-retpoline.db branches calls
2019-05-31 11:50:46.942710 Creating database ...
2019-05-31 11:50:46.949663 Writing records...
2019-05-31 11:50:47.224033 Adding indexes
2019-05-31 11:50:47.231599 Done
[root@quaco adrian.hunter]#
Now lets use the db:
[root@quaco adrian.hunter]# sqlite3 simple-retpoline.db
SQLite version 3.26.0 2018-12-01 12:34:55
Enter ".help" for usage hints.
sqlite> .schema samples
CREATE TABLE samples (id integer NOT NULL PRIMARY KEY,evsel_id bigint,machine_id bigint,thread_id bigint,comm_id bigint,dso_id bigint,symbol_id bigint,sym_offset bigint,ip bigint,time bigint,cpuinteger,to_dso_id bigint,to_symbol_id bigint,to_sym_offset bigint,to_ip bigint,branch_type integer,in_tx boolean,call_path_id bigint,insn_count bigint,cyc_count bigint);
sqlite>
Cool, the 'insn_count' and 'cyc_count' are there, now lets see if we can
use them in a query:
sqlite> select insn_count,cyc_count from samples where cyc_count > 1500 and insn_count < 10;
6|1507
sqlite> select insn_count,cyc_count from samples where cyc_count > 1500;
118|2210
140|1516
3783|1861
132|1521
6|1507
sqlite>
Seems to work :-)
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20190520113728.14389-17-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Mon, 20 May 2019 11:37:21 +0000 (14:37 +0300)]
perf db-export: Export IPC information
Export cycle and instruction counts on samples and call-returns.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20190520113728.14389-16-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Mon, 20 May 2019 11:37:20 +0000 (14:37 +0300)]
perf db-export: Add brief documentation
Add brief documentation to explain how the database export maintains
backward and forward compatibility.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20190520113728.14389-15-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Mon, 20 May 2019 11:37:19 +0000 (14:37 +0300)]
perf thread-stack: Accumulate IPC information
Cycle and instruction counts are added to the stack. The IPC of a
function and all functions it calls, is also recorded.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20190520113728.14389-14-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Mon, 20 May 2019 11:37:18 +0000 (14:37 +0300)]
perf intel-pt: Document IPC usage
Add brief documentation about instructions-per-cycle (IPC) information
derived from Intel PT.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20190520113728.14389-13-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Mon, 20 May 2019 11:37:17 +0000 (14:37 +0300)]
perf intel-pt: Accumulate cycle count from TSC/TMA/MTC packets
When CYC packets are not available, it is still possible to count cycles
using TSC/TMA/MTC timestamps.
As the timestamp increments in TSC ticks, convert to CPU cycles using
the current core-to-bus ratio.
Do not accumulate cycles when control flow packet generation is not
enabled, nor when time has been "lost", typically due to mwait, which is
indicated by a TSC/TMA packet that is not part of PSB+.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20190520113728.14389-12-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Mon, 20 May 2019 11:37:16 +0000 (14:37 +0300)]
perf intel-pt: Re-factor TIP cases in intel_pt_walk_to_ip
To make it easier to add new code for different TIP cases, separate each
case.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20190520113728.14389-11-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Mon, 20 May 2019 11:37:15 +0000 (14:37 +0300)]
perf intel-pt: Record when decoding PSB+ packets
In preparation for using MTC packets to count cycles, record whether
decoding is between a PSB and PSBEND packets.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20190520113728.14389-10-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Mon, 20 May 2019 11:37:14 +0000 (14:37 +0300)]
perf script: Add output of IPC ratio
Add field 'ipc' to display instructions-per-cycle.
Example:
perf record -e intel_pt/cyc/u ls
perf script --insn-trace --xed -F+ipc,-dso,-cpu,-tid
ls
2670177.
697113434:
7f0dfdbcd090 _start+0x0 mov %rsp, %rdi IPC: 0.00 (1/877)
ls
2670177.
697113434:
7f0dfdbcd093 _start+0x3 callq 0x7f0dfdbce030
ls
2670177.
697113434:
7f0dfdbce030 _dl_start+0x0 pushq %rbp
ls
2670177.
697113434:
7f0dfdbce031 _dl_start+0x1 mov %rsp, %rbp
ls
2670177.
697113434:
7f0dfdbce034 _dl_start+0x4 pushq %r15
ls
2670177.
697113434:
7f0dfdbce036 _dl_start+0x6 pushq %r14
ls
2670177.
697113434:
7f0dfdbce038 _dl_start+0x8 pushq %r13
ls
2670177.
697113434:
7f0dfdbce03a _dl_start+0xa pushq %r12
ls
2670177.
697113434:
7f0dfdbce03c _dl_start+0xc mov %rdi, %r12
ls
2670177.
697113434:
7f0dfdbce03f _dl_start+0xf pushq %rbx
ls
2670177.
697113434:
7f0dfdbce040 _dl_start+0x10 sub $0x38, %rsp
ls
2670177.
697113434:
7f0dfdbce044 _dl_start+0x14 rdtsc
ls
2670177.
697113434:
7f0dfdbce046 _dl_start+0x16 mov %eax, %eax
ls
2670177.
697113434:
7f0dfdbce048 _dl_start+0x18 shl $0x20, %rdx
ls
2670177.
697113434:
7f0dfdbce04c _dl_start+0x1c or %rax, %rdx
ls
2670177.
697114471:
7f0dfdbce04f _dl_start+0x1f movq 0x27e22(%rip), %rax IPC: 0.00 (15/1685)
ls
2670177.
697116177:
7f0dfdbce056 _dl_start+0x26 movq %rdx, 0x27683(%rip) IPC: 0.00 (1/881)
Note, the IPC values are low due to page faults at the beginning of
execution. The additional cycles are due to the time to enter the
kernel, not the actual kernel page fault handler.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20190520113728.14389-9-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Mon, 20 May 2019 11:37:13 +0000 (14:37 +0300)]
perf intel-pt: Add support for samples to contain IPC ratio
Copy the incremental instruction count and cycle count onto 'instructions'
and 'branches' samples.
Because Intel PT does not update the cycle count on every branch or
instruction, the incremental values will often be zero.
When there are values, they will be the number of instructions and
number of cycles since the last update, and thus represent the average
IPC since the last IPC value.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20190520113728.14389-8-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Mon, 20 May 2019 11:37:12 +0000 (14:37 +0300)]
perf tools: Add IPC information to perf_sample
Add counts of instructions and cycles, in order to represent
instructions-per-cycle (IPC).
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20190520113728.14389-7-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Mon, 20 May 2019 11:37:11 +0000 (14:37 +0300)]
perf intel-pt: Accumulate cycle count from CYC packets
In preparation for providing instructions-per-cycle (IPC) information,
accumulate cycle count from CYC packets.
Although CYC packets are optional (requires config term 'cyc' to enable
cycle-accurate mode when recording), the simplest way to count cycles is
with CYC packets.
The first complication is that cycles must be counted only when also
counting instructions.
That means when control flow packet generation is enabled i.e. between
TIP.PGE and TIP.PGD packets.
Also, sampling the cycle count follows the same rules as sampling the
timestamp, that is, not before the instruction to which the decoder is
walking is reached.
In addition, the cycle count is not accurate for any but the first
branch of a TNT packet.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20190520113728.14389-6-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Mon, 20 May 2019 11:37:10 +0000 (14:37 +0300)]
perf intel-pt: Factor out intel_pt_update_sample_time
To eliminate some duplication and make the code more understandable,
factor out intel_pt_update_sample_time.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20190520113728.14389-5-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Alexey Budankov [Thu, 30 May 2019 19:03:36 +0000 (22:03 +0300)]
perf record: Allow mixing --user-regs with --call-graph=dwarf
When DWARF stacks were requested and at the same time that the user
specifies a register set using the --user-regs option the full register
context was being captured on samples:
$ perf record -g --call-graph dwarf,1024 --user-regs=IP,SP,BP -- stack_test2.g.O3
188143843893585 0x6b48 [0x4f8]: PERF_RECORD_SAMPLE(IP, 0x4002): 23828/23828: 0x401236 period:
1363819 addr: 0x7ffedbdd51ac
... FP chain: nr:0
... user regs: mask 0xff0fff ABI 64-bit
.... AX 0x53b
.... BX 0x7ffedbdd3cc0
.... CX 0xffffffff
.... DX 0x33d3a
.... SI 0x7f09b74c38d0
.... DI 0x0
.... BP 0x401260
.... SP 0x7ffedbdd3cc0
.... IP 0x401236
.... FLAGS 0x20a
.... CS 0x33
.... SS 0x2b
.... R8 0x7f09b74c3800
.... R9 0x7f09b74c2da0
.... R10 0xfffffffffffff3ce
.... R11 0x246
.... R12 0x401070
.... R13 0x7ffedbdd5db0
.... R14 0x0
.... R15 0x0
... ustack: size 1024, offset 0xe0
. data_src: 0x5080021
... thread: stack_test2.g.O:23828
...... dso: /root/abudanko/stacks/stack_test2.g.O3
I.e. the --user-regs=IP,SP,BP was being ignored, being overridden by the
needs of --call-graph=dwarf.
After applying the change in this patch the sample data contains the
user specified register, but making sure that at least the minimal set
of register needed for DWARF unwinding (DWARF_MINIMAL_REGS) is
requested.
The user is warned that DWARF unwinding may not work if extra registers
end up being needed.
-g call-graph dwarf,K full_regs
--user-regs=user_regs user_regs
-g call-graph dwarf,K --user-regs=user_regs user_regs + DWARF_MINIMAL_REGS
$ perf record -g --call-graph dwarf,1024 --user-regs=BP -- ls
WARNING: The use of --call-graph=dwarf may require all the user registers, specifying a subset with --user-regs may render DWARF unwinding unreliable, so the minimal registers set (IP, SP) is explicitly forced.
arch COPYING Documentation include Kbuild lbuild MAINTAINERS modules.builtin Module.symvers perf.data.old scripts System.map virt
block CREDITS drivers init Kconfig lib Makefile modules.builtin.modinfo net README security tools vmlinux
certs crypto fs ipc kernel LICENSES mm modules.order perf.data samples sound usr vmlinux.o
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.030 MB perf.data (10 samples) ]
188368474305373 0x5e40 [0x470]: PERF_RECORD_SAMPLE(IP, 0x4002): 23839/23839: 0x401236 period:
1260507 addr: 0x7ffd3d85e96c
... FP chain: nr:0
... user regs: mask 0x1c0 ABI 64-bit
.... BP 0x401260
.... SP 0x7ffd3d85cc20
.... IP 0x401236
... ustack: size 1024, offset 0x58
. data_src: 0x5080021
Committer notes:
Detected build failures on arches where PERF_REGS_ is not available,
such as debian:experimental-x-{mips,mips64,mipsel}, fedora 24 and 30 for
ARC uClibc and glibc, reported to Alexey that provided a patch moving
the DWARF_MINIMAL_REGS from evsel.c to util/perf_regs.h, where it is
guarded by an HAVE_PERF_REGS_SUPPORT ifdef.
Committer testing:
# perf record --user-regs=bp,ax -a sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.955 MB perf.data (1773 samples) ]
# perf script -F+uregs | grep AX: | head -5
perf 1719 [000] 181.272398: 1 cycles:
ffffffffba06a7c4 native_write_msr+0x4 (/lib/modules/5.2.0-rc1+/build/vmlinux) ABI:2 AX:0xffffffffffffffda BP:0x7ffef828fb00
perf 1719 [000] 181.272402: 1 cycles:
ffffffffba06a7c4 native_write_msr+0x4 (/lib/modules/5.2.0-rc1+/build/vmlinux) ABI:2 AX:0xffffffffffffffda BP:0x7ffef828fb00
perf 1719 [000] 181.272403: 8 cycles:
ffffffffba06a7c4 native_write_msr+0x4 (/lib/modules/5.2.0-rc1+/build/vmlinux) ABI:2 AX:0xffffffffffffffda BP:0x7ffef828fb00
perf 1719 [000] 181.272405: 181 cycles:
ffffffffba06a7c6 native_write_msr+0x6 (/lib/modules/5.2.0-rc1+/build/vmlinux) ABI:2 AX:0xffffffffffffffda BP:0x7ffef828fb00
perf 1719 [000] 181.272406: 4405 cycles:
ffffffffba06a7c4 native_write_msr+0x4 (/lib/modules/5.2.0-rc1+/build/vmlinux) ABI:2 AX:0xffffffffffffffda BP:0x7ffef828fb00
# perf record --call-graph=dwarf --user-regs=bp,ax -a sleep 1
WARNING: The use of --call-graph=dwarf may require all the user registers, specifying a subset with --user-regs may render DWARF unwinding unreliable, so the minimal registers set (IP, SP) is explicitly forced.
[ perf record: Woken up 55 times to write data ]
[ perf record: Captured and wrote 24.184 MB perf.data (2841 samples) ]
[root@quaco ~]# perf script --hide-call-graph -F+uregs | grep AX: | head -5
perf 1729 [000] 211.268006: 1 cycles:
ffffffffba06a7c4 native_write_msr+0x4 (/lib/modules/5.2.0-rc1+/build/vmlinux) ABI:2 AX:0xffffffffffffffda BP:0x7ffc8679abb0 SP:0x7ffc8679ab78 IP:0x7fa75223a0db
perf 1729 [000] 211.268014: 1 cycles:
ffffffffba06a7c4 native_write_msr+0x4 (/lib/modules/5.2.0-rc1+/build/vmlinux) ABI:2 AX:0xffffffffffffffda BP:0x7ffc8679abb0 SP:0x7ffc8679ab78 IP:0x7fa75223a0db
perf 1729 [000] 211.268017: 5 cycles:
ffffffffba06a7c4 native_write_msr+0x4 (/lib/modules/5.2.0-rc1+/build/vmlinux) ABI:2 AX:0xffffffffffffffda BP:0x7ffc8679abb0 SP:0x7ffc8679ab78 IP:0x7fa75223a0db
perf 1729 [000] 211.268020: 48 cycles:
ffffffffba06a7c6 native_write_msr+0x6 (/lib/modules/5.2.0-rc1+/build/vmlinux) ABI:2 AX:0xffffffffffffffda BP:0x7ffc8679abb0 SP:0x7ffc8679ab78 IP:0x7fa75223a0db
perf 1729 [000] 211.268024: 490 cycles:
ffffffffba00e471 intel_bts_enable_local+0x21 (/lib/modules/5.2.0-rc1+/build/vmlinux) ABI:2 AX:0xffffffffffffffda BP:0x7ffc8679abb0 SP:0x7ffc8679ab78 IP:0x7fa75223a0db
#
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/e7fd37b1-af22-0d94-a0dc-5895e803bbfe@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Leo Yan [Thu, 30 May 2019 09:38:01 +0000 (17:38 +0800)]
perf symbols: Remove unused variable 'err'
Variable 'err' is defined but never used in function symsrc__init(),
remove it and directly return -1 at the end of the function.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20190530093801.20510-1-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Wed, 29 May 2019 18:50:50 +0000 (15:50 -0300)]
perf data: Document directory format header: HEADER_DIR_FORMAT
We forgot to update the perf.data file format document for the
HEADER_DIR_FORMAT header, do it now from comments in the patch
introducing it.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Chong Jiang <chongjiang@chromium.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Simon Que <sque@chromium.org>
Fixes: 258031c017c3 ("perf header: Add DIR_FORMAT feature to describe directory data")
Link: https://lkml.kernel.org/n/tip-jbrzb7ijb5al33gi8br6f9rr@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Wed, 29 May 2019 18:43:51 +0000 (15:43 -0300)]
perf data: Document clockid header: HEADER_CLOCKID
We forgot to update the perf.data file format document for the
HEADER_CLOCKID header, do it now from comments in the patch introducing
it.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Chong Jiang <chongjiang@chromium.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Simon Que <sque@chromium.org>
Fixes: cf7905165fee ("perf record: Encode -k clockid frequency into Perf trace")
Link: https://lkml.kernel.org/n/tip-slhnjp06027j3ae17qqetzxj@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Wed, 29 May 2019 18:35:03 +0000 (15:35 -0300)]
perf data: Document memory topology header: HEADER_MEM_TOPOLOGY
We forgot to update the perf.data file format document for the
HEADER_MEM_TOPOLOGY header, do it now from comments in the patch
introducing it.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Chong Jiang <chongjiang@chromium.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Simon Que <sque@chromium.org>
Fixes: e2091cedd51b ("perf tools: Add MEM_TOPOLOGY feature to perf data file")
Link: https://lkml.kernel.org/n/tip-h5lcm1nbe9ztxwm61gmadd56@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Song Liu [Tue, 21 May 2019 06:44:06 +0000 (23:44 -0700)]
perf data: Add description of header HEADER_BPF_PROG_INFO and HEADER_BPF_BTF
This patch addes description of HEADER_BPF_PROG_INFO and HEADER_BPF_BTF to
perf.data-file-format.txt.
Requested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: 606f972b1361 ("perf bpf: Save bpf_prog_info information as headers to perf.data")
Link: http://lkml.kernel.org/r/20190521064406.2498925-1-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ingo Molnar [Mon, 3 Jun 2019 09:58:45 +0000 (11:58 +0200)]
Merge branch 'x86/topology' into perf/core, to prepare for new patches
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Jiri Olsa [Sun, 12 May 2019 15:55:18 +0000 (17:55 +0200)]
perf/x86: Use update attribute groups for default attributes
Using the new pmu::update_attrs attribute group for default
attributes - freeze_on_smi, allow_tsx_force_abort.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190512155518.21468-10-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Jiri Olsa [Sun, 12 May 2019 15:55:17 +0000 (17:55 +0200)]
perf/x86/intel: Use update attributes for skylake format
Using the new pmu::update_attrs attribute group for
skylake specific format attributes.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190512155518.21468-9-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Jiri Olsa [Sun, 12 May 2019 15:55:16 +0000 (17:55 +0200)]
perf/x86: Use update attribute groups for extra format
Using the new pmu::update_attrs attribute group for
extra "format" directory.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190512155518.21468-8-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Jiri Olsa [Sun, 12 May 2019 15:55:15 +0000 (17:55 +0200)]
perf/x86: Use update attribute groups for caps
Using the new pmu::update_attrs attribute group for
"caps" directory.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190512155518.21468-7-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Jiri Olsa [Sun, 12 May 2019 15:55:14 +0000 (17:55 +0200)]
perf/x86: Add is_visible attribute_group callback for base events
We dont need to pre-filter out unsupported base events,
we can just use its group's is_visible function to do this.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190512155518.21468-6-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Jiri Olsa [Sun, 12 May 2019 15:55:13 +0000 (17:55 +0200)]
perf/x86: Use the new pmu::update_attrs attribute group
Using the new pmu::update_attrs attribute group to
create detected events for x86_pmu.
Moving the topdown/memory/tsx attributes to separate
attribute groups with specific is_visible functions.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190512155518.21468-5-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Jiri Olsa [Sun, 12 May 2019 15:55:12 +0000 (17:55 +0200)]
perf/x86: Get rid of x86_pmu::event_attrs
Nobody is using that.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190512155518.21468-4-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Jiri Olsa [Sun, 12 May 2019 15:55:11 +0000 (17:55 +0200)]
perf/core: Add attr_groups_update into struct pmu
Adding attr_update attribute group into pmu, to allow
having multiple attribute groups for same group name.
This will allow us to update "events" or "format"
directories with attributes that depend on various
HW conditions.
For example having group_format_extra group that updates
"format" directory only if pmu version is 2 and higher:
static umode_t
exra_is_visible(struct kobject *kobj, struct attribute *attr, int i)
{
return x86_pmu.version >= 2 ? attr->mode : 0;
}
static struct attribute_group group_format_extra = {
.name = "format",
.is_visible = exra_is_visible,
};
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190512155518.21468-3-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Jiri Olsa [Sun, 12 May 2019 15:55:10 +0000 (17:55 +0200)]
sysfs: Add sysfs_update_groups function
Adding sysfs_update_groups function to update
multiple groups.
sysfs_update_groups - given a directory kobject, create a bunch of attribute groups
@kobj: The kobject to update the group on
@groups: The attribute groups to update, NULL terminated
This function update a bunch of attribute groups. If an error occurs when
updating a group, all previously updated groups will be removed together
with already existing (not updated) attributes.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190512155518.21468-2-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Gayatri Kammela [Sat, 11 May 2019 00:03:11 +0000 (17:03 -0700)]
perf/x86/intel/uncore: Add new IMC PCI IDs for KabyLake, AmberLake and WhiskeyLake CPUs
AmberLake and WhiskeyLake have same client uncore events as
KabyLake. Thus add the PCI IDs for AmberLake Y processor lines,
for WhiskeyLake U processor lines and for KabyLake, add H
processor line and workstation.
Platform Device ID
================================
AML Y 2 Core 590Ch
KBL H 4 Core 5910h
KBL 4 Core WorkStation 5918h
WHL U 4 Core 3ED0h
WHL U 4 Core 3E34h
WHL U 2 Core 3E35h
AML Y 4 Core 590Dh
Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Charles Prestopine <charles.d.prestopine@intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190511000311.20733-2-gayatri.kammela@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Gayatri Kammela [Sat, 11 May 2019 00:03:10 +0000 (17:03 -0700)]
perf/x86/intel/uncore: Add tabs to Uncore IMC PCI IDs
Improve code readability by adding tabs after #define macros
Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Charles Prestopine <charles.d.prestopine@intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190511000311.20733-1-gayatri.kammela@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Song Liu [Tue, 7 May 2019 16:15:45 +0000 (09:15 -0700)]
perf/core: Allow non-privileged uprobe for user processes
Currently, non-privileged user could only use uprobe with
kernel.perf_event_paranoid = -1
However, setting perf_event_paranoid to -1 leaks other users' processes to
non-privileged uprobes.
To introduce proper permission control of uprobes, we are building the
following system:
A daemon with CAP_SYS_ADMIN is in charge to create uprobes via tracefs;
Users asks the daemon to create uprobes;
Then user can attach uprobe only to processes owned by the user.
This patch allows non-privileged user to attach uprobe to processes owned
by the user.
The following example shows how to use uprobe with non-privileged user.
This is based on Brendan's blog post [1]
1. Create uprobe with root:
sudo perf probe -x 'readline%return +0($retval):string'
2. Then non-root user can use the uprobe as:
perf record -vvv -e probe_bash:readline__return -p <pid> sleep 20
perf script
[1] http://www.brendangregg.com/blog/2015-06-28/linux-ftrace-uprobe.html
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <kernel-team@fb.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190507161545.788381-1-songliubraving@fb.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Mon, 3 Jun 2019 09:56:35 +0000 (11:56 +0200)]
Merge tag 'v5.2-rc3' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Linus Torvalds [Sun, 2 Jun 2019 20:55:33 +0000 (13:55 -0700)]
Linux 5.2-rc3
Linus Torvalds [Sun, 2 Jun 2019 18:10:01 +0000 (11:10 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Two fixes: a quirk for KVM guests running on certain AMD CPUs, and a
KASAN related build fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/CPU/AMD: Don't force the CPB cap when running under a hypervisor
x86/boot: Provide KASAN compatible aliases for string routines
Linus Torvalds [Sun, 2 Jun 2019 18:08:12 +0000 (11:08 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"On the kernel side there's a bunch of ring-buffer ordering fixes for a
reproducible bug, plus a PEBS constraints regression fix.
Plus tooling fixes"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tools headers UAPI: Sync kvm.h headers with the kernel sources
perf record: Fix s390 missing module symbol and warning for non-root users
perf machine: Read also the end of the kernel
perf test vmlinux-kallsyms: Ignore aliases to _etext when searching on kallsyms
perf session: Add missing swap ops for namespace events
perf namespace: Protect reading thread's namespace
tools headers UAPI: Sync drm/drm.h with the kernel
tools headers UAPI: Sync drm/i915_drm.h with the kernel
tools headers UAPI: Sync linux/fs.h with the kernel
tools headers UAPI: Sync linux/sched.h with the kernel
tools arch x86: Sync asm/cpufeatures.h with the with the kernel
tools include UAPI: Update copy of files related to new fspick, fsmount, fsconfig, fsopen, move_mount and open_tree syscalls
perf arm64: Fix mksyscalltbl when system kernel headers are ahead of the kernel
perf data: Fix 'strncat may truncate' build failure with recent gcc
perf/ring-buffer: Use regular variables for nesting
perf/ring-buffer: Always use {READ,WRITE}_ONCE() for rb->user_page data
perf/ring_buffer: Add ordering to rb->nest increment
perf/ring_buffer: Fix exposing a temporarily decreased data_head
perf/x86/intel/ds: Fix EVENT vs. UEVENT PEBS constraints
Linus Torvalds [Sun, 2 Jun 2019 18:06:13 +0000 (11:06 -0700)]
Merge branch 'efi-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull EFI fixes from Ingo Molnar:
"Two EFI fixes: a quirk for weird systabs, plus add more robust error
handling in the old 1:1 mapping code"
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi: Allow the number of EFI configuration tables entries to be zero
efi/x86/Add missing error handling to old_memmap 1:1 mapping code
Linus Torvalds [Sun, 2 Jun 2019 18:04:42 +0000 (11:04 -0700)]
Merge branch 'core-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull stacktrace fix from Ingo Molnar:
"Fix a stack_trace_save_tsk_reliable() regression"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
stacktrace: Unbreak stack_trace_save_tsk_reliable()
Linus Torvalds [Sun, 2 Jun 2019 17:22:38 +0000 (10:22 -0700)]
Merge tag 'spdx-5.2-rc3-2' of git://git./linux/kernel/git/gregkh/driver-core
Pull SPDX fixes from Greg KH:
"Here are just two small patches, that fix up some found SPDX
identifier issues.
The first patch fixes an error in a previous SPDX fixup patch, that
causes build errors when doing 'make clean' on the tree (the fact that
almost no one noticed it reflects the fact that kernel developers
don't like doing that option very often...)
The second patch fixes up a number of places in the tree where people
mistyped the string "SPDX-License-Identifier". Given that people can
not even type their own name all the time without mistakes, this was
bound to happen, and odds are, we will have to add some type of check
for this to checkpatch.pl to catch this happening in the future.
Both of these have passed testing by 0-day"
* tag 'spdx-5.2-rc3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
treewide: fix typos of SPDX-License-Identifier
crypto: ux500 - fix license comment syntax error
Linus Torvalds [Sun, 2 Jun 2019 17:21:04 +0000 (10:21 -0700)]
Merge tag 'powerpc-5.2-3' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"A minor fix to our IMC PMU code to print a less confusing error
message when the driver can't initialise properly.
A fix for a bug where a user requesting an unsupported branch sampling
filter can corrupt PMU state, preventing the PMU from counting
properly.
And finally a fix for a bug in our support for kexec_file_load(),
which prevented loading a kernel and initramfs. Most versions of kexec
don't yet use kexec_file_load().
Thanks to: Anju T Sudhakar, Dave Young, Madhavan Srinivasan, Ravi
Bangoria, Thiago Jung Bauermann"
* tag 'powerpc-5.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/kexec: Fix loading of kernel + initramfs with kexec_file_load()
powerpc/perf: Fix MMCRA corruption by bhrb_filter
powerpc/powernv: Return for invalid IMC domain
Linus Torvalds [Sun, 2 Jun 2019 17:19:39 +0000 (10:19 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"Fixes for PPC and s390"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: PPC: Book3S HV: Restore SPRG3 in kvmhv_p9_guest_entry()
KVM: PPC: Book3S HV: Fix lockdep warning when entering guest on POWER9
KVM: PPC: Book3S HV: XIVE: Fix page offset when clearing ESB pages
KVM: PPC: Book3S HV: XIVE: Take the srcu read lock when accessing memslots
KVM: PPC: Book3S HV: XIVE: Do not clear IRQ data of passthrough interrupts
KVM: PPC: Book3S HV: XIVE: Introduce a new mutex for the XIVE device
KVM: PPC: Book3S HV: XIVE: Fix the enforced limit on the vCPU identifier
KVM: PPC: Book3S HV: XIVE: Do not test the EQ flag validity when resetting
KVM: PPC: Book3S HV: XIVE: Clear file mapping when device is released
KVM: PPC: Book3S HV: Don't take kvm->lock around kvm_for_each_vcpu
KVM: PPC: Book3S: Use new mutex to synchronize access to rtas token list
KVM: PPC: Book3S HV: Use new mutex to synchronize MMU setup
KVM: PPC: Book3S HV: Avoid touching arch.mmu_ready in XIVE release functions
KVM: s390: Do not report unusabled IDs via KVM_CAP_MAX_VCPU_ID
kvm: fix compile on s390 part 2
Linus Torvalds [Sun, 2 Jun 2019 17:18:11 +0000 (10:18 -0700)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"A memleak fix for the core, two driver bugfixes, as well as fixing
missing file patterns to MAINTAINERS"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
MAINTAINERS: add I2C DT bindings to ARM platforms
MAINTAINERS: add DT bindings to i2c drivers
i2c: synquacer: fix synquacer_i2c_doxfer() return value
i2c: mlxcpld: Fix wrong initialization order in probe
i2c: dev: fix potential memory leak in i2cdev_ioctl_rdwr
Linus Torvalds [Sun, 2 Jun 2019 17:16:09 +0000 (10:16 -0700)]
Merge branch 'fixes' of git://git./linux/kernel/git/evalenti/linux-soc-thermal
Pull thermal SoC fix from Eduardo Valentin:
"A single revert, detected to cause issues on the tsens driver"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal:
Revert "drivers: thermal: tsens: Add new operation to check if a sensor is enabled"
Linus Torvalds [Sun, 2 Jun 2019 17:14:25 +0000 (10:14 -0700)]
Merge tag 'led-fixes-for-5.2-rc3' of git://git./linux/kernel/git/j.anaszewski/linux-leds
Pull LED fix from Jacek Anaszewski:
"Fix for a recent change in LED core, that didn't take into account the
possibility of calling led_blink_setup() from atomic context"
* tag 'led-fixes-for-5.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds:
leds: avoid flush_work in atomic context
Linus Torvalds [Sun, 2 Jun 2019 16:27:44 +0000 (09:27 -0700)]
Merge tag 'for-linus-
20190601' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- A set of patches fixing code comments / kerneldoc (Bart)
- Don't allow loop file change for exclusive open (Jan)
- Fix revalidate of hidden genhd (Jan)
- Init queue failure memory free fix (Jes)
- Improve rq limits failure print (John)
- Fixup for queue removal/addition (Ming)
- Missed error progagation for io_uring buffer registration (Pavel)
* tag 'for-linus-
20190601' of git://git.kernel.dk/linux-block:
block: print offending values when cloned rq limits are exceeded
blk-mq: Document the blk_mq_hw_queue_to_node() arguments
blk-mq: Fix spelling in a source code comment
block: Fix bsg_setup_queue() kernel-doc header
block: Fix rq_qos_wait() kernel-doc header
block: Fix blk_mq_*_map_queues() kernel-doc headers
block: Fix throtl_pending_timer_fn() kernel-doc header
block: Convert blk_invalidate_devt() header into a non-kernel-doc header
block/partitions/ldm: Convert a kernel-doc header into a non-kernel-doc header
blk-mq: Fix memory leak in error handling
block: don't protect generic_make_request_checks with blk_queue_enter
block: move blk_exit_queue into __blk_release_queue
block: Don't revalidate bdev of hidden gendisk
loop: Don't change loop device under exclusive opener
io_uring: Fix __io_uring_register() false success
Linus Torvalds [Sun, 2 Jun 2019 16:26:34 +0000 (09:26 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Six minor fixes to device drivers and one to the multipath alua
handler.
The most extensive fix is the zfcp port remove prevention one, but
it's impact is only s390"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: libsas: delete sas port if expander discover failed
scsi: libsas: only clear phy->in_shutdown after shutdown event done
scsi: scsi_dh_alua: Fix possible null-ptr-deref
scsi: smartpqi: properly set both the DMA mask and the coherent DMA mask
scsi: zfcp: fix to prevent port_remove with pure auto scan LUNs (only sdevs)
scsi: zfcp: fix missing zfcp_port reference put on -EBUSY from port_remove
scsi: libcxgbi: add a check for NULL pointer in cxgbi_check_route()
Linus Torvalds [Sun, 2 Jun 2019 15:51:30 +0000 (08:51 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"Various fixes and followups"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm, compaction: make sure we isolate a valid PFN
include/linux/generic-radix-tree.h: fix kerneldoc comment
kernel/signal.c: trace_signal_deliver when signal_group_exit
drivers/iommu/intel-iommu.c: fix variable 'iommu' set but not used
spdxcheck.py: fix directory structures
kasan: initialize tag to 0xff in __kasan_kmalloc
z3fold: fix sheduling while atomic
scripts/gdb: fix invocation when CONFIG_COMMON_CLK is not set
mm/gup: continue VM_FAULT_RETRY processing even for pre-faults
ocfs2: fix error path kobject memory leak
memcg: make it work on sparse non-0-node systems
mm, memcg: consider subtrees in memory.events
prctl_set_mm: downgrade mmap_sem to read lock
prctl_set_mm: refactor checks from validate_prctl_map
kernel/fork.c: make max_threads symbol static
arch/arm/boot/compressed/decompress.c: fix build error due to lz4 changes
arch/parisc/configs/c8000_defconfig: remove obsoleted CONFIG_DEBUG_SLAB_LEAK
mm/vmalloc.c: fix typo in comment
lib/sort.c: fix kernel-doc notation warnings
mm: fix Documentation/vm/hmm.rst Sphinx warnings
Suzuki K Poulose [Sat, 1 Jun 2019 05:30:59 +0000 (22:30 -0700)]
mm, compaction: make sure we isolate a valid PFN
When we have holes in a normal memory zone, we could endup having
cached_migrate_pfns which may not necessarily be valid, under heavy memory
pressure with swapping enabled ( via __reset_isolation_suitable(),
triggered by kswapd).
Later if we fail to find a page via fast_isolate_freepages(), we may end
up using the migrate_pfn we started the search with, as valid page. This
could lead to accessing NULL pointer derefernces like below, due to an
invalid mem_section pointer.
Unable to handle kernel NULL pointer dereference at virtual address
0000000000000008 [47/1825]
Mem abort info:
ESR = 0x96000004
Exception class = DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000004
CM = 0, WnR = 0
user pgtable: 4k pages, 48-bit VAs, pgdp =
0000000082f94ae9
[
0000000000000008] pgd=
0000000000000000
Internal error: Oops:
96000004 [#1] SMP
...
CPU: 10 PID: 6080 Comm: qemu-system-aar Not tainted 510-rc1+ #6
Hardware name: AmpereComputing(R) OSPREY EV-883832-X3-0001/OSPREY, BIOS 4819 09/25/2018
pstate:
60000005 (nZCv daif -PAN -UAO)
pc : set_pfnblock_flags_mask+0x58/0xe8
lr : compaction_alloc+0x300/0x950
[...]
Process qemu-system-aar (pid: 6080, stack limit = 0x0000000095070da5)
Call trace:
set_pfnblock_flags_mask+0x58/0xe8
compaction_alloc+0x300/0x950
migrate_pages+0x1a4/0xbb0
compact_zone+0x750/0xde8
compact_zone_order+0xd8/0x118
try_to_compact_pages+0xb4/0x290
__alloc_pages_direct_compact+0x84/0x1e0
__alloc_pages_nodemask+0x5e0/0xe18
alloc_pages_vma+0x1cc/0x210
do_huge_pmd_anonymous_page+0x108/0x7c8
__handle_mm_fault+0xdd4/0x1190
handle_mm_fault+0x114/0x1c0
__get_user_pages+0x198/0x3c0
get_user_pages_unlocked+0xb4/0x1d8
__gfn_to_pfn_memslot+0x12c/0x3b8
gfn_to_pfn_prot+0x4c/0x60
kvm_handle_guest_abort+0x4b0/0xcd8
handle_exit+0x140/0x1b8
kvm_arch_vcpu_ioctl_run+0x260/0x768
kvm_vcpu_ioctl+0x490/0x898
do_vfs_ioctl+0xc4/0x898
ksys_ioctl+0x8c/0xa0
__arm64_sys_ioctl+0x28/0x38
el0_svc_common+0x74/0x118
el0_svc_handler+0x38/0x78
el0_svc+0x8/0xc
Code:
f8607840 f100001f 8b011401 9a801020 (
f9400400)
---[ end trace
af6a35219325a9b6 ]---
The issue was reported on an arm64 server with 128GB with holes in the
zone (e.g, [32GB@4GB, 96GB@544GB]), with a swap device enabled, while
running 100 KVM guest instances.
This patch fixes the issue by ensuring that the page belongs to a valid
PFN when we fallback to using the lower limit of the scan range upon
failure in fast_isolate_freepages().
Link: http://lkml.kernel.org/r/1558711908-15688-1-git-send-email-suzuki.poulose@arm.com
Fixes: 5a811889de10f1eb ("mm, compaction: use free lists to quickly locate a migration target")
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reported-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jonathan Corbet [Sat, 1 Jun 2019 05:30:55 +0000 (22:30 -0700)]
include/linux/generic-radix-tree.h: fix kerneldoc comment
The DOC comment block section in include/linux/generic-radix-tree.h
contained a spurious colon, causing this warning in the documentation
build:
include/linux/generic-radix-tree.h:1: warning: no structured comments found
Remove the colon and make the docs build happy.
Link: http://lkml.kernel.org/r/20190524141933.74ae9050@lwn.net
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Zhenliang Wei [Sat, 1 Jun 2019 05:30:52 +0000 (22:30 -0700)]
kernel/signal.c: trace_signal_deliver when signal_group_exit
In the fixes commit, removing SIGKILL from each thread signal mask and
executing "goto fatal" directly will skip the call to
"trace_signal_deliver". At this point, the delivery tracking of the
SIGKILL signal will be inaccurate.
Therefore, we need to add trace_signal_deliver before "goto fatal" after
executing sigdelset.
Note: SEND_SIG_NOINFO matches the fact that SIGKILL doesn't have any info.
Link: http://lkml.kernel.org/r/20190425025812.91424-1-weizhenliang@huawei.com
Fixes: cf43a757fd4944 ("signal: Restore the stop PTRACE_EVENT_EXIT")
Signed-off-by: Zhenliang Wei <weizhenliang@huawei.com>
Reviewed-by: Christian Brauner <christian@brauner.io>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ivan Delalande <colona@arista.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Qian Cai [Sat, 1 Jun 2019 05:30:49 +0000 (22:30 -0700)]
drivers/iommu/intel-iommu.c: fix variable 'iommu' set but not used
Commit
cf04eee8bf0e ("iommu/vt-d: Include ACPI devices in iommu=pt")
added for_each_active_iommu() in iommu_prepare_static_identity_mapping()
but never used the each element, i.e, "drhd->iommu".
drivers/iommu/intel-iommu.c: In function
'iommu_prepare_static_identity_mapping':
drivers/iommu/intel-iommu.c:3037:22: warning: variable 'iommu' set but
not used [-Wunused-but-set-variable]
struct intel_iommu *iommu;
Fixed the warning by appending a compiler attribute __maybe_unused for it.
Link: http://lkml.kernel.org/r/20190523013314.2732-1-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vincenzo Frascino [Sat, 1 Jun 2019 05:30:45 +0000 (22:30 -0700)]
spdxcheck.py: fix directory structures
The LICENSE directory has recently changed structure and this makes
spdxcheck fails as per below:
FAIL: "Blob or Tree named 'other' not found"
Traceback (most recent call last):
File "scripts/spdxcheck.py", line 240, in <module>
spdx = read_spdxdata(repo)
File "scripts/spdxcheck.py", line 41, in read_spdxdata
for el in lictree[d].traverse():
[...]
KeyError: "Blob or Tree named 'other' not found"
Fix the script to restore the correctness on checkpatch License checking.
References:
62be257e986d ("LICENSES: Rename other to deprecated")
References:
8ea8814fcdcb ("LICENSES: Clearly mark dual license only licenses")
Link: http://lkml.kernel.org/r/20190523084755.56739-1-vincenzo.frascino@arm.com
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Joe Perches <joe@perches.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jeremy Cline <jcline@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nathan Chancellor [Sat, 1 Jun 2019 05:30:42 +0000 (22:30 -0700)]
kasan: initialize tag to 0xff in __kasan_kmalloc
When building with -Wuninitialized and CONFIG_KASAN_SW_TAGS unset, Clang
warns:
mm/kasan/common.c:484:40: warning: variable 'tag' is uninitialized when
used here [-Wuninitialized]
kasan_unpoison_shadow(set_tag(object, tag), size);
^~~
set_tag ignores tag in this configuration but clang doesn't realize it at
this point in its pipeline, as it points to arch_kasan_set_tag as being
the point where it is used, which will later be expanded to (void
*)(object) without a use of tag. Initialize tag to 0xff, as it removes
this warning and doesn't change the meaning of the code.
Link: https://github.com/ClangBuiltLinux/linux/issues/465
Link: http://lkml.kernel.org/r/20190502163057.6603-1-natechancellor@gmail.com
Fixes: 7f94ffbc4c6a ("kasan: add hooks implementation for tag-based mode")
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vitaly Wool [Sat, 1 Jun 2019 05:30:39 +0000 (22:30 -0700)]
z3fold: fix sheduling while atomic
kmem_cache_alloc() may be called from z3fold_alloc() in atomic context, so
we need to pass correct gfp flags to avoid "scheduling while atomic" bug.
Link: http://lkml.kernel.org/r/20190523153245.119dfeed55927e8755250ddd@gmail.com
Fixes: 7c2b8baa61fe5 ("mm/z3fold.c: add structure for buddy handles")
Signed-off-by: Vitaly Wool <vitaly.vul@sony.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fabiano Rosas [Sat, 1 Jun 2019 05:30:36 +0000 (22:30 -0700)]
scripts/gdb: fix invocation when CONFIG_COMMON_CLK is not set
CLK_GET_RATE_NOCACHE depends on CONFIG_COMMON_CLK. Importing constants.py
when CONFIG_COMMON_CLK is not defined causes:
(gdb) lx-symbols
(...)
File "scripts/gdb/linux/proc.py", line 15, in <module>
from linux import constants
File "scripts/gdb/linux/constants.py", line 2, in <module>
LX_CLK_GET_RATE_NOCACHE = gdb.parse_and_eval("CLK_GET_RATE_NOCACHE")
gdb.error: No symbol "CLK_GET_RATE_NOCACHE" in current context.
Link: http://lkml.kernel.org/r/20190523195313.24701-1-farosas@linux.ibm.com
Fixes: e7e6f462c1be ("scripts/gdb: print cached rate in lx-clk-summary")
Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Kieran Bingham <kbingham@kernel.org>
Cc: Leonard Crestez <leonard.crestez@nxp.com>
Cc: Jackie Liu <liuyun01@kylinos.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Rapoport [Sat, 1 Jun 2019 05:30:33 +0000 (22:30 -0700)]
mm/gup: continue VM_FAULT_RETRY processing even for pre-faults
When get_user_pages*() is called with pages = NULL, the processing of
VM_FAULT_RETRY terminates early without actually retrying to fault-in all
the pages.
If the pages in the requested range belong to a VMA that has userfaultfd
registered, handle_userfault() returns VM_FAULT_RETRY *after* user space
has populated the page, but for the gup pre-fault case there's no actual
retry and the caller will get no pages although they are present.
This issue was uncovered when running post-copy memory restore in CRIU
after
d9c9ce34ed5c ("x86/fpu: Fault-in user stack if
copy_fpstate_to_sigframe() fails").
After this change, the copying of FPU state to the sigframe switched from
copy_to_user() variants which caused a real page fault to get_user_pages()
with pages parameter set to NULL.
In post-copy mode of CRIU, the destination memory is managed with
userfaultfd and lack of the retry for pre-fault case in get_user_pages()
causes a crash of the restored process.
Making the pre-fault behavior of get_user_pages() the same as the "normal"
one fixes the issue.
Link: http://lkml.kernel.org/r/1557844195-18882-1-git-send-email-rppt@linux.ibm.com
Fixes: d9c9ce34ed5c ("x86/fpu: Fault-in user stack if copy_fpstate_to_sigframe() fails")
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Tested-by: Andrei Vagin <avagin@gmail.com> [https://travis-ci.org/avagin/linux/builds/533184940]
Tested-by: Hugh Dickins <hughd@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tobin C. Harding [Sat, 1 Jun 2019 05:30:29 +0000 (22:30 -0700)]
ocfs2: fix error path kobject memory leak
If a call to kobject_init_and_add() fails we should call kobject_put()
otherwise we leak memory.
Add call to kobject_put() in the error path of call to
kobject_init_and_add(). Please note, this has the side effect that the
release method is called if kobject_init_and_add() fails.
Link: http://lkml.kernel.org/r/20190513033458.2824-1-tobin@kernel.org
Signed-off-by: Tobin C. Harding <tobin@kernel.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jiri Slaby [Sat, 1 Jun 2019 05:30:26 +0000 (22:30 -0700)]
memcg: make it work on sparse non-0-node systems
We have a single node system with node 0 disabled:
Scanning NUMA topology in Northbridge 24
Number of physical nodes 2
Skipping disabled node 0
Node 1 MemBase
0000000000000000 Limit
00000000fbff0000
NODE_DATA(1) allocated [mem 0xfbfda000-0xfbfeffff]
This causes crashes in memcg when system boots:
BUG: unable to handle kernel NULL pointer dereference at
0000000000000008
#PF error: [normal kernel read fault]
...
RIP: 0010:list_lru_add+0x94/0x170
...
Call Trace:
d_lru_add+0x44/0x50
dput.part.34+0xfc/0x110
__fput+0x108/0x230
task_work_run+0x9f/0xc0
exit_to_usermode_loop+0xf5/0x100
It is reproducible as far as 4.12. I did not try older kernels. You have
to have a new enough systemd, e.g. 241 (the reason is unknown -- was not
investigated). Cannot be reproduced with systemd 234.
The system crashes because the size of lru array is never updated in
memcg_update_all_list_lrus and the reads are past the zero-sized array,
causing dereferences of random memory.
The root cause are list_lru_memcg_aware checks in the list_lru code. The
test in list_lru_memcg_aware is broken: it assumes node 0 is always
present, but it is not true on some systems as can be seen above.
So fix this by avoiding checks on node 0. Remember the memcg-awareness by
a bool flag in struct list_lru.
Link: http://lkml.kernel.org/r/20190522091940.3615-1-jslaby@suse.cz
Fixes: 60d3fd32a7a9 ("list_lru: introduce per-memcg lists")
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Chris Down [Sat, 1 Jun 2019 05:30:22 +0000 (22:30 -0700)]
mm, memcg: consider subtrees in memory.events
memory.stat and other files already consider subtrees in their output, and
we should too in order to not present an inconsistent interface.
The current situation is fairly confusing, because people interacting with
cgroups expect hierarchical behaviour in the vein of memory.stat,
cgroup.events, and other files. For example, this causes confusion when
debugging reclaim events under low, as currently these always read "0" at
non-leaf memcg nodes, which frequently causes people to misdiagnose breach
behaviour. The same confusion applies to other counters in this file when
debugging issues.
Aggregation is done at write time instead of at read-time since these
counters aren't hot (unlike memory.stat which is per-page, so it does it
at read time), and it makes sense to bundle this with the file
notifications.
After this patch, events are propagated up the hierarchy:
[root@ktst ~]# cat /sys/fs/cgroup/system.slice/memory.events
low 0
high 0
max 0
oom 0
oom_kill 0
[root@ktst ~]# systemd-run -p MemoryMax=1 true
Running as unit: run-r251162a189fb4562b9dabfdc9b0422f5.service
[root@ktst ~]# cat /sys/fs/cgroup/system.slice/memory.events
low 0
high 0
max 7
oom 1
oom_kill 1
As this is a change in behaviour, this can be reverted to the old
behaviour by mounting with the `memory_localevents' flag set. However, we
use the new behaviour by default as there's a lack of evidence that there
are any current users of memory.events that would find this change
undesirable.
akpm: this is a behaviour change, so Cc:stable. THis is so that
forthcoming distros which use cgroup v2 are more likely to pick up the
revised behaviour.
Link: http://lkml.kernel.org/r/20190208224419.GA24772@chrisdown.name
Signed-off-by: Chris Down <chris@chrisdown.name>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Koutný [Sat, 1 Jun 2019 05:30:19 +0000 (22:30 -0700)]
prctl_set_mm: downgrade mmap_sem to read lock
The commit
a3b609ef9f8b ("proc read mm's {arg,env}_{start,end} with mmap
semaphore taken.") added synchronization of reading argument/environment
boundaries under mmap_sem. Later commit
88aa7cc688d4 ("mm: introduce
arg_lock to protect arg_start|end and env_start|end in mm_struct") avoided
the coarse use of mmap_sem in similar situations. But there still
remained two places that (mis)use mmap_sem.
get_cmdline should also use arg_lock instead of mmap_sem when it reads the
boundaries.
The second place that should use arg_lock is in prctl_set_mm. By
protecting the boundaries fields with the arg_lock, we can downgrade
mmap_sem to reader lock (analogous to what we already do in
prctl_set_mm_map).
[akpm@linux-foundation.org: coding style fixes]
Link: http://lkml.kernel.org/r/20190502125203.24014-3-mkoutny@suse.com
Fixes: 88aa7cc688d4 ("mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct")
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Co-developed-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Mateusz Guzik <mguzik@redhat.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>