Frederic Weisbecker [Fri, 23 Mar 2012 18:06:50 +0000 (19:06 +0100)]
perf tools: Fix display of first level of callchains
The callchain stdio mode display was written using a sorted by symbol
report. In this mode we have only one callchain root per hist so we
forgot to handle cases where we have multiple callchain root, as in per
dso sorting for example.
Fix this by handling these roots like any other branch, with the hist as
the parent.
Before:
1.97% libpthread-2.12.1.so
|
--- __libc_write
create_worker
bench_sched_messaging
cmd_bench
run_builtin
main
__libc_start_main
|
--- __libc_read
create_worker
bench_sched_messaging
cmd_bench
run_builtin
main
__libc_start_main
After:
1.97% libpthread-2.12.1.so
|
|--36.97%-- __libc_write
| create_worker
| bench_sched_messaging
| cmd_bench
| run_builtin
| main
| __libc_start_main
|
|--31.47%-- __libc_read
| create_worker
| bench_sched_messaging
| cmd_bench
| run_builtin
| main
| __libc_start_main
...
Single roots keep their entry without percentage because they have
the same overhead than the hist they refer to. ie: 100% in fractal
mode and the percentage of the hist in graph mode:
0.00% [k] reschedule_interrupt
|
--- default_idle
amd_e400_idle
cpu_idle
start_secondary
Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1332526010-15400-1-git-send-email-fweisbec@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Fri, 23 Mar 2012 14:41:20 +0000 (15:41 +0100)]
perf: Move mmap page data_head offset assertion out of header
Having the build time assertion in header is making the perf
build fail on x86 with:
../../include/linux/perf_event.h:411:32: error: variably modified \
‘__assert_mmap_data_head_offset’ at file scope [-Werror]
I'm moving the build time validation out of the header, because
I think it's better than to lessen the perf build warn/error
check.
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: acme@redhat.com
Cc: a.p.zijlstra@chello.nl
Cc: paulus@samba.org
Cc: cjashfor@linux.vnet.ibm.com
Cc: fweisbec@gmail.com
Link: http://lkml.kernel.org/r/1332513680-7870-1-git-send-email-jolsa@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Sat, 24 Mar 2012 07:19:09 +0000 (08:19 +0100)]
Merge branch 'tip/perf/urgent' of git://git./linux/kernel/git/rostedt/linux-trace into perf/urgent
Peter Zijlstra [Thu, 22 Mar 2012 16:26:36 +0000 (17:26 +0100)]
perf: Fix mmap_page capabilities and docs
Complete the syscall-less self-profiling feature and address
all complaints, namely:
- capabilities, so we can detect what is actually available at runtime
Add a capabilities field to perf_event_mmap_page to indicate
what is actually available for use.
- on x86: RDPMC weirdness due to being 40/48 bits and not sign-extending
properly.
- ABI documentation as to how all this stuff works.
Also improve the documentation for the new features.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vweaver1@eecs.utk.edu>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1332433596.2487.33.camel@twins
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Fri, 23 Mar 2012 08:16:03 +0000 (09:16 +0100)]
Merge tag 'perf-core-for-mingo' of git://git./linux/kernel/git/acme/linux into perf/urgent
Cleanups and fixes for perf/core:
. Short term fix for 'diff' tool breakage related to perf.data files
with multiple events. From Jiri Olsa
. Cleanup for event id tracepoint reading routine, from Borislav Petkov
. 32-bit compilation fixes from Jiri Olsa
. Event parsing modifier assignment fixes from Jiri Olsa
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Jiri Olsa [Thu, 22 Mar 2012 13:37:26 +0000 (14:37 +0100)]
perf diff: Fix to work with new hists design
The perf diff command is broken since:
perf hists: Threaded addition and sorting of entries
commit
1980c2ebd7020d82c024b8c4046849b38e78e7da
Several places were broken:
- hists data need to be collected into opened sessions instead
of into events
- session's hists data need to be initialized properly when the
session is created
- hist_entry__pcnt_snprintf: the percentage and displacement
buffer preparation must not use 'ret' because it's used
as a pointer to the final buffer
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120322133726.GB1601@m.brq.redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Tue, 20 Mar 2012 18:15:40 +0000 (19:15 +0100)]
perf tools: Fix modifier to be applied on correct events
The event modifier needs to be applied only on the event definition it
is attached to.
The current state is that in case of multiple events definition (in
single '-e' option, separated by ',') all will get modifier of the last
one.
Fixing this by adding separated list for each event definition, so the
modifier is applied only to proper event(s). Added automated test to
catch this, plus some other modifier tests.
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1332267341-26338-3-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Tue, 20 Mar 2012 18:15:39 +0000 (19:15 +0100)]
perf tools: Fix various casting issues for 32 bits
- util/parse-events.c(parse_events_add_breakpoint)
need to use unsigned long instead of u64, otherwise
we get following gcc error on 32 bits:
error: cast from pointer to integer of different size
- util/header.c(print_event_desc)
cannot retype to signed type, otherwise we get following
gcc error on 32 bits:
error: comparison between signed and unsigned integer expressions
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1332267341-26338-2-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Borislav Petkov [Wed, 21 Mar 2012 14:15:47 +0000 (15:15 +0100)]
perf tools: Simplify event_read_id exit path
We're freeing the token in any case so simplify the exit path by
unifying it.
No functional change.
Signed-off-by: Borislav Petkov <bp@amd64.org>
Link: http://lkml.kernel.org/r/1332339347-21342-1-git-send-email-bp@amd64.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Thu, 22 Mar 2012 18:09:08 +0000 (15:09 -0300)]
Merge branch 'perf/urgent' into perf/core
Merge Reason: to pick the fix:
commit
e7f01d1
perf tools: Use scnprintf where applicable
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Wolfgang Mauerer [Thu, 22 Mar 2012 10:18:20 +0000 (11:18 +0100)]
tracing: Fix ftrace stack trace entries
8 hex characters tell only half the tale for 64 bit CPUs,
so use the appropriate length.
Link: http://lkml.kernel.org/r/1332411501-8059-2-git-send-email-wolfgang.mauerer@siemens.com
Cc: stable@vger.kernel.org
Signed-off-by: Wolfgang Mauerer <wolfgang.mauerer@siemens.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Ingo Molnar [Thu, 22 Mar 2012 08:17:57 +0000 (09:17 +0100)]
Merge branch 'tip/perf/core' of git://git./linux/kernel/git/rostedt/linux-trace into perf/urgent
Dan Carpenter [Tue, 20 Mar 2012 16:58:06 +0000 (16:58 +0000)]
AFS: checking wrong bit in afs_readpages()
We should be testing "if (vnode->flags & (1 << 4))" instead of
"if (vnode->flags & 4) {". The current test checks if the data was
modified instead of deleted.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 20 Mar 2012 17:32:09 +0000 (10:32 -0700)]
Merge branch 'timers-core-for-linus' of git://git./linux/kernel/git/tip/tip
Pull timer changes for v3.4 from Ingo Molnar
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
ntp: Fix integer overflow when setting time
math: Introduce div64_long
cs5535-clockevt: Allow the MFGPT IRQ to be shared
cs5535-clockevt: Don't ignore MFGPT on SMP-capable kernels
x86/time: Eliminate unused irq0_irqs counter
clocksource: scx200_hrt: Fix the build
x86/tsc: Reduce the TSC sync check time for core-siblings
timer: Fix bad idle check on irq entry
nohz: Remove ts->Einidle checks before restarting the tick
nohz: Remove update_ts_time_stat from tick_nohz_start_idle
clockevents: Leave the broadcast device in shutdown mode when not needed
clocksource: Load the ACPI PM clocksource asynchronously
clocksource: scx200_hrt: Convert scx200 to use clocksource_register_hz
clocksource: Get rid of clocksource_calc_mult_shift()
clocksource: dbx500: convert to clocksource_register_hz()
clocksource: scx200_hrt: use pr_<level> instead of printk
time: Move common updates to a function
time: Reorder so the hot data is together
time: Remove most of xtime_lock usage in timekeeping.c
ntp: Add ntp_lock to replace xtime_locking
...
Linus Torvalds [Tue, 20 Mar 2012 17:31:44 +0000 (10:31 -0700)]
Merge branch 'sched-core-for-linus' of git://git./linux/kernel/git/tip/tip
Pull scheduler changes for v3.4 from Ingo Molnar
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
printk: Make it compile with !CONFIG_PRINTK
sched/x86: Fix overflow in cyc2ns_offset
sched: Fix nohz load accounting -- again!
sched: Update yield() docs
printk/sched: Introduce special printk_sched() for those awkward moments
sched/nohz: Correctly initialize 'next_balance' in 'nohz' idle balancer
sched: Cleanup cpu_active madness
sched: Fix load-balance wreckage
sched: Clean up parameter passing of proc_sched_autogroup_set_nice()
sched: Ditch per cgroup task lists for load-balancing
sched: Rename load-balancing fields
sched: Move load-balancing arguments into helper struct
sched/rt: Do not submit new work when PI-blocked
sched/rt: Prevent idle task boosting
sched/wait: Add __wake_up_all_locked() API
sched/rt: Document scheduler related skip-resched-check sites
sched/rt: Use schedule_preempt_disabled()
sched/rt: Add schedule_preempt_disabled()
sched/rt: Do not throttle when PI boosting
sched/rt: Keep period timer ticking when rt throttling is active
...
Linus Torvalds [Tue, 20 Mar 2012 17:29:15 +0000 (10:29 -0700)]
Merge branch 'perf-core-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf events changes for v3.4 from Ingo Molnar:
- New "hardware based branch profiling" feature both on the kernel and
the tooling side, on CPUs that support it. (modern x86 Intel CPUs
with the 'LBR' hardware feature currently.)
This new feature is basically a sophisticated 'magnifying glass' for
branch execution - something that is pretty difficult to extract from
regular, function histogram centric profiles.
The simplest mode is activated via 'perf record -b', and the result
looks like this in perf report:
$ perf record -b any_call,u -e cycles:u branchy
$ perf report -b --sort=symbol
52.34% [.] main [.] f1
24.04% [.] f1 [.] f3
23.60% [.] f1 [.] f2
0.01% [k] _IO_new_file_xsputn [k] _IO_file_overflow
0.01% [k] _IO_vfprintf_internal [k] _IO_new_file_xsputn
0.01% [k] _IO_vfprintf_internal [k] strchrnul
0.01% [k] __printf [k] _IO_vfprintf_internal
0.01% [k] main [k] __printf
This output shows from/to branch columns and shows the highest
percentage (from,to) jump combinations - i.e. the most likely taken
branches in the system. "branches" can also include function calls
and any other synchronous and asynchronous transitions of the
instruction pointer that are not 'next instruction' - such as system
calls, traps, interrupts, etc.
This feature comes with (hopefully intuitive) flat ascii and TUI
support in perf report.
- Various 'perf annotate' visual improvements for us assembly junkies.
It will now recognize function calls in the TUI and by hitting enter
you can follow the call (recursively) and back, amongst other
improvements.
- Multiple threads/processes recording support in perf record, perf
stat, perf top - which is activated via a comma-list of PIDs:
perf top -p 21483,21485
perf stat -p 21483,21485 -ddd
perf record -p 21483,21485
- Support for per UID views, via the --uid paramter to perf top, perf
report, etc. For example 'perf top --uid mingo' will only show the
tasks that I am running, excluding other users, root, etc.
- Jump label restructurings and improvements - this includes the
factoring out of the (hopefully much clearer) include/linux/static_key.h
generic facility:
struct static_key key = STATIC_KEY_INIT_FALSE;
...
if (static_key_false(&key))
do unlikely code
else
do likely code
...
static_key_slow_inc();
...
static_key_slow_inc();
...
The static_key_false() branch will be generated into the code with as
little impact to the likely code path as possible. the
static_key_slow_*() APIs flip the branch via live kernel code patching.
This facility can now be used more widely within the kernel to
micro-optimize hot branches whose likelihood matches the static-key
usage and fast/slow cost patterns.
- SW function tracer improvements: perf support and filtering support.
- Various hardenings of the perf.data ABI, to make older perf.data's
smoother on newer tool versions, to make new features integrate more
smoothly, to support cross-endian recording/analyzing workflows
better, etc.
- Restructuring of the kprobes code, the splitting out of 'optprobes',
and a corner case bugfix.
- Allow the tracing of kernel console output (printk).
- Improvements/fixes to user-space RDPMC support, allowing user-space
self-profiling code to extract PMU counts without performing any
system calls, while playing nice with the kernel side.
- 'perf bench' improvements
- ... and lots of internal restructurings, cleanups and fixes that made
these features possible. And, as usual this list is incomplete as
there were also lots of other improvements
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (120 commits)
perf report: Fix annotate double quit issue in branch view mode
perf report: Remove duplicate annotate choice in branch view mode
perf/x86: Prettify pmu config literals
perf report: Enable TUI in branch view mode
perf report: Auto-detect branch stack sampling mode
perf record: Add HEADER_BRANCH_STACK tag
perf record: Provide default branch stack sampling mode option
perf tools: Make perf able to read files from older ABIs
perf tools: Fix ABI compatibility bug in print_event_desc()
perf tools: Enable reading of perf.data files from different ABI rev
perf: Add ABI reference sizes
perf report: Add support for taken branch sampling
perf record: Add support for sampling taken branch
perf tools: Add code to support PERF_SAMPLE_BRANCH_STACK
x86/kprobes: Split out optprobe related code to kprobes-opt.c
x86/kprobes: Fix a bug which can modify kernel code permanently
x86/kprobes: Fix instruction recovery on optimized path
perf: Add callback to flush branch_stack on context switch
perf: Disable PERF_SAMPLE_BRANCH_* when not supported
perf/x86: Add LBR software filter support for Intel CPUs
...
Linus Torvalds [Tue, 20 Mar 2012 17:28:56 +0000 (10:28 -0700)]
Merge branch 'irq-core-for-linus' of git://git./linux/kernel/git/tip/tip
Pull irq/core changes for v3.4 from Ingo Molnar
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Remove paranoid warnons and bogus fixups
genirq: Flush the irq thread on synchronization
genirq: Get rid of unnecessary IRQTF_DIED flag
genirq: No need to check IRQTF_DIED before stopping a thread handler
genirq: Get rid of unnecessary irqaction field in task_struct
genirq: Fix incorrect check for forced IRQ thread handler
softirq: Reduce invoke_softirq() code duplication
genirq: Fix long-term regression in genirq irq_set_irq_type() handling
x86-32/irq: Don't switch to irq stack for a user-mode irq
Linus Torvalds [Tue, 20 Mar 2012 00:12:34 +0000 (17:12 -0700)]
Merge branch 'core-rcu-for-linus' of git://git./linux/kernel/git/tip/tip
Pull RCU changes for v3.4 from Ingo Molnar. The major features of this
series are:
- making RCU more aggressive about entering dyntick-idle mode in order
to improve energy efficiency
- converting a few more call_rcu()s to kfree_rcu()s
- applying a number of rcutree fixes and cleanups to rcutiny
- removing CONFIG_SMP #ifdefs from treercu
- allowing RCU CPU stall times to be set via sysfs
- adding CPU-stall capability to rcutorture
- adding more RCU-abuse diagnostics
- updating documentation
- fixing yet more issues located by the still-ongoing top-to-bottom
inspection of RCU, this time with a special focus on the CPU-hotplug
code path.
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (48 commits)
rcu: Stop spurious warnings from synchronize_sched_expedited
rcu: Hold off RCU_FAST_NO_HZ after timer posted
rcu: Eliminate softirq-mediated RCU_FAST_NO_HZ idle-entry loop
rcu: Add RCU_NONIDLE() for idle-loop RCU read-side critical sections
rcu: Allow nesting of rcu_idle_enter() and rcu_idle_exit()
rcu: Remove redundant check for rcu_head misalignment
PTR_ERR should be called before its argument is cleared.
rcu: Convert WARN_ON_ONCE() in rcu_lock_acquire() to lockdep
rcu: Trace only after NULL-pointer check
rcu: Call out dangers of expedited RCU primitives
rcu: Rework detection of use of RCU by offline CPUs
lockdep: Add CPU-idle/offline warning to lockdep-RCU splat
rcu: No interrupt disabling for rcu_prepare_for_idle()
rcu: Move synchronize_sched_expedited() to rcutree.c
rcu: Check for illegal use of RCU from offlined CPUs
rcu: Update stall-warning documentation
rcu: Add CPU-stall capability to rcutorture
rcu: Make documentation give more realistic rcutorture duration
rcutorture: Permit holding off CPU-hotplug operations during boot
rcu: Print scheduling-clock information on RCU CPU stall-warning messages
...
Steven Rostedt [Tue, 20 Mar 2012 16:28:29 +0000 (12:28 -0400)]
tracing: Move the tracing_on/off() declarations into CONFIG_TRACING
The tracing_on/off() declarations were under CONFIG_RING_BUFFER, but
the functions are now only defined under CONFIG_TRACING as they are
specific to ftrace and not the ring buffer.
But the declarations were still defined under the ring buffer and
this caused the build to fail when CONFIG_RING_BUFFER was set but
CONFIG_TRACING was not.
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Linus Torvalds [Tue, 20 Mar 2012 00:11:15 +0000 (17:11 -0700)]
Merge branch 'core-locking-for-linus' of git://git./linux/kernel/git/tip/tip
Pull core/locking changes for v3.4 from Ingo Molnar
* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
futex: Simplify return logic
futex: Cover all PI opcodes with cmpxchg enabled check
Linus Torvalds [Tue, 20 Mar 2012 00:10:38 +0000 (17:10 -0700)]
Merge branch 'core-iommu-for-linus' of git://git./linux/kernel/git/tip/tip
Pull core/iommu changes for v3.4 from Ingo Molnar
* 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/iommu/intel: Increase the number of iommus supported to MAX_IO_APICS
x86/iommu/intel: Fix identity mapping for sandy bridge
Linus Torvalds [Mon, 19 Mar 2012 23:37:28 +0000 (16:37 -0700)]
Merge branch 'dcache-word-accesses'
* branch 'dcache-word-accesses':
vfs: use 'unsigned long' accesses for dcache name comparison and hashing
This does the name hashing and lookup using word-sized accesses when
that is efficient, namely on x86 (although any little-endian machine
with good unaligned accesses would do).
It does very much depend on little-endian logic, but it's a very hot
couple of functions under some real loads, and this patch improves the
performance of __d_lookup_rcu() and link_path_walk() by up to about 30%.
Giving a 10% improvement on some very pathname-heavy benchmarks.
Because we do make unaligned accesses past the filename, the
optimization is disabled when CONFIG_DEBUG_PAGEALLOC is active, and we
effectively depend on the fact that on x86 we don't really ever have the
last page of usable RAM followed immediately by any IO memory (due to
ACPI tables, BIOS buffer areas etc).
Some of the bit operations we do are a bit "subtle". It's commented,
but you do need to really think about the code. Or just consider it
black magic.
Thanks to people on G+ for some of the optimized bit tricks.
Linus Torvalds [Mon, 19 Mar 2012 23:19:53 +0000 (16:19 -0700)]
vfs: get rid of batshit-insane pointless dentry hash calculations
For some odd historical reason, the final mixing round for the dentry
cache hash table lookup had an insane "xor with big constant" logic. In
two places.
The big constant that is being xor'ed is GOLDEN_RATIO_PRIME, which is a
fairly random-looking number that is designed to be *multiplied* with so
that the bits get spread out over a whole long-word.
But xor'ing with it is insane. It doesn't really even change the hash -
it really only shifts the hash around in the hash table. To make
matters worse, the insane big constant is different on 32-bit and 64-bit
builds, even though the name hash bits we use are always 32-bit (and the
bits from the pointer we mix in effectively are too).
It's all total voodoo programming, in other words.
Now, some testing and analysis of the hash chains shows that the rest of
the hash function seems to be fairly good. It does pick the right bits
of the parent dentry pointer, for example, and while it's generally a
bad idea to use an xor to mix down the upper bits (because if there is a
repeating pattern, the xor can cause "destructive interference"), it
seems to not have been a disaster.
For example, replacing the hash with the normal "hash_long()" code (that
uses the GOLDEN_RATIO_PRIME constant correctly, btw) actually just makes
the hash worse. The hand-picked hash knew which bits of the pointer had
the highest entropy, and hash_long() ends up mixing bits less optimally
at least in some trivial tests.
So the hash function overall seems fine, it just has that really odd
"shift result around by a constant xor".
So get rid of the silly xor, and replace the down-mixing of the bits
with an add instead of an xor that tends to not have the same kind of
destructive interference issues. Some stats on the resulting hash
chains shows that they look statistically identical before and after,
but the code is simpler and no longer makes you go "WTF?".
Also, the incoming hash really is just "unsigned int", not a long, and
there's no real point to worry about the high 26 bits of the dentry
pointer for the 64-bit case, because they are all going to be identical
anyway.
So also change the hashing to be done in the more natural 'unsigned int'
that is the real size of the actual hashed data anyway.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ingo Molnar [Mon, 19 Mar 2012 19:14:46 +0000 (20:14 +0100)]
Merge tag 'perf-core-for-mingo' of git://git./linux/kernel/git/acme/linux into perf/core
Fixes for the last batch from Namhyung and the initial GTK report browser
from Pekka.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pekka Enberg [Mon, 19 Mar 2012 18:13:29 +0000 (15:13 -0300)]
perf report: Add a simple GTK2-based 'perf report' browser
This patch adds a simple GTK2-based browser to 'perf report' that's
based on the TTY-based browser in builtin-report.c.
To launch "perf report" using the new GTK interface just type:
$ perf report --gtk
The interface is somewhat limited in features at the moment:
- No callgraph support
- No KVM guest profiling support
- No color coding for percentages
- No sorting from the UI
- ..and many, many more!
That said, I think this patch a reasonable start to build future features on.
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Cc: Colin Walters <walters@verbum.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1202231952410.6689@tux.localdomain
[ committer note: Added #pragma to make gtk no strict prototype problem go
away as suggested by Colin Walters modulo avoiding push/pop ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Mon, 19 Mar 2012 02:53:48 +0000 (11:53 +0900)]
perf report: Document --symbol-filter option
Add missing description of --symbol-filter in Documentation/perf-report.txt.
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung.kim@lge.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1332125628-23088-1-git-send-email-namhyung.kim@lge.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Mon, 19 Mar 2012 02:46:20 +0000 (11:46 +0900)]
perf ui browser: Clean lines inside of the input window
As Arnaldo pointed out, it should be cleared to prevent the window from
displaying overlapped strings on the region.
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung.kim@lge.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1332125180-23041-1-git-send-email-namhyung.kim@lge.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Linus Torvalds [Sun, 18 Mar 2012 23:15:34 +0000 (16:15 -0700)]
Linux 3.3
Jason Baron [Fri, 16 Mar 2012 20:34:03 +0000 (16:34 -0400)]
Don't limit non-nested epoll paths
Commit
28d82dc1c4ed ("epoll: limit paths") that I did to limit the
number of possible wakeup paths in epoll is causing a few applications
to longer work (dovecot for one).
The original patch is really about limiting the amount of epoll nesting
(since epoll fds can be attached to other fds). Thus, we probably can
allow an unlimited number of paths of depth 1. My current patch limits
it at 1000. And enforce the limits on paths that have a greater depth.
This is captured in: https://bugzilla.redhat.com/show_bug.cgi?id=681578
Signed-off-by: Jason Baron <jbaron@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 18 Mar 2012 02:22:24 +0000 (19:22 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking changes from David Miller:
"1) icmp6_dst_alloc() returns NULL instead of ERR_PTR() leading to
crashes, particularly during shutdown. Reported by Dave Jones and
fixed by Eric Dumazet.
2) hyperv and wimax/i2400m return NETDEV_TX_BUSY when they have
already freed the SKB, which causes crashes as to the caller this
means requeue the packet. Fixes from Eric Dumazet.
3) usbnet driver doesn't allocate the right amount of headroom on
fresh RX SKBs, fix from Eric Dumazet.
4) Fix regression in ip6_mc_find_dev_rcu(), as an RCU lookup it
abolutely should not take a reference to 'dev', this leads to
leaks. Fix from RonQing Li.
5) Fix netfilter ctnetlink race between delete and timeout expiration.
From Pablo Neira Ayuso.
6) Revert SFQ change which causes regressions, specifically queueing
to tail can lead to unavoidable flow starvation. From Eric
Dumazet.
7) Fix a memory leak and a crash on corrupt firmware files in bnx2x,
from Michal Schmidt."
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
netfilter: ctnetlink: fix race between delete and timeout expiration
ipv6: Don't dev_hold(dev) in ip6_mc_find_dev_rcu.
wimax/i2400m: fix erroneous NETDEV_TX_BUSY use
net/hyperv: fix erroneous NETDEV_TX_BUSY use
net/usbnet: reserve headroom on rx skbs
bnx2x: fix memory leak in bnx2x_init_firmware()
bnx2x: fix a crash on corrupt firmware file
sch_sfq: revert dont put new flow at the end of flows
ipv6: fix icmp6_dst_alloc()
Linus Torvalds [Sat, 17 Mar 2012 16:54:16 +0000 (09:54 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar.
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf tools, x86: Build perf on older user-space as well
perf tools: Use scnprintf where applicable
perf tools: Incorrect use of snprintf results in SEGV
Pablo Neira Ayuso [Fri, 16 Mar 2012 02:00:34 +0000 (02:00 +0000)]
netfilter: ctnetlink: fix race between delete and timeout expiration
Kerin Millar reported hardlockups while running `conntrackd -c'
in a busy firewall. That system (with several processors) was
acting as backup in a primary-backup setup.
After several tries, I found a race condition between the deletion
operation of ctnetlink and timeout expiration. This patch fixes
this problem.
Tested-by: Kerin Millar <kerframil@gmail.com>
Reported-by: Kerin Millar <kerframil@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
RongQing.Li [Thu, 15 Mar 2012 22:54:14 +0000 (22:54 +0000)]
ipv6: Don't dev_hold(dev) in ip6_mc_find_dev_rcu.
ip6_mc_find_dev_rcu() is called with rcu_read_lock(), so don't
need to dev_hold().
With dev_hold(), not corresponding dev_put(), will lead to leak.
[ bug introduced in
96b52e61be1 (ipv6: mcast: RCU conversions) ]
Signed-off-by: RongQing.Li <roy.qing.li@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sat, 17 Mar 2012 00:14:55 +0000 (17:14 -0700)]
Merge branch 'akpm' (more patches from Andrew)
Merge some more email patches from Andrew Morton:
"A couple of nilfs fixes"
* emailed from Andrew Morton <akpm@linux-foundation.org>:
nilfs2: fix NULL pointer dereference in nilfs_load_super_block()
nilfs2: clamp ns_r_segments_percentage to [1, 99]
Ryusuke Konishi [Sat, 17 Mar 2012 00:08:39 +0000 (17:08 -0700)]
nilfs2: fix NULL pointer dereference in nilfs_load_super_block()
According to the report from Slicky Devil, nilfs caused kernel oops at
nilfs_load_super_block function during mount after he shrank the
partition without resizing the filesystem:
BUG: unable to handle kernel NULL pointer dereference at
00000048
IP: [<
d0d7a08e>] nilfs_load_super_block+0x17e/0x280 [nilfs2]
*pde =
00000000
Oops: 0000 [#1] PREEMPT SMP
...
Call Trace:
[<
d0d7a87b>] init_nilfs+0x4b/0x2e0 [nilfs2]
[<
d0d6f707>] nilfs_mount+0x447/0x5b0 [nilfs2]
[<
c0226636>] mount_fs+0x36/0x180
[<
c023d961>] vfs_kern_mount+0x51/0xa0
[<
c023ddae>] do_kern_mount+0x3e/0xe0
[<
c023f189>] do_mount+0x169/0x700
[<
c023fa9b>] sys_mount+0x6b/0xa0
[<
c04abd1f>] sysenter_do_call+0x12/0x28
Code: 53 18 8b 43 20 89 4b 18 8b 4b 24 89 53 1c 89 43 24 89 4b 20 8b 43
20 c7 43 2c 00 00 00 00 23 75 e8 8b 50 68 89 53 28 8b 54 b3 20 <8b> 72
48 8b 7a 4c 8b 55 08 89 b3 84 00 00 00 89 bb 88 00 00 00
EIP: [<
d0d7a08e>] nilfs_load_super_block+0x17e/0x280 [nilfs2] SS:ESP 0068:
ca9bbdcc
CR2:
0000000000000048
This turned out due to a defect in an error path which runs if the
calculated location of the secondary super block was invalid.
This patch fixes it and eliminates the reported oops.
Reported-by: Slicky Devil <slicky.dvl@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Slicky Devil <slicky.dvl@gmail.com>
Cc: <stable@vger.kernel.org> [2.6.30+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Haogang Chen [Sat, 17 Mar 2012 00:08:38 +0000 (17:08 -0700)]
nilfs2: clamp ns_r_segments_percentage to [1, 99]
ns_r_segments_percentage is read from the disk. Bogus or malicious
value could cause integer overflow and malfunction due to meaningless
disk usage calculation. This patch reports error when mounting such
bogus volumes.
Signed-off-by: Haogang Chen <haogangchen@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 17 Mar 2012 00:04:02 +0000 (17:04 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/linux-security
Pull maintainer update from James Morris:
"Please pull this patch which adds Serge as maintainer of the
capabilities code, as discussed on lwn and the lsm list.
New capabilities must be signed off by the maintainer, and new uses of
any capabilities should at be cc'd to the maintainer."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
MAINTAINERS: Add Serge as maintainer of capabilities
Linus Torvalds [Sat, 17 Mar 2012 00:03:15 +0000 (17:03 -0700)]
Merge tag 'for-linus' of git://linux-c6x.org/git/projects/linux-c6x-upstreaming
Pull c6x bugfix from Mark Salter:
"Remove dead code from entry.S which causes a build failure when using
a newer assembler (v2.22 complains about it, v2.20 ignores it)."
* tag 'for-linus' of git://linux-c6x.org/git/projects/linux-c6x-upstreaming:
C6X: remove dead code from entry.S
Anton Blanchard [Fri, 16 Mar 2012 10:28:19 +0000 (10:28 +0000)]
afs: Remote abort can cause BUG in rxrpc code
When writing files to afs I sometimes hit a BUG:
kernel BUG at fs/afs/rxrpc.c:179!
With a backtrace of:
afs_free_call
afs_make_call
afs_fs_store_data
afs_vnode_store_data
afs_write_back_from_locked_page
afs_writepages_region
afs_writepages
The cause is:
ASSERT(skb_queue_empty(&call->rx_queue));
Looking at a tcpdump of the session the abort happens because we
are exceeding our disk quota:
rx abort fs reply store-data error diskquota exceeded (32)
So the abort error is valid. We hit the BUG because we haven't
freed all the resources for the call.
By freeing any skbs in call->rx_queue before calling afs_free_call
we avoid hitting leaking memory and avoid hitting the BUG.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Anton Blanchard [Fri, 16 Mar 2012 10:28:07 +0000 (10:28 +0000)]
afs: Read of file returns EBADMSG
A read of a large file on an afs mount failed:
# cat junk.file > /dev/null
cat: junk.file: Bad message
Looking at the trace, call->offset wrapped since it is only an
unsigned short. In afs_extract_data:
_enter("{%u},{%zu},%d,,%zu", call->offset, len, last, count);
...
if (call->offset < count) {
if (last) {
_leave(" = -EBADMSG [%d < %zu]", call->offset, count);
return -EBADMSG;
}
Which matches the trace:
[cat ] ==> afs_extract_data({65132},{524},1,,65536)
[cat ] <== afs_extract_data() = -EBADMSG [0 < 65536]
call->offset went from 65132 to 0. Fix this by making call->offset an
unsigned int.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Namhyung Kim [Fri, 16 Mar 2012 08:50:55 +0000 (17:50 +0900)]
perf report: Treat an argument as a symbol filter
As Ingo requested, it'd be better off treating first (and the only)
argument as a symbol filter, so that user doesn't need to input the
symbol on the dialog window on TUI.
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1331887855-874-5-git-send-email-namhyung.kim@lge.com
Signed-off-by: Namhyung Kim <namhyung.kim@lge.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Fri, 16 Mar 2012 08:50:54 +0000 (17:50 +0900)]
perf report: Add --symbol-filter option
Add new --symbol-filter command line option to set appropriate filter
string.
Its short version is missing as I couldn't find an ideal one and
--filter option of perf record also has no short version.
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1331887855-874-4-git-send-email-namhyung.kim@lge.com
Signed-off-by: Namhyung Kim <namhyung.kim@lge.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Fri, 16 Mar 2012 08:50:53 +0000 (17:50 +0900)]
perf ui browser: Add 's' key to filter by symbol name
Now user can enter symbol name interested via ui_browser__input_window,
and perf can process it using hists__filter_by_symbol(). Giving empty
symbol (by pressing 's' followed by ENTER) will disable the filtering.
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1331887855-874-3-git-send-email-namhyung.kim@lge.com
Signed-off-by: Namhyung Kim <namhyung.kim@lge.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Fri, 16 Mar 2012 08:50:52 +0000 (17:50 +0900)]
perf ui browser: Introduce ui_browser__input_window
The ui_browser__input_window() function is to get user's key input.
Current implementation can handle maximum 49 characters.
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1331887855-874-2-git-send-email-namhyung.kim@lge.com
Signed-off-by: Namhyung Kim <namhyung.kim@lge.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Fri, 16 Mar 2012 08:50:51 +0000 (17:50 +0900)]
perf hists: Add hists__filter_by_symbol
This function will be used for simple (sub-)string matching filter based
on user input.
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1331887855-874-1-git-send-email-namhyung.kim@lge.com
Signed-off-by: Namhyung Kim <namhyung.kim@lge.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Fri, 16 Mar 2012 08:42:20 +0000 (17:42 +0900)]
perf tools: Do not disable members of group event
When event group is enabled for forked task (i.e. no target task/cpu
was specified) all events were disabled and marked ->enable_on_exec.
However they wouldn't be counted at all since only group leader will
be enabled on exec actually.
In contrast to perf stat, perf record doesn't have a real problem
as it enables all the event before proceeding. But it needs to be
fixed anyway IMHO.
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1331887340-32448-2-git-send-email-namhyung.kim@lge.com
Signed-off-by: Namhyung Kim <namhyung.kim@lge.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Fri, 16 Mar 2012 08:42:19 +0000 (17:42 +0900)]
perf stat: Fix event grouping on forked task
When event group is enabled for forked task (i.e. no target task was
specified) all events were disabled and marked ->enable_on_exec.
However they are not counted at all since only group leader will be
enabled on exec actually. So the result looked like below:
$ ./perf stat --group -- sleep 1
Performance counter stats for 'sleep 1':
0.554926 task-clock # 0.001 CPUs utilized
<not counted> context-switches
<not counted> CPU-migrations
<not counted> page-faults
<not counted> cycles
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
<not counted> instructions
<not counted> branches
<not counted> branch-misses
1.
001228093 seconds time elapsed
Fix it by disabling group leader only.
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1331887340-32448-1-git-send-email-namhyung.kim@lge.com
Signed-off-by: Namhyung Kim <namhyung.kim@lge.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Thu, 15 Mar 2012 19:09:18 +0000 (20:09 +0100)]
perf tools: Add support to specify pmu style event
Added new event rule to the event definition grammar:
event_def: event_pmu |
...
event_pmu: PE_NAME '/' event_config '/'
Using this rule, event could be now specified like:
cpu/config=1,config1=2,config2=3/u
where pmu name 'cpu' is looked up via following path:
${sysfs_mount}/bus/event_source/devices/${pmu}
and config options are bound to the pmu's format definiton:
${sysfs_mount}/bus/event_source/devices/${pmu}/format
The hardcoded config options still stays and have precedence
over any format field defined with same name.
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/n/tip-50d8nr94f8k4wkezutrxvthe@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Thu, 15 Mar 2012 19:09:17 +0000 (20:09 +0100)]
perf tools: Add perf pmu object to access pmu format definition
Adding pmu object which provides interface to pmu's sysfs
event format definition located at:
${sysfs_mount}/bus/event_source/devices/${pmu}/format
Following interface is exported:
struct perf_pmu* perf_pmu__find(char *name);
- this function returns pmu object, which is then
passed as a handle to other interface functions
int perf_pmu__config(struct perf_pmu *pmu, struct perf_event_attr *attr,
struct list_head *head_terms);
- this function configures perf_event_attr struct based
on pmu's format definitions and config terms data,
containined in head_terms list.
Parser generator is used to retrive the pmu's format definition.
The generated parser is part of the patch. Added makefile rule
'pmu-parser' to generate the parser code out of the bison/flex
sources.
Added builtin test 'Test perf pmu format parsing', which could
be run like:
perf test pmu
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/n/tip-errz96u1668gj9wlop1zhpht@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Thu, 15 Mar 2012 19:09:16 +0000 (20:09 +0100)]
perf tools: Add config options support for event parsing
Adding a new rule to the event grammar to be able to specify
values of additional attributes of symbolic event.
The new syntax for event symbolic definition is:
event_legacy_symbol: PE_NAME_SYM '/' event_config '/' |
PE_NAME_SYM sep_slash_dc
event_config: event_config ',' event_term | event_term
event_term: PE_NAME '=' PE_NAME |
PE_NAME '=' PE_VALUE
PE_NAME
sep_slash_dc: '/' | ':' |
At the moment the config options are hardcoded to be used for legacy
symbol events to define several perf_event_attr fields. It is:
'config' to define perf_event_attr::config
'config1' to define perf_event_attr::config1
'config2' to define perf_event_attr::config2
'period' to define perf_event_attr::sample_period
Legacy events could be now specified as:
cycles/period=100000/
If term is specified without the value assignment, then 1 is
assigned by default.
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/n/tip-mgkavww9790jbt2jdkooyv4q@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Thu, 15 Mar 2012 19:09:15 +0000 (20:09 +0100)]
perf tools: Add parser generator for events parsing
Changing event parsing to use flex/bison parse generator.
The event syntax stays as it was.
grammar description:
events: events ',' event | event
event: event_def PE_MODIFIER_EVENT | event_def
event_def: event_legacy_symbol sep_dc |
event_legacy_cache sep_dc |
event_legacy_breakpoint sep_dc |
event_legacy_tracepoint sep_dc |
event_legacy_numeric sep_dc |
event_legacy_raw sep_dc
event_legacy_symbol: PE_NAME_SYM
event_legacy_cache: PE_NAME_CACHE_TYPE '-' PE_NAME_CACHE_OP_RESULT '-' PE_NAME_CACHE_OP_RESULT |
PE_NAME_CACHE_TYPE '-' PE_NAME_CACHE_OP_RESULT |
PE_NAME_CACHE_TYPE
event_legacy_raw: PE_SEP_RAW PE_VALUE
event_legacy_numeric: PE_VALUE ':' PE_VALUE
event_legacy_breakpoint: PE_SEP_BP ':' PE_VALUE ':' PE_MODIFIER_BP
event_breakpoint_type: PE_MODIFIER_BPTYPE | empty
PE_NAME_SYM: cpu-cycles|cycles |
stalled-cycles-frontend|idle-cycles-frontend |
stalled-cycles-backend|idle-cycles-backend |
instructions |
cache-references |
cache-misses |
branch-instructions|branches |
branch-misses |
bus-cycles |
cpu-clock |
task-clock |
page-faults|faults |
minor-faults |
major-faults |
context-switches|cs |
cpu-migrations|migrations |
alignment-faults |
emulation-faults
PE_NAME_CACHE_TYPE: L1-dcache|l1-d|l1d|L1-data |
L1-icache|l1-i|l1i|L1-instruction |
LLC|L2 |
dTLB|d-tlb|Data-TLB |
iTLB|i-tlb|Instruction-TLB |
branch|branches|bpu|btb|bpc |
node
PE_NAME_CACHE_OP_RESULT: load|loads|read |
store|stores|write |
prefetch|prefetches |
speculative-read|speculative-load |
refs|Reference|ops|access |
misses|miss
PE_MODIFIER_EVENT: [ukhp]{0,5}
PE_MODIFIER_BP: [rwx]
PE_SEP_BP: 'mem'
PE_SEP_RAW: 'r'
sep_dc: ':' |
Added flex/bison files for event grammar parsing. The generated
parser is part of the patch. Added makefile rule 'event-parser'
to generate the parser code out of the bison/flex sources.
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/n/tip-u4pfig5waq3ll2bfcdex8fgi@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Thu, 15 Mar 2012 19:09:14 +0000 (20:09 +0100)]
perf: Adding sysfs group format attribute for pmu device
Adding sysfs group 'format' attribute for pmu device that
contains a syntax description on how to construct raw events.
The event configuration is described in following
struct pefr_event_attr attributes:
config
config1
config2
Each sysfs attribute within the format attribute group,
describes mapping of name and bitfield definition within
one of above attributes.
eg:
"/sys/...<dev>/format/event" contains "config:0-7"
"/sys/...<dev>/format/umask" contains "config:8-15"
"/sys/...<dev>/format/usr" contains "config:16"
the attribute value syntax is:
line: config ':' bits
config: 'config' | 'config1' | 'config2"
bits: bits ',' bit_term | bit_term
bit_term: VALUE '-' VALUE | VALUE
Adding format attribute definitions for x86 cpu pmus.
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/n/tip-vhdk5y2hyype9j63prymty36@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Mark Salter [Fri, 16 Mar 2012 13:27:57 +0000 (09:27 -0400)]
C6X: remove dead code from entry.S
The ENDPROC() on sys_fadvise64_c6x() in arch/c6x/kernel/entry.S is
outside of the conditional block with the matching ENTRY() macro. This
leads a newer (v2.22 vs. v2.20) assembler to complain:
/tmp/ccGZBaPT.s: Assembler messages:
/tmp/ccGZBaPT.s: Error: .size expression for sys_fadvise64_c6x does not evaluate to a constant
The conditional block became dead code when c6x switched to generic
unistd.h and should be removed along with the offending ENDPROC().
Signed-off-by: Mark Salter <msalter@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Thomas Gleixner [Thu, 15 Mar 2012 21:55:21 +0000 (22:55 +0100)]
genirq: Remove paranoid warnons and bogus fixups
Alexander pointed out that the warnons in the regular exit path are
bogus and the thread_mask one actually could be triggered when
__setup_irq() hands out that thread_mask again after __free_irq()
dropped irq_desc->lock.
Thinking more about it, neither IRQTF_RUNTHREAD nor the bit in
thread_mask can be set as this is the regular exit path. We come here
due to:
__free_irq()
remove action from desc
synchronize_irq()
kthread_stop()
So synchronize_irq() makes sure that the thread finished running and
cleaned up both the thread_active count and thread_mask. After that
point nothing can set IRQTF_RUNTHREAD on this action. So the warnons
and the cleanups are pointless.
Reported-by: Alexander Gordeev <agordeev@redhat.com>
Cc: Ido Yariv <ido@wizery.com>
Link: http://lkml.kernel.org/r/20120315190755.GA6732@dhcp-26-207.brq.redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Eric Dumazet [Wed, 14 Mar 2012 09:21:44 +0000 (09:21 +0000)]
wimax/i2400m: fix erroneous NETDEV_TX_BUSY use
A driver start_xmit() method cannot free skb and return NETDEV_TX_BUSY,
since caller is going to reuse freed skb.
In fact netif_tx_stop_queue() / netif_stop_queue() is needed before
returning NETDEV_TX_BUSY or you can trigger a ksoftirqd fatal loop.
In case of memory allocation error, only safe way is to drop the packet
and return NETDEV_TX_OK
Also increments tx_dropped counter
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Inaky Perez-Gonzalez <inaky.perez-gonzalez@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 14 Mar 2012 08:53:34 +0000 (08:53 +0000)]
net/hyperv: fix erroneous NETDEV_TX_BUSY use
A driver start_xmit() method cannot free skb and return NETDEV_TX_BUSY,
since caller is going to reuse freed skb.
This is mostly a revert of commit
bf769375c (staging: hv: fix the return
status of netvsc_start_xmit())
In fact netif_tx_stop_queue() / netif_stop_queue() is needed before
returning NETDEV_TX_BUSY or you can trigger a ksoftirqd fatal loop.
In case of memory allocation error, only safe way is to drop the packet
and return NETDEV_TX_OK
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 14 Mar 2012 06:56:25 +0000 (06:56 +0000)]
net/usbnet: reserve headroom on rx skbs
network drivers should reserve some headroom on incoming skbs so that we
dont need expensive reallocations, eg forwarding packets in tunnels.
This NET_SKB_PAD padding is done in various helpers, like
__netdev_alloc_skb_ip_align() in this patch, combining NET_SKB_PAD and
NET_IP_ALIGN magic.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Oliver Neukum <oneukum@suse.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Schmidt [Thu, 15 Mar 2012 14:08:29 +0000 (14:08 +0000)]
bnx2x: fix memory leak in bnx2x_init_firmware()
When cycling the interface down and up, bnx2x_init_firmware() knows that
the firmware is already loaded, but nevertheless it allocates certain
arrays anew (init_data, init_ops, init_ops_offsets, iro_arr). The old
arrays are leaked.
Fix the leaks by returning early if the firmware was already loaded.
Because if the firmware is loaded, so are the arrays.
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Acked-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Schmidt [Thu, 15 Mar 2012 14:08:28 +0000 (14:08 +0000)]
bnx2x: fix a crash on corrupt firmware file
If the requested firmware is deemed corrupt and then released, reset the
pointer to NULL in order to avoid double-freeing it in
bnx2x_release_firmware() or dereferencing it in bnx2x_init_firmware().
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Acked-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 13 Mar 2012 18:04:25 +0000 (18:04 +0000)]
sch_sfq: revert dont put new flow at the end of flows
This reverts commit
d47a0ac7b6 (sch_sfq: dont put new flow at the end of
flows)
As Jesper found out, patch sounded great but has bad side effects.
In stress situation, pushing new flows in front of the queue can prevent
old flows doing any progress. Packets can stay in SFQ queue for
unlimited amount of time.
It's possible to add heuristics to limit this problem, but this would
add complexity outside of SFQ scope.
A more sensible answer to Dave Taht concerns (who reported the issued I
tried to solve in original commit) is probably to use a qdisc hierarchy
so that high prio packets dont enter a potentially crowded SFQ qdisc.
Reported-by: Jesper Dangaard Brouer <jdb@comx.dk>
Cc: Dave Taht <dave.taht@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 14 Mar 2012 21:13:11 +0000 (21:13 +0000)]
ipv6: fix icmp6_dst_alloc()
commit
87a115783 ( ipv6: Move xfrm_lookup() call down into
icmp6_dst_alloc().) forgot to convert one error path, leading
to crashes in mld_sendpack()
Many thanks to Dave Jones for providing a very complete bug report.
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
James Morris [Fri, 16 Mar 2012 01:05:48 +0000 (12:05 +1100)]
MAINTAINERS: Add Serge as maintainer of capabilities
Add Serge as maintainer of capabilities, per suggestion on LWN:
http://lwn.net/Articles/486306/
Signed-off-by: James Morris <james.l.morris@oracle.com>
Linus Torvalds [Fri, 16 Mar 2012 00:16:22 +0000 (17:16 -0700)]
Merge branch 'akpm' (Andrew's patch-bomb)
Merge patches from Andrew Morton:
"Nine patches - some bug fixes and some MAINTAINERS fiddling."
* emailed from Andrew Morton <akpm@linux-foundation.org>:
drivers/video/backlight/s6e63m0.c: fix corruption storing gamma mode
MAINTAINERS: add entry for exynos mipi display drivers
MAINTAINERS: fix link to Gustavo Padovans tree
MAINTAINERS: add Johan to Bluetooth maintainers
MAINTAINERS: Gustavo has moved
prctl: use CAP_SYS_RESOURCE for PR_SET_MM option
rapidio/tsi721: fix bug in register offset definitions
MAINTAINERS: update ST's Mailing list for SPEAr
memcg: free mem_cgroup by RCU to fix oops
Linus Torvalds [Fri, 16 Mar 2012 00:14:35 +0000 (17:14 -0700)]
Merge branch 'i2c-for-linus' of git://git./linux/kernel/git/jdelvare/staging
Pull i2c subsystem fixes from Jean Delvare.
* 'i2c-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
i2c-algo-bit: Fix spurious SCL timeouts under heavy load
i2c-core: Comment says "transmitted" but means "received"
Linus Torvalds [Fri, 16 Mar 2012 00:13:39 +0000 (17:13 -0700)]
Merge tag 'hwmon-for-linus' of git://git./linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck.
* tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (zl6100) Enable interval between chip accesses for all chips
hwmon: (w83627ehf) Describe undocumented pwm attributes
hwmon: (w83627ehf) Fix temp2 source for W83627UHG
hwmon: (w83627ehf) Fix memory leak in probe function
hwmon: (w83627ehf) Fix writing into fan_stop_time for NCT6775F/NCT6776F
Linus Torvalds [Fri, 16 Mar 2012 00:07:25 +0000 (17:07 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm exynos/intel updates from Dave Airlie:
"Two minor updates from Jesse for Intel SNB fixes, and a few fixes from
Samsung for exynos. The pull req has Alan's commit in it since Intel
based their tree on my tree at that time, but it all seems fine wrt
merging."
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm exynos: use drm_fb_helper_set_par directly
drm/exynos: Fix fb_videomode <-> drm_mode_modeinfo conversion
drm/exynos: fix runtime_pm fimd device state on probe
drm/exynos: use correct 'exynos-drm' name for platform device
drm/i915: support 32 bit BGR formats in sprite planes
drm/i915: fix color order for BGR formats on SNB
drm/gma500: Fix Cedarview boot failures in 3.3-rc
Linus Torvalds [Fri, 16 Mar 2012 00:06:05 +0000 (17:06 -0700)]
Merge branch 'v4l_for_linus' of git://git./linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
"For 4 fixes for 3.3 (all trivial):
- uvc video driver: fixes a division by zero;
- davinci: add module.h to fix compilation;
- smsusb: fix the delivery system setting;
- smsdvb: the get_frontend implementation there is broken.
The smsdvb patch has 127 lines, but it is trivial: instead of
returning a cache of the set_frontend (with is wrong, as it doesn't
have the updated values for the data, and the implementation there is
buggy), it copies the information of the detected DVB parameters from
the smsdvb private structures into the corresponding DVBv5 struct
fields."
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
[media] smsdvb: fix get_frontend
[media] smsusb: fix the default delivery system setting
[media] media: davinci: added module.h to resolve unresolved macros
[media] [FOR,v3.3] uvcvideo: Avoid division by 0 in timestamp calculation
Linus Torvalds [Fri, 16 Mar 2012 00:04:56 +0000 (17:04 -0700)]
Merge branch '3.3-urgent' of git://git./linux/kernel/git/nab/target-pending
Pull target fixes from Nicholas Bellinger:
"This series addresses two recently reported regression bugs related to
legacy SCSI reservation usage in target core, and iscsi-target
reservation conflict handling.
The second patch in particular addresses possible data-corruption with
SCSI reservations that is specific to iscsi-target fabric LUNs with
multiple client writers. Both patches need to go into v3.2 stable
ASAP, and the branch based on the last target-pending/3.3-rc-fixes
HEAD.
Again, thanks to Martin Svec for his help to identify and address this
regression bug with iscsi-target."
* '3.3-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
iscsi-target: Fix reservation conflict -EBUSY response handling bug
target: Fix compatible reservation handling (CRH=1) with legacy RESERVE/RELEASE
Dan Carpenter [Thu, 15 Mar 2012 22:17:12 +0000 (15:17 -0700)]
drivers/video/backlight/s6e63m0.c: fix corruption storing gamma mode
strict_strtoul() writes a long but ->gamma_mode only has space to store an
int, so on 64 bit systems we end up scribbling over ->gamma_table_count as
well. I've changed it to use kstrtouint() instead.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Inki Dae <inki.dae@samsung.com>
Signed-off-by: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Donghwa Lee [Thu, 15 Mar 2012 22:17:11 +0000 (15:17 -0700)]
MAINTAINERS: add entry for exynos mipi display drivers
I'd like to add Inki Dae, Donghwa Lee and Kyungmin Park as maintainers
who developers for exynos mipi display drivers for
video/driver/exynos/exynos_mipi* and include/video/exynos_mipi*.
Signed-off-by: Donghwa Lee <dh09.lee@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Kukjin Kim <kgene.kim@samsung.com>
Cc: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johan Hedberg [Thu, 15 Mar 2012 22:17:11 +0000 (15:17 -0700)]
MAINTAINERS: fix link to Gustavo Padovans tree
Gustavo's tree is called just bluetooth.git and not bluetooth-2.6.git
anymore.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: "Gustavo F. Padovan" <padovan@profusion.mobi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johan Hedberg [Thu, 15 Mar 2012 22:17:11 +0000 (15:17 -0700)]
MAINTAINERS: add Johan to Bluetooth maintainers
I've been coordinating Bluetooth patches in my tree for some time and
it's possible I'll do it in the future too, so add myself to the
Bluetooth sections as well as mention my tree there.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: "Gustavo F. Padovan" <padovan@profusion.mobi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Gustavo Padovan [Thu, 15 Mar 2012 22:17:10 +0000 (15:17 -0700)]
MAINTAINERS: Gustavo has moved
This is going to be the primary e-mail for kernel development.
Signed-off-by: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cyrill Gorcunov [Thu, 15 Mar 2012 22:17:10 +0000 (15:17 -0700)]
prctl: use CAP_SYS_RESOURCE for PR_SET_MM option
CAP_SYS_ADMIN is already overloaded left and right, so to have more
fine-grained access control use CAP_SYS_RESOURCE here.
The CAP_SYS_RESOUCE is chosen because this prctl option allows a current
process to adjust some fields of memory map descriptor which rather
represents what the process owns: pointers to code, data, stack
segments, command line, auxiliary vector data and etc.
Suggested-by: Michael Kerrisk <mtk.manpages@gmail.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Bolle <pebolle@tiscali.nl>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexandre Bounine [Thu, 15 Mar 2012 22:17:09 +0000 (15:17 -0700)]
rapidio/tsi721: fix bug in register offset definitions
Fix indexed register offset definitions that use decimal (wrong) instead
of hexadecimal (correct) notation for indexing multipliers.
Incorrect definitions do not affect Tsi721 driver in its current default
configuration because it uses only IDB queue 0. Loss of inbound
doorbell functionality should be observed if queue other than 0 is used.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Chul Kim <chul.kim@idt.com>
Cc: <stable@vger.kernel.org> [3.2+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Viresh Kumar [Thu, 15 Mar 2012 22:17:09 +0000 (15:17 -0700)]
MAINTAINERS: update ST's Mailing list for SPEAr
We have created a ST's Mailing list for SPEAr. This can be accessed
from non-st email ids. I want people to cc this list, when they have
changes specific to SPEAr. So, its better to get this updated in
MAINTAINERS file.
linux-arm-kernel@lists.infradead.org is also added for SPEAr.
Signed-off-by: Viresh Kumar <viresh.kumar@st.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Thu, 15 Mar 2012 22:17:07 +0000 (15:17 -0700)]
memcg: free mem_cgroup by RCU to fix oops
After fixing the GPF in mem_cgroup_lru_del_list(), three times one
machine running a similar load (moving and removing memcgs while
swapping) has oopsed in mem_cgroup_zone_nr_lru_pages(), when retrieving
memcg zone numbers for get_scan_count() for shrink_mem_cgroup_zone():
this is where a struct mem_cgroup is first accessed after being chosen
by mem_cgroup_iter().
Just what protects a struct mem_cgroup from being freed, in between
mem_cgroup_iter()'s css_get_next() and its css_tryget()? css_tryget()
fails once css->refcnt is zero with CSS_REMOVED set in flags, yes: but
what if that memory is freed and reused for something else, which sets
"refcnt" non-zero? Hmm, and scope for an indefinite freeze if refcnt is
left at zero but flags are cleared.
It's tempting to move the css_tryget() into css_get_next(), to make it
really "get" the css, but I don't think that actually solves anything:
the same difficulty in moving from css_id found to stable css remains.
But we already have rcu_read_lock() around the two, so it's easily fixed
if __mem_cgroup_free() just uses kfree_rcu() to free mem_cgroup.
However, a big struct mem_cgroup is allocated with vzalloc() instead of
kzalloc(), and we're not allowed to vfree() at interrupt time: there
doesn't appear to be a general vfree_rcu() to help with this, so roll
our own using schedule_work(). The compiler decently removes
vfree_work() and vfree_rcu() when the config doesn't need them.
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ying Han <yinghan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sasha Levin [Thu, 15 Mar 2012 16:36:14 +0000 (12:36 -0400)]
ntp: Fix integer overflow when setting time
'long secs' is passed as divisor to div_s64, which accepts a 32bit
divisor. On 64bit machines that value is trimmed back from 8 bytes
back to 4, causing a divide by zero when the number is bigger than
(1 << 32) - 1 and all 32 lower bits are 0.
Use div64_long() instead.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Cc: johnstul@us.ibm.com
Link: http://lkml.kernel.org/r/1331829374-31543-2-git-send-email-levinsasha928@gmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Sasha Levin [Thu, 15 Mar 2012 16:36:13 +0000 (12:36 -0400)]
math: Introduce div64_long
Add a div64_long macro which is used to devide a 64bit number by a long (which
can be 4 bytes on 32bit systems and 8 bytes on 64bit systems).
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Cc: johnstul@us.ibm.com
Link: http://lkml.kernel.org/r/1331829374-31543-1-git-send-email-levinsasha928@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Jan Beulich [Thu, 8 Mar 2012 09:29:28 +0000 (09:29 +0000)]
perf tools: Adjust make rules
Add rules to generate pre-processed files (just like are available for
the normal kernel build), and adjust the rule to create assembly files
from C ones to produce its output in the output directory rather than
in the source tree.
E.g.
$ make -C tools/perf/ O=/tmp/perf /tmp/perf/builtin-top.i
make: Entering directory `/home/git/linux/tools/perf'
CC /tmp/perf/builtin-top.i
make: Leaving directory `/home/git/linux/tools/perf'
$ wc -l /tmp/perf/builtin-top.i
31379 /tmp/perf/builtin-top.i
$
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/4F588A0802000078000770FF@nat28.tlf.novell.com
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ville Syrjala [Thu, 15 Mar 2012 17:11:05 +0000 (18:11 +0100)]
i2c-algo-bit: Fix spurious SCL timeouts under heavy load
When the system is under heavy load, there can be a significant delay
between the getscl() and time_after() calls inside sclhi(). That delay
may cause the time_after() check to trigger after SCL has gone high,
causing sclhi() to return -ETIMEDOUT.
To fix the problem, double check that SCL is still low after the
timeout has been reached, before deciding to return -ETIMEDOUT.
Signed-off-by: Ville Syrjala <syrjala@sci.fi>
Cc: stable@vger.kernel.org
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Wolfram Sang [Thu, 15 Mar 2012 17:11:05 +0000 (18:11 +0100)]
i2c-core: Comment says "transmitted" but means "received"
Fix that. Also convert this and the related comment to proper commenting
style.
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Peter Zijlstra [Thu, 15 Mar 2012 11:35:37 +0000 (12:35 +0100)]
printk: Make it compile with !CONFIG_PRINTK
Commit
3ccf3e830615 ("printk/sched: Introduce special
printk_sched() for those awkward moments") overlooked
an #ifdef, so move code around to respect these directives.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Link: http://lkml.kernel.org/r/1331811337.18960.179.camel@twins
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Dave Airlie [Thu, 15 Mar 2012 09:41:26 +0000 (09:41 +0000)]
Merge branch 'exynos-drm-fixes' of git://git.infradead.org/users/kmpark/linux-samsung into drm-fixes
* 'exynos-drm-fixes' of git://git.infradead.org/users/kmpark/linux-samsung:
drm exynos: use drm_fb_helper_set_par directly
drm/exynos: Fix fb_videomode <-> drm_mode_modeinfo conversion
drm/exynos: fix runtime_pm fimd device state on probe
drm/exynos: use correct 'exynos-drm' name for platform device
Sascha Hauer [Wed, 14 Mar 2012 10:44:54 +0000 (19:44 +0900)]
drm exynos: use drm_fb_helper_set_par directly
info->fix.visual already is correctly set from drm_fb_helper_fill_fix.
info->fix.line_length is also set from drm_fb_helper_fill_fix,
so drm_fb_helper_set_par directly instead of a custom
exynos_drm_fbdev_set_par.
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Laurent Pinchart [Fri, 9 Mar 2012 00:45:21 +0000 (09:45 +0900)]
drm/exynos: Fix fb_videomode <-> drm_mode_modeinfo conversion
The fb_videomode structure stores the front porch and back porch in the
right_margin and left_margin fields respectively. right_margin should
thus be computed with hsync_start - hdisplay, and left_margin with
htotal - hsync_end. The same holds for the vertical direction.
Active Front Sync Back
Region Porch Porch
<-------------------><----------------><-------------><---------------->
//////////////////|
////////////////// |
////////////////// |.................. ..................
_______________
<------ xres -------><- right_margin -><- hsync_len -><- left_margin -->
<---- hdisplay ----->
<------------ hsync_start ------------>
<--------------------- hsync_end -------------------->
<--------------------------------- htotal ----------------------------->
Fix the fb_videomode <-> drm_mode_modeinfo conversion functions
accordingly.
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Acked-by: Joonyoung Shim <jy0922.shim@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Marek Szyprowski [Thu, 8 Mar 2012 01:28:56 +0000 (10:28 +0900)]
drm/exynos: fix runtime_pm fimd device state on probe
A call to pm_runtime_set_active() forces device to be at the active
state and skips calling its runtime suspend/resume callbacks. This
results in a freeze with a new power domain code based on gen_pd. Fimd
driver does all required runtime power management calls, so this
pm_runtime_set_active call is buggy. This patch removes it and corrects
clock management in probe function (clocks are now enabled by
pm_runtime_get_sync() call).
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Marek Szyprowski [Mon, 5 Mar 2012 11:02:30 +0000 (12:02 +0100)]
drm/exynos: use correct 'exynos-drm' name for platform device
Currently Exynos DRM driver uses DRIVER_NAME ('exynos') name for the
core platform device. This is confusing, because it doesn't refer to the
function the platform device is performing. This patch renames the
platform device to the 'exynos-drm', which matches the convention for
naming the platform devices. The name used inside DRM subsystem has not
been changed.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Linus Torvalds [Thu, 15 Mar 2012 00:16:45 +0000 (17:16 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Been sitting on this for a while, but lets get this out the door.
This fixes various important bugs for 3.3 final, along with a few more
trivial ones. Please pull!"
* 'for-linus' of git://git.kernel.dk/linux-block:
block: fix ioc leak in put_io_context
block, sx8: fix pointer math issue getting fw version
Block: use a freezable workqueue for disk-event polling
drivers/block/DAC960: fix -Wuninitialized warning
drivers/block/DAC960: fix DAC960_V2_IOCTL_Opcode_T -Wenum-compare warning
block: fix __blkdev_get and add_disk race condition
block: Fix setting bio flags in drivers (sd_dif/floppy)
block: Fix NULL pointer dereference in sd_revalidate_disk
block: exit_io_context() should call elevator_exit_icq_fn()
block: simplify ioc_release_fn()
block: replace icq->changed with icq->flags
Linus Torvalds [Thu, 15 Mar 2012 00:16:02 +0000 (17:16 -0700)]
Merge tag 'for-linus' of git://git./linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"Another small batch of driver specific bug fixes, a couple more errors
in the da9052 driver and a bad return value in the tps6524x driver."
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: da9052: Ensure the selected voltage falls within the specified range
regulator: Set n_voltages for da9052 regulators
regulator: Fix setting selector in tps6524x set_voltage function
Linus Torvalds [Thu, 15 Mar 2012 00:13:49 +0000 (17:13 -0700)]
Merge branch 'stable' of git://git./linux/kernel/git/cmetcalf/linux-tile
Pull arch/tile update to run "make minconfig" on the tile defconfigs
from Chris Metcalf.
This removes almost three thousand lines of inane defconfig chatter.
* 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
arch/tile/configs: convert to minimal configs via "make savedefconfig"
Chris Metcalf [Wed, 14 Mar 2012 18:33:16 +0000 (14:33 -0400)]
arch/tile/configs: convert to minimal configs via "make savedefconfig"
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Dave Airlie [Wed, 14 Mar 2012 18:32:27 +0000 (18:32 +0000)]
Merge branch 'drm-intel-fixes' of git://git./linux/kernel/git/keithp/linux into drm-fixes
* 'drm-intel-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/keithp/linux:
drm/i915: support 32 bit BGR formats in sprite planes
drm/i915: fix color order for BGR formats on SNB
drm/gma500: Fix Cedarview boot failures in 3.3-rc
Ingo Molnar [Wed, 14 Mar 2012 17:49:05 +0000 (18:49 +0100)]
Merge tag 'perf-urgent-for-mingo' of git://git./linux/kernel/git/acme/linux into perf/urgent
Some corner case fixes.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Guenter Roeck [Tue, 13 Mar 2012 16:05:14 +0000 (09:05 -0700)]
hwmon: (zl6100) Enable interval between chip accesses for all chips
Intersil reports that all chips supported by the zl6100 driver require
an interval between chip accesses, even ZL2004 and ZL6105 which were thought
to be safe.
Reported-by: Vivek Gani <vgani@intersil.com>
Cc: stable@vger.kernel.org # 3.2+
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Ingo Molnar [Wed, 14 Mar 2012 15:42:34 +0000 (12:42 -0300)]
perf tools, x86: Build perf on older user-space as well
On ancient systems I get this build failure:
util/../../../arch/x86/include/asm/unistd.h:67:29: error: asm/unistd_64.h: No such file or directory
In file included from util/cache.h:7,
from builtin-test.c:8:
util/../perf.h: In function ‘sys_perf_event_open’:In file included from util/../perf.h:16
perf.h:170: error: ‘__NR_perf_event_open’ undeclared (first use in this function)
The reason is that this old system does not have the split
unistd.h headers yet, from which to pick up the syscall
definitions.
Add the syscall numbers to the already existing i386 and x86_64
blocks in perf.h, and also provide empty include file stubs.
With this patch perf builds and works fine on 5 years old
user-space as well.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/n/tip-jctwg64le1w47tuaoeyftsg9@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Wed, 14 Mar 2012 15:29:29 +0000 (12:29 -0300)]
perf tools: Use scnprintf where applicable
Several places were expecting that the value returned was the number of
characters printed, not what would be printed if there was space.
Fix it by using the scnprintf and vscnprintf variants we inherited from
the kernel sources.
Some corner cases where the number of printed characters were not
accounted were fixed too.
Reported-by: Anton Blanchard <anton@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Cc: stable@kernel.org
Link: http://lkml.kernel.org/n/tip-kwxo2eh29cxmd8ilixi2005x@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Anton Blanchard [Wed, 7 Mar 2012 00:42:49 +0000 (11:42 +1100)]
perf tools: Incorrect use of snprintf results in SEGV
I have a workload where perf top scribbles over the stack and we SEGV.
What makes it interesting is that an snprintf is causing this.
The workload is a c++ gem that has method names over 3000 characters
long, but snprintf is designed to avoid overrunning buffers. So what
went wrong?
The problem is we assume snprintf returns the number of characters
written:
ret += repsep_snprintf(bf + ret, size - ret, "[%c] ", self->level);
...
ret += repsep_snprintf(bf + ret, size - ret, "%s", self->ms.sym->name);
Unfortunately this is not how snprintf works. snprintf returns the
number of characters that would have been written if there was enough
space. In the above case, if the first snprintf returns a value larger
than size, we pass a negative size into the second snprintf and happily
scribble over the stack. If you have 3000 character c++ methods thats a
lot of stack to trample.
This patch fixes repsep_snprintf by clamping the value at size - 1 which
is the maximum snprintf can write before adding the NULL terminator.
I get the sinking feeling that there are a lot of other uses of snprintf
that have this same bug, we should audit them all.
Cc: David Ahern <dsahern@gmail.com>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Cc: stable@kernel.org
Link: http://lkml.kernel.org/r/20120307114249.44275ca3@kryten
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Xiaotian Feng [Wed, 14 Mar 2012 14:34:48 +0000 (15:34 +0100)]
block: fix ioc leak in put_io_context
When put_io_context is called, if ioc->icq_list is empty and refcount
is 1, kernel will not free the ioc.
This is caught by following kmemleak:
unreferenced object 0xffff880036349fe0 (size 216):
comm "sh", pid 2137, jiffies
4294931140 (age 290579.412s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
01 00 01 00 ad 4e ad de ff ff ff ff 00 00 00 00 .....N..........
backtrace:
[<
ffffffff8169f926>] kmemleak_alloc+0x26/0x50
[<
ffffffff81195a9c>] kmem_cache_alloc_node+0x1cc/0x2a0
[<
ffffffff81356b67>] create_io_context_slowpath+0x27/0x130
[<
ffffffff81356d2b>] get_task_io_context+0xbb/0xf0
[<
ffffffff81055f0e>] copy_process+0x188e/0x18b0
[<
ffffffff8105609b>] do_fork+0x11b/0x420
[<
ffffffff810247f8>] sys_clone+0x28/0x30
[<
ffffffff816d3373>] stub_clone+0x13/0x20
[<
ffffffffffffffff>] 0xffffffffffffffff
ioc should be freed if ioc->icq_list is empty.
Signed-off-by: Xiaotian Feng <dannyfeng@tencent.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jiri Olsa [Tue, 13 Mar 2012 23:03:02 +0000 (00:03 +0100)]
perf: Add ifdef to remove unused enum switch warnings
Fix for unused symbols in switch warnings.
Link: http://lkml.kernel.org/r/20120313230302.GA1514@m.redhat.com
Cc: Mark Brown <broonie@opensource.wolfsonmicro.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>