git.openwrt.org Git - openwrt/staging/blogic.git/log

tracing: Move locking of trace_cmdline_lock into start/stop seq calls

With the conversion of the saved_cmdlines output to use seq_read, there
is now a race between accessing the values of the saved_cmdlines and
the writing to them. The trace_cmdline_lock needs to be taken at
the start and stop of the seq calls.

A new __trace_find_cmdline() call is created to allow for the look up
to happen without taking the lock.

Fixes: 42584c81c5ad tracing: Have saved_cmdlines use the seq_read infrastructure
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: Try again for saved cmdline if failed due to locking

In order to prevent the saved cmdline cache from being filled when
tracing is not active, the comms are only recorded after a trace event
is recorded.

The problem is, a comm can fail to be recorded if the trace_cmdline_lock
is held. That lock is taken via a trylock to allow it to happen from
any context (including NMI). If the lock fails to be taken, the comm
is skipped. No big deal, as we will try again later.

But! Because of the code that was added to only record after an event,
we may not try again later as the recording is made as a oneshot per
event per CPU.

Only disable the recording of the comm if the comm is actually recorded.

Fixes: 7ffbd48d5cab "tracing: Cache comms only after an event occurred"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: Have saved_cmdlines use the seq_read infrastructure

Current tracing_saved_cmdlines_read() implementation is naive; It allocates
a large buffer, constructs output data to that buffer for each read
operation, and then copies a portion of the buffer to the user space
buffer. This has several issues such as slow memory allocation, high
CPU usage, and even corruption of the output data.

The seq_read infrastructure is made to handle this type of work.
By converting it to use seq_read() the code becomes smaller, simplified,
as well as correct.

Link: http://lkml.kernel.org/p/20140220084431.3839.51793.stgit@yunodevel
Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: Add tracepoint benchmark tracepoint

In order to help benchmark the time tracepoints take, a new config
option is added called CONFIG_TRACEPOINT_BENCHMARK. When this option
is set a tracepoint is created called "benchmark:benchmark_event".
When the tracepoint is enabled, it kicks off a kernel thread that
goes into an infinite loop (calling cond_sched() to let other tasks
run), and calls the tracepoint. Each iteration will record the time
it took to write to the tracepoint and the next iteration that
data will be passed to the tracepoint itself. That is, the tracepoint
will report the time it took to do the previous tracepoint.
The string written to the tracepoint is a static string of 128 bytes
to keep the time the same. The initial string is simply a write of
"START". The second string records the cold cache time of the first
write which is not added to the rest of the calculations.

As it is a tight loop, it benchmarks as hot cache. That's fine because
we care most about hot paths that are probably in cache already.

An example of the output:

     START
     first=3672 [COLD CACHED]
     last=632 first=3672 max=632 min=632 avg=316 std=446 std^2=199712
     last=278 first=3672 max=632 min=278 avg=303 std=316 std^2=100337
     last=277 first=3672 max=632 min=277 avg=296 std=258 std^2=67064
     last=273 first=3672 max=632 min=273 avg=292 std=224 std^2=50411
     last=273 first=3672 max=632 min=273 avg=288 std=200 std^2=40389
     last=281 first=3672 max=632 min=273 avg=287 std=183 std^2=33666

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: Print nasty banner when trace_printk() is in use

trace_printk() is used to debug fast paths within the kernel. Places
that gets called in any context (interrupt or NMI) or thousands of
times a second. Something you do not want to do with a printk().

In order to make it completely lockless as it needs a temporary buffer
to handle some of the string formatting, a page is created per cpu for
every context (four per cpu; normal, softirq, irq, NMI).

Since trace_printk() should only be used for debugging purposes,
there's no reason to waste memory on these buffers on a production
system. That means, trace_printk() should never be used unless a
developer is debugging their kernel. There's macro magic to allocate
the buffers if trace_printk() is used anywhere in the kernel.

To help enforce that trace_printk() isn't used outside of development,
when it is used, a nasty banner is displayed on bootup (or when a module
is loaded that uses trace_printk() and the kernel core does not).

Here's the banner:

**********************************************************
**   NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE   **
**                                                      **
** trace_printk() being used. Allocating extra memory.  **
**                                                      **
** This means that this is a DEBUG kernel and it is     **
** unsafe for produciton use.                           **
**                                                      **
** If you see this message and you are not debugging    **
** the kernel, report this immediately to your vendor!  **
**                                                      **
**   NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE   **
**********************************************************

That should hopefully keep developers from trying to sneak in a
trace_printk() or two.

Link: http://lkml.kernel.org/p/20140528131440.2283213c@gandalf.local.home
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: Add funcgraph_tail option to print function name after closing braces

In the function-graph tracer, add a funcgraph_tail option
to print the function name on all } lines, not just
functions whose first line is no longer in the trace
buffer.

If a function calls other traced functions, its total
time appears on its } line. This change allows grep
to be used to determine the function for which the
line corresponds.

Update Documentation/trace/ftrace.txt to describe
this new option.

Link: http://lkml.kernel.org/p/20140520221041.8359.6782.stgit@beardog.cce.hp.com
Signed-off-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: Eliminate duplicate TRACE_GRAPH_PRINT_xx defines

Eliminate duplicate TRACE_GRAPH_PRINT_xx defines
in trace_functions_graph.c that are already in
trace.h.

Add TRACE_GRAPH_PRINT_IRQS to trace.h, which is
the only one that is missing.

Link: http://lkml.kernel.org/p/20140520221031.8359.24733.stgit@beardog.cce.hp.com
Signed-off-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: Add __bitmask() macro to trace events to cpumasks and other bitmasks

Being able to show a cpumask of events can be useful as some events
may affect only some CPUs. There is no standard way to record the
cpumask and converting it to a string is rather expensive during
the trace as traces happen in hotpaths. It would be better to record
the raw event mask and be able to parse it at print time.

The following macros were added for use with the TRACE_EVENT() macro:

  __bitmask()
  __assign_bitmask()
  __get_bitmask()

To test this, I added this to the sched_migrate_task event, which
looked like this:

TRACE_EVENT(sched_migrate_task,

TP_PROTO(struct task_struct *p, int dest_cpu, const struct cpumask *cpus),

TP_ARGS(p, dest_cpu, cpus),

TP_STRUCT__entry(
__array( char, comm, TASK_COMM_LEN )
__field( pid_t, pid )
__field( int, prio )
__field( int, orig_cpu )
__field( int, dest_cpu )
__bitmask( cpumask, num_possible_cpus() )
),

TP_fast_assign(
memcpy(__entry->comm, p->comm, TASK_COMM_LEN);
__entry->pid = p->pid;
__entry->prio = p->prio;
__entry->orig_cpu = task_cpu(p);
__entry->dest_cpu = dest_cpu;
__assign_bitmask(cpumask, cpumask_bits(cpus), num_possible_cpus());
),

TP_printk("comm=%s pid=%d prio=%d orig_cpu=%d dest_cpu=%d cpumask=%s",
  __entry->comm, __entry->pid, __entry->prio,
  __entry->orig_cpu, __entry->dest_cpu,
  __get_bitmask(cpumask))
);

With the output of:

        ksmtuned-3613  [003] d..2   485.220508: sched_migrate_task: comm=ksmtuned pid=3615 prio=120 orig_cpu=3 dest_cpu=2 cpumask=00000000,0000000f
     migration/1-13    [001] d..5   485.221202: sched_migrate_task: comm=ksmtuned pid=3614 prio=120 orig_cpu=1 dest_cpu=0 cpumask=00000000,0000000f
             awk-3615  [002] d.H5   485.221747: sched_migrate_task: comm=rcu_preempt pid=7 prio=120 orig_cpu=0 dest_cpu=1 cpumask=00000000,000000ff
     migration/2-18    [002] d..5   485.222062: sched_migrate_task: comm=ksmtuned pid=3615 prio=120 orig_cpu=2 dest_cpu=3 cpumask=00000000,0000000f

Link: http://lkml.kernel.org/r/1399377998-14870-6-git-send-email-javi.merino@arm.com
Link: http://lkml.kernel.org/r/20140506132238.22e136d1@gandalf.local.home
Suggested-by: Javi Merino <javi.merino@arm.com>
Tested-by: Javi Merino <javi.merino@arm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

ftrace/x86: Move the mcount/fentry code out of entry_64.S

As the mcount code gets more complex, it really does not belong
in the entry.S file. By moving it into its own file "mcount.S"
keeps things a bit cleaner.

Link: http://lkml.kernel.org/p/20140508152152.2130e8cf@gandalf.local.home
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

ftrace: Remove FTRACE_UPDATE_MODIFY_CALL_REGS flag

As the decision to what needs to be done (converting a call to the
ftrace_caller to ftrace_caller_regs or to convert from ftrace_caller_regs
to ftrace_caller) can easily be determined from the rec->flags of
FTRACE_FL_REGS and FTRACE_FL_REGS_EN, there's no need to have the
ftrace_check_record() return either a UPDATE_MODIFY_CALL_REGS or a
UPDATE_MODIFY_CALL. Just he latter is enough. This added flag causes
more complexity than is required. Remove it.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

ftrace: Use the ftrace_addr helper functions to find the ftrace_addr

With the moving of the functions that determine what the mcount call site
should be replaced with into the generic code, there is a few places
in the generic code that can use them instead of hard coding it as it
does.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

ftrace: Make get_ftrace_addr() and get_ftrace_addr_old() global

Move and rename get_ftrace_addr() and get_ftrace_addr_old() to
ftrace_get_addr_new() and ftrace_get_addr_curr() respectively.

This moves these two helper functions in the generic code out from
the arch specific code, and renames them to have a better generic
name. This will allow other archs to use them as well as makes it
a bit easier to work on getting separate trampolines for different
functions.

ftrace_get_addr_new() returns the trampoline address that the mcount
call address will be converted to.

ftrace_get_addr_curr() returns the trampoline address of what the
mcount call address currently jumps to.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

ftrace/x86: Get the current mcount addr for add_breakpoint()

The add_breakpoint() code in the ftrace updating gets the address
of what the call will become, but if the mcount address is changing
from regs to non-regs ftrace_caller or vice versa, it will use what
the record currently is.

This is rather silly as the code should always use what is currently
there regardless of if it's changing the regs function or just converting
to a nop.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

ftrace: Always inline ftrace_hash_empty() helper function

The ftrace_hash_empty() function is a simple test:

return !hash || !hash->count;

But gcc seems to want to make it a call. As this is in an extreme
hot path of the function tracer, there's no reason it needs to be
a call. I only wrote it to be a helper function anyway, otherwise
it would have been inlined manually.

Force gcc to inline it, as it could have also been a macro.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

ftrace: Write in missing comment from a very old commit

Back in 2011 Commit ed926f9b35cda "ftrace: Use counters to enable
functions to trace" changed the way ftrace accounts for enabled
and disabled traced functions. There was a comment started as:

/*
*
*/

But never finished. Well, that's rather useless. I probably forgot
to save the file before committing it. And it passed review from all
this time.

Anyway, better late than never. I updated the comment to express what
is happening in that somewhat complex code.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

ftrace: Remove boolean of hash_enable and hash_disable

Commit 4104d326b670 "ftrace: Remove global function list and call
function directly" cleaned up the global_ops filtering and made
the code simpler, but it left a variable "hash_enable" that was used
to know if the hash functions should be updated or not. It was
updated if the global_ops did not override them. As the global_ops
are now no different than any other ftrace_ops, the hash always
gets updated and there's no reason to use the hash_enable boolean.

The same goes for hash_disable used in ftrace_shutdown().

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: Add trace_<tracepoint>_enabled() function

There are some code paths in the kernel that need to do some preparations
before it calls a tracepoint. As that code is worthless overhead when
the tracepoint is not enabled, it would be prudent to have that code
only run when the tracepoint is active. To accomplish this, all tracepoints
now get a static inline function called "trace_<tracepoint-name>_enabled()"
which returns true when the tracepoint is enabled and false otherwise.

As an added bonus, that function uses the static_key of the tracepoint
such that no branch is needed.

  if (trace_mytracepoint_enabled()) {
arg = process_tp_arg();
trace_mytracepoint(arg);
  }

Will keep the "process_tp_arg()" (which may be expensive to run) from
being executed when the tracepoint isn't enabled.

It's best to encapsulate the tracepoint itself in the if statement
just to keep races. For example, if you had:

  if (trace_mytracepoint_enabled())
arg = process_tp_arg();
  trace_mytracepoint(arg);

There's a chance that the tracepoint could be enabled just after the
if statement, and arg will be undefined when calling the tracepoint.

Link: http://lkml.kernel.org/r/20140506094407.507b6435@gandalf.local.home
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: Replace __get_cpu_var uses with this_cpu_ptr

Replace uses of &__get_cpu_var for address calculation with this_cpu_ptr.

Link: http://lkml.kernel.org/p/alpine.DEB.2.10.1404291415560.18364@gentwo.org
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: Remove myself as a tracing maintainer

It has been a while since I last sent a tracing patch. I always keep an
eye on tracing evolutions and contributions in general but given
how busy I am with nohz, isolation and more generally core cleanups stuff,
I seldom have time left to provide deep reviews of tracing patches nor
simply for reviews to begin with.

I've been very lucky to start kernel development on a very young subsystem
with tons of low hanging fruits back in 2008. Given that it deals with
a lot of tricky stuffs all around (sched, timers, irq, preemption, NMIs,
SMP, RCU, ....) I basically learned everything there.

Steve has been doing most of the incredible work these last years.
Thanks a lot!

Of course consider me always available to help on tracing if any hard
days happen.

Link: http://lkml.kernel.org/p/1399131991-13216-1-git-send-email-fweisbec@gmail.com
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

ftrace: Have function graph tracer use global_ops for filtering

Commit 4104d326b670 "ftrace: Remove global function list and call
function directly" cleaned up the global_ops filtering and made
the code simpler. But it left out function graph filtering which
also depended on that code. The function graph filtering still
needs to use global_ops as the filter otherwise it wont filter
at all.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: Remove mock up poll wait function

Now that the ring buffer has a built in way to wake up readers
when there's data, using irq_work such that it is safe to do it
in any context. But it was still using the old "poor man's"
wait polling that checks every 1/10 of a second to see if it
should wake up a waiter. This makes the latency for a wake up
excruciatingly long. No need to do that anymore.

Completely remove the different wait_poll types from the tracers
and have them all use the default one now.

Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: Break out of tracing_wait_pipe() before wait_pipe() is called

When reading from trace_pipe, if tracing is off but nothing was read
it should block. If something is read and tracing is off, then EOF
is returned. If tracing is on and there's nothing to read, it will block.

But because the check of whether tracing is off and something was read
is done after the block on the pipe, it is hit or miss if the EOF is
returned or not leading to inconsistent behavior.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: Fix documentation of ftrace_set_global_{filter,notrace}()

The functions ftrace_set_global_filter() and ftrace_set_global_notrace()
still have their old names in the kernel doc (ftrace_set_filter and
ftrace_set_notrace respectively). Replace these with the real names.

Link: http://lkml.kernel.org/p/1398006644-5935-3-git-send-email-wangjiaxing@insigma.com.cn
Signed-off-by: Jiaxing Wang <wangjiaxing@insigma.com.cn>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing/stack_trace: Skip 4 instead of 3 when using ftrace_ops_list_func

When using ftrace_ops_list_func, we should skip 4 instead of 3,
to avoid ftrace_call+0x5/0xb appearing in the stack trace:

        Depth    Size   Location    (110 entries)
        -----    ----   --------
  0)     2956       0   update_curr+0xe/0x1e0
  1)     2956      68   ftrace_call+0x5/0xb
  2)     2888      92   enqueue_entity+0x53/0xe80
  3)     2796      80   enqueue_task_fair+0x47/0x7e0
  4)     2716      28   enqueue_task+0x45/0x70
  5)     2688      12   activate_task+0x22/0x30

Add a function using_ftrace_ops_list_func() to test for this while keeping
ftrace_ops_list_func to remain static.

Link: http://lkml.kernel.org/p/1398006644-5935-2-git-send-email-wangjiaxing@insigma.com.cn
Signed-off-by: Jiaxing Wang <wangjiaxing@insigma.com.cn>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: Add static to local functions

This patch adds static to the following functions:
-cycle_t buffer_ftrace_now
-void free_snapshot
-int trace_selftest_startup_dynamic_tracing

Link: http://lkml.kernel.org/p/20140417214442.d7abc7c0b0e4b90e7fedecc9@skynet.be
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

ftrace: Statically initialize pm notifier block

Instead of initializing the pm notifier block in register_ftrace_graph(),
initialize it statically. This safes us some code.

Found in the PaX patch, written by the PaX Team.

Link: http://lkml.kernel.org/p/1396186310-3156-1-git-send-email-minipli@googlemail.com
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: PaX Team <pageexec@freemail.hu>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

ftrace/x86: Fix order of warning messages when ftrace modifies code

The colon at the end of the printk message suggests that it should get printed
before the details printed by ftrace_bug().

When touching the line, let's use the preferred pr_warn() macro as suggested
by checkpatch.pl.

Link: http://lkml.kernel.org/r/1392650573-3390-5-git-send-email-pmladek@suse.cz
Signed-off-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: Allow irq/preempt tracers to be used by instances

The irqsoff, preemptoff and preemptirqsoff tracers can now be used by
instances. But they may only be used by one instance at a time (including
the top level directory). This allows multiple tracers to run while the
irqsoff (and friends) tracer is running simultaneously.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: Allow wakeup tracers to be used by instances

The wakeup and wakeup_rt tracers can now be used by instances.
But they may only be used by one instance at a time (including the
top level directory). This allows multiple tracers to run while
the wakeup tracer is running simultaneously.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: Move ftrace_max_lock into trace_array

In preparation for having tracers enabled in instances, the max_lock
should be unique as updating the max for one tracer is a separate
operation than updating it for another tracer using a different max.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: Move tracing_max_latency into trace_array

In preparation for letting the latency tracers be used by instances,
remove the global tracing_max_latency variable and add a max_latency
field to the trace_array that the latency tracers will now use.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

ftrace: Remove global function list and call function directly

Instead of having a list of global functions that are called,
as only one global function is allow to be enabled at a time, there's
no reason to have a list.

Instead, simply have all the users of the global ops, use the global ops
directly, instead of registering their own ftrace_ops. Just switch what
function is used before enabling the function tracer.

This removes a lot of code as well as the complexity involved with it.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

Linux 3.15-rc2

Merge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dma

Pull slave-dmaengine fixes from Vinod Koul:
"Back from long weekend here in India and now the time to send fixes
  for slave dmaengine.
   - Dan's fix of sirf xlate code
   - Jean's fix for timberland
   - edma fixes by Sekhar for SG handling and Yuan for changing init
     call"

* 'fixes' of git://git.infradead.org/users/vkoul/slave-dma:
  dma: fix eDMA driver as a subsys_initcall
  dmaengine: sirf: off by one in of_dma_sirfsoc_xlate()
  platform: Fix timberdale dependencies
  dma: edma: fix incorrect SG list handling

Merge tag 'iommu-fixes-v3.15-rc1' of git://git./linux/kernel/git/joro/iommu

Pull iommu fixes from Joerg Roedel:
"Fixes for regressions:

   - fix wrong IOMMU enumeration causing some SCSI device drivers
     initialization failures
   - ARM-SMMU fixes for a panic condition and a wrong return value"

* tag 'iommu-fixes-v3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/arm-smmu: fix panic in arm_smmu_alloc_init_pte
  iommu/arm-smmu: Return 0 on unmap failure
  iommu/vt-d: fix bug in matching PCI devices with DRHD/RMRR descriptors
  iommu/vt-d: Fix get_domain_for_dev() handling of upstream PCIe bridges
  iommu/vt-d: fix memory leakage caused by commit ea8ea46

Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull perf tooling fixes from Ingo Molnar:
"Three small tooling fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf tools: Improve error reporting
  perf tools: Adjust symbols in VDSO
  perf kvm: Fix 'Min time' counting in report command

Merge tag 'perf-urgent-for-mingo' of git://git./linux/kernel/git/jolsa/perf into perf/urgent

Pull perf/urgent fixes from Jiri Olsa:

User visible changes:

  * Adjust symbols in VDSO to properly resolve its function names (Vladimir Nikulichev)

  * Improve error reporting for record session failure (Adrien BAK)

  * Fix 'Min time' counting in report command (Alexander Yarygin)

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

perf tools: Improve error reporting

In the current version, when using perf record, if something goes
wrong in tools/perf/builtin-record.c:375
session = perf_session__new(file, false, NULL);

The error message:
"Not enough memory for reading per file header"

is issued. This error message seems to be outdated and is not very
helpful. This patch proposes to replace this error message by
"Perf session creation failed"

I believe this issue has been brought to lkml:
https://lkml.org/lkml/2014/2/24/458
although this patch only tackles a (small) part of the issue.

Additionnaly, this patch improves error reporting in
tools/perf/util/data.c open_file_write.

Currently, if the call to open fails, the user is unaware of it.
This patch logs the error, before returning the error code to
the caller.

Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Adrien BAK <adrien.bak@metascale.org>
Link: http://lkml.kernel.org/r/1397786443.3093.4.camel@beast
[ Reorganize the changelog into paragraphs ]
[ Added empty line after fd declaration in open_file_write ]
Signed-off-by: Jiri Olsa <jolsa@redhat.com>

perf tools: Adjust symbols in VDSO

pert-report doesn't resolve function names in VDSO:

$ perf report --stdio -g flat,0.0,15,callee --sort pid
...
            8.76%
               0x7fff6b1fe861
               __gettimeofday
               ACE_OS::gettimeofday()
...

In this case symbol values should be adjusted the same way as for executables,
relocatable objects and prelinked libraries.

After fix:

$ perf report --stdio -g flat,0.0,15,callee --sort pid
...
            8.76%
               __vdso_gettimeofday
               __gettimeofday
               ACE_OS::gettimeofday()

Signed-off-by: Vladimir Nikulichev <nvs@tbricks.com>
Tested-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Link: http://lkml.kernel.org/r/969812.163009436-sendEmail@nvs
Signed-off-by: Jiri Olsa <jolsa@redhat.com>

perf kvm: Fix 'Min time' counting in report command

Every event in the perf-kvm has a 'stats' structure, which contains
max/min/average/etc times of handling this event.
The problem is that the 'perf-kvm stat report' command always shows
that 'min time' is 0us for every event. Example:

# perf kvm stat report

Analyze events for all VCPUs:

    VM-EXIT    Samples  Samples%     Time%   Min Time   Max Time Avg time
  [..]
  0xB2 MSCH         12     0.07%     0.00%        0us        8us 7.31us ( +-   2.11% )
  0xB2 CHSC         12     0.07%     0.00%        0us       18us 9.39us ( +-   9.49% )
  0xB2 STPX          8     0.05%     0.00%        0us        2us 1.88us ( +-   7.18% )
  0xB2 STSI          7     0.04%     0.00%        0us       44us 16.49us ( +-  38.20% )
  [..]

This happens because the 'stats' structure is not initialized and
stats->min equals to 0. Lets initialize the structure for every
event after its allocation using init_stats() function. This initializes
stats->min to -1 and makes 'Min time' statistics counting work:

# perf kvm stat report

Analyze events for all VCPUs:

    VM-EXIT    Samples  Samples%     Time%   Min Time   Max Time Avg time
  [..]
  0xB2 MSCH         12     0.07%     0.00%        6us        8us 7.31us ( +-   2.11% )
  0xB2 CHSC         12     0.07%     0.00%        7us       18us 9.39us ( +-   9.49% )
  0xB2 STPX          8     0.05%     0.00%        1us        2us 1.88us ( +-   7.18% )
  0xB2 STSI          7     0.04%     0.00%        1us       44us 16.49us ( +-  38.20% )
  [..]

Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Link: http://lkml.kernel.org/r/1397053319-2130-3-git-send-email-borntraeger@de.ibm.com
[ Fixing the perf examples changelog output ]
Signed-off-by: Jiri Olsa <jolsa@redhat.com>

coredump: fix va_list corruption

A va_list needs to be copied in case it needs to be used twice.

Thanks to Hugh for debugging this issue, leading to various panics.

Tested:

  lpq84:~# echo "|/foobar12345 %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h" >/proc/sys/kernel/core_pattern

'produce_core' is simply : main() { *(int *)0 = 1;}

  lpq84:~# ./produce_core
  Segmentation fault (core dumped)
  lpq84:~# dmesg | tail -1
  [  614.352947] Core dump to |/foobar12345 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 (null) pipe failed

Notice the last argument was replaced by a NULL (we were lucky enough to
not crash, but do not try this on your production machine !)

After fix :

  lpq83:~# echo "|/foobar12345 %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h" >/proc/sys/kernel/core_pattern
  lpq83:~# ./produce_core
  Segmentation fault
  lpq83:~# dmesg | tail -1
  [  740.800441] Core dump to |/foobar12345 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 pipe failed

Fixes: 5fe9d8ca21cc ("coredump: cn_vprintf() has no reason to call vsnprintf() twice")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Diagnosed-by: Hugh Dickins <hughd@google.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org # 3.11+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 fix from Ingo Molnar:
"This fixes the preemption-count imbalance crash reported by Owen
Kibel"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Fix CMCI preemption bugs

Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull scheduler fixes from Ingo Molnar:
"Two fixes:

   - a SCHED_DEADLINE task selection fix
   - a sched/numa related lockdep splat fix"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Check for stop task appearance when balancing happens
  sched/numa: Fix task_numa_free() lockdep splat

Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull perf fixes from Ingo Molnar:
"Two kernel side fixes:

   - an Intel uncore PMU driver potential crash fix
   - a kprobes/perf-call-graph interaction fix"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Use rdmsrl_safe() when initializing RAPL PMU
  kprobes/x86: Fix page-fault handling logic

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
"Unfortunately this contains no easter eggs, its a bit larger than I'd
  like, but I included a patch that just moves code from one file to
  another and I'd like to avoid merge conflicts with that later, so it
  makes it seem worse than it is,

  Otherwise:
   - radeon: fixes to use new microcode to stabilise some cards, use
     some common displayport code, some runtime pm fixes, pll regression
     fixes
   - i915: fix for some context oopses, a warn in a used path, backlight
     fixes
   - nouveau: regression fix
   - omap: a bunch of fixes"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (51 commits)
  drm: bochs: drop unused struct fields
  drm: bochs: add power management support
  drm: cirrus: add power management support
  drm: Split out drm_probe_helper.c from drm_crtc_helper.c
  drm/plane-helper: Don't fake-implement primary plane disabling
  drm/ast: fix value check in cbr_scan2
  drm/nouveau/bios: fix a bit shift error introduced by 457e77b
  drm/radeon/ci: make sure mc ucode is loaded before checking the size
  drm/radeon/si: make sure mc ucode is loaded before checking the size
  drm/radeon: improve PLL params if we don't match exactly v2
  drm/radeon: memory leak on bo reservation failure. v2
  drm/radeon: fix VCE fence command
  drm/radeon: re-enable mclk dpm on R7 260X asics
  drm/radeon: add support for newer mc ucode on CI (v2)
  drm/radeon: add support for newer mc ucode on SI (v2)
  drm/radeon: apply more strict limits for PLL params v2
  drm/radeon: update CI DPM powertune settings
  drm/radeon: fix runpm handling on APUs (v4)
  drm/radeon: disable mclk dpm on R7 260X
  drm/tegra: Remove gratuitous pad field
  ...

Merge branch 'drm-next-3.15-wip' of git://people.freedesktop.org/~deathsimple/linux into drm-next

Some i2c fixes over DisplayPort.

* 'drm-next-3.15-wip' of git://people.freedesktop.org/~deathsimple/linux:
  drm/radeon: Improve vramlimit module param documentation
  drm/radeon: fix audio pin counts for DCE6+ (v2)
  drm/radeon/dp: switch to the common i2c over aux code
  drm/dp/i2c: Update comments about common i2c over dp assumptions (v3)
  drm/dp/i2c: send bare addresses to properly reset i2c connections (v4)
  drm/radeon/dp: handle zero sized i2c over aux transactions (v2)
  drm/i915: support address only i2c-over-aux transactions
  drm/tegra: dp: Support address-only I2C-over-AUX transactions

Merge git://git./linux/kernel/git/davem/net

Pull more networking fixes from David Miller:

1) Fix mlx4_en_netpoll implementation, it needs to schedule a NAPI
    context, not synchronize it.  From Chris Mason.

2) Ipv4 flow input interface should never be zero, it should be
    LOOPBACK_IFINDEX instead.  From Cong Wang and Julian Anastasov.

3) Properly configure MAC to PHY connection in mvneta devices, from
    Thomas Petazzoni.

4) sys_recv should use SYSCALL_DEFINE.  From Jan Glauber.

5) Tunnel driver ioctls do not use the correct namespace, fix from
    Nicolas Dichtel.

6) Fix memory leak on seccomp filter attach, from Kees Cook.

7) Fix lockdep warning for nested vlans, from Ding Tianhong.

8) Crashes can happen in SCTP due to how the auth_enable value is
    managed, fix from Vlad Yasevich.

9) Wireless fixes from John W Linville and co.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (45 commits)
  net: sctp: cache auth_enable per endpoint
  tg3: update rx_jumbo_pending ring param only when jumbo frames are enabled
  vlan: Fix lockdep warning when vlan dev handle notification
  seccomp: fix memory leak on filter attach
  isdn: icn: buffer overflow in icn_command()
  ip6_tunnel: use the right netns in ioctl handler
  sit: use the right netns in ioctl handler
  ip_tunnel: use the right netns in ioctl handler
  net: use SYSCALL_DEFINEx for sys_recv
  net: mdio-gpio: Add support for separate MDI and MDO gpio pins
  net: mdio-gpio: Add support for active low gpio pins
  net: mdio-gpio: Use devm_ functions where possible
  ipv4, route: pass 0 instead of LOOPBACK_IFINDEX to fib_validate_source()
  ipv4, fib: pass LOOPBACK_IFINDEX instead of 0 to flowi4_iif
  mlx4_en: don't use napi_synchronize inside mlx4_en_netpoll
  net: mvneta: properly configure the MAC <-> PHY connection in all situations
  net: phy: add minimal support for QSGMII PHY
  sfc:On MCDI timeout, issue an FLR (and mark MCDI to fail-fast)
  mwifiex: fix hung task on command timeout
  mwifiex: process event before command response
  ...

Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
"A set of 5 small cifs fixes"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  cif: fix dead code
  cifs: fix error handling cifs_user_readv
  fs: cifs: remove unused variable.
  Return correct error on query of xattr on file with empty xattrs
  cifs: Wait for writebacks to complete before attempting write.

Merge tag 'char-misc-3.15-rc2' of git://git./linux/kernel/git/gregkh/char-misc

Pull char/misc driver fixes from Greg KH:
"Here are a few driver fixes for char/misc drivers that resolve
  reported issues.

  All have been in linux-next successfully for a few days"

* tag 'char-misc-3.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  Drivers: hv: vmbus: Negotiate version 3.0 when running on ws2012r2 hosts
  Tools: hv: Handle the case when the target file exists correctly
  vme_tsi148: Utilize to_pci_dev() macro
  vme_tsi148: Fix PCI address mapping assumption
  vme_tsi148: Fix typo in tsi148_slave_get()
  w1: avoid recursive device_add
  w1: fix netlink refcnt leak on error path
  misc: Grammar s/addition/additional/
  drivers: mcb: fix memory leak in chameleon_parse_cells() error path
  mei: ignore client writing state during cb completion
  mei: me: do not load the driver if the FW doesn't support MEI interface
  GenWQE: Increase driver version number
  GenWQE: Fix multithreading problems
  GenWQE: Ensure rc is not returning an uninitialized value
  GenWQE: Add wmb before DDCB is started
  GenWQE: Enable access to VPD flash area

Merge tag 'driver-core-3.15-rc2' of git://git./linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
"Here are some driver core fixes for 3.15-rc2.  Also in here are some
  documentation updates, as well as an API removal that had to wait for
  after -rc1 due to the cleanups coming into you from multiple developer
  trees (this one and the PPC tree.)

  All have been in linux next successfully"

* tag 'driver-core-3.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  drivers/base/dd.c incorrect pr_debug() parameters
  Documentation: Update stable address in Chinese and Japanese translations
  topology: Fix compilation warning when not in SMP
  Chinese: add translation of io_ordering.txt
  stable_kernel_rules: spelling/word usage
  sysfs, driver-core: remove unused {sysfs|device}_schedule_callback_owner()
  kernfs: protect lazy kernfs_iattrs allocation with mutex
  fs: Don't return 0 from get_anon_bdev

Merge tag 'staging-3.15-rc2' of git://git./linux/kernel/git/gregkh/staging

Pull staging driver fixes from Greg KH:
"Here are a few staging driver fixes for issues that have been reported
  for 3.15-rc2.

  Also dominating the diffstat for the pull request is the removal of
  the rtl8187se driver.  It's no longer needed in staging as a "real"
  driver for this hardware is now merged in the tree in the "correct"
  location in drivers/net/

  All of these patches have been tested in linux-next"

* tag 'staging-3.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: r8188eu: Fix case where ethtype was never obtained and always be checked against 0
  staging: r8712u: Fix case where ethtype was never obtained and always be checked against 0
  staging: r8188eu: Calling rtw_get_stainfo() with a NULL sta_addr will return NULL
  staging: comedi: fix circular locking dependency in comedi_mmap()
  staging: r8723au: Add missing initialization of change_inx in sort algorithm
  Staging: unisys: use after free in list_for_each()
  staging: unisys: use after free in error messages
  staging: speakup: fix misuse of kstrtol() in handle_goto()
  staging: goldfish: Call free_irq in error path
  staging: delete rtl8187se wireless driver
  staging: rtl8723au: Fix buffer overflow in rtw_get_wfd_ie()
  staging: gs_fpgaboot: remove __TIMESTAMP__ macro
  staging: vme: fix memory leak in vme_user_probe()
  staging: fpgaboot: clean up Makefile
  staging/usbip: fix store_attach() sscanf return value check
  staging/usbip: userspace - fix usbipd SIGSEGV from refresh_exported_devices()
  staging: rtl8188eu: remove spaces, correct counts to unbreak P2P ioctls
  staging/rtl8821ae: Fix OOM handling in _rtl_init_deferred_work()

Merge tag 'tty-3.15-rc2' of git://git./linux/kernel/git/gregkh/tty

Pull tty/serial driver fixes from Greg KH:
"Here are a number of small tty/serial driver fixes for 3.15-rc2.  Also
  in here are some Documentation file removals for drivers that we
  removed a long time ago, no need to keep it around any longer.

  All of these have been in linux-next for a bit"

* tag 'tty-3.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  Revert "serial: 8250, disable "too much work" messages"
  serial: amba-pl011: fix regression, causing an Oops on rmmod
  tty: Fix help text of SYNCLINK_CS
  tty: fix memleak in alloc_pid
  ttyprintk: Allow built as a module
  ttyprintk: Fix wrong tty_unregister_driver() call in the error path
  serial: 8250, disable "too much work" messages
  Documentation/serial: Delete obsolete driver documentation
  serial: omap: Fix missing pm_runtime_resume handling by simplifying code
  serial_core: Fix pm imbalance on unbind
  serial: pl011: change Rx burst size to half of trigger level
  serial: timberdale: Depend on X86_32
  serial: st-asc: Fix SysRq char handling
  Revert "serial: clps711x: Give a chance to perform useful tasks during wait loop"
  serial_core: Fix conditional start_tx on ring buffer not empty
  serial: efm32: use $vendor,$device scheme for compatible string
  serial: omap: free the wakeup settings in remove

Merge tag 'usb-3.15-rc2' of git://git./linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
"Here are a number of tiny USB fixes and new device ids for 3.15-rc2.
  Nothing major, just issues some people have reported.

  All of these have been in linux-next"

* tag 'usb-3.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  uas: fix deadlocky memory allocations
  uas: fix error handling during scsi_scan()
  uas: fix GFP_NOIO under spinlock
  uwb: adds missing error handling
  USB: cdc-acm: Remove Motorola/Telit H24 serial interfaces from ACM driver
  USB: ohci-jz4740: FEAT_POWER is a port feature, not a hub feature
  USB: ohci-jz4740: Fix uninitialized variable warning
  USB: EHCI: tegra: set txfill_tuning
  usb: ehci-platform: Return immediately from suspend if ehci_suspend fails
  usb: ehci-exynos: Return immediately from suspend if ehci_suspend fails
  USB: fix crash during hotplug of PCI USB controller card
  USB: cdc-acm: fix double usb_autopm_put_interface() in acm_port_activate()
  usb: usb-common: fix typo for usb_state_string
  USB: usb_wwan: fix handling of missing bulk endpoints
  USB: pl2303: add ids for Hewlett-Packard HP POS pole displays
  USB: cp210x: Add 8281 (Nanotec Plug & Drive)
  usb: option driver, add support for Telit UE910v2
  Revert "USB: serial: add usbid for dell wwan card to sierra.c"
  USB: serial: ftdi_sio: add id for Brainboxes serial cards

Merge branch 'akpm' (incoming from Andrew)

Merge misc fixes from Andrew Morton:
"13 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  thp: close race between split and zap huge pages
  mm: fix new kernel-doc warning in filemap.c
  mm: fix CONFIG_DEBUG_VM_RB description
  mm: use paravirt friendly ops for NUMA hinting ptes
  mips: export flush_icache_range
  mm/hugetlb.c: add cond_resched_lock() in return_unused_surplus_pages()
  wait: explain the shadowing and type inconsistencies
  Shiraz has moved
  Documentation/vm/numa_memory_policy.txt: fix wrong document in numa_memory_policy.txt
  powerpc/mm: fix ".__node_distance" undefined
  kernel/watchdog.c:touch_softlockup_watchdog(): use raw_cpu_write()
  init/Kconfig: move the trusted keyring config option to general setup
  vmscan: reclaim_clean_pages_from_list() must use mod_zone_page_state()

thp: close race between split and zap huge pages

Sasha Levin has reported two THP BUGs[1][2].  I believe both of them
have the same root cause.  Let's look to them one by one.

The first bug[1] is "kernel BUG at mm/huge_memory.c:1829!".  It's
BUG_ON(mapcount != page_mapcount(page)) in __split_huge_page().  From my
testing I see that page_mapcount() is higher than mapcount here.

I think it happens due to race between zap_huge_pmd() and
page_check_address_pmd().  page_check_address_pmd() misses PMD which is
under zap:

CPU0 CPU1
zap_huge_pmd()
  pmdp_get_and_clear()
__split_huge_page()
  anon_vma_interval_tree_foreach()
    __split_huge_page_splitting()
      page_check_address_pmd()
        mm_find_pmd()
  /*
   * We check if PMD present without taking ptl: no
   * serialization against zap_huge_pmd(). We miss this PMD,
   * it's not accounted to 'mapcount' in __split_huge_page().
   */
  pmd_present(pmd) == 0

  BUG_ON(mapcount != page_mapcount(page)) // CRASH!!!

  page_remove_rmap(page)
    atomic_add_negative(-1, &page->_mapcount)

The second bug[2] is "kernel BUG at mm/huge_memory.c:1371!".
It's VM_BUG_ON_PAGE(!PageHead(page), page) in zap_huge_pmd().

This happens in similar way:

CPU0 CPU1
zap_huge_pmd()
  pmdp_get_and_clear()
  page_remove_rmap(page)
    atomic_add_negative(-1, &page->_mapcount)
__split_huge_page()
  anon_vma_interval_tree_foreach()
    __split_huge_page_splitting()
      page_check_address_pmd()
        mm_find_pmd()
  pmd_present(pmd) == 0 /* The same comment as above */
  /*
   * No crash this time since we already decremented page->_mapcount in
   * zap_huge_pmd().
   */
  BUG_ON(mapcount != page_mapcount(page))

  /*
   * We split the compound page here into small pages without
   * serialization against zap_huge_pmd()
   */
  __split_huge_page_refcount()
VM_BUG_ON_PAGE(!PageHead(page), page); // CRASH!!!

So my understanding the problem is pmd_present() check in mm_find_pmd()
without taking page table lock.

The bug was introduced by me commit with commit 117b0791ac42. Sorry for
that. :(

Let's open code mm_find_pmd() in page_check_address_pmd() and do the
check under page table lock.

Note that __page_check_address() does the same for PTE entires
if sync != 0.

I've stress tested split and zap code paths for 36+ hours by now and
don't see crashes with the patch applied. Before it took <20 min to
trigger the first bug and few hours for second one (if we ignore
first).

[1] https://lkml.kernel.org/g/<53440991.9090001@oracle.com>
[2] https://lkml.kernel.org/g/<5310C56C.60709@oracle.com>

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Tested-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Bob Liu <lliubbo@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michel Lespinasse <walken@google.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org> [3.13+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: fix new kernel-doc warning in filemap.c

Fix new kernel-doc warning in mm/filemap.c:

Warning(mm/filemap.c:2600): Excess function parameter 'ppos' description in '__generic_file_aio_write'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: fix CONFIG_DEBUG_VM_RB description

This appears to be a copy/paste error. Update the description to
reflect extra rbtree debug and checks for the config option instead of
duplicating CONFIG_DEBUG_VM.

Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Cc: Aswin Chandramouleeswaran <aswin@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: use paravirt friendly ops for NUMA hinting ptes

David Vrabel identified a regression when using automatic NUMA balancing
under Xen whereby page table entries were getting corrupted due to the
use of native PTE operations.  Quoting him

Xen PV guest page tables require that their entries use machine
addresses if the preset bit (_PAGE_PRESENT) is set, and (for
successful migration) non-present PTEs must use pseudo-physical
addresses.  This is because on migration MFNs in present PTEs are
translated to PFNs (canonicalised) so they may be translated back
to the new MFN in the destination domain (uncanonicalised).

pte_mknonnuma(), pmd_mknonnuma(), pte_mknuma() and pmd_mknuma()
set and clear the _PAGE_PRESENT bit using pte_set_flags(),
pte_clear_flags(), etc.

In a Xen PV guest, these functions must translate MFNs to PFNs
when clearing _PAGE_PRESENT and translate PFNs to MFNs when setting
_PAGE_PRESENT.

His suggested fix converted p[te|md]_[set|clear]_flags to using
paravirt-friendly ops but this is overkill.  He suggested an alternative
of using p[te|md]_modify in the NUMA page table operations but this is
does more work than necessary and would require looking up a VMA for
protections.

This patch modifies the NUMA page table operations to use paravirt
friendly operations to set/clear the flags of interest.  Unfortunately
this will take a performance hit when updating the PTEs on
CONFIG_PARAVIRT but I do not see a way around it that does not break
Xen.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: David Vrabel <david.vrabel@citrix.com>
Tested-by: David Vrabel <david.vrabel@citrix.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Steven Noonan <steven@uplinklabs.net>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mips: export flush_icache_range

The lkdtm module performs tests against executable memory ranges, so it
needs to flush the icache for proper behaviors. Other architectures
already export this, so do the same for MIPS.

[akpm@linux-foundation.org: relocate export sites]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Sanjay Lal <sanjayl@kymasys.com>
Cc: John Crispin <blogic@openwrt.org>
Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/hugetlb.c: add cond_resched_lock() in return_unused_surplus_pages()

soft lockup in freeing gigantic hugepage fixed in commit 55f67141a892 "mm:
hugetlb: fix softlockup when a large number of hugepages are freed." can
happen in return_unused_surplus_pages(), so let's fix it.

Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

wait: explain the shadowing and type inconsistencies

Stick in a comment before someone else tries to fix the sparse warning
this generates.

Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-o2ro6f3vkxklni0bc8f7m68s@git.kernel.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Shiraz has moved

shiraz.hashim@st.com email-id doesn't exist anymore as he has left the
company. Replace ST's id with shiraz.linux.kernel@gmail.com.

It also updates .mailmap file to fix address for 'git shortlog'.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Shiraz Hashim <shiraz.linux.kernel@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Documentation/vm/numa_memory_policy.txt: fix wrong document in numa_memory_policy.txt

In document numa_memory_policy.txt, the following examples for flag
MPOL_F_RELATIVE_NODES are incorrect.

For example, consider a task that is attached to a cpuset with
mems 2-5 that sets an Interleave policy over the same set with
MPOL_F_RELATIVE_NODES.  If the cpuset's mems change to 3-7, the
interleave now occurs over nodes 3,5-6.  If the cpuset's mems
then change to 0,2-3,5, then the interleave occurs over nodes
0,3,5.

According to the comment of the patch adding flag MPOL_F_RELATIVE_NODES,
the nodemasks the user specifies should be considered relative to the
current task's mems_allowed.

(https://lkml.org/lkml/2008/2/29/428)

And according to numa_memory_policy.txt, if the user's nodemask includes
nodes that are outside the range of the new set of allowed nodes, then
the remap wraps around to the beginning of the nodemask and, if not
already set, sets the node in the mempolicy nodemask.

So in the example, if the user specifies 2-5, for a task whose
mems_allowed is 3-7, the nodemasks should be remapped the third, fourth,
fifth, sixth node in mems_allowed.  like the following:

mems_allowed:       3  4  5  6  7

relative index:     0  1  2  3  4
                    5

So the nodemasks should be remapped to 3,5-7, but not 3,5-6.

And for a task whose mems_allowed is 0,2-3,5, the nodemasks should be
remapped to 0,2-3,5, but not 0,3,5.

mems_allowed:       0  2  3  5

        relative index:     0  1  2  3
                            4  5

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

powerpc/mm: fix ".__node_distance" undefined

  CHK     include/config/kernel.release
  CHK     include/generated/uapi/linux/version.h
  CHK     include/generated/utsrelease.h
  ...
  Building modules, stage 2.
WARNING: 1 bad relocations
c0000000013d6a30 R_PPC64_ADDR64    uprobes_fetch_type_table
  WRAP    arch/powerpc/boot/zImage.pseries
  WRAP    arch/powerpc/boot/zImage.epapr
  MODPOST 1849 modules
ERROR: ".__node_distance" [drivers/block/nvme.ko] undefined!
make[1]: *** [__modpost] Error 1
make: *** [modules] Error 2
make: *** Waiting for unfinished jobs....

The reason is symbol "__node_distance" not been exported in powerpc.

Signed-off-by: Mike Qiu <qiudayu@linux.vnet.ibm.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Jesse Larrew <jlarrew@linux.vnet.ibm.com>
Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
Cc: Alistair Popple <alistair@popple.id.au>
Cc: Mike Qiu <qiudayu@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kernel/watchdog.c:touch_softlockup_watchdog(): use raw_cpu_write()

Fix:

  BUG: using __this_cpu_write() in preemptible [00000000] code: systemd-udevd/497
  caller is __this_cpu_preempt_check+0x13/0x20
  CPU: 3 PID: 497 Comm: systemd-udevd Tainted: G        W     3.15.0-rc1 #9
  Hardware name: Hewlett-Packard HP EliteBook 8470p/179B, BIOS 68ICF Ver. F.02 04/27/2012
  Call Trace:
    check_preemption_disabled+0xe1/0xf0
    __this_cpu_preempt_check+0x13/0x20
    touch_nmi_watchdog+0x28/0x40

Reported-by: Luis Henriques <luis.henriques@canonical.com>
Tested-by: Luis Henriques <luis.henriques@canonical.com>
Cc: Eric Piel <eric.piel@tremplin-utc.net>
Cc: Robert Moore <robert.moore@intel.com>
Cc: Lv Zheng <lv.zheng@intel.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

init/Kconfig: move the trusted keyring config option to general setup

The SYSTEM_TRUSTED_KEYRING config option is not in any menu, causing it
to show up in the toplevel of the kernel configuration. Fix this by
moving it under the General Setup menu.

Signed-off-by: Peter Foley <pefoley2@pefoley.com>
Cc: David Howells <dhowells@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

vmscan: reclaim_clean_pages_from_list() must use mod_zone_page_state()

Seems to be called with preemption enabled. Therefore it must use
mod_zone_page_state instead.

Signed-off-by: Christoph Lameter <cl@linux.com>
Reported-by: Grygorii Strashko <grygorii.strashko@ti.com>
Tested-by: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Santosh Shilimkar <santosh.shilimkar@ti.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

net: sctp: cache auth_enable per endpoint

Currently, it is possible to create an SCTP socket, then switch
auth_enable via sysctl setting to 1 and crash the system on connect:

Oops[#1]:
CPU: 0 PID: 0 Comm: swapper Not tainted 3.14.1-mipsgit-20140415 #1
task: ffffffff8056ce80 ti: ffffffff8055c000 task.ti: ffffffff8055c000
[...]
Call Trace:
[<ffffffff8043c4e8>] sctp_auth_asoc_set_default_hmac+0x68/0x80
[<ffffffff8042b300>] sctp_process_init+0x5e0/0x8a4
[<ffffffff8042188c>] sctp_sf_do_5_1B_init+0x234/0x34c
[<ffffffff804228c8>] sctp_do_sm+0xb4/0x1e8
[<ffffffff80425a08>] sctp_endpoint_bh_rcv+0x1c4/0x214
[<ffffffff8043af68>] sctp_rcv+0x588/0x630
[<ffffffff8043e8e8>] sctp6_rcv+0x10/0x24
[<ffffffff803acb50>] ip6_input+0x2c0/0x440
[<ffffffff8030fc00>] __netif_receive_skb_core+0x4a8/0x564
[<ffffffff80310650>] process_backlog+0xb4/0x18c
[<ffffffff80313cbc>] net_rx_action+0x12c/0x210
[<ffffffff80034254>] __do_softirq+0x17c/0x2ac
[<ffffffff800345e0>] irq_exit+0x54/0xb0
[<ffffffff800075a4>] ret_from_irq+0x0/0x4
[<ffffffff800090ec>] rm7k_wait_irqoff+0x24/0x48
[<ffffffff8005e388>] cpu_startup_entry+0xc0/0x148
[<ffffffff805a88b0>] start_kernel+0x37c/0x398
Code: dd0900b8 000330f8 0126302d <dcc60000> 50c0fff1 0047182a a48306a0
03e00008 00000000
---[ end trace b530b0551467f2fd ]---
Kernel panic - not syncing: Fatal exception in interrupt

What happens while auth_enable=0 in that case is, that
ep->auth_hmacs is initialized to NULL in sctp_auth_init_hmacs()
when endpoint is being created.

After that point, if an admin switches over to auth_enable=1,
the machine can crash due to NULL pointer dereference during
reception of an INIT chunk. When we enter sctp_process_init()
via sctp_sf_do_5_1B_init() in order to respond to an INIT chunk,
the INIT verification succeeds and while we walk and process
all INIT params via sctp_process_param() we find that
net->sctp.auth_enable is set, therefore do not fall through,
but invoke sctp_auth_asoc_set_default_hmac() instead, and thus,
dereference what we have set to NULL during endpoint
initialization phase.

The fix is to make auth_enable immutable by caching its value
during endpoint initialization, so that its original value is
being carried along until destruction. The bug seems to originate
from the very first days.

Fix in joint work with Daniel Borkmann.

Reported-by: Joshua Kinard <kumba@gentoo.org>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Tested-by: Joshua Kinard <kumba@gentoo.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-davem' of git://git./linux/kernel/git/linville/wireless

John W. Linville says:

====================
pull request: wireless 2014-04-17

Please pull this batch of fixes intended for the 3.15 stream...

For the mac80211 bits, Johannes says:

"We have a fix from Chun-Yeow to not look at management frame bitrates
that are typically really low, two fixes from Felix for AP_VLAN
interfaces, a fix from Ido to disable SMPS settings when a monitor
interface is enabled, a radar detection fix from Michał and a fix from
myself for a very old remain-on-channel bug."

For the iwlwifi bits, Emmanuel says:

"I have new device IDs and a new firmware API. These are the trivial
ones. The less trivial ones are Johannes's fix that delays the
enablement of an interrupt coalescing hardware until after association
- this fixes a few connection problems seen in the field. Eyal has a
bunch of rate control fixes. I decided to add these for 3.15 because
they fix some disconnection and packet loss scenarios which were
reported by the field. I also have a fix for a memory leak that
happens only with a very new NIC."

Along with those...

Amitkumar Karwar fixes a couple of problems relating to driver/firmware
interactions in mwifiex.

Christian Engelmayer avoids a couple of potential memory leaks in
the new rsi driver.

Eliad Peller provides a wl18xx mailbox alignment fix for problems
when using new firmware.

Frederic Danis adds a couple of missing debugging strings to the
cw1200 driver.

Geert Uytterhoeven adds a variable initialization inside of the
rsi driver.

Luciano Coelho patches the wlcore code to ignore dummy packet events
in PLT mode in order to work around a firmware bug.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

tg3: update rx_jumbo_pending ring param only when jumbo frames are enabled

The patch fixes a problem with dropped jumbo frames after usage of
'ethtool -G ... rx'.

Scenario:
1. ip link set eth0 up
2. ethtool -G eth0 rx N # <- This zeroes rx-jumbo
3. ip link set mtu 9000 dev eth0

The ethtool command set rx_jumbo_pending to zero so any received jumbo
packets are dropped and you need to use 'ethtool -G eth0 rx-jumbo N'
to workaround the issue.
The patch changes the logic so rx_jumbo_pending value is changed only if
jumbo frames are enabled (MTU > 1500).

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Acked-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vlan: Fix lockdep warning when vlan dev handle notification

When I open the LOCKDEP config and run these steps:

modprobe 8021q
vconfig add eth2 20
vconfig add eth2.20 30
ifconfig eth2 xx.xx.xx.xx

then the Call Trace happened:

[32524.386288] =============================================
[32524.386293] [ INFO: possible recursive locking detected ]
[32524.386298] 3.14.0-rc2-0.7-default+ #35 Tainted: G           O
[32524.386302] ---------------------------------------------
[32524.386306] ifconfig/3103 is trying to acquire lock:
[32524.386310]  (&vlan_netdev_addr_lock_key/1){+.....}, at: [<ffffffff814275f4>] dev_mc_sync+0x64/0xb0
[32524.386326]
[32524.386326] but task is already holding lock:
[32524.386330]  (&vlan_netdev_addr_lock_key/1){+.....}, at: [<ffffffff8141af83>] dev_set_rx_mode+0x23/0x40
[32524.386341]
[32524.386341] other info that might help us debug this:
[32524.386345]  Possible unsafe locking scenario:
[32524.386345]
[32524.386350]        CPU0
[32524.386352]        ----
[32524.386354]   lock(&vlan_netdev_addr_lock_key/1);
[32524.386359]   lock(&vlan_netdev_addr_lock_key/1);
[32524.386364]
[32524.386364]  *** DEADLOCK ***
[32524.386364]
[32524.386368]  May be due to missing lock nesting notation
[32524.386368]
[32524.386373] 2 locks held by ifconfig/3103:
[32524.386376]  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff81431d42>] rtnl_lock+0x12/0x20
[32524.386387]  #1:  (&vlan_netdev_addr_lock_key/1){+.....}, at: [<ffffffff8141af83>] dev_set_rx_mode+0x23/0x40
[32524.386398]
[32524.386398] stack backtrace:
[32524.386403] CPU: 1 PID: 3103 Comm: ifconfig Tainted: G           O 3.14.0-rc2-0.7-default+ #35
[32524.386409] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[32524.386414]  ffffffff81ffae40 ffff8800d9625ae8 ffffffff814f68a2 ffff8800d9625bc8
[32524.386421]  ffffffff810a35fb ffff8800d8a8d9d0 00000000d9625b28 ffff8800d8a8e5d0
[32524.386428]  000003cc00000000 0000000000000002 ffff8800d8a8e5f8 0000000000000000
[32524.386435] Call Trace:
[32524.386441]  [<ffffffff814f68a2>] dump_stack+0x6a/0x78
[32524.386448]  [<ffffffff810a35fb>] __lock_acquire+0x7ab/0x1940
[32524.386454]  [<ffffffff810a323a>] ? __lock_acquire+0x3ea/0x1940
[32524.386459]  [<ffffffff810a4874>] lock_acquire+0xe4/0x110
[32524.386464]  [<ffffffff814275f4>] ? dev_mc_sync+0x64/0xb0
[32524.386471]  [<ffffffff814fc07a>] _raw_spin_lock_nested+0x2a/0x40
[32524.386476]  [<ffffffff814275f4>] ? dev_mc_sync+0x64/0xb0
[32524.386481]  [<ffffffff814275f4>] dev_mc_sync+0x64/0xb0
[32524.386489]  [<ffffffffa0500cab>] vlan_dev_set_rx_mode+0x2b/0x50 [8021q]
[32524.386495]  [<ffffffff8141addf>] __dev_set_rx_mode+0x5f/0xb0
[32524.386500]  [<ffffffff8141af8b>] dev_set_rx_mode+0x2b/0x40
[32524.386506]  [<ffffffff8141b3cf>] __dev_open+0xef/0x150
[32524.386511]  [<ffffffff8141b177>] __dev_change_flags+0xa7/0x190
[32524.386516]  [<ffffffff8141b292>] dev_change_flags+0x32/0x80
[32524.386524]  [<ffffffff8149ca56>] devinet_ioctl+0x7d6/0x830
[32524.386532]  [<ffffffff81437b0b>] ? dev_ioctl+0x34b/0x660
[32524.386540]  [<ffffffff814a05b0>] inet_ioctl+0x80/0xa0
[32524.386550]  [<ffffffff8140199d>] sock_do_ioctl+0x2d/0x60
[32524.386558]  [<ffffffff81401a52>] sock_ioctl+0x82/0x2a0
[32524.386568]  [<ffffffff811a7123>] do_vfs_ioctl+0x93/0x590
[32524.386578]  [<ffffffff811b2705>] ? rcu_read_lock_held+0x45/0x50
[32524.386586]  [<ffffffff811b39e5>] ? __fget_light+0x105/0x110
[32524.386594]  [<ffffffff811a76b1>] SyS_ioctl+0x91/0xb0
[32524.386604]  [<ffffffff815057e2>] system_call_fastpath+0x16/0x1b

========================================================================

The reason is that all of the addr_lock_key for vlan dev have the same class,
so if we change the status for vlan dev, the vlan dev and its real dev will
hold the same class of addr_lock_key together, so the warning happened.

we should distinguish the lock depth for vlan dev and its real dev.

v1->v2: Convert the vlan_netdev_addr_lock_key to an array of eight elements, which
could support to add 8 vlan id on a same vlan dev, I think it is enough for current
scene, because a netdev's name is limited to IFNAMSIZ which could not hold 8 vlan id,
and the vlan dev would not meet the same class key with its real dev.

The new function vlan_dev_get_lockdep_subkey() will return the subkey and make the vlan
dev could get a suitable class key.

v2->v3: According David's suggestion, I use the subclass to distinguish the lock key for vlan dev
and its real dev, but it make no sense, because the difference for subclass in the
lock_class_key doesn't mean that the difference class for lock_key, so I use lock_depth
to distinguish the different depth for every vlan dev, the same depth of the vlan dev
could have the same lock_class_key, I import the MAX_LOCK_DEPTH from the include/linux/sched.h,
I think it is enough here, the lockdep should never exceed that value.

v3->v4: Add a huge array of locking keys will waste static kernel memory and is not a appropriate method,
we could use _nested() variants to fix the problem, calculate the depth for every vlan dev,
and use the depth as the subclass for addr_lock_key.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'rdma-for-linus' of git://git./linux/kernel/git/roland/infiniband

Pull infiniband/rdma updates from Roland Dreier:

- mostly cxgb4 fixes unblocked by the merge of some prerequisites via
   the net tree

- drop deprecated MSI-X API use.

- a couple other miscellaneous things.

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  RDMA/cxgb4: Fix over-dereference when terminating
  RDMA/cxgb4: Use uninitialized_var()
  RDMA/cxgb4: Add missing debug stats
  RDMA/cxgb4: Initialize reserved fields in a FW work request
  RDMA/cxgb4: Use pr_warn_ratelimited
  RDMA/cxgb4: Max fastreg depth depends on DSGL support
  RDMA/cxgb4: SQ flush fix
  RDMA/cxgb4: rmb() after reading valid gen bit
  RDMA/cxgb4: Endpoint timeout fixes
  RDMA/cxgb4: Use the BAR2/WC path for kernel QPs and T5 devices
  IB/mlx5: Add block multicast loopback support
  IB/mthca: Use pci_enable_msix_exact() instead of pci_enable_msix()
  IB/qib: Use pci_enable_msix_range() instead of pci_enable_msix()

ARC: Delete stale barrier.h

Commit 93ea02bb8435 ("arch: Clean up asm/barrier.h implementations")
wired generic barrier.h for ARC, but failed to delete the existing file.

In 3.15, due to rcupdate.h updates, this causes a build breakage on ARC:

      CC      arch/arc/kernel/asm-offsets.s
    In file included from include/linux/sched.h:45:0,
                     from arch/arc/kernel/asm-offsets.c:9:
    include/linux/rculist.h: In function __list_add_rcu:
    include/linux/rculist.h:54:2: error: implicit declaration of function smp_store_release [-Werror=implicit-function-declaration]
      rcu_assign_pointer(list_next_rcu(prev), new);
      ^

Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'pci-v3.15-fixes-1' of git://git./linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:
"These are fixes for a powerpc NULL pointer dereference, an OF
  interrupt mapping issue on some of the new host bridges, and a
  DesignWare iATU issue.

  Host bridge drivers
   - Fix OF interrupt mapping for DesignWare, R-Car, Tegra (Lucas Stach)
   - Fix DesignWare iATU programming (Mohit Kumar)

  Miscellaneous
    - Fix powerpc NULL dereference from list_for_each_entry() update (Mike Qiu)"

* tag 'pci-v3.15-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: tegra: Use new OF interrupt mapping when possible
  PCI: rcar: Use new OF interrupt mapping when possible
  PCI: designware: Use new OF interrupt mapping when possible
  PCI: designware: Fix iATU programming for cfg1, io and mem viewport
  PCI: designware: Fix comment for setting number of lanes
  powerpc/PCI: Fix NULL dereference in sys_pciconfig_iobase() list traversal

Merge branch 'for-linus' of git://git./linux/kernel/git/jikos/hid

Pull HID fixes from Jiri Kosina:
- fix for merge window mismerge in hid-sony, from Frank Praznik
- fix for Surface Type/Touch Cover 2 device, from Benjamin Tissoires
- quirk for ThinkPad Helix sensor hub from Stephen Chandler Paul

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
  HID: core: do not scan constant input report
  Revert "HID: microsoft: Add ID's for Surface Type/Touch Cover 2"
  HID: sensor-hub: add sensor hub quirk for ThinkPad Helix
  HID: sony: Fix cancel_work_sync mismerge

Merge tag 'sound-3.15-rc2' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"Just a copule of HD-audio device/codec-specific quirks, and a trivial
  replacement of udelay() with mdelay() in the old es18xx driver code.
  All should be safe to apply"

* tag 'sound-3.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda/realtek - Add headset Mic support for Dell machine
  ALSA: hda - add headset mic detect quirk for a Dell laptop
  ALSA: es18xx driver should use udelay error
  ALSA: hda/realtek - Add support of ALC288 codec

Merge tag 'dt-fixes-for-3.15' of git://git./linux/kernel/git/robh/linux

Pull devicetree fixes from Rob Herring:
- fix error handling in of_update_property
- fix section mismatch warnings in __reserved_mem_check_root
- add empty of_find_node_by_path for !OF builds
- add various missing binding documentation

* tag 'dt-fixes-for-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  of: add empty of_find_node_by_path() for !OF
  of: Clean up of_update_property
  DT: add vendor prefix for EBV Elektronik
  of: Fix the section mismatch warnings.
  of: Add vendor prefix for Digi International Inc.
  DT: I2C: Add trivial bindings used by kirkwood boards
  DT: Vendor: Add prefixes used by Kirkwood devices
  DT: bindings: add missing Marvell Kirkwood SoC documentation
  dt-bindings: add vendor-prefix for Newhaven Display
  of: add vendor prefix for I2SE GmbH
  of: add vendor prefix for ISEE 2007 S.L.

Merge tag 'xfs-for-linus-3.15-rc2' of git://oss.sgi.com/xfs/xfs

Pull xfs bug fixes from Dave Chinner:
"The fixes are for data corruption issues, memory corruption and
  regressions for changes merged in -rc1.

  Data corruption fixes:
   - fix a bunch of delayed allocation state mismatches
   - fix collapse/zero range bugs
   - fix a direct IO block mapping bug @ EOF

  Other fixes:
   - fix a use after free on metadata IO error
   - fix a use after free on IO error during unmount
   - fix an incorrect error sign on direct IO write errors
   - add missing O_TMPFILE inode security context initialisation"

* tag 'xfs-for-linus-3.15-rc2' of git://oss.sgi.com/xfs/xfs:
  xfs: fix tmpfile/selinux deadlock and initialize security
  xfs: fix buffer use after free on IO error
  xfs: wrong error sign conversion during failed DIO writes
  xfs: unmount does not wait for shutdown during unmount
  xfs: collapse range is delalloc challenged
  xfs: don't map ranges that span EOF for direct IO
  xfs: zeroing space needs to punch delalloc blocks
  xfs: xfs_vm_write_end truncates too much on failure
  xfs: write failure beyond EOF truncates too much data
  xfs: kill buffers over failed write ranges properly

Merge tag 'trace-fixes-v3.15-rc1' of git://git./linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
"This contains two fixes.

  The first is to remove a duplication of creating debugfs files that
  already exist and causes an error report to be printed due to the
  failure of the second creation.

  The second is a memory leak fix that was introduced in 3.14"

* tag 'trace-fixes-v3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing/uprobes: Fix uprobe_cpu_buffer memory leak
  tracing: Do not try to recreated toplevel set_ftrace_* files

of: add empty of_find_node_by_path() for !OF

Add an empty version of of_find_node_by_path().
This fixes following build error for asoc tree:
sound/soc/fsl/fsl_ssi.c: In function 'fsl_ssi_probe':
sound/soc/fsl/fsl_ssi.c:1471:2: error: implicit declaration of function 'of_find_node_by_path' [-Werror=implicit-function-declaration]
sprop = of_get_property(of_find_node_by_path("/"), "compatible", NULL);

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Alexander Shiyan <shc_work@mail.ru>
Signed-off-by: Rob Herring <robh@kernel.org>

perf/x86/intel: Use rdmsrl_safe() when initializing RAPL PMU

CPUs which should support the RAPL counters according to
Family/Model/Stepping may still issue #GP when attempting to access
the RAPL MSRs. This may happen when Linux is running under KVM and
we are passing-through host F/M/S data, for example. Use rdmsrl_safe
to first access the RAPL_POWER_UNIT MSR; if this fails, do not
attempt to use this PMU.

Signed-off-by: Venkatesh Srinivas <venkateshs@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1394739386-22260-1-git-send-email-venkateshs@google.com
Cc: zheng.z.yan@intel.com
Cc: eranian@google.com
Cc: ak@linux.intel.com
Cc: linux-kernel@vger.kernel.org
[ The patch also silently fixes another bug: rapl_pmu_init() didn't handle the memory alloc failure case previously. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>

drm: bochs: drop unused struct fields

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm: bochs: add power management support

bochs kms driver lacks power management support, thus
the vga display doesn't work any more after S3 resume.

Fix this by adding suspend and resume functions.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm: cirrus: add power management support

cirrus kms driver lacks power management support, thus
the vga display doesn't work any more after S3 resume.

Fix this by adding suspend and resume functions.
Also make the mode_set function unblank the screen.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm: Split out drm_probe_helper.c from drm_crtc_helper.c

This is leftover stuff from my previous doc round which I kinda wanted
to do but didn't yet due to rebase hell.

The modeset helpers and the probing helpers a independent and e.g.
i915 uses the probing stuff but has its own modeset infrastructure. It
hence makes to split this up. While at it add a DOC: comment for the
probing libraray.

It would be rather neat to pull some of the DocBook documenting these
two helpers into in-line DOC: comments. But unfortunately kerneldoc
doesn't support markdown or something similar to make nice-looking
documentation, so the current state is better.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/plane-helper: Don't fake-implement primary plane disabling

After thinking about this topic a bit more I've reached the conclusion
that implementing this doesn't make sense:

- The locking is all wrong: set_config(NULL) will also unlink encoders
  and connectors, but those links are protected with the mode_config
  mutex. In the ->disable_plane callback we only hold all modeset
  locks, but eventually we want to switch to just grabbing the
  per-crtc (and maybe per-plane) locks as needed, maybe based on
  ww_mutexes. Having a callback which absolutely needs all modeset
  locks is bad for this conversion.

  Note that the same isn't true for the provided ->update_plane since
  we've audited the crtc helpers to make sure that not encoder or
  connector links are changed.

- There's no way to re-enable the plane with an ->update_plane: The
  connectors/encoder links are lost and so we can't re-enable the
  CRTC. Even without that issue the driver might have reassigned some
  shared resources (as opposed to e.g. DPMS off, where drivers are not
  allowed to do that to make sure the CRTC can be enabled again).

- The semantics don't make much sense: Userspace asked to scan out
  black (or some other color if the driver supports a background
  color), not that the screen be disabled.

- Implementing proper primary plane support (i.e. actually disabling
  the primary plane without disabling the CRTC) is really simple, at
  least if all the hw needs is flipping a bit. The big task is
  auditing all the interactions with other ioctls when the CRTC is on
  but there's no primary plane (e.g. pageflips). And some of that work
  still needs to be done.

Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/ast: fix value check in cbr_scan2

this is a typo vs the ums driver, fix to check correct value.

Found initially by Coverity.

Reported-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/nouveau/bios: fix a bit shift error introduced by 457e77b

Commit 457e77b26428ab4a24998eecfb99f27fa4195397 added two checks applied to a
value received from nv_rd32(bios, 0x619f04). But after this new piece of code
is executed, the addr local variable does not hold the same value it used to
hold before the commit. Here is what is was assigned in the original code:
(u64)(nv_rd32(bios, 0x619f04) & 0xffffff00) << 8
in the committed code it ends up with this value:
(u64)(nv_rd32(bios, 0x619f04) >> 8) << 8
These expressions are obviously not equivalent.

My Nvidia video card does not show anything on the display when I boot a
kernel containing this commit.

The patch fixes the code so that the new checks are still done, but the
side effect of an incorrect addr value is gone.

Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sergei Antonov <saproj@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

Merge tag 'omapdrm-fixes-3.15' of git://git./linux/kernel/git/tomba/linux into drm-next

Fixes for omapdrm, some of which were already present in 3.14, and some which
appeared in 3.15-rc1:

- fixes for primary-plane handling which caused crashes
- fix all kinds of uninit issues which prevented from unloading the omapdrm
  module.
- fixes for HDMI enable/disable issues

* tag 'omapdrm-fixes-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux:
  drm/omap: fix the handling of fb ref counts
  drm/omap: protect omap_crtc's event with event_lock spinlock
  drm/omap: Use old_fb to synchronize between successive page flips
  drm/omap: Fix crash when using LCD3 overlay manager
  drm/omap: gem sync: wait on correct events
  drm/omap: Fix memory leak in omap_gem_op_async
  drm/omap: remove warn from debugfs
  drm/omap: remove extra plane->destroy from crtc destroy
  drm/omap: print warning when rotating non-TILER fb
  drm/omap: fix missing unref to fb's buf object
  drm/omap: fix plane rotation
  drm/omap: fix enabling/disabling of video pipeline
  drm/omap: fix missing disable for unused encoder
  drm/omap: fix race issue when unloading omapdrm
  drm/omap: fix DMM driver (un)registration
  drm/omap: fix uninit order in pdev_remove()
  drm/omap: fix output enable/disable sequence

Merge branch 'drm-fixes-3.15' of git://people.freedesktop.org/~deathsimple/linux into drm-next

1. Fixing PLL regressions
2. A couple of memory reclocking and DPM fixes
3. Small cleanups

* 'drm-fixes-3.15' of git://people.freedesktop.org/~deathsimple/linux:
  drm/radeon/ci: make sure mc ucode is loaded before checking the size
  drm/radeon/si: make sure mc ucode is loaded before checking the size
  drm/radeon: improve PLL params if we don't match exactly v2
  drm/radeon: memory leak on bo reservation failure. v2
  drm/radeon: fix VCE fence command
  drm/radeon: re-enable mclk dpm on R7 260X asics
  drm/radeon: add support for newer mc ucode on CI (v2)
  drm/radeon: add support for newer mc ucode on SI (v2)
  drm/radeon: apply more strict limits for PLL params v2
  drm/radeon: update CI DPM powertune settings
  drm/radeon: fix runpm handling on APUs (v4)
  drm/radeon: disable mclk dpm on R7 260X

Merge tag 'drm/tegra/for-3.15-rc2' of git://anongit.freedesktop.org/tegra/linux into drm-next

drm/tegra: Fixes for v3.15-rc2

This contains a fix for the host1x driver writing to non-existent syncpt
registers.

A second commit removes an excess pad field in the parameter structure
for the DRM_TEGRA_SUBMIT IOCTL. Archeaology on earlier versions of this
file indicates that this was once there to pad an uneven number of u32
u32 fields, of which one was subsequently removed. Unfortunately nobody
remembered to get rid of the padding when that happened.

Both of these commits are Cc: stable because they fix issues that were
introduced back in v3.10.

* tag 'drm/tegra/for-3.15-rc2' of git://anongit.freedesktop.org/tegra/linux:
drm/tegra: Remove gratuitous pad field
gpu: host1x: handle the correct # of syncpt regs

Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull timer fixes from Thomas Gleixner:
"Viresh unearthed the following three hickups in the timer/timekeeping
  code:

   - Negated check for the result of a clock event selection

   - A missing early exit in the jiffies update path which causes
     update_wall_time to be called for nothing causing lock contention
     and wasted cycles in the timer interrupt

   - Checking a variable in the NOHZ code enable code for true which can
     only be set by that very code after the check succeeds.  That
     results in a rock solid runtime disablement of that feature"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tick-sched: Check tick_nohz_enabled in tick_nohz_switch_to_nohz()
  tick-sched: Don't call update_wall_time() when delta is lesser than tick_period
  tick-common: Fix wrong check in tick_check_replacement()

Merge branch 'parisc-3.15' of git://git./linux/kernel/git/deller/parisc-linux

Pull parisc updates from Helge Deller:
"There are two major changes in this patchset:

  The major fix is that the epoll_pwait() syscall for 32bit userspace
  was not using the compat wrapper on a 64bit kernel.

  Secondly we changed the value of SHMLBA from 4MB to PAGE_SIZE to
  reflect that we can actually mmap to any multiple of PAGE_SIZE.  The
  only thing which needs care is that shared mmaps need to be mapped at
  the same offset inside the 4MB cache window"

* 'parisc-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: fix epoll_pwait syscall on compat kernel
  parisc: change value of SHMLBA from 0x00400000 to PAGE_SIZE
  parisc: Replace __get_cpu_var uses for address calculation

Merge branch 'ipmi' (emailed ipmi fixes)

Merge ipmi fixes from Corey Minyard:
"Things collected since last kernel release.

  Some of these are pretty important.  The first three are bug fixes.
  The next two are to hopefully make everyone happy about allowing
  ACPI to be on all the time and not have IPMI have an effect on the
  system when not in use.  The last is a little cleanup"

* emailed patches from Corey Minyard <cminyard@mvista.com>:
  ipmi: boolify some things
  ipmi: Turn off all activity on an idle ipmi interface
  ipmi: Turn off default probing of interfaces
  ipmi: Reset the KCS timeout when starting error recovery
  ipmi: Fix a race restarting the timer
  Char: ipmi_bt_sm, fix infinite loop

ipmi: boolify some things

Convert some ints to bools.

Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipmi: Turn off all activity on an idle ipmi interface

The IPMI driver would wake up periodically looking for events and
watchdog pretimeouts.  If there is nothing waiting for these events,
it's really kind of pointless to be checking for them.  So modify the
driver so the message handler can pass down if it needs the lower layer
to be waiting for these.  Modify the system interface lower layer to
turn off all timer and thread activity if the upper layer doesn't need
anything and it is not currently handling messages.  And modify the
message handler to not restart the timer if its timer is not needed.

The timers and kthread will still be enabled if:
- the SI interface is handling a message.
- a user has enabled watching for events.
- the IPMI watchdog timer is in use (since it uses pretimeouts).
- the message handler is waiting on a remote response.
- a user has registered to receive commands.

This mostly affects interfaces without interrupts.  Interfaces with
interrupts already don't use CPU in the system interface when the
interface is idle.

Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipmi: Turn off default probing of interfaces

The default probing can cause problems with some system, slow booting,
extra CPU usages, etc. Turn it off by default and give a config option
to enable it.

From: Matthew Garrett <matthew.garrett@nebula.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipmi: Reset the KCS timeout when starting error recovery

The OBF timer in KCS was not reset in one situation when error recovery
was started, resulting in an immediate timeout.

Reported-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipmi: Fix a race restarting the timer

With recent changes it is possible for the timer handler to detect an
idle interface and not start the timer, but the thread to start an
operation at the same time.  The thread will not start the timer in that
instance, resulting in the timer not running.

Instead, move all timer operations under the lock and start the timer in
the thread if it detect non-idle and the timer is not already running.
Moving under locks allows the last timeout to be set in both the thread
and the timer.  'Timer is not running' means that the timer is not
pending and smi_timeout() is not running.  So we need a flag to detect
this correctly.

Also fix a few other timeout bugs: setting the last timeout when the
interrupt has to be disabled and the timer started, and setting the last
timeout in check_start_timer_thread possibly racing with the timer

Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Char: ipmi_bt_sm, fix infinite loop

In read_all_bytes, we do

  unsigned char i;
  ...
  bt->read_data[0] = BMC2HOST;
  bt->read_count = bt->read_data[0];
  ...
  for (i = 1; i <= bt->read_count; i++)
    bt->read_data[i] = BMC2HOST;

If bt->read_data[0] == bt->read_count == 255, we loop infinitely in the
'for' loop.  Make 'i' an 'int' instead of 'char' to get rid of the
overflow and finish the loop after 255 iterations every time.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Reported-and-debugged-by: Rui Hui Dian <rhdian@novell.com>
Cc: Tomas Cech <tcech@suse.cz>
Cc: Corey Minyard <minyard@acm.org>
Cc: <openipmi-developer@lists.sourceforge.net>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>