git.openwrt.org Git - openwrt/staging/blogic.git/log

powerpc/pseries: Update NUMA VDSO information when updating CPU maps

The following patch adds vdso_getcpu_init(), which stores the NUMA node for
a cpu in SPRG3:

Commit 18ad51dd34 ("powerpc: Add VDSO version of getcpu") adds
vdso_getcpu_init(), which stores the NUMA node for a cpu in SPRG3.

This patch ensures that this information is also updated when the NUMA
affinity of a cpu changes.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/pseries: Use stop machine to update cpu maps

The new PRRN firmware feature allows CPU and memory resources to be
transparently reassigned across NUMA boundaries. When this happens, the
kernel must update the node maps to reflect the new affinity information.

Although the NUMA maps can be protected by locking primitives during the
update itself, this is insufficient to prevent concurrent accesses to these
structures. Since cpumask_of_node() hands out a pointer to these
structures, they can still be modified outside of the lock. Furthermore,
tracking down each usage of these pointers and adding locks would be quite
invasive and difficult to maintain.

The approach used is to make a list of affected cpus and call stop_machine
to have the update routine run on each of the affected cpus allowing them
to update themselves. Each cpu finds itself in the list of cpus and makes
the appropriate updates. We need to have each cpu do this for themselves to
handle calls to vdso_getcpu_init() added in a subsequent patch.

Situations like these are best handled using stop_machine(). Since the NUMA
affinity updates are exceptionally rare events, this approach has the
benefit of not adding any overhead while accessing the NUMA maps during
normal operation.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/pseries: Update CPU maps when device tree is updated

Platform events such as partition migration or the new PRRN firmware
feature can cause the NUMA characteristics of a CPU to change, and these
changes will be reflected in the device tree nodes for the affected
CPUs.

This patch registers a handler for Open Firmware device tree updates
and reconfigures the CPU and node maps whenever the associativity
changes. Currently, this is accomplished by marking the affected CPUs in
the cpu_associativity_changes_mask and allowing
arch_update_cpu_topology() to retrieve the new associativity information
using hcall_vphn().

Protecting the NUMA cpu maps from concurrent access during an update
operation will be addressed in a subsequent patch in this series.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/pseries: Update numa.c to use updated firmware_has_feature()

Update the numa code to use the updated firmware_has_feature() when checking
for type 1 affinity.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/pseries: Update firmware_has_feature() to check architecture vector 5 bits

The firmware_has_feature() function makes it easy to check for supported
features of the hypervisor. This patch extends the capability of
firmware_has_feature() to include checking for specified bits
in vector 5 of the architecture vector as reported in the device tree.

As part of this the #defines used for the architecture vector are re-defined
such that each option has the index into vector 5 and the feature bit encoded
into it. This makes checking for architecture bits when initiating data
for firmware_has_feature much easier.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/pseries: Use ARRAY_SIZE to iterate over firmware_features_table array

When iterating over the entries in firmware_features_table we only need
to go over the actual number of entries in the array instead of declaring
it to be bigger and checking to make sure there is a valid entry in every
slot.

This patch removes the FIRMWARE_MAX_FEATURES #define and replaces the
array looping with the use of ARRAY_SIZE().

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/pseries: Move architecture vector definitions to prom.h

As part of handling of PRRN events we need to check vector 5 of the
architecture vector bits reported in the device tree to ensure PRRN event
handling is enabled. To do this firmware_has_feature() is updated (in a
subsequent patch) to make this check vector 5 bits. To avoid having to
re-define bits in the architecture vector the bit definitions are moved
to prom.h.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/pseries: Add PRRN RTAS event handler

A PRRN event is signaled via the RTAS event-scan mechanism, which
returns a Hot Plug Event message "fixed part" indicating "Platform
Resource Reassignment". In response to the Hot Plug Event message,
we must call ibm,update-nodes to determine which resources were
reassigned and then ibm,update-properties to obtain the new affinity
information about those resources.

The PRRN event-scan RTAS message contains only the "fixed part" with
the "Type" field set to the value 160 and no Extended Event Log. The
four-byte Extended Event Log Length field is re-purposed (since no
Extended Event Log message is included) to pass the "scope" parameter
that causes the ibm,update-nodes to return the nodes affected by the
specific resource reassignment.

This patch adds a handler for RTAS events. The function
pseries_devicetree_update() (from mobility.c) is used to make the
ibm,update-nodes/ibm,update-properties RTAS calls. Updating the NUMA maps
(handled by a subsequent patch) will require significant processing,
so pseries_devicetree_update() is called from an asynchronous workqueue
to allow event processing to continue.

PRRN RTAS events on pseries systems are rare events that have to be
initiated from the HMC console for the system by an IBM tech. This allows
us to assume that these events are widely spaced. Additionally, all work
on the queue is flushed before handling any new work to ensure we only have
one event in flight being handled at a time.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/pseries: Correct buffer parsing in update_dt_node()

Correct parsing of the buffer returned from ibm,update-properties. The first
element is a length and the path to the property which is slightly different
from the list of properties in the buffer so we need to specifically
handle this.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/pseries: Expose pseries devicetree_update()

Newer firmware on Power systems can transparently reassign platform resources
(CPU and Memory) in use. For instance, if a processor or memory unit is
predicted to fail, the platform may transparently move the processing to an
equivalent unused processor or the memory state to an equivalent unused
memory unit. However, reassigning resources across NUMA boundaries may alter
the performance of the partition. When such reassignment is necessary, the
Platform Resource Reassignment Notification (PRRN) option provides a
mechanism to inform the Linux kernel of changes to the NUMA affinity of
its platform resources.

When rtasd receives a PRRN event, it needs to make a series of RTAS
calls (ibm,update-nodes and ibm,update-properties) to retrieve the
updated device tree information. These calls are already handled in the
pseries_devicetree_update() routine used in partition migration.

This patch exposes pseries_devicetree_update() to make it accessible
to other pseries routines, this patch also updates pseries_devicetree_update()
to take a 32-bit scope parameter. The scope value, which was previously hard
coded to 1 for partition migration, is used for the RTAS calls
ibm,update-nodes/properties to update the device tree.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: Fix hardware IRQs with MMU on exceptions when HV=0

POWER8 allows us to take interrupts with the MMU on. This gives us a
second set of vectors offset at 0x4000.

Unfortunately when coping these vectors we missed checking for MSR HV
for hardware interrupts (0x500). This results in us trying to use
HSRR0/1 when HV=0, rather than SRR0/1 on HW IRQs

The below fixes this to check CPU_FTR_HVMODE when patching the code at
0x4500.

Also we remove the check for CPU_FTR_ARCH_206 since relocation on IRQs
are only available in arch 2.07 and beyond.

Thanks to benh for helping find this.

Signed-off-by: Michael Neuling <mikey@neuling.org>
CC: <stable@vger.kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/power8: Fix secondary CPUs hanging on boot for HV=0

In __restore_cpu_power8 we determine if we are HV and if not, we return
before setting HV only resources.

Unfortunately we forgot to restore the link register from r11 before
returning.

This will happen on boot and with secondary CPUs not coming online.

This adds the missing link register restore.

Signed-off-by: Michael Neuling <mikey@neuling.org>
CC: <stable@vger.kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: Add isync to copy_and_flush

In __after_prom_start we copy the kernel down to zero in two calls to
copy_and_flush.  After the first call (copy from 0 to copy_to_here:)
we jump to the newly copied code soon after.

Unfortunately there's no isync between the copy of this code and the
jump to it.  Hence it's possible that stale instructions could still be
in the icache or pipeline before we branch to it.

We've seen this on real machines and it's results in no console output
after:
  calling quiesce...
  returning from prom_init

The below adds an isync to ensure that the copy and flushing has
completed before any branching to the new instructions occurs.

Signed-off-by: Michael Neuling <mikey@neuling.org>
CC: <stable@vger.kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: Add HWCAP2 aux entry

We are currently out of free bits in AT_HWCAP. With POWER8, we have
several hardware features that we need to advertise.

Tested on POWER and x86.

Signed-off-by: Michael Neuling <michael@neuling.org>
Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/powernv: Fix missing Kconfig dependency for MSIs

We need PPC_MSI_BITMAP support

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

Merge remote-tracking branch 'origin/master' into next

Merge upstream to get the audit fixes

powerpc/spufs: Initialise inode->i_ino in spufs_new_inode()

In commit 85fe402 (fs: do not assign default i_ino in new_inode), the
initialisation of i_ino was removed from new_inode() and pushed down
into the callers. However spufs_new_inode() was not updated.

This exhibits as no files appearing in /spu, because all our dirents
have a zero inode, which readdir() seems to dislike.

Cc: stable@vger.kernel.org
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/rtas_flash: New return code to indicate FW entitlement expiry

Add new return code to rtas_flash to indicate firmware entitlement
expiry. Strictly we don't need this update. But to keep it in sync
with PAPR, this was added.

Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/rtas_flash: Update return token comments

Add proper comment to ibm,validate-flash-image RTAS call
update result tokens.

Note: Only comment section is modified, no code change.

Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/cell: Only iterate over online nodes in cbe_init_pm_irq()

None of the cell platforms support CPU hotplug, so we should iterate
only over online nodes when setting PMU interrupts.

This also fixes a warning during boot when NODES_SHIFT is large enough:

WARNING: at /scratch/michael/src/kmk/linus/kernel/irq/irqdomain.c:766
...
NIP [c0000000000db278] .irq_linear_revmap+0x30/0x58
LR [c0000000000dc2a0] .irq_create_mapping+0x38/0x1a8
Call Trace:
[c0000003fc9c3af0] [c0000000000dc2a0] .irq_create_mapping+0x38/0x1a8 (unreliable)
[c0000003fc9c3b80] [c000000000655c1c] .__machine_initcall_cell_cbe_init_pm_irq+0x84/0x158
[c0000003fc9c3c20] [c00000000000afb4] .do_one_initcall+0x5c/0x1e0
[c0000003fc9c3cd0] [c000000000644580] .kernel_init_freeable+0x238/0x328
[c0000003fc9c3db0] [c00000000000b784] .kernel_init+0x1c/0x120
[c0000003fc9c3e30] [c000000000009fb8] .ret_from_kernel_thread+0x64/0xac

This is caused by us overflowing our linear revmap because we're
requesting too many interrupts.

Reported-by: Dennis Schridde <devurandom@gmx.net>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

Merge remote-tracking branch 'mpe/master' into next

Merge the tentative powerpc-next branch maintained by Michael
while I was on vacation

powerpc/pseries/lparcfg: Fix possible overflow are more than 1026

need set '\0' for 'local_buffer'.

SPLPAR_MAXLENGTH is 1026, RTAS_DATA_BUF_SIZE is 4096. so the contents of
rtas_data_buf may truncated in memcpy.

if contents are really truncated.
the splpar_strlen is more than 1026. the next while loop checking will
not find the end of buffer. that will cause memory access violation.

Signed-off-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/pmac/smu: Use %*ph to print small buffers

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

ptrace/powerpc: Don't flush_ptrace_hw_breakpoint() on fork()

arch_dup_task_struct() does flush_ptrace_hw_breakpoint(src), this
destroys the parent's breakpoints for no reason. We should clear
child->thread.ptrace_bps[] copied by dup_task_struct() instead.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: Add VDSO version of time

On 04/18/2013 07:38 PM, Anton Blanchard wrote:
> Since you are only reading one long you shouldn't need to check the
> update count and loop, you will always see a consistent value. The
> system call version of time() just does an unprotected load for example.

Fixed.

> With the above change and with Michael's comments covered (decent
> changelog entry and Signed-off-by):
>
> Acked-by: Anton Blanchard <anton@samba.org>

Thanks for the review, below the updated patch:

From: Adhemerval Zanella <azanella@linux.vnet.ibm.com>

This patch implement the time syscall as vDSO. The performance speedups
are:

Baseline PPC32: 380 nsec
Baseline PPC64: 350 nsec
vdso PPC32: 20 nsec
vsdo PPC64: 20 nsec

Tested on 64 bit build with both 32 bit and 64 bit userland.

Acked-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Adhemerval Zanella <azanella@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

Linux 3.9-rc8

Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull perf fixes from Ingo Molnar:
"Misc fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86: Fix offcore_rsp valid mask for SNB/IVB
perf: Treat attr.config as u64 in perf_swevent_init()

Merge branch 'vm_ioremap_memory-examples'

I'm going to do an -rc8, so I'm just going to do this rather than delay
it any further. They are arguably stable material anyway.

* vm_ioremap_memory-examples:
  mtdchar: remove no-longer-used vma helpers
  vm: convert snd_pcm_lib_mmap_iomem() to vm_iomap_memory() helper
  vm: convert fb_mmap to vm_iomap_memory() helper
  vm: convert mtdchar mmap to vm_iomap_memory() helper
  vm: convert HPET mmap to vm_iomap_memory() helper

Merge branch 'x86-kdump-for-linus' of git://git./linux/kernel/git/tip/tip

Pull kdump fixes from Peter Anvin:
"The kexec/kdump people have found several problems with the support
  for loading over 4 GiB that was introduced in this merge cycle.  This
  is partly due to a number of design problems inherent in the way the
  various pieces of kdump fit together (it is pretty horrifically manual
  in many places.)

  After a *lot* of iterations this is the patchset that was agreed upon,
  but of course it is now very late in the cycle.  However, because it
  changes both the syntax and semantics of the crashkernel option, it
  would be desirable to avoid a stable release with the broken
  interfaces."

I'm not happy with the timing, since originally the plan was to release
the final 3.9 tomorrow.  But apparently I'm doing an -rc8 instead...

* 'x86-kdump-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  kexec: use Crash kernel for Crash kernel low
  x86, kdump: Change crashkernel_high/low= to crashkernel=,high/low
  x86, kdump: Retore crashkernel= to allocate under 896M
  x86, kdump: Set crashkernel_low automatically

Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Peter Anvin:
"Three groups of fixes:

   1. Make sure we don't execute the early microcode patching if family
      < 6, since it would touch MSRs which don't exist on those
      families, causing crashes.

   2. The Xen partial emulation of HyperV can be dealt with more
      gracefully than just disabling the driver.

   3. More EFI variable space magic.  In particular, variables hidden
      from runtime code need to be taken into account too."

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, microcode: Verify the family before dispatching microcode patching
  x86, hyperv: Handle Xen emulation of Hyper-V more gracefully
  x86,efi: Implement efi_no_storage_paranoia parameter
  efi: Export efi_query_variable_store() for efivars.ko
  x86/Kconfig: Make EFI select UCS2_STRING
  efi: Distinguish between "remaining space" and actually used space
  efi: Pass boot services variable info to runtime code
  Move utf16 functions to kernel core and rename
  x86,efi: Check max_size only if it is non-zero.
  x86, efivars: firmware bug workarounds should be in platform code

Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm

Pull ARM fixes from Russell King:
"A set of fixes from various people - Will Deacon gets a prize for
  removing code this time around.  The biggest fix in this lot is
  sorting out the ARM740T mess.  The rest are relatively small fixes."

* 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
  ARM: 7699/1: sched_clock: Add more notrace to prevent recursion
  ARM: 7698/1: perf: fix group validation when using enable_on_exec
  ARM: 7697/1: hw_breakpoint: do not use __cpuinitdata for dbg_cpu_pm_nb
  ARM: 7696/1: Fix kexec by setting outer_cache.inv_all for Feroceon
  ARM: 7694/1: ARM, TCM: initialize TCM in paging_init(), instead of setup_arch()
  ARM: 7692/1: iop3xx: move IOP3XX_PERIPHERAL_VIRT_BASE
  ARM: modules: don't export cpu_set_pte_ext when !MMU
  ARM: mm: remove broken condition check for v4 flushing
  ARM: mm: fix numerous hideous errors in proc-arm740.S
  ARM: cache: remove ARMv3 support code
  ARM: tlbflush: remove ARMv3 support

Merge git://git./linux/kernel/git/davem/sparc

Pull sparc fixes from David Miller:

1) Fix race in sparc64 TLB shootdowns, we have to synchronize with the
    sibling cpus completing if we are passing them a reference via
    pointer to a data structure.

2) Fix cleaning of bitmaps in sparc32, from Akinobu Mita.

3) Fix various sparc header mistakes, some of which resulted in
    userland build breakage.  From Sam Ravnborg.

4) Kill ghost declarations and defines missed when several bits of code
    got deleted recently.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc64: Fix race in TLB batch processing.
  sparc: use asm-generic version of types.h
  bbc_i2c: fix section mismatch warning
  sparc: use generic headers
  sparc:cleanup unused code in smp_32.h
  sparc/iommu: fix typo s/265KB/256KB/
  sparc/srmmu: clear trailing edge of bitmap properly
  sparc:remove unused declaration smp_boot_cpus()

Merge git://git./linux/kernel/git/davem/net

Pull networking updates from David Miller:

1) ax88796 does 64-bit divides which causes link errors on ARM, fix
    from Arnd Bergmann.

2) Once an improper offload setting is detected on an SKB we don't rate
    limit the log message so we can very easily live lock.  From Ben
    Greear.

3) Openvswitch cannot report vport configuration changes reliably
    because it didn't preallocate the netlink notification message
    before changing state.  From Jesse Gross.

4) The effective UID/GID SCM credentials fix, from Linus.

5) When a user explicitly asks for wireless authentication, cfg80211
    isn't told about the AP detachment leaving inconsistent state.  Fix
    from Johannes Berg.

6) Fix self-MAC checks in batman-adv on multi-mesh nodes, from Antonio
    Quartulli.

7) Revert build_skb() change sin IGB driver, can result in memory
    corruption.  From Alexander Duyck.

8) Fix setting VLANs on virtual functions in IXGBE, from Greg Rose.

9) Fix TSO races in qlcnic driver, from Sritej Velaga.

10) In bnx2x the kernel driver and UNDI firmware can try to program the
    chip at the same time, resulting in corruption.  Add proper
    synchronization.  From Dmitry Kravkov.

11) Fix corruption of status block in firmware ram in bxn2x, from Ariel
    Elior.

12) Fix load balancing hash regression of bonding driver in forwarding
    configurations, from Eric Dumazet.

13) Fix TS ECR regression in TCP by calling tcp_replace_ts_recent() in
    all the right spots, from Eric Dumazet.

14) Fix several bonding bugs having to do with address manintainence,
    including not removing address when configuration operations
    encounter errors, missed locking on the address lists, missing
    refcounting on VLAN objects, etc.  All from Nikolay Aleksandrov.

15) Add workarounds for firmware bugs in LTE qmi_wwan devices, wherein
    the devices fail to add a proper ethernet header while on LTE
    networks but otherwise properly do so on 2G and 3G ones.  From Bjørn
    Mork.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (38 commits)
  net: fix incorrect credentials passing
  net: rate-limit warn-bad-offload splats.
  net: ax88796: avoid 64 bit arithmetic
  qlge: Update version to 1.00.00.32.
  qlge: Fix ethtool autoneg advertising.
  qlge: Fix receive path to drop error frames
  net: qmi_wwan: prevent duplicate mac address on link (firmware bug workaround)
  net: qmi_wwan: fixup destination address (firmware bug workaround)
  net: qmi_wwan: fixup missing ethernet header (firmware bug workaround)
  bonding: in bond_mc_swap() bond's mc addr list is walked without lock
  bonding: disable netpoll on enslave failure
  bonding: primary_slave & curr_active_slave are not cleaned on enslave failure
  bonding: vlans don't get deleted on enslave failure
  bonding: mc addresses don't get deleted on enslave failure
  pkt_sched: fix error return code in fw_change_attrs()
  irda: small read past the end of array in debug code
  tcp: call tcp_replace_ts_recent() from tcp_ack()
  netfilter: xt_rpfilter: skip locally generated broadcast/multicast, too
  netfilter: ipset: bitmap:ip,mac: fix listing with timeout
  bonding: fix l23 and l34 load balancing in forwarding path
  ...

net: fix incorrect credentials passing

Commit 257b5358b32f ("scm: Capture the full credentials of the scm
sender") changed the credentials passing code to pass in the effective
uid/gid instead of the real uid/gid.

Obviously this doesn't matter most of the time (since normally they are
the same), but it results in differences for suid binaries when the wrong
uid/gid ends up being used.

This just undoes that (presumably unintentional) part of the commit.

Reported-by: Andy Lutomirski <luto@amacapital.net>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge remote-tracking branch 'efi/urgent' into x86/urgent

Matt Fleming (1):
      x86, efivars: firmware bug workarounds should be in platform
      code

Matthew Garrett (3):
      Move utf16 functions to kernel core and rename
      efi: Pass boot services variable info to runtime code
      efi: Distinguish between "remaining space" and actually used
      space

Richard Weinberger (2):
      x86,efi: Check max_size only if it is non-zero.
      x86,efi: Implement efi_no_storage_paranoia parameter

Sergey Vlasov (2):
      x86/Kconfig: Make EFI select UCS2_STRING
      efi: Export efi_query_variable_store() for efivars.ko

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

x86, microcode: Verify the family before dispatching microcode patching

For each CPU vendor that implements CPU microcode patching, there will
be a minimum family for which this is implemented. Verify this
minimum level of support.

This can be done in the dispatch function or early in the application
functions. Doing the latter turned out to be somewhat awkward because
of the ineviable split between the BSP and the AP paths, and rather
than pushing deep into the application functions, do this in
the dispatch function.

Reported-by: "Bryan O'Donoghue" <bryan.odonoghue.lkml@nexus-software.ie>
Suggested-by: Borislav Petkov <bp@alien8.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1366392183-4149-1-git-send-email-bryan.odonoghue.lkml@nexus-software.ie

net: rate-limit warn-bad-offload splats.

If one does do something unfortunate and allow a
bad offload bug into the kernel, this the
skb_warn_bad_offload can effectively live-lock the
system, filling the logs with the same error over
and over.

Add rate limitation to this so that box remains otherwise
functional in this case.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ax88796: avoid 64 bit arithmetic

When building ax88796 on an ARM platform with 64-bit resource_size_t,
we currently get

drivers/net/ethernet/8390/ax88796.c:875: undefined reference to `__aeabi_uldivmod'

because we do a division on the length of the MMIO resource.
Since we know that this resource is very short, using an
"unsigned long" instead of "resource_size_t" is entirely
sufficient, and avoids this link-time error.

Cc: Ben Dooks <ben-linux@fluff.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlge: Update version to 1.00.00.32.

Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlge: Fix ethtool autoneg advertising.

Autoneg is supported on specific port types only. Fix the driver to advertise
autoneg based on the port type.

Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlge: Fix receive path to drop error frames

o Fix the driver to drop error frames in the receive path
o Update error counter which was not getting incremented

Signed-off-by: Sritej Velaga <sritej.velaga@qlogic.com>
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'qmi_wwan'

Bjørn Mork says:

====================
This series adds workarounds for 3 different firmware bugs, each
preventing the affected devices from working at all. I therefore
humbly request that these fixes go to stable-3.8 (if still
maintained) and 3.9 (either via net if still possible, or via
stable if not).

All 3 workarounds are applied to all devices supported by the driver.
Adding quirks for specific devices was considered as an alternative,
but was rejected because we have too little information about the
exact distribution of the buggy firmwares. All we know is that the
same bug shows up in devices from at least 3 different, and presumably
independent, vendors.

The workarounds have instead been designed to automatically apply
when necessary, and to have as little impact as possible on unaffected
devices. The series has been tested on a number of devices both with
and without these bugs.

The series should apply cleanly to net/master, net-next/master and
stable/linux-3.8.y
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: qmi_wwan: prevent duplicate mac address on link (firmware bug workaround)

We normally trust and use the CDC functional descriptors provided by a
number of devices. But some of these will erroneously list the address
reserved for the device end of the link. Attempting to use this on
both the device and host side will naturally not work.

Work around this bug by ignoring the functional descriptor and assign a
random address instead in this case.

Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: qmi_wwan: fixup destination address (firmware bug workaround)

Received packets are sometimes addressed to 00:a0:c6:00:00:00
instead of the address the device firmware should have learned
from the host:

321.224126 77.16.85.204 -> 148.122.171.134 ICMP 98 Echo (ping) request  id=0x4025, seq=64/16384, ttl=64

0000  82 c0 82 c9 f1 67 82 c0 82 c9 f1 67 08 00 45 00   .....g.....g..E.
0010  00 54 00 00 40 00 40 01 57 cc 4d 10 55 cc 94 7a   .T..@.@.W.M.U..z
0020  ab 86 08 00 62 fc 40 25 00 40 b2 bc 6e 51 00 00   ....b.@%.@..nQ..
0030  00 00 6b bd 09 00 00 00 00 00 10 11 12 13 14 15   ..k.............
0040  16 17 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 24 25   .......... !"#$%
0050  26 27 28 29 2a 2b 2c 2d 2e 2f 30 31 32 33 34 35   &'()*+,-./012345
0060  36 37                                             67

321.240607 148.122.171.134 -> 77.16.85.204 ICMP 98 Echo (ping) reply    id=0x4025, seq=64/16384, ttl=55

0000  00 a0 c6 00 00 00 02 50 f3 00 00 00 08 00 45 00   .......P......E.
0010  00 54 00 56 00 00 37 01 a0 76 94 7a ab 86 4d 10   .T.V..7..v.z..M.
0020  55 cc 00 00 6a fc 40 25 00 40 b2 bc 6e 51 00 00   U...j.@%.@..nQ..
0030  00 00 6b bd 09 00 00 00 00 00 10 11 12 13 14 15   ..k.............
0040  16 17 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 24 25   .......... !"#$%
0050  26 27 28 29 2a 2b 2c 2d 2e 2f 30 31 32 33 34 35   &'()*+,-./012345
0060  36 37                                             67

The bogus address is always the same, and matches the address
suggested by many devices as a default address.  It is likely a
hardcoded firmware default.

The circumstances where this bug has been observed indicates that
the trigger is related to timing or some other factor the host
cannot control. Repeating the exact same configuration sequence
that caused it to trigger once, will not necessarily cause it to
trigger the next time. Reproducing the bug is therefore difficult.
This opens up a possibility that the bug is more common than we can
confirm, because affected devices often will work properly again
after a reset.  A procedure most users are likely to try out before
reporting a bug.

Unconditionally rewriting the destination address if the first digit
of the received packet is 0, is considered an acceptable compromise
since we already have to inspect this digit.  The simplification will
cause unnecessary rewrites if the real address starts with 0, but this
is still better than adding additional tests for this particular case.

Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: qmi_wwan: fixup missing ethernet header (firmware bug workaround)

A number of LTE devices from different vendors all suffer from the
same firmware bug: Most of the packets received from the device while
it is attached to a LTE network will not have an ethernet header. The
devices work as expected when attached to 2G or 3G networks, sending
an ethernet header with all packets.

This driver is not aware of which network the modem attached to, and
even if it were there are still some packet types which are always
received with the header intact.

All devices supported by this driver have severely limited
networking capabilities:
- can only transmit IPv4, IPv6 and possibly ARP
- can only support a single host hardware address at any time
- will only do point-to-point communcation with the host

Because of this, we are able to reliably identify any bogus raw IP
packets by simply looking at the 4 IP version bits.  All we need to
do is to avoid 4 or 6 in the first digit of the mac address.  This
workaround ensures this, and fix up the received packets as necessary.

Given the distribution of the bug, it is believed that the source is
the chipset vendor.  The devices which are verified to be affected are:
Huawei E392u-12 (Qualcomm MDM9200)
Pantech UML290  (Qualcomm MDM9600)
Novatel USB551L (Qualcomm MDM9600)
Novatel E362    (Qualcomm MDM9600)

It is believed that the bug depend on firmware revision, which means
that possibly all devices based on the above mentioned chipset may be
affected if we consider all available firmware revisions.

The information about affected devices and versions is likely
incomplete.  As the additional overhead for packets not needing this
fixup is very small, it is considered acceptable to apply the
workaround to all devices handled by this driver.

Reported-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bonding'

Nikolay Aleksandrov says:

====================
This patch-set fixes mainly bugs on enslave failure and one occasion
of a needed locking. The patches are:

1. On enslave failure mc addresses are not flushed from the slave
2. On enslave failure vlans are not cleaned up from the slave
3. On enslave failure the bond's primary and curr_active_slave
   are not cleaned up (which might result in use of freed memory)
4. On enslave failure netpoll is not disabled which might result in
   a memory leak
5. In bond_mc_swap() the bond's mc addr list is walked without
   netif_addr_lock, since it can be called without rtnl, add it

v2: patch 01 - fix log message and remove unnecessary code move
====================

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: in bond_mc_swap() bond's mc addr list is walked without lock

Use netif_addr_lock_bh() to acquire the appropriate lock before walking.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: disable netpoll on enslave failure

slave_disable_netpoll() is not called upon enslave failure which would
lead to a memory leak. Call slave_disable_netpoll() after err_detach as
that's the first error path after enabling netpoll on that slave.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: primary_slave & curr_active_slave are not cleaned on enslave failure

On enslave failure primary_slave can point to new_slave which is to be
freed, and the same applies to curr_active_slave. So check if this is
the case and clean up properly after err_detach because that's the first
error code path after they're set.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: vlans don't get deleted on enslave failure

The main problem is with vid refcount which only gets bumped up.
Delete the vlans after err_detach as that's the first error path
after the vlans are added.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: mc addresses don't get deleted on enslave failure

Add bond_mc_list_flush() after err_detach as that's the first error path
after the addresses are added. The main issue is the mc addresses' refcount
which only gets bumped up.

v2: update log message and don't move code unnecessarily

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

pkt_sched: fix error return code in fw_change_attrs()

Fix to return -EINVAL when tb[TCA_FW_MASK] is set and head->mask != 0xFFFFFFFF
instead of 0 (ifdef CONFIG_NET_CLS_IND and tb[TCA_FW_INDEV]), as done elsewhere
in this function.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

irda: small read past the end of array in debug code

The "reason" can come from skb->data[] and it hasn't been capped so it
can be from 0-255 instead of just 0-6. For example in irlmp_state_dtr()
the code does:

reason = skb->data[3];
...
irlmp_disconnect_indication(self, reason, skb);

Also LMREASON has a couple other values which don't have entries in the
irlmp_reasons[] array. And 0xff is a valid reason as well which means
"unknown".

So far as I can see we don't actually care about "reason" except for in
the debug code.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sparc64: Fix race in TLB batch processing.

As reported by Dave Kleikamp, when we emit cross calls to do batched
TLB flush processing we have a race because we do not synchronize on
the sibling cpus completing the cross call.

So meanwhile the TLB batch can be reset (tb->tlb_nr set to zero, etc.)
and either flushes are missed or flushes will flush the wrong
addresses.

Fix this by using generic infrastructure to synchonize on the
completion of the cross call.

This first required getting the flush_tlb_pending() call out from
switch_to() which operates with locks held and interrupts disabled.
The problem is that smp_call_function_many() cannot be invoked with
IRQs disabled and this is explicitly checked for with WARN_ON_ONCE().

We get the batch processing outside of locked IRQ disabled sections by
using some ideas from the powerpc port. Namely, we only batch inside
of arch_{enter,leave}_lazy_mmu_mode() calls.  If we're not in such a
region, we flush TLBs synchronously.

1) Get rid of xcall_flush_tlb_pending and per-cpu type
   implementations.

2) Do TLB batch cross calls instead via:

smp_call_function_many()
tlb_pending_func()
__flush_tlb_pending()

3) Batch only in lazy mmu sequences:

a) Add 'active' member to struct tlb_batch
b) Define __HAVE_ARCH_ENTER_LAZY_MMU_MODE
c) Set 'active' in arch_enter_lazy_mmu_mode()
d) Run batch and clear 'active' in arch_leave_lazy_mmu_mode()
e) Check 'active' in tlb_batch_add_one() and do a synchronous
           flush if it's clear.

4) Add infrastructure for synchronous TLB page flushes.

a) Implement __flush_tlb_page and per-cpu variants, patch
   as needed.
b) Likewise for xcall_flush_tlb_page.
c) Implement smp_flush_tlb_page() to invoke the cross-call.
d) Wire up global_flush_tlb_page() to the right routine based
           upon CONFIG_SMP

5) It turns out that singleton batches are very common, 2 out of every
   3 batch flushes have only a single entry in them.

   The batch flush waiting is very expensive, both because of the poll
   on sibling cpu completeion, as well as because passing the tlb batch
   pointer to the sibling cpus invokes a shared memory dereference.

   Therefore, in flush_tlb_pending(), if there is only one entry in
   the batch perform a completely asynchronous global_flush_tlb_page()
   instead.

Reported-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com>

ARM: 7699/1: sched_clock: Add more notrace to prevent recursion

cyc_to_sched_clock() is called by sched_clock() and cyc_to_ns()
is called by cyc_to_sched_clock(). I suspect that some compilers
inline both of these functions into sched_clock() and so we've
been getting away without having a notrace marking. It seems that
my compiler isn't inlining cyc_to_sched_clock() though, so I'm
hitting a recursion bug when I enable the function graph tracer,
causing my system to crash. Marking these functions notrace fixes
it. Technically cyc_to_ns() doesn't need the notrace because it's
already marked inline, but let's just add it so that if we ever
remove inline from that function it doesn't blow up.

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

Merge tag 'fixes-for-linus' of git://git./linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
"Only one remaining fix for arm-soc platforms at this time, a small
  bugfix for cpu hotplug on highbank platforms that has become much
  easier to hit as of late.

  Details in the patch description, but it's small and well-contained
  and definitely impacts users of the platform, so 3.9 seems
  appropriate."

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: highbank: fix cache flush ordering for cpu hotplug

Merge branch 'master' of git://git./linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
If time allows, please consider pulling the following patchset contains two
late Netfilter fixes, they are:

* Skip broadcast/multicast locally generated traffic in the rpfilter,
(closes netfilter bugzilla #814), from Florian Westphal.

* Fix missing elements in the listing of ipset bitmap ip,mac set
type with timeout support enabled, from Jozsef Kadlecsik.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-davem' of git://git./linux/kernel/git/linville/wireless

John W. Linville says:

====================
A few stragglers hoping for 3.9, somewhat delayed due to my travels...

On the mac80211 bits, Johannes says:

"Sadly, I have another pull request -- the idle handling fix broke LED
handling in some cases."

and:

"Yet one more!

This fixes a fairly important/annoying bug -- when roaming between
multiple APs of the same network, the system could get stuck thinking it
was connected to the old one while it really wasn't."

On top of that...

Arend sends a brcmfmac patch that removes advertising a feature that
isn't actually fully supported, and a brcmsmac patch that rearranges
code to request firmware at IFF_UP to play more nicely with being
built into the kernel.

Felix gives us a minor ath9k_htc fix to support the newly released
open source firmware, and an ath9k_hw initvals fix to improve device
stability.

Rafał Miłecki provides a fix for an ssb regression that caused a
serious performance problem with b43.

Zefir Kurtisi offers an ath9k fix to change some kmalloc flags to
allow the DFS detector to be called in softirq context.

Please let me know if there are problems. If these don't make 3.9,
I'll just pull them into wireless-next -- just let me know if you
want to do it that way!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: call tcp_replace_ts_recent() from tcp_ack()

commit bd090dfc634d (tcp: tcp_replace_ts_recent() should not be called
from tcp_validate_incoming()) introduced a TS ecr bug in slow path
processing.

1 A > B P. 1:10001(10000) ack 1 <nop,nop,TS val 1001 ecr 200>
2 B < A . 1:1(0) ack 1 win 257 <sack 9001:10001,TS val 300 ecr 1001>
3 A > B . 1:1001(1000) ack 1 win 227 <nop,nop,TS val 1002 ecr 200>
4 A > B . 1001:2001(1000) ack 1 win 227 <nop,nop,TS val 1002 ecr 200>

(ecr 200 should be ecr 300 in packets 3 & 4)

Problem is tcp_ack() can trigger send of new packets (retransmits),
reflecting the prior TSval, instead of the TSval contained in the
currently processed incoming packet.

Fix this by calling tcp_replace_ts_recent() from tcp_ack() after the
checks, but before the actions.

Reported-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mtdchar: remove no-longer-used vma helpers

With the conversion to vm_iomap_memory(), these vma helpers are no
longer used.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

vm: convert snd_pcm_lib_mmap_iomem() to vm_iomap_memory() helper

This is my example conversion of a few existing mmap users. The pcm
mmap case is one of the more straightforward ones.

Acked-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

vm: convert fb_mmap to vm_iomap_memory() helper

This is my example conversion of a few existing mmap users. The
fb_mmap() case is a good example because it is a bit more complicated
than some: fb_mmap() mmaps one of two different memory areas depending
on the page offset of the mmap (but happily there is never any mixing of
the two, so the helper function still works).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

vm: convert mtdchar mmap to vm_iomap_memory() helper

This is my example conversion of a few existing mmap users. The mtdchar
case is actually disabled right now (and stays disabled), but I did it
because it showed up on my "git grep", and I was familiar with the code
due to fixing an overflow problem in the code in commit 9c603e53d380
("mtdchar: fix offset overflow detection").

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

vm: convert HPET mmap to vm_iomap_memory() helper

This is my example conversion of a few existing mmap users. The HPET
case is simple, widely available, and easy to test (Clemens Ladisch sent
a trivial test-program for it).

Test-program-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input

Pull input fixes from Dmitry Torokhov:
"Two more small fixups to the wacom driver"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: wacom - fix "can not retrieve extra class descriptor" for DTH2242
Input: wacom - DTH2242 Grip Pen id was off by one bit

Merge branch 'for-linus' of git://git./linux/kernel/git/mszeredi/fuse

Pull fuse build fix from Miklos Szeredi:
"This fixes android builds.  The patch appears large, but is just
  search & replace."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: fix type definitions in uapi header

Input: wacom - fix "can not retrieve extra class descriptor" for DTH2242

Same as Cintiq 24HDT, DTH2242 has two interfaces sharing one configuration.
This patch ignores the second interface.

Signed-off-by: Ping Cheng <pingc@wacom.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>

Input: wacom - DTH2242 Grip Pen id was off by one bit

Signed-off-by: Ping Cheng <pingc@wacom.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>

Merge branch 'userns-fixes' of git://git./linux/kernel/git/luto/linux

Pull user-namespace fixes from Andy Lutomirski.

* 'userns-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/luto/linux:
  userns: Changing any namespace id mappings should require privileges
  userns: Check uid_map's opener's fsuid, not the current fsuid
  userns: Don't let unprivileged users trick privileged users into setting the id_map

Merge branch 'for_linus' of git://cavan.codon.org.uk/platform-drivers-x86

Pull x86 platform driver revert from Matthew Garrett:
"It turns out that one of the hp-wmi patches this cycle breaks some
  other HP laptops.  I think we have a good idea how to work on it for
  3.10, but it's safer to just revert it for now."

* 'for_linus' of git://cavan.codon.org.uk/platform-drivers-x86:
  Revert "hp-wmi: Add support for SMBus hotkeys"

netfilter: xt_rpfilter: skip locally generated broadcast/multicast, too

Alex Efros reported rpfilter module doesn't match following packets:
IN=br.qemu SRC=192.168.2.1 DST=192.168.2.255 [ .. ]
(netfilter bugzilla #814).

Problem is that network stack arranges for the locally generated broadcasts
to appear on the interface they were sent out, so the IFF_LOOPBACK check
doesn't trigger.

As -m rpfilter is restricted to PREROUTING, we can check for existing
rtable instead, it catches locally-generated broad/multicast case, too.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Revert "hp-wmi: Add support for SMBus hotkeys"

This reverts commit fabf85e3ca15d5b94058f391dac8df870cdd427a which breaks
hotkey support on some other HP laptops. We'll try doing this differently
in 3.10.

Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com>

netfilter: ipset: bitmap:ip,mac: fix listing with timeout

The type when timeout support was enabled, could not list all elements,
just the first ones which could fit into one netlink message: it just
did not continue listing after the first message.

Reported-by: Yoann JUET <yoann.juet@univ-nantes.fr>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Tested-by: Yoann JUET <yoann.juet@univ-nantes.fr>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

bonding: fix l23 and l34 load balancing in forwarding path

Since commit 6b923cb7188d46 (bonding: support for IPv6 transmit hashing)
bonding doesn't properly hash traffic in forwarding setups.

Vitaly V. Bursov diagnosed that skb_network_header_len() returned 0 in
this case.

More generally, the transport header might not be in the skb head.

Use pskb_may_pull() & skb_header_pointer() to get it right, and use
proto_ports_offset() in bond_xmit_hash_policy_l34() to get support for
more protocols than TCP and UDP.

Reported-by: Vitaly V. Bursov <vitalyb@telenet.dn.ua>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: John Eaglesham <linux@8192.net>
Tested-by: Vitaly V. Bursov <vitalyb@telenet.dn.ua>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Fix status blocks configuration

This fixes 2 issues regarding bnx2x's status blocks:

   1. ethtool -c caused corruption of status blocks in FW RAM.

   2. when using multi-CoS, the configuration of the timeout values of
      status blocks is incorrect, harming the coalescing of interrupts
      for such CoSs.

Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Prevent UNDI FW illegal host access

When loading after UNDI (e.g., Boot from SAN) the UNDI does not
gracefully yield its resources; The bnx2x driver handles that release
itself.

During the manipulation required to release those resources, it's possible
for the UNDI to try and write to memory regions which are no longer accessible,
causing the PCI bus to prevent further writes from the chip.

This would in turn cause DMAE timeouts later on in the driver, as the driver
will be unable to use the chip's DMA engines.

This patch prevents the chip from actually writing through the PCI bus
in said scenario, thus allowing the release without the unfortunate by-product.

Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git./linux/kernel/git/linville/wireless into for-davem

Merge branch 'qlogic'

Shahed Shaikh says:

====================
This patch series contains bug fixes for -
* Loopback test failure while traffic is running.
* Tx timeout and subsequent firmware reset by removing check for
'(adapter->netdev->features & (NETIF_F_TSO | NETIF_F_TSO6)' from tx fast
path, as per Eric's suggestion.
* Typo in logs.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

qlcnic: Fix typo in logs

o Debug logs were not matching with code functionality.
o Changed dev_info to netdev_err

Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlcnic: fix TSO race condition

When driver receives a packet with gso size > 0 and when TSO is disabled,
it should be transmitted as a TSO packet to prevent Tx timeout and subsequent
firmware reset.

Signed-off-by: Sritej Velaga <sritej.velaga@qlogic.com>
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlcnic: Stop traffic before performing loopback test

Before conducting loopback test by sending packets, driver should stop transmit
queue and turn off carrier.

Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ixgbe: Fix a bug in setting VF VLAN via PF

The PF driver does not check if the administrator has already set a VF
VLAN via the PF driver before setting the new VLAN.  This results in
the following scenario:

A) Administrator sets VF <n> to VLAN 100
B) Administrator sets VF <x> to VLAN 100
C) Administrator sets VF <n> to VLAN 200
D) The VF <n> driver continues to be able to receive traffic on VLAN
   100 because the VLVFB pool enable bit for that VF was left set
   instead of being cleared as it should be.

This fix ensures that the old VLAN filter for VF <n> is first removed
and the pool bit enable for VF <n> is cleared so that it no longer
receives traffic on VLAN 100.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

igb: Revert support for build_skb in igb

This patch actually reverts:
igb: Support using build_skb in the case that jumbo frames are disabled

The reason for reverting this patch is that it can lead to data corruption.
The following flow was pointed out by Ben Hutchings:

1. skb is forwarded to another device
2. Packet headers are modified and it's put into a queue
3. Second packet is received into the other half of this page
4. Page cannot be reused, so is DMA-unmapped
5. The DMA mapping was non-coherent, so unmap copies or invalidates
cache

The headers added in step 2 get trashed in step 5.

Reported-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge

Included changes:
- fix MAC address check in case of multiple mesh interfaces

Signed-off-by: David S. Miller <davem@davemloft.net>

ARM: highbank: fix cache flush ordering for cpu hotplug

The L1 data cache flush needs to be after highbank_set_cpu_jump call which
pollutes the cache with the l2x0_lock. This causes other cores to deadlock
waiting for the l2x0_lock. Moving the flush of the entire data cache after
highbank_set_cpu_jump fixes the problem. Use flush_cache_louis instead of
flush_cache_all are that is sufficient to flush only the L1 data cache.
flush_cache_louis did not exist when highbank_cpu_die was originally
written.

With PL310 errata 769419 enabled, a wmb is inserted into idle which takes
the l2x0_lock. This makes the problem much more easily hit and causes
reset to hang.

Reported-by: Paolo Pisati <p.pisati@gmail.com>
Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: Olof Johansson <olof@lixom.net>

Revert "block: add missing block_bio_complete() tracepoint"

This reverts commit 3a366e614d0837d9fc23f78cdb1a1186ebc3387f.

Wanlong Gao reports that it causes a kernel panic on his machine several
minutes after boot. Reverting it removes the panic.

Jens says:
"It's not quite clear why that is yet, so I think we should just revert
  the commit for 3.9 final (which I'm assuming is pretty close).

  The wifi is crap at the LSF hotel, so sending this email instead of
  queueing up a revert and pull request."

Reported-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Requested-by: Jens Axboe <axboe@kernel.dk>
Cc: Tejun Heo <tj@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

x86, hyperv: Handle Xen emulation of Hyper-V more gracefully

Install the Hyper-V specific interrupt handler only when needed. This would
permit us to get rid of the Xen check. Note that when the vmbus drivers invokes
the call to register its handler, we are sure to be running on Hyper-V.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Link: http://lkml.kernel.org/r/1366299886-6399-1-git-send-email-kys@microsoft.com
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

kprobes: Fix a double lock bug of kprobe_mutex

Fix a double locking bug caused when debug.kprobe-optimization=0.
While the proc_kprobes_optimization_handler locks kprobe_mutex,
wait_for_kprobe_optimizer locks it again and that causes a double lock.
To fix the bug, this introduces different mutex for protecting
sysctl parameter and locks it in proc_kprobes_optimization_handler.
Of course, since we need to lock kprobe_mutex when touching kprobes
resources, that is done in *optimize_all_kprobes().

This bug was introduced by commit ad72b3bea744 ("kprobes: fix
wait_for_kprobe_optimizer()")

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

dmaengine: at_hdmac: fix race condition in atc_advance_work()

The BUG_ON() directive is triggered probably due to a latency
modification following inclusion of commit c10d73671ad3 ("softirq:
reduce latencies"). This condition has not been met before 3.9-rc1 and
doesn't trigger without this patch.

We now make sure that DMA channel is idle before calling
atc_complete_all() which makes the BUG_ON() "protection" useless.

Signed-off-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Acked-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

powerpc: Try to insert the hptes repeatedly in kernel_map_linear_page()

This patch fixes the following oops, which could be trigged by build the kernel
with many concurrent threads, under CONFIG_DEBUG_PAGEALLOC.

hpte_insert() might return -1, indicating that the bucket (primary here)
is full. We are not necessarily reporting a BUG in this case. Instead, we could
try repeatedly (try secondary, remove and try again) until we find a slot.

[  543.075675] ------------[ cut here ]------------
[  543.075701] kernel BUG at arch/powerpc/mm/hash_utils_64.c:1239!
[  543.075714] Oops: Exception in kernel mode, sig: 5 [#1]
[  543.075722] PREEMPT SMP NR_CPUS=16 DEBUG_PAGEALLOC NUMA pSeries
[  543.075741] Modules linked in: binfmt_misc ehea
[  543.075759] NIP: c000000000036eb0 LR: c000000000036ea4 CTR: c00000000005a594
[  543.075771] REGS: c0000000a90832c0 TRAP: 0700   Not tainted  (3.8.0-next-20130222)
[  543.075781] MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI>  CR: 22224482  XER: 00000000
[  543.075816] SOFTE: 0
[  543.075823] CFAR: c00000000004c200
[  543.075830] TASK = c0000000e506b750[23934] 'cc1' THREAD: c0000000a9080000 CPU: 1
GPR00: 0000000000000001 c0000000a9083540 c000000000c600a8 ffffffffffffffff
GPR04: 0000000000000050 fffffffffffffffa c0000000a90834e0 00000000004ff594
GPR08: 0000000000000001 0000000000000000 000000009592d4d8 c000000000c86854
GPR12: 0000000000000002 c000000006ead300 0000000000a51000 0000000000000001
GPR16: f000000003354380 ffffffffffffffff ffffffffffffff80 0000000000000000
GPR20: 0000000000000001 c000000000c600a8 0000000000000001 0000000000000001
GPR24: 0000000003354380 c000000000000000 0000000000000000 c000000000b65950
GPR28: 0000002000000000 00000000000cd50e 0000000000bf50d9 c000000000c7c230
[  543.076005] NIP [c000000000036eb0] .kernel_map_pages+0x1e0/0x3f8
[  543.076016] LR [c000000000036ea4] .kernel_map_pages+0x1d4/0x3f8
[  543.076025] Call Trace:
[  543.076033] [c0000000a9083540] [c000000000036ea4] .kernel_map_pages+0x1d4/0x3f8 (unreliable)
[  543.076053] [c0000000a9083640] [c000000000167638] .get_page_from_freelist+0x6cc/0x8dc
[  543.076067] [c0000000a9083800] [c000000000167a48] .__alloc_pages_nodemask+0x200/0x96c
[  543.076082] [c0000000a90839c0] [c0000000001ade44] .alloc_pages_vma+0x160/0x1e4
[  543.076098] [c0000000a9083a80] [c00000000018ce04] .handle_pte_fault+0x1b0/0x7e8
[  543.076113] [c0000000a9083b50] [c00000000018d5a8] .handle_mm_fault+0x16c/0x1a0
[  543.076129] [c0000000a9083c00] [c0000000007bf1dc] .do_page_fault+0x4d0/0x7a4
[  543.076144] [c0000000a9083e30] [c0000000000090e8] handle_page_fault+0x10/0x30
[  543.076155] Instruction dump:
[  543.076163] 7c630038 78631d88 e80a0000 f8410028 7c0903a6 e91f01de e96a0010 e84a0008
[  543.076192] 4e800421 e8410028 7c7107b4 7a200fe0 <0b000000> 7f63db78 48785781 60000000
[  543.076224] ---[ end trace bd5807e8d6ae186b ]---

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>

powerpc: Split the code trying to insert hpte repeatedly as an helper function

Move the logic trying to insert hpte in __hash_page_huge() to an helper
function, so it could also be used by others.

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>

powerpc: Move the setting of rflags out of loop in __hash_page_huge

It seems that new_pte and rflags don't get changed in the repeating loop, so
move their assignment out of the loop.

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>

powerpc: Set default VGA device

Add a PCI quirk for VGA devices on Power to set the default VGA device.
Ensures a default VGA is always set if a graphics adapter is present,
even if firmware did not initialize it. If more than one graphics
adapter is present, ensure the one initialized by firmware is set
as the default VGA device. This ensures that X autoconfiguration
will work.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>

pci: Set dev->dev.type in alloc_pci_dev

Set dev->dev.type in alloc_pci_dev so that archs that have their own
versions of pci_setup_device get this set properly in order to ensure
things like the boot_vga sysfs parameter get created as expected.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>

powerpc/pci: fix PCI-e devices rescan issue on powerpc platform

Powerpc initializes the DMA and IRQ information in pci_scan_child_bus()->
pcibios_fixup_bus()->pcibios_setup_bus_devices(). But for the devices
which are hotpluged, bus->is added has been set for the first scan of the
PCI-e bus, so the initialization code won't be called. Then the hotpluged
devices' driver will fail to load.

For example :
The PCI-e device 0001:03:00.0 is the Intel PCI-e e1000e network card, remove
it from the system:

    # echo 1 > /sys/bus/pci/devices/0001\:03\:00.0/remove
    # e1000e 0001:03:00.0 eth0: removed PHC

Rescan it from it's bus:

    # echo 1 > /sys/bus/pci/devices/0001\:02\:00.0/rescan
    ...
    e1000e 0001:03:00.0: Disabling ASPM L0s L1
    e1000e 0001:03:00.0: No usable DMA configuration, aborting
    e1000e: probe of 0001:03:00.0 failed with error -5

So we move the DMA & IRQ initialization code from pcibios_setup_devices() and
construct a new function pcibios_enable_device. We call this function in
pcibios_enable_device, which will be called by PCI-e rescan code. At the
meanwhile, we avoid the the impact on cardbus. I also validate this patch with
silicon's PCIe-sata which encounters the IRQ issue.

Signed-off-by: Yuanquan Chen <Yuanquan.Chen@freescale.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Hiroo Matsumoto <matsumoto.hiroo@jp.fujitsu.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>

powerpc: fix annotation of fake_numa_create_new_node()

This function has always been marked as __cpuinit, but is only called
from functions marked as __init and references an __initdata variable.
So change its annotation to __init.

Fixes this build warning:

WARNING: arch/powerpc/mm/built-in.o(.cpuinit.text+0x86): Section mismatch in reference from the function .fake_numa_create_new_node() to the variable .init.data:cmdline
The function __cpuinit .fake_numa_create_new_node() references
a variable __initdata cmdline.
If cmdline is only used by .fake_numa_create_new_node then
annotate cmdline with a matching annotation.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>

powerpc: fix numa distance for form0 device tree

The following commit breaks numa distance setup for old powerpc
systems that use form0 encoding in device tree.

commit 41eab6f88f24124df89e38067b3766b7bef06ddb
powerpc/numa: Use form 1 affinity to setup node distance

Device tree node /rtas/ibm,associativity-reference-points would
index into /cpus/PowerPCxxxx/ibm,associativity based on form0 or
form1 encoding detected by ibm,architecture-vec-5 property.

All modern systems use form1 and current kernel code is correct.
However, on older systems with form0 encoding, the numa distance
will get hard coded as LOCAL_DISTANCE for all nodes. This causes
task scheduling anomaly since scheduler will skip building numa
level domain (topmost domain with all cpus) if all numa distances
are same. (value of 'level' in sched_init_numa() will remain 0)

Prior to the above commit:
((from) == (to) ? LOCAL_DISTANCE : REMOTE_DISTANCE)

Restoring compatible behavior with this patch for old powerpc systems
with device tree where numa distance are encoded as form0.

Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>

powerpc/ptrace: Add DAWR debug feature info for userspace

This adds new debug feature information so that the DAWR can be
identified by userspace tools like GDB.

Unfortunately the DAWR doesn't sit nicely into the current description
that ptrace provides to userspace via struct ppc_debug_info. It doesn't
allow for specifying that only some ranges are possible or even the end
alignment constraints (DAWR only allows 512 byte wide ranges which can't
cross a 512 byte boundary).

After talking to Edjunior Machado (GDB ppc developer), it was decided
this was the best approach. Just mark it as debug feature DAWR and
tools like GDB can internally decide the constraints.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>

powerpc: Add accounting for Doorbell interrupts

This patch adds a new line to /proc/interrupts to account for the
doorbell interrupts that each hardware thread has received. The total
interrupt count in /proc/stat will now also include doorbells.

# cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
16:        551       1267        281        175      XICS Level     IPI
LOC:       2037       1503       1688       1625   Local timer interrupts
SPU:          0          0          0          0   Spurious interrupts
CNT:          0          0          0          0   Performance monitoring interrupts
MCE:          0          0          0          0   Machine check exceptions
DBL:         42        550         20         91   Doorbell interrupts

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>

Merge branch 'fixes' of git://git./linux/kernel/git/jesse/openvswitch

Jesse Gross says:

====================
Two small bug fixes for net/3.9 including the issue previously
discussed where allocation of netlink notifications can fail after
changes have been committed.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>