openwrt/staging/blogic.git
11 years agobe2net: use pci_vfs_assigned()/pci_num_vf() instead of be_find_vfs()
Sathya Perla [Fri, 14 Jun 2013 10:24:51 +0000 (15:54 +0530)]
be2net: use pci_vfs_assigned()/pci_num_vf() instead of be_find_vfs()

be_find_vfs() is no longer needed as the common PCI calls provide the same
functionality.

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agosit: fix an oops when IFLA_IPTUN_PROTO is not set
Nicolas Dichtel [Wed, 19 Jun 2013 10:03:13 +0000 (12:03 +0200)]
sit: fix an oops when IFLA_IPTUN_PROTO is not set

The use of this attribute has been added in 32b8a8e59c9c (sit: add IPv4 over
IPv4 support). It is optional, by default proto is IPPROTO_IPV6.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: sock: adapt SOCK_MIN_RCVBUF and SOCK_MIN_SNDBUF
Daniel Borkmann [Wed, 19 Jun 2013 10:51:20 +0000 (12:51 +0200)]
net: sock: adapt SOCK_MIN_RCVBUF and SOCK_MIN_SNDBUF

The current situation is that SOCK_MIN_RCVBUF is 2048 + sizeof(struct sk_buff))
while SOCK_MIN_SNDBUF is 2048. Since in both cases, skb->truesize is used for
sk_{r,w}mem_alloc accounting, we should have both sizes adjusted via defining a
TCP_SKB_MIN_TRUESIZE.

Further, as Eric Dumazet points out, the minimal skb truesize in transmit path is
SKB_TRUESIZE(2048) after commit f07d960df33c5 ("tcp: avoid frag allocation for
small frames"), and tcp_sendmsg() tries to limit skb size to half the congestion
window, meaning we try to build two skbs at minimum. Thus, having SOCK_MIN_SNDBUF
as 2048 can hit a small regression for some applications setting to low
SO_SNDBUF / SO_RCVBUF. Note that we define a TCP_SKB_MIN_TRUESIZE, because
SKB_TRUESIZE(2048) adds SKB_DATA_ALIGN(sizeof(struct skb_shared_info)), but in
case of TCP skbs, the skb_shared_info is part of the 2048 bytes allocation for
skb->head.

The minor adaption in sk_stream_moderate_sndbuf() is to silence a warning by
using a typed max macro, as similarly done in SOCK_MIN_RCVBUF occurences, that
would appear otherwise.

Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoneigh: disallow un-init_net to change thresh of neigh
Gao feng [Thu, 20 Jun 2013 02:01:34 +0000 (10:01 +0800)]
neigh: disallow un-init_net to change thresh of neigh

thresh and interval are global resources,
only init net can change them.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoneigh: only allow init_net to change the default neigh_parms
Gao feng [Thu, 20 Jun 2013 02:01:33 +0000 (10:01 +0800)]
neigh: only allow init_net to change the default neigh_parms

Though we don't export the /proc/sys/net/ipv[4,6]/neigh/default/
directory to the un-init_net, but we can still use cmd such as
"ip ntable change name arp_cache locktime 129" to change the locktime
of default neigh_parms.

This patch disallows the un-init_net to find out the neigh_table.parms.
So the un-init_net will failed to influence the init_net.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoneigh: no need to call lookup_neigh_parms in neigh_parms_alloc
Gao feng [Thu, 20 Jun 2013 02:01:32 +0000 (10:01 +0800)]
neigh: no need to call lookup_neigh_parms in neigh_parms_alloc

neigh_table.parms always exist and is initialized,kmemdup
can use it to create new neigh_parms, actually lookup_neigh_parms
here will return neigh_table.parms too.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobnx2x: replace mechanism to check for next available packet
Dmitry Kravkov [Tue, 18 Jun 2013 22:36:05 +0000 (01:36 +0300)]
bnx2x: replace mechanism to check for next available packet

Check next packet availability by validating that HW has finished CQE
placement. This saves latency of another dma transaction performed to update
SB indexes.

Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobnx2x: add support for ndo_ll_poll
Dmitry Kravkov [Tue, 18 Jun 2013 22:36:04 +0000 (01:36 +0300)]
bnx2x: add support for ndo_ll_poll

Adds ndo_ll_poll method and locking for FPs between LL and the napi.

When receiving a packet we use skb_mark_ll to record the napi it came from.
Add each napi to the napi_hash right after netif_napi_add().

Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet/mlx4_en: Low Latency recv statistics
Amir Vadai [Tue, 18 Jun 2013 13:18:28 +0000 (16:18 +0300)]
net/mlx4_en: Low Latency recv statistics

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet/mlx4_en: Add Low Latency Socket (LLS) support
Amir Vadai [Tue, 18 Jun 2013 13:18:27 +0000 (16:18 +0300)]
net/mlx4_en: Add Low Latency Socket (LLS) support

Add basic support for LLS.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoopenvswitch: gre tunneling support.
David S. Miller [Thu, 20 Jun 2013 01:07:49 +0000 (18:07 -0700)]
openvswitch: gre tunneling support.

Pravin B Shelar says:

====================
Following patch series adds support for gre tunneling.
First six patches extend kernel gre and ip_tunnel modules
api so that there is more code sharing between gre modules
and ovs. Rest of patches adds ovs tunneling infrastructre
and gre protocol vport.

V2 fixes two patches according to comments from Jesse.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoopenvswitch: Add gre tunnel support.
Pravin B Shelar [Tue, 18 Jun 2013 00:50:33 +0000 (17:50 -0700)]
openvswitch: Add gre tunnel support.

Add gre vport implementation.  Most of gre protocol processing
is pushed to gre module. It make use of gre demultiplexer
therefore it can co-exist with linux device based gre tunnels.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoopenvswitch: Optimize flow key match for non tunnel flows.
Pravin B Shelar [Tue, 18 Jun 2013 00:50:28 +0000 (17:50 -0700)]
openvswitch: Optimize flow key match for non tunnel flows.

Following patch adds start offset for sw_flow-key, so that we can
skip tunneling information in key for non-tunnel flows.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoopenvswitch: Expand action buffer size.
Pravin B Shelar [Tue, 18 Jun 2013 00:50:23 +0000 (17:50 -0700)]
openvswitch: Expand action buffer size.

MAX_ACTIONS_BUFSIZE limits action list size, set tunnel action
needs extra space on action list, for now increase max actions list limit.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoopenvswitch: Add tunneling interface.
Pravin B Shelar [Tue, 18 Jun 2013 00:50:18 +0000 (17:50 -0700)]
openvswitch: Add tunneling interface.

Add ovs tunnel interface for set tunnel action for userspace.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoopenvswitch: Copy individual actions.
Pravin B Shelar [Tue, 18 Jun 2013 00:50:12 +0000 (17:50 -0700)]
openvswitch: Copy individual actions.

Rather than validating actions and then copying all actiaons
in one block, following patch does same operation in single pass.
This validate and copy action one by one. This is required for
ovs tunneling patch.

This patch does not change any functionality.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoip_tunnel: Add dont fragment flag.
Pravin B Shelar [Tue, 18 Jun 2013 00:50:07 +0000 (17:50 -0700)]
ip_tunnel: Add dont fragment flag.

This flag will be used by ovs tunneling.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoip_tunnel: push generic protocol handling to ip_tunnel module.
Pravin B Shelar [Tue, 18 Jun 2013 00:50:02 +0000 (17:50 -0700)]
ip_tunnel: push generic protocol handling to ip_tunnel module.

Process skb tunnel header before sending packet to protocol handler.
this allows code sharing between gre and ovs gre modules.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoip_tunnels: extend iptunnel_xmit()
Pravin B Shelar [Tue, 18 Jun 2013 00:49:56 +0000 (17:49 -0700)]
ip_tunnels: extend iptunnel_xmit()

Refactor various ip tunnels xmit functions and extend iptunnel_xmit()
so that there is more code sharing.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agogre: export gre_handle_offloads() function.
Pravin B Shelar [Tue, 18 Jun 2013 00:49:51 +0000 (17:49 -0700)]
gre: export gre_handle_offloads() function.

This is required for OVS GRE offloading.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agogre: export gre_build_header() function.
Pravin B Shelar [Tue, 18 Jun 2013 00:49:45 +0000 (17:49 -0700)]
gre: export gre_build_header() function.

This is required for ovs gre module.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agogre: Allow multiple protocol listener for gre protocol.
Pravin B Shelar [Tue, 18 Jun 2013 00:49:38 +0000 (17:49 -0700)]
gre: Allow multiple protocol listener for gre protocol.

Currently there is only one user is allowed to register for gre
protocol.  Following patch adds de-multiplexer.  So that multiple
modules can listen on gre protocol e.g. kernel gre devices and ovs.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agogre: Simplify gre protocol registration locking.
Pravin B Shelar [Tue, 18 Jun 2013 00:49:32 +0000 (17:49 -0700)]
gre: Simplify gre protocol registration locking.

Use cmpxchg() for atomic protocol registration which saves
code and data space.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agosh_eth: get R8A7740 Rx descriptor word 0 shift out of #ifdef
Sergei Shtylyov [Thu, 13 Jun 2013 18:12:45 +0000 (22:12 +0400)]
sh_eth: get R8A7740 Rx descriptor word 0 shift out of #ifdef

The only R8A7740 specific #ifdef hindering  ARM multiplatform build is  left in
sh_eth_rx(): it covers the code shifting Rx buffer descriptor word 0 by 16. Get
rid of the #ifdef by adding 'shift_rd0' field to the  'struct sh_eth_cpu_data',
making the shift dependent on it, and setting it to 1 for the R8A7740 case...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agosh_eth: cleanup 'enum TD_STS_BIT'
Sergei Shtylyov [Wed, 19 Jun 2013 22:26:14 +0000 (02:26 +0400)]
sh_eth: cleanup 'enum TD_STS_BIT'

Fix the comment to 'enum TD_STS_BIT', reformat the values, and add a couple of
values missing before (though unused by the driver).

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agosh_eth: remove redundant bits from 'eesipr_value' field initializer
Sergei Shtylyov [Wed, 19 Jun 2013 22:24:54 +0000 (02:24 +0400)]
sh_eth: remove redundant bits from 'eesipr_value' field initializer

For SH7724 'eesipr_value' field initializer includes DMAC_M_RFRMER & DMAC_M_ECI
bits which are already contained in 0x01ff009f -- remove them.

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agosh_eth: remove 'tx_error_check' field of 'struct sh_eth_cpu_data'
Sergei Shtylyov [Wed, 19 Jun 2013 22:22:56 +0000 (02:22 +0400)]
sh_eth: remove 'tx_error_check' field of 'struct sh_eth_cpu_data'

The 'tx_error_check' field of 'struct sh_eth_cpu_data' is write-only, so remove
it along with the DEFAULT_TX_ERROR_CHECK macro.

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agosh_eth: add NAPI support
Sergei Shtylyov [Wed, 19 Jun 2013 19:30:23 +0000 (23:30 +0400)]
sh_eth: add NAPI support

The driver hasn't used NAPI so far; implement its support at last...

The patch was tested on Renesas R8A77781 BOCK-W board.

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agosh_eth: define/use EESR_RX_CHECK macro
Sergei Shtylyov [Wed, 19 Jun 2013 19:29:23 +0000 (23:29 +0400)]
sh_eth: define/use EESR_RX_CHECK macro

sh_eth_interrupt() uses the same Rx interrupt mask twice to check the interrupt
status register -- #define EESR_RX_CHECK  and use it instead.

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
David S. Miller [Wed, 19 Jun 2013 23:49:39 +0000 (16:49 -0700)]
Merge git://git./linux/kernel/git/davem/net

Conflicts:
drivers/net/wireless/ath/ath9k/Kconfig
drivers/net/xen-netback/netback.c
net/batman-adv/bat_iv_ogm.c
net/wireless/nl80211.c

The ath9k Kconfig conflict was a change of a Kconfig option name right
next to the deletion of another option.

The xen-netback conflict was overlapping changes involving the
handling of the notify list in xen_netbk_rx_action().

Batman conflict resolution provided by Antonio Quartulli, basically
keep everything in both conflict hunks.

The nl80211 conflict is a little more involved.  In 'net' we added a
dynamic memory allocation to nl80211_dump_wiphy() to fix a race that
Linus reported.  Meanwhile in 'net-next' the handlers were converted
to use pre and post doit handlers which use a flag to determine
whether to hold the RTNL mutex around the operation.

However, the dump handlers to not use this logic.  Instead they have
to explicitly do the locking.  There were apparent bugs in the
conversion of nl80211_dump_wiphy() in that we were not dropping the
RTNL mutex in all the return paths, and it seems we very much should
be doing so.  So I fixed that whilst handling the overlapping changes.

To simplify the initial returns, I take the RTNL mutex after we try
to allocate 'tb'.

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wirel...
John W. Linville [Wed, 19 Jun 2013 19:24:36 +0000 (15:24 -0400)]
Merge branch 'master' of git://git./linux/kernel/git/linville/wireless into for-davem

11 years agoMerge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
John W. Linville [Wed, 19 Jun 2013 18:50:44 +0000 (14:50 -0400)]
Merge branch 'for-john' of git://git./linux/kernel/git/jberg/mac80211

11 years agonl80211: fix attrbuf access race by allocating a separate one
Johannes Berg [Wed, 19 Jun 2013 08:09:57 +0000 (10:09 +0200)]
nl80211: fix attrbuf access race by allocating a separate one

Since my commit 3713b4e364 ("nl80211: allow splitting wiphy
information in dumps"), nl80211_dump_wiphy() uses the global
nl80211_fam.attrbuf for parsing the incoming data. This wouldn't
be a problem if it only did so on the first dump iteration which
is locked against other commands in generic netlink, but due to
space constraints in cb->args (the needed state doesn't fit) I
decided to always parse the original message. That's racy though
since nl80211_fam.attrbuf could be used by some other parsing in
generic netlink concurrently.

For now, fix this by allocating a separate parse buffer (it's a
bit too big for the stack, currently 1448 bytes on 64-bit). For
-next, I'll change the code to parse into the global buffer in
the first round only and then allocate a smaller buffer to keep
the data in cb->args.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 years agovxlan: fix check for migration of static entry
stephen hemminger [Tue, 18 Jun 2013 21:27:01 +0000 (14:27 -0700)]
vxlan: fix check for migration of static entry

The check introduced by:
commit 26a41ae604381c5cc0caf1c3261ca6b298b5fe69
Author: stephen hemminger <stephen@networkplumber.org>
Date:   Mon Jun 17 12:09:58 2013 -0700

    vxlan: only migrate dynamic FDB entries

was not correct because it is checking flag about type of FDB
entry, rather than the state (dynamic versus static). The confusion
arises because vxlan is reusing values from bridge, and bridge is
reusing values from neighbour table, and easy to get lost in translation.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobcm63xx_enet: fix return value check in bcm_enet_shared_probe()
Wei Yongjun [Wed, 19 Jun 2013 02:32:32 +0000 (10:32 +0800)]
bcm63xx_enet: fix return value check in bcm_enet_shared_probe()

In case of error, the function devm_ioremap_resource() returns ERR_PTR()
and never returns NULL. The NULL test in the return value check should
be replaced with IS_ERR().

Introduce by commit 0ae99b5fede6f3a8d252d50bb4aba29544295219
(bcm63xx_enet: split DMA channel register accesses)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agocan: usb_8dev: unregister netdev before free()ing
Marc Kleine-Budde [Tue, 18 Jun 2013 12:33:58 +0000 (14:33 +0200)]
can: usb_8dev: unregister netdev before free()ing

The usb_8dev hardware has problems on some xhci USB hosts. The driver fails to
read the firmware revision in the probe function. This leads to the following
Oops:

    [ 3356.635912] kernel BUG at net/core/dev.c:5701!

The driver tries to free the netdev, which has already been registered, without
unregistering it.

This patch fixes the problem by unregistering the netdev in the error path.

Reported-by: Michael Olbrich <m.olbrich@pengutronix.de>
Reviewed-by: Bernd Krumboeck <krumboeck@universalnet.at>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
11 years agoipv6: ndisc: fix ndisc_send_redirect writing to the wrong skb
Matthias Schiffer [Fri, 31 May 2013 01:27:55 +0000 (03:27 +0200)]
ipv6: ndisc: fix ndisc_send_redirect writing to the wrong skb

Since some refactoring in 5f5a011, ndisc_send_redirect called
ndisc_fill_redirect_hdr_option on the wrong skb, leading to data corruption or
in the worst case a panic when the skb_put failed.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agostaging/rtl8192u: convert skb->tail into skb_tail_pointer(skb)
Isaku Yamahata [Fri, 14 Jun 2013 08:58:35 +0000 (17:58 +0900)]
staging/rtl8192u: convert skb->tail into skb_tail_pointer(skb)

The change set of 7a884dc "[SK_BUFF]: Convert skb->tail to sk_buff_data_t"
converted skb->tail from pointer into sk_buff_data_t.
Thus skb->tail is not always pointer, the area pointed by skb->tail
should be accessed via skb_tail_pointer().

Found by inspection. Compile tested only.

Cc: Simon Horman <horms@verge.net.au>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: devel@driverdev.osuosl.org
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Reviewed-by: Simon Horman <horms@verge.net.au>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agopxa168_eth: convert skb->end into skb_end_pointer(skb)
Isaku Yamahata [Fri, 14 Jun 2013 08:58:34 +0000 (17:58 +0900)]
pxa168_eth: convert skb->end into skb_end_pointer(skb)

The change set of 4305b541, "[SK_BUFF]: Convert skb->end to sk_buff_data_t"
converted skb->end from pointer type to sk_buff_data_t.
The pointed value should be accessed via skb_end_pointer().

Since arm arch doesn't define NET_SKBUFF_DATA_USES_OFFSET,
skb->end is effectively pointer. So it doesn't cause a real problem.
But this patch is good for consistency.

Found by inspection. Compile tested only.

Cc: Simon Horman <horms@verge.net.au>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Reviewed-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agomv643xx_eth.c: convert skb->end into skb_end_poitner(skb)
Isaku Yamahata [Fri, 14 Jun 2013 08:58:33 +0000 (17:58 +0900)]
mv643xx_eth.c: convert skb->end into skb_end_poitner(skb)

The change set of 4305b541 "[SK_BUFF]: Convert skb->end to sk_buff_data_t"
converted skb->end from pointer to sk_buff_data_t.
The pointed value should be accessed via skb_end_pointer().

Since arm or ppc arch doesn't define NET_SKBUFF_DATA_USES_OFFSET,
skb->end is effectively pointer. So it doesn't cause a real problem.
But this patch is good for consistency.

Found by inspection. Compile test only.

Cc: Simon Horman <horms@verge.net.au>
Cc: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Reviewed-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet, scsi/csgb4i: convert skb->transport_header into skb_transport_header(skb)
Isaku Yamahata [Fri, 14 Jun 2013 08:58:32 +0000 (17:58 +0900)]
net, scsi/csgb4i: convert skb->transport_header into skb_transport_header(skb)

The change set of 1a37e412, "net: Use 16bits for *_headers fields
of struct skbuff" converted from sk_buff_data_t into 16bit integer.
So skb->tail needs to be converted to skb_tail_pointer(skb).

Found by inspection. Compile tested only.

Cc: Simon Horman <horms@verge.net.au>
Cc: Li RongQing <roy.qing.li@gmail.com>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet, atm/ambassader: convert skb->tail into skb_tail_pointer(skb)
Isaku Yamahata [Fri, 14 Jun 2013 08:58:31 +0000 (17:58 +0900)]
net, atm/ambassader: convert skb->tail into skb_tail_pointer(skb)

The change set of 27a884dc, "[SK_BUFF]: Convert skb->tail to sk_buff_data_t"
converted skb->tail from pointer into sk_buff_data_t. It missed skb->tail
in drivers/atm/ambassador.c.
This patch converts skb->tail into skb_tail_pointer(skb).

Found by inspection. Compile tested only.

Cc: Simon Horman <horms@verge.net.au>
Cc: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Reviewed-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobridge: fix switched interval for MLD Query types
Linus Lüssing [Sun, 16 Jun 2013 21:20:34 +0000 (23:20 +0200)]
bridge: fix switched interval for MLD Query types

General Queries (the one with the Multicast Address field
set to zero / '::') are supposed to have a Maximum Response Delay
of [Query Response Interval], while for Multicast-Address-Specific
Queries it is [Last Listener Query Interval] - not the other way
round. (see RFC2710, section 7.3+7.8)

Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agovlan: restore ethtool ABI to control VLAN hardware acceleration
Fernando Luis Vazquez Cao [Mon, 17 Jun 2013 02:28:03 +0000 (11:28 +0900)]
vlan: restore ethtool ABI to control VLAN hardware acceleration

As part of the push to add 802.1ad server provider tagging support to the
kernel the VLAN features flags were renamed. Unfortunately the kernel name
for the VLAN hardware acceleration features that the kernel shows user space
was included in the rename, which broke ethtool (txvlan and rxvlan options
do not work). This patch restores the original names, i.e. the original ABI.
If we wanted to make clear to users that we are refering to CTAGs we can
always change ethtool's short_name and long_name for these features (for
example something along the lines of txvlan -> txvlan-ctag, tx-vlan-offload ->
tx-vlan-ctag-offload).

Cc: Patrick McHardy <kaber@trash.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: sctp: remove SCTP_STATIC macro
Daniel Borkmann [Mon, 17 Jun 2013 09:40:05 +0000 (11:40 +0200)]
net: sctp: remove SCTP_STATIC macro

SCTP_STATIC is just another define for the static keyword. It's use
is inconsistent in the SCTP code anyway and it was introduced in the
initial implementation of SCTP in 2.5. We have a regression suite in
lksctp-tools, but this is for user space only, so noone makes use of
this macro anymore. The kernel test suite for 2.5 is incompatible with
the current SCTP code anyway.

So simply Remove it, to be more consistent with the rest of the kernel
code.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: sctp: get rid of t_new macro for kzalloc
Daniel Borkmann [Mon, 17 Jun 2013 09:40:04 +0000 (11:40 +0200)]
net: sctp: get rid of t_new macro for kzalloc

t_new rather obfuscates things where everyone else is using actual
function names instead of that macro, so replace it with kzalloc,
which is the function t_new wraps.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agofec: Add support to restart autonegotiate
Chris Healy [Mon, 17 Jun 2013 14:25:06 +0000 (07:25 -0700)]
fec: Add support to restart autonegotiate

Add ethtool operation to restart autonegotiation via the PHY.

Tested on i.MX28EVK.

Signed-off-by: Chris Healy <cphealy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobonding: don't call alb_set_slave_mac_addr() while atomic
Veaceslav Falico [Mon, 17 Jun 2013 17:30:35 +0000 (19:30 +0200)]
bonding: don't call alb_set_slave_mac_addr() while atomic

alb_set_slave_mac_addr() sets the mac address in alb mode via
dev_set_mac_address(), which might sleep. It's called from
alb_handle_addr_collision_on_attach() in atomic context (under
read_lock(bond->lock)), thus triggering a bug.

Fix this by moving the lock inside alb_handle_addr_collision_on_attach().

v1->v2:
As Nikolay Aleksandrov noticed, we can drop the bond->lock completely.
Also, use bond_slave_has_mac(), when possible.

Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: cpsw: check for cpts pointer after its allocation
Sebastian Siewior [Mon, 17 Jun 2013 17:31:52 +0000 (19:31 +0200)]
net: cpsw: check for cpts pointer after its allocation

after priv->cpts got allocated then this pointer should check to determine
if the allocation succeeded or not.

Cc: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoMerge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville...
David S. Miller [Mon, 17 Jun 2013 23:15:51 +0000 (16:15 -0700)]
Merge branch 'for-davem' of git://git./linux/kernel/git/linville/wireless into wireless

John W. Linville says:

====================
This will probably be the last batch of wireless fixes intended
for 3.10.  Many of these are one- or two-liners, and a couple of
others are mostly relocating existing code to avoid races or to
limit the code to effecting specific hardware, etc.

The mac80211 fixes have a couple of exceptions to the above.
Regarding those, Johannes says:

"Following davem's complaint about my patch, here's a new pull request
w/o the patch he was complaining about, but instead with the const
fix rolled into the fix.

I have a fix for radar detection, one for rate control and a workaround
for broken HT APs which is a regression fix because we didn't rely
on them to be correct before."

Johannes also sends some iwlwifi fixes:

"I picked up Nikolay's patch for the chain noise calibration bug
that seems to have been there forever, a fix from Emmanuel for
setting TX flags on BAR frames and a fix of my own to avoid printing
request_module() errors if the kernel isn't even modular. We also
have our own version of Stanislaw's fix for rate control."

Along with those...

Anderson Lizardo fixes a Bluetooth memory corruption bug when an MTU
value is set to too small of a value.

Arend van Spriel sends a revised brcmsmac bug that fixes a regression
caused by a bad return value in an earlier patch.  He also sends a
brcmfmac fix to avoid an oops when loading the driver at boot.

Daniel Drake fixes a race condition in btmrvl that causes hangs on
suspend for OLPC hardware.

Johan Hedberg adds a check to avoid sending a
HCI_Delete_Stored_Link_Key command to devices that don't support them,
avoiding some scary looking log spam.

Stanislaw Gruszka gives us a fix for iwlegacy to be able to use rates
higher than 1Mb/s on older wireless networks.  He also sends an rt2x00
fix to reinstate older tx power handling behavior for some devices
that didn't work well with the current code.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
David S. Miller [Mon, 17 Jun 2013 23:13:45 +0000 (16:13 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
The following patchset contains Netfilter fixes. They are targeted to the
TCP option targets, that have receive some scrinity in the last week. The
changes are:

* Fix TCPOPTSTRIP, it stopped working in the forward chain as tcp_hdr
  uses skb->transport_header, and we cannot use that in the forwarding
  case, from myself.

* Fix default IPv6 MSS in TCPMSS in case of absence of TCP MSS options,
  from Phil Oester.

* Fix missing fragmentation handling again in TCPMSS, from Phil Oester.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoalx: add a simple AR816x/AR817x device driver
Johannes Berg [Mon, 17 Jun 2013 20:44:02 +0000 (22:44 +0200)]
alx: add a simple AR816x/AR817x device driver

This is a very simple driver, based on the original vendor
driver that Qualcomm/Atheros published/submitted previously,
but reworked to make the code saner. However, it also lost
a number of features (TSO/GSO, VLAN acceleration and multi-
queue support) in the process, as well as debugging support
features I didn't have any use for. The only thing I left
is checksum offload.

More features can obviously be added, but this seemed like
a good start for having a driver in mainline at all.

Johannes Stezenbach has verified that the driver works on
AR8161, I have a AR8171 myself. The E2200 device ID I found
on github in somebody's repository.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotg3: Prevent system hang during repeated EEH errors.
Michael Chan [Mon, 17 Jun 2013 20:47:25 +0000 (13:47 -0700)]
tg3: Prevent system hang during repeated EEH errors.

The current tg3 code assumes the pci_error_handlers to be always called
in sequence.  In particular, during ->error_detected(), NAPI is disabled
and the device is shutdown.  The device is later reset and NAPI
re-enabled in ->slot_reset() and ->resume().

In EEH, if more than 6 errors are detected in a hour, only
->error_detected() will be called.  This will leave the driver in an
inconsistent state as NAPI is disabled but netif_running state is still
true.  When the device is later closed, we'll try to disable NAPI again
and it will loop forever.

We fix this by closing the device if we encounter any error conditions
during the normal sequence of the pci_error_handlers.

v2: Remove the changes in tg3_io_resume() based on Benjamin Poirier's
    feedback.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoFix the VLAN_TAG_PRESENT in netvsc_recv_callback()
Haiyang Zhang [Mon, 17 Jun 2013 22:36:49 +0000 (15:36 -0700)]
Fix the VLAN_TAG_PRESENT in netvsc_recv_callback()

We should call __vlan_hwaccel_put_tag() only if the packet
comes from vlan, otherwise VLAN_TAG_PRESENT will always be
added.

Reported-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agovxlan: handle skb_clone failure
stephen hemminger [Mon, 17 Jun 2013 19:09:59 +0000 (12:09 -0700)]
vxlan: handle skb_clone failure

If skb_clone fails if out of memory then just skip the fanout.

Problem was introduced in 3.10 with:
  commit 6681712d67eef14c4ce793561c3231659153a320
  Author: David Stevens <dlstevens@us.ibm.com>
  Date:   Fri Mar 15 04:35:51 2013 +0000

    vxlan: generalize forwarding tables

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agovxlan: only migrate dynamic FDB entries
stephen hemminger [Mon, 17 Jun 2013 19:09:58 +0000 (12:09 -0700)]
vxlan: only migrate dynamic FDB entries

Only migrate dynamic forwarding table entries, don't modify
static entries. If packet received from incorrect source IP address
assume it is an imposter and drop it.

This patch applies only to -net, a different patch would be needed for earlier
kernels since the NTF_SELF flag was introduced with 3.10.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agovxlan: fix race between flush and incoming learning
stephen hemminger [Mon, 17 Jun 2013 19:09:57 +0000 (12:09 -0700)]
vxlan: fix race between flush and incoming learning

It is possible for a packet to arrive during vxlan_stop(), and
have a dynamic entry created. Close this by checking if device
is up.

 CPU1                             CPU2
vxlan_stop
  vxlan_flush
     hash_lock acquired
                                  vxlan_encap_recv
                                     vxlan_snoop
                                        waiting for hash_lock
     hash_lock relased
  vxlan_flush done
                                        hash_lock acquired
                                        vxlan_fdb_create

This is a day-one bug in vxlan goes back to 3.7.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoMerge branch 'tipc'
David S. Miller [Mon, 17 Jun 2013 22:53:09 +0000 (15:53 -0700)]
Merge branch 'tipc'

Paul Gortmaker says:

====================
This is a rework of the content sent earlier[1], with the following changes:

-drop the Kconfig --> modparam conversion patch; this was
 requested to be replaced[2] with a dynamic port quantity resizing.
 Ying and Erik were discussing how best to achieve this, and then
 vacation schedules got in the way, so implementing that will
 come (hopefully) in the next round.

-rework the sk_rcvbuf patch to allow memory resizing via sysctl
 as per what Ying and Neil discussed[3]

-add 4 more seemingly straigtforward and relatively small changes
 from Ying (the last 4 in the series).

-add cosmetic UAPI comment update patch from Ying.

That said, the largest change is still the one where we make use of
the fact that linux supports kernel threads and do the server like
operations within kernel threads.  As Jon says:

   We remove the last remnants of the TIPC native API, to make it
   possible to simplify locking policy and solve a problem with lost
   topology events.

   First, we introduce a socket-based alternative to the native API.

   Second, we convert the two remaining users of the native API, the
   TIPC internal topology server and the configuarion server, to use the
   new API.

   Third, we remove the remaining code pertaining to the native API.

I have re-tested this collection of commits between 32 and 64 bit x86
machines using the standard tipc test suite, and build tested for ppc.

[1] http://patchwork.ozlabs.org/patch/247687/
[2] http://patchwork.ozlabs.org/patch/247680/
[3] http://patchwork.ozlabs.org/patch/247688/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotipc: remove dev_base_lock use from enable_bearer
Ying Xue [Mon, 17 Jun 2013 14:54:51 +0000 (10:54 -0400)]
tipc: remove dev_base_lock use from enable_bearer

Convert enable_bearer() to RCU locking with dev_get_by_name().

Based on a similar changeset in commit 840a185d ["aoe: remove
dev_base_lock use from aoecmd_cfg_pkts()"] -- quoting that:

  "dev_base_lock is the legacy way to lock the device list,
   and is planned to disappear. (writers hold RTNL, readers
   hold RCU lock)"

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotipc: fix wrong return value for link_send_sections_long routine
Ying Xue [Mon, 17 Jun 2013 14:54:50 +0000 (10:54 -0400)]
tipc: fix wrong return value for link_send_sections_long routine

When skb buffer cannot be allocated in link_send_sections_long(),
-ENOMEM error code instead of -EFAULT should be returned to its
caller.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotipc: make tipc_link_send_sections_fast exit earlier
Ying Xue [Mon, 17 Jun 2013 14:54:49 +0000 (10:54 -0400)]
tipc: make tipc_link_send_sections_fast exit earlier

Once message build request function returns invalid code, the
process of sending message cannot continue. So in case of message
build failure, tipc_link_send_sections_fast() should return
immediately.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotipc: enhance priority of link protocol packet
Ying Xue [Mon, 17 Jun 2013 14:54:48 +0000 (10:54 -0400)]
tipc: enhance priority of link protocol packet

pfifo_fast is set as default traffic class queueing discipline. This
queue has three so called "bands". Within each band, FIFO rules apply.
However, as long as there are packets waiting in band 0, band 1 won't
be processed.

Now all kind of TIPC type packet priorities are never set, that is,
their priorities are 0, so they are mapped to band 1 of pfifo_fast
qdisc. But, especially during link congestion, if link protocol packet
can be sent out as earlier as possible than other type of packets so
that protocol packet can arrive at peer endpoint in time, the peer
will timely reset its link timeout timer to keep the link alive.
So enhancing the priority of link protocol packets can meet the
specific demand to avoid unnecessary link reset due to a transient
link congestion.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotipc: cosmetic realignment of function arguments
Paul Gortmaker [Mon, 17 Jun 2013 14:54:47 +0000 (10:54 -0400)]
tipc: cosmetic realignment of function arguments

No runtime code changes here.  Just a realign of the function
arguments to start where the 1st one was, and fit as many args
as can be put in an 80 char line.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotipc: save sock structure pointer instead of void pointer to tipc_port
Ying Xue [Mon, 17 Jun 2013 14:54:46 +0000 (10:54 -0400)]
tipc: save sock structure pointer instead of void pointer to tipc_port

Directly save sock structure pointer instead of void pointer to avoid
unnecessary cast conversions.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotipc: convert config_lock from spinlock to mutex
Ying Xue [Mon, 17 Jun 2013 14:54:45 +0000 (10:54 -0400)]
tipc: convert config_lock from spinlock to mutex

As the configuration server is now running under process context,
it's unnecessary for us to have a spinlock serializing the TIPC
configuration process. Instead, we replace it with a mutex lock,
which gives us more freedom. For instance, we can now call
pre-emptable functions within the protected area.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotipc: rename tipc_createport_raw to tipc_createport
Ying Xue [Mon, 17 Jun 2013 14:54:44 +0000 (10:54 -0400)]
tipc: rename tipc_createport_raw to tipc_createport

After the removal of the native API, there is now only one way to
to create a TIPC port instance -- the function tipc_createport_raw().
We make it more readable by renaming it to tipc_createport().

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotipc: remove user_port instance from tipc_port structure
Ying Xue [Mon, 17 Jun 2013 14:54:43 +0000 (10:54 -0400)]
tipc: remove user_port instance from tipc_port structure

After the native API has been completely removed, the 'user_port'
field in struct tipc_port becomes unused, and can be removed.
As a consequence, the "usrmem" argument in tipc_msg_build() is no
longer needed, and so we remove that one too.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotipc: delete code orphaned by new server infrastructure
Ying Xue [Mon, 17 Jun 2013 14:54:42 +0000 (10:54 -0400)]
tipc: delete code orphaned by new server infrastructure

Having completed the conversion of the topology server and
configuration server to use the new server infrastructure,
the following functions become unused, and can be deleted:

   - tipc_createport()
   - port_wakeup_sh()
   - port_dispatcher()
   - port_dispatcher_sigh()
   - tipc_send_buf_fast()
   - tipc_send_buf2port

Additionally, the following variables become orphaned,
and can be deleted:

   - tipc_msg_err_event
   - tipc_named_msg_err_event
   - tipc_conn_shutdown_event
   - tipc_msg_event
   - tipc_named_msg_event
   - tipc_conn_msg_event
   - tipc_continue_event
   - msg_queue_head
   - msg_queue_tail
   - queue_lock

Deletion is done here in a separate commit in order to allow
the actual conversion changes to be more easily viewed.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotipc: convert configuration server to use new server facility
Ying Xue [Mon, 17 Jun 2013 14:54:41 +0000 (10:54 -0400)]
tipc: convert configuration server to use new server facility

As the new socket-based TIPC server infrastructure has been
introduced, we can now convert the configuration server to use
it.  Then we can take future steps to simplify the configuration
server locking policy.

Some minor reordering of initialization is done, due to the
dependency on having tipc_socket_init completed.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotipc: convert topology server to use new server facility
Ying Xue [Mon, 17 Jun 2013 14:54:40 +0000 (10:54 -0400)]
tipc: convert topology server to use new server facility

As the new TIPC server infrastructure has been introduced, we can
now convert the TIPC topology server to it.  We get two benefits
from doing this:

1) It simplifies the topology server locking policy.  In the
original locking policy, we placed one spin lock pointer in the
tipc_subscriber structure to reuse the lock of the subscriber's
server port, controlling access to members of tipc_subscriber
instance.  That is, we only used one lock to ensure both
tipc_port and tipc_subscriber members were safely accessed.

Now we introduce another spin lock for tipc_subscriber structure
only protecting themselves, to get a finer granularity locking
policy.  Moreover, the change will allow us to make the topology
server code more readable and maintainable.

2) It fixes a bug where sent subscription events may be lost when
the topology port is congested.  Using the new service, the
topology server now queues sent events into an outgoing buffer,
and then wakes up a sender process which has been blocked in
workqueue context.  The process will keep picking events from the
buffer and send them to their respective subscribers, using the
kernel socket interface, until the buffer is empty. Even if the
socket is congested during transmission there is no risk that
events may be dropped, since the sender process may block when
needed.

Some minor reordering of initialization is done, since we now
have a scenario where the topology server must be started after
socket initialization has taken place, as the former depends
on the latter.  And overall, we see a simplification of the
TIPC subscriber code in making this changeover.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotipc: introduce new TIPC server infrastructure
Ying Xue [Mon, 17 Jun 2013 14:54:39 +0000 (10:54 -0400)]
tipc: introduce new TIPC server infrastructure

TIPC has two internal servers, one providing a subscription
service for topology events, and another providing the
configuration interface. These servers have previously been running
in BH context, accessing the TIPC-port (aka native) API directly.
Apart from these servers, even the TIPC socket implementation is
partially built on this API.

As this API may simultaneously be called via different paths and in
different contexts, a complex and costly lock policiy is required
in order to protect TIPC internal resources.

To eliminate the need for this complex lock policiy, we introduce
a new, generic service API that uses kernel sockets for message
passing instead of the native API. Once the toplogy and configuration
servers are converted to use this new service, all code pertaining
to the native API can be removed. This entails a significant
reduction in code amount and complexity, and opens up for a complete
rework of the locking policy in TIPC.

The new service also solves another problem:

As the current topology server works in BH context, it cannot easily
be blocked when sending of events fails due to congestion. In such
cases events may have to be silently dropped, something that is
unacceptable. Therefore, the new service keeps a dedicated outbound
queue receiving messages from BH context. Once messages are
inserted into this queue, we will immediately schedule a work from a
special workqueue. This way, messages/events from the topology server
are in reality sent in process context, and the server can block
if necessary.

Analogously, there is a new workqueue for receiving messages. Once a
notification about an arriving message is received in BH context, we
schedule a work from the receive workqueue to do the job of
receiving the message in process context.

As both sending and receive messages are now finished in processes,
subscribed events cannot be dropped any more.

As of this commit, this new server infrastructure is built, but
not actually yet called by the existing TIPC code, but since the
conversion changes required in order to use it are significant,
the addition is kept here as a separate commit.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotipc: allow implicit connect for stream sockets
Erik Hugne [Mon, 17 Jun 2013 14:54:38 +0000 (10:54 -0400)]
tipc: allow implicit connect for stream sockets

TIPC's implied connect feature, aka piggyback connect, allows
applications to save one syscall and all SYN/SYN-ACK signalling
overhead when setting up a connection.  Until now, this has only
been supported for SEQPACKET sockets.  Here, we make it possible
to use this feature even with stream sockets.

At the connecting side, the connection is completed when the
first data message arrives from the accepting peer.  This means
that we must allow the connecting user to call blocking recv()
before the socket has reached state SS_CONNECTED.  So we must must
relax the state machine check at recv_stream(), and allow the
recv() call even if socket is in state SS_CONNECTING.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotipc: change socket buffer overflow control to respect sk_rcvbuf
Ying Xue [Mon, 17 Jun 2013 14:54:37 +0000 (10:54 -0400)]
tipc: change socket buffer overflow control to respect sk_rcvbuf

As per feedback from the netdev community, we change the buffer
overflow protection algorithm in receiving sockets so that it
always respects the nominal upper limit set in sk_rcvbuf.

Instead of scaling up from a small sk_rcvbuf value, which leads to
violation of the configured sk_rcvbuf limit, we now calculate the
weighted per-message limit by scaling down from a much bigger value,
still in the same field, according to the importance priority of the
received message.

To allow for administrative tunability of the socket receive buffer
size, we create a tipc_rmem sysctl variable to allow the user to
configure an even bigger value via sysctl command.  It is a size of
three (min/default/max) to be consistent with things like tcp_rmem.

By default, the value initialized in tipc_rmem[1] is equal to the
receive socket size needed by a TIPC_CRITICAL_IMPORTANCE message.
This value is also set as the default value of sk_rcvbuf.

Originally-by: Jon Maloy <jon.maloy@ericsson.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Jon Maloy <jon.maloy@ericsson.com>
[Ying: added sysctl variation to Jon's original patch]
Signed-off-by: Ying Xue <ying.xue@windriver.com>
[PG: don't compile sysctl.c if not config'd; add Documentation]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agotipc: update code comments to reflect new uapi header path
Ying Xue [Mon, 17 Jun 2013 14:54:36 +0000 (10:54 -0400)]
tipc: update code comments to reflect new uapi header path

Files tipc.h and tipc_config.h were moved to uapi directory, but
the corresponding comments were not updated at the same time.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: add socket option for low latency polling
Eliezer Tamir [Fri, 14 Jun 2013 13:33:57 +0000 (16:33 +0300)]
net: add socket option for low latency polling

adds a socket option for low latency polling.
This allows overriding the global sysctl value with a per-socket one.
Unexport sysctl_net_ll_poll since for now it's not needed in modules.

Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: remove NET_LL_RX_POLL config menue
Eliezer Tamir [Fri, 14 Jun 2013 13:33:46 +0000 (16:33 +0300)]
net: remove NET_LL_RX_POLL config menue

Remove NET_LL_RX_POLL from the config menu.
Change default to y.
Busy polling still needs to be enabled at run time.

Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: convert low latency sockets to sched_clock()
Eliezer Tamir [Fri, 14 Jun 2013 13:33:35 +0000 (16:33 +0300)]
net: convert low latency sockets to sched_clock()

Use sched_clock() instead of get_cycles().
We can use sched_clock() because we don't care much about accuracy.
Remove the dependency on X86_TSC

Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: change sysctl_net_ll_poll into an unsigned int
Eliezer Tamir [Fri, 14 Jun 2013 13:33:25 +0000 (16:33 +0300)]
net: change sysctl_net_ll_poll into an unsigned int

There is no reason for sysctl_net_ll_poll to be an unsigned long.
Change it into an unsigned int.
Fix the proc handler.

Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wirel...
John W. Linville [Mon, 17 Jun 2013 17:44:01 +0000 (13:44 -0400)]
Merge branch 'master' of git://git./linux/kernel/git/linville/wireless into for-davem

11 years agolseek(fd, n, SEEK_END) does *not* go to eof - n
Al Viro [Sun, 16 Jun 2013 17:06:06 +0000 (18:06 +0100)]
lseek(fd, n, SEEK_END) does *not* go to eof - n

When you copy some code, you are supposed to read it.  If nothing else,
there's a chance to spot and fix an obvious bug instead of sharing it...

X-Song: "I Got It From Agnes", by Tom Lehrer
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[ Tom Lehrer? You're dating yourself, Al ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years agoLinux 3.10-rc6
Linus Torvalds [Sat, 15 Jun 2013 21:51:07 +0000 (11:51 -1000)]
Linux 3.10-rc6

11 years agoMerge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm...
Linus Torvalds [Sat, 15 Jun 2013 21:49:48 +0000 (11:49 -1000)]
Merge tag 'fixes-for-linus' of git://git./linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
 "These are a little later than I planned on since I got caught up with
  handling merges for 3.11 most of the week.

  Another week, another batch of fixes for arm-soc platforms.

  Again, nothing controversial.  A few more than would be ideal, but all
  are valid fixes.  In particular the prima2 panic patch is critical
  since it fixes a problem where multiplatform kernels panic on all but
  prima2 hardware."

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: SAMSUNG: pm: Adjust for pinctrl- and DT-enabled platforms
  ARM: prima2: fix incorrect panic usage
  arm: mvebu: armada-xp-{gp,openblocks-ax3-4}: specify PCIe range
  ARM: Kirkwood: handle mv88f6282 cpu in __kirkwood_variant().
  ARM: omap3: clock: fix wrong container_of in clock36xx.c
  ARM: dts: OMAP5: Fix missing PWM capability to timer nodes
  ARM: dts: omap4-panda|sdp: Fix mux for twl6030 IRQ pin and msecure line
  ARM: dts: AM33xx: Fix properties on gpmc node
  arm: omap2: fix AM33xx hwmod infos for UART2
  ARM: OMAP3: Fix iva2_pwrdm settings for 3703

11 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Sat, 15 Jun 2013 21:47:56 +0000 (11:47 -1000)]
Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) Fix RTNL locking in batman-adv, from Matthias Schiffer.

 2) Don't allow non-passthrough macvlan devices to set NOPROMISC via
    netlink, otherwise we can end up with corrupted promisc counter
    values on the device.  From Michael S Tsirkin.

 3) Fix stmmac driver build with debugging defines enabled, from Dinh
    Nguyen.

 4) Make sure name string we give in socket address in AF_PACKET is NULL
    terminated, from Daniel Borkmann.

 5) Fix leaking of two uninitialized bytes of memory to userspace in
    l2tp, from Guillaume Nault.

 6) Clear IPCB(skb) before tunneling otherwise we touch dangling IP
    options state and crash.  From Saurabh Mohan.

 7) Fix suspend/resume for davinci_mdio by using suspend_late and
    resume_early.  From Mugunthan V N.

 8) Don't tag ip_tunnel_init_net and ip_tunnel_delete_net with
    __net_{init,exit}, they can be called outside of those contexts.
    From Eric Dumazet.

 9) Fix RX length error in sh_eth driver, from Yoshihiro Shimoda.

10) Fix missing sctp_outq initialization in some code paths of SCTP
    stack, from Neil Horman.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (21 commits)
  sctp: fully initialize sctp_outq in sctp_outq_init
  netiucv: Hold rtnl between name allocation and device registration.
  tulip: Properly check dma mapping result
  net: sh_eth: fix incorrect RX length error if R8A7740
  ip_tunnel: remove __net_init/exit from exported functions
  drivers: net: davinci_mdio: restore mdio clk divider in mdio resume
  drivers: net: davinci_mdio: moving mdio resume earlier than cpsw ethernet driver
  net/ipv4: ip_vti clear skb cb before tunneling.
  tg3: Wait for boot code to finish after power on
  l2tp: Fix sendmsg() return value
  l2tp: Fix PPP header erasure and memory leak
  bonding: fix igmp_retrans type and two related races
  bonding: reset master mac on first enslave failure
  packet: packet_getname_spkt: make sure string is always 0-terminated
  net: ethernet: stmicro: stmmac: Fix compile error when STMMAC_XMIT_DEBUG used
  be2net: Fix 32-bit DMA Mask handling
  xen-netback: don't de-reference vif pointer after having called xenvif_put()
  macvlan: don't touch promisc without passthrough
  batman-adv: Don't handle address updates when bla is disabled
  batman-adv: forward late OGMs from best next hop
  ...

11 years agoMerge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Linus Torvalds [Sat, 15 Jun 2013 05:25:04 +0000 (19:25 -1000)]
Merge branch 'merge' of git://git./linux/kernel/git/benh/powerpc

Pull powerpc fixes from Benjamin Herrenschmidt:
 "So here are 3 fixes still for 3.10.  Fixes are simple, bugs are nasty
  (though not recent regressions, nasty enough) and all targeted at
  stable"

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc: Fix missing/delayed calls to irq_work
  powerpc: Fix emulation of illegal instructions on PowerNV platform
  powerpc: Fix stack overflow crash in resume_kernel when ftracing

11 years agosmp.h: Use local_irq_{save,restore}() in !SMP version of on_each_cpu().
David Daney [Fri, 14 Jun 2013 18:13:59 +0000 (11:13 -0700)]
smp.h: Use local_irq_{save,restore}() in !SMP version of on_each_cpu().

Thanks to commit f91eb62f71b3 ("init: scream bloody murder if interrupts
are enabled too early"), "bloody murder" is now being screamed.

With a MIPS OCTEON config, we use on_each_cpu() in our
irq_chip.irq_bus_sync_unlock() function.  This gets called in early as a
result of the time_init() call.  Because the !SMP version of
on_each_cpu() unconditionally enables irqs, we get:

    WARNING: at init/main.c:560 start_kernel+0x250/0x410()
    Interrupts were enabled early
    CPU: 0 PID: 0 Comm: swapper Not tainted 3.10.0-rc5-Cavium-Octeon+ #801
    Call Trace:
      show_stack+0x68/0x80
      warn_slowpath_common+0x78/0xb0
      warn_slowpath_fmt+0x38/0x48
      start_kernel+0x250/0x410

Suggested fix: Do what we already do in the SMP version of
on_each_cpu(), and use local_irq_save/local_irq_restore.  Because we
need a flags variable, make it a static inline to avoid name space
issues.

[ Change from v1: Convert on_each_cpu to a static inline function, add
  #include <linux/irqflags.h> to avoid build breakage on some files.

  on_each_cpu_mask() and on_each_cpu_cond() suffer the same problem as
  on_each_cpu(), but they are not causing !SMP bugs for me, so I will
  defer changing them to a less urgent patch. ]

Signed-off-by: David Daney <david.daney@cavium.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Sat, 15 Jun 2013 05:18:56 +0000 (19:18 -1000)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs

Pull VFS fixes from Al Viro:
 "Several fixes + obvious cleanup (you've missed a couple of open-coded
  can_lookup() back then)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  snd_pcm_link(): fix a leak...
  use can_lookup() instead of direct checks of ->i_op->lookup
  move exit_task_namespaces() outside of exit_notify()
  fput: task_work_add() can fail if the caller has passed exit_task_work()
  ncpfs: fix rmdir returns Device or resource busy

11 years agoMerge tag 'for-linus-v3.10-rc6' of git://oss.sgi.com/xfs/xfs
Linus Torvalds [Sat, 15 Jun 2013 05:16:31 +0000 (19:16 -1000)]
Merge tag 'for-linus-v3.10-rc6' of git://oss.sgi.com/xfs/xfs

Pull xfs fixes from Ben Myers:
 - Remove noisy warnings about experimental support which spams the logs
 - Add padding to align directory and attr structures correctly
 - Set block number on child buffer on a root btree split
 - Disable verifiers during log recovery for non-CRC filesystems

* tag 'for-linus-v3.10-rc6' of git://oss.sgi.com/xfs/xfs:
  xfs: don't shutdown log recovery on validation errors
  xfs: ensure btree root split sets blkno correctly
  xfs: fix implicit padding in directory and attr CRC formats
  xfs: don't emit v5 superblock warnings on write

11 years agoMerge tag 'char-misc-3.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregk...
Linus Torvalds [Sat, 15 Jun 2013 05:15:36 +0000 (19:15 -1000)]
Merge tag 'char-misc-3.10-rc5' of git://git./linux/kernel/git/gregkh/char-misc

Pull char / misc fixes from Greg Kroah-Hartman:
 "Here are some small mei driver fixes for 3.10-rc6 that fix some
  reported problems"

* tag 'char-misc-3.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  mei: me: clear interrupts on the resume path
  mei: nfc: fix nfc device freeing
  mei: init: Flush scheduled work before resetting the device

11 years agoMerge tag 'usb-3.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Linus Torvalds [Sat, 15 Jun 2013 05:14:39 +0000 (19:14 -1000)]
Merge tag 'usb-3.10-rc5' of git://git./linux/kernel/git/gregkh/usb

Pull USB fixes from Greg Kroah-Hartman:
 "Here are some small USB driver fixes that resolve some reported
  problems for 3.10-rc6

  Nothing major, just 3 USB serial driver fixes, and two chipidea fixes"

* tag 'usb-3.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: chipidea: fix id change handling
  usb: chipidea: fix no transceiver case
  USB: pl2303: fix device initialisation at open
  USB: spcp8x5: fix device initialisation at open
  USB: f81232: fix device initialisation at open

11 years agopowerpc: Fix missing/delayed calls to irq_work
Benjamin Herrenschmidt [Sat, 15 Jun 2013 02:13:40 +0000 (12:13 +1000)]
powerpc: Fix missing/delayed calls to irq_work

When replaying interrupts (as a result of the interrupt occurring
while soft-disabled), in the case of the decrementer, we are exclusively
testing for a pending timer target. However we also use decrementer
interrupts to trigger the new "irq_work", which in this case would
be missed.

This change the logic to force a replay in both cases of a timer
boundary reached and a decrementer interrupt having actually occurred
while disabled. The former test is still useful to catch cases where
a CPU having been hard-disabled for a long time completely misses the
interrupt due to a decrementer rollover.

CC: <stable@vger.kernel.org> [v3.4+]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
11 years agopowerpc: Fix emulation of illegal instructions on PowerNV platform
Paul Mackerras [Fri, 14 Jun 2013 10:07:41 +0000 (20:07 +1000)]
powerpc: Fix emulation of illegal instructions on PowerNV platform

Normally, the kernel emulates a few instructions that are unimplemented
on some processors (e.g. the old dcba instruction), or privileged (e.g.
mfpvr).  The emulation of unimplemented instructions is currently not
working on the PowerNV platform.  The reason is that on these machines,
unimplemented and illegal instructions cause a hypervisor emulation
assist interrupt, rather than a program interrupt as on older CPUs.
Our vector for the emulation assist interrupt just calls
program_check_exception() directly, without setting the bit in SRR1
that indicates an illegal instruction interrupt.  This fixes it by
making the emulation assist interrupt set that bit before calling
program_check_interrupt().  With this, old programs that use no-longer
implemented instructions such as dcba now work again.

CC: <stable@vger.kernel.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
11 years agopowerpc: Fix stack overflow crash in resume_kernel when ftracing
Michael Ellerman [Thu, 13 Jun 2013 11:04:56 +0000 (21:04 +1000)]
powerpc: Fix stack overflow crash in resume_kernel when ftracing

It's possible for us to crash when running with ftrace enabled, eg:

  Bad kernel stack pointer bffffd12 at c00000000000a454
  cpu 0x3: Vector: 300 (Data Access) at [c00000000ffe3d40]
      pc: c00000000000a454: resume_kernel+0x34/0x60
      lr: c00000000000335c: performance_monitor_common+0x15c/0x180
      sp: bffffd12
     msr: 8000000000001032
     dar: bffffd12
   dsisr: 42000000

If we look at current's stack (paca->__current->stack) we see it is
equal to c0000002ecab0000. Our stack is 16K, and comparing to
paca->kstack (c0000002ecab3e30) we can see that we have overflowed our
kernel stack. This leads to us writing over our struct thread_info, and
in this case we have corrupted thread_info->flags and set
_TIF_EMULATE_STACK_STORE.

Dumping the stack we see:

  3:mon> t c0000002ecab0000
  [c0000002ecab0000c00000000002131c .performance_monitor_exception+0x5c/0x70
  [c0000002ecab0080c00000000000335c performance_monitor_common+0x15c/0x180
  --- Exception: f01 (Performance Monitor) at c0000000000fb2ec .trace_hardirqs_off+0x1c/0x30
  [c0000002ecab0370c00000000016fdb0 .trace_graph_entry+0xb0/0x280 (unreliable)
  [c0000002ecab0410c00000000003d038 .prepare_ftrace_return+0x98/0x130
  [c0000002ecab04b0c00000000000a920 .ftrace_graph_caller+0x14/0x28
  [c0000002ecab0520c0000000000d6b58 .idle_cpu+0x18/0x90
  [c0000002ecab05a0c00000000000a934 .return_to_handler+0x0/0x34
  [c0000002ecab0620c00000000001e660 .timer_interrupt+0x160/0x300
  [c0000002ecab06d0c0000000000025dc decrementer_common+0x15c/0x180
  --- Exception: 901 (Decrementer) at c0000000000104d4 .arch_local_irq_restore+0x74/0xa0
  [c0000002ecab09c0c0000000000fe044 .trace_hardirqs_on+0x14/0x30 (unreliable)
  [c0000002ecab0fb0c00000000016fe3c .trace_graph_entry+0x13c/0x280
  [c0000002ecab1050c00000000003d038 .prepare_ftrace_return+0x98/0x130
  [c0000002ecab10f0c00000000000a920 .ftrace_graph_caller+0x14/0x28
  [c0000002ecab1160c0000000000161f0 .__ppc64_runlatch_on+0x10/0x40
  [c0000002ecab11d0c00000000000a934 .return_to_handler+0x0/0x34
  --- Exception: 901 (Decrementer) at c0000000000104d4 .arch_local_irq_restore+0x74/0xa0

  ... and so on

__ppc64_runlatch_on() is called from RUNLATCH_ON in the exception entry
path. At that point the irq state is not consistent, ie. interrupts are
hard disabled (by the exception entry), but the paca soft-enabled flag
may be out of sync.

This leads to the local_irq_restore() in trace_graph_entry() actually
enabling interrupts, which we do not want. Because we have not yet
reprogrammed the decrementer we immediately take another decrementer
exception, and recurse.

The fix is twofold. Firstly make sure we call DISABLE_INTS before
calling RUNLATCH_ON. The badly named DISABLE_INTS actually reconciles
the irq state in the paca with the hardware, making it safe again to
call local_irq_save/restore().

Although that should be sufficient to fix the bug, we also mark the
runlatch routines as notrace. They are called very early in the
exception entry and we are asking for trouble tracing them. They are
also fairly uninteresting and tracing them just adds unnecessary
overhead.

[ This regression was introduced by fe1952fc0afb9a2e4c79f103c08aef5d13db1873
  "powerpc: Rework runlatch code" by myself --BenH
]

CC: <stable@vger.kernel.org> [v3.4+]
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
11 years agosnd_pcm_link(): fix a leak...
Al Viro [Wed, 5 Jun 2013 18:07:08 +0000 (14:07 -0400)]
snd_pcm_link(): fix a leak...

in case when snd_pcm_stream_linked(substream) is true, we end up leaking
group.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agouse can_lookup() instead of direct checks of ->i_op->lookup
Al Viro [Thu, 6 Jun 2013 23:33:47 +0000 (19:33 -0400)]
use can_lookup() instead of direct checks of ->i_op->lookup

a couple of places got missed back when Linus has introduced that one...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agomove exit_task_namespaces() outside of exit_notify()
Oleg Nesterov [Fri, 14 Jun 2013 19:09:49 +0000 (21:09 +0200)]
move exit_task_namespaces() outside of exit_notify()

exit_notify() does exit_task_namespaces() after
forget_original_parent(). This was needed to ensure that ->nsproxy
can't be cleared prematurely, an exiting child we are going to
reparent can do do_notify_parent() and use the parent's (ours) pid_ns.

However, after 32084504 "pidns: use task_active_pid_ns in
do_notify_parent" ->nsproxy != NULL is no longer needed, we rely
on task_active_pid_ns().

Move exit_task_namespaces() from exit_notify() to do_exit(), after
exit_fs() and before exit_task_work().

This solves the problem reported by Andrey, free_ipc_ns()->shm_destroy()
does fput() which needs task_work_add().

Note: this particular problem can be fixed if we change fput(), and
that change makes sense anyway. But there is another reason to move
the callsite. The original reason for exit_task_namespaces() from
the middle of exit_notify() was subtle and it has already gone away,
now this looks confusing. And this allows us do simplify exit_notify(),
we can avoid unlock/lock(tasklist) and we can use ->exit_state instead
of PF_EXITING in forget_original_parent().

Reported-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agofput: task_work_add() can fail if the caller has passed exit_task_work()
Oleg Nesterov [Fri, 14 Jun 2013 19:09:47 +0000 (21:09 +0200)]
fput: task_work_add() can fail if the caller has passed exit_task_work()

fput() assumes that it can't be called after exit_task_work() but
this is not true, for example free_ipc_ns()->shm_destroy() can do
this. In this case fput() silently leaks the file.

Change it to fallback to delayed_fput_work if task_work_add() fails.
The patch looks complicated but it is not, it changes the code from

if (PF_KTHREAD) {
schedule_work(...);
return;
}
task_work_add(...)

to
if (!PF_KTHREAD) {
if (!task_work_add(...))
return;
/* fallback */
}
schedule_work(...);

As for shm_destroy() in particular, we could make another fix but I
think this change makes sense anyway. There could be another similar
user, it is not safe to assume that task_work_add() can't fail.

Reported-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agonet: sctp: sctp_association_init: put refs in reverse order
Daniel Borkmann [Fri, 14 Jun 2013 16:24:07 +0000 (18:24 +0200)]
net: sctp: sctp_association_init: put refs in reverse order

In case we need to bail out for whatever reason during assoc
init, we call sctp_endpoint_put() and then sock_put(), however,
we've hold both refs in reverse, non-symmetric order, so first
sctp_endpoint_hold() and then sock_hold().

Reverse this, so that in an error case we have sock_put() and then
sctp_endpoint_put(). Actually shouldn't matter too much, since both
cleanup paths do the right thing, but that way, it is more consistent
with the rest of the code.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: sctp: minor: remove variable in sctp_init_sock
Daniel Borkmann [Fri, 14 Jun 2013 16:24:06 +0000 (18:24 +0200)]
net: sctp: minor: remove variable in sctp_init_sock

It's only used at this one time, so we could remove it as well.
This is valid and also makes it more explicit/obvious that in case
of error the sp->ep is NULL here, i.e. for the sctp_destroy_sock()
check that was recently added.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: sctp: sctp_sf_do_prm_asoc: do SCTP_CMD_INIT_CHOOSE_TRANSPORT first
Daniel Borkmann [Fri, 14 Jun 2013 16:24:05 +0000 (18:24 +0200)]
net: sctp: sctp_sf_do_prm_asoc: do SCTP_CMD_INIT_CHOOSE_TRANSPORT first

While this currently cannot trigger any NULL pointer dereference in
sctp_seq_dump_local_addrs(), better change the order of commands to
prevent a future bug to happen. Although we first add SCTP_CMD_NEW_ASOC
and then set the SCTP_CMD_INIT_CHOOSE_TRANSPORT, it is okay for now,
since this primitive is only called by sctp_connect() or sctp_sendmsg()
with sctp_assoc_add_peer() set first. However, lets do this precaution
and first set the transport and then add it to the association hashlist
to prevent in future something to possibly triggering this.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: sctp: sideeffect: throw BUG if primary_path is NULL
Daniel Borkmann [Fri, 14 Jun 2013 16:24:04 +0000 (18:24 +0200)]
net: sctp: sideeffect: throw BUG if primary_path is NULL

This clearly states a BUG somewhere in the SCTP code as e.g. fixed once
in f28156335 ("sctp: Use correct sideffect command in duplicate cookie
handling"). If this ever happens, throw a trace in the sideeffect engine
where assocs clearly must have a primary_path assigned.

When in sctp_seq_dump_local_addrs() also throw a WARN and bail out since
we do not need to panic for printing this one asterisk. Also, it will
avoid the not so obvious case when primary != NULL test passes and at a
later point in time triggering a NULL ptr dereference caused by primary.
While at it, also fix up the white space.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>