git.openwrt.org Git - openwrt/staging/blogic.git/log

tipc: create broadcast transmission link at namespace init

The broadcast transmission link is currently instantiated when the
network subsystem is started, i.e., on order from user space via netlink.

This forces the broadcast transmission code to do unnecessary tests for
the existence of the transmission link, as well in single mode node as
in network mode.

In this commit, we do instead create the link during initialization of
the name space, and remove it when it is stopped. The fact that the
transmission link now has a guaranteed longer life cycle than any of its
potential clients paves the way for further code simplifcations
and optimizations.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: move broadcast link lock to struct tipc_net

The broadcast lock will need to be acquired outside bcast.c in a later
commit. For this reason, we move the lock to struct tipc_net. Consistent
with the changes in the previous commit, we also introducee two new
functions tipc_bcast_lock() and tipc_bcast_unlock(). The code that is
currently using tipc_bclink_lock()/unlock() will be phased out during
the coming commits in this series.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: move bcast definitions to bcast.c

Currently, a number of structure and function definitions related
to the broadcast functionality are unnecessarily exposed in the file
bcast.h. This obscures the fact that the external interface towards
the broadcast link in fact is very narrow, and causes unnecessary
recompilations of other files when anything changes in those
definitions.

In this commit, we move as many of those definitions as is currently
possible to the file bcast.c.

We also rename the structure 'tipc_bclink' to 'tipc_bc_base', both
since the name does not correctly describe the contents of this
struct, and will do so even less in the future, and because we want
to use the term 'link' more appropriately in the functionality
introduced later in this series.

Finally, we rename a couple of functions, such as tipc_bclink_xmit()
and others that will be kept in the future, to include the term 'bcast'
instead.

There are no functional changes in this commit.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/davem/net

Conflicts:
net/ipv6/xfrm6_output.c
net/openvswitch/flow_netlink.c
net/openvswitch/vport-gre.c
net/openvswitch/vport-vxlan.c
net/openvswitch/vport.c
net/openvswitch/vport.h

The openvswitch conflicts were overlapping changes. One was
the egress tunnel info fix in 'net' and the other was the
vport ->send() op simplification in 'net-next'.

The xfrm6_output.c conflicts was also a simplification
overlapping a bug fix.

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-upstream' of git://git./linux/kernel/git/bluetooth/bluetooth-next

Johan Hedberg says:

====================
pull request: bluetooth-next 2015-10-22

Here's probably the last bluetooth-next pull request for 4.4. Among
several other changes it contains the rest of the fixes & cleanups from
the Bluetooth UnplugFest (that didn't need to be hurried to 4.3).

- Refactoring & cleanups to 6lowpan code
- New USB ids for two Atheros controllers and BCM43142A0 from Broadcom
- Fix (quirk) for broken Broadcom BCM2045 controllers
- Support for latest Apple controllers
- Improvements to the vendor diagnostic message support

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Fix compile errors when CONFIG_BNXT_SRIOV is not set.

struct bnxt_pf_info needs to be always defined. Move bnxt_update_vf_mac()
to bnxt_sriov.c and add some missing #ifdef CONFIG_BNXT_SRIOV.

Reported-by: Jim Hull <jim.hull@hpe.com>
Tested-by: Jim Hull <jim.hull@hpe.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2015-10-23

This series contains updates to i40e, i40evf, if_link, ixgbe and ixgbevf.

Anjali adds a workaround to drop any flow control frames from being
transmitted from any VSI, so that a malicious VF cannot send flow control
or PFC packets out on the wire.  Also fixed a bug in debugfs by grabbing
the filter list lock before adding or deleting a filter.

Akeem fixes an issue where we were unconditionally returning VEB bridge
mode before allowing LB in the add VSI routine, resolve by checking if
the bridge is actually in VEB mode first.

Mitch fixed an issue where the incorrect structure was being used for
VLAN filter list, which meant the VLAN filter list did not get
processed correctly and VLAN filters would not be re-enabled after any
kind of reset.

Helin fixed a problem of possibly getting inconsistent flow control
status after a PF reset.  The issue was requested_mode was being set
with a default value during probe, but the hardware state could be a
different value from this mode.

Carolyn fixed a problem where the driver output of the OEM version
string varied from the other tools.

Jean Sacren fixes up kernel documentation by fixing function header
comments to match actual variables used in the functions.  Also
cleaned up variable initialization, when the variable would be
over-written immediately.

Hiroshi Shimanoto provides three patches to add "trusted" VF by adding
netlink directives and an NDO entry.  Then implement these new controls
in ixgbe and ixgbevf.  This series has gone through several iterations
to address all the suggested community changes and concerns.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mpls_multipath'

Roopa Prabhu says:

====================
mpls: multipath support

This patch adds support for MPLS multipath routes.

Includes following changes to support multipath:
- splits struct mpls_route into 'struct mpls_route + struct mpls_nh'.

- struct mpls_nh represents a mpls nexthop label forwarding entry

- Adds support to parse/fill RTA_MULTIPATH netlink attribute for
multipath routes similar to ipv4/v6 fib

- In the process of restructuring, this patch also consistently changes all
labels to u8

$ip -f mpls route add 100 nexthop as 200 via inet 10.1.1.2 dev swp1 \
nexthop as 700 via inet 10.1.1.6 dev swp2 \
nexthop as 800 via inet 40.1.1.2 dev swp3

$ip  -f mpls route show
100
nexthop as to 200 via inet 10.1.1.2  dev swp1
nexthop as to 700 via inet 10.1.1.6  dev swp2
nexthop as to 800 via inet 40.1.1.2  dev swp3
====================

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mpls: flow-based multipath selection

Change the selection of a multipath route to use a flow-based
hash. This more suitable for traffic sensitive to reordering within a
flow (e.g. TCP, L2VPN) and whilst still allowing a good distribution
of traffic given enough flows.

Selection of the path for a multipath route is done using a hash of:
1. Label stack up to MAX_MP_SELECT_LABELS labels or up to and
including entropy label, whichever is first.
2. 3-tuple of (L3 src, L3 dst, proto) from IPv4/IPv6 header in MPLS
payload, if present.

Naturally, a 5-tuple hash using L4 information in addition would be
possible and be better in some scenarios, but there is a tradeoff
between looking deeper into the packet to achieve good distribution,
and packet forwarding performance, and I have erred on the side of the
latter as the default.

Signed-off-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mpls: multipath route support

This patch adds support for MPLS multipath routes.

Includes following changes to support multipath:
- splits struct mpls_route into 'struct mpls_route + struct mpls_nh'

- 'struct mpls_nh' represents a mpls nexthop label forwarding entry

- moves mpls route and nexthop structures into internal.h

- A mpls_route can point to multiple mpls_nh structs

- the nexthops are maintained as a array (similar to ipv4 fib)

- In the process of restructuring, this patch also consistently changes
  all labels to u8

- Adds support to parse/fill RTA_MULTIPATH netlink attribute for
multipath routes similar to ipv4/v6 fib

- In this patch, the multipath route nexthop selection algorithm
simply returns the first nexthop. It is replaced by a
hash based algorithm from Robert Shearman in the next patch

- mpls_route_update cleanup: remove 'dev' handling in mpls_route_update.
mpls_route_update though implemented to update based on dev, it was
never used that way. And the dev handling gets tricky with multiple
nexthops. Cannot match against any single nexthops dev. So, this patch
removes the unused 'dev' handling in mpls_route_update.

- dead route/path handling will be implemented in a subsequent patch

Example:

$ip -f mpls route add 100 nexthop as 200 via inet 10.1.1.2 dev swp1 \
                nexthop as 700 via inet 10.1.1.6 dev swp2 \
                nexthop as 800 via inet 40.1.1.2 dev swp3

$ip  -f mpls route show
100
        nexthop as to 200 via inet 10.1.1.2  dev swp1
        nexthop as to 700 via inet 10.1.1.6  dev swp2
        nexthop as to 800 via inet 40.1.1.2  dev swp3

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sysctl: fix a kmemleak warning

the returned buffer of register_sysctl() is stored into net_header
variable, but net_header is not used after, and compiler maybe
optimise the variable out, and lead kmemleak reported the below warning

comm "swapper/0", pid 1, jiffies 4294937448 (age 267.270s)
hex dump (first 32 bytes):
90 38 8b 01 c0 ff ff ff 00 00 00 00 01 00 00 00 .8..............
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffc00020f134>] create_object+0x10c/0x2a0
[<ffffffc00070ff44>] kmemleak_alloc+0x54/0xa0
[<ffffffc0001fe378>] __kmalloc+0x1f8/0x4f8
[<ffffffc00028e984>] __register_sysctl_table+0x64/0x5a0
[<ffffffc00028eef0>] register_sysctl+0x30/0x40
[<ffffffc00099c304>] net_sysctl_init+0x20/0x58
[<ffffffc000994dd8>] sock_init+0x10/0xb0
[<ffffffc0000842e0>] do_one_initcall+0x90/0x1b8
[<ffffffc000966bac>] kernel_init_freeable+0x218/0x2f0
[<ffffffc00070ed6c>] kernel_init+0x1c/0xe8
[<ffffffc000083bfc>] ret_from_fork+0xc/0x50
[<ffffffffffffffff>] 0xffffffffffffffff <<end check kmemleak>>

Before fix, the objdump result on ARM64:
0000000000000000 <net_sysctl_init>:
   0:   a9be7bfd        stp     x29, x30, [sp,#-32]!
   4:   90000001        adrp    x1, 0 <net_sysctl_init>
   8:   90000000        adrp    x0, 0 <net_sysctl_init>
   c:   910003fd        mov     x29, sp
  10:   91000021        add     x1, x1, #0x0
  14:   91000000        add     x0, x0, #0x0
  18:   a90153f3        stp     x19, x20, [sp,#16]
  1c:   12800174        mov     w20, #0xfffffff4                // #-12
  20:   94000000        bl      0 <register_sysctl>
  24:   b4000120        cbz     x0, 48 <net_sysctl_init+0x48>
  28:   90000013        adrp    x19, 0 <net_sysctl_init>
  2c:   91000273        add     x19, x19, #0x0
  30:   9101a260        add     x0, x19, #0x68
  34:   94000000        bl      0 <register_pernet_subsys>
  38:   2a0003f4        mov     w20, w0
  3c:   35000060        cbnz    w0, 48 <net_sysctl_init+0x48>
  40:   aa1303e0        mov     x0, x19
  44:   94000000        bl      0 <register_sysctl_root>
  48:   2a1403e0        mov     w0, w20
  4c:   a94153f3        ldp     x19, x20, [sp,#16]
  50:   a8c27bfd        ldp     x29, x30, [sp],#32
  54:   d65f03c0        ret
After:
0000000000000000 <net_sysctl_init>:
   0:   a9bd7bfd        stp     x29, x30, [sp,#-48]!
   4:   90000000        adrp    x0, 0 <net_sysctl_init>
   8:   910003fd        mov     x29, sp
   c:   a90153f3        stp     x19, x20, [sp,#16]
  10:   90000013        adrp    x19, 0 <net_sysctl_init>
  14:   91000000        add     x0, x0, #0x0
  18:   91000273        add     x19, x19, #0x0
  1c:   f90013f5        str     x21, [sp,#32]
  20:   aa1303e1        mov     x1, x19
  24:   12800175        mov     w21, #0xfffffff4                // #-12
  28:   94000000        bl      0 <register_sysctl>
  2c:   f9002260        str     x0, [x19,#64]
  30:   b40001a0        cbz     x0, 64 <net_sysctl_init+0x64>
  34:   90000014        adrp    x20, 0 <net_sysctl_init>
  38:   91000294        add     x20, x20, #0x0
  3c:   9101a280        add     x0, x20, #0x68
  40:   94000000        bl      0 <register_pernet_subsys>
  44:   2a0003f5        mov     w21, w0
  48:   35000080        cbnz    w0, 58 <net_sysctl_init+0x58>
  4c:   aa1403e0        mov     x0, x20
  50:   94000000        bl      0 <register_sysctl_root>
  54:   14000004        b       64 <net_sysctl_init+0x64>
  58:   f9402260        ldr     x0, [x19,#64]
  5c:   94000000        bl      0 <unregister_sysctl_table>
  60:   f900227f        str     xzr, [x19,#64]
  64:   2a1503e0        mov     w0, w21
  68:   f94013f5        ldr     x21, [sp,#32]
  6c:   a94153f3        ldp     x19, x20, [sp,#16]
  70:   a8c37bfd        ldp     x29, x30, [sp],#48
  74:   d65f03c0        ret

Add the possible error handle to free the net_header to remove the
kmemleak warning

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mdiobus_nested_read_write'

Neil Armstrong says:

====================
Refactor nested mdiobus read/write functions

In order to avoid locked signal false positive for nested mdiobus
read/write calls, nested code was introduced in mv88e6xxx and
mdio-mux.
But mv88e6060 also needs such nested mdiobus read/write calls.
For sake of refactoring, introduce nested variants of mdiobus read/write
and make them used by mv88e6xxx and mv88e6060.
In a next patch, mdio-mux should also use these variant calls.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: Make mv88e6060 use nested mdiobus read/write

Like mv88e6xxx and mdio-mux, to avoid lockdep give false positives
because of nested MDIO busses, switch to previously introduced
nested mdiobus_read/write variants.

Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: Make mv88e6xxx use nested mdiobus read/write

Make the mv88e6xxx driver use the previously introduced nested
variants of mdiobus_read/write functions.

Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: Add nested variants of mdiobus read/write

Since nested variants of mdiobus_read/write are used in multiple
drivers, add nested variants in the mdiobus core.

Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ixgbe, ixgbevf: Add new mbox API xcast mode

The limitation of the number of multicast address for VF is not enough
for the large scale server with SR-IOV feature. IPv6 requires the multicast
MAC address for each IP address to handle the Neighbor Solicitation
message. We couldn't assign over 30 IPv6 addresses to a single VF.

This patch introduces the new mailbox API, IXGBE_VF_UPDATE_XCAST_MODE,
to update multicast mode of VF. This adds 3 modes;
  - NONE     only L2 exact match addresses or Flow Director enabled
  - MULTI    BAM and ROMPE set
  - ALLMULTI BAM, ROMPE and MPE set

If a guest VF user wants over 30 MAC multicast addresses, set IFF_ALLMULTI
to request PF to update xcast mode to enable VF multicast promiscuous mode.

On the other hand, enabling VF multicast promiscuous mode may affect
security and performance in the network of the NIC. Only trusted VF can
enable multicast promiscuous mode. The behavior of untrusted VF is the
same as previous version.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: Add new ndo to trust VF

Implements the new netdev op to trust VF in ixgbe.

The administrator can turn on and off VF trusted by ip command which
supports trust message.
# ip link set dev eth0 vf 1 trust on
or
# ip link set dev eth0 vf 1 trust off

Send a ping to reset VF on changing the status of trusting.
VF driver will reconfigure its features on reset.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

drivers: net: cpsw: use module_platform_driver

There is no reasons to probe cpsw from late_initcall level
and it's not recommended. Hence, use module_platform_driver()
to register and probe cpsw driver from module_init() level.

Cc: Tony Lindgren <tony@atomide.com>
Acked-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

if_link: Add control trust VF

Add netlink directives and ndo entry to trust VF user.

This controls the special permission of VF user.
The administrator will dedicatedly trust VF user to use some features
which impacts security and/or performance.

The administrator never turn it on unless VF user is fully trusted.

CC: Sy Jong Choi <sy.jong.choi@intel.com>
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Acked-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

tcp/dccp: fix hashdance race for passive sessions

Multiple cpus can process duplicates of incoming ACK messages
matching a SYN_RECV request socket. This is a rare event under
normal operations, but definitely can happen.

Only one must win the race, otherwise corruption would occur.

To fix this without adding new atomic ops, we use logic in
inet_ehash_nolisten() to detect the request was present in the same
ehash bucket where we try to insert the new child.

If request socket was not found, we have to undo the child creation.

This actually removes a spin_lock()/spin_unlock() pair in
reqsk_queue_unlink() for the fast path.

Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets")
Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i40e: fix unconditional execution of cpu_to_le16()

The commit 3092e5e4cc79 ("i40e: add little endian conversion for
checksum") fixed the checksum bug on big-endian architecture.

But we should not execute cpu_to_le16() unconditionally. Thus, put
cpu_to_le16() under certain condition.

Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: clean up local variable initialization

In both i40e_calc_nvm_checksum() and i40e_update_nvm_checksum(), the
local variables designated by 'ret_code' are overwritten immediately. As
such, they should merely be declared.

Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40evf: clean up local variable initialization

In i40evf_msix_aq(), the first two lines of rd32() are mainly to clear
the registers. If we initialize 'val' at this point, it will be
overwritten immediately. We shall simply discard the return value here.

When we initialize 'val', we might as well include the mask in one step.

Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: add missing kernel-doc argument

The following kernel-doc arguments for their respective functions are
missing:

1) @cd_type_cmd_tso_mss for i40e_tso();
2) @cd_type_cmd_tso_mss for i40e_tsyn();
3) @tx_ring for i40e_tx_enable_csum().

Add them all for the kernel-doc requirement.

Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40evf: add missing kernel-doc argument

@flush has been missing since the inception of i40evf_irq_enable(). Add
it for the kernel doc.

Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: re-use %*ph specifier to hexdump a data

Instead of using a custom approach change the code to use %*ph format
specifier.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e/i40evf: Bump i40e to 1.3.46 and i40evf to 1.3.33

Bump up the version...

Change-ID: Ib8d501021671ba20250115ed54330e2c182255b7
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Disable VEB bridge mode with SR-IOV failure

If a call to enable SR-IOV in the kernel failed, we need to disable
I40E_FLAG_VEB_MODE_ENABLED, so that bridge mode could fall back to
VEPA, which is a default.

Change-ID: I12b6f776769506db85b29bea94b9c88d0b5ee65e
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Fix an incorrect OEM version string

This patch fixes a problem where the driver output of the OEM
version string varied from the other tools. The mask value
and the order of operations were incorrect, per the original
change request. Without this patch, the version string will
appear incorrect from the driver.

Change-ID: Ie1ca6485284b4ce3b57e5a99b18b7641617c7ef7
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: fix inconsistent statuses after a PF reset

This patch fixes a problem of possibly getting inconsistent flow control
statuses after a PF reset. Requested_mode was being set with a default
value during probing, but the initial HW state could be different from
this mode.

Change-ID: I772bf07b78616e87086418d4bd87954b66fa17cd
Signed-off-by: Helin Zhang <helin.zhang@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40evf: use correct struct for list manipulation

Not sure how this compiles at all. Use the correct struct for
manipulating the VLAN filter list. Without this, the VLAN filter
list doesn't get processed correctly, and VLAN filters will not
be re-enabled after any kind of reset.

Change-ID: Iceff2dc089f303058fb71ecb08419eed471e0e90
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Fix VEB/VEPA bridge mode mismatch issue

Fix i40e_is_vsi_uplink_mode_veb to check if bridge is actually
in VEB mode before allowing LB in the add VSI routine, instead of
unconditionally returning VEB bridge mode.

Change-ID: I162397b1bdd02367735fe9baaeb51465be2a3ce9
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: fix a bug in debugfs with add/del macaddr

The new code flow requires us to grab the filter list lock before
adding/deleting the filter.

Change-ID: I4eaef508ab4da2d1b2e23f20f2a78d931d5b6aeb
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e/i40evf: Add a workaround to drop all flow control frames

This patch adds a workaround to drop any flow control frames from being
transmitted from any VSI. FW can still send flow control frames if flow
control is enabled.

With this patch in place a malicious VF cannot send flow control or PFC
packets out on the wire.

Change-ID: I4303b24e98b93066d2767fec24dfe78be591c277
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ppp: fix pppoe_dev deletion condition in pppoe_release()

We can't rely on PPPOX_ZOMBIE to decide whether to clear po->pppoe_dev.
PPPOX_ZOMBIE can be set by pppoe_disc_rcv() even when po->pppoe_dev is
NULL. So we have no guarantee that (sk->sk_state & PPPOX_ZOMBIE) implies
(po->pppoe_dev != NULL).
Since we're releasing a PPPoE socket, we want to release the pppoe_dev
if it exists and reset sk_state to PPPOX_DEAD, no matter the previous
value of sk_state. So we can just check for po->pppoe_dev and avoid any
assumption on sk->sk_state.

Fixes: 2b018d57ff18 ("pppoe: drop PPPOX_ZOMBIEs in pppoe_release")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

af_key: fix two typos

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe: Use wmb before updating current descriptor count

The code currently uses the lightweight dma_wmb barrier before updating
the current descriptor count. Under heavy load, the Tx cleanup routine
was seeing the updated current descriptor count before the updated
descriptor information. As a result, the Tx descriptor was being cleaned
up before it was used because it was not "owned" by the hardware yet,
resulting in a Tx queue hang.

Using the wmb barrier insures that the descriptor is updated before the
descriptor counter preventing the Tx queue hang. For extra insurance,
the Tx cleanup routine is changed to grab the current decriptor count on
entry and uses that initial value in the processing loop rather than
trying to chase the current value.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Tested-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/phy: micrel: Add workaround for bad autoneg

Very rarely, the KSZ9031 will appear to complete autonegotiation, but
will drop all traffic afterwards. When this happens, the idle error
count will read 0xFF after autonegotiation completes. Reset the PHY
when in that state.

Signed-off-by: Nathan Sullivan <nathan.sullivan@ni.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: implement support for NOPREFIXROUTE ifa flag for ipv4 address

Currently adding a new ipv4 address always cause the creation of the
related network route, with default metric. When a host has multiple
interfaces on the same network, multiple routes with the same metric
are created.

If the userspace wants to set specific metric on each routes, i.e.
giving better metric to ethernet links in respect to Wi-Fi ones,
the network routes must be deleted and recreated, which is error-prone.

This patch implements the support for IFA_F_NOPREFIXROUTE for ipv4
address. When an address is added with such flag set, no associated
network route is created, no network route is deleted when
said IP is gone and it's up to the user space manage such route.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ipv6-overflow-arith'

Hannes Frederic Sowa says:

====================
overflow-arith: begin to add support for overflow builtins functions

I add a new header, linux/overflow-arith.h, as the central place to add
overflow and wrap-around checking functions. The reason I am doing so
is that it can make use of compiler supported builtin functions which
can leverage hardware.

As I need this for a fix in the ipv6 stack, which is also included in
this series, I propose to add it sooner than later over Davem's net
tree. This is also the reason why I start slowly with only the one
function I need at this time.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: protect mtu calculation of wrap-around and infinite loop by rounding issues

Raw sockets with hdrincl enabled can insert ipv6 extension headers
right into the data stream. In case we need to fragment those packets,
we reparse the options header to find the place where we can insert
the fragment header. If the extension headers exceed the link's MTU we
actually cannot make progress in such a case.

Instead of ending up in broken arithmetic or rounding towards 0 and
entering an endless loop in ip6_fragment, just prevent those cases by
aborting early and signal -EMSGSIZE to user space.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

overflow-arith: begin to add support for overflow builtin functions

The idea of the overflow-arith.h header is to collect overflow checking
functions in one central place.

If gcc compiler supports the __builtin_overflow_* builtins we use them
because they might give better performance, otherwise the code falls
back to normal overflow checking functions.

The builtin_overflow functions are supported by gcc-5 and clang. The
matter of supporting clang is to just provide a corresponding
CC_HAVE_BUILTIN_OVERFLOW, because the specific overflow checking builtins
don't differ between gcc and clang.

I just provide overflow_usub function here as I intend this to get merged
into net, more functions will definitely follow as they are needed.

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: allow dctcp alpha to drop to zero

If alpha is strictly reduced by alpha >> dctcp_shift_g and if alpha is less
than 1 << dctcp_shift_g, then alpha may never reach zero. For example,
given shift_g=4 and alpha=15, alpha >> dctcp_shift_g yields 0 and alpha
remains 15. The effect isn't noticeable in this case below cwnd=137, but
could gradually drive uncongested flows with leftover alpha down to
cwnd=137. A larger dctcp_shift_g would have a greater effect.

This change causes alpha=15 to drop to 0 instead of being decrementing by 1
as it would when alpha=16. However, it requires one less conditional to
implement since it doesn't have to guard against subtracting 1 from 0U. A
decay of 15 is not unreasonable since an equal or greater amount occurs at
alpha >= 240.

Signed-off-by: Andrew G. Shewmaker <agshew@gmail.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: fix the incorrect return value of throw route

The error condition -EAGAIN, which is signaled by throw routes, tells
the rules framework to walk on searching for next matches. If the walk
ends and we stop walking the rules with the result of a throw route we
have to translate the error conditions to -ENETUNREACH.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

macvtap: unbreak receiving of gro skb with frag list

We don't have fraglist support in TAP_FEATURES. This will lead
software segmentation of gro skb with frag list. Fixes by having
frag list support in TAP_FEATURES.

With this patch single session of netperf receiving were restored from
about 5Gb/s to about 12Gb/s on mlx4.

Fixes a567dd6252 ("macvtap: simplify usage of tap_features")
Cc: Vlad Yasevich <vyasevic@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

openvswitch: Fix egress tunnel info.

While transitioning to netdev based vport we broke OVS
feature which allows user to retrieve tunnel packet egress
information for lwtunnel devices. Following patch fixes it
by introducing ndo operation to get the tunnel egress info.
Same ndo operation can be used for lwtunnel devices and compat
ovs-tnl-vport devices. So after adding such device operation
we can remove similar operation from ovs-vport.

Fixes: 614732eaa12d ("openvswitch: Use regular VXLAN net_device device").
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: New Broadcom ethernet driver.

Broadcom ethernet driver for the new family of NetXtreme-C/E
ethernet devices.

v5:
  - Removed empty blank lines at end of files (noted by David Miller).
  - Moved busy poll helper functions to bnxt.h to at least make the
    .c file look less cluttered with #ifdef (noted by Stephen Hemminger).

v4:
  - Broke up 2 long message strings with "\n" (suggested by John Linville)
  - Constify an array of strings (suggested by Stephen Hemminger)
  - Improve bnxt_vf_pciid() (suggested by Stephen Hemminger)
  - Use PCI_VDEVICE() to populate pci_device_id table for more compact
    source.

v3:
  - Fixed 2 more sparse warnings.
  - Removed some unused structures in .h files.

v2:
  - Fixed all kbuild test robot reported warnings.
  - Fixed many of the checkpatch.pl errors and warnings.
  - Fixed the Kconfig description (noted by Dmitry Kravkov).

Acked-by: Eddie Wai <eddie.wai@broadcom.com>
Acked-by: Jeffrey Huang <huangjw@broadcom.com>
Signed-off-by: Prashant Sreedharan <prashant@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: remove debugfs interface

It is preferable to have a common debugfs interface for DSA or switchdev
instead of a driver specific one. Thus remove the mv88e6xxx debug code.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git./linux/kernel/git/jkirsher/net-queue

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2015-10-22

This series contains fixes to i40e only.

Jesse provides two small fixes for i40e, first fixes counters that were
being displayed incorrectly due to indexing beyond the array of strings
when printing stats. Then fixed the fact that the driver was printing
a message about not being able to assign VMDq because a lack of MSI-X
vectors, when it was not true. It was due to a line missing that
initialized a variable.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

VSOCK: Fix lockdep issue.

The recent fix for the vsock sock_put issue used the wrong
initializer for the transport spin_lock causing an issue when
running with lockdep checking.

Testing: Verified fix on kernel with lockdep enabled.

Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i40e: fix annoying message

The driver was printing a message about not being able
to assign VMDq because of a lack of MSI-X vectors.

This was because a line was missing that initialized a variable,
simply a merge error.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: fix stats offsets

The code was setting up stats that were not being initialized.
This caused several counters to be displayed incorrectly, due
to indexing beyond the array of strings when printing stats.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

qmi_wwan: add Sierra Wireless MC74xx/EM74xx

New device IDs shamelessly lifted from the vendor driver.

Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git./linux/kernel/git/klassert/ipsec

Steffen Klassert says:

====================
pull request (net): ipsec 2015-10-22

1) Fix IPsec pre-encap fragmentation for GSO packets.
   From Herbert Xu.

2) Fix some header checks in _decode_session6.
   We skip the header informations if the data pointer points
   already behind the header in question for some protocols.
   This is because we call pskb_may_pull with a negative value
   converted to unsigened int from pskb_may_pull in this case.
   Skipping the header informations can lead to incorrect policy
   lookups. From Mathias Krause.

3) Allow to change the replay threshold and expiry timer of a
   state without having to set other attributes like replay
   counter and byte lifetime. Changing these other attributes
   may break the SA. From Michael Rossberg.

4) Fix pmtu discovery for local generated packets.
   We may fail dispatch to the inner address family.
   As a reault, the local error handler is not called
   and the mtu value is not reported back to userspace.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'dsa-port_fdb_dump'

Vivien Didelot says:

====================
net: dsa: implement port_fdb_dump in drivers

Not all switch chips provide a Get Next kind of operation to dump FDB entries.
It is preferred to let the driver handle the dump operation the way it works
best for the chip. Thus, drop port_fdb_getnext and implement the port_fdb_dump
operation in DSA, which pushes the switchdev FDB dump callback down to the
drivers. mv88e6xxx is the only driver affected and is updated accordingly.

v3 -> v4: fix rejects on latest net-next

v2 -> v3: opencode switchdev_obj_dump_cb_t to avoid multiple typedef;
use ether_addr_copy in fdb_dump

v1 -> v2: fix a few "return err" instead of "goto unlock" in mv88e6xxx.c
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: remove port_fdb_getnext

No driver implements port_fdb_getnext anymore, and port_fdb_dump is
preferred anyway, so remove this function from DSA.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: remove port_fdb_getnext

Now that port_fdb_dump is implemented and even simpler, get rid of
port_fdb_getnext.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: implement port_fdb_dump

Implement the port_fdb_dump DSA operation.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: write MAC outside of ATU Get Next code

There is no need to write the MAC address before every Get Next
operation, since ATU MAC registers are not cleared between calls.

Move the _mv88e6xxx_atu_mac_write call outside of _mv88e6xxx_atu_getnext
so future code could call ATU Get Next multiple times and save a few
register access.

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: write VID outside of VTU Get Next code

There is no need to write the VLAN ID before every Get Next operation,
since the VTU VID register is not cleared between calls.

Move the VID write call in a _mv88e6xxx_vtu_vid_write function outside
of _mv88e6xxx_vtu_getnext so future code could call VTU Get Next
multiple times and save a few register accesses.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: add port_fdb_dump function

Not all switch chips support a Get Next operation to iterate on its FDB.
So add a more simple port_fdb_dump function for them.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipv6: Dont add RT6_LOOKUP_F_IFACE flag if saddr set

741a11d9e410 ("net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is set")
adds the RT6_LOOKUP_F_IFACE flag to make device index mismatch fatal if
oif is given. Hajime reported that this change breaks the Mobile IPv6
use case that wants to force the message through one interface yet use
the source address from another interface. Handle this case by only
adding the flag if oif is set and saddr is not set.

Fixes: 741a11d9e410 ("net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is set")
Cc: Hajime Tazaki <thehajime@gmail.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'mac80211-next-for-davem-2015-10-21' of git://git./linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
Here's another set of patches for the current cycle:
* I merged net-next back to avoid a conflict with the
* cfg80211 scheduled scan API extensions
* preparations for better scan result timestamping
* regulatory cleanups
* mac80211 statistics cleanups
* a few other small cleanups and fixes
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'isdn-null-deref'

Karsten Keil says:

====================
Fix potential NULL pointer access and memory leak in ISDN layer2 functions

Insu Yun did brinup the issue with not checking the skb_clone() return
value in the layer2 I-frame ull functions.
This series fix the issue in a way which avoid protocol violations/data loss
on a temporary memory shortage.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

mISDN: fix OOM condition for sending queued I-Frames

The old code did not check the return value of skb_clone().
The extra skb_clone() is not needed at all, if using skb_realloc_headroom()
instead, which gives us a private copy with enough headroom as well.
We need to requeue the original skb if the call failed, because we cannot
inform upper layers about the data loss. Restructure the code to minimise
rollback effort if it happens.
This fix kernel bug #86091

Thanks to Insu Yun <wuninsu@gmail.com> to remind me on this issue.

Signed-off-by: Karsten Keil <keil@b1-systems.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

ISDN: fix OOM condition for sending queued I-Frames

The skb_clone() return value was not checked and the skb_realloc_headroom()
usage was wrong, the old skb was not freed. It turned out, that the
skb_clone is not needed at all, the skb_realloc_headroom() will create a
private copy with enough headroom and the original SKB can be used for the
ACK queue.
We need to requeue the original skb if the call failed, since the upper
layer cannot be informed about memory shortage.

Thanks to Insu Yun <wuninsu@gmail.com> to remind me on this issue.

Signed-off-by: Karsten Keil <keil@b1-systems.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

VSOCK: sock_put wasn't safe to call in interrupt context

In the vsock vmci_transport driver, sock_put wasn't safe to call
in interrupt context, since that may call the vsock destructor
which in turn calls several functions that should only be called
from process context. This change defers the callling of these
functions to a worker thread. All these functions were
deallocation of resources related to the transport itself.

Furthermore, an unused callback was removed to simplify the
cleanup.

Multiple customers have been hitting this issue when using
VMware tools on vSphere 2015.

Also added a version to the vmci transport module (starting from
1.0.2.0-k since up until now it appears that this module was
sharing version with vsock that is currently at 1.0.1.0-k).

Reviewed-by: Aditya Asarwade <asarwade@vmware.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hisilicon: deals with the sub ctrl by syscon

the global Soc configuration is treated by syscon, and sub ctrl bus is
Soc bus. it has to be treated by syscon.

Signed-off-by: yankejian <yankejian@huawei.com>
Signed-off-by: lisheng <lisheng011@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netlink: fix locking around NETLINK_LIST_MEMBERSHIPS

Currently, NETLINK_LIST_MEMBERSHIPS grabs the netlink table while copying
the membership state to user-space. However, grabing the netlink table is
effectively a write_lock_irq(), and as such we should not be triggering
page-faults in the critical section.

This can be easily reproduced by the following snippet:
    int s = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
    void *p = mmap(0, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0);
    int r = getsockopt(s, 0x10e, 9, p, (void*)((char*)p + 4092));

This should work just fine, but currently triggers EFAULT and a possible
WARN_ON below handle_mm_fault().

Fix this by reducing locking of NETLINK_LIST_MEMBERSHIPS to a read-side
lock. The write-lock was overkill in the first place, and the read-lock
allows page-faults just fine.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'cxgb4-trivial-fixes'

Hariprasad Shenai says:

====================
Trivial fixes for cxgb4 driver

This patch series updates driver description for next gen. adapters, updates
firmware info., returns error for setup_rss error case, restores L1
configuration in case of FW rejects new config, updates and aligns ethtool
get stats settings, etc

This patch series has been created against net-next tree and includes
patches on cxgb4 and cxgb4vf driver.

We have included all the maintainers of respective drivers. Kindly review
the change and let us know in case of any review comments.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Update ethtool get_drvinfo to get regdump len

Update ethtool get_drvinfo to display regdump len and also update
firmware string version print to display N/A in case FW isn't present

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Use vmalloc, if kmalloc fails

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Return error if setup_rss is called before probe

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4/cxgb4vf: Update driver desc. to include Chelsio T6 adapter

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Add info print to display number of MSI-X vectors allocated

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Restore L1 cfg, if FW rejects new L1 cfg settings

In the ethtool set_settings() routine we need to remember our old L1
Configuration in case the firmware rejects the request and then restore
that.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Don't disallow turning off auto-negotiation

For {1, 10, 40} Gb/s. Prohibiting turning off autonegotiation isn't anywhere
in the standard.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Align ethtool get stat settings

Align the ethtool get stats settings with the rest so it looks uniform

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

openvswitch: Use dev_queue_xmit for vport send.

With use of lwtunnel, we can directly call dev_queue_xmit()
rather than calling netdev vport send operation.
Following change make tunnel vport code bit cleaner.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

openvswitch: Fix incorrect type use.

Patch fixes following sparse warning.
net/openvswitch/flow_netlink.c:583:30: warning: incorrect type in assignment (different base types)
net/openvswitch/flow_netlink.c:583:30: expected restricted __be16 [usertype] ipv4
net/openvswitch/flow_netlink.c:583:30: got int

Fixes: 6b26ba3a7d ("openvswitch: netlink attributes for IPv6 tunneling")
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bpf-perf'

Alexei Starovoitov says:

====================
bpf_perf_event_output helper

Over the last year there were multiple attempts to let eBPF programs
output data into perf events by He Kuang and Wangnan.
The last one was:
https://lkml.org/lkml/2015/7/20/736
It was almost perfect with exception that all bpf programs would sent
data into one global perf_event.
This patch set takes different approach by letting user space
open independent PERF_COUNT_SW_BPF_OUTPUT events, so that program
output won't collide.

Wangnan is working on corresponding perf patches.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

samples: bpf: add bpf_perf_event_output example

Performance test and example of bpf_perf_event_output().
kprobe is attached to sys_write() and trivial bpf program streams
pid+cookie into userspace via PERF_COUNT_SW_BPF_OUTPUT event.

Usage:
$ sudo ./bld_x64/samples/bpf/trace_output
recv 2968913 events per sec

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: introduce bpf_perf_event_output() helper

This helper is used to send raw data from eBPF program into
special PERF_TYPE_SOFTWARE/PERF_COUNT_SW_BPF_OUTPUT perf_event.
User space needs to perf_event_open() it (either for one or all cpus) and
store FD into perf_event_array (similar to bpf_perf_event_read() helper)
before eBPF program can send data into it.

Today the programs triggered by kprobe collect the data and either store
it into the maps or print it via bpf_trace_printk() where latter is the debug
facility and not suitable to stream the data. This new helper replaces
such bpf_trace_printk() usage and allows programs to have dedicated
channel into user space for post-processing of the raw data collected.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

perf: pad raw data samples automatically

Instead of WARN_ON in perf_event_output() on unpaded raw samples,
pad them automatically.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipvlan: read direct ifindex instead of iflink

In the ipv4 outbound path of an ipvlan device in l3 mode, the ifindex is
being grabbed from dev_get_iflink. This works for the physical device
case, since as the documentation of that function notes: "Physical
interfaces have the same 'ifindex' and 'iflink' values.". However, if
the master device is a veth, and the pairs are in separate net
namespaces, the route lookup will fail with -ENODEV due to outer veth
pair being in a separate namespace from the ipvlan master/routing
namespace.

ns0 | ns1 | ns2
veth0a--|--veth0b--|--ipvl0

In ipvlan_process_v4_outbound(), a packet sent from ipvl0 in the above
configuration will pass fl.flowi4_oif == veth0a to
ip_route_output_flow(), but *net == ns1.

Notice also that ipv6 processing is not using iflink. Since there is a
discrepancy in usage, fixup both v4 and v6 case to use local dev
variable.

Tested this with l3 ipvlan on top of veth, as well as with single
physical interface in the top namespace.

Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Reviewed-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: dp83848: Add TI DP83848 Ethernet PHY

Add support for the TI DP83848 Ethernet PHY device.

The DP83848 is a highly reliable, feature rich, IEEE 802.3 compliant
single port 10/100 Mb/s Ethernet Physical Layer Transceiver supporting
the MII and RMII interfaces.

Signed-off-by: Andrew F. Davis <afd@ti.com>
Signed-off-by: Dan Murphy <dmurphy@ti.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Dan Murphy <dmurphy@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: fastopen: limit max_qlen

Allowing an application to set whatever limit for
the list of recently RST fastopen sessions [1] is not wise,
as it open ways to deplete kernel memory.

Cap the user provided limit by somaxconn sysctl,
like listen() backlog.

[1] https://tools.ietf.org/html/rfc7413#section-5.1

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Bluetooth: Increase minor version of core module

With the addition of support for diagnostic feature, it makes sense to
increase the minor version of the Bluetooth core module.

The module version is not used anywhere, but it gives a nice extra
hint for debugging purposes.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>

ieee802154: 6lowpan: fix memory leak

Looking at current situation of memory management in 6lowpan receive
function I detected some invalid handling. After calling
lowpan_invoke_rx_handlers we will do a kfree_skb and then NET_RX_DROP on
error handling. We don't do this before, also on
skb_share_check/skb_unshare which might manipulate the reference
counters.

After running some 'grep -r "dev_add_pack" net/' to look how others
packet-layer receive callbacks works I detected that every subsystem do
a kfree_skb, then NET_RX_DROP without calling skb functions which
might manipulate the skb reference counters. This is the reason why we
should do the same here like all others subsystems. I didn't find any
documentation how the packet-layer receive callbacks handle NET_RX_DROP
return values either.

This patch will add a kfree_skb, then NET_RX_DROP handling for the
"trivial checks", in case of skb_share_check/skb_unshare the kfree_skb
call will be done inside these functions.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

Bluetooth: Make hci_disconnect() behave correctly for all states

There are a few places that don't explicitly check the connection
state before calling hci_disconnect(). To make this API do the right
thing take advantage of the new hci_abort_conn() API and also make
sure to only read the clock offset if we're really connected.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

Bluetooth: Take advantage of connection abort helpers

Convert the various places mapping connection state to
disconnect/cancel HCI command to use the new hci_abort_conn helper
API.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

Bluetooth: Introduce hci_req helper to abort a connection

There are several different places needing to make sure that a
connection gets disconnected or canceled. The exact action needed
depends on the connection state, so centralizing this logic can save
quite a lot of code duplication.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

Bluetooth: hci_bcm: checking for ERR_PTR instead of NULL

bt_skb_alloc() returns NULL on error, it never returns an ERR_PTR.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

Bluetooth: Fix crash in SMP when unpairing

When unpairing the keys stored in hci_dev are removed. If SMP is
ongoing the SMP context will also have references to these keys, so
removing them from the hci_dev lists will make the pointers invalid.
This can result in the following type of crashes:

BUG: unable to handle kernel paging request at 6b6b6b6b
IP: [<c11f26be>] __list_del_entry+0x44/0x71
*pde = 00000000
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in: hci_uart btqca btusb btintel btbcm btrtl hci_vhci rfcomm bluetooth_6lowpan bluetooth
CPU: 0 PID: 723 Comm: kworker/u5:0 Not tainted 4.3.0-rc3+ #1379
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
Workqueue: hci0 hci_rx_work [bluetooth]
task: f19da940 ti: f1a94000 task.ti: f1a94000
EIP: 0060:[<c11f26be>] EFLAGS: 00010202 CPU: 0
EIP is at __list_del_entry+0x44/0x71
EAX: c0088d20 EBX: f30fcac0 ECX: 6b6b6b6b EDX: 6b6b6b6b
ESI: f4b60000 EDI: c0088d20 EBP: f1a95d90 ESP: f1a95d8c
  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 8005003b CR2: 6b6b6b6b CR3: 319e5000 CR4: 00000690
Stack:
  f30fcac0 f1a95db0 f82dc3e1 f1bfc000 00000000 c106524f f1bfc000 f30fd020
  f1a95dc0 f1a95dd0 f82dcbdb f1a95de0 f82dcbdb 00000067 f1bfc000 f30fd020
  f1a95de0 f1a95df0 f82d1126 00000067 f82d1126 00000006 f30fd020 f1bfc000
Call Trace:
  [<f82dc3e1>] smp_chan_destroy+0x192/0x240 [bluetooth]
  [<c106524f>] ? trace_hardirqs_on_caller+0x14e/0x169
  [<f82dcbdb>] smp_teardown_cb+0x47/0x64 [bluetooth]
  [<f82dcbdb>] ? smp_teardown_cb+0x47/0x64 [bluetooth]
  [<f82d1126>] l2cap_chan_del+0x5d/0x14d [bluetooth]
  [<f82d1126>] ? l2cap_chan_del+0x5d/0x14d [bluetooth]
  [<f82d40ef>] l2cap_conn_del+0x109/0x17b [bluetooth]
  [<f82d40ef>] ? l2cap_conn_del+0x109/0x17b [bluetooth]
  [<f82c0205>] ? hci_event_packet+0x5b1/0x2092 [bluetooth]
  [<f82d41aa>] l2cap_disconn_cfm+0x49/0x50 [bluetooth]
  [<f82d41aa>] ? l2cap_disconn_cfm+0x49/0x50 [bluetooth]
  [<f82c0228>] hci_event_packet+0x5d4/0x2092 [bluetooth]
  [<c1332c16>] ? skb_release_data+0x6a/0x95
  [<f82ce5d4>] ? hci_send_to_monitor+0xe7/0xf4 [bluetooth]
  [<c1409708>] ? _raw_spin_unlock_irqrestore+0x44/0x57
  [<f82b3bb0>] hci_rx_work+0xf1/0x28b [bluetooth]
  [<f82b3bb0>] ? hci_rx_work+0xf1/0x28b [bluetooth]
  [<c10635a0>] ? __lock_is_held+0x2e/0x44
  [<c104772e>] process_one_work+0x232/0x432
  [<c1071ddc>] ? rcu_read_lock_sched_held+0x50/0x5a
  [<c104772e>] ? process_one_work+0x232/0x432
  [<c1047d48>] worker_thread+0x1b8/0x255
  [<c1047b90>] ? rescuer_thread+0x23c/0x23c
  [<c104bb71>] kthread+0x91/0x96
  [<c14096a7>] ? _raw_spin_unlock_irq+0x27/0x44
  [<c1409d61>] ret_from_kernel_thread+0x21/0x30
  [<c104bae0>] ? kthread_parkme+0x1e/0x1e

To solve the issue, introduce a new smp_cancel_pairing() API that can
be used to clean up the SMP state before touching the hci_dev lists.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

Bluetooth: Disable auto-connection parameters when unpairing

For connection parameters that are left around until a disconnection
we should at least clear any auto-connection properties. This way a
new Add Device call is required to re-set them after calling Unpair
Device.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

net: mdio-gpio: move platform data header

This header file only contains the platform data structure definition,
so move it to the include/linux/platform_data/ directory.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ARM: gemini: remove unnecessary mdio-gpio includes

Remove the inclusion of linux/mdio-gpio.h in nas4220b, wbd111 and wbd222
boards since mdio-gpio is not used.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sun4i-emac: Properly free resources on probe failure and remove

Fix sun4i-emac not releasing the following resources:
-iomapped memory not released on probe-failure nor on remove
-clock not getting disabled on probe-failure nor on remove
-sram not being released on remove

And while at it also add error checking to the clk_prepare_enable call
done on probe.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hisilicon: fix ptr_ret.cocci warnings

drivers/net/ethernet/hisilicon/hns/hnae.c:442:1-3: WARNING: PTR_ERR_OR_ZERO can be used

Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR

Generated by: scripts/coccinelle/api/ptr_ret.cocci

CC: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: gro: support sit protocol

Tom Herbert added SIT support to GRO with commit
19424e052fb4 ("sit: Add gro callbacks to sit_offload"),
later reverted by Herbert Xu.

The problem came because Tom patch was building GRO
packets without proper meta data : If packets were locally
delivered, we would not care.

But if packets needed to be forwarded, GSO engine was not
able to segment individual segments.

With the following patch, we correctly set skb->encapsulation
and inner network header. We also update gso_type.

Tested:

Server :
netserver
modprobe dummy
ifconfig dummy0 8.0.0.1 netmask 255.255.255.0 up
arp -s 8.0.0.100 4e:32:51:04:47:e5
iptables -I INPUT -s 10.246.7.151 -j TEE --gateway 8.0.0.100
ifconfig sixtofour0
sixtofour0 Link encap:IPv6-in-IPv4
          inet6 addr: 2002:af6:798::1/128 Scope:Global
          inet6 addr: 2002:af6:798::/128 Scope:Global
          UP RUNNING NOARP  MTU:1480  Metric:1
          RX packets:411169 errors:0 dropped:0 overruns:0 frame:0
          TX packets:409414 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:20319631739 (20.3 GB)  TX bytes:29529556 (29.5 MB)

Client :
netperf -H 2002:af6:798::1 -l 1000 &

Checked on server traffic copied on dummy0 and verify segments were
properly rebuilt, with proper IP headers, TCP checksums...

tcpdump on eth0 shows proper GRO aggregation takes place.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>