git.openwrt.org Git - openwrt/staging/blogic.git/log

mlx4: avoid unnecessary dirtying of critical fields

While stressing a 40Gbit mlx4 NIC with busy polling, I found false
sharing in mlx4 driver that can be easily avoided.

This patch brings an additional 7 % performance improvement in UDP_RR
workload.

1) If we received no frame during one mlx4_en_process_rx_cq()
   invocation, no need to call mlx4_cq_set_ci() and/or dirty ring->cons

2) Do not refill rx buffers if we have plenty of them.
   This avoids false sharing and allows some bulk/batch optimizations.
   Page allocator and its locks will thank us.

Finally, mlx4_en_poll_rx_cq() should not return 0 if it determined
cpu handling NIC IRQ should be changed. We should return budget-1
instead, to not fool net_rx_action() and its netdev_budget.

v2: keep AVG_PERF_COUNTER(... polled) even if polled is 0

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2: use READ_ONCE() instead of barrier()

barrier() is a big hammer compared to READ_ONCE(),
and requires comments explaining what is protected.

READ_ONCE() is more precise and compiler should generate
better overall code.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

udp: avoid one cache line miss in recvmsg()

UDP_SKB_CB(skb)->partial_cov is located at offset 66 in skb,
requesting a cold cache line being read in cpu cache.

We can avoid this cache line miss for UDP sockets,
as partial_cov has a meaning only for UDPLite.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlx5-bpf-refcnt-fixes'

Daniel Borkmann says:

====================
Couple of BPF refcount fixes for mlx5

Various mlx5 bugs on eBPF refcount handling found during review.
Last patch in series adds a __must_check to BPF helpers to make
sure we won't run into it again w/o compiler complaining first.

v2 -> v3:

- Just reworked patch 2/4 so we don't need bpf_prog_sub().
- Rebased, rest as is.

v1 -> v2:

- After discussion with Alexei, we agreed upon rebasing the
   patches against net-next.
- Since net-next, I've also added the __must_check to enforce
   future users to check for errors.
- Fixed up commit message #2.
- Simplify assignment from patch #1 based on Saeed's feedback
   on previous set.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: add __must_check attributes to refcount manipulating helpers

Helpers like bpf_prog_add(), bpf_prog_inc(), bpf_map_inc() can fail
with an error, so make sure the caller properly checks their return
value and not just ignores it, which could worst-case lead to use
after free.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf, mlx5: drop priv->xdp_prog reference on netdev cleanup

mlx5e_xdp_set() is currently the only place where we drop reference on the
prog sitting in priv->xdp_prog when it's exchanged by a new one. We also
need to make sure that we eventually release that reference, for example,
in case the netdev is dismantled, otherwise we leak the program.

Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf, mlx5: fix various refcount issues in mlx5e_xdp_set

There are multiple issues in mlx5e_xdp_set():

1) The batched bpf_prog_add() is currently not checked for errors. When
   doing so, it should be done at an earlier point in time to makes sure
   that we cannot fail anymore at the time we want to set the program for
   each channel. The batched refs short-cut can only be performed when we
   don't need to perform a reset for changing the rq type and the device
   was in opened state. In case the device was not in opened state, then
   the next mlx5e_open_locked() will aquire the refs from the control prog
   via mlx5e_create_rq(), same when we need to perform a reset.

2) When swapping the priv->xdp_prog, then no extra reference count must be
   taken since we got that from call path via dev_change_xdp_fd() already.
   Otherwise, we'd never be able to release the program. Also, bpf_prog_add()
   without checking the return code could fail.

Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf, mlx5: fix mlx5e_create_rq taking reference on prog

In mlx5e_create_rq(), when creating a new queue, we call bpf_prog_add() but
without checking the return value. bpf_prog_add() can fail since 92117d8443bc
("bpf: fix refcnt overflow"), so we really must check it. Take the reference
right when we assign it to the rq from priv->xdp_prog, and just drop the
reference on error path. Destruction in mlx5e_destroy_rq() looks good, though.

Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mV88e6xxx-interrupt-fixes'

Andrew Lunn says:

====================
Fixes for the MV88e6xxx interrupt code

The interrupt code was never tested with a board who's probing
resulted in an -EPROBE_DEFFERED. So the clean up paths never got
tested. I now do have -EPROBE_DEFFERED, and things break badly during
cleanup. These are the fixes.

This is fixing code in net-next.

v2:
Fix typo pointed out by David Miller
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: Hold the mutex while freeing g1 interrupts

Freeing interrupts requires switch register access to mask the
interrupts. Hence we must hold the register mutex.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: Fix releasing for the global2 interrupts

It is not possible to use devm_request_threaded_irq() because we have
two stacked interrupt controllers in one device. The lower interrupt
controller cannot be removed until the upper is fully removed. This
happens too late with the devm API, resulting in error messages about
removing a domain while there is still an active interrupt. Swap to
using request_threaded_irq() and manage the release of the interrupt
manually.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: Fix cleanup on error for g1 interrupt setup

On error, remask the interrupts, release all maps, and remove the
domain. This cannot be done using the mv88e6xxx_g1_irq_free() because
some of these actions are not idempotent.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: Mask g1 interrupts and free interrupt

Fix the g1 interrupt free code such that is masks any further
interrupts, and then releases the interrupt.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: Fix unconditional irq freeing

Trying to remove an IRQ domain that was not created results in an
Opps. Add the necessary checks that the irqs were created before
freeing them.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: Fix typos when removing g1 interrupts

Simple typos, s/g2/g1/

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fix bogus cast in skb_pagelen() and use unsigned variables

1) cast to "int" is unnecessary:
   u8 will be promoted to int before decrementing,
   small positive numbers fit into "int", so their values won't be changed
   during promotion.

   Once everything is int including loop counters, signedness doesn't
   matter: 32-bit operations will stay 32-bit operations.

   But! Someone tried to make this loop smart by making everything of
   the same type apparently in an attempt to optimise it.
   Do the optimization, just differently.
   Do the cast where it matters. :^)

2) frag size is unsigned entity and sum of fragments sizes is also
   unsigned.

Make everything unsigned, leave no MOVSX instruction behind.

add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-4 (-4)
function                                     old     new   delta
skb_cow_data                                 835     834      -1
ip_do_fragment                              2549    2548      -1
ip6_fragment                                3130    3128      -2
Total: Before=154865032, After=154865028, chg -0.00%

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netlink: smaller nla_attr_minlen table

Length of a netlink attribute may be u16 but lengths of basic attributes
are much smaller, so small we can save 16 bytes of .rodata and pocket
change inside .text.

16-bit is worse on x86-64 than 8-bit because of operand size override prefix.

add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-19 (-19)
function                                     old     new   delta
validate_nla                                 418     417      -1
nla_policy_len                                66      64      -2
nla_attr_minlen                               32      16     -16
Total: Before=154865051, After=154865032, chg -0.00%

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netlink: use "unsigned int" in nla_next()

->nla_len is unsigned entity (it's length after all) and u16,
thus it can't overflow when being aligned into int/unsigned int.

(nlmsg_next has the same code, but I didn't yet convince myself
it is correct to do so).

There is pointer arithmetic in this function and offset being
unsigned is better:

add/remove: 0/0 grow/shrink: 1/64 up/down: 5/-309 (-304)
function                                     old     new   delta
nl80211_set_wiphy                           1444    1449      +5
team_nl_cmd_options_set                      997     995      -2
tcf_em_tree_validate                         872     870      -2
switchdev_port_bridge_setlink                352     350      -2
switchdev_port_br_afspec                     312     310      -2
rtm_to_fib_config                            428     426      -2
qla4xxx_sysfs_ddb_set_param                 2193    2191      -2
qla4xxx_iface_set_param                     4470    4468      -2
ovs_nla_free_flow_actions                    152     150      -2
output_userspace                             518     516      -2
...
nl80211_set_reg                              654     649      -5
validate_scan_freqs                          148     142      -6
validate_linkmsg                             288     282      -6
nl80211_parse_connkeys                       489     483      -6
nlattr_set                                   231     224      -7
nf_tables_delsetelem                         267     260      -7
do_setlink                                  3416    3408      -8
netlbl_cipsov4_add_std                      1672    1659     -13
nl80211_parse_sched_scan                    2902    2888     -14
nl80211_trigger_scan                        1738    1720     -18
do_execute_actions                          2821    2738     -83
Total: Before=154865355, After=154865051, chg -0.00%

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: make struct napi_alloc_cache::skb_count unsigned int

size_t is way too much for an integer not exceeding 64.

Space savings: 10 bytes!

add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-10 (-10)
function                                     old     new   delta
napi_consume_skb                             165     163      -2
__kfree_skb_flush                             56      53      -3
__kfree_skb_defer                             97      92      -5
Total: Before=154865639, After=154865629, chg -0.00%

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'batadv-next-for-davem-20161119' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
This feature patchset includes the following changes:

- 6 patches adding functionality to detect a WiFi interface under
   other virtual interfaces, like VLANs. They introduce a cache for
   the detected the WiFi configuration to avoid RTNL locking in
   critical sections. Patches have been prepared by Marek Lindner
   and Sven Eckelmann

- Enable automatic module loading for genl requests, by Sven Eckelmann

- Fix a potential race condition on interface removal. This is not
   happening very often in practice, but requires bigger changes to fix,
   so we are sending this to net-next. By Linus Luessing
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

af_packet: Use virtio_net_hdr_from_skb() directly.

Remove static function __packet_rcv_vnet(), which only called
virtio_net_hdr_from_skb() and BUG()ged out if an error code was
returned. Instead, call virtio_net_hdr_from_skb() from the former
call sites of __packet_rcv_vnet() and actually use the error handling
code that is already there.

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

af_packet: Use virtio_net_hdr_to_skb().

Use the common virtio_net_hdr_to_skb() instead of open coding it.
Other call sites were changed by commit fd2a0437dc, but this one was
missed, maybe because it is split in two parts of the source code.

Interim comparisons of 'vnet_hdr->gso_type' still work as both the
vnet_hdr and skb notion of gso_type is zero when there is no gso.

Fixes: fd2a0437dc ("virtio_net: introduce virtio_net_hdr_{from,to}_skb")
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

virtio_net: Do not clear memory for struct virtio_net_hdr twice.

virtio_net_hdr_from_skb() clears the memory for the header, so there
is no point for the callers to do the same.

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

virtio_net.h: Fix comment.

Fix incorrent comment after the final #endif.

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

virtio_net: Simplify call sites for virtio_net_hdr_{from, to}_skb().

No point storing the return value of virtio_net_hdr_to_skb() or
virtio_net_hdr_from_skb() to a variable when the value is used only
once as a boolean in an immediately following if statement.

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Allocate Tx queues dynamically

Allocate resources dynamically for Upper layer driver's (ULD) like
cxgbit, iw_cxgb4, cxgb4i and chcr. The resources allocated include Tx
queues which are allocated when ULD register with cxgb4 driver and freed
while un-registering. The Tx queues which are shared by ULD shall be
allocated by first registering driver and un-allocated by last
unregistering driver.

Signed-off-by: Atul Gupta <atul.gupta@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

liquidio CN23XX: bitwise vs logical AND typo

We obviously intended a bitwise AND here, not a logical one.

Fixes: 8c978d059224 ("liquidio CN23XX: Mailbox support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

lan78xx: relocate mdix setting to phy driver

Relocate mdix code to phy driver to be called at config_init().

Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-marvell-freescale-compile-test'

Florian Fainelli says:

====================
net: Enable COMPILE_TEST for Marvell & Freescale drivers

This patch series allows building the Freescale and Marvell Ethernet network
drivers with COMPILE_TEST.

Changes in v4:

- add proper HAS_DMA to fix build errors on m32r
- provide an inline stub for mvebu_mbus_get_dram_win_info
- added an additional patch to fix build errors with mv88e6xxx on m32r

Changes in v3:

- reorder patches to avoid introducing a build warning between commits

Changes in v2:

- rename register define clash when building for i386 (spotted by LKP)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: Select IRQ_DOMAIN

Some architectures may not define IRQ_DOMAIN (like m32r), fixes
undefined references to IRQ_DOMAIN functions.

Fixes: dc30c35be720 ("net: dsa: mv88e6xxx: Implement interrupt support.")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: marvell: Allow drivers to be built with COMPILE_TEST

All Marvell Ethernet drivers actually build fine with COMPILE_TEST with
a few warnings. We need to add a few HAS_DMA dependencies to fix linking
failures on problematic architectures like m32r.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bus: mvebu-bus: Provide inline stub for mvebu_mbus_get_dram_win_info

In preparation for allowing CONFIG_MVNETA_BM to build with COMPILE_TEST,
provide an inline stub for mvebu_mbus_get_dram_win_info().

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fsl: Allow most drivers to be built with COMPILE_TEST

There are only a handful of Freescale Ethernet drivers that don't
actually build with COMPILE_TEST:

* FEC, for which we would need to define a default register layout if no
  supported architecture is defined

* UCC_GETH which depends on PowerPC cpm.h header (which could be moved
  to a generic location)

* GIANFAR needs to depend on HAS_DMA to fix linking failures on some
  architectures (like m32r)

We need to fix an unmet dependency to get there though:
warning: (FSL_XGMAC_MDIO) selects OF_MDIO which has unmet direct
dependencies (OF && PHYLIB)

which would result in CONFIG_OF_MDIO=[ym] without CONFIG_OF to be set.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: gianfar_ptp: Rename FS bit to FIPERST

FS is a global symbol used by the x86 32-bit architecture, fixes builds
re-definitions:

>> drivers/net/ethernet/freescale/gianfar_ptp.c:75:0: warning: "FS"
>> redefined
    #define FS                    (1<<28) /* FIPER start indication */

   In file included from arch/x86/include/uapi/asm/ptrace.h:5:0,
                    from arch/x86/include/asm/ptrace.h:6,
                    from arch/x86/include/asm/math_emu.h:4,
                    from arch/x86/include/asm/processor.h:11,
                    from include/linux/mutex.h:19,
                    from include/linux/kernfs.h:13,
                    from include/linux/sysfs.h:15,
                    from include/linux/kobject.h:21,
                    from include/linux/device.h:17,
                    from
drivers/net/ethernet/freescale/gianfar_ptp.c:23:
   arch/x86/include/uapi/asm/ptrace-abi.h:15:0: note: this is the
location of the previous definition
    #define FS 9

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe: Update connection validation for backplane mode

Update the connection type enumeration for backplane mode and return
an error when there is a mismatch between the mode and the connection
type.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ethtool-phy-downshift'

Allan W. Nielsen says:

====================
Adding PHY-Tunables and downshift support

(This is a re-post of the v3 patch set with a new cover letter - I was not
aware that the cover letters was used a commit comments in merge commits).

This series add support for PHY tunables, and uses this facility to
configure downshifting. The downshifting mechanism is implemented for MSCC
phys.

This series tries to address the comments provided back in mid October when
this feature was posted along with fast-link-failure. Fast-link-failure has
been separated out, but we would like to pick continue on that if/when we
agree on how the phy-tunables and downshifting should be done.

The proposed generic interface is similar to
ETHTOOL_GTUNABLE/ETHTOOL_STUNABLE, it uses the same type
(ethtool_tunable/tunable_type_id) but a new enum (phy_tunable_id) is added
to reflect the PHY tunable.

The implementation just call the newly added function pointers in
get_tunable/set_tunable phy_device structure.

To configure downshifting, the ethtool_tunable structure is used. 'id' must
be set to 'ETHTOOL_PHY_DOWNSHIFT', 'type_id' must be set to
'ETHTOOL_TUNABLE_U8' and 'data' value configure the amount of downshift
re-tries.

If configured to DOWNSHIFT_DEV_DISABLE, then downshift is disabled If
configured to DOWNSHIFT_DEV_DEFAULT_COUNT, then it is up to the device to
choose a device-specific re-try count.

Tested on Beaglebone Black with VSC 8531 PHY.

Change set:
v0:

- Link Speed downshift and Fast Link failure-2 features coded by using
  Device tree.
v1:
- Split the Downshift and FLF2 features in different set of patches.
- Removed DT access and implemented IOCTL access suggested by Andrew.
- Added function pointers in get_tunable/set_tunable phy_device structure
v2:
- Added trace message with a hist is printed when downshifting clould not
  be eanbled with the requested count
- (ethtool) Syntax is changed from "--set-phy-tunable downshift on|off|%d"
  to "--set-phy-tunable [downshift on|off [count N]]" - as requested by
  Andrew.
v3:
- Fixed Spelling in "net: phy: Add downshift get/set support in Microsemi
  PHYs driver"
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: Add downshift get/set support in Microsemi PHYs driver

Implements the phy tunable function pointers and implement downshift
functionality for MSCC PHYs.

Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Allan W. Nielsen <allan.nielsen@microsemi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethtool: Core impl for ETHTOOL_PHY_DOWNSHIFT tunable

Adding validation support for the ETHTOOL_PHY_DOWNSHIFT. Functional
implementation needs to be done in the individual PHY drivers.

Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Allan W. Nielsen <allan.nielsen@microsemi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethtool: (uapi) Add ETHTOOL_PHY_DOWNSHIFT to PHY tunables

For operation in cabling environments that are incompatible with
1000BASE-T, PHY device may provide an automatic link speed downshift
operation. When enabled, the device automatically changes its 1000BASE-T
auto-negotiation to the next slower speed after a configured number of
failed attempts at 1000BASE-T. This feature is useful in setting up in
networks using older cable installations that include only pairs A and B,
and not pairs C and D.

Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com>
Signed-off-by: Allan W. Nielsen <allan.nielsen@microsemi.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethtool: Implements ETHTOOL_PHY_GTUNABLE/ETHTOOL_PHY_STUNABLE

Adding get_tunable/set_tunable function pointer to the phy_driver
structure, and uses these function pointers to implement the
ETHTOOL_PHY_GTUNABLE/ETHTOOL_PHY_STUNABLE ioctls.

Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Allan W. Nielsen <allan.nielsen@microsemi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethtool: (uapi) Add ETHTOOL_PHY_GTUNABLE and ETHTOOL_PHY_STUNABLE

Defines a generic API to get/set phy tunables. The API is using the
existing ethtool_tunable/tunable_type_id types which is already being used
for mac level tunables.

Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Allan W. Nielsen <allan.nielsen@microsemi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlx5-next'

Saeed Mahameed says:

====================
Mellanox 100G mlx5 update 2016-11-15

This series contains four humble mlx5 features.

From Gal,
- Add the support for PCIe statistics and expose them in ethtool

From Huy,
- Add the support for port module events reporting and statistics
- Add the support for driver version setting into FW (for display purposes only)

From Mohamad,
- Extended the command interface cache flexibility

This series was generated against commit
6a02f5eb6a8a ("Merge branch 'mlxsw-i2c")

V2:
- Changed plain "unsigned" to "unsigned int"
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: Expose PCIe statistics to ethtool

This patch exposes two groups of PCIe counters:
- Performance counters.
- Timers and states counters.
Queried with ethtool -S <devname>.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Add MPCNT register infrastructure

Add the needed infrastructure for future use of MPCNT register.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Set driver version into firmware

If driver_version capability bit is enabled, set driver version
to firmware after the init HCA command, for display purposes.

Example of driver version: "Linux,mlx5_core,3.0-1"

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Set driver version infrastructure

Add driver_version capability bit is enabled, and set driver
version command in mlx5_ifc firmware header. The only purpose
of this command is to store a driver version/OS string in FW
to be reported and displayed in various management systems,
such as IPMI/BMC.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: Add port module event counters to ethtool stats

Add port module event counters to ethtool -S command

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Add handling for port module event

For each asynchronous port module event:
1. print with ratelimit to the dmesg log
2. increment the corresponding event counter

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Port module event hardware structures

Add hardware structures and constants definitions needed for module
events support.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Make the command interface cache more flexible

Add more cache command size sets and more entries for each set based on
the current commands set different sizes and commands frequency.

Fixes: e126ba97dba9 ('mlx5: Add driver for Mellanox Connect-IB adapters')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'sfc-tso-v2'

Edward Cree says:

====================
sfc: Firmware-Assisted TSO version 2

The firmware on 8000 series SFC NICs supports a new TSO API ("FATSOv2"), and
7000 series NICs will also support this in an imminent release. This series
adds driver support for this TSO implementation.
The series also removes SWTSO, as it's now equivalent to GSO. This does not
actually remove very much code, because SWTSO was grotesquely intertwingled
with FATSOv1, which will also be removed once 7000 series supports FATSOv2.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: remove Software TSO

It gives no advantage over GSO now that xmit_more exists. If we find
ourselves unable to handle a TSO skb (because our TXQ doesn't have a
TSOv2 context and the NIC doesn't support TSOv1), hand it back to GSO.
Also do that if the TSO handler fails with EINVAL for any other reason.
As Falcon-architecture NICs don't support any firmware-assisted TSO,
they no longer advertise TSO feature flags at all.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: handle failure to allocate TSOv2 contexts

If we fail to init the TXQ because of insufficient TSOv2 contexts,
try again with TSOv2 disabled.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: Firmware-Assisted TSO version 2

Add support for FATSOv2 to the driver. FATSOv2 offloads far more of the task
of TCP segmentation to the firmware, such that we now just pass a single
super-packet to the NIC. This means TSO has a great deal in common with a
normal DMA transmit, apart from adding a couple of option descriptors.
NIC-specific checks have been moved off the fast path and in to
initialisation where possible.

This also moves FATSOv1/SWTSO to a new file (tx_tso.c). The end of transmit
and some error handling is now outside TSO, since it is common with other
code.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: Update EF10 register definitions

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: Update MCDI protocol definitions

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netns: make struct pernet_operations::id unsigned int

Make struct pernet_operations::id unsigned.

There are 2 reasons to do so:

1)
This field is really an index into an zero based array and
thus is unsigned entity. Using negative value is out-of-bound
access by definition.

2)
On x86_64 unsigned 32-bit data which are mixed with pointers
via array indexing or offsets added or subtracted to pointers
are preffered to signed 32-bit data.

"int" being used as an array index needs to be sign-extended
to 64-bit before being used.

void f(long *p, int i)
{
g(p[i]);
}

  roughly translates to

movsx rsi, esi
mov rdi, [rsi+...]
call g

MOVSX is 3 byte instruction which isn't necessary if the variable is
unsigned because x86_64 is zero extending by default.

Now, there is net_generic() function which, you guessed it right, uses
"int" as an array index:

static inline void *net_generic(const struct net *net, int id)
{
...
ptr = ng->ptr[id - 1];
...
}

And this function is used a lot, so those sign extensions add up.

Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
messing with code generation):

add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)

Unfortunately some functions actually grow bigger.
This is a semmingly random artefact of code generation with register
allocator being used differently. gcc decides that some variable
needs to live in new r8+ registers and every access now requires REX
prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be
used which is longer than [r8]

However, overall balance is in negative direction:

add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
function                                     old     new   delta
nfsd4_lock                                  3886    3959     +73
tipc_link_build_proto_msg                   1096    1140     +44
mac80211_hwsim_new_radio                    2776    2808     +32
tipc_mon_rcv                                1032    1058     +26
svcauth_gss_legacy_init                     1413    1429     +16
tipc_bcbase_select_primary                   379     392     +13
nfsd4_exchange_id                           1247    1260     +13
nfsd4_setclientid_confirm                    782     793     +11
...
put_client_renew_locked                      494     480     -14
ip_set_sockfn_get                            730     716     -14
geneve_sock_add                              829     813     -16
nfsd4_sequence_done                          721     703     -18
nlmclnt_lookup_host                          708     686     -22
nfsd4_lockt                                 1085    1063     -22
nfs_get_client                              1077    1050     -27
tcf_bpf_init                                1106    1076     -30
nfsd4_encode_fattr                          5997    5930     -67
Total: Before=154856051, After=154854321, chg -0.00%

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

udp: enable busy polling for all sockets

UDP busy polling is restricted to connected UDP sockets.

This is because sk_busy_loop() only takes care of one NAPI context.

There are cases where it could be extended.

1) Some hosts receive traffic on a single NIC, with one RX queue.

2) Some applications use SO_REUSEPORT and associated BPF filter
   to split the incoming traffic on one UDP socket per RX
queue/thread/cpu

3) Some UDP sockets are used to send/receive traffic for one flow, but
they do not bother with connect()

This patch records the napi_id of first received skb, giving more
reach to busy polling.

Tested:

lpaa23:~# echo 70 >/proc/sys/net/core/busy_read
lpaa24:~# echo 70 >/proc/sys/net/core/busy_read

lpaa23:~# for f in `seq 1 10`; do ./super_netperf 1 -H lpaa24 -t UDP_RR -l 5; done

Before patch :
   27867   28870   37324   41060   41215
   36764   36838   44455   41282   43843
After patch :
   73920   73213   70147   74845   71697
   68315   68028   75219   70082   73707

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'rds-ha-failover-fixes'

Sowmini Varadhan says:

====================
RDS: TCP: HA/Failover fixes

This series contains a set of fixes for bugs exposed when
we ran the following in a loop between a test machine pair:

while (1); do
   # modprobe rds-tcp on test nodes
   # run rds-stress in bi-dir mode between test machine pair
   # modprobe -r rds-tcp on test nodes
done

rds-stress in bi-dir mode will cause both nodes to initiate
RDS-TCP connections at almost the same instant, exposing the
bugs fixed in this series.

Without the fixes, rds-stress reports sporadic packet drops,
and packets arriving out of sequence. After the fixes,we have
been able to run the  test overnight, without any issues.

Each patch has a detailed description of the root-cause fixed
by the patch.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

RDS: TCP: Force every connection to be initiated by numerically smaller IP address

When 2 RDS peers initiate an RDS-TCP connection simultaneously,
there is a potential for "duelling syns" on either/both sides.
See commit 241b271952eb ("RDS-TCP: Reset tcp callbacks if re-using an
outgoing socket in rds_tcp_accept_one()") for a description of this
condition, and the arbitration logic which ensures that the
numerically large IP address in the TCP connection is bound to the
RDS_TCP_PORT ("canonical ordering").

The rds_connection should not be marked as RDS_CONN_UP until the
arbitration logic has converged for the following reason. The sender
may start transmitting RDS datagrams as soon as RDS_CONN_UP is set,
and since the sender removes all datagrams from the rds_connection's
cp_retrans queue based on TCP acks. If the TCP ack was sent from
a tcp socket that got reset as part of duel aribitration (but
before data was delivered to the receivers RDS socket layer),
the sender may end up prematurely freeing the datagram, and
the datagram is no longer reliably deliverable.

This patch remedies that condition by making sure that, upon
receipt of 3WH completion state change notification of TCP_ESTABLISHED
in rds_tcp_state_change, we mark the rds_connection as RDS_CONN_UP
if, and only if, the IP addresses and ports for the connection are
canonically ordered. In all other cases, rds_tcp_state_change will
force an rds_conn_path_drop(), and rds_queue_reconnect() on
both peers will restart the connection to ensure canonical ordering.

A side-effect of enforcing this condition in rds_tcp_state_change()
is that rds_tcp_accept_one_path() can now be refactored for simplicity.
It is also no longer possible to encounter an RDS_CONN_UP connection in
the arbitration logic in rds_tcp_accept_one().

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

RDS: TCP: Track peer's connection generation number

The RDS transport has to be able to distinguish between
two types of failure events:
(a) when the transport fails (e.g., TCP connection reset)
    but the RDS socket/connection layer on both sides stays
    the same
(b) when the peer's RDS layer itself resets (e.g., due to module
    reload or machine reboot at the peer)
In case (a) both sides must reconnect and continue the RDS messaging
without any message loss or disruption to the message sequence numbers,
and this is achieved by rds_send_path_reset().

In case (b) we should reset all rds_connection state to the
new incarnation of the peer. Examples of state that needs to
be reset are next expected rx sequence number from, or messages to be
retransmitted to, the new incarnation of the peer.

To achieve this, the RDS handshake probe added as part of
commit 5916e2c1554f ("RDS: TCP: Enable multipath RDS for TCP")
is enhanced so that sender and receiver of the RDS ping-probe
will add a generation number as part of the RDS_EXTHDR_GEN_NUM
extension header. Each peer stores local and remote generation
numbers as part of each rds_connection. Changes in generation
number will be detected via incoming handshake probe ping
request or response and will allow the receiver to reset rds_connection
state.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

RDS: TCP: set RDS_FLAG_RETRANSMITTED in cp_retrans list

As noted in rds_recv_incoming() sequence numbers on data packets
can decreas for the failover case, and the Rx path is equipped
to recover from this, if the RDS_FLAG_RETRANSMITTED is set
on the rds header of an incoming message with a suspect sequence
number.

The RDS_FLAG_RETRANSMITTED is predicated on the RDS_FLAG_RETRANSMITTED
flag in the rds_message, so make sure the flag is set on messages
queued for retransmission.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: replace if (netif_msg_type) by their netif_xxx counterpart

As sugested by Joe Perches, we could replace all
if (netif_msg_type(priv)) dev_xxx(priv->devices, ...)
by the simpler macro netif_xxx(priv, hw, priv->dev, ...)

Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: replace hardcoded function name by __func__

Some printing have the function name hardcoded.
It is better to use __func__ instead.

Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: replace all pr_xxx by their netdev_xxx counterpart

The stmmac driver use lots of pr_xxx functions to print information.
This is bad since we cannot know which device logs the information.
(moreover if two stmmac device are present)

Furthermore, it seems that it assumes wrongly that all logs will always
be subsequent by using a dev_xxx then some indented pr_xxx like this:
kernel: sun7i-dwmac 1c50000.ethernet: no reset control found
kernel:  Ring mode enabled
kernel:  No HW DMA feature register supported
kernel:  Normal descriptors
kernel:  TX Checksum insertion supported

So this patch replace all pr_xxx by their netdev_xxx counterpart.
Excepts for some printing where netdev "cause" unpretty output like:
sun7i-dwmac 1c50000.ethernet (unnamed net_device) (uninitialized): no reset control found
In those case, I keep dev_xxx.

In the same time I remove some "stmmac:" print since
this will be a duplicate with that dev_xxx displays.

Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net_sched: sch_fq: use hash_ptr()

When I wrote sch_fq.c, hash_ptr() on 64bit arches was awful,
and I chose hash_32().

Linus Torvalds and George Spelvin fixed this issue, so we can
use hash_ptr() to get more entropy on 64bit arches with Terabytes
of memory, and avoid the cast games.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: remove napi_hash_del() calls

Calling napi_hash_del() after netif_napi_del() is pointless.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: remove napi_hash_del() call

There is no need calling napi_hash_del()+synchronize_rcu() before
calling netif_napi_del()

netif_napi_del() does this already.

Using napi_hash_del() in a driver is useful only when dealing with
a batch of NAPI structures, so that a single synchronize_rcu() can
be used. mlx4_en_deactivate_cq() is deactivating a single NAPI.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlxsw-i2c'

Jiri Pirko says:

====================
mlxsw: Introduce support for I2C bus

Vadim says:

This patchset adds I2C access support for SwitchX, SwitchX2, SwitchIB,
SwitchIB2 and Spectrum silicones.

It contains:
- Small changes in mlxsw core code, needed for I2C bus support;
- I2C driver, which obtains I2C input/output mailboxes setting and
provides command interface implementation.
- Minimal driver, which works on top of I2C driver and allows running
of mlxsw command interface over I2C bus;

Use case:
On system, which does not have PCI to ASIC (BMC), hwmon functionality
(sensors, pwm, tacho) will be available through I2C.

Usage (manual probing):
echo mlxsw_minimal 0x48 > /sys/bus/i2c/devices/i2c-2/new_device

Sysfs interface:
/sys/bus/i2c/devices/2-0048/hwmon/hwmon5/pwm1
/sys/bus/i2c/devices/2-0048/hwmon/hwmon5/temp1_input
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: minimal: Add I2C support for Mellanox ASICs

Add I2C access support for Mellanox ASICs:
- Virtual Protocol Interconnect switches SwitchX, SwitchX2,
providing InfiniBand, Ethernet and Fibre Channel connectivity;
- Infiniband switches SwitchIB, SwitchIB2:
- Ethernet switch Spectrum.

Example of probing activation:
echo mlxsw_minimal 0x48 > /sys/bus/i2c/devices/i2c-2/new_device

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: Invoke driver's init/fini methods only if defined

We are going to add a minimal driver on top of the mlxsw core
infrastructure, which will be mainly used for hardware monitoring in
Baseboard management controller (BMC) installations.

Unlike the switch drivers (e.g., spectrum, switchx2), this driver does not
initialize the ASIC and therefore doesn't need to implement the init() and
fini() methods in its 'mlxsw_driver' struct.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: Introduce support for I2C bus

Add I2C bus implementation for Mellanox Technologies Switch ASICs.
This includes command interface implementation using input / out mailboxes,
whose location is retrieved from the firmware during probe time.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: Add bus capability flag

The mlxsw core infrastructure currently assumes that communication with
the ASIC is always possible using Ethernet management datagrams (EMADs),
but this is only possible when the PCI bus is used.

The bus capability flag is added to indicate EMAD support and make core
initialize EMAD communication only when it's set. Otherwise, register
access is done using command interface.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: netcp: replace IS_ERR_OR_NULL by IS_ERR

knav_queue_open always returns an ERR_PTR value, never NULL.  This can be
confirmed by unfolding the function calls and conforms to the function's
documentation.  Thus, replace IS_ERR_OR_NULL by IS_ERR in error checks.

The change is made using the following semantic patch:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression x;
statement S;
@@

x = knav_queue_open(...);
if (
-   IS_ERR_OR_NULL
+   IS_ERR
    (x)) S
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: use new rhlist interface on sctp transport rhashtable

Now sctp transport rhashtable uses hash(lport, dport, daddr) as the key
to hash a node to one chain. If in one host thousands of assocs connect
to one server with the same lport and different laddrs (although it's
not a normal case), all the transports would be hashed into the same
chain.

It may cause to keep returning -EBUSY when inserting a new node, as the
chain is too long and sctp inserts a transport node in a loop, which
could even lead to system hangs there.

The new rhlist interface works for this case that there are many nodes
with the same key in one chain. It puts them into a list then makes this
list be as a node of the chain.

This patch is to replace rhashtable_ interface with rhltable_ interface.
Since a chain would not be too long and it would not return -EBUSY with
this fix when inserting a node, the reinsert loop is also removed here.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bnxt_en-next'

Michael Chan says:

====================
bnxt_en: Updates.

New firmware spec. update, autoneg update, and UDP RSS support.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Add ethtool -n|-N rx-flow-hash support.

To display and modify the RSS hash.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Add UDP RSS support for 57X1X chips.

The newer chips have proper support for 4-tuple UDP RSS.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Enhance autoneg support.

On some dual port NICs, the speed setting on one port can affect the
available speed on the other port. Add logic to detect these changes
and adjust the advertised speed settings when necessary.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Update firmware interface spec to 1.5.4.

Use the new FORCE_LINK_DWN bit to shutdown link during close.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netpoll: more efficient locking

Callers of netpoll_poll_lock() own NAPI_STATE_SCHED

Callers of netpoll_poll_unlock() have BH blocked between
the NAPI_STATE_SCHED being cleared and poll_lock is released.

We can avoid the spinlock which has no contention, and use cmpxchg()
on poll_owner which we need to set anyway.

This removes a possible lockdep violation after the cited commit,
since sk_busy_loop() re-enables BH before calling busy_poll_stop()

Fixes: 217f69743681 ("net: busy-poll: allow preemption in sk_busy_loop()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cadence: Add LSO support.

New Cadence GEM hardware support Large Segment Offload (LSO):
TCP segmentation offload (TSO) as well as UDP fragmentation
offload (UFO). Support for those features was added to the driver.

Signed-off-by: David S. Miller <davem@davemloft.net>

netronome: don't access real_num_rx_queues directly

The netdev->real_num_rx_queues setting is only available if CONFIG_SYSFS
is enabled, so we now get a build failure when that is turned off:

netronome/nfp/nfp_net_common.c: In function 'nfp_net_ring_swap_enable':
netronome/nfp/nfp_net_common.c:2489:18: error: 'struct net_device' has no member named 'real_num_rx_queues'; did you mean 'real_num_tx_queues'?

As far as I can tell, the check here is only used as an optimization that
we can skip in order to fix the compilation. If sysfs is disabled,
the following netif_set_real_num_rx_queues() has no effect.

Fixes: 164d1e9e5d52 ("nfp: add support for ethtool .set_channels")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: remove napi_hash_del() call

Calling napi_hash_del() after netif_napi_del() is pointless.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Edward Cree <ecree@solarflare.com>
Cc: Bert Kenward <bkenward@solarflare.com>
Acked-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

lwtunnel: subtract tunnel headroom from mtu on output redirect

This patch changes the lwtunnel_headroom() function which is called
in ipv4_mtu() and ip6_mtu(), to also return the correct headroom
value when the lwtunnel state is OUTPUT_REDIRECT.

This patch enables e.g. SR-IPv6 encapsulations to work without
manually setting the route mtu.

Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_router: Adjust placement of FIB abort warning

The recent merge commit bb598c1b8c9b ("Merge
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net") would cause
the FIB abort warning to fire whenever we flush the FIB tables - either
during module removal or actual abort.

Move it back to its rightful location in the FIB abort function.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: Respect SPEED_UNFORCED, don't set force bit

The SPEED_UNFORCED indicates the MAC & PHY should perform
auto-negotiation to determine a speed which works. If this is called
for, don't set the force bit. If it is set, the MAC actually does
10Gbps, why the internal PHYs don't support.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'amd-xgbe-next'

Tom Lendacky says:

====================
amd-xgbe: AMD XGBE driver updates 2016-11-15

This patch series addresses some minor issues found in the recently
accepted patch series for the AMD XGBE driver.

The following fixes are included in this driver update series:

- Fix a possibly uninitialized variable in the debugfs support
- Fix the GPIO pin number constraint check

This patch series is based on net-next.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe: Fix maximum GPIO value check

The GPIO support in the hardware allows for up to 16 GPIO pins, enumerated
from 0 to 15. The driver uses the wrong value (16) to validate the GPIO
pin range in the routines to set and clear the GPIO output pins. Update
the code to use the correct value (15).

Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe: Fix possible uninitialized variable

The debugfs support in the driver uses a common routine to write the
debugfs values. In this routine, if the input file position is non-zero
then the write routine will not return an error and an output parameter
will not have been set. Because an error isn't returned an uninitialized
value will be written into a register.

Fix the common write routine to return an error if the input file position
is non-zero, which will propagate back to the caller.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'nway-reset'

Florian Fainelli says:

====================
net: Implenent ethtool::nway_reset for a few drivers

This patch series depends on "net: phy: Centralize auto-negotation restart"
since it provides phy_ethtool_nway_reset as a helper function.

The drivers here already support PHYLIB, so there really is no reason why
restarting auto-negotiation would not be possible with these.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: marvell: pxa168_eth: Implement ethtool::nway_reset

Implement ethtool::nway_reset using phy_ethtool_nway_reset. We are
already using dev->phydev all over the place so this comes for free.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: mvpp2: Implement ethtool::nway_reset

Implement ethtool::nway_reset using phy_ethtool_nway_reset. We are
already using dev->phydev all over the place so this comes for free.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: mvneta: Implement ethtool::nway_reset

Implement ethtool::nway_reset using phy_ethtool_nway_reset. We are
already using dev->phydev all over the place so this comes for free.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethoc: Implement ethtool::nway_reset

Implement ethtool::nway_reset using phy_ethtool_nway_reset. We are
already using dev->phydev all over the place so this comes for free.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: Implement ethtool::nway_reset

Utilize the generic phy_ethtool_nway_reset() helper function to
implement an autonegotiation restart.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'busypoll-preemption-and-other-optimizations'

Eric Dumazet says:

====================
net: busy-poll: allow preemption and other optimizations

It is time to have preemption points in sk_busy_loop() and improve
its scalability.

Also napi_complete() and friends can tell drivers when it is safe to
not re-enable device interrupts, saving some overhead under
high busy polling.

mlx4 and bnx2x are changed accordingly, to show how this busy polling
status can be exploited by drivers.

Next steps will implement Zach Brown suggestion, where NAPI polling
would be enabled all the time for some chosen queues.
This is needed for efficient epoll() support anyway.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: switch to napi_complete_done()

Switch from napi_complete() to napi_complete_done()
for better GRO support (gro_flush_timeout) and core NAPI
features.

Do not rearm interrupts if we are busy polling,
to reduce bus and interrupts overhead.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Adam Belay <abelay@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Yuval Mintz <Yuval.Mintz@cavium.com>
Cc: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: use napi_complete_done() return value

Do not rearm interrupts if we are busy polling.

mlx4 uses separate CQ for TX and RX, so number of TX interrupts
does not change, unfortunately.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Adam Belay <abelay@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Yuval Mintz <Yuval.Mintz@cavium.com>
Cc: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: busy-poll: return busypolling status to drivers

NAPI drivers use napi_complete_done() or napi_complete() when
they drained RX ring and right before re-enabling device interrupts.

In busy polling, we can avoid interrupts being delivered since
we are polling RX ring in a controlled loop.

Drivers can chose to use napi_complete_done() return value
to reduce interrupts overhead while busy polling is active.

This is optional, legacy drivers should work fine even
if not updated.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Adam Belay <abelay@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Yuval Mintz <Yuval.Mintz@cavium.com>
Cc: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>