openwrt/staging/blogic.git
6 years agomlxsw: spectrum_span: Allow bridge for gretap mirror
Petr Machata [Sun, 29 Apr 2018 07:56:13 +0000 (10:56 +0300)]
mlxsw: spectrum_span: Allow bridge for gretap mirror

When handling mirroring to a gretap or ip6gretap netdevice in mlxsw, the
underlay address (i.e. the remote address of the tunnel) may be routed
to a bridge.

In that case, look up the resolved neighbor Ethernet address in that
bridge's FDB. Then configure the offload to direct the mirrored traffic
to that port, possibly with tagging.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: Respin SPAN on switchdev events
Petr Machata [Sun, 29 Apr 2018 07:56:12 +0000 (10:56 +0300)]
mlxsw: Respin SPAN on switchdev events

Changes to switchdev artifact can make a SPAN entry offloadable or
unoffloadable. To that end:

- Listen to SWITCHDEV_FDB_*_TO_BRIDGE notifications in addition to
  the *_TO_DEVICE ones, to catch whatever activity is sent to the
  bridge (likely by mlxsw itself).

  On each FDB notification, respin SPAN to reconcile it with the FDB
  changes.

- Also respin on switchdev port attribute changes (which currently
  covers changes to STP state of ports) and port object additions and
  removals.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: spectrum: Register SPAN before switchdev
Petr Machata [Sun, 29 Apr 2018 07:56:11 +0000 (10:56 +0300)]
mlxsw: spectrum: Register SPAN before switchdev

Since switchdev events can trigger SPAN respin, it is necessary that the
data structures are available. Register SPAN first, with a commentary on
what the dependencies are.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: spectrum_switchdev: Publish two functions
Petr Machata [Sun, 29 Apr 2018 07:56:10 +0000 (10:56 +0300)]
mlxsw: spectrum_switchdev: Publish two functions

Publish the existing function mlxsw_sp_bridge_port_find(), and add
another service accessor mlxsw_sp_bridge_port_stp_state(). Publish both
in a new file spectrum_switchdev.h.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: spectrum: Extract mlxsw_sp_stp_spms_state()
Petr Machata [Sun, 29 Apr 2018 07:56:09 +0000 (10:56 +0300)]
mlxsw: spectrum: Extract mlxsw_sp_stp_spms_state()

Instead of duplicating the decision regarding port forwarding state made
by mlxsw_sp_port_vid_stp_set(), extract the decision-making into a new
function and reuse.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: bridge: Publish bridge accessor functions
Petr Machata [Sun, 29 Apr 2018 07:56:08 +0000 (10:56 +0300)]
net: bridge: Publish bridge accessor functions

Add a couple new functions to allow querying FDB and vlan settings of a
bridge.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoipv6: sr: extract the right key values for "seg6_make_flowlabel"
Ahmed Abdelsalam [Sat, 28 Apr 2018 10:18:35 +0000 (12:18 +0200)]
ipv6: sr: extract the right key values for "seg6_make_flowlabel"

The seg6_make_flowlabel() is used by seg6_do_srh_encap() to compute the
flowlabel from a given skb. It relies on skb_get_hash() which eventually
calls __skb_flow_dissect() to extract the flow_keys struct values from
the skb.

In case of IPv4 traffic, calling seg6_make_flowlabel() after skb_push(),
skb_reset_network_header(), and skb_mac_header_rebuild() will results in
flow_keys struct of all key values set to zero.

This patch calls seg6_make_flowlabel() before resetting the headers of skb
to get the right key values.

Extracted Key values are based on the type inner packet as follows:
1) IPv6 traffic: src_IP, dst_IP, L4 proto, and flowlabel of inner packet.
2) IPv4 traffic: src_IP, dst_IP, L4 proto, src_port, and dst_port
3) L2 traffic: depends on what kind of traffic carried into the L2
frame. IPv6 and IPv4 traffic works as discussed 1) and 2)

Here a hex_dump of struct flow_keys for IPv4 and IPv6 traffic
10.100.1.100: 47302 > 30.0.0.2: 5001
00000000: 14 00 02 00 00 00 00 00 08 00 11 00 00 00 00 00
00000010: 00 00 00 00 00 00 00 00 13 89 b8 c6 1e 00 00 02
00000020: 0a 64 01 64

fc00:a1:a > b2::2
00000000: 28 00 03 00 00 00 00 00 86 dd 11 00 99 f9 02 00
00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 b2 00 00
00000020: 00 00 00 00 00 00 00 00 00 00 00 02 fc 00 00 a1
00000030: 00 00 00 00 00 00 00 00 00 00 00 0a

Signed-off-by: Ahmed Abdelsalam <amsalam20@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agolibcxgb,cxgb4: use __skb_put_zero to simplfy code
YueHaibing [Sat, 28 Apr 2018 04:35:22 +0000 (12:35 +0800)]
libcxgb,cxgb4: use __skb_put_zero to simplfy code

use helper __skb_put_zero to replace the pattern of __skb_put() && memset()

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoerspan: auto detect truncated packets.
William Tu [Fri, 27 Apr 2018 21:16:32 +0000 (14:16 -0700)]
erspan: auto detect truncated packets.

Currently the truncated bit is set only when the mirrored packet
is larger than mtu.  For certain cases, the packet might already
been truncated before sending to the erspan tunnel.  In this case,
the patch detect whether the IP header's total length is larger
than the actual skb->len.  If true, this indicated that the
mirrored packet is truncated and set the erspan truncate bit.

I tested the patch using bpf_skb_change_tail helper function to
shrink the packet size and send to erspan tunnel.

Reported-by: Xiaoyan Jin <xiaoyanj@vmware.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'r8169-further-improvements'
David S. Miller [Mon, 30 Apr 2018 13:38:20 +0000 (09:38 -0400)]
Merge branch 'r8169-further-improvements'

Heiner Kallweit says:

====================
r8169: further improvements w/o functional change

This series aims at further improving and simplifying the code w/o
any intended functional changes.

Series was tested on: RTL8169sb, RTL8168d, RTL8168e-vl
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agor8169: move common initializations to tp->hw_start
Heiner Kallweit [Sat, 28 Apr 2018 20:19:47 +0000 (22:19 +0200)]
r8169: move common initializations to tp->hw_start

The chip-specific init code includes quite some calls which are
identical for all chips. So move these calls to tp->hw_start().

In addition move rtl_set_rx_max_size() a little to make sure it's
defined before it's used. Unfortunately the diff generated by git
is a little bit hard to read.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agor8169: remove calls to rtl_set_rx_mode
Heiner Kallweit [Sat, 28 Apr 2018 20:19:43 +0000 (22:19 +0200)]
r8169: remove calls to rtl_set_rx_mode

__dev_open() calls the ndo_set_rx_mode callback anyway, so we don't
have to do it here too.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agor8169: simplify rtl_hw_start_8169
Heiner Kallweit [Sat, 28 Apr 2018 20:19:40 +0000 (22:19 +0200)]
r8169: simplify rtl_hw_start_8169

Currently done:
- if mac_version in (01, 02, 03, 04)
RTL_W8(tp, ChipCmd, CmdTxEnb | CmdRxEnb);
- if mac_version in (01, 02, 03, 04)
rtl_set_rx_tx_config_registers(tp);
- if mac_version not in (01, 02, 03, 04)
RTL_W8(tp, ChipCmd, CmdTxEnb | CmdRxEnb);
rtl_set_rx_tx_config_registers(tp);

So we do exactly the same independent of chip version and can simplify
the code.

In addition remove the call to rtl_init_rxcfg(), it's called in
rtl_init_one() already and the set bits are never touched later.
rtl_init_8168/8101 don't include this call either.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agor8169: improve handling of CPCMD quirk mask
Heiner Kallweit [Sat, 28 Apr 2018 20:19:30 +0000 (22:19 +0200)]
r8169: improve handling of CPCMD quirk mask

Both quirk masks are the same, so we can merge them. The quirk mask
includes most bits so it's actually easier to define a mask with
the bits to keep.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agor8169: improve CPlusCmd handling
Heiner Kallweit [Sat, 28 Apr 2018 20:19:26 +0000 (22:19 +0200)]
r8169: improve CPlusCmd handling

tp->cp_cmd is supposed to reflect the current value of the CplusCmd
register. Several (quite old) changes however directly change this
register w/o updating tp->cp_cmd. Also we have places in the code
reading this register where we could use the cached value.

In addition:
- Properly initialize tp->cmd with the register value.
- In rtl_hw_start_8169 remove one setting of PCIMulRW because it's
  set unconditionally anyway a few lines later.
- In rtl_hw_start_8168 properly mask out the INTT bits before
  setting INTT_1. So far we rely on both bits being zero.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agor8169: replace magic number for INTT mask with a constant
Heiner Kallweit [Sat, 28 Apr 2018 20:19:21 +0000 (22:19 +0200)]
r8169: replace magic number for INTT mask with a constant

Use a proper constant for INTT bit mask.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agor8169: improve rtl8169_set_features
Heiner Kallweit [Sat, 28 Apr 2018 20:19:15 +0000 (22:19 +0200)]
r8169: improve rtl8169_set_features

__rtl8169_set_features is used in rtl8169_set_features only, so we
can inline it. In addition:
- Remove check (features ^ dev->features), __netdev_update_features
  check's already that requested features differ from current ones.
- Don't mask out unsupported flags, there's no benefit in it.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agor8169: remove unneeded call to __rtl8169_set_features in rtl_open
Heiner Kallweit [Sat, 28 Apr 2018 20:19:08 +0000 (22:19 +0200)]
r8169: remove unneeded call to __rtl8169_set_features in rtl_open

RxChkSum and RxVlan aren't touched outside __rtl8169_set_features
(except in probe), so they are always in sync with dev->features.
And the RxConfig flags are set in rtl_set_rx_mode() which is
called via dev_set_rx_mode() from __dev_open().
Therefore we can safely remove this call.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoliquidio: fix spelling mistake: "mac_tx_multi_collison" -> "mac_tx_multi_collision"
Colin Ian King [Sat, 28 Apr 2018 09:52:16 +0000 (10:52 +0100)]
liquidio: fix spelling mistake: "mac_tx_multi_collison" -> "mac_tx_multi_collision"

Trivial fix to spelling mistake in oct_stats_strings text

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'liquidio-enhanced-ethtool-set-channels-feature'
David S. Miller [Mon, 30 Apr 2018 13:26:29 +0000 (09:26 -0400)]
Merge branch 'liquidio-enhanced-ethtool-set-channels-feature'

Intiyaz Basha says:

====================
liquidio: enhanced ethtool --set-channels feature

For the ethtool --set-channels feature, the liquidio driver currently
accepts max combined value as the queue count configured during driver
load time, where max combined count is the total count of input and output
queues. This limitation is applicable only when SR-IOV is enabled, that
is, when VFs are created for PF. If SR-IOV is not enabled, the driver can
configure max supported (64) queues.

This series of patches are for enhancing driver to accept
max supported queues for ethtool --set-channels.

Changes in V2:
  Only patch #6 was changed to fix these Sparse warnings reported by kbuild
  test robot:
    lio_ethtool.c:848:5: warning: symbol 'lio_23xx_reconfigure_queue_count'
                         was not declared. Should it be static?
    lio_ethtool.c:877:22: warning: incorrect type in assignment (different
                          base types)
    lio_ethtool.c:878:22: warning: incorrect type in assignment (different
                          base types)
    lio_ethtool.c:879:22: warning: incorrect type in assignment (different
                          base types)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoliquidio: enhanced ethtool --set-channels feature
Intiyaz Basha [Sat, 28 Apr 2018 06:32:57 +0000 (23:32 -0700)]
liquidio: enhanced ethtool --set-channels feature

Enhancing driver to accept max supported queues for ethtool --set-channels

Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Acked-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoliquidio: Moved common function setup_glists to lio_core.c
Intiyaz Basha [Sat, 28 Apr 2018 06:32:55 +0000 (23:32 -0700)]
liquidio: Moved common function setup_glists to lio_core.c

Moved common function setup_glists to lio_core.c
and reamed it to lio_setup_glists

Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Acked-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoliquidio: Moved common definition octnic_gather to octeon_network.h
Intiyaz Basha [Sat, 28 Apr 2018 06:32:51 +0000 (23:32 -0700)]
liquidio: Moved common definition octnic_gather to octeon_network.h

Moving common definition octnic_gather to octeon_network.h

Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Acked-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoliquidio: Moved common function delete_glists to lio_core.c
Intiyaz Basha [Sat, 28 Apr 2018 06:32:49 +0000 (23:32 -0700)]
liquidio: Moved common function delete_glists to lio_core.c

Moved common function delete_glists to lio_core.c
and renamed it to lio_delete_glists

Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Acked-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoliquidio: Moved common function list_delete_head to octeon_network.h
Intiyaz Basha [Sat, 28 Apr 2018 06:32:45 +0000 (23:32 -0700)]
liquidio: Moved common function list_delete_head to octeon_network.h

Moved common function list_delete_head to octeon_network.h
and renamed it to lio_list_delete_head

Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Acked-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoliquidio: Moved common function if_cfg_callback to lio_core.c
Intiyaz Basha [Sat, 28 Apr 2018 06:32:39 +0000 (23:32 -0700)]
liquidio: Moved common function if_cfg_callback to lio_core.c

Moved common function if_cfg_callback to lio_core.c
and renamed it to lio_if_cfg_callback.

Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Acked-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: core: Assert the size of netdev_featres_t
Florian Fainelli [Fri, 27 Apr 2018 20:11:14 +0000 (13:11 -0700)]
net: core: Assert the size of netdev_featres_t

We have about 53 netdev_features_t bits defined and counting, add a
build time check to catch when an u64 type will not be enough and we
will have to convert that to a bitmap. This is done in
register_netdevice() for convenience.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'net-cleanup-skb_tx_hash'
David S. Miller [Mon, 30 Apr 2018 02:01:33 +0000 (22:01 -0400)]
Merge branch 'net-cleanup-skb_tx_hash'

Alexander Duyck says:

====================
Clean up users of skb_tx_hash and __skb_tx_hash

I am in the process of doing some work to try and enable macvlan Tx queue
selection without using ndo_select_queue. As a part of that I will likely
need to make changes to skb_tx_hash. As such this is a clean up or refactor
of the two spots where he function has been used. In both cases it didn't
really seem like the function was being used correctly so I have updated
both code paths to not make use of the function.

My current development environment doesn't have an mlx4 or OPA vnic
available so the changes to those have been build tested only.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Revoke export for __skb_tx_hash, update it to just be static skb_tx_hash
Alexander Duyck [Fri, 27 Apr 2018 18:06:53 +0000 (14:06 -0400)]
net: Revoke export for __skb_tx_hash, update it to just be static skb_tx_hash

I am dropping the export of __skb_tx_hash as after my patches nobody is
using it outside of the net/core/dev.c file. In addition I am renaming and
repurposing it to just be a static declaration of skb_tx_hash since that
was the only user for it at this point. By doing this the compiler can
inline it into __netdev_pick_tx as that will improve performance.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlx4: Don't bother using skb_tx_hash in mlx4_en_select_queue
Alexander Duyck [Fri, 27 Apr 2018 18:06:45 +0000 (14:06 -0400)]
mlx4: Don't bother using skb_tx_hash in mlx4_en_select_queue

The code in the fallback path has supported XDP in conjunction with the Tx
traffic classification for TCs for over a year now. So instead of just
calling skb_tx_hash for every packet we are better off using the fallback
since that will record the Tx queue to the socket and then that can be used
instead of having to recompute the hash every time.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoopa_vnic: Just use skb_get_hash instead of skb_tx_hash
Alexander Duyck [Fri, 27 Apr 2018 18:06:35 +0000 (14:06 -0400)]
opa_vnic: Just use skb_get_hash instead of skb_tx_hash

This patch is meant to clean up how the opa_vnic is obtaining entropy from
Tx packets.

The code as it was written was claiming to get 16 bits of hash, but from
what I can tell it was only ever actually getting 14 bits as it was limited
to 0 - (2^15 - 1). It then was folding the result to get a 8 bit value for
entropy.

Instead of throwing away all that input I am cutting out the middle man and
instead having the code call skb_get_hash directly and then folding the 32
bit value into a 8 bit value using a pair of shifts and XOR operations.

Execution wise this new approach should provide more entropy and be faster
since we are bypassing the reciprocal multiplication to reduce the 32b
value to 16b and instead just using a shift/XOR combination.

In addition we can drop the unneeded adapter value from the call to get the
entropy since the netdev itself isn't even needed.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'lan78xx-fixed-phy'
David S. Miller [Mon, 30 Apr 2018 01:41:01 +0000 (21:41 -0400)]
Merge branch 'lan78xx-fixed-phy'

Raghuram Chary J says:

====================
lan78xx updates along with Fixed phy Support

These series of patches handle few modifications in driver
and adds support for fixed phy.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agolan78xx: Modify error messages
Raghuram Chary J [Sat, 28 Apr 2018 06:03:16 +0000 (11:33 +0530)]
lan78xx: Modify error messages

Modify the error messages when phy registration fails.

Signed-off-by: Raghuram Chary J <raghuramchary.jallipalli@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agolan78xx: Remove DRIVER_VERSION for lan78xx driver
Raghuram Chary J [Sat, 28 Apr 2018 06:03:15 +0000 (11:33 +0530)]
lan78xx: Remove DRIVER_VERSION for lan78xx driver

Remove driver version info from the lan78xx driver.

Signed-off-by: Raghuram Chary J <raghuramchary.jallipalli@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agolan78xx: Lan7801 Support for Fixed PHY
Raghuram Chary J [Sat, 28 Apr 2018 06:03:14 +0000 (11:33 +0530)]
lan78xx: Lan7801 Support for Fixed PHY

Adding Fixed PHY support to the lan78xx driver.

Signed-off-by: Raghuram Chary J <raghuramchary.jallipalli@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'tcp-mmap-rework-zerocopy-receive'
David S. Miller [Mon, 30 Apr 2018 01:29:55 +0000 (21:29 -0400)]
Merge branch 'tcp-mmap-rework-zerocopy-receive'

Eric Dumazet says:

====================
tcp: mmap: rework zerocopy receive

syzbot reported a lockdep issue caused by tcp mmap() support.

I implemented Andy Lutomirski nice suggestions to resolve the
issue and increase scalability as well.

First patch is adding a new getsockopt() operation and changes mmap()
behavior.

Second patch changes tcp_mmap reference program.

v4: tcp mmap() support depends on CONFIG_MMU, as kbuild bot told us.

v3: change TCP_ZEROCOPY_RECEIVE to be a getsockopt() option
    instead of setsockopt(), feedback from Ka-Cheon Poon

v2: Added a missing page align of zc->length in tcp_zerocopy_receive()
    Properly clear zc->recv_skip_hint in case user request was completed.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: net: tcp_mmap must use TCP_ZEROCOPY_RECEIVE
Eric Dumazet [Fri, 27 Apr 2018 15:58:09 +0000 (08:58 -0700)]
selftests: net: tcp_mmap must use TCP_ZEROCOPY_RECEIVE

After prior kernel change, mmap() on TCP socket only reserves VMA.

We have to use getsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, ...)
to perform the transfert of pages from skbs in TCP receive queue into such VMA.

struct tcp_zerocopy_receive {
__u64 address; /* in: address of mapping */
__u32 length; /* in/out: number of bytes to map/mapped */
__u32 recv_skip_hint; /* out: amount of bytes to skip */
};

After a successful getsockopt(...TCP_ZEROCOPY_RECEIVE...), @length contains
number of bytes that were mapped, and @recv_skip_hint contains number of bytes
that should be read using conventional read()/recv()/recvmsg() system calls,
to skip a sequence of bytes that can not be mapped, because not properly page
aligned.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive
Eric Dumazet [Fri, 27 Apr 2018 15:58:08 +0000 (08:58 -0700)]
tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive

When adding tcp mmap() implementation, I forgot that socket lock
had to be taken before current->mm->mmap_sem. syzbot eventually caught
the bug.

Since we can not lock the socket in tcp mmap() handler we have to
split the operation in two phases.

1) mmap() on a tcp socket simply reserves VMA space, and nothing else.
  This operation does not involve any TCP locking.

2) getsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, ...) implements
 the transfert of pages from skbs to one VMA.
  This operation only uses down_read(&current->mm->mmap_sem) after
  holding TCP lock, thus solving the lockdep issue.

This new implementation was suggested by Andy Lutomirski with great details.

Benefits are :

- Better scalability, in case multiple threads reuse VMAS
   (without mmap()/munmap() calls) since mmap_sem wont be write locked.

- Better error recovery.
   The previous mmap() model had to provide the expected size of the
   mapping. If for some reason one part could not be mapped (partial MSS),
   the whole operation had to be aborted.
   With the tcp_zerocopy_receive struct, kernel can report how
   many bytes were successfuly mapped, and how many bytes should
   be read to skip the problematic sequence.

- No more memory allocation to hold an array of page pointers.
  16 MB mappings needed 32 KB for this array, potentially using vmalloc() :/

- skbs are freed while mmap_sem has been released

Following patch makes the change in tcp_mmap tool to demonstrate
one possible use of mmap() and setsockopt(... TCP_ZEROCOPY_RECEIVE ...)

Note that memcg might require additional changes.

Fixes: 93ab6cc69162 ("tcp: implement mmap() for zero copy receive")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Cc: linux-mm@kvack.org
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'dsa-mv88e6xxx-remove-Global-2-setup'
David S. Miller [Mon, 30 Apr 2018 00:36:50 +0000 (20:36 -0400)]
Merge branch 'dsa-mv88e6xxx-remove-Global-2-setup'

Vivien Didelot says:

====================
net: dsa: mv88e6xxx: remove Global 2 setup

Parts of the mv88e6xxx driver still write arbitrary registers of
different banks at setup time, which is misleading especially when
supporting multiple device models.

This patchset moves two features setup into the top lovel
mv88e6xxx_setup function and kills the old Global 2 register bank setup
function. It brings no functional changes.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: dsa: mv88e6xxx: remove Global 2 setup
Vivien Didelot [Fri, 27 Apr 2018 01:56:46 +0000 (21:56 -0400)]
net: dsa: mv88e6xxx: remove Global 2 setup

The remaining values written to the Switch Management Register in the
mv88e6xxx_g2_setup function are specific to 88E6352 and older, and are
the default values anyway.

Thus remove completely this function. The mv88e6xxx driver no more
contains setup code to access arbitrary Global 2 registers.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: dsa: mv88e6xxx: move device mapping setup
Vivien Didelot [Fri, 27 Apr 2018 01:56:45 +0000 (21:56 -0400)]
net: dsa: mv88e6xxx: move device mapping setup

Move the Device Mapping setup out of the specific Global 2 code,
into the top level device setup function.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: dsa: mv88e6xxx: move trunk setup
Vivien Didelot [Fri, 27 Apr 2018 01:56:44 +0000 (21:56 -0400)]
net: dsa: mv88e6xxx: move trunk setup

Move the trunking setup out of Global 2 specific setup into the top
level mv88e6xxx_setup function.

Note that the 88E6390 family calls this LAG instead of Trunk and
supports 32 possible ID routing vectors, with LAG ID bit 4 being placed
in Global 2 register 0x1D...

We don't need Trunk (or LAG) IDs for the moment, thus keep it simple.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: phy: Fix modular PHYLIB build
Florian Fainelli [Fri, 27 Apr 2018 19:41:49 +0000 (12:41 -0700)]
net: phy: Fix modular PHYLIB build

After commit c59530d0d5dc ("net: Move PHY statistics code into PHY
library helpers") we made net/core/ethtool.c reference symbols which are
part of the library which can be modular. David introduced a temporary
fix with 1ecd6e8ad996 ("phy: Temporary build fix after phylib changes.")
which would prevent such modularity.

This is not desireable of course, so instead, just inline the functions
into include/linux/phy.h to keep both options available.

Fixes: c59530d0d5dc ("net: Move PHY statistics code into PHY library helpers")
Fixes: 1ecd6e8ad996 ("phy: Temporary build fix after phylib changes.")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoudp: remove stray export symbol
Willem de Bruijn [Fri, 27 Apr 2018 15:12:10 +0000 (11:12 -0400)]
udp: remove stray export symbol

UDP GSO needs to export __udp_gso_segment to call it from ipv6.

I accidentally exported static ipv4 function __udp4_gso_segment.
Remove that EXPORT_SYMBOL_GPL.

Fixes: ee80d1ebe5ba ("udp: add udp gso")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoipv6: sr: Add documentation for seg_flowlabel sysctl
Ahmed Abdelsalam [Fri, 27 Apr 2018 15:51:48 +0000 (17:51 +0200)]
ipv6: sr: Add documentation for seg_flowlabel sysctl

This patch adds a documentation for seg_flowlabel sysctl into
Documentation/networking/ip-sysctl.txt

Signed-off-by: Ahmed Abdelsalam <amsalam20@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agodrivers: net: replace UINT64_MAX with U64_MAX
Jisheng Zhang [Fri, 27 Apr 2018 08:18:58 +0000 (16:18 +0800)]
drivers: net: replace UINT64_MAX with U64_MAX

U64_MAX is well defined now while the UINT64_MAX is not, so we fall
back to drivers' own definition as below:

#ifndef UINT64_MAX
#define UINT64_MAX             (u64)(~((u64)0))
#endif

I believe this is in one phy driver then copied and pasted to other phy
drivers.

Replace the UINT64_MAX with U64_MAX to clean up the source code.

Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoptp_pch: use helpers function for converting between ns and timespec
YueHaibing [Fri, 27 Apr 2018 07:36:18 +0000 (15:36 +0800)]
ptp_pch: use helpers function for converting between ns and timespec

use ns_to_timespec64() and timespec64_to_ns() instead of open coding

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: qrtr: Expose tunneling endpoint to user space
Bjorn Andersson [Fri, 27 Apr 2018 05:28:49 +0000 (22:28 -0700)]
net: qrtr: Expose tunneling endpoint to user space

This implements a misc character device named "qrtr-tun" for the purpose
of allowing user space applications to implement endpoints in the qrtr
network.

This allows more advanced (and dynamic) testing of the qrtr code as well
as opens up the ability of tunneling qrtr over a network or USB link.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'selftests-Add-tests-for-mirroring-to-gretap'
David S. Miller [Fri, 27 Apr 2018 18:57:51 +0000 (14:57 -0400)]
Merge branch 'selftests-Add-tests-for-mirroring-to-gretap'

Petr Machata says:

====================
selftests: Add tests for mirroring to gretap

This suite tests GRE-encapsulated mirroring. The general topology that
most of the tests use is as follows, but each test defines details of
the topology based on its needs, and some tests actually use a somewhat
different topology.

+---------------------+                      +---------------------+
| H1                  |                      |                  H2 |
|     + $h1           |                      |           $h2 +     |
+-----|---------------+                      +---------------|-----+
      |                                                      |
+-----|------------------------------------------------------|-----+
| SW  o---> mirror                                           |     |
| +---|------------------------------------------------------|---+ |
| |   + $swp1               BR                         $swp2 +   | |
| +--------------------------------------------------------------+ |
|                                                                  |
|     + $swp3          + gt6 (ip6gretap)    + gt4 (gretap)         |
+-----|----------------:--------------------:----------------------+
      |                :                    :
+-----|----------------:--------------------:----------------------+
|     + $h3            + h3-gt6(ip6gretap)  + h3-gt4 (gretap)      |
| H3                                                               |
+------------------------------------------------------------------+

The following axes of configuration space are tested:

- ingress and egress mirroring
- mirroring triggered by matchall and flower
- mirroring to ipgretap and ip6gretap
- remote tunnel reachable directly or through a next-hop route
- skip_sw as well as skip_hw configurations

Apart from basic tests with the above mentioned features, the following
tests are included:

- handling of changes to neighbors pertinent to routing decisions in
  mirrored underlay
- handling of configuration changes at the mirrored-to tunnel (endpoint
  addresses, upness)

A suite of mlxsw-specific tests will be part of a separate submission
through linux-mlxsw patch queue.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: forwarding: Test changes in mirror-to-gretap
Petr Machata [Thu, 26 Apr 2018 23:22:25 +0000 (01:22 +0200)]
selftests: forwarding: Test changes in mirror-to-gretap

These tests set up mirroring in a situation that the configuration is
incorrect, i.e. mirrored packets, if any, are not supposed to reach
destination tunnel device. Then the configuration is rectified and
mirroring is checked to have started working.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: forwarding: Test neighbor updates when mirroring to gretap
Petr Machata [Thu, 26 Apr 2018 23:21:23 +0000 (01:21 +0200)]
selftests: forwarding: Test neighbor updates when mirroring to gretap

Test that when a mirror to gretap or ip6gretap netdevice is configured,
changes to neighbors are reflected.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: forwarding: Test flower mirror to gretap
Petr Machata [Thu, 26 Apr 2018 23:20:56 +0000 (01:20 +0200)]
selftests: forwarding: Test flower mirror to gretap

Add a test for mirroring to a gretap and an ip6gretap netdevices such
that the mirroring action is triggered by a flower match.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: forwarding: Test mirror to gretap w/ bound dev
Petr Machata [Thu, 26 Apr 2018 23:19:55 +0000 (01:19 +0200)]
selftests: forwarding: Test mirror to gretap w/ bound dev

Test mirroring to a gretap and an ip6gretap netdevice with a bound
device, where the tunnel device and the bound device are in different
VRFs (an overlay / underlay configuration).

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: forwarding: Test gretap mirror with next-hop remote
Petr Machata [Thu, 26 Apr 2018 23:19:15 +0000 (01:19 +0200)]
selftests: forwarding: Test gretap mirror with next-hop remote

Test mirror to a gretap and an ip6gretap netdevice such that the remote
address of the tunnel is reachable through a next-hop route.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: forwarding: Add test for mirror to gretap
Petr Machata [Thu, 26 Apr 2018 23:18:23 +0000 (01:18 +0200)]
selftests: forwarding: Add test for mirror to gretap

Add a test for basic mirroring to gretap and ip6gretap netdevices.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: forwarding: Add libs for gretap mirror testing
Petr Machata [Thu, 26 Apr 2018 23:17:56 +0000 (01:17 +0200)]
selftests: forwarding: Add libs for gretap mirror testing

To simplify implementation of mirror-to-gretap tests, extend lib.sh with
several new functions that might potentially be useful more
broadly (although right now the mirroring tests will be the only
client).

Also add mirror_lib.sh with code useful for mirroring tests,
mirror_gre_lib.sh with code specifically useful for mirror-to-gretap
tests, and mirror_gre_topo.sh that primes a given test with a good
baseline topology that the test can then tweak to its liking.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'bnxt_en-next'
David S. Miller [Fri, 27 Apr 2018 18:47:32 +0000 (14:47 -0400)]
Merge branch 'bnxt_en-next'

Michael Chan says:

====================
bnxt_en: Net-next updates.

This series has 3 main features.  The first is to add mqprio TC to
hardware queue mapping to avoid reprogramming hardware CoS queue
watermarks during run-time.  The second is DIM improvements from
Andy Gospo.  The third is some improvements to VF resource allocations
when supporting large numbers of VFs with more limited resources.

There are some additional minor improvements and a new function level
discard counter.

v2: Fixed EEPROM typo noted by Andrew Lunn.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobnxt_en: Reserve rings at driver open if none was reserved at probe time.
Michael Chan [Thu, 26 Apr 2018 21:44:44 +0000 (17:44 -0400)]
bnxt_en: Reserve rings at driver open if none was reserved at probe time.

Add logic to reserve default rings at driver open time if none was
reserved during probe time.  This will happen when the PF driver did
not provision minimum rings to the VF, due to more limited resources.

Driver open will only succeed if some minimum rings can be reserved.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobnxt_en: Reserve RSS and L2 contexts for VF.
Michael Chan [Thu, 26 Apr 2018 21:44:43 +0000 (17:44 -0400)]
bnxt_en: Reserve RSS and L2 contexts for VF.

For completeness and correctness, the VF driver needs to reserve these
RSS and L2 contexts.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobnxt_en: Don't reserve rings on VF when min rings were not provisioned by PF.
Michael Chan [Thu, 26 Apr 2018 21:44:42 +0000 (17:44 -0400)]
bnxt_en: Don't reserve rings on VF when min rings were not provisioned by PF.

When rings are more limited and the PF has not provisioned minimum
guaranteed rings to the VF, do not reserve rings during driver probe.
Wait till device open before reserving rings when they will be used.
Device open will succeed if some minimum rings can be successfully
reserved and allocated.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobnxt_en: Reserve rings in bnxt_set_channels() if device is down.
Michael Chan [Thu, 26 Apr 2018 21:44:41 +0000 (17:44 -0400)]
bnxt_en: Reserve rings in bnxt_set_channels() if device is down.

The current code does not reserve rings during ethtool -L when the device
is down.  The rings will be reserved when the device is later opened.

Change it to reserve rings during ethtool -L when the device is down.
This provides a better guarantee that the device open will be successful
when the rings are reserved ahead of time.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobnxt_en: add debugfs support for DIM
Andy Gospodarek [Thu, 26 Apr 2018 21:44:40 +0000 (17:44 -0400)]
bnxt_en: add debugfs support for DIM

This adds debugfs support for bnxt_en with the purpose of allowing users
to examine the current DIM profile in use for each receive queue.  This
was instrumental in debugging issues found with DIM and ensuring that
the profiles we expect to use are the profiles being used.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobnxt_en: reduce timeout on initial HWRM calls
Andy Gospodarek [Thu, 26 Apr 2018 21:44:39 +0000 (17:44 -0400)]
bnxt_en: reduce timeout on initial HWRM calls

Testing with DIM enabled on older kernels indicated that firmware calls
were slower than expected.  More detailed analysis indicated that the
default 25us delay was higher than necessary.  Reducing the time spend in
usleep_range() for the first several calls would reduce the overall
latency of firmware calls on newer Intel processors.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobnxt_en: Increase RING_IDLE minimum threshold to 50
Andy Gospodarek [Thu, 26 Apr 2018 21:44:38 +0000 (17:44 -0400)]
bnxt_en: Increase RING_IDLE minimum threshold to 50

This keeps the RING_IDLE flag set in hardware for higher coalesce
settings by default and improved latency.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobnxt_en: Do not allow VF to read EEPROM.
Michael Chan [Thu, 26 Apr 2018 21:44:37 +0000 (17:44 -0400)]
bnxt_en: Do not allow VF to read EEPROM.

Firmware does not allow the operation and would return failure, causing
a warning in dmesg.  So check for VF and disallow it in the driver.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobnxt_en: Display function level rx/tx_discard_pkts via ethtool
Vasundhara Volam [Thu, 26 Apr 2018 21:44:36 +0000 (17:44 -0400)]
bnxt_en: Display function level rx/tx_discard_pkts via ethtool

Add counters to display sum of rx/tx_discard_pkts of all rings as
function level statistics via ethtool.

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobnxt_en: Simplify ring alloc/free error messages.
Michael Chan [Thu, 26 Apr 2018 21:44:35 +0000 (17:44 -0400)]
bnxt_en: Simplify ring alloc/free error messages.

Replace switch statements printing different messages for every ring type
with a common message.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobnxt_en: Do not set firmware time from VF driver on older firmware.
Michael Chan [Thu, 26 Apr 2018 21:44:34 +0000 (17:44 -0400)]
bnxt_en: Do not set firmware time from VF driver on older firmware.

Older firmware will reject this call and cause an error message to
be printed by the VF driver.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobnxt_en: Check the lengths of encapsulated firmware responses.
Michael Chan [Thu, 26 Apr 2018 21:44:33 +0000 (17:44 -0400)]
bnxt_en: Check the lengths of encapsulated firmware responses.

Firmware messages that are forwarded from PF to VFs are encapsulated.
The size of these encapsulated messages must not exceed the maximum
defined message size.  Add appropriate checks to avoid oversize
messages.  Firmware messages may be expanded in future specs and
this will provide some guardrails to avoid data corruption.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobnxt_en: Remap TC to hardware queues when configuring PFC.
Michael Chan [Thu, 26 Apr 2018 21:44:32 +0000 (17:44 -0400)]
bnxt_en: Remap TC to hardware queues when configuring PFC.

Initially, the MQPRIO TCs are mapped 1:1 directly to the hardware
queues.  Some of these hardware queues are configured to be lossless.
When PFC is enabled on one of more TCs, we now need to remap the
TCs that have PFC enabled to the lossless hardware queues.

After remapping, we need to close and open the NIC for the new
mapping to take effect.  We also need to reprogram all ETS parameters.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobnxt_en: Add TC to hardware QoS queue mapping logic.
Michael Chan [Thu, 26 Apr 2018 21:44:31 +0000 (17:44 -0400)]
bnxt_en: Add TC to hardware QoS queue mapping logic.

The current driver maps MQPRIO traffic classes directly 1:1 to the
internal hardware queues (TC0 maps to hardware queue 0, etc).  This
direct mapping requires the internal hardware queues to be reconfigured
from lossless to lossy and vice versa when necessary.  This
involves reconfiguring internal buffer thresholds which is
disruptive and not always reliable.

Implement a new scheme to map TCs to internal hardware queues by
matching up their PFC requirements.  This will eliminate the need
to reconfigure a hardware queue internal buffers at run time.  After
remapping, the NIC is closed and opened for the new TC to hardware
queues to take effect.

This patch only adds the basic mapping logic.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agohv_netvsc: simplify receive side calling arguments
Stephen Hemminger [Thu, 26 Apr 2018 21:34:25 +0000 (14:34 -0700)]
hv_netvsc: simplify receive side calling arguments

The calls up from the napi poll reading the receive ring had many
places where an argument was being recreated. I.e the caller already
had the value and wasn't passing it, then the callee would use
known relationship to determine the same value. Simpler and faster
to just pass arguments needed.

Also, add const in a couple places where message is being only read.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'sctp-refactor-MTU-handling'
David S. Miller [Fri, 27 Apr 2018 18:35:24 +0000 (14:35 -0400)]
Merge branch 'sctp-refactor-MTU-handling'

Marcelo Ricardo Leitner says:

====================
sctp: refactor MTU handling

Currently MTU handling is spread over SCTP stack. There are multiple
places doing same/similar calculations and updating them is error prone
as one spot can easily be left out.

This patchset converges it into a more concise and consistent code. In
general, it moves MTU handling from functions with bigger objectives,
such as sctp_assoc_add_peer(), to specific functions.

It's also a preparation for the next patchset, which removes the
duplication between sctp_make_op_error_space and
sctp_make_op_error_fixed and relies on sctp_mtu_payload introduced here.

More details on each patch.
====================

Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosctp: allow unsetting sockopt MAXSEG
Marcelo Ricardo Leitner [Thu, 26 Apr 2018 19:59:02 +0000 (16:59 -0300)]
sctp: allow unsetting sockopt MAXSEG

RFC 6458 Section 8.1.16 says that setting MAXSEG as 0 means that the user
is not limiting it, and not that it should set to the *current* maximum,
as we are doing.

This patch thus allow setting it as 0, effectively removing the user
limit.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosctp: consider idata chunks when setting SCTP_MAXSEG
Marcelo Ricardo Leitner [Thu, 26 Apr 2018 19:59:01 +0000 (16:59 -0300)]
sctp: consider idata chunks when setting SCTP_MAXSEG

When setting SCTP_MAXSEG sock option, it should consider which kind of
data chunk is being used if the asoc is already available, so that the
limit better reflect reality.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosctp: honor PMTU_DISABLED when handling icmp
Marcelo Ricardo Leitner [Thu, 26 Apr 2018 19:59:00 +0000 (16:59 -0300)]
sctp: honor PMTU_DISABLED when handling icmp

sctp_sendmsg() could trigger PMTU updates even when PMTU_DISABLED was
set, as pmtu_pending could be set unconditionally during icmp handling
if the socket was in use by the application.

This patch fixes it by checking for PMTU_DISABLED when handling such
deferred updates.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosctp: re-use sctp_transport_pmtu in sctp_transport_route
Marcelo Ricardo Leitner [Thu, 26 Apr 2018 19:58:59 +0000 (16:58 -0300)]
sctp: re-use sctp_transport_pmtu in sctp_transport_route

sctp_transport_route currently is very similar to sctp_transport_pmtu plus
a few other bits.

This patch reuses sctp_transport_pmtu in sctp_transport_route and removes
the duplicated code.

Also, as all calls to sctp_transport_route were forcing the dst release
before calling it, let's just include such release too.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosctp: remove sctp_transport_pmtu_check
Marcelo Ricardo Leitner [Thu, 26 Apr 2018 19:58:58 +0000 (16:58 -0300)]
sctp: remove sctp_transport_pmtu_check

We are now keeping the MTU information synced between asoc, transport
and dst, which makes the check at sctp_packet_config() not needed
anymore. As it was the sole caller to this function, lets remove it.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosctp: introduce sctp_dst_mtu
Marcelo Ricardo Leitner [Thu, 26 Apr 2018 19:58:57 +0000 (16:58 -0300)]
sctp: introduce sctp_dst_mtu

Which makes sure that the MTU respects the minimum value of
SCTP_DEFAULT_MINSEGMENT and that it is correctly aligned.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosctp: remove sctp_assoc_pending_pmtu
Marcelo Ricardo Leitner [Thu, 26 Apr 2018 19:58:56 +0000 (16:58 -0300)]
sctp: remove sctp_assoc_pending_pmtu

No need for this helper.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosctp: introduce sctp_assoc_update_frag_point
Marcelo Ricardo Leitner [Thu, 26 Apr 2018 19:58:55 +0000 (16:58 -0300)]
sctp: introduce sctp_assoc_update_frag_point

and avoid the open-coded versions of it.

Now sctp_datamsg_from_user can just re-use asoc->frag_point as it will
always be updated.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosctp: introduce sctp_mtu_payload
Marcelo Ricardo Leitner [Thu, 26 Apr 2018 19:58:54 +0000 (16:58 -0300)]
sctp: introduce sctp_mtu_payload

When given a MTU, this function calculates how much payload we can carry
on it. Without a MTU, it calculates the amount of header overhead we
have.

So that when we have extra overhead, like the one added for IP options
on SELinux patches, it is easier to handle it.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosctp: introduce sctp_assoc_set_pmtu
Marcelo Ricardo Leitner [Thu, 26 Apr 2018 19:58:53 +0000 (16:58 -0300)]
sctp: introduce sctp_assoc_set_pmtu

All changes to asoc PMTU should now go through this wrapper, making it
easier to track them and to do other actions upon it.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosctp: remove an if() that is always true
Marcelo Ricardo Leitner [Thu, 26 Apr 2018 19:58:52 +0000 (16:58 -0300)]
sctp: remove an if() that is always true

As noticed by Xin Long, the if() here is always true as PMTU can never
be 0.

Reported-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosctp: move transport pathmtu calc away of sctp_assoc_add_peer
Marcelo Ricardo Leitner [Thu, 26 Apr 2018 19:58:51 +0000 (16:58 -0300)]
sctp: move transport pathmtu calc away of sctp_assoc_add_peer

There was only one case that sctp_assoc_add_peer couldn't handle, which
is when SPP_PMTUD_DISABLE is set and pathmtu not initialized.
So add this situation to sctp_transport_route and reuse what was
already in there.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosctp: remove old and unused SCTP_MIN_PMTU
Marcelo Ricardo Leitner [Thu, 26 Apr 2018 19:58:50 +0000 (16:58 -0300)]
sctp: remove old and unused SCTP_MIN_PMTU

This value is not used anywhere in the code. In essence it is a
duplicate of SCTP_DEFAULT_MINSEGMENT, which is used by the stack.

SCTP_MIN_PMTU value makes more sense, but we should not change to it now
as it would risk breaking applications.

So this patch removes SCTP_MIN_PMTU and adjust the comment above it.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests: pmtu: Minimum MTU for vti6 is 68
Stefano Brivio [Thu, 26 Apr 2018 17:41:02 +0000 (19:41 +0200)]
selftests: pmtu: Minimum MTU for vti6 is 68

A vti6 interface can carry IPv4 packets too.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotcp: remove mss check in tcp_select_initial_window()
Wei Wang [Thu, 26 Apr 2018 16:58:10 +0000 (09:58 -0700)]
tcp: remove mss check in tcp_select_initial_window()

In tcp_select_initial_window(), we only set rcv_wnd to
tcp_default_init_rwnd() if current mss > (1 << wscale). Otherwise,
rcv_wnd is kept at the full receive space of the socket which is a
value way larger than tcp_default_init_rwnd().
With larger initial rcv_wnd value, receive buffer autotuning logic
takes longer to kick in and increase the receive buffer.

In a TCP throughput test where receiver has rmem[2] set to 125MB
(wscale is 11), we see the connection gets recvbuf limited at the
beginning of the connection and gets less throughput overall.

Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'smc-next'
David S. Miller [Fri, 27 Apr 2018 18:02:53 +0000 (14:02 -0400)]
Merge branch 'smc-next'

Ursula Braun says:

====================
smc fixes from 2018-04-17 - v3

in the mean time we challenged the benefit of these CLC handshake
optimizations for the sockopts TCP_NODELAY and TCP_CORK.
We decided to give up on them for now, since SMC still works
properly without.
There is now version 3 of the patch series with patches 2-4 implementing
sockopts that require special handling in SMC.

Version 3 changes
   * no deferring of setsockopts TCP_NODELAY and TCP_CORK anymore
   * allow fallback for some sockopts eliminating SMC usage
   * when setting TCP_NODELAY always enforce data transmission
     (not only together with corked data)

Version 2 changes of Patch 2/4 (and 3/4):
   * return error -EOPNOTSUPP for TCP_FASTOPEN sockopts
   * fix a kernel_setsockopt() usage bug by switching parameter
     variable from type "u8" to "int"
   * add return code validation when calling kernel_setsockopt()
   * propagate a setsockopt error on the internal CLC socket
     to the SMC socket.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/smc: handle sockopt TCP_DEFER_ACCEPT
Ursula Braun [Thu, 26 Apr 2018 15:18:23 +0000 (17:18 +0200)]
net/smc: handle sockopt TCP_DEFER_ACCEPT

If sockopt TCP_DEFER_ACCEPT is set, the accept is delayed till
data is available.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/smc: sockopts TCP_NODELAY and TCP_CORK
Ursula Braun [Thu, 26 Apr 2018 15:18:22 +0000 (17:18 +0200)]
net/smc: sockopts TCP_NODELAY and TCP_CORK

Setting sockopt TCP_NODELAY or resetting sockopt TCP_CORK
triggers data transfer.

For a corked SMC socket RDMA writes are deferred, if there is
still sufficient send buffer space available.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/smc: handle sockopts forcing fallback
Ursula Braun [Thu, 26 Apr 2018 15:18:21 +0000 (17:18 +0200)]
net/smc: handle sockopts forcing fallback

Several TCP sockopts do not work for SMC. One example are the
TCP_FASTOPEN sockopts, since SMC-connection setup is based on the TCP
three-way-handshake.
If the SMC socket is still in state SMC_INIT, such sockopts trigger
fallback to TCP. Otherwise an error is returned.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/smc: fix structure size
Karsten Graul [Thu, 26 Apr 2018 15:18:20 +0000 (17:18 +0200)]
net/smc: fix structure size

The struct smc_cdc_msg must be defined as packed so the
size is 44 bytes.
And change the structure size check so sizeof is checked.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: intel: Cleanup the copyright/license headers
Jeff Kirsher [Thu, 26 Apr 2018 15:08:09 +0000 (08:08 -0700)]
net: intel: Cleanup the copyright/license headers

After many years of having a ~30 line copyright and license header to our
source files, we are finally able to reduce that to one line with the
advent of the SPDX identifier.

Also caught a few files missing the SPDX license identifier, so fixed
them up.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Fix coccinelle warning
Kirill Tkhai [Thu, 26 Apr 2018 12:18:38 +0000 (15:18 +0300)]
net: Fix coccinelle warning

kbuild test robot says:

  >coccinelle warnings: (new ones prefixed by >>)
  >>> net/core/dev.c:1588:2-3: Unneeded semicolon

So, let's remove it.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agogeneve: fix build with modular IPV6
Tobias Regnery [Thu, 26 Apr 2018 10:36:36 +0000 (12:36 +0200)]
geneve: fix build with modular IPV6

Commit c40e89fd358e ("geneve: configure MTU based on a lower device") added
an IS_ENABLED(CONFIG_IPV6) to geneve, leading to the following link error
with CONFIG_GENEVE=y and CONFIG_IPV6=m:

drivers/net/geneve.o: In function `geneve_link_config':
geneve.c:(.text+0x14c): undefined reference to `rt6_lookup'

Fix this by adding a Kconfig dependency and forcing GENEVE to be a module
when IPV6 is a module.

Fixes: c40e89fd358e ("geneve: configure MTU based on a lower device")
Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 's390-next'
David S. Miller [Fri, 27 Apr 2018 17:38:50 +0000 (13:38 -0400)]
Merge branch 's390-next'

Julian Wiedmann says:

====================
s390/net: updates 2018-04-26

please apply the following patches to net-next. There's the usual
cleanups & small improvements, and Kittipon adds HW offload support
for IPv6 checksumming.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agos390/qeth: improve fallback to random MAC address
Julian Wiedmann [Thu, 26 Apr 2018 07:42:24 +0000 (09:42 +0200)]
s390/qeth: improve fallback to random MAC address

If READ MAC fails to fetch a valid MAC address, allow some more device
types (IQD and z/VM OSD) to fall back to a random address.
Also use eth_hw_addr_random(), for indicating to userspace that the
address type is NET_ADDR_RANDOM.

Note that while z/VM has various protection schemes to prohibit
custom addresses on its NICs, they are all optional. So we should at
least give it a try.

Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agos390/qeth: add IPv6 RX checksum offload support
Kittipon Meesompop [Thu, 26 Apr 2018 07:42:23 +0000 (09:42 +0200)]
s390/qeth: add IPv6 RX checksum offload support

Check if a qeth device supports IPv6 RX checksum offload, and hook it up
into the existing NETIF_F_RXCSUM support.
As NETIF_F_RXCSUM is now backed by a combination of HW Assists, we need
to be a little smarter when dealing with errors during a configuration
change:
- switching on NETIF_F_RXCSUM only makes sense if at least one HW Assist
  was enabled successfully.
- for switching off NETIF_F_RXCSUM, all available HW Assists need to be
  deactivated.

Signed-off-by: Kittipon Meesompop <kmeesomp@linux.vnet.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agos390/qeth: add IPv6 TX checksum offload support
Kittipon Meesompop [Thu, 26 Apr 2018 07:42:22 +0000 (09:42 +0200)]
s390/qeth: add IPv6 TX checksum offload support

Check if a qeth device supports IPv6 TX checksum offload, and advertise
NETIF_F_IPV6_CSUM accordingly. Add support for setting the relevant bits
in IPv6 packet descriptors.

Currently this has only limited use (ie. UDP, or Jumbo Frames). For any
TCP traffic with a standard MSS, the TCP checksum gets calculated
as part of the linear GSO segmentation.

Signed-off-by: Kittipon Meesompop <kmeesomp@linux.vnet.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>