git.openwrt.org Git - openwrt/staging/blogic.git/log

ipset: Fix memory accounting for hash types on resize

If a fresh array block is allocated during resize, the current in-memory
set size should be increased by the size of the block, not replaced by it.

Before the fix, adding entries to a hash set type, leading to a table
resize, caused an inconsistent memory size to be reported. This becomes
more obvious when swapping sets with similar sizes:

  # cat hash_ip_size.sh
  #!/bin/sh
  FAIL_RETRIES=10

  tries=0
  while [ ${tries} -lt ${FAIL_RETRIES} ]; do
   ipset create t1 hash:ip
   for i in `seq 1 4345`; do
   ipset add t1 1.2.$((i / 255)).$((i % 255))
   done
   t1_init="$(ipset list t1|sed -n 's/Size in memory: $.*$/\1/p')"

   ipset create t2 hash:ip
   for i in `seq 1 4360`; do
   ipset add t2 1.2.$((i / 255)).$((i % 255))
   done
   t2_init="$(ipset list t2|sed -n 's/Size in memory: $.*$/\1/p')"

   ipset swap t1 t2
   t1_swap="$(ipset list t1|sed -n 's/Size in memory: $.*$/\1/p')"
   t2_swap="$(ipset list t2|sed -n 's/Size in memory: $.*$/\1/p')"

   ipset destroy t1
   ipset destroy t2
   tries=$((tries + 1))

   if [ ${t1_init} -lt 10000 ] || [ ${t2_init} -lt 10000 ]; then
   echo "FAIL after ${tries} tries:"
   echo "T1 size ${t1_init}, after swap ${t1_swap}"
   echo "T2 size ${t2_init}, after swap ${t2_swap}"
   exit 1
   fi
  done
  echo "PASS"
  # echo -n 'func hash_ip4_resize +p' > /sys/kernel/debug/dynamic_debug/control
  # ./hash_ip_size.sh
  [ 2035.018673] attempt to resize set t1 from 10 to 11, t 00000000fe6551fa
  [ 2035.078583] set t1 resized from 10 (00000000fe6551fa) to 11 (00000000172a0163)
  [ 2035.080353] Table destroy by resize 00000000fe6551fa
  FAIL after 4 tries:
  T1 size 9064, after swap 71128
  T2 size 71128, after swap 9064

Reported-by: NOYB <JunkYardMail1@Frontier.com>
Fixes: 9e41f26a505c ("netfilter: ipset: Count non-static extension memory for userspace")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>

netfilter: ipset: Fix error path in set_target_v3_checkentry()

Fix error path and release the references properly.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>

netfilter: ipset: Fix the last missing check of nla_parse_deprecated()

In dump_init() the outdated comment was incorrect and we had a missing
validation check of nla_parse_deprecated().

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>

netfilter: ipset: fix a missing check of nla_parse

When nla_parse fails, we should not use the results (the first
argument). The fix checks if it fails, and if so, returns its error code
upstream.

Signed-off-by: Aditya Pakki <pakki001@umn.edu>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>

netfilter: ipset: merge uadd and udel functions

Both functions are using exactly the same code, except the command value
passed to call_ad function.

Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>

netfilter: ipset: remove useless memset() calls

One of the memset call is buggy: it does not erase full array, but only pointer size.
Moreover, after a check, first step of nla_parse_nested/nla_parse is to
erase tb array as well. We can remove both calls safely.

Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>

netfilter: ipv6: Fix undefined symbol nf_ct_frag6_gather

CONFIG_NETFILTER=m and CONFIG_NF_DEFRAG_IPV6 is not set

ERROR: "nf_ct_frag6_gather" [net/ipv6/ipv6.ko] undefined!

Fixes: c9bb6165a16e ("netfilter: nf_conntrack_bridge: fix CONFIG_IPV6=y")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

inet_connection_sock: remove unused parameter of reqsk_queue_unlink func

small cleanup: "struct request_sock_queue *queue" parameter of reqsk_queue_unlink
func is never used in the func, so we can remove it.

Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: remove state PHY_FORCING

In the early days of phylib we had a functionality that changed to the
next lower speed in fixed mode if no link was established after a
certain period of time. This functionality has been removed years ago,
and state PHY_FORCING isn't needed any longer. Instead we can go from
UP to RUNNING or NOLINK directly (same as in autoneg mode).

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: rds: add per rds connection cache statistics

The variable cache_allocs is to indicate how many frags (KiB) are in one
rds connection frag cache.
The command "rds-info -Iv" will output the rds connection cache
statistics as below:
"
RDS IB Connections:
      LocalAddr RemoteAddr Tos SL  LocalDev            RemoteDev
      1.1.1.14 1.1.1.14   58 255  fe80::2:c903:a:7a31 fe80::2:c903:a:7a31
      send_wr=256, recv_wr=1024, send_sge=8, rdma_mr_max=4096,
      rdma_mr_size=257, cache_allocs=12
"
This means that there are about 12KiB frag in this rds connection frag
cache.
Since rds.h in rds-tools is not related with the kernel rds.h, the change
in kernel rds.h does not affect rds-tools.
rds-info in rds-tools 2.0.5 and 2.0.6 is tested with this commit. It works
well.

Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'dwmac-mediatek'

Biao Huang says:

====================
complete dwmac-mediatek driver and fix flow control issue

Changes in v2:
        patch#1: there is no extra action in mediatek_dwmac_remove, remove it

v1:
This series mainly complete dwmac-mediatek driver:
        1. add power on/off operations for dwmac-mediatek.
        2. disable rx watchdog to reduce rx path reponding time.
        3. change the default value of tx-frames from 25 to 1, so
           ptp4l will test pass by default.

and also fix the issue that flow control won't be disabled any more
once being enabled.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: dwmac4: fix flow control issue

Current dwmac4_flow_ctrl will not clear
GMAC_RX_FLOW_CTRL_RFE/GMAC_RX_FLOW_CTRL_RFE bits,
so MAC hw will keep flow control on although expecting
flow control off by ethtool. Add codes to fix it.

Fixes: 477286b53f55 ("stmmac: add GMAC4 core support")
Signed-off-by: Biao Huang <biao.huang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: modify default value of tx-frames

the default value of tx-frames is 25, it's too late when
passing tstamp to stack, then the ptp4l will fail:

ptp4l -i eth0 -f gPTP.cfg -m
ptp4l: selected /dev/ptp0 as PTP clock
ptp4l: port 1: INITIALIZING to LISTENING on INITIALIZE
ptp4l: port 0: INITIALIZING to LISTENING on INITIALIZE
ptp4l: port 1: link up
ptp4l: timed out while polling for tx timestamp
ptp4l: increasing tx_timestamp_timeout may correct this issue,
but it is likely caused by a driver bug
ptp4l: port 1: send peer delay response failed
ptp4l: port 1: LISTENING to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)

ptp4l tests pass when changing the tx-frames from 25 to 1 with
ethtool -C option.
It should be fine to set tx-frames default value to 1, so ptp4l will pass
by default.

Signed-off-by: Biao Huang <biao.huang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: dwmac-mediatek: disable rx watchdog

disable rx watchdog for dwmac-mediatek, then the hw will
issue a rx interrupt once receiving a packet, so the responding time
for rx path will be reduced.

Signed-off-by: Biao Huang <biao.huang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: dwmac-mediatek: enable Ethernet power domain

add Ethernet power on/off operations in init/exit flow.

Signed-off-by: Biao Huang <biao.huang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers: net: vxlan: drop unneeded likely() call around IS_ERR()

IS_ERR() already calls unlikely(), so this extra likely() call
around the !IS_ERR() is not needed.

Signed-off-by: Enrico Weigelt <info@metux.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipv6: drop unneeded likely() call around IS_ERR()

IS_ERR() already calls unlikely(), so this extra unlikely() call
around IS_ERR() is not needed.

Signed-off-by: Enrico Weigelt <info@metux.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipv4: drop unneeded likely() call around IS_ERR()

IS_ERR() already calls unlikely(), so this extra unlikely() call
around IS_ERR() is not needed.

Signed-off-by: Enrico Weigelt <info@metux.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: openvswitch: drop unneeded likely() call around IS_ERR()

IS_ERR() already calls unlikely(), so this extra likely() call
around the !IS_ERR() is not needed.

Signed-off-by: Enrico Weigelt <info@metux.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: socket: drop unneeded likely() call around IS_ERR()

IS_ERR() already calls unlikely(), so this extra likely() call
around the !IS_ERR() is not needed.

Signed-off-by: Enrico Weigelt <info@metux.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: flower: use struct_size() helper

One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:

struct nfp_tun_active_tuns {
...
        struct route_ip_info {
                __be32 ipv4;
                __be32 egress_port;
                __be32 extra[2];
        } tun_info[];
};

Make use of the struct_size() helper instead of an open-coded version
in order to avoid any potential type mistakes.

So, replace the following form:

sizeof(struct nfp_tun_active_tuns) + sizeof(struct route_ip_info) * count

with:

struct_size(payload, tun_info, count)

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i40e: Check and set the PF driver state first in i40e_ndo_set_vf_mac

The PF driver state flag __I40E_VIRTCHNL_OP_PENDING needs to be
checked and set at the beginning of i40e_ndo_set_vf_mac. Otherwise,
if there are error conditions before it, the flag will be cleared
unexpectedly by this function to cause potential race conditions.
Hence move the check to the top of this function.

Signed-off-by: Lihong Yang <lihong.yang@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i40e: Do not check VF state in i40e_ndo_get_vf_config

The VF configuration returned in i40e_ndo_get_vf_config is
already stored by the PF. There is no dependency on any
specific state of the VF to return the configuration.
Drop the check against I40E_VF_STATE_INIT since it is not
needed.

Signed-off-by: Lihong Yang <lihong.yang@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '10GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
10GbE Intel Wired LAN Driver Updates 2019-06-05

This series contains updates to mainly ixgbe, with a few updates to
i40e, net, ice and hns2 driver.

Jan adds support for tracking each queue pair for whether or not AF_XDP
zero copy is enabled.  Also updated the ixgbe driver to use the
netdev-provided umems so that we do not need to contain these structures
in our own adapter structure.

William Tu provides two fixes for AF_XDP statistics which were causing
incorrect counts.

Jake reduces the PTP transmit timestamp timeout from 15 seconds to 1 second,
which is still well after the maximum expected delay.  Also fixes an
issues with the PTP SDP pin setup which was not properly aligning on a
full second, so updated the code to account for the cyclecounter
multiplier and simplify the code to make the intent of the calculations
more clear.  Updated the function header comments to help with the code
documentation.  Added support for SDP/PPS output for x550 devices, which
is slightly different than x540 devices that currently have this
support.

Anirudh adds a new define for Link Layer Discovery Protocol to the
networking core, so that drivers do not have to create and use their own
definitions.  In addition, update all the drivers currently defining
their own LLDP define to use the new networking core define.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: ixgbevf: fix a missing check of ixgbevf_write_msg_read_ack

If ixgbevf_write_msg_read_ack fails, return its error code upstream

Signed-off-by: Kangjie Lu <kjlu@umn.edu>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: implement support for SDP/PPS output on X550 hardware

Similar to the X540 hardware, enable support for generating a 1pps
output signal on SDP0.

This support is slightly different to the X540 hardware, because of the
register layout changes. First, the system time register is now
represented in 'cycles' and 'billions of cycles'. Second, we need to
also program the TSSDP register, as well as the ESDP register. Third,
the clock output uses only FREQOUT, instead of a full 64bit value for
the output clock period. Finally, we have to use the ST0 bit instead of
the SYNCLK bit in the TSAUXC register.

This support should work even for the hardware with a higher frequency
clock, as it carefully takes into account the multiply and shift of the
cycle counter used.

We also set the pps configuration to 1, since we now support generating
a pulse per second output.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

net: hns3: Use LLDP ethertype define ETH_P_LLDP

Remove references to HCLGE_MAC_ETHERTYPE_LLDP and use ETH_P_LLDP instead.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ice: Use LLDP ethertype define ETH_P_LLDP

Instead of using a local define for the LLDP ethertype, use the kernel
define ETH_P_LLDP.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: Use LLDP ethertype define ETH_P_LLDP

Remove references to IXGBE_ETH_P_LLD and use ETH_P_LLDP instead.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Use LLDP ethertype define ETH_P_LLDP

Remove references to I40E_ETH_P_LLDP and use ETH_P_LLDP instead.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

net: Add a define for LLDP ethertype

Add a new define ETH_P_LLDP for Link Layer Discovery Protocol (LLDP)
ethertype.

Suggested-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: add a kernel documentation comment for ixgbe_ptp_get_ts_config

This function was missing a documentation comment. Add one now.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: use 'cc' instead of 'hw_cc' for local variable

The ixgbe_ptp.c file sometimes uses hw_cc as the local variable for the
cycle counter in ixgbe_ptp_read_X550. However, we use just 'cc' as
a local variable for this by convention else where in the file.

Convert this lone usage of 'hw_cc' into just the shorter 'cc' name to
match the other read functions in the file.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: fix PTP SDP pin setup on X540 hardware

The function ixgbe_ptp_setup_sdp_X540 attempts to program a software
defined pin, in order to generate a pulse-per-second output on SDP 0.

It does work to generate the output, but does not align the output on
the full second. Additionally, it does not take into account the
cyclecounter multiplier. This leads to somewhat confusing code which is
likely to be incorrect if blindly copied to another hardware type.

Update this code to account for the cyclecounter multiplier, and to
directly use timecounter_read.

This change ensures that the SDP output will align properly on a full
second, and makes the intent of the calculations a bit more clear.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: reduce PTP Tx timestamp timeout to 1 second

Previously we waited for a whole 15 seconds before we cleared the Tx
timestamp state. This is astronomically long compared to the worst case
timings expected by our devices. In addition, this is longer than the
wait in ptp4l when it detects a fault (caused by missing Tx timestamps).
Thus, reduce the timer to only 1 second, which is well after the maximum
expected delay. This should reduce user frustration when a timestamp
does get dropped for some reason.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: fix AF_XDP tx packet count

The total_packets count at ixgbe_clean_xdp_tx_irq is
always zero when testing with xdpsock -t -N. Set the gso_segs
to 1 to make the tx packet count correct.

Signed-off-by: William Tu <u9012063@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: fix AF_XDP tx byte count

The tx bytecount is done twice. When running
'./xdpsock -t -N -i eth3' and 'ip -s link show dev eth3'
The avg packet size is 120 instead of 60. So remove the
extra one.

Signed-off-by: William Tu <u9012063@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: remove umem from adapter

As current implementation of netdev already contains and provides
umems for us, we no longer have the need to contain these
structures in ixgbe_adapter.

Refactor the code to operate on netdev-provided umems.

Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: add tracking of AF_XDP zero-copy state for each queue pair

Here, we add a bitmap to the ixgbe_adapter that tracks if a
certain queue pair has been "zero-copy enabled" via the ndo_bpf.
The bitmap is used in ixgbe_xsk_umem, and enables zero-copy if
and only if XDP is enabled, the corresponding qid in the bitmap
is set, and the umem is non-NULL;

Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

net: fec_ptp: Use dev_err() instead of pr_err()

dev_err() is more appropriate for printing error messages inside
drivers, so switch to dev_err().

Signed-off-by: Fabio Estevam <festevam@gmail.com>
Acked-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'r8169-factor-out-firmware-handling'

Heiner Kallweit says:

====================
r8169: factor out firmware handling

Let's factor out firmware handling into a separate source code file.
This simplifies reading the code and makes clearer what the interface
between driver and firmware handling is.

v2:
- fix small whitespace issue in patch 2
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: factor out firmware handling

Let's factor out firmware handling into a separate source code file.
This simplifies reading the code and makes clearer what the interface
between driver and firmware handling is.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: rename r8169.c to r8169_main.c

In preparation of factoring out firmware handling rename r8169.c to
r8169_main.c.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: mediatek: fix mtk_eth_soc build errors & warnings

Fix build errors in Mediatek mtk_eth_soc driver.

It looks like these 3 source files were meant to be linked together
since 2 of them are library-like functions,
but they are currently being built as 3 loadable modules.

Fixes these build errors:

  WARNING: modpost: missing MODULE_LICENSE() in drivers/net/ethernet/mediatek/mtk_eth_path.o
  WARNING: modpost: missing MODULE_LICENSE() in drivers/net/ethernet/mediatek/mtk_sgmii.o
  ERROR: "mtk_sgmii_init" [drivers/net/ethernet/mediatek/mtk_eth_soc.ko] undefined!
  ERROR: "mtk_setup_hw_path" [drivers/net/ethernet/mediatek/mtk_eth_soc.ko] undefined!
  ERROR: "mtk_sgmii_setup_mode_force" [drivers/net/ethernet/mediatek/mtk_eth_soc.ko] undefined!
  ERROR: "mtk_sgmii_setup_mode_an" [drivers/net/ethernet/mediatek/mtk_eth_soc.ko] undefined!
  ERROR: "mtk_w32" [drivers/net/ethernet/mediatek/mtk_eth_path.ko] undefined!
  ERROR: "mtk_r32" [drivers/net/ethernet/mediatek/mtk_eth_path.ko] undefined!

This changes the loadable module name from mtk_eth_soc to mtk_eth.
I didn't see a way to leave it as mtk_eth_soc.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Sean Wang <sean.wang@mediatek.com>
Cc: John Crispin <blogic@openwrt.org>
Cc: Felix Fietkau <nbd@openwrt.org>
Cc: Nelson Chang <nelson.chang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-dsa-mv88e6xxx-support-for-mv88e6250'

Rasmus Villemoes says:

====================
net: dsa: mv88e6xxx: support for mv88e6250

This adds support for the mv88e6250 chip. Initially based on the
mv88e6240, this time around, I've been through each ->ops callback and
checked that it makes sense, either replacing with a 6250 specific
variant or dropping it if no equivalent functionality seems to exist
for the 6250. Along the way, I found a few oddities in the existing
code, mostly sent as separate patches/questions.

The one relevant to the 6250 is the ieee_pri_map callback, where the
existing mv88e6085_g1_ieee_pri_map() is actually wrong for many of the
existing users. I've put the mv88e6250_g1_ieee_pri_map() patch first
in case some of the existing chips get switched over to use that and
it is deemed important enough for -stable.

v4:
- fix style issue in 1/10
- add Andrew's reviewed-by to 1,6,7,8,9,10.

v3:
- rebase on top of net-next/master
- add reviewed-bys to patches unchanged from v2 (2,3,4,5)
- add 6250-specific ->ieee_pri_map, ->port_set_speed, ->port_link_state (1,6,7)
- in addition, use mv88e6065_phylink_validate for ->phylink_validate,
and don't implement ->port_get_cmode, ->port_set_jumbo_size,
->port_disable_learn_limit, ->rmu_disable
- drop ptp support
- add patch adding the compatible string to the DT binding (9)
- add small refactoring patch (10)

v2:
- rebase on top of net-next/master
- add reviewed-by to two patches unchanged from v1 (2,3)
- add separate watchdog_ops
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: refactor mv88e6352_g1_reset

The new mv88e6250_g1_reset() is identical to mv88e6352_g1_reset() except
for the call of mv88e6352_g1_wait_ppu_polling(), so refactor the 6352
version in term of the 6250 one. No functional change.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: net: dsa: marvell: add "marvell,mv88e6250" compatible string

The mv88e6250 has port_base_addr 0x8 or 0x18 (depending on
configuration pins), so it constitutes a new family and hence needs
its own compatible string.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: add support for mv88e6250

This adds support for the Marvell 88E6250. I've checked that each
member in the ops-structure makes sense, and basic switchdev
functionality works fine.

It uses the new dual_chip option, and since its port registers start
at SMI address 0x08 or 0x18 (i.e., always sw_addr + 0x08), we need to
introduce a new compatible string in order for the auto-identification
in mv88e6xxx_detect() to work.

The chip has four per port 16-bits statistics registers, two of which
correspond to the existing "sw_in_filtered" and "sw_out_filtered" (but
at offsets 0x13 and 0x10 rather than 0x12 and 0x13, because why should
this be easy...). Wiring up those four statistics seems to require
introducing a STATS_TYPE_PORT_6250 bit or similar, which seems a tad
ugly, so for now this just allows access to the STATS_TYPE_BANK0 ones.

The chip does have ptp support, and the existing
mv88e6352_{gpio,avb,ptp}_ops at first glance seem like they would work
out-of-the-box, but for simplicity (and lack of testing) I'm eliding
this.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: implement port_link_state for mv88e6250

The mv88e6250 has a rather different way of reporting the link, speed
and duplex status. A simple difference is that the link bit is bit 12
rather than bit 11 of the port status register.

It gets more complicated for speed and duplex, which do not have
separate fields. Instead, there's a four-bit PortMode field, and
decoding that depends on whether it's a phy or mii port. For the phy
ports, only four of the 16 values have defined meaning; the rest are
called "reserved", so returning {SPEED,DUPLEX}_UNKNOWN seems
reasonable.

For the mii ports, most possible values are documented (0x3 and 0x5
are reserved), but I'm unable to make sense of them all. Since the
bits simply reflect the Px_MODE[3:0] configuration pins, just support
the subset that I'm certain about. Support for other setups can be
added later.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: implement port_set_speed for mv88e6250

The data sheet also mentions the possibility of selecting 200 Mbps for
the MII ports (ports 5 and 6) by setting the ForceSpd field to
0x2 (aka MV88E6065_PORT_MAC_CTL_SPEED_200). However, there's a note
that "actual speed is determined by bit 8 above", and flipping back a
page, one finds that bits 13:8 are reserved...

So without further information on what bit 8 means, let's stick to
supporting just 10 and 100 Mbps on all ports.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: implement watchdog_ops for mv88e6250

The MV88E6352_G2_WDOG_CTL_* bits almost, but not quite, describe the
watchdog control register on the mv88e6250. Among those actually
referenced in the code, only QC_ENABLE differs (bit 6 rather than bit
5).

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: implement vtu_getnext and vtu_loadpurge for mv88e6250

These are almost identical to the 6185 variants, but have fewer bits
for the FID.

Bit 10 of the VTU_OP register (offset 0x05) is the VidPolicy bit,
which one should probably preserve in mv88e6xxx_g1_vtu_op(), instead
of always writing a 0. However, on the 6352 family, that bit is
located at bit 12 in the VTU FID register (offset 0x02), and is always
unconditionally cleared by the mv88e6xxx_g1_vtu_fid_write()
function.

Since nothing in the existing driver seems to know or care about that
bit, it seems reasonable to not add the boilerplate to preserve it for
the 6250 (which would require adding a chip-specific vtu_op function,
or adding chip-quirks to the existing one).

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: prepare mv88e6xxx_g1_atu_op() for the mv88e6250

All the currently supported chips have .num_databases either 256 or
4096, so this patch does not change behaviour for any of those. The
mv88e6250, however, has .num_databases == 64, and it does not put the
upper two bits in ATU control 13:12, but rather in ATU Operation
9:8. So change the logic to prepare for supporting mv88e6250.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: introduce support for two chips using direct smi addressing

The 88e6250 (as well as 6220, 6071, 6070, 6020) do not support
multi-chip (indirect) addressing. However, one can still have two of
them on the same mdio bus, since the device only uses 16 of the 32
possible addresses, either addresses 0x00-0x0F or 0x10-0x1F depending
on the ADDR4 pin at reset [since ADDR4 is internally pulled high, the
latter is the default].

In order to prepare for supporting the 88e6250 and friends, introduce
mv88e6xxx_info::dual_chip to allow having a non-zero sw_addr while
still using direct addressing.

Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: add mv88e6250_g1_ieee_pri_map

Quite a few of the existing supported chips that use
mv88e6085_g1_ieee_pri_map as ->ieee_pri_map (including, incidentally,
mv88e6085 itself) actually have a reset value of 0xfa50 in the
G1_IEEE_PRI register.

The data sheet for the mv88e6095, however, does describe a reset value
of 0xfa41.

So rather than changing the value in the existing callback, introduce
a new variant with the 0xfa50 value. That will be used by the upcoming
mv88e6250, and existing chips can be switched over one by one,
preferably double-checking both the data sheet and actual hardware in
each case - if anybody actually feels this is important enough to
care.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>

vmxnet3: turn off lro when rxcsum is disabled

Currently, when rx csum is disabled, vmxnet3 driver does not turn
off lro, which can cause performance issues if user does not turn off
lro explicitly. This patch adds fix_features support which is used to
turn off LRO whenever RXCSUM is disabled.

Signed-off-by: Ronak Doshi <doshir@vmware.com>
Acked-by: Rishi Mehta <rmehta@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-add-struct-nexthop-to-fib-info'

David Ahern says:

====================
net: add struct nexthop to fib{6}_info

Set 10 of 11 to improve route scalability via support for nexthops as
standalone objects for fib entries.
https://lwn.net/Articles/763950/

This sets adds 'struct nexthop' to fib_info and fib6_info. IPv4
already handles multiple fib_nh entries in a single fib_info, so
the conversion to use a nexthop struct is fairly mechanical. IPv6
using a nexthop struct with a fib6_info impacts a lot of core logic
which is built around the assumption of a single, builtin fib6_nh
per fib6_info. To make this easier to review, this set adds
nexthop to fib6_info and adds checks in most places fib6_info is
used. The next set finishes the IPv6 conversion, walking through
the places that need to consider all fib6_nh within a nexthop struct.

Offload drivers - mlx5, mlxsw and rocker - are changed to fail FIB
entries using nexthop objects. That limitation can be removed once
the drivers are updated to properly support separate nexthops.

This set starts by adding accessors for fib_nh and fib_nhs in a
fib_info. This makes it easier to extract the number of nexthops
in the fib entry and a specific fib_nh once the entry references
a struct nexthop. Patch 2 converts more of IPv4 code to use
fib_nh_common allowing a struct nexthop to use a fib6_nh with an
IPv4 entry.

Patches 3 and 4 add 'struct nexthop' to fib{6}_info and update
references to both take a different path when it is set. New
exported functions are added to the nexthop code to validate a
nexthop struct when configured for use with a fib entry. IPv4
is allowed to use a nexthop with either v4 or v6 entries. IPv6
is limited to v6 entries only. In both cases list_heads track
the fib entries using a nexthop struct for fast correlation on
events (e.g., device events or nexthop events like delete or
replace).

The last 3 patches add hooks to drivers listening for FIB
notificationas. All 3 of them reject the routes as unsupported,
returning an error message to the user via extack. For mlxsw
at least this is a stop gap measure until the driver is updated for
proper support.

Functional tests for nexthops have already been committed. Those tests
will be active after the next patch set which makes the code paths
created by this set and the next one live.

Existing code paths moved to the else branch of 'if (f{6}i->nh)' checks
are covered by existing tests under selftests/net.

v3
- remove ip6_create_rt_rcu from ip6_pol_route in patch 4 and use pcpu
routes for REJECT routes with the blackhole nexthop (request from Wei)

v2
- no code changes from v1
- commit messages for first 4 patches updated
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

rocker: Fail attempts to use routes with nexthop objects

Fail attempts to use nexthop objects with routes until support can be
properly added.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlx5: Fail attempts to use routes with nexthop objects

Fail attempts to use nexthop objects with routes until support can be
properly added.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: Fail attempts to use routes with nexthop objects

Fail attempts to use nexthop objects with routes until support can be
properly added.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Plumb support for nexthop object in a fib6_info

Add struct nexthop and nh_list list_head to fib6_info. nh_list is the
fib6_info side of the nexthop <-> fib_info relationship. Since a fib6_info
referencing a nexthop object can not have 'sibling' entries (the old way
of doing multipath routes), the nh_list is a union with fib6_siblings.

Add f6i_list list_head to 'struct nexthop' to track fib6_info entries
using a nexthop instance. Update __remove_nexthop_fib to walk f6_list
and delete fib entries using the nexthop.

Add a few nexthop helpers for use when a nexthop is added to fib6_info:
- nexthop_fib6_nh - return first fib6_nh in a nexthop object
- fib6_info_nh_dev moved to nexthop.h and updated to use nexthop_fib6_nh
  if the fib6_info references a nexthop object
- nexthop_path_fib6_result - similar to ipv4, select a path within a
  multipath nexthop object. If the nexthop is a blackhole, set
  fib6_result type to RTN_BLACKHOLE, and set the REJECT flag

Update the fib6_info references to check for nh and take a different path
as needed:
- rt6_qualify_for_ecmp - if a fib entry uses a nexthop object it can NOT
  be coalesced with other fib entries into a multipath route
- rt6_duplicate_nexthop - use nexthop_cmp if either fib6_info references
  a nexthop
- addrconf (host routes), RA's and info entries (anything configured via
  ndisc) does not use nexthop objects
- fib6_info_destroy_rcu - put reference to nexthop object
- fib6_purge_rt - drop fib6_info from f6i_list
- fib6_select_path - update to use the new nexthop_path_fib6_result when
  fib entry uses a nexthop object
- rt6_device_match - update to catch use of nexthop object as a blackhole
  and set fib6_type and flags.
- ip6_route_info_create - don't add space for fib6_nh if fib entry is
  going to reference a nexthop object, take a reference to nexthop object,
  disallow use of source routing
- rt6_nlmsg_size - add space for RTA_NH_ID
- add rt6_fill_node_nexthop to add nexthop data on a dump

As with ipv4, most of the changes push existing code into the else branch
of whether the fib entry uses a nexthop object.

Update the nexthop code to walk f6i_list on a nexthop deleted to remove
fib entries referencing it.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Plumb support for nexthop object in a fib_info

Add 'struct nexthop' and nh_list list_head to fib_info. nh_list is the
fib_info side of the nexthop <-> fib_info relationship.

Add fi_list list_head to 'struct nexthop' to track fib_info entries
using a nexthop instance. Add __remove_nexthop_fib and add it to
__remove_nexthop to walk the new list_head and mark those fib entries
as dead when the nexthop is deleted.

Add a few nexthop helpers for use when a nexthop is added to fib_info:
- nexthop_cmp to determine if 2 nexthops are the same
- nexthop_path_fib_result to select a path for a multipath
  'struct nexthop'
- nexthop_fib_nhc to select a specific fib_nh_common within a
  multipath 'struct nexthop'

Update existing fib_info_nhc to use nexthop_fib_nhc if a fib_info uses
a 'struct nexthop', and mark fib_info_nh as only used for the non-nexthop
case.

Update the fib_info functions to check for fi->nh and take a different
path as needed:
- free_fib_info_rcu - put the nexthop object reference
- fib_release_info - remove the fib_info from the nexthop's fi_list
- nh_comp - use nexthop_cmp when either fib_info references a nexthop
  object
- fib_info_hashfn - use the nexthop id for the hashing vs the oif of
  each fib_nh in a fib_info
- fib_nlmsg_size - add space for the RTA_NH_ID attribute
- fib_create_info - verify nexthop reference can be taken, verify
  nexthop spec is valid for fib entry, and add fib_info to fi_list for
  a nexthop
- fib_select_multipath - use the new nexthop_path_fib_result to select a
  path when nexthop objects are used
- fib_table_lookup - if the 'struct nexthop' is a blackhole nexthop, treat
  it the same as a fib entry using 'blackhole'

The bulk of the changes are in fib_semantics.c and most of that is
moving the existing change_nexthops into an else branch.

Update the nexthop code to walk fi_list on a nexthop deleted to remove
fib entries referencing it.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Prepare for fib6_nh from a nexthop object

Convert more IPv4 code to use fib_nh_common over fib_nh to enable routes
to use a fib6_nh based nexthop. In the end, only code not using a
nexthop object in a fib_info should directly access fib_nh in a fib_info
without checking the famiy and going through fib_nh_common. Those
functions will be marked when it is not directly evident.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Use accessors for fib_info nexthop data

Use helpers to access fib_nh and fib_nhs fields of a fib_info. Drop the
fib_dev macro which is an alias for the first nexthop. Replacements:

  fi->fib_dev    --> fib_info_nh(fi, 0)->fib_nh_dev
  fi->fib_nh     --> fib_info_nh(fi, 0)
  fi->fib_nh[i]  --> fib_info_nh(fi, i)
  fi->fib_nhs    --> fib_info_num_path(fi)

where fib_info_nh(fi, i) returns fi->fib_nh[nhsel] and fib_info_num_path
returns fi->fib_nhs.

Move the existing fib_info_nhc to nexthop.h and define the new ones
there. A later patch adds a check if a fib_info uses a nexthop object,
and defining the helpers in nexthop.h avoid circular header
dependencies.

After this all remaining open coded references to fi->fib_nhs and
fi->fib_nh are in:
- fib_create_info and helpers used to lookup an existing fib_info
  entry, and
- the netdev event functions fib_sync_down_dev and fib_sync_up.

The latter two will not be reused for nexthops, and the fib_create_info
will be updated to handle a nexthop in a fib_info.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Always allocate pcpu memory in a fib6_nh

A recent commit had an unintended side effect with reject routes:
rt6i_pcpu is expected to always be initialized for all fib6_info except
the null entry. The commit mentioned below skips it for reject routes
and ends up leaking references to the loopback device. For example,

    ip netns add foo
    ip -netns foo li set lo up
    ip -netns foo -6 ro add blackhole 2001:db8:1::1
    ip netns exec foo ping6 2001:db8:1::1
    ip netns del foo

ends up spewing:
    unregister_netdevice: waiting for lo to become free. Usage count = 3

The fib_nh_common_init is not needed for reject routes (no ipv4 caching
or encaps), so move the alloc_percpu_gfp after it and adjust the goto label.

Fixes: f40b6ae2b612 ("ipv6: Move pcpu cached routes to fib6_nh")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

hinic: add LRO support

This patch adds LRO support for the HiNIC driver.

Reported-by: kbuild test robot <lkp@intel.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Xue Chaojing <xuechaojing@huawei.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bond-mpls'

Ariel Levkovich says:

====================
Support MPLS features in bonding and vlan net devices

Netdevice HW MPLS features are not passed from device driver's netdevice to
upper netdevice, specifically VLAN and bonding netdevice which are created
by the kernel when needed.

This prevents enablement and usage of HW offloads, such as TSO and checksumming
for MPLS tagged traffic when running via VLAN or bonding interface.

The patches introduce changes to the initialization steps of the VLAN and bonding
netdevices to inherit the MPLS features from lower netdevices to allow the HW
offloads.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: vlan: Inherit MPLS features from parent device

During the creation of the VLAN interface net device,
the various device features and offloads are being set based
on the parent device's features.
The code initiates the basic, vlan and encapsulation features
but doesn't address the MPLS features set and they remain blank.
As a result, all device offloads that have significant performance
effect are disabled for MPLS traffic going via this VLAN device such
as checksumming and TSO.

This patch makes sure that MPLS features are also set for the
VLAN device based on the parent which will allow HW offloads of
checksumming and TSO to be performed on MPLS tagged packets.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bonding: Inherit MPLS features from slave devices

When setting the bonding interface net device features,
the kernel code doesn't address the slaves' MPLS features
and doesn't inherit them.

Therefore, HW offloads that enhance performance such as
checksumming and TSO are disabled for MPLS tagged traffic
flowing via the bonding interface.

The patch add the inheritance of the MPLS features from the
slave devices with a similar logic to setting the bonding device's
VLAN and encapsulation features.

CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-tls-small-general-improvements'

Jakub Kicinski says:

====================
net/tls: small general improvements

This series cleans up and improves the tls code, mostly the offload
parts.

First a slight performance optimization - avoiding unnecessary re-
-encryption of records in patch 1.  Next patch 2 makes the code
more resilient by checking for errors in skb_copy_bits().  Next
commit removes a warning which can be triggered in normal operation,
(especially for devices explicitly making use of the fallback path).
Next two paths change the condition checking around the call to
tls_device_decrypted() to make it easier to extend.  Remaining
commits are centered around reorganizing struct tls_context for
better cache utilization.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/tls: don't pass version to tls_advance_record_sn()

All callers pass prot->version as the last parameter
of tls_advance_record_sn(), yet tls_advance_record_sn()
itself needs a pointer to prot. Pass prot from callers.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/tls: reorganize struct tls_context

struct tls_context is slightly badly laid out. If we reorder things
right we can save 16 bytes (320 -> 304) but also make all fast path
data fit into two cache lines (one read only and one read/write,
down from four cache lines).

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/tls: use version from prot

ctx->prot holds the same information as per-direction contexts.
Almost all code gets TLS version from this structure, convert
the last two stragglers, this way we can improve the cache
utilization by moving the per-direction data into cold cache lines.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/tls: don't re-check msg decrypted status in tls_device_decrypted()

tls_device_decrypted() is only called from decrypt_skb_update(),
when ctx->decrypted == false, there is no need to re-check the bit.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/tls: don't look for decrypted frames on non-offloaded sockets

If the RX config of a TLS socket is SW, there is no point iterating
over the fragments and checking if frame is decrypted.  It will
always be fully encrypted.  Note that in fully encrypted case
the function doesn't actually touch any offload-related state,
so it's safe to call for TLS_SW, today.  Soon we will introduce
code which can only be called for offloaded contexts.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/tls: remove false positive warning

It's possible that TCP stack will decide to retransmit a packet
right when that packet's data gets acked, especially in presence
of packet reordering. This means that packets may be in flight,
even though tls_device code has already freed their record state.
Make fill_sg_in() and in turn tls_sw_fallback() not generate a
warning in that case, and quietly proceed to drop such frames.

Make the exit path from tls_sw_fallback() drop monitor friendly,
for users to be able to troubleshoot dropped retransmissions.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/tls: check return values from skb_copy_bits() and skb_store_bits()

In light of recent bugs, we should make a better effort of
checking return values. In theory none of the functions should
fail today.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/tls: fully initialize the msg wrapper skb

If strparser gets cornered into starting a new message from
an sk_buff which already has frags, it will allocate a new
skb to become the "wrapper" around the fragments of the
message.

This new skb does not inherit any metadata fields. In case
of TLS offload this may lead to unnecessarily re-encrypting
the message, as skb->decrypted is not set for the wrapper skb.

Try to be conservative and copy all fields of old skb
strparser's user may reasonably need.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mscc: ocelot: Fix some struct initializations

Clang warns:

drivers/net/ethernet/mscc/ocelot_ace.c:335:37: warning: suggest braces
around initialization of subobject [-Wmissing-braces]
        struct ocelot_vcap_u64 payload = { 0 };
                                           ^
                                           {}
drivers/net/ethernet/mscc/ocelot_ace.c:336:28: warning: suggest braces
around initialization of subobject [-Wmissing-braces]
        struct vcap_data data = { 0 };
                                  ^
                                  {}
drivers/net/ethernet/mscc/ocelot_ace.c:683:37: warning: suggest braces
around initialization of subobject [-Wmissing-braces]
        struct ocelot_ace_rule del_ace = { 0 };
                                           ^
                                           {}
drivers/net/ethernet/mscc/ocelot_ace.c:743:28: warning: suggest braces
around initialization of subobject [-Wmissing-braces]
        struct vcap_data data = { 0 };
                                  ^
                                  {}
4 warnings generated.

One way to fix these warnings is to add additional braces like Clang
suggests; however, there has been a bit of push back from some
maintainers[1][2], who just prefer memset as it is unambiguous, doesn't
depend on a particular compiler version[3], and properly initializes all
subobjects. Do that here so there are no more warnings.

[1]: https://lore.kernel.org/lkml/022e41c0-8465-dc7a-a45c-64187ecd9684@amd.com/
[2]: https://lore.kernel.org/lkml/20181128.215241.702406654469517539.davem@davemloft.net/
[3]: https://lore.kernel.org/lkml/20181116150432.2408a075@redhat.com/

Fixes: b596229448dd ("net: mscc: ocelot: Add support for tcam")
Link: https://github.com/ClangBuiltLinux/linux/issues/505
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipv4: fix rcu lockdep splat due to wrong annotation

syzbot triggered following splat when strict netlink
validation is enabled:

net/ipv4/devinet.c:1766 suspicious rcu_dereference_check() usage!

This occurs because we hold RTNL mutex, but no rcu read lock.
The second call site holds both, so just switch to the _rtnl variant.

Reported-by: syzbot+bad6e32808a3a97b1515@syzkaller.appspotmail.com
Fixes: 2638eb8b50cf ("net: ipv4: provide __rcu annotation for ifa_list")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-expose-flash-update-status-to-user'

Jiri Pirko says:

====================
expose flash update status to user

When user is flashing device using devlink, he currenly does not see any
information about what is going on, percentages, etc.
Drivers, for example mlxsw and mlx5, have notion about the progress
and what is happening. This patchset exposes this progress
information to userspace.

Example output for existing flash command:
$ devlink dev flash pci/0000:01:00.0 file firmware.bin
Preparing to flash
Flashing 100%
Flashing done

See this console recording which shows flashing FW on a Mellanox
Spectrum device:
https://asciinema.org/a/247926

Please see individual patches for changelog.
v2->v3 only adds tags and the last selftest patch
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: add basic netdevsim devlink flash testing

Utilizes the devlink flash code.

Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netdevsim: implement fake flash updating with notifications

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: Implement flash update status notifications

Implement mlxfw status_notify op by passing notification down to
devlink. Also notify about flash update begin and end.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxfw: Introduce status_notify op and call it to notify about the status

Add new op status_notify which is called to update the user about
flashing status.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: allow driver to update progress of flash update

Introduce a function to be called from drivers during flash. It sends
notification to userspace about flash update progress.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxfw: Propagate error messages through extack

Currently the error messages are printed to dmesg. Propagate them also
to directly to user doing the flashing through extack.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlx5: Move firmware flash implementation to devlink

Benefit from the devlink flash update implementation and ethtool
fallback to it and move firmware flash implementation there.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: Move firmware flash implementation to devlink

Benefit from the devlink flash update implementation and ethtool
fallback to it and move firmware flash implementation there.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: socfpga: add RMII phy mode

Add option for enabling RMII phy mode.

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Wei Liang Lim <wei.liang.lim@intel.com>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'FDB-updates-for-SJA1105-DSA-driver'

Vladimir Oltean says:

====================
FDB updates for SJA1105 DSA driver

This patch series adds:

- FDB switchdev support for the second generation of switches (P/Q/R/S).
I could test/code these now that I got a board with a SJA1105Q.

- Management route support for SJA1105 P/Q/R/S. This is needed to send
PTP/STP/management frames over the CPU port.

- Logic to hide private DSA VLANs from the 'bridge fdb' commands.

The new FDB code was also tested and still works on SJA1105T.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: sja1105: Hide the dsa_8021q VLANs from the bridge fdb command

TX VLANs and RX VLANs are an internal implementation detail of DSA for
frame tagging. They work by installing special VLANs on switch ports in
the operating modes where no behavior change w.r.t. VLANs can be
observed by the user.

Therefore it makes sense to hide these VLANs in the 'bridge fdb'
command, as well as translate the pvid into the RX VID and TX VID on
'bridge fdb add' and 'bridge fdb del' commands.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: sja1105: Unset port from forwarding mask unconditionally on fdb_del

This is a cosmetic patch that simplifies the code by removing a
redundant check. A logical AND-with-zero performed on a zero is still
zero.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: sja1105: Add FDB operations for P/Q/R/S series

This adds support for manipulating the L2 forwarding database (dump,
add, delete) for the second generation of NXP SJA1105 switches.

At the moment only FDB entries installed statically through 'bridge fdb'
are visible in the dump callback - the dynamically learned ones are
still under investigation.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: sja1105: Add P/Q/R/S management route support via dynamic interface

Management routes are one-shot FDB rules installed on the CPU port for
sending link-local traffic. They are a prerequisite for STP, PTP etc to
work.

Also make a note that removing a management route was not supported on
the previous generation of switches.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: sja1105: Make dynamic_config_read return -ENOENT if not found

Conceptually, if an entry is not found in the requested hardware table,
it is not an invalid request - so change the error returned
appropriately.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: sja1105: Add P/Q/R/S support for dynamic L2 lookup operations

These are needed in order to implement the switchdev FDB callbacks.

Compared to the E/T generation, not only the ABI (bit offsets) is
different, but also the introduction of the HOSTCMD field which permits
O(1) TCAM search for an FDB entry. Make use of the newly introduce
OP_SEARCH to permit that. It will be used while adding and deleting an
FDB entry (to see whether it exists or not).

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: sja1105: Make room for P/Q/R/S FDB operations

The DSA callbacks were written with the E/T (first generation) in mind,
which is quite different.

For P/Q/R/S completely new implementations need to be provided, which
are held as function pointers in the priv->info structure. We are
taking a slightly roundabout way for this (a function from
sja1105_main.c reads a structure defined in sja1105_spi.c that
points to a function defined in sja1105_main.c), but it is what it is.

The FDB dump callback works for both families, hence no function pointer
for that.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: sja1105: Plug in support for TCAM searches via the dynamic interface

Only a single dynamic configuration table of the SJA1105 P/Q/R/S
supports this operation: the FDB.

To keep the existing structure in place (sja1105_dynamic_config_read and
sja1105_dynamic_config_write) and not introduce any new function, a
convention is made for sja1105_dynamic_config_read that a negative index
argument denotes a search for the entry provided as argument.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: sja1105: Add missing L2 Forwarding Table definitions for P/Q/R/S

This appends to the L2 Forwarding and L2 Forwarding Parameters tables
(originally added for first-generation switches) the bits that are new
in the second generation.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>