Mitch Williams [Fri, 8 Feb 2019 20:50:41 +0000 (12:50 -0800)]
ice: use absolute vector ID for VFs
When the PF driver sets up the VF MSI-X vector allocation, it needs to
use the hardware absolute vector ID, not the per-PF vector ID. Without
this change we see (apparent) TX hangs when using VFs on multiple PFs.
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
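The distinction drawn in "ice: use absolute vector ID for VFs" can be sketched in userspace C. This is only an illustration, assuming each PF owns a contiguous window of the device-wide MSI-X vectors; the struct, its fields, and pf_to_abs_vector() are hypothetical names, not taken from the driver.

```c
#include <stdint.h>

/* Hypothetical description of one PF's slice of the device-wide MSI-X table. */
struct pf_vector_window {
	uint32_t first_abs_vector; /* absolute ID of this PF's vector 0 */
	uint32_t num_vectors;      /* number of vectors owned by this PF */
};

/*
 * Translate a per-PF vector index into a device-wide (absolute) vector ID.
 * Returns 0 on success, -1 if the index is outside the PF's window.
 */
static int pf_to_abs_vector(const struct pf_vector_window *w,
			    uint32_t pf_rel_idx, uint32_t *abs_idx)
{
	if (pf_rel_idx >= w->num_vectors)
		return -1;
	*abs_idx = w->first_abs_vector + pf_rel_idx;
	return 0;
}
```

With two PFs that each own 64 vectors, per-PF index 5 maps to absolute ID 5 on the first PF but 69 on the second, so using the per-PF value would only be correct on the first PF.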
Victor Raj [Fri, 8 Feb 2019 20:50:40 +0000 (12:50 -0800)]
ice: check for a leaf node presence
Check whether a given VSI has any leaf nodes. This check is required
before removing a VSI, since a VSI with enabled queues (that is, with
leaf nodes) can't be removed from the FW scheduler tree unless it's a
reset.
Signed-off-by: Victor Raj <victor.raj@intel.com>
Reviewed-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Victor Raj [Fri, 8 Feb 2019 20:50:39 +0000 (12:50 -0800)]
ice: flush Tx pipe on disable queue timeout
Set the flush Tx pipe flag instead of getting an EAGAIN error when FW
times out in processing the disable Tx queue command.
Signed-off-by: Victor Raj <victor.raj@intel.com>
Reviewed-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Fri, 8 Feb 2019 20:50:38 +0000 (12:50 -0800)]
ice: clear VF ARQLEN register on reset
On older devices like X710 and X722, the VF's ARQLEN register is cleared
on reset, so the VF driver uses that register to detect an unannounced
reset. Unfortunately, on devices controlled by ice, this register is NOT
cleared on reset. This causes the VF to miss resets, and even on
properly-announced resets, the VF driver complains that it didn't see
the reset.
To fix this, we'll do it in software. When we handle a VF reset (whether
triggered by software or VFLR), clear this register after the HW reset
is complete.
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Fri, 8 Feb 2019 20:50:37 +0000 (12:50 -0800)]
ice: don't spam VFs with link messages
Don't send a link message to the VFs unless link actually changes state.
This avoids a small timing hole in some VF drivers that can cause an
apparent TX hang if they receive a link status message at the wrong time.
Although we have fixed the timing hole in the current VF driver, there
are still lots of drivers in the field that have this timing hole. Let's
not fall into it if we can avoid it.
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
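The idea behind "ice: don't spam VFs with link messages" is to notify only on a state change. A minimal sketch follows, assuming the PF keeps the last reported state per VF; struct vf_link_state and vf_link_should_notify() are hypothetical and do not reflect the driver's real bookkeeping.

```c
#include <stdbool.h>

/* Hypothetical per-VF bookkeeping; not the driver's real structure. */
struct vf_link_state {
	bool reported;          /* has any state been sent to the VF yet? */
	bool link_up;           /* last state that was sent */
	unsigned int msgs_sent; /* number of messages that would have gone out */
};

/*
 * Record the new link state and report whether a message should be sent.
 * A message is only needed for the first report or on an actual change.
 */
static bool vf_link_should_notify(struct vf_link_state *s, bool link_up)
{
	if (s->reported && s->link_up == link_up)
		return false;
	s->reported = true;
	s->link_up = link_up;
	s->msgs_sent++;
	return true;
}
```

Repeated "link up" events then produce a single message, which keeps older VF drivers away from the timing hole described in the commit.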
Brett Creeley [Fri, 8 Feb 2019 20:50:36 +0000 (12:50 -0800)]
ice: only use the VF for ICE_VSI_VF in ice_vsi_release
In ice_vsi_release we are always assigning a value to the local VF
variable. Change this to only be assigned if the VSI is a VF VSI.
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Bruce Allan [Fri, 8 Feb 2019 20:50:35 +0000 (12:50 -0800)]
ice: fix numeric overflow warning
When compiling and analyzing the driver on newer kernels, a static
analyzer warns about the following "numeric overflow" issues:
"The result of expression: 'budget-1' generates 4-byte type while casting
to a bigger size of 8-byte".
"The result of expression: '*words-words_read' generates 4-byte type
while casting to a bigger size of 8-byte".
Fix them both.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
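The class of warning quoted in "ice: fix numeric overflow warning" arises when arithmetic is done in a 4-byte type and only the result is widened. The sketch below shows the general pattern with hypothetical helper names; it assumes a platform where int is 32 bits, and it is not a copy of how the patch itself changed the code.

```c
#include <stdint.h>

/* The subtraction is done in 32 bits; only the wrapped result is widened. */
static int64_t diff_narrow(uint32_t a, uint32_t b)
{
	return a - b;
}

/* Widening one operand first makes the subtraction itself 64-bit. */
static int64_t diff_wide(uint32_t a, uint32_t b)
{
	return (int64_t)a - b;
}
```

For a = 3 and b = 5, diff_narrow() yields 4294967294 while diff_wide() yields -2; for a >= b both agree.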
Brett Creeley [Fri, 8 Feb 2019 20:50:34 +0000 (12:50 -0800)]
ice: fix issue where host reboots on unload when iommu=on
Currently if the kernel has the intel_iommu=on parameter set, on some
platforms removing the driver causes a system reboot. In initialization
we associate the control queue interrupts with the pf->hw_oicr_idx and
enable the interrupts by setting the CAUSE_ENA bit. The problem comes
on teardown because we are not clearing the CAUSE_ENA bit for the
control queues, but the vector at pf->hw_oicr_idx (miscellaneous
interrupt vector) gets disabled.
Fix this by clearing the CAUSE_ENA bit in the appropriate control queue
registers when freeing the miscellaneous interrupt vector. Also,
move the call to ice_free_irq_msix_misc() to after ice_deinit_sw() in
ice_remove() because ice_deinit_sw() makes an AQ call, but
ice_free_irq_msix_misc() disables the miscellaneous vector and its
associated interrupts.
Also, create two small helper functions to enable and disable the
control queue interrupts respectively.
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 8 Feb 2019 20:50:33 +0000 (12:50 -0800)]
ice: fix ice_remove_rule_internal vsi_list handling
When adding multiple VLANs to the same VSI, the ice_add_vlan code will
share the VSI list, so as not to create multiple unnecessary VSI lists.
Consider the following flow:
ice_add_vlan(hw, <VSI 0 VID 7, VSI 0 VID 8, VSI 0 VID 9>)
Where we add three VLAN filters for VIDs 7, 8, and 9, all for VSI 0.
The ice_add_vlan will create a single vsi_list and share it among all
the filters.
Later, if we try to remove a VLAN,
ice_remove_vlan(hw, <VSI 0 VID 7>)
Then the removal code will update the vsi_list and remove VSI 0 from it.
But, since the vsi_list is shared, this breaks the list for the other
users who reference it. We actually even free the VSI list memory, which
may result in segmentation faults.
This is due to the way that VLAN rules share VSI lists with reference
counts, and is caused because we call ice_rem_update_vsi_list even when
the ref_cnt is greater than one.
To fix this, handle the case where ref_cnt is greater than one
separately. In this case, we need to remove the associated rule without
modifying the vsi_list, since it is currently being referenced by
another rule. Instead, we just need to decrement the VSI list ref_cnt.
The case for handling sharing of VSI lists with multiple VSIs is not
currently supported by this code. No such rules will be created today,
and this code will require changes if/when such code is added.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
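The ref_cnt handling described in "ice: fix ice_remove_rule_internal vsi_list handling" can be sketched as follows. This is a simplified userspace model with hypothetical types (struct shared_list, struct filter_rule) and a hypothetical rule_detach() helper; it only shows the "decrement while shared, free on last user" rule, not the switch programming done by the driver.

```c
#include <stdlib.h>

/* Hypothetical list shared by several rules. */
struct shared_list {
	unsigned int ref_cnt; /* number of rules referencing this list */
};

/* Hypothetical filter rule holding a reference to a shared list. */
struct filter_rule {
	struct shared_list *list;
};

/*
 * Drop one rule's reference to its list.
 * Returns 1 if the list was freed, 0 otherwise.
 */
static int rule_detach(struct filter_rule *rule)
{
	struct shared_list *list = rule->list;

	rule->list = NULL;
	if (!list)
		return 0;
	if (list->ref_cnt > 1) {
		/* Other rules still use the list: leave its contents alone. */
		list->ref_cnt--;
		return 0;
	}
	free(list); /* last user, safe to release */
	return 1;
}
```

With three rules (VIDs 7, 8, and 9) sharing one list, removing the rule for VID 7 only lowers ref_cnt from 3 to 2, and the list stays valid for the remaining rules.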
Bruce Allan [Fri, 8 Feb 2019 20:50:32 +0000 (12:50 -0800)]
ice: fix stack hogs from struct ice_vsi_ctx structures
struct ice_vsi_ctx has gotten large enough that function local declarations
of it on the stack are causing stack hogs. Fix that by allocating the
structs on the heap. Clean up some formatting issues in the code around
these changes and fix incorrect data types used for function return
values in a couple of places.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Bruce Allan [Fri, 8 Feb 2019 20:50:31 +0000 (12:50 -0800)]
ice: sizeof(<type>) should be avoided
With sizeof(), it is preferable to use the variable of type <type> instead
of sizeof(<type>).
There are multiple places where a temporary variable is used to hold a
'size' value which is then used for a subsequent alloc/memset. Get rid
of the temporary variable by calculating size as part of the alloc/memset
statement.
Also remove unnecessary type-cast.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
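The style preferred by "ice: sizeof(<type>) should be avoided" looks roughly like this in plain C. The sketch uses calloc() and memset() from the C library as stand-ins for the kernel allocators, and struct example_ctx is a made-up type.

```c
#include <stdlib.h>
#include <string.h>

/* Made-up structure used only for this illustration. */
struct example_ctx {
	unsigned int id;
	unsigned char payload[60];
};

/* Size is derived from the variable, with no temporary 'size' local. */
static struct example_ctx *example_ctx_alloc(void)
{
	struct example_ctx *ctx;

	ctx = calloc(1, sizeof(*ctx));
	return ctx;
}

/* The same idiom for clearing: stays correct if the type of ctx changes. */
static void example_ctx_reset(struct example_ctx *ctx)
{
	memset(ctx, 0, sizeof(*ctx));
}
```

Because the size expression follows the variable, changing the pointer's type later cannot leave a stale sizeof(<type>) behind.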
Victor Raj [Fri, 8 Feb 2019 20:50:30 +0000 (12:50 -0800)]
ice: Fix added in VSI supported nodes calc
VSI supported nodes are calculated in order to add the VSI parent or
intermediate nodes to the scheduler tree. If one of the nodes in the
layers below the VSI layer has space to add the new VSI or intermediate
node above that layer, then the calculation does not need to continue
for the layers below it.
Signed-off-by: Victor Raj <victor.raj@intel.com>
Reviewed-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Maciej Fijalkowski [Fri, 8 Feb 2019 20:50:29 +0000 (12:50 -0800)]
ice: Fix the calculation of ICE_MAX_MTU
Currently ICE_MAX_MTU subtracts only ETH_HLEN from max frame size and
adds ETH_FCS_LEN and VLAN_HLEN, which is not what was intended.
The ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN expression should be surrounded
with parentheses.
Wrap the mentioned expression in parentheses and take VLAN double
tagging into account.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
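The precedence problem described in "ice: Fix the calculation of ICE_MAX_MTU" can be worked through with concrete numbers. ETH_HLEN (14), ETH_FCS_LEN (4), and VLAN_HLEN (4) carry their usual kernel values; EXAMPLE_MAX_FRAME and the three MTU_* macro names are assumptions made for this sketch and are not the driver's definitions.

```c
/* Standard Ethernet overhead sizes, as defined in the kernel headers. */
#define ETH_HLEN	14
#define ETH_FCS_LEN	4
#define VLAN_HLEN	4

/* Assumed maximum frame size, chosen only for this example. */
#define EXAMPLE_MAX_FRAME	9728

/* Without parentheses only ETH_HLEN is subtracted; the rest is added. */
#define MTU_UNPARENTHESIZED \
	(EXAMPLE_MAX_FRAME - ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN)

/* Parenthesized: the whole overhead is subtracted. */
#define MTU_SINGLE_TAG \
	(EXAMPLE_MAX_FRAME - (ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN))

/* Parenthesized, with room for two VLAN tags (double tagging). */
#define MTU_DOUBLE_TAG \
	(EXAMPLE_MAX_FRAME - (ETH_HLEN + ETH_FCS_LEN + (VLAN_HLEN * 2)))
```

With these values the unparenthesized form gives 9722, the parenthesized single-tag form gives 9706, and the double-tag form gives 9702.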
Bruce Allan [Fri, 8 Feb 2019 20:50:28 +0000 (12:50 -0800)]
ice: Mark extack argument as __always_unused
Commit 87b0984ebfab ("net: Add extack argument to ndo_fdb_add()") in
net-next added an extack (extended ack) parameter to the .ndo_fdb_add op
and changed ice_fdb_add() accordingly. Update the function header and add
the __always_unused attribute to the new parameter to avoid
-Wunused-parameter warnings.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Peter Oskolkov [Sun, 24 Feb 2019 02:25:01 +0000 (18:25 -0800)]
net: fix double-free in bpf_lwt_xmit_reroute
dst_output() frees skb when it fails (see, for example,
ip_finish_output2), so it must not be freed in this case.
Fixes: 3bd0b15281af ("bpf: add handling of BPF_LWT_REROUTE to lwt_bpf.c")
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
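The ownership rule behind "net: fix double-free in bpf_lwt_xmit_reroute" can be modelled in userspace. In this sketch struct pkt, pkt_output(), and pkt_reroute() are hypothetical stand-ins; the output routine is assumed to take ownership of the buffer in every case, which is a simplification of how dst_output() behaves.

```c
#include <stdlib.h>

/* Hypothetical packet buffer that counts how often it has been released. */
struct pkt {
	int *free_count;
};

static void pkt_free(struct pkt *p)
{
	(*p->free_count)++;
	free(p);
}

/*
 * Stand-in for an output routine that owns the packet once called.
 * In this sketch it releases the packet whether or not it fails.
 */
static int pkt_output(struct pkt *p, int fail)
{
	pkt_free(p);
	return fail ? -1 : 0;
}

/* Caller: must not free 'p' after pkt_output(), even on error. */
static int pkt_reroute(struct pkt *p, int fail)
{
	int err = pkt_output(p, fail);

	/* A pkt_free(p) here on error would be a double free. */
	return err;
}
```

After a failed pkt_reroute() the free counter is exactly 1, which is the invariant the fix restores.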
wenxu [Sun, 24 Feb 2019 00:24:45 +0000 (08:24 +0800)]
ip_tunnel: Add ip tunnel tun_info type dst_cache in ip_tunnel_xmit
ip l add dev tun type gretap key 1000
A non-tunnel-dst ip tunnel device (for example, one created with the
command above) can send packets through lwtunnel.
This patch provides tun_info dst_cache support for this mode.
Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 25 Feb 2019 06:21:23 +0000 (22:21 -0800)]
Merge branch 'dsa-mv88e6xxx-lockdep'
Andrew Lunn says:
====================
mv88e6xxx: Avoid false positive Lockdep splats
When acquiring the GPIO interrupt line for the switch, it is possible
to trigger lockdep splats. These are false positives, because the mutex
is in a different IRQ descriptor. But fix it anyway, since it could mask
real locking issues.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sat, 23 Feb 2019 16:43:57 +0000 (17:43 +0100)]
net: dsa: mv88e6xxx: Release lock while requesting IRQ
There is no need to hold the register lock while requesting the GPIO
interrupt. By not holding it we can also avoid a false positive
lockdep splat.
Reported-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sat, 23 Feb 2019 16:43:56 +0000 (17:43 +0100)]
net: dsa: mv88e6xxx: Add lockdep classes to fix false positive splat
The following false positive lockdep splat has been observed.
======================================================
WARNING: possible circular locking dependency detected
4.20.0+ #302 Not tainted
------------------------------------------------------
systemd-udevd/160 is trying to acquire lock:
edea6080 (&chip->reg_lock){+.+.}, at: __setup_irq+0x640/0x704
but task is already holding lock:
edff0340 (&desc->request_mutex){+.+.}, at: __setup_irq+0xa0/0x704
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&desc->request_mutex){+.+.}:
mutex_lock_nested+0x1c/0x24
__setup_irq+0xa0/0x704
request_threaded_irq+0xd0/0x150
mv88e6xxx_probe+0x41c/0x694 [mv88e6xxx]
mdio_probe+0x2c/0x54
really_probe+0x200/0x2c4
driver_probe_device+0x5c/0x174
__driver_attach+0xd8/0xdc
bus_for_each_dev+0x58/0x7c
bus_add_driver+0xe4/0x1f0
driver_register+0x7c/0x110
mdio_driver_register+0x24/0x58
do_one_initcall+0x74/0x2e8
do_init_module+0x60/0x1d0
load_module+0x1968/0x1ff4
sys_finit_module+0x8c/0x98
ret_fast_syscall+0x0/0x28
0xbedf2ae8
-> #0 (&chip->reg_lock){+.+.}:
__mutex_lock+0x50/0x8b8
mutex_lock_nested+0x1c/0x24
__setup_irq+0x640/0x704
request_threaded_irq+0xd0/0x150
mv88e6xxx_g2_irq_setup+0xcc/0x1b4 [mv88e6xxx]
mv88e6xxx_probe+0x44c/0x694 [mv88e6xxx]
mdio_probe+0x2c/0x54
really_probe+0x200/0x2c4
driver_probe_device+0x5c/0x174
__driver_attach+0xd8/0xdc
bus_for_each_dev+0x58/0x7c
bus_add_driver+0xe4/0x1f0
driver_register+0x7c/0x110
mdio_driver_register+0x24/0x58
do_one_initcall+0x74/0x2e8
do_init_module+0x60/0x1d0
load_module+0x1968/0x1ff4
sys_finit_module+0x8c/0x98
ret_fast_syscall+0x0/0x28
0xbedf2ae8
other info that might help us debug this:
 Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(&desc->request_mutex);
                               lock(&chip->reg_lock);
                               lock(&desc->request_mutex);
  lock(&chip->reg_lock);
&desc->request_mutex refers to two different mutexes. One belongs to the
GPIO used for the chip interrupt. The other belongs to the chained
interrupt between global 1 and global 2.
Add lockdep classes to the GPIO interrupt to avoid this.
Reported-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
wenxu [Sat, 23 Feb 2019 13:32:54 +0000 (21:32 +0800)]
ip_tunnel: Add dst_cache support in lwtunnel_state of ip tunnel
The lwtunnel_state does not initialize the dst_cache, so
ip_md_tunnel_xmit can't use the dst_cache. It has to look up the
route table for every packet.
Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vakul Garg [Sat, 23 Feb 2019 08:42:37 +0000 (08:42 +0000)]
tls: Return type of non-data records retrieved using MSG_PEEK in recvmsg
The patch enables returning 'type' in msghdr for records that are
retrieved with MSG_PEEK in recvmsg. Further it prevents records peeked
from socket from getting clubbed with any other record of different
type when records are subsequently dequeued from strparser.
For each record, we now retain its type in sk_buff's control buffer
cb[]. Inside control buffer, record's full length and offset are already
stored by strparser in 'struct strp_msg'. We store record type after
'struct strp_msg' inside 'struct tls_msg'. For tls1.2, the type is
stored just after record dequeue. For tls1.3, the type is stored after
record has been decrypted.
Inside process_rx_list(), before processing a non-data record, we check
that we are able to return the record type to the user application. If
not, the decrypted records in the tls context's rx_list are left there
without consuming any data.
Fixes: 692d7b5d1f912 ("tls: Fix recvmsg() to be able to peek across multiple records")
Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 25 Feb 2019 05:57:26 +0000 (21:57 -0800)]
Merge branch 'ipv4-v6-icmp-small-cleanup-and-update'
Kefeng Wang says:
====================
ipv4/v6: icmp: small cleanup and update
v2:
- Add cover letter and use proper patch subject-prefix, as suggested by Eric Dumazet
This patch series contains some small cleanup and update,
1) use icmp/v6_sk_exit when icmp_sk_init fails instead of open-code
2) use new percpu allocation interface for the ipv6.icmp_sk
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Kefeng Wang [Sat, 23 Feb 2019 07:28:28 +0000 (15:28 +0800)]
ipv6: icmp: use percpu allocation
Use percpu allocation for the ipv6.icmp_sk.
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kefeng Wang [Sat, 23 Feb 2019 07:28:27 +0000 (15:28 +0800)]
ipv6: icmp: use icmpv6_sk_exit()
Simply use icmpv6_sk_exit() when inet_ctl_sock_create() fails
in icmpv6_sk_init().
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kefeng Wang [Sat, 23 Feb 2019 07:28:26 +0000 (15:28 +0800)]
ipv4: icmp: use icmp_sk_exit()
Simply use icmp_sk_exit() when inet_ctl_sock_create() fails in icmp_sk_init().
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Sat, 23 Feb 2019 05:30:47 +0000 (13:30 +0800)]
ila: Fix uninitialised return value in ila_xlat_nl_cmd_flush
This patch fixes an uninitialised return value error in
ila_xlat_nl_cmd_flush.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 6c4128f65857 ("rhashtable: Remove obsolete...")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
wenxu [Fri, 22 Feb 2019 07:58:12 +0000 (15:58 +0800)]
net/sched: act_tunnel_key: Add dst_cache support
The metadata_dst does not initialize the dst_cache, so
ip_md_tunnel_xmit can't use the dst_cache. It has to look up the
route table for every packet.
Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 25 Feb 2019 04:27:51 +0000 (20:27 -0800)]
Merge branch 'code-optimizations-and-bugfixes-for-HNS3-driver'
Huazhong Tan says:
====================
code optimizations & bugfixes for HNS3 driver
This patchset includes bugfixes and code optimizations for
the HNS3 ethernet controller driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Huazhong Tan [Sat, 23 Feb 2019 09:22:19 +0000 (17:22 +0800)]
net: hns3: fix improper error handling for hns3_client_start
If hns3_client_start() fails in hns3_client_init(),
register_dev() should be undone in its error handling.
Fixes: a6d818e31d08 ("net: hns3: Add vport alive state checking support")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shiju Jose [Sat, 23 Feb 2019 09:22:18 +0000 (17:22 +0800)]
net: hns3: fix setting of the hns reset_type for rdma hw errors
Presently the hns reset_type for the roce errors is set
in the hclge_log_and_clear_rocee_ras_error function.
This function is also called to detect and clear roce errors
while enabling the rdma error interrupts. However, no hns reset
is requested in that case. This can cause a wrong reset_type to be
used by a subsequent hns reset, because the reset_type set in that
case was never cleared.
This patch moves setting of hns reset_type for the roce errors from
hclge_log_and_clear_rocee_ras_error function
to hclge_handle_rocee_ras_error.
Fixes: 630ba007f475 ("net: hns3: add handling of RDMA RAS errors")
Reported-by: Huazhong Tan <tanhuazhong@huawei.com>
Reported-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jian Shen [Sat, 23 Feb 2019 09:22:17 +0000 (17:22 +0800)]
net: hns3: fix get VF RSS issue
For revision 0x20, the VF shares the same RSS config with the PF.
The original code always returns 0 when querying the RSS hash
key for a VF. This patch fixes it by returning the hash key
obtained from the PF.
Fixes: 374ad291762a ("net: hns3: net: hns3: Add RSS general configuration support for VF")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jian Shen [Sat, 23 Feb 2019 09:22:16 +0000 (17:22 +0800)]
net: hns3: enable VF VLAN filter for each VF when initializing
For revision 0x21, the switch of VF VLAN filter is per function.
It's necessary to enable VF VLAN filter for each VF when initializing.
Otherwise, VF will be able to receive broadcast packets with unknown
VLAN when PF enters promisc mode.
Fixes: 64d114f0a750 ("net: hns3: Add egress/ingress vlan filter for revision 0x21")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peng Li [Sat, 23 Feb 2019 09:22:15 +0000 (17:22 +0800)]
net: hns3: add support to config depth for tx|rx ring separately
This patch adds support for configuring the depth of the tx and rx
rings separately via the ethtool "-G" command.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yunsheng Lin [Sat, 23 Feb 2019 09:22:14 +0000 (17:22 +0800)]
net: hns3: remove hnae3_get_bit in data path
The hnae3_get_bit uses hnae3_get_field, and hnae3_get_field
masks the data, which is unnecessary in data path.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yunsheng Lin [Sat, 23 Feb 2019 09:22:13 +0000 (17:22 +0800)]
net: hns3: replace hnae3_set_bit and hnae3_set_field in data path
hnae3_set_bit and hnae3_set_field mask the data before setting
the field or bit, which is unnecessary because the data is already
zero initialized.
Suggested-by: John Garry <john.garry@huawei.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
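The reasoning in "net: hns3: replace hnae3_set_bit and hnae3_set_field in data path" can be checked with a small sketch. The two helpers below are hypothetical and written for illustration; they are not the hnae3 macros themselves.

```c
#include <stdint.h>

/* Read-modify-write: clear the field, then set it. Safe for any old value. */
static uint32_t field_set_masked(uint32_t origin, uint32_t mask,
				 unsigned int shift, uint32_t val)
{
	origin &= ~mask;
	origin |= (val << shift) & mask;
	return origin;
}

/*
 * Plain OR: only correct when the field in 'origin' is already zero
 * and 'val' fits inside the field.
 */
static uint32_t field_set_zeroed(uint32_t origin, unsigned int shift,
				 uint32_t val)
{
	return origin | (val << shift);
}
```

On zero-initialized data both helpers produce the same word, so the masking step is redundant there; on data that already holds a value only the masked variant is correct.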
Yunsheng Lin [Sat, 23 Feb 2019 09:22:12 +0000 (17:22 +0800)]
net: hns3: add unlikely for error handling in data path
This patch adds unlikely() hints to error handling in the critical
data path.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
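For readers unfamiliar with the hint used in "net: hns3: add unlikely for error handling in data path", here is a userspace sketch. The unlikely() macro is defined locally on top of the GCC/Clang builtin __builtin_expect(), and sum_bytes() is a hypothetical hot-path function, not driver code.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's unlikely(); requires GCC or Clang. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical hot-path helper: sum a buffer, treating bad input as rare. */
static int sum_bytes(const unsigned char *buf, size_t len, unsigned long *out)
{
	unsigned long sum = 0;
	size_t i;

	if (unlikely(!buf || !out))
		return -1; /* error path the compiler may move out of line */

	for (i = 0; i < len; i++)
		sum += buf[i];
	*out = sum;
	return 0;
}
```

The hint does not change the result of the function; it only tells the compiler which branch to favor when laying out the code.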
Yunsheng Lin [Sat, 23 Feb 2019 09:22:11 +0000 (17:22 +0800)]
net: hns3: remove some ops in struct hns3_nic_ops
The fill_desc ops has only one implementation, and
get_rxd_bnum has not been used, so this patch removes
them.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yunsheng Lin [Sat, 23 Feb 2019 09:22:10 +0000 (17:22 +0800)]
net: hns3: limit some variable scope in critical data path
This patch limits some variables' scope as much as possible in
hns3_fill_desc.
Also, only set l3_type and l4_type when necessary.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yunsheng Lin [Sat, 23 Feb 2019 09:22:09 +0000 (17:22 +0800)]
net: hns3: avoid mult + div op in critical data path
This patch uses a shift offset to avoid multiplication and division
operations.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
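The general technique named in "net: hns3: avoid mult + div op in critical data path" is sketched below under the assumption that the buffer size is a power of two (buf_size == 1 << shift). Both helpers are hypothetical and are not taken from the hns3 driver; overflow for very large inputs is ignored.

```c
#include <stdint.h>

/* Byte offset of buffer 'idx': equivalent to idx * (1u << shift). */
static uint32_t buf_offset_shift(uint32_t idx, unsigned int shift)
{
	return idx << shift;
}

/*
 * Number of buffers needed for 'len' bytes, rounded up: equivalent to
 * (len + buf_size - 1) / buf_size when buf_size == 1u << shift.
 */
static uint32_t bufs_needed_shift(uint32_t len, unsigned int shift)
{
	return (len + (1u << shift) - 1) >> shift;
}
```

With shift = 12 (4096-byte buffers), 4097 bytes need two buffers and buffer 3 starts at byte offset 12288, the same answers the multiply and divide forms give.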
Yunsheng Lin [Sat, 23 Feb 2019 09:22:08 +0000 (17:22 +0800)]
net: hns3: add xps setting support for hns3 driver
This patch adds xps setting support for hns3 driver based on
the interrupt affinity info.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 25 Feb 2019 04:25:30 +0000 (20:25 -0800)]
Merge branch 'mlxsw-spectrum_acl-Don-t-take-rtnl-mutex-for-region-rehash'
Ido Schimmel says:
====================
mlxsw: spectrum_acl: Don't take rtnl mutex for region rehash
Jiri says:
During region rehash, a new region is created with a more optimized set
of masks (ERPs). When transitioning to the new region, all the rules
from the old region are copied one-by-one to the new region. This
transition can be time consuming and is currently done under RTNL lock.
In order to remove RTNL lock dependency during region rehash, introduce
multiple smaller locks guarding dedicated structures or parts of them.
That is the vast majority of this patchset. Only patch #1 is simple
cleanup and patches 12-15 are improving or introducing new selftests.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Sun, 24 Feb 2019 06:46:34 +0000 (06:46 +0000)]
selftests: mlxsw: spectrum-2: Add massive delta rehash test
Do insertions and removal of filters during rehash in higher volumes.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Sun, 24 Feb 2019 06:46:33 +0000 (06:46 +0000)]
selftests: mlxsw: spectrum-2: Check migrate end trace
Add checking of newly added trace.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Sun, 24 Feb 2019 06:46:33 +0000 (06:46 +0000)]
mlxsw: spectrum_acl: Add vregion migration end tracepoint
Hit the new tracepoint once the vregion migration ends.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Sun, 24 Feb 2019 06:46:32 +0000 (06:46 +0000)]
selftests: mlxsw: spectrum-2: Add IPv6 variant of simple delta rehash test
Track the basic codepaths of delta rehash handling,
using mlxsw tracepoints. Use IPv6 addresses.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Sun, 24 Feb 2019 06:46:31 +0000 (06:46 +0000)]
mlxsw: spectrum_acl: Don't take mutex in mlxsw_sp_acl_tcam_vregion_rehash_work()
Other mutexes now take care of proper locking here, so it is no longer
necessary to take the RTNL mutex.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Sun, 24 Feb 2019 06:46:30 +0000 (06:46 +0000)]
mlxsw: spectrum_acl: Remove RTNL lock assertions from ERP code
No longer require RTNL lock in this code. Newly introduced mutexes take
care of guarding objagg and bloom filter. There is no need to guard
gen_pool_alloc()/gen_pool_free() as they are fine to be called lockless.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Sun, 24 Feb 2019 06:46:29 +0000 (06:46 +0000)]
mlxsw: spectrum_acl: Don't take rtnl lock during vregion_rehash_intrvl_set()
Relax dependency on rtnl mutex during vregion_rehash_intrvl_set(). The
vregion list is protected with newly introduced mutex.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Sun, 24 Feb 2019 06:46:28 +0000 (06:46 +0000)]
mlxsw: spectrum_acl: Introduce a mutex to guard objagg instance manipulation
Protect objagg structures by adding a mutex to ERP code and take it
during the structure manipulation.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Sun, 24 Feb 2019 06:46:28 +0000 (06:46 +0000)]
mlxsw: spectrum_acl: Enable vregion rehash per-profile
For the MR ACL profile it does not make sense to do periodic rehashes,
as there is only one mask in use during the whole vregion lifetime.
Therefore the periodic work is scheduled but the rehash never happens.
So allow enabling/disabling rehash for the whole group, which is added
per-profile. Disable rehashing for the MR profile.
Addition to the vregion list is done only if rehash is enabled on the
particular vregion. Also, the addition is moved after the delayed
work init to avoid scheduling uninitialized work
from vregion_rehash_intrvl_set(). Symmetrically, deletion from
the list is done before canceling the delayed work so that it is
not scheduled by vregion_rehash_intrvl_set() again.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Sun, 24 Feb 2019 06:46:27 +0000 (06:46 +0000)]
mlxsw: spectrum_acl: Introduce mutex to guard Bloom Filter updates
The Bloom filter is shared among multiple regions. For updates, it needs
to be guarded by a separate mutex. Do that in order to not rely on the
RTNL mutex.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Sun, 24 Feb 2019 06:46:26 +0000 (06:46 +0000)]
mlxsw: spectrum_acl: Introduce vregion mutex
In order to remove dependency on RTNL, introduce a mutex
to guard vregion structure, list of chunks and list of entries in
chunks.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Sun, 24 Feb 2019 06:46:25 +0000 (06:46 +0000)]
mlxsw: spectrum_acl: Refactor vregion association code
Refactor existing _vchunk_assoc/_vchunk_deassoc() functions into
_vregion_get()/_vregion_put() to make the code simpler and prepared for
vregion locking.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Sun, 24 Feb 2019 06:46:24 +0000 (06:46 +0000)]
mlxsw: spectrum_acl: Introduce a mutex to guard region list updates
In order to remove the RTNL lock dependency, the regions list in a
group needs to be protected. Introduce a mutex to do the job.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Sun, 24 Feb 2019 06:46:23 +0000 (06:46 +0000)]
mlxsw: spectrum_acl: Split TCAM group structure into two
Make the existing group structure contain the fields needed for HW region
list manipulations. Move the rest of the fields into a new vgroup struct.
This makes layering cleaner, as the vgroup struct sits at a higher level
than the low-level group struct. Also, this makes it possible to introduce
fine-grained locking.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Sun, 24 Feb 2019 06:46:22 +0000 (06:46 +0000)]
mlxsw: spectrum_acl: Remove unused ops field from group structure
Never used, remove it.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 25 Feb 2019 01:49:59 +0000 (17:49 -0800)]
Merge branch 'net-dsa-microchip-add-MIB-counters-support'
Tristram Ha says:
====================
net: dsa: microchip: add MIB counters support
This series of patches is to modify the KSZ9477 DSA driver to read MIB
counters periodically to avoid overflow.
The MIB counters should be read only when there is link. Otherwise it is
a waste of time as hardware never increases the counters.
Functions are added to check the port link status so that MIB counters
read call is used efficiently.
v4
- Use readx_poll_timeout
- Fix using mutex in a timer callback function problem
- use dp->slave directly instead of checking whether it is valid
- Add port_cleanup function in a separate patch
- Add a mutex so that changing device variables is safe
v3
- Use netif_carrier_ok instead of checking the phy device pointer
v2
- Create macro similar to readx_poll_timeout to use with switch
- Create ksz_port_cleanup function so that variables like on_ports and
live_ports can be updated inside it
v1
- Use readx_poll_timeout
- Do not clear MIB counters when port is enabled
- Do not advertise 1000 half-duplex mode when port is enabled
- Do not use freeze function as MIB counters may miss counts
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tristram Ha [Sat, 23 Feb 2019 00:36:51 +0000 (16:36 -0800)]
net: dsa: microchip: add port_cleanup function
Add port_cleanup function to reset some device variables when the port is
disabled. Add a mutex to make sure changing those variables is
thread-safe.
Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tristram Ha [Sat, 23 Feb 2019 00:36:50 +0000 (16:36 -0800)]
net: dsa: microchip: remove unnecessary include headers
Remove unnecessary header include.
Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tristram Ha [Sat, 23 Feb 2019 00:36:49 +0000 (16:36 -0800)]
net: dsa: microchip: get port link status
Get port link status to know whether to read MIB counters when the link
is going down.
Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tristram Ha [Sat, 23 Feb 2019 00:36:48 +0000 (16:36 -0800)]
net: dsa: microchip: add MIB counter reading support
Add background MIB counter reading support.
Port MIB counters should only be read when there is link. Otherwise it is
a waste of time as hardware never increases those counters. There are
exceptions as some switches keep track of dropped counts no matter what.
Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tristram Ha [Sat, 23 Feb 2019 00:36:47 +0000 (16:36 -0800)]
net: dsa: microchip: prepare PHY for proper advertisement
Prepare the PHY for proper advertisement. The PHY in the switch sometimes
has its own problems: even though it may share its PHY id with a regular
PHY, the fixes in the PHY driver do not apply.
Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 25 Feb 2019 01:45:25 +0000 (17:45 -0800)]
Merge branch 'net-phy-marvell10g-Add-2-5GBaseT-support'
Maxime Chevallier says:
====================
net: phy: marvell10g: Add 2.5GBaseT support
This series adds the missing bits necessary to fully support 2.5GBaseT
in the Marvell Alaska PHYs.
The main points for that support are :
- Making use of the .get_features call, recently introduced by Heiner
and Andrew, that allows having a fully populated list of supported
modes, including 2500BaseT.
- Configuring the MII to 2500BaseX when establishing a link at 2.5G
- Adding a small quirk to take into account the fact that some PHYs in
the family won't report the correct supported abilities
The rest of the series consists of small cosmetic improvements such as
using the correct helper to set a linkmode bit and adding macros for the
PHY ids.
We also add support for the 88E2110 PHY, which doesn't require the
quirk, and support for 2500BaseT in the PPv2 driver, in order to have a
fully working setup on the MacchiatoBin board.
Changes since V1 : Fixed formatting issue in patch 01, rebased.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Maxime Chevallier [Fri, 22 Feb 2019 23:37:44 +0000 (00:37 +0100)]
net: phy: marvell10g: add support for the 88x2110 PHY
This patch adds support for the 88x2110 PHY, which is similar to the
already supported 88x3310 PHY without the SFP interface.
It supports 10/100/1000BASET along with 2.5GBASET, 5GBASET and 10GBASET,
with the same interface modes that are used by the 3310.
This PHY doesn't have the same issue as the 88x3310 regarding 2.5/5G
abilities, and correctly follows the 802.3bz standard to list the
supported abilities.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Suggested-by: Antoine Tenart <antoine.tenart@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Maxime Chevallier [Fri, 22 Feb 2019 23:37:43 +0000 (00:37 +0100)]
net: mvpp2: Add 2.5GBaseT support
The PPv2 controller is able to support 2.5G speeds, allowing the use of
2.5GBASET in conjunction with PHYs that use 2500BASEX as their MII
interface when using this mode.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Maxime Chevallier [Fri, 22 Feb 2019 23:37:42 +0000 (00:37 +0100)]
net: phy: marvell10g: Force reading of 2.5/5G
As per 802.3bz, bit 14 of (1.11) "PMA Extended Abilities" indicates
whether or not we should read register (1.21) "2.5G/5G PMA Extended
Abilities", which contains information on the support of 2.5GBASET and
5GBASET.
After testing on several variants of PHYs of this family, it appears
that bit 14 in (1.11) isn't always set when it should be.
PHYs 88X3310 (on MacchiatoBin) and 88E2010 do support 2.5G and 5GBASET,
but don't have 1.11.14 set. Their register 1.21 is filled with the
correct values, indicating 2.5G and 5G support.
PHYs 88E2110 do have their 1.11.14 bit set, as it should.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
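The quirk in "net: phy: marvell10g: Force reading of 2.5/5G" can be sketched roughly as follows. This is a Python model for illustration only, not the driver's C code. The names `nbt_modes`, `read_pma` and `force_read`, and the bit positions assumed inside register 1.21, are not taken from the commit.

```python
from typing import Callable, Set

EXT_ABLE_REG = 11        # register 1.11, "PMA Extended Abilities"
NBT_ABLE_REG = 21        # register 1.21, 2.5G/5G extended abilities
EXT_ABLE_NBT = 1 << 14   # bit 14 of 1.11, as described in the commit
# Assumed bit positions within 1.21, for illustration only.
NBT_ABLE_2_5G = 1 << 0
NBT_ABLE_5G = 1 << 1


def nbt_modes(read_pma: Callable[[int], int], force_read: bool = False) -> Set[str]:
    """Return the 2.5G/5G modes a PHY reports.

    read_pma is a hypothetical accessor that returns the value of a PMA
    register. force_read models the quirk: read 1.21 even when bit 14
    of 1.11 is clear.
    """
    modes: Set[str] = set()
    ext = read_pma(EXT_ABLE_REG)
    if force_read or (ext & EXT_ABLE_NBT):
        nbt = read_pma(NBT_ABLE_REG)
        if nbt & NBT_ABLE_2_5G:
            modes.add("2500baseT")
        if nbt & NBT_ABLE_5G:
            modes.add("5000baseT")
    return modes
```

With a PHY that leaves bit 14 clear but fills 1.21 correctly, the plain read finds nothing, while the forced read recovers both modes.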
Maxime Chevallier [Fri, 22 Feb 2019 23:37:41 +0000 (00:37 +0100)]
net: phy: marvell10g: Use a #define for 88X3310 family id
The PHY ID corresponding to the 88X3310 is also used for other PHYs in
the same family, such as the 88E2010. Use a #define for the PHY id, that
ignores the last nibble.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
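A family match that ignores the last nibble, as in "net: phy: marvell10g: Use a #define for 88X3310 family id", could look like the sketch below. It is a Python model rather than the driver's C code, and the ID value and the names `PHY_ID_FAMILY`, `PHY_ID_MASK` and `same_family` are assumptions chosen for the example.

```python
# Hypothetical values for illustration; not taken from the driver source.
PHY_ID_FAMILY = 0x002B09A0   # assumed ID shared by the PHY family
PHY_ID_MASK = 0xFFFFFFF0     # clears the last nibble before comparing


def same_family(phy_id: int) -> bool:
    """Return True if phy_id equals the family ID once the last nibble is masked off."""
    return (phy_id & PHY_ID_MASK) == (PHY_ID_FAMILY & PHY_ID_MASK)
```

Any two IDs that differ only in the low four bits compare equal under this mask, which is what lets one driver entry cover several PHYs of the family.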
Maxime Chevallier [Fri, 22 Feb 2019 23:37:40 +0000 (00:37 +0100)]
net: phy: marvell10g: Use 2500BASEX when using 2.5GBASET
The Marvell Alaska family of PHYs supports 2.5GBaseT and 5GBaseT modes,
as defined in the 802.3bz specification.
Upon establishing a 2.5GBASET link, the PHY will reconfigure its MII
interface to 2500BASEX.
At 5G, the PHY will reconfigure its interface to 5GBASE-R, but this
mode isn't supported by any MAC for now.
This was tested with :
- The 88X3310, which is on the MacchiatoBin
- The 88E2010, an Alaska PHY that has no fiber interfaces, and is
limited to 5G maximum speed.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Maxime Chevallier [Fri, 22 Feb 2019 23:37:39 +0000 (00:37 +0100)]
net: phy: marvell10g: Use linkmode_set_bit helper instead of __set_bit
Cosmetic patch making use of helpers dedicated to linkmodes handling.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Maxime Chevallier [Fri, 22 Feb 2019 23:37:38 +0000 (00:37 +0100)]
net: phy: marvell10g: Use get_features to get the PHY abilities
The Alaska family of 10G PHYs has more abilities than the ones listed in
PHY_10GBIT_FULL_FEATURES, the exact list depending on the model.
Make use of the newly introduced .get_features call to build this list,
using genphy_c45_pma_read_abilities to build the list of supported
linkmodes, and adding autoneg ability based on what's reported by the AN
MMD.
.config_init is still used to validate the interface_mode.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Fri, 22 Feb 2019 21:59:38 +0000 (22:59 +0100)]
net: phy: check PMAPMD link status only in genphy_c45_read_link
The current code reports a link as up if all devices (except a few
blacklisted ones) report the link as up. This breaks Aquantia AQCS109
for lower speeds because on this PHY the PCS link status reflects a
10G link only. For Marvell there's a similar issue, therefore PHYXS
device isn't checked.
There may be more PHYs where depending on the mode the link status
of only selected devices is relevant.
For now it seems to be sufficient to check the link status of the
PMAPMD device only. Leave the loop in the code to be prepared in
case we have to add functionality to check more than one device,
depending on the mode.
Successfully tested on a board with an AQCS109.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 25 Feb 2019 01:40:47 +0000 (17:40 -0800)]
Merge branch 'net-switchdev-h-inclusion-removal'
Florian Fainelli says:
====================
net: switchdev.h inclusion removal
This targets a few drivers that no longer need to have net/switchdev.h
included.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Fri, 22 Feb 2019 20:31:34 +0000 (12:31 -0800)]
net: Remove switchdev.h inclusion from team/bond/vlan
This is no longer necessary after eca59f691566 ("net: Remove support for
bridge bypass ndos from stacked devices")
Suggested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andy Gospodarek <andy@greyhouse.net>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Fri, 22 Feb 2019 20:31:33 +0000 (12:31 -0800)]
nfp: Remove switchdev.h inclusion
This is no longer necessary after a5084bb71fa4 ("nfp: Implement
ndo_get_port_parent_id()")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hauke Mehrtens [Fri, 22 Feb 2019 19:12:57 +0000 (20:12 +0100)]
net: lantiq: Do not use eth_change_mtu()
eth_change_mtu() is not needed any more, the networking subsystem will
call it automatically when this callback is not implemented.
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Fri, 22 Feb 2019 18:25:59 +0000 (19:25 +0100)]
net: phy: improve definition of __ETHTOOL_LINK_MODE_MASK_NBITS
The way to define __ETHTOOL_LINK_MODE_MASK_NBITS seems to be overly
complicated, go with a standard approach instead.
Whilst we're at it, move the comment to the right place.
v2:
- rebased
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 24 Feb 2019 21:01:05 +0000 (13:01 -0800)]
Merge branch 'net-protodown-support-for-macvlan-and-vxlan'
Andy Roulin says:
====================
net: protodown support for macvlan and vxlan
This patch series adds dev_change_proto_down_generic, a generic
implementation of ndo_change_proto_down, which sets the netdev carrier
state according to the new proto_down value.
This handler adds the ability to set protodown on macvlan and vxlan
interfaces in a generic way for use by control protocols like VRRPD.
Patch (1) introduces the handler in net/core/dev.c. Patches (2) and (3) add
support for change_proto_down in macvlan and vxlan drivers, respectively,
using the new function.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Roulin [Fri, 22 Feb 2019 18:06:38 +0000 (18:06 +0000)]
vxlan: add ndo_change_proto_down support
Add ndo_change_proto_down support through dev_change_proto_down_generic
for use by control protocols like VRRPD.
Signed-off-by: Andy Roulin <aroulin@cumulusnetworks.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Roulin [Fri, 22 Feb 2019 18:06:37 +0000 (18:06 +0000)]
macvlan: add ndo_change_proto_down support
Add ndo_change_proto_down support through dev_change_proto_down_generic
for use by control protocols like VRRPD.
Signed-off-by: Andy Roulin <aroulin@cumulusnetworks.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Roulin [Fri, 22 Feb 2019 18:06:36 +0000 (18:06 +0000)]
net: dev: add generic protodown handler
Introduce dev_change_proto_down_generic, a generic ndo_change_proto_down
implementation, which sets the netdev carrier state according to proto_down.
This adds the ability to set protodown on vxlan and macvlan devices in a
generic way for use by control protocols like VRRPD.
Signed-off-by: Andy Roulin <aroulin@cumulusnetworks.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 24 Feb 2019 20:49:59 +0000 (12:49 -0800)]
Merge branch 'Add-tests-for-unlocked-flower-classifier-implementation'
Vlad Buslov says:
====================
Add tests for unlocked flower classifier implementation
Implement tests for tdc testsuite to verify concurrent rules update with
rtnl-unlocked flower classifier implementation. The goal of these tests
is to verify general flower classifier correctness by updating filters
on same classifier instance in parallel and to verify its atomicity by
concurrently updating filters in same handle range. All three filter
update operations (add, replace, delete) are tested.
Existing script tdc_batch.py is re-used for batch file generation. It is
extended with several optional CLI arguments that are needed for
concurrency tests. Thin wrapper tdc_multibatch.py is implemented on top
of tdc_batch.py to simplify its usage when generating multiple batch
files for several test configurations.
Parallelism in tests is implemented by running multiple instances of tc
in batch mode with xargs tool. Xargs is chosen for its ease of use and
because it is available by default on most modern Linux distributions.
====================
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Buslov [Fri, 22 Feb 2019 14:00:47 +0000 (16:00 +0200)]
selftests: concurrency: add test to verify parallel replace/delete
Implement test that runs 5 instances of tc replace filter in parallel with
5 instances of tc del filter from same tp instance. Each instance uses its
own filter handle and key range.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Buslov [Fri, 22 Feb 2019 14:00:46 +0000 (16:00 +0200)]
selftests: concurrency: add test to verify parallel add/delete
Implement test that runs 5 instances of tc add filter in parallel with 5
instances of tc del filter from same tp instance. Each instance uses its
own filter handle and key range.
Extend tdc_multibatch.py with additional options required to implement the
test: common prefix for all generated batch files, first value of filter
handle range, MAC address prefix modifier. These are necessary to allow
creating batch files with unique keys and handle ranges with multiple
invocation of tdc_multibatch.py helper script.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Buslov [Fri, 22 Feb 2019 14:00:45 +0000 (16:00 +0200)]
selftests: concurrency: add test to verify concurrent delete
Implement test that verifies concurrent deletion of rules by executing 10
tc instances that delete flower filters in same handle range. In this case
only one tc instance succeeds in deleting a filter with particular handle.
To mitigate expected failures of all other instances, run tc with 'force'
option to continue processing batch file in case of errors and expect xargs
to return code '123' that indicates that invocation of command(s) exited
with error in range 1-125.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
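The exit status that "selftests: concurrency: add test to verify concurrent delete" relies on can be reproduced with a small stand-in. This sketch assumes an `xargs` that supports `-P` (for example GNU xargs) and a POSIX `sh` on the PATH. It runs trivial `exit N` commands instead of `tc`, and `run_parallel` is a hypothetical helper, not part of the tdc suite.

```python
import subprocess


def run_parallel(exit_codes, workers=4):
    """Run one `sh -c 'exit N'` per entry through xargs.

    Returns the exit status of xargs itself, not of the children.
    """
    stdin = "".join(f"{code}\n" for code in exit_codes)
    proc = subprocess.run(
        # Each input line becomes $0 of the small shell script.
        ["xargs", "-n", "1", "-P", str(workers), "sh", "-c", 'exit "$0"'],
        input=stdin,
        text=True,
    )
    return proc.returncode
```

When every child succeeds, xargs returns 0. As soon as one child exits with a status in the range 1-125, xargs reports 123, which is why the test expects that value when most tc instances fail.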
Vlad Buslov [Fri, 22 Feb 2019 14:00:44 +0000 (16:00 +0200)]
selftests: concurrency: add test to verify concurrent replace
Implement test that verifies concurrent replacement of rules by executing
10 tc instances that replace flower filters in same handle range.
Extend tdc_multibatch.py script with new optional CLI argument that is used
to generate all batch files with same filter handle range.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Buslov [Fri, 22 Feb 2019 14:00:43 +0000 (16:00 +0200)]
selftests: concurrency: add test to verify parallel rules replace
Implement test that verifies parallel rules replacement by adding 1 million
flower filters and then replacing them with 10 concurrent tc instances.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Buslov [Fri, 22 Feb 2019 14:00:42 +0000 (16:00 +0200)]
selftests: concurrency: add test to verify parallel rules deletion
Implement test that verifies parallel rules deletion by adding 1 million
flower filters and then deleting them with 10 concurrent tc instances.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Buslov [Fri, 22 Feb 2019 14:00:41 +0000 (16:00 +0200)]
selftests: concurrency: add test to verify parallel rules insertion
Implement test that verifies parallel rules insertion by adding 1 million
flower filters with 10 concurrent tc instances. Put it to standalone
'concurrency' category.
Implement tdc_multibatch.py helper script that is used to generate multiple
batch files for concurrent tc execution. Extend config with new 'BATCH_DIR'
variable to specify temporary output directory that is used to store batch
files generated by tdc_multibatch.py.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Buslov [Fri, 22 Feb 2019 14:00:40 +0000 (16:00 +0200)]
selftests: tdc_batch.py: add options needed for concurrency tests
Extend tdc_batch.py with several optional CLI arguments that are used for
implementation of concurrency tests in following patches in this set:
- Add optional argument to specify range of filter handles used in batch
file [filter_handle, filter_handle+number). This is needed for testing
filter deletion where it is necessary to know exact handles of configured
filters.
- Add optional argument to specify filter operation type (possible values
are ['add', 'del', 'replace']) instead of hardcoded "add" value. This
allows generation of batches for filter addition, deletion and
replacement.
- Add optional argument to allow user to change mac address prefix that is
used for all filters in batch. This is necessary to allow generating
multiple batches with unique flower classifier keys.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
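The three options described in "selftests: tdc_batch.py: add options needed for concurrency tests" can be illustrated with a minimal generator. This is not the code of tdc_batch.py: the function name `batch_lines`, its parameters, and the exact layout of the emitted lines are assumptions, and the lines have not been validated against tc.

```python
def batch_lines(device, number, handle_start=1, operation="add", mac_prefix=0):
    """Yield one batch line per filter.

    Handles cover [handle_start, handle_start + number). mac_prefix fills
    the two high bytes of the MAC so that separate batches get distinct keys.
    """
    if operation not in ("add", "del", "replace"):
        raise ValueError(f"unsupported operation: {operation}")
    for i in range(number):
        handle = handle_start + i
        # Two prefix bytes followed by the 32-bit filter index.
        mac = "{:02x}:{:02x}:{:02x}:{:02x}:{:02x}:{:02x}".format(
            (mac_prefix >> 8) & 0xFF, mac_prefix & 0xFF,
            (i >> 24) & 0xFF, (i >> 16) & 0xFF, (i >> 8) & 0xFF, i & 0xFF)
        base = (f"filter {operation} dev {device} parent ffff: "
                f"protocol ip prio 1 handle {handle} flower")
        if operation == "del":
            # Deletion is addressed by handle, so no key or action is emitted.
            yield base
        else:
            yield f"{base} dst_mac {mac} action drop"
```

Generating several batches with different `mac_prefix` values and non-overlapping handle ranges gives each concurrent tc instance its own keys and handles. Reusing the same `handle_start` across batches instead produces the overlapping ranges that the concurrent replace and delete tests rely on.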
Maxim Mikityanskiy [Fri, 22 Feb 2019 12:55:22 +0000 (12:55 +0000)]
net: Skip GSO length estimation if transport header is not set
qdisc_pkt_len_init expects transport_header to be set for GSO packets.
Patch [1] skips transport_header validation for GSO packets that don't
have network_header set at the moment of calling virtio_net_hdr_to_skb,
and allows them to pass into the stack. After patch [2] no placeholder
value is assigned to transport_header if dissection fails, so this patch
adds a check to the place where the value of transport_header is used.
[1] https://patchwork.ozlabs.org/patch/1044429/
[2] https://patchwork.ozlabs.org/patch/1046122/
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Fri, 22 Feb 2019 11:31:46 +0000 (11:31 +0000)]
doc: add phylink documentation to the networking book
Add some phylink documentation to the networking book detailing how
to convert network drivers from phylib to phylink.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Fri, 22 Feb 2019 11:31:41 +0000 (11:31 +0000)]
net: phylink: update mac_config() documentation
A detail for mac_config() had been missed in the documentation for the
method - it is expected that the method will update the MAC to the
settings, rather than completely reprogram the MAC on each call.
Update the documentation for this method for this detail.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Li RongQing [Fri, 22 Feb 2019 09:08:22 +0000 (17:08 +0800)]
net: Use RCU_INIT_POINTER() to set sk_wq
This pointer is RCU protected, so proper primitives should be used.
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Fri, 22 Feb 2019 07:23:04 +0000 (08:23 +0100)]
net: phy: let genphy_c45_read_abilities also check aneg capability
When using genphy_c45_read_abilities() as get_features callback we
also have to set the autoneg capability in phydev->supported.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 24 Feb 2019 19:48:04 +0000 (11:48 -0800)]
Merge git://git./linux/kernel/git/davem/net
Three conflicts, one of which, for marvell10g.c is non-trivial and
requires some follow-up from Heiner or someone else.
The issue is that Heiner converted the marvell10g driver over to
use the generic c45 code as much as possible.
However, in 'net' a bug fix appeared which makes sure that a new
local mask (MDIO_AN_10GBT_CTRL_ADV_NBT_MASK) with value 0x01e0
is cleared.
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sun, 24 Feb 2019 17:47:07 +0000 (09:47 -0800)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"Bug fixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: MMU: record maximum physical address width in kvm_mmu_extended_role
kvm: x86: Return LA57 feature based on hardware capability
x86/kvm/mmu: fix switch between root and guest MMUs
s390: vsie: Use effective CRYCBD.31 to check CRYCBD validity
Linus Torvalds [Sun, 24 Feb 2019 17:28:26 +0000 (09:28 -0800)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
"Hopefully the last pull request for this release. Fingers crossed:
1) Only refcount ESP stats on full sockets, from Martin Willi.
2) Missing barriers in AF_UNIX, from Al Viro.
3) RCU protection fixes in ipv6 route code, from Paolo Abeni.
4) Avoid false positives in untrusted GSO validation, from Willem de
Bruijn.
5) Forwarded mesh packets in mac80211 need more tailroom allocated,
from Felix Fietkau.
6) Use operstate consistently for linkup in team driver, from George
Wilkie.
7) ThunderX bug fixes from Vadim Lomovtsev. Mostly races between VF
and PF code paths.
8) Purge ipv6 exceptions during netdevice removal, from Paolo Abeni.
9) nfp eBPF code gen fixes from Jiong Wang.
10) bnxt_en firmware timeout fix from Michael Chan.
11) Use after free in udp/udpv6 error handlers, from Paolo Abeni.
12) Fix a race in x25_bind triggerable by syzbot, from Eric Dumazet"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (65 commits)
net: phy: realtek: Dummy IRQ calls for RTL8366RB
tcp: repaired skbs must init their tso_segs
net/x25: fix a race in x25_bind()
net: dsa: Remove documentation for port_fdb_prepare
Revert "bridge: do not add port to router list when receives query with source 0.0.0.0"
selftests: fib_tests: sleep after changing carrier. again.
net: set static variable an initial value in atl2_probe()
net: phy: marvell10g: Fix Multi-G advertisement to only advertise 10G
bpf, doc: add bpf list as secondary entry to maintainers file
udp: fix possible user after free in error handler
udpv6: fix possible user after free in error handler
fou6: fix proto error handler argument type
udpv6: add the required annotation to mib type
mdio_bus: Fix use-after-free on device_register fails
net: Set rtm_table to RT_TABLE_COMPAT for ipv6 for tables > 255
bnxt_en: Wait longer for the firmware message response to complete.
bnxt_en: Fix typo in firmware message timeout logic.
nfp: bpf: fix ALU32 high bits clearance bug
nfp: bpf: fix code-gen bug on BPF_ALU | BPF_XOR | BPF_K
Documentation: networking: switchdev: Update port parent ID section
...
Roopa Prabhu [Sun, 24 Feb 2019 06:25:12 +0000 (22:25 -0800)]
trace: events: neigh_update: print new state in string format
Also, extend neigh_state_str to include neigh dummy states
noarp and permanent
Fixes: 9c03b282badb ("trace: events: add a few neigh tracepoints")
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Walleij [Sun, 24 Feb 2019 00:11:15 +0000 (01:11 +0100)]
net: phy: realtek: Dummy IRQ calls for RTL8366RB
This fixes a regression introduced by
commit 0d2e778e38e0ddffab4bb2b0e9ed2ad5165c4bf7
"net: phy: replace PHY_HAS_INTERRUPT with a check for
config_intr and ack_interrupt".
This assumes that a PHY cannot trigger interrupt unless
it has .config_intr() or .ack_interrupt() implemented.
A later patch makes the code assume both need to be
implemented for interrupts to be present.
But this PHY (which is inside a DSA) will happily
fire interrupts without either callback.
Implement dummy callbacks for .config_intr() and
.ack_interrupt() in the phy header to fix this.
Tested on the RTL8366RB on D-Link DIR-685.
Fixes: 0d2e778e38e0 ("net: phy: replace PHY_HAS_INTERRUPT with a check for config_intr and ack_interrupt")
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sat, 23 Feb 2019 23:51:51 +0000 (15:51 -0800)]
tcp: repaired skbs must init their tso_segs
syzbot reported a WARN_ON(!tcp_skb_pcount(skb))
in tcp_send_loss_probe() [1]
This was caused by TCP_REPAIR sent skbs that inadvertently
were missing a call to tcp_init_tso_segs()
[1]
WARNING: CPU: 1 PID: 0 at net/ipv4/tcp_output.c:2534 tcp_send_loss_probe+0x771/0x8a0 net/ipv4/tcp_output.c:2534
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.0.0-rc7+ #77
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
panic+0x2cb/0x65c kernel/panic.c:214
__warn.cold+0x20/0x45 kernel/panic.c:571
report_bug+0x263/0x2b0 lib/bug.c:186
fixup_bug arch/x86/kernel/traps.c:178 [inline]
fixup_bug arch/x86/kernel/traps.c:173 [inline]
do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:290
invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:973
RIP: 0010:tcp_send_loss_probe+0x771/0x8a0 net/ipv4/tcp_output.c:2534
Code: 88 fc ff ff 4c 89 ef e8 ed 75 c8 fb e9 c8 fc ff ff e8 43 76 c8 fb e9 63 fd ff ff e8 d9 75 c8 fb e9 94 f9 ff ff e8 bf 03 91 fb <0f> 0b e9 7d fa ff ff e8 b3 03 91 fb 0f b6 1d 37 43 7a 03 31 ff 89
RSP: 0018:ffff8880ae907c60 EFLAGS: 00010206
RAX: ffff8880a989c340 RBX: 0000000000000000 RCX: ffffffff85dedbdb
RDX: 0000000000000100 RSI: ffffffff85dee0b1 RDI: 0000000000000005
RBP: ffff8880ae907c90 R08: ffff8880a989c340 R09: ffffed10147d1ae1
R10: ffffed10147d1ae0 R11: ffff8880a3e8d703 R12: ffff888091b90040
R13: ffff8880a3e8d540 R14: 0000000000008000 R15: ffff888091b90860
tcp_write_timer_handler+0x5c0/0x8a0 net/ipv4/tcp_timer.c:583
tcp_write_timer+0x10e/0x1d0 net/ipv4/tcp_timer.c:607
call_timer_fn+0x190/0x720 kernel/time/timer.c:1325
expire_timers kernel/time/timer.c:1362 [inline]
__run_timers kernel/time/timer.c:1681 [inline]
__run_timers kernel/time/timer.c:1649 [inline]
run_timer_softirq+0x652/0x1700 kernel/time/timer.c:1694
__do_softirq+0x266/0x95a kernel/softirq.c:292
invoke_softirq kernel/softirq.c:373 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:413
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x14a/0x570 arch/x86/kernel/apic/apic.c:1062
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:807
</IRQ>
RIP: 0010:native_safe_halt+0x2/0x10 arch/x86/include/asm/irqflags.h:58
Code: ff ff ff 48 89 c7 48 89 45 d8 e8 59 0c a1 fa 48 8b 45 d8 e9 ce fe ff ff 48 89 df e8 48 0c a1 fa eb 82 90 90 90 90 90 90 fb f4 <c3> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90
RSP: 0018:ffff8880a98afd78 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff13
RAX: 1ffffffff1125061 RBX: ffff8880a989c340 RCX: 0000000000000000
RDX: dffffc0000000000 RSI: 0000000000000001 RDI: ffff8880a989cbbc
RBP: ffff8880a98afda8 R08: ffff8880a989c340 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
R13: ffffffff889282f8 R14: 0000000000000001 R15: 0000000000000000
arch_cpu_idle+0x10/0x20 arch/x86/kernel/process.c:555
default_idle_call+0x36/0x90 kernel/sched/idle.c:93
cpuidle_idle_call kernel/sched/idle.c:153 [inline]
do_idle+0x386/0x570 kernel/sched/idle.c:262
cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:353
start_secondary+0x404/0x5c0 arch/x86/kernel/smpboot.c:271
secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243
Kernel Offset: disabled
Rebooting in 86400 seconds..
Fixes: 79861919b889 ("tcp: fix TCP_REPAIR xmit queue setup")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>