Cong Wang [Tue, 24 Oct 2017 22:30:37 +0000 (15:30 -0700)]
vsock: always call vsock_init_tables()
Although CONFIG_VSOCKETS_DIAG depends on CONFIG_VSOCKETS,
vsock_init_tables() is not always called, it is called only
if other modules call its caller. Therefore if we only
enable CONFIG_VSOCKETS_DIAG, it would crash kernel on uninitialized
vsock_bind_table.
This patch fixes it by moving vsock_init_tables() to its own
module_init().
Fixes: 413a4317aca7 ("VSOCK: add sock_diag interface")
Reported-by: syzkaller bot
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Egil Hjelmeland [Tue, 24 Oct 2017 15:14:10 +0000 (17:14 +0200)]
net: dsa: lan9303: Do not disable switch fabric port 0 at .probe
Make the LAN9303 work when lan9303_probe() is called twice.
For some unknown reason the LAN9303 switch fail to forward data when switch
fabric port 0 TX is disabled during probe. (Write of LAN9303_MAC_TX_CFG_0
in lan9303_disable_processing_port().)
In that situation the switch fabric seem to receive frames, because the ALR
is learning addresses. But no frames are transmitted on any of the ports.
In our system lan9303_probe() is called twice, first time
dsa_register_switch() return -EPROBE_DEFER. As an experiment, modified the
code to skip writing LAN9303_MAC_TX_CFG_0, port 0 during the first probe.
Then the switch works as expected.
Resolve the problem by not calling lan9303_disable_processing_port() on
port 0 during probe. Ports 1 and 2 are still disabled.
Although unsatisfying that the exact failure mechanism is not known,
the patch should not cause any harm.
Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rahul Lakkireddy [Tue, 24 Oct 2017 13:58:26 +0000 (19:28 +0530)]
cxgb4: fix overflow in collecting IBQ and OBQ dump
Destination buffer already has offset added. So, don't add offset
again.
Fetch actual size of configured OBQ from hardware, instead of using
hardcoded value.
Fixes: 7c075ce221cf ("cxgb4: collect IBQ and OBQ dumps")
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 26 Oct 2017 08:25:36 +0000 (17:25 +0900)]
Merge branch 'hns3-fixes'
Lipeng says:
====================
net: hns3: fix some bugs for HNS3 driver
This patchset fixes some bugs reported by Hisilicon test team.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Lipeng [Tue, 24 Oct 2017 13:02:12 +0000 (21:02 +0800)]
net: hns3: fix the bug when reuse command description in hclge_add_mac_vlan_tbl
When reusing a command description read from HW, driver should set
IN_VLD bit, WR bit and NO_INTR bit. If IN_VLD bit and NO_INTR bit
are not set, the command fails and driver prints error message:
[ 135.261284] hns3 0000:7d:00.0: cmdq execute failed for get_mac_vlan_cmd_status,status=2.
[ 135.270983] hns3 0000:7d:00.0: add mac addr failed for cmd_send, ret =-5.
This patch fixes the bug.
Fixes: 46a3df9 (net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support)
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lipeng [Tue, 24 Oct 2017 13:02:11 +0000 (21:02 +0800)]
net: hns3: fix a bug in hclge_uninit_client_instance
HNS3 driver initialize hdev->roce_client and vport->roce.client in
hclge_init_client_instance, and need set hdev->roce_client and
vport->roce.client NULL.
If do not set them NULL when uninit, it will fail in the scene:
insmod hns3.ko, hns-roce.ko, hns-roce-hw-v3.ko successfully, but
rmmod hns3.ko after rmmod hns-roce-hw-v2.ko and hns-roce.ko.
This patch fixes the issue.
Fixes: 46a3df9 (net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support)
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lipeng [Tue, 24 Oct 2017 13:02:10 +0000 (21:02 +0800)]
net: hns3: add nic_client check when initialize roce base information
Roce driver works base on HNS3 driver.If insmod Roce driver before
NIC driver there is a error because do not check nic_client. This patch
adds nic_client check when initialize roce base information.
Fixes: 46a3df9 (net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support)
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lipeng [Tue, 24 Oct 2017 13:02:09 +0000 (21:02 +0800)]
net: hns3: fix the bug of hns3_set_txbd_baseinfo
The SC bits of TX BD mean switch control. For this area, value 0
indicates no switch control, the packet is routed according to the
forwarding table. Value 1 indicates that the packet is transmitted
to the network bypassing the forwarding table.
As HNS3 driver need support VF later, VF conmunicate with its own
PF need forwarding table. This patch sets SC bits of TX BD 0 and use
forwarding table.
Fixes: 76ad4f0 (net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC)
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 26 Oct 2017 08:05:04 +0000 (17:05 +0900)]
Merge branch 'dsa-dont-unmask-port-bitmaps'
Vivien Didelot says:
====================
net: dsa: don't unmask port bitmaps
DSA has several bitmaps to store the type of ports: cpu_port_mask,
dsa_port_mask and enabled_port_mask. But the code is inconsistently
unmasking them.
The legacy code tries to unmask cpu_port_mask and dsa_port_mask but
skips enabled_port_mask.
The new bindings unmasks cpu_port_mask and enabled_port_mask but skips
dsa_port_mask.
In fact there is no need to unmask them because we are in the error
path, and they won't be used after. Instead of fixing the unmasking,
simply remove them.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Mon, 23 Oct 2017 18:17:31 +0000 (14:17 -0400)]
net: dsa: don't unmask port bitmaps
The unapply functions are called on the error path.
As for dsa_port_mask, enabled_port_mask and cpu_port_mask won't be used
after so there's no need to unmask the corresponding port bit from them.
This makes dsa_cpu_port_unapply() and dsa_dsa_port_unapply() identical,
which can be factorized later.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Mon, 23 Oct 2017 18:17:30 +0000 (14:17 -0400)]
net: dsa: legacy: don't unmask port bitmaps
The legacy code does not unmask the cpu_port_mask and dsa_port_mask as
stated. But this is done on the error path and those masks won't be used
after that. So instead of fixing the bit operation, simply remove it.
Fixes: 83c0afaec7b7 ("net: dsa: Add new binding implementation")
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 26 Oct 2017 01:14:55 +0000 (10:14 +0900)]
Merge branch 'bcmgenet-start-stop-sequence-refinement'
Doug Berger says:
====================
net: bcmgenet: start/stop sequence refinement
This commit set is the result of an investigation into an issue that
occurred when bringing the interface up and down repeatedly with an
external 100BASE-T PHY. In some cases the MAC would experience mass
receive packet duplication that could in rare cases lead to a stall
from overflow. The fix for this is contained in the third commit.
The first 3 commits represent bug fixes that should be applied to the
net repository and are candidates for backporting to stable releases.
The remaining commits are enhancements which is why the set is being
submitted to net-next but they are implemented on top of the fixes.
The first fix is provided as justification for why the set isn't
split between a net submission and a net-next submission.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Doug Berger [Wed, 25 Oct 2017 22:04:19 +0000 (15:04 -0700)]
net: bcmgenet: use dev->phydev instead of priv->phydev
Now that the software reset of the PHY has been removed it is no
longer necessary to retain a private pointer to the phydev for
use when the PHY is detached (which isn't generally safe anyway).
The driver now uses the phydev member attached to the net_device.
For ethtool commands that have a PHY component, an explicit check
is made to prevent accessing an invalid phydev pointer when one
is not attached (e.g. interface is down).
Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Doug Berger [Wed, 25 Oct 2017 22:04:18 +0000 (15:04 -0700)]
Revert "net: bcmgenet: Software reset EPHY after power on"
With commit
f7d72996e222 ("net: bcmgenet: enable loopback during
UniMAC sw_reset") it is no longer necessary to force the software
reset of the internal EPHY before resetting the UniMAC to ensure a
clean reset.
Therefore this commit reverts commit
5dbebbb44a6a ("net: bcmgenet:
Software reset EPHY after power on").
Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Doug Berger [Wed, 25 Oct 2017 22:04:17 +0000 (15:04 -0700)]
net: bcmgenet: relax lock constraints to reduce IRQ latency
Since the ring locks are not used in a hard IRQ context it is often
not necessary to disable global IRQs while waiting on a lock.
Using less restrictive lock and unlock calls improves the real-time
responsiveness of the system.
Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Doug Berger [Wed, 25 Oct 2017 22:04:16 +0000 (15:04 -0700)]
net: bcmgenet: rework bcmgenet_netif_start and bcmgenet_netif_stop
This commit consolidates more common functionality from
bcmgenet_close and bcmgenet_suspend into bcmgenet_netif_stop and
modifies the start and stop sequences to better suit the design
of the GENET hardware.
Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Doug Berger [Wed, 25 Oct 2017 22:04:15 +0000 (15:04 -0700)]
net: bcmgenet: cleanup ring interrupt masking and unmasking
Since the NAPI interrupts are basically ignored when NAPI is
disabled we don't need to mask them within the functions
bcmgenet_disable_tx_napi() and bcmgenet_disable_rx_napi().
So wait until all NAPI instances are disabled and mask all of the
bcmgenet driver interrupts together in bcmgenet_netif_stop().
The interrupts can still be enabled in the functions
bcmgenet_enable_tx_napi() and bcmgenet_enable_rx_napi(), but use
the ring context int_enable() method to keep the functionality
consistent and the code cleaner.
Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Doug Berger [Wed, 25 Oct 2017 22:04:14 +0000 (15:04 -0700)]
net: bcmgenet: move NAPI initialization to ring initialization
Since each ring has its own NAPI instance it might as well be
initialized along with the other ring context.
Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Doug Berger [Wed, 25 Oct 2017 22:04:13 +0000 (15:04 -0700)]
net: bcmgenet: enable loopback during UniMAC sw_reset
It is necessary for the UniMAC to be clocked at least 5 cycles
while the sw_reset is asserted to ensure a clean reset.
It was discovered that this condition was not being met when
connected to an external RGMII PHY that disabled the Rx clock in
the Power Save state.
This commit modifies the reset_umac function to place the (RG)MII
interface into a local loopback mode where the Rx clock comes
from the GENET sourced Tx clk during the sw_reset to ensure the
presence and stability of the clock.
In addition, it turns out that the sw_reset of the UniMAC is not
self clearing, but this was masked by a bug in the timeout code.
The sw_reset is now explicitly cleared by zeroing the UMAC_CMD
register before returning from reset_umac which makes it no
longer necessary to do so in init_umac and makes the clearing of
CMD_TX_EN and CMD_RX_EN by umac_enable_set redundant. The
timeout code (and its associated bug) are removed so reset_umac
no longer needs to return a result, and that means init_umac
that calls reset_umac does not need to as well.
Fixes: 1c1008c793fa ("net: bcmgenet: add main driver file")
Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Doug Berger [Wed, 25 Oct 2017 22:04:12 +0000 (15:04 -0700)]
net: bcmgenet: prevent duplicate calls of bcmgenet_dma_teardown
When bcmgenet_dma_teardown is called from bcmgenet_fini_dma it ends
up getting called twice from the bcmgenet_close and bcmgenet_suspend
functions (once directly and once inside the bcmgenet_fini_dma call).
This commit removes the call from bcmgenet_fini_dma and ensures that
bcmgenet_dma_teardown is called before bcmgenet_fini_dma in all paths
of execution.
Fixes: 4a0c081eff43 ("net: bcmgenet: call bcmgenet_dma_teardown in bcmgenet_fini_dma")
Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Doug Berger [Wed, 25 Oct 2017 22:04:11 +0000 (15:04 -0700)]
net: bcmgenet: correct bad merge
As noted in the net-next submission for GENETv5 support [1], there
were merge conflicts with an earlier net submission [2] that had not
yet found its way to the net-next repository.
Unfortunately, when the branches were merged the conflicts were not
correctly resolved. This commit attempts to correct that.
[1] https://lkml.org/lkml/2017/3/13/1145
[2] https://lkml.org/lkml/2017/3/9/890
Fixes: 101c431492d2 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")
Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Girish Moodalbail [Wed, 25 Oct 2017 19:26:43 +0000 (12:26 -0700)]
macvlan: remove unused fields in struct macvlan_dev
commit
635b8c8ecdd2 ("tap: Renaming tap related APIs, data structures,
macros") captured all the tap related fields into a new struct tap_dev.
However, it failed to remove those fields from struct macvlan_dev.
Those fields are currently unused and must be removed. While there
I moved the comment for MAX_TAP_QUEUES to the right place.
Fixes: 635b8c8ecdd27142 (tap: Renaming tap related APIs, data structures, macros)
Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steven J. Hill [Wed, 25 Oct 2017 16:44:32 +0000 (11:44 -0500)]
ethernet: cavium: octeon: Switch to using netdev_info().
Signed-off-by: Steven J. Hill <Steven.Hill@cavium.com>
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Maloy [Wed, 25 Oct 2017 14:19:52 +0000 (16:19 +0200)]
tipc: eliminate KASAN warning
The following warning was reported by syzbot on Oct 24. 2017:
KASAN: slab-out-of-bounds Read in tipc_nametbl_lookup_dst_nodes
This is a harmless bug, but we still want to get rid of the warning,
so we swap the two conditions in question.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kees Cook [Tue, 24 Oct 2017 08:47:00 +0000 (01:47 -0700)]
drivers/net: wan/sbni: Convert timers to use timer_setup()
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
Cc: David Howells <dhowells@redhat.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kees Cook [Tue, 24 Oct 2017 08:46:52 +0000 (01:46 -0700)]
drivers/net: sis: Convert timers to use timer_setup()
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
Cc: Francois Romieu <romieu@fr.zoreil.com>
Cc: Daniele Venzano <venza@brownhat.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Daniele Venzano <venza@brownhat.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kees Cook [Tue, 24 Oct 2017 08:46:45 +0000 (01:46 -0700)]
net: atm/mpc: Stop using open-coded timer .data field
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using an explicit static variable to hold
additional expiration details.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Bhumika Goyal <bhumirks@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Reshetova, Elena" <elena.reshetova@intel.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kees Cook [Tue, 24 Oct 2017 08:46:26 +0000 (01:46 -0700)]
net: af_packet: Convert timers to use timer_setup()
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Mike Maloney <maloney@google.com>
Cc: Jarno Rajahalme <jarno@ovn.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kees Cook [Tue, 24 Oct 2017 08:46:16 +0000 (01:46 -0700)]
net: hsr: Convert timers to use timer_setup()
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
Cc: Arvid Brodin <arvid.brodin@alten.se>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kees Cook [Tue, 24 Oct 2017 08:46:09 +0000 (01:46 -0700)]
net: dccp: Convert timers to use timer_setup()
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly. Adds a pointer back to the sock.
Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: dccp@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kees Cook [Tue, 24 Oct 2017 08:45:59 +0000 (01:45 -0700)]
net: ethernet/sfc: Convert timers to use timer_setup()
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
Cc: Solarflare linux maintainers <linux-net-drivers@solarflare.com>
Cc: Edward Cree <ecree@solarflare.com>
Cc: Bert Kenward <bkenward@solarflare.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kees Cook [Tue, 24 Oct 2017 08:45:48 +0000 (01:45 -0700)]
net: LLC: Convert timers to use timer_setup()
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Hans Liljestrand <ishkamiel@gmail.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: "Reshetova, Elena" <elena.reshetova@intel.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kees Cook [Tue, 24 Oct 2017 08:45:39 +0000 (01:45 -0700)]
net: ax25: Convert timers to use timer_setup()
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
Cc: Joerg Reuter <jreuter@yaina.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-hams@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kees Cook [Tue, 24 Oct 2017 08:45:31 +0000 (01:45 -0700)]
net: sctp: Convert timers to use timer_setup()
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-sctp@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 25 Oct 2017 01:54:40 +0000 (10:54 +0900)]
Merge branch 'net-remove-rtmsg_ifinfo-used-in-bridge-and-bonding'
Xin Long says:
====================
net: remove rtmsg_ifinfo used in bridge and bonding
It's better to send notifications to userspace by the events in
rtnetlink_event, instead of calling rtmsg_ifinfo directly.
This patcheset is to remove rtmsg_ifinfo called in bonding and
bridge, the notifications can be handled by NETDEV_CHANGEUPPER
and NETDEV_CHANGELOWERSTATE events in rtnetlink_event.
It could also fix some redundant notifications from bonding and
bridge.
v1->v2:
- post to net-next.git instead of net.git, for it's more like an
improvement for bonding
v2->v3:
- add patch 1/4 to remove rtmsg_ifinfo called in add_del_if
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Tue, 24 Oct 2017 05:54:20 +0000 (13:54 +0800)]
bonding: remove rtmsg_ifinfo called after bond_lower_state_changed
After the patch 'rtnetlink: bring NETDEV_CHANGELOWERSTATE event
process back to rtnetlink_event', bond_lower_state_changed would
generate NETDEV_CHANGEUPPER event which would send a notification
to userspace in rtnetlink_event.
There's no need to call rtmsg_ifinfo to send the notification
any more. So this patch is to remove it from these places after
bond_lower_state_changed.
Besides, after this, rtmsg_ifinfo is not needed to be exported.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Tue, 24 Oct 2017 05:54:19 +0000 (13:54 +0800)]
rtnetlink: bring NETDEV_CHANGELOWERSTATE event process back to rtnetlink_event
This patch is to bring NETDEV_CHANGELOWERSTATE event process back
to rtnetlink_event so that bonding could use it instead of calling
rtmsg_ifinfo to send a notification to userspace after netdev lower
state is changed in the later patch.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Tue, 24 Oct 2017 05:54:18 +0000 (13:54 +0800)]
bonding: remove rtmsg_ifinfo called in bond_master_upper_dev_link
Since commit
42e52bf9e3ae ("net: add netnotifier event for upper device
change"), netdev_master_upper_dev_link has generated NETDEV_CHANGEUPPER
event which would send a notification to userspace in rtnetlink_event.
There's no need to call rtmsg_ifinfo to send the notification any more.
So this patch is to remove it from bond_master_upper_dev_link as well
as bond_upper_dev_unlink to avoid the redundant notifications.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Tue, 24 Oct 2017 05:54:17 +0000 (13:54 +0800)]
bridge: remove rtmsg_ifinfo called in add_del_if
Since commit
dc709f375743 ("rtnetlink: bring NETDEV_CHANGEUPPER event
process back in rtnetlink_event"), rtnetlink_event would process the
NETDEV_CHANGEUPPER event and send a notification to userspace.
In add_del_if, it would generate NETDEV_CHANGEUPPER event by whether
netdev_master_upper_dev_link or netdev_upper_dev_unlink. There's
no need to call rtmsg_ifinfo to send the notification any more.
So this patch is to remove it from add_del_if also to avoid redundant
notifications.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 25 Oct 2017 01:47:47 +0000 (10:47 +0900)]
Merge branch 'bpf-permit-multiple-bpf-attachments-for-a-single-perf-tracepoint-event'
Yonghong Song says:
====================
bpf: permit multiple bpf attachments for a single perf tracepoint event
This patch set adds support to permit multiple bpf prog attachments
for a single perf tracepoint event. Patch 1 does some cleanup such
that perf_event_{set|free}_bpf_handler is called under the
same condition. Patch 2 has the core implementation, and
Patch 3 adds a test case.
Changelogs:
v2 -> v3:
. fix compilation error.
v1 -> v2:
. fix a potential deadlock issue discovered by Daniel.
. fix some coding style issues.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Yonghong Song [Tue, 24 Oct 2017 06:53:09 +0000 (23:53 -0700)]
bpf: add a test case to test single tp multiple bpf attachment
The bpf sample program syscall_tp is modified to
show attachment of more than bpf programs
for a particular kernel tracepoint.
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yonghong Song [Tue, 24 Oct 2017 06:53:08 +0000 (23:53 -0700)]
bpf: permit multiple bpf attachments for a single perf event
This patch enables multiple bpf attachments for a
kprobe/uprobe/tracepoint single trace event.
Each trace_event keeps a list of attached perf events.
When an event happens, all attached bpf programs will
be executed based on the order of attachment.
A global bpf_event_mutex lock is introduced to protect
prog_array attaching and detaching. An alternative will
be introduce a mutex lock in every trace_event_call
structure, but it takes a lot of extra memory.
So a global bpf_event_mutex lock is a good compromise.
The bpf prog detachment involves allocation of memory.
If the allocation fails, a dummy do-nothing program
will replace to-be-detached program in-place.
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yonghong Song [Tue, 24 Oct 2017 06:53:07 +0000 (23:53 -0700)]
bpf: use the same condition in perf event set/free bpf handler
This is a cleanup such that doing the same check in
perf_event_free_bpf_prog as we already do in
perf_event_set_bpf_prog step.
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shmulik Ladkani [Fri, 20 Oct 2017 21:25:15 +0000 (00:25 +0300)]
ip6_tunnel: Allow rcv/xmit even if remote address is a local address
Currently, ip6_tnl_xmit_ctl drops tunneled packets if the remote
address (outer v6 destination) is one of host's locally configured
addresses.
Same applies to ip6_tnl_rcv_ctl: it drops packets if the remote address
(outer v6 source) is a local address.
This prevents using ipxip6 (and ip6_gre) tunnels whose local/remote
endpoints are on same host; OTOH v4 tunnels (ipip or gre) allow such
configurations.
An example where this proves useful is a system where entities are
identified by their unique v6 addresses, and use tunnels to encapsulate
traffic between them. The limitation prevents placing several entities
on same host.
Introduce IP6_TNL_F_ALLOW_LOCAL_REMOTE which allows to bypass this
restriction.
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 24 Oct 2017 10:07:13 +0000 (19:07 +0900)]
Merge branch 'mlxsw-Various-fixes'
Jiri Pirko says:
====================
mlxsw: Various fixes
Yotam says:
Cleanup some issues reported by sparse.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Yotam Gigi [Tue, 24 Oct 2017 09:17:17 +0000 (11:17 +0200)]
mlxsw: spectrum: mr_tcam: Include the mr_tcam header file
Make the spectrum_mr_tcam.c include the spectrum_mr_tcam.h header file.
Cleans up sparse warning:
symbol 'mlxsw_sp_mr_tcam_ops' was not declared. Should it be static?
Fixes: 0e14c7777acb6 ("mlxsw: spectrum: Add the multicast routing hardware logic")
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yotam Gigi [Tue, 24 Oct 2017 09:17:16 +0000 (11:17 +0200)]
mlxsw: spectrum: mr: Make the function mlxsw_sp_mr_dev_vif_lookup static
The function is only used internally in spectrum_mr.c and is not declared
in the header file, thus make it static.
Cleans up sparse warning:
symbol 'mlxsw_sp_mr_dev_vif_lookup' was not declared. Should it be static?
Fixes: c011ec1bbfd6 ("mlxsw: spectrum: Add the multicast routing offloading logic")
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yotam Gigi [Tue, 24 Oct 2017 09:17:15 +0000 (11:17 +0200)]
mlxsw: spectrum: mr: Fix various endianness issues
Fix various endianness issues in comparisons and assignments. The fix is
entirely cosmetic as all the values fixed are endianness-agnostic.
Cleans up sparse warnings:
spectrum_mr.c:156:49: warning: restricted __be32 degrades to integer
spectrum_mr.c:206:26: warning: restricted __be32 degrades to integer
spectrum_mr.c:212:31: warning: incorrect type in assignment (different
base types)
spectrum_mr.c:212:31: expected restricted __be32 [usertype] addr4
spectrum_mr.c:212:31: got unsigned int
spectrum_mr.c:214:32: warning: incorrect type in assignment (different
base types)
spectrum_mr.c:214:32: expected restricted __be32 [usertype] addr4
spectrum_mr.c:214:32: got unsigned int
spectrum_mr.c:461:16: warning: restricted __be32 degrades to integer
spectrum_mr.c:461:49: warning: restricted __be32 degrades to integer
Fixes: c011ec1bbfd6 ("mlxsw: spectrum: Add the multicast routing offloading logic")
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arkadi Sharshevsky [Tue, 24 Oct 2017 08:11:42 +0000 (10:11 +0200)]
mlxsw: spectrum_dpipe: Fix entries dump of the adjacency table
During the dump the per netlink packet entry counter should be zeroed out
when new packet is created.
Fixes: 190d38a52a73 ("mlxsw: spectrum_dpipe: Add support for adjacency table dump")
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reported-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Or Gerlitz [Tue, 24 Oct 2017 05:58:02 +0000 (08:58 +0300)]
net/sched: Fix actions list corruption when adding offloaded tc flows
Prior to commit
b3f55bdda8df, the networking core doesn't wire an in-place
actions list the when the low level driver is called to offload the flow,
but all low level drivers do that (call tcf_exts_to_list()) in their
offloading "add" logic.
Now, the in-place list is set in the core which goes over the list in a loop,
but also by the hw driver when their offloading code is invoked indirectly:
cls_xxx add flow -> tc_setup_cb_call -> tc_exts_setup_cb_egdev_call -> hw driver
which messes up the core list instance upon driver return. Fix that by avoiding
in-place list on the net core code that deals with adding flows.
Fixes: b3f55bdda8df ('net: sched: introduce per-egress action device callbacks')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Veerasenareddy Burru [Tue, 24 Oct 2017 03:33:25 +0000 (20:33 -0700)]
liquidio: pass date and time info to NIC firmware
Pass date and time information to NIC at the time of loading
firmware and periodically update the host time to NIC firmware.
This is to make NIC firmware use the same time reference as Host,
so that it is easy to correlate logs from firmware and host for debugging.
Signed-off-by: Veerasenareddy Burru <veerasenareddy.burru@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Wang [Mon, 23 Oct 2017 21:59:35 +0000 (14:59 -0700)]
ipv6: add ip6_null_entry check in rt6_select()
In rt6_select(), fn->leaf could be pointing to net->ipv6.ip6_null_entry.
In this case, we should directly return instead of trying to carry on
with the rest of the process.
If not, we could crash at:
spin_lock_bh(&leaf->rt6i_table->rt6_lock);
because net->ipv6.ip6_null_entry does not have rt6i_table set.
Syzkaller recently reported following issue on net-next:
Use struct sctp_sack_info instead
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in:
sctp: [Deprecated]: syz-executor4 (pid 26496) Use of struct sctp_assoc_value in delayed_ack socket option.
Use struct sctp_sack_info instead
CPU: 1 PID: 26523 Comm: syz-executor6 Not tainted 4.14.0-rc4+ #85
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
task:
ffff8801d147e3c0 task.stack:
ffff8801a4328000
RIP: 0010:debug_spin_lock_before kernel/locking/spinlock_debug.c:83 [inline]
RIP: 0010:do_raw_spin_lock+0x23/0x1e0 kernel/locking/spinlock_debug.c:112
RSP: 0018:
ffff8801a432ed70 EFLAGS:
00010207
RAX:
dffffc0000000000 RBX:
0000000000000018 RCX:
0000000000000000
RDX:
0000000000000003 RSI:
0000000000000000 RDI:
000000000000001c
RBP:
ffff8801a432ed90 R08:
0000000000000001 R09:
0000000000000000
R10:
0000000000000000 R11:
ffffffff8482b279 R12:
ffff8801ce2ff3a0
sctp: [Deprecated]: syz-executor1 (pid 26546) Use of int in maxseg socket option.
Use struct sctp_assoc_value instead
R13:
dffffc0000000000 R14:
ffff8801d971e000 R15:
ffff8801ce2ff0d8
FS:
00007f56e82f5700(0000) GS:
ffff8801db300000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
0000001ddbc22000 CR3:
00000001a4a04000 CR4:
00000000001406e0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
Call Trace:
__raw_spin_lock_bh include/linux/spinlock_api_smp.h:136 [inline]
_raw_spin_lock_bh+0x39/0x40 kernel/locking/spinlock.c:175
spin_lock_bh include/linux/spinlock.h:321 [inline]
rt6_select net/ipv6/route.c:786 [inline]
ip6_pol_route+0x1be3/0x3bd0 net/ipv6/route.c:1650
sctp: [Deprecated]: syz-executor1 (pid 26576) Use of int in maxseg socket option.
Use struct sctp_assoc_value instead
TCP: request_sock_TCPv6: Possible SYN flooding on port 20002. Sending cookies. Check SNMP counters.
ip6_pol_route_output+0x4c/0x60 net/ipv6/route.c:1843
fib6_rule_lookup+0x9e/0x2a0 net/ipv6/ip6_fib.c:309
ip6_route_output_flags+0x1f1/0x2b0 net/ipv6/route.c:1871
ip6_route_output include/net/ip6_route.h:80 [inline]
ip6_dst_lookup_tail+0x4ea/0x970 net/ipv6/ip6_output.c:953
ip6_dst_lookup_flow+0xc8/0x270 net/ipv6/ip6_output.c:1076
sctp_v6_get_dst+0x675/0x1c30 net/sctp/ipv6.c:274
sctp_transport_route+0xa8/0x430 net/sctp/transport.c:287
sctp_assoc_add_peer+0x4fe/0x1100 net/sctp/associola.c:656
__sctp_connect+0x251/0xc80 net/sctp/socket.c:1187
sctp_connect+0xb4/0xf0 net/sctp/socket.c:4209
inet_dgram_connect+0x16b/0x1f0 net/ipv4/af_inet.c:541
SYSC_connect+0x20a/0x480 net/socket.c:1642
SyS_connect+0x24/0x30 net/socket.c:1623
entry_SYSCALL_64_fastpath+0x1f/0xbe
Fixes: 66f5d6ce53e6 ("ipv6: replace rwlock with rcu and spinlock in fib6_table")
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christoph Paasch [Mon, 23 Oct 2017 20:22:23 +0000 (13:22 -0700)]
tcp: Configure TFO without cookie per socket and/or per route
We already allow to enable TFO without a cookie by using the
fastopen-sysctl and setting it to TFO_SERVER_COOKIE_NOT_REQD (or
TFO_CLIENT_NO_COOKIE).
This is safe to do in certain environments where we know that there
isn't a malicous host (aka., data-centers) or when the
application-protocol already provides an authentication mechanism in the
first flight of data.
A server however might be providing multiple services or talking to both
sides (public Internet and data-center). So, this server would want to
enable cookie-less TFO for certain services and/or for connections that
go to the data-center.
This patch exposes a socket-option and a per-route attribute to enable such
fine-grained configurations.
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Reviewed-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tim Hansen [Mon, 23 Oct 2017 19:35:58 +0000 (15:35 -0400)]
net/sock: Update sk rcu iterator macro.
Mark hlist node in sk rcu iterator as protected by the rcu.
hlist_next_rcu accomplishes this and silences the warnings
sparse throws.
Found with make C=1 net/ipv4/udp.o on linux-next tag
next-
20171009.
Signed-off-by: Tim Hansen <devtimhansen@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gustavo A. R. Silva [Mon, 23 Oct 2017 18:10:56 +0000 (13:10 -0500)]
ipv4: tcp_minisocks: use BUG_ON instead of if condition followed by BUG
Use BUG_ON instead of if condition followed by BUG in tcp_time_wait.
This issue was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gustavo A. R. Silva [Mon, 23 Oct 2017 18:08:14 +0000 (13:08 -0500)]
ipv4: icmp: use BUG_ON instead of if condition followed by BUG
Use BUG_ON instead of if condition followed by BUG in icmp_timestamp.
This issue was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesper Dangaard Brouer [Mon, 23 Oct 2017 17:39:28 +0000 (19:39 +0200)]
bpf: cpumap fix potential lost wake-up problem
As pointed out by Michael, commit
1c601d829ab0 ("bpf: cpumap xdp_buff
to skb conversion and allocation") contains a classical example of the
potential lost wake-up problem.
We need to recheck the condition __ptr_ring_empty() after changing
current->state to TASK_INTERRUPTIBLE, this avoids a race between
wake_up_process() and schedule(). After this, a race with
wake_up_process() will simply change the state to TASK_RUNNING, and
the schedule() call not really put us to sleep.
Fixes: 1c601d829ab0 ("bpf: cpumap xdp_buff to skb conversion and allocation")
Reported-by: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gustavo A. R. Silva [Sun, 22 Oct 2017 01:35:30 +0000 (20:35 -0500)]
net: smc_close: mark expected switch fall-through
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
Notice that in this particular case I placed the "fall through" comment
on its own line, which is what GCC is expecting to find.
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gustavo A. R. Silva [Thu, 19 Oct 2017 21:54:48 +0000 (16:54 -0500)]
net: rxrpc: mark expected switch fall-throughs
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 24 Oct 2017 08:54:20 +0000 (17:54 +0900)]
Merge branch 'ipv6-addrconf-hash-improvements-and-cleanups'
Eric Dumazet says:
====================
ipv6: addrconf: hash improvements and cleanups
Remove unecessary BH blocking, and bring IPv6 addrconf to modern world,
with per netns hash perturbation and decent hash size.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 23 Oct 2017 23:17:51 +0000 (16:17 -0700)]
ipv6: addrconf: do not block BH in ipv6_chk_home_addr()
rcu_read_lock() is enough here, no need to block BH.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 23 Oct 2017 23:17:50 +0000 (16:17 -0700)]
ipv6: addrconf: do not block BH in /proc/net/if_inet6 handling
Table is really RCU protected, no need to block BH
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 23 Oct 2017 23:17:49 +0000 (16:17 -0700)]
ipv6: addrconf: do not block BH in ipv6_get_ifaddr()
rcu_read_lock() is enough here, no need to block BH.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 23 Oct 2017 23:17:48 +0000 (16:17 -0700)]
ipv6: addrconf: do not block BH in ipv6_chk_addr_and_flags()
rcu_read_lock() is enough here, as inet6_ifa_finish_destroy()
uses kfree_rcu()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 23 Oct 2017 23:17:47 +0000 (16:17 -0700)]
ipv6: addrconf: add per netns perturbation in inet6_addr_hash()
Bring IPv6 in par with IPv4 :
- Use net_hash_mix() to spread addresses a bit more.
- Use 256 slots hash table instead of 16
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 23 Oct 2017 23:17:46 +0000 (16:17 -0700)]
ipv6: addrconf: factorize inet6_addr_hash() call
ipv6_add_addr_hash() can compute the hash value outside of
locked section and pass it to ipv6_chk_same_addr().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 23 Oct 2017 23:17:45 +0000 (16:17 -0700)]
ipv6: addrconf: move ipv6_chk_same_addr() to avoid forward declaration
ipv6_chk_same_addr() is only used by ipv6_add_addr_hash(),
so moving it avoids a forward declaration.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 24 Oct 2017 08:38:38 +0000 (17:38 +0900)]
Merge branch 'nfp-bpf-stack-support-in-offload'
Jakub Kicinski says:
====================
nfp: bpf: stack support in offload
This series brings stack support for offload.
We use the LMEM (Local memory) register file as memory to store
the stack. Since this is a register file we need to do appropriate
shifts on unaligned accesses. Verifier's state tracking helps us
with that.
LMEM can't be accessed directly, so we add support for setting
pointer registers through which one can read/write LMEM.
This set does not support accessing the stack when the alignment
is not known. This can be added later (most likely using the byte_align
instructions). There is also a number of optimizations which have been
left out:
- in more complex non aligned accesses, double shift and rotation
can save us a cycle. This, however, leads to code explosion
since all access sizes have to be coded separately;
- since setting LM pointers costs around 5 cycles, we should be
tracking their values to make sure we don't move them when
they're already set correctly for earlier access;
- in case of 8 byte access aligned to 4 bytes and crossing
32 byte boundary but not crossing a 64 byte boundary we don't
have to increment the pointer, but this seems like a pretty
rare case to justify the added complexity.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 23 Oct 2017 18:58:14 +0000 (11:58 -0700)]
nfp: bpf: optimize mov64 a little
Loading 64bit constants require up to 4 load immediates, since
we can only load 16 bits at a time. If the 32bit halves of
the 64bit constant are the same, however, we can save a cycle
by doing a register move instead of two loads of 16 bits.
Note that we don't optimize the normal ALU64 load because even
though it's a 64 bit load the upper half of the register is
a coming from sign extension so we can load it in one cycle
anyway.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 23 Oct 2017 18:58:13 +0000 (11:58 -0700)]
nfp: bpf: support stack accesses via non-constant pointers
If stack pointer has a different value on different paths
but the alignment to words (4B) remains the same, we can
set a new LMEM access pointer to the calculated value and
access whichever word it's pointing to.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 23 Oct 2017 18:58:12 +0000 (11:58 -0700)]
nfp: bpf: support accessing the stack beyond 64 bytes
To access beyond 64th byte of the stack we need to set a new
stack pointer register (LMEM is accessed indirectly through
those pointers). Add a function for encoding local CSR access
instruction. Use stack pointer number 3.
Note that stack pointer registers allow us to index into 32
bytes of LMEM (with shift operations i.e. when operands are
restricted). This means if access is crossing 32 byte boundary
we must not use offsetting, we have to set the pointer to the
exact address and move it with post-increments.
We depend on the datapath placing the stack base address in
GPR A22 for our use.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 23 Oct 2017 18:58:11 +0000 (11:58 -0700)]
nfp: bpf: allow stack accesses via modified stack registers
As long as the verifier tells us the stack offset exactly we
can render the LMEM reads quite easily. Simply make sure that
the offset is constant for a given instruction and add it to
the instruction's offset.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 23 Oct 2017 18:58:10 +0000 (11:58 -0700)]
nfp: bpf: optimize the RMW for stack accesses
When we are performing unaligned stack accesses in the 32-64B window
we have to do a read-modify-write cycle. E.g. for reading 8 bytes
from address 17:
0: tmp = stack[16]
1: gprLo = tmp >> 8
2: tmp = stack[20]
3: gprLo |= tmp << 24
4: tmp = stack[20]
5: gprHi = tmp >> 8
6: tmp = stack[24]
7: gprHi |= tmp << 24
The load on line 4 is unnecessary, because tmp already contains data
from stack[20].
For write we can optimize both loads and writebacks away.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 23 Oct 2017 18:58:09 +0000 (11:58 -0700)]
nfp: bpf: add stack read support
Add simple stack read support, similar to write in every aspect,
but data flowing the other way. Note that unlike write which can
be done in smaller than word quantities, if registers are loaded
with less-than-word of stack contents - the values have to be
zero extended.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 23 Oct 2017 18:58:08 +0000 (11:58 -0700)]
nfp: bpf: add stack write support
Stack is implemented by the LMEM register file. Unaligned accesses
to LMEM are not allowed. Accesses also have to be 4B wide.
To support stack we need to make sure offsets of pointers are known
at translation time (for now) and perform correct load/mask/shift
operations.
Since we can access first 64B of LMEM without much effort support
only stacks not bigger than 64B. Following commits will extend
the possible sizes beyond that.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 23 Oct 2017 18:58:07 +0000 (11:58 -0700)]
nfp: bpf: refactor nfp_bpf_check_ptr()
nfp_bpf_check_ptr() mostly looks at the pointer register.
Add a temporary variable to shorten the code.
While at it make sure we print error messages if translation
fails to help users identify the problem (to be carried in
ext_ack in due course).
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 23 Oct 2017 18:58:06 +0000 (11:58 -0700)]
nfp: bpf: add helper for emitting nops
The need to emitting a few nops will become more common soon
as we add stack and map support. Add a helper. This allows
for code to be shorter but also may be handy for marking the
nops with a "reason" to ease applying optimizations.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 24 Oct 2017 00:25:10 +0000 (01:25 +0100)]
Merge branch 'bpftool-JSON'
Jakub Kicinski says:
====================
tools: bpftool: Add JSON output to bpftool
Quentin says:
This series introduces support for JSON output to all bpftool commands. It
adds option parsing, and several options are created:
* -j, --json Switch to JSON output.
* -p, --pretty Switch to JSON and print it in a human-friendly fashion.
* -h, --help Print generic help message.
* -V, --version Print version number.
This code uses a "json_writer", which is a copy of the one written by
Stephen Hemminger in iproute2.
---
I don't know if there is an easy way to share the code for json_write
without copying the file, so I am very open to suggestions on this matter.
====================
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Quentin Monnet [Mon, 23 Oct 2017 16:24:16 +0000 (09:24 -0700)]
tools: bpftool: update documentation for --json and --pretty usage
Update the documentation to provide help about JSON output generation,
and add an example in bpftool-prog manual page.
Also reintroduce an example that was left aside when the tool was moved
from GitHub to the kernel sources, in order to show how to mount the
bpffs file system (to pin programs) inside the bpftool-prog manual page.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Quentin Monnet [Mon, 23 Oct 2017 16:24:15 +0000 (09:24 -0700)]
tools: bpftool: add cosmetic changes for the manual pages
Make the look-and-feel of the manual pages somewhat closer to other
manual pages, such as the ones from the utilities from iproute2, by
highlighting more keywords.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Quentin Monnet [Mon, 23 Oct 2017 16:24:14 +0000 (09:24 -0700)]
tools: bpftool: provide JSON output for all possible commands
As all commands can now return JSON output (possibly just a "null"
value), output of `bpftool --json batch file FILE` should also be fully
JSON compliant.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Quentin Monnet [Mon, 23 Oct 2017 16:24:13 +0000 (09:24 -0700)]
tools: bpftool: turn err() and info() macros into functions
Turn err() and info() macros into functions.
In order to avoid naming conflicts with variables in the code, rename
them as p_err() and p_info() respectively.
The behavior of these functions is similar to the one of the macros for
plain output. However, when JSON output is requested, these macros
return a JSON-formatted "error" object instead of printing a message to
stderr.
To handle error messages correctly with JSON, a modification was brought
to their behavior nonetheless: the functions now append a end-of-line
character at the end of the message. This way, we can remove end-of-line
characters at the end of the argument strings, and not have them in the
JSON output.
All error messages are formatted to hold in a single call to p_err(), in
order to produce a single JSON field.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Quentin Monnet [Mon, 23 Oct 2017 16:24:12 +0000 (09:24 -0700)]
tools: bpftool: add JSON output for `bpftool batch file FILE` command
`bpftool batch file FILE` takes FILE as an argument and executes all the
bpftool commands it finds inside (or stops if an error occurs).
To obtain a consistent JSON output, create a root JSON array, then for
each command create a new object containing two fields: one with the
command arguments, the other with the output (which is the JSON object
that the command would have produced, if called on its own).
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Quentin Monnet [Mon, 23 Oct 2017 16:24:11 +0000 (09:24 -0700)]
tools: bpftool: add JSON output for `bpftool map *` commands
Reuse the json_writer API introduced in an earlier commit to make
bpftool able to generate JSON output on
`bpftool map { show | dump | lookup | getnext }` commands. Remaining
commands produce no output.
Some functions have been spit into plain-output and JSON versions in
order to remain readable.
Outputs for sample maps have been successfully tested against a JSON
validator.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Quentin Monnet [Mon, 23 Oct 2017 16:24:10 +0000 (09:24 -0700)]
tools: bpftool: add JSON output for `bpftool prog dump xlated *` command
Add a new printing function to dump translated eBPF instructions as
JSON. As for plain output, opcodes are printed only on request (when
`opcodes` is provided on the command line).
The disassembled output is generated by the same code that is used by
the kernel verifier.
Example output:
$ bpftool --json --pretty prog dump xlated id 1
[{
"disasm": "(bf) r6 = r1"
},{
"disasm": "(61) r7 = *(u32 *)(r6 +16)"
},{
"disasm": "(95) exit"
}
]
$ bpftool --json --pretty prog dump xlated id 1 opcodes
[{
"disasm": "(bf) r6 = r1",
"opcodes": {
"code": "0xbf",
"src_reg": "0x1",
"dst_reg": "0x6",
"off": ["0x00","0x00"
],
"imm": ["0x00","0x00","0x00","0x00"
]
}
},{
"disasm": "(61) r7 = *(u32 *)(r6 +16)",
"opcodes": {
"code": "0x61",
"src_reg": "0x6",
"dst_reg": "0x7",
"off": ["0x10","0x00"
],
"imm": ["0x00","0x00","0x00","0x00"
]
}
},{
"disasm": "(95) exit",
"opcodes": {
"code": "0x95",
"src_reg": "0x0",
"dst_reg": "0x0",
"off": ["0x00","0x00"
],
"imm": ["0x00","0x00","0x00","0x00"
]
}
}
]
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Quentin Monnet [Mon, 23 Oct 2017 16:24:09 +0000 (09:24 -0700)]
tools: bpftool: add JSON output for `bpftool prog dump jited *` command
Reuse the json_writer API introduced in an earlier commit to make
bpftool able to generate JSON output on `bpftool prog show *` commands.
A new printing function is created to be passed as an argument to the
disassembler.
Similarly to plain output, opcodes are printed on request.
Outputs from sample programs have been successfully tested against a
JSON validator.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Quentin Monnet [Mon, 23 Oct 2017 16:24:08 +0000 (09:24 -0700)]
tools: bpftool: add JSON output for `bpftool prog show *` command
Reuse the json_writer API introduced in an earlier commit to make
bpftool able to generate JSON output on `bpftool prog show *` commands.
For readability, the code from show_prog() has been split into two
functions, one for plain output, one for JSON.
Outputs from sample programs have been successfully tested against a
JSON validator.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Quentin Monnet [Mon, 23 Oct 2017 16:24:07 +0000 (09:24 -0700)]
tools: bpftool: introduce --json and --pretty options
These two options can be used to ask for a JSON output (--j or -json),
and to make this JSON human-readable (-p or --pretty).
A json_writer object is created when JSON is required, and will be used
in follow-up commits to produce JSON output.
Note that --pretty implies --json.
Update for the manual pages and interactive help messages comes in a
later patch of the series.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Quentin Monnet [Mon, 23 Oct 2017 16:24:06 +0000 (09:24 -0700)]
tools: bpftool: add option parsing to bpftool, --help and --version
Add an option parsing facility to bpftool, in prevision of future
options for demanding JSON output. Currently, two options are added:
--help and --version, that act the same as the respective commands
`help` and `version`.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Quentin Monnet [Mon, 23 Oct 2017 16:24:05 +0000 (09:24 -0700)]
tools: bpftool: copy JSON writer from iproute2 repository
In prevision of following commits, supposed to add JSON output to the
tool, two files are copied from the iproute2 repository (taken at commit
268a9eee985f): lib/json_writer.c and include/json_writer.h.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 24 Oct 2017 00:21:26 +0000 (01:21 +0100)]
Merge branch 'tcp-tracepoints'
Song Liu says:
====================
net: add a set of tracepoints to tcp stack
Changes from v1:
Fix build error (with ipv6 as ko) by adding EXPORT_TRACEPOINT_SYMBOL_GPL
for trace_tcp_send_reset.
These patches add the following tracepoints to tcp stack.
tcp_send_reset
tcp_receive_reset
tcp_destroy_sock
tcp_set_state
These tracepoints can be used to track TCP state changes. Such state
changes include but are not limited to: connection establish,
connection termination, tx and rx of RST, various retransmits.
Currently, we use the following kprobes to trace these events:
int kprobe__tcp_validate_incoming
int kprobe__tcp_send_active_reset
int kprobe__tcp_v4_send_reset
int kprobe__tcp_v6_send_reset
int kprobe__tcp_v4_destroy_sock
int kprobe__tcp_set_state
int kprobe__tcp_retransmit_skb
These tracepoints will help us simplify this work.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Song Liu [Mon, 23 Oct 2017 16:20:27 +0000 (09:20 -0700)]
tcp: add tracepoint trace_tcp_set_state()
This patch adds tracepoint trace_tcp_set_state. Besides usual fields
(s/d ports, IP addresses), old and new state of the socket is also
printed with TP_printk, with __print_symbolic().
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Song Liu [Mon, 23 Oct 2017 16:20:26 +0000 (09:20 -0700)]
tcp: add tracepoint trace_tcp_destroy_sock
This patch adds trace event trace_tcp_destroy_sock.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Song Liu [Mon, 23 Oct 2017 16:20:25 +0000 (09:20 -0700)]
tcp: add tracepoint trace_tcp_receive_reset
New tracepoint trace_tcp_receive_reset is added and called from
tcp_reset(). This tracepoint is define with a new class tcp_event_sk.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Song Liu [Mon, 23 Oct 2017 16:20:24 +0000 (09:20 -0700)]
tcp: add tracepoint trace_tcp_send_reset
New tracepoint trace_tcp_send_reset is added and called from
tcp_v4_send_reset(), tcp_v6_send_reset() and tcp_send_active_reset().
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Song Liu [Mon, 23 Oct 2017 16:20:23 +0000 (09:20 -0700)]
tcp: mark trace event arguments sk and skb as const
Some functions that we plan to add trace points require const sk
and/or skb. So we mark these fields as const in the tracepoint.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Song Liu [Mon, 23 Oct 2017 16:20:22 +0000 (09:20 -0700)]
tcp: add trace event class tcp_event_sk_skb
Introduce event class tcp_event_sk_skb for tcp tracepoints that
have arguments sk and skb.
Existing tracepoint trace_tcp_retransmit_skb() falls into this class.
This patch rewrites the definition of trace_tcp_retransmit_skb() with
tcp_event_sk_skb.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 24 Oct 2017 00:16:42 +0000 (01:16 +0100)]
Merge branch 'hns3-next'
Lipeng says:
====================
net: hns3: bug fixes & code improvements
This patchset introduces various HNS3 bug fixes, optimizations and code
improvements.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Lipeng [Mon, 23 Oct 2017 11:51:07 +0000 (19:51 +0800)]
net: hns3: fix a bug about hns3_clean_tx_ring
The return value of hns3_clean_tx_ring means tx ring clean result.
Return true means clean complete and there is no more pakcet need
clean. Retrun false means there is packets need clean and napi need
poll again. The last return of hns3_clean_tx_ring is
"return !!budget" as budget will decrease when clean a buffer.
If there is no valid BD in TX ring, return 0 for hns3_clean_tx_ring
will cause napi poll again and never complete the napi poll. This
patch fixes the bug.
Fixes: 76ad4f0 (net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC)
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lipeng [Mon, 23 Oct 2017 11:51:06 +0000 (19:51 +0800)]
net: hns3: remove redundant memset when alloc buffer
HW will use packet length to write packets to buffer or read
packets from buffer. There is a redundant memset when alloc buffer,
the memset have no sense and will increase time-consuming.
This patch removes it.
Fixes: 76ad4f0 (net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC)
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>