git.openwrt.org Git - openwrt/staging/blogic.git/log

projects / openwrt / staging / blogic.git / log

Sunil Goutham [Mon, 27 Jan 2020 13:05:25 +0000 (18:35 +0530)]

octeontx2-pf: Receive side scaling support

Adds receive side scaling (RSS) support to distribute
pkts/flows across multiple queues. Sets up key, indirection
table etc. Also added extraction of HW calculated rxhash and
adding to same to SKB ie NETIF_F_RXHASH offload support.

Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Geetha sowjanya [Mon, 27 Jan 2020 13:05:24 +0000 (18:35 +0530)]

octeontx2-pf: Error handling support

HW reports many errors on the receive and transmit paths.
Such as incorrect queue configuration, pkt transmission errors,
LMTST instruction errors, transmit queue full etc. These are reported
via QINT interrupt. Most of the errors are fatal and needs
reinitialization.

Also added support to allocate receive buffers in non-atomic context
when allocation fails in NAPI context.

Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Aleksey Makarov <amakarov@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Sunil Goutham [Mon, 27 Jan 2020 13:05:23 +0000 (18:35 +0530)]

octeontx2-pf: MTU, MAC and RX mode config support

This patch addes support to change interface MTU, MAC address
retrieval and config, RX mode ie unicast, multicast and promiscuous.
Also added link loopback support

Signed-off-by: Tomasz Duszynski <tduszynski@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Linu Cherian [Mon, 27 Jan 2020 13:05:22 +0000 (18:35 +0530)]

octeontx2-pf: Register and handle link notifications

PF and AF (admin function) shares 64KB of reserved memory region for
communication. This region is shared for
- Messages sent by PF and responses sent by AF.
- Notifications sent by AF and ACKs sent by PF.

This patch adds infrastructure to handle notifications sent
by AF and adds handlers to process them.

One of the main usecase of notifications from AF is physical
link changes. So this patch adds registration of PF with AF
to receive link status change notifications and also adds
the handler for that notification.

Signed-off-by: Linu Cherian <lcherian@marvell.com>
Signed-off-by: Tomasz Duszynski <tduszynski@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Sunil Goutham [Mon, 27 Jan 2020 13:05:21 +0000 (18:35 +0530)]

octeontx2-pf: Add packet transmission support

This patch adds the packet transmission support.
For a given skb prepares send queue descriptors (SQEs) and pushes them
to HW. Here driver doesn't maintain it's own SQ rings, SQEs are pushed
to HW using a silicon specific operations called LMTST. From the
instuction HW derives the transmit queue number and queues the SQE to
that queue. These LMTST instructions are designed to avoid queue
maintenance in SW and lockless behavior ie when multiple cores are trying
to add SQEs to same queue then HW will takecare of serialization, no need
for SW to hold locks.

Also supports scatter/gather.

Co-developed-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Sunil Goutham [Mon, 27 Jan 2020 13:05:20 +0000 (18:35 +0530)]

octeontx2-pf: Receive packet handling support

Added receive packet handling (NAPI) support, error stats, RX_ALL
capability config option to passon error pkts to stack upon user request.

In subsequent patches these error stats will be added to ethttool.

Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Sunil Goutham [Mon, 27 Jan 2020 13:05:19 +0000 (18:35 +0530)]

octeontx2-pf: Setup interrupts and NAPI handler

Completion queue (CQ) is the one with which HW notifies SW on a packet
reception or transmission. Each of the RQ and SQ are mapped to a unique
CQ and again both CQs are mapped to same interrupt ie the CINT. So that
each core has one interrupt source in whose handler both Rx and Tx
notifications are processed.

Also
- Registered a NAPI handler for the CINT.
- Setup coalescing parameters.
- IRQ affinity hints etc

Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Sunil Goutham [Mon, 27 Jan 2020 13:05:18 +0000 (18:35 +0530)]

octeontx2-pf: Initialize and config queues

This patch does the initialization of all queues ie the
receive buffer pools, receive and transmit queues, completion
or notification queues etc. Allocates all required resources
(eg transmit schedulers, receive buffers etc) and configures
them for proper functioning of queues. Also sets up receive
queue's RED dropping levels.

Co-developed-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Sunil Goutham [Mon, 27 Jan 2020 13:05:17 +0000 (18:35 +0530)]

octeontx2-pf: Attach NIX and NPA block LFs

For a PF to function as a NIC, NPA (for Rx buffers, Tx descriptors etc)
and NIX (for rcv, send and completion queues) are the minimum resources
needed. So request admin function (AF) to attach one each of NIX and NPA
block LFs (local functions).

Only AF can configure a LF's contexts, so request AF to allocate memory
for NPA aura/pool and NIX RQ/SQ/CQ HW contexts. Upon receiving response,
save some of the HW constants like number of pointers per stack page,
size of send queue buffer (SQBs, where SQEs are queued by HW) e.t.c which
are later used to initialize queues.

A HW context here is like a state machine maintained for a descriptor
queue. eg size, head/tail pointers, irq etc etc. HW maintains this in
memory.

Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Sunil Goutham [Mon, 27 Jan 2020 13:05:16 +0000 (18:35 +0530)]

octeontx2-pf: Mailbox communication with AF

In the resource virtualization unit (RVU) each of the PF and AF
(admin function) share a 64KB of reserved memory region for
communication. This patch initializes PF <=> AF mailbox IRQs,
registers handlers for processing these communication messages.
Also adds support to process these messages in both directions
ie responses to PF initiated DOWN (PF => AF) messages and AF
initiated UP messages (AF => PF).

Mbox communication APIs and message formats are defined in AF driver
(drivers/net/ethernet/marvell/octeontx2/af), mbox.h from AF driver is
included here to avoid duplication.

Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Christina Jacob <cjacob@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Aleksey Makarov <amakarov@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Sunil Goutham [Mon, 27 Jan 2020 13:05:15 +0000 (18:35 +0530)]

octeontx2-pf: Add Marvell OcteonTX2 NIC driver

This patch adds template for the Marvell's OcteonTX2 network
controller's physical function driver. Just the probe, PCI
specific initialization and netdev registration.

Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Mon, 27 Jan 2020 13:31:40 +0000 (14:31 +0100)]

Merge git://git./linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2020-01-27

The following pull-request contains BPF updates for your *net-next* tree.

We've added 20 non-merge commits during the last 5 day(s) which contain
a total of 24 files changed, 433 insertions(+), 104 deletions(-).

The main changes are:

1) Make BPF trampolines and dispatcher aware for the stack unwinder, from Jiri Olsa.

2) Improve handling of failed CO-RE relocations in libbpf, from Andrii Nakryiko.

3) Several fixes to BPF sockmap and reuseport selftests, from Lorenz Bauer.

4) Various cleanups in BPF devmap's XDP flush code, from John Fastabend.

5) Fix BPF flow dissector when used with port ranges, from Yoshiki Komachi.

6) Fix bpffs' map_seq_next callback to always inc position index, from Vasily Averin.

7) Allow overriding LLVM tooling for runqslower utility, from Andrey Ignatov.

8) Silence false-positive lockdep splats in devmap hash lookup, from Amol Grover.

9) Fix fentry/fexit selftests to initialize a variable before use, from John Sperbeck.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Mon, 27 Jan 2020 12:49:33 +0000 (13:49 +0100)]

Revert "pktgen: Allow configuration of IPv6 source address range"

This reverts commit 7786a1af2a6bceb07860ec720e74714004438834.

It causes build failures on 32-bit, for example:

net/core/pktgen.o: In function `mod_cur_headers':
>> pktgen.c:(.text.mod_cur_headers+0xba0): undefined reference to `__umoddi3'

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Leon Romanovsky [Mon, 27 Jan 2020 07:20:28 +0000 (09:20 +0200)]

net/core: Replace driver version to be kernel version

In order to stop useless driver version bumps and unify output
presented by ethtool -i, let's set default version string.

As Linus said in [1]: "Things are supposed to be backwards and
forwards compatible, because we don't accept breakage in user
space anyway. So versioning is pointless, and only causes
problems."

They cause problems when users start to see version changes
and expect specific set of features which will be different
for stable@, vanilla and distribution kernels.

Distribution kernels are based on some kernel version with extra
patches on top, for example, in RedHat world this "extra" is a lot
and for them your driver version say nothing. Users who run vanilla
kernels won't use driver version information too, because running
such kernels requires knowledge and understanding.

Another set of problems are related to difference in versioning scheme
and such doesn't allow to write meaningful automation which will work
sanely on all ethtool capable devices.

Before this change:
[leonro@erver ~]$ ethtool -i eth0
driver: virtio_net
version: 1.0.0
After this change and once ->version assignment will be deleted
from virtio_net:
[leonro@server ~]$ ethtool -i eth0
driver: virtio_net
version: 5.5.0-rc6+

Link: https://lore.kernel.org/ksummit-discuss/CA+55aFx9A=5cc0QZ7CySC4F2K7eYaEfzkdYEc9JaNgCcV25=rg@mail.gmail.com/
Link: https://lore.kernel.org/linux-rdma/20200122152627.14903-1-michal.kalderon@marvell.com/T/#md460ff8f976c532a89d6860411c3c50bb811038b
Link: https://lore.kernel.org/linux-rdma/20200127060835.GA570@unicorn.suse.cz
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Mon, 27 Jan 2020 12:05:43 +0000 (13:05 +0100)]

Merge branch 'sfc-refactor-mcdi-filtering-code'

Alex Maftei says:

====================
sfc: refactor mcdi filtering code

Splitting final bits of the driver code into different files, which
will later be used in another driver for a new product.

This is a continuation to my previous patch series. (three of them)
Refactoring will be concluded with this series, for now.

As instructed, split the renaming and moving into different patches.
Removed stray spaces before tabs... twice.
Minor refactoring was done with the renaming, as explained in the
first patch.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Alex Maftei (amaftei) [Mon, 27 Jan 2020 11:13:55 +0000 (11:13 +0000)]

sfc: move mcdi filtering code

Signed-off-by: Alexandru-Mihai Maftei <amaftei@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Alex Maftei (amaftei) [Mon, 27 Jan 2020 11:13:40 +0000 (11:13 +0000)]

sfc: create header for mcdi filtering code

Moved structs, enums, and added function prototypes.

The affected functions are no longer static.

Signed-off-by: Alexandru-Mihai Maftei <amaftei@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Alex Maftei (amaftei) [Mon, 27 Jan 2020 11:13:27 +0000 (11:13 +0000)]

sfc: rename mcdi filtering functions/structs

Minor style fixes included due to name lengths changing.

Signed-off-by: Alexandru-Mihai Maftei <amaftei@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Mon, 27 Jan 2020 10:33:29 +0000 (11:33 +0100)]

Merge branch 'bnxt_en-next'

Michael Chan says:

====================
bnxt_en: Updates for net-next.

This patch-set includes link up and link initialization improvements,
RSS and aRFS improvements, devlink refactoring and registration
improvements, devlink info support including documentation.

v2: Removed the TC ingress rate limiting patch. The developer Harsha needs
to rework some code.
Use fw.psid suggested by Jakub Kicinski.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Vasundhara Volam [Mon, 27 Jan 2020 09:56:27 +0000 (04:56 -0500)]

devlink: document devlink info versions reported by bnxt_en driver

Add the set of info versions reported by bnxt_en driver, including
a description of what the version represents, and what modes (fixed,
running, stored) it reports.

v2: Use fw.psid.

Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Vasundhara Volam [Mon, 27 Jan 2020 09:56:26 +0000 (04:56 -0500)]

bnxt_en: Add support for devlink info command

Display the following information via devlink info command:
  - Driver name
  - Board id
  - Broad revision
  - Board Serial number
  - Board FW version
  - FW parameter set version
  - FW App version
  - FW management version
  - FW RoCE version

  Standard output example:
  $ devlink dev info pci/0000:3b:00.0
  pci/0000:3b:00.0:
  driver bnxt_en
  serial_number 00-10-18-FF-FE-AD-05-00
  versions:
      fixed:
        asic.id D802
        asic.rev 1
      running:
        fw 216.1.124.0
        fw.psid 0.0.0
        fw.app 216.1.122.0
        fw.mgmt 864.0.32.0
        fw.roce 216.1.15.0

[ This version has incorporated changes suggested by Jakub Kicinski to
  use generic devlink version tags. ]

v2: Use fw.psid

Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Vasundhara Volam [Mon, 27 Jan 2020 09:56:25 +0000 (04:56 -0500)]

devlink: add macro for "fw.roce"

Add definition and documentation for the new generic info "fw.roce".

v2: Remove board.nvm_cfg since fw.psid is similar.

Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Vasundhara Volam [Mon, 27 Jan 2020 09:56:24 +0000 (04:56 -0500)]

bnxt_en: Rename switch_id to dsn

Instead of switch_id, renaming it to dsn will be more meaningful
so that it can be used to display device serial number in follow up
patch via devlink_info command.

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Vasundhara Volam [Mon, 27 Jan 2020 09:56:23 +0000 (04:56 -0500)]

bnxt_en: Add support to update progress of flash update

This patch adds status notification to devlink flash update
while flashing is in progress.

$ devlink dev flash pci/0000:05:00.0 file 103.pkg
Preparing to flash
Flashing done

Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Vasundhara Volam [Mon, 27 Jan 2020 09:56:22 +0000 (04:56 -0500)]

bnxt_en: Move devlink_register before registering netdev

Latest kernels get the phys_port_name via devlink, if
ndo_get_phys_port_name is not defined. To provide the phys_port_name
correctly, register devlink before registering netdev.

Also call devlink_port_type_eth_set() after registering netdev as
devlink port updates the netdev structure and notifies user.

Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Vasundhara Volam [Mon, 27 Jan 2020 09:56:21 +0000 (04:56 -0500)]

bnxt_en: Register devlink irrespective of firmware spec version

This will allow to register for devlink port and use port features.
Also register params only if firmware spec version is at least 0x10600
which will support reading/setting numbered variables in NVRAM.

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Vasundhara Volam [Mon, 27 Jan 2020 09:56:20 +0000 (04:56 -0500)]

bnxt_en: Refactor bnxt_dl_register()

Define bnxt_dl_params_register() and bnxt_dl_params_unregister()
functions and move params register/unregister code to these newly
defined functions. This patch is in preparation to register
devlink irrespective of firmware spec. version in the next patch.

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Michael Chan [Mon, 27 Jan 2020 09:56:19 +0000 (04:56 -0500)]

bnxt_en: Disable workaround for lost interrupts on 575XX B0 and newer chips.

The hardware bug has been fixed on B0 and newer chips, so disable the
workaround on these chips.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Pavan Chebbi [Mon, 27 Jan 2020 09:56:18 +0000 (04:56 -0500)]

bnxt_en: Periodically check and remove aged-out ntuple filters

Currently the only time we check and remove expired filters is
when we are inserting new filters.
Improving the aRFS expiry handling by adding code to do the above
work periodically.

Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Michael Chan [Mon, 27 Jan 2020 09:56:17 +0000 (04:56 -0500)]

bnxt_en: Do not accept fragments for aRFS flow steering.

In bnxt_rx_flow_steer(), if the dissected packet is a fragment, do not
proceed to create the ntuple filter and return error instead. Otherwise
we would create a filter with 0 source and destination ports because
the dissected ports would not be available for fragments.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Michael Chan [Mon, 27 Jan 2020 09:56:16 +0000 (04:56 -0500)]

bnxt_en: Support UDP RSS hashing on 575XX chips.

575XX (P5) chips have the same UDP RSS hashing capability as P4 chips,
so we can enable it on P5 chips.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Michael Chan [Mon, 27 Jan 2020 09:56:15 +0000 (04:56 -0500)]

bnxt_en: Remove the setting of dev_port.

The dev_port is meant to distinguish the network ports belonging to
the same PCI function. Our devices only have one network port
associated with each PCI function and so we should not set it for
correctness.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Michael Chan [Mon, 27 Jan 2020 09:56:14 +0000 (04:56 -0500)]

bnxt_en: Improve bnxt_probe_phy().

If the 2nd parameter fw_dflt is not set, we are calling bnxt_probe_phy()
after the firmware has reset. There is no need to query the current
PHY settings from firmware as these settings may be different from
the ethtool settings that the driver will re-establish later. So
return earlier in bnxt_probe_phy() to save one firmware call.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Michael Chan [Mon, 27 Jan 2020 09:56:13 +0000 (04:56 -0500)]

bnxt_en: Improve link up detection.

In bnxt_update_phy_setting(), ethtool_get_link_ksettings() and
bnxt_disable_an_for_lpbk(), we inconsistently use netif_carrier_ok()
to determine link.  Instead, we should use bp->link_info.link_up
which has the true link state.  The netif_carrier state may be off
during self-test and while the device is being reset and may not always
reflect the true link state.

By always using bp->link_info.link_up, the code is now more
consistent and more correct.  Some unnecessary link toggles are
now prevented with this patch.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Mon, 27 Jan 2020 10:31:36 +0000 (11:31 +0100)]

Merge branch 'ethtool-netlink-interface-part-2'

Michal Kubecek says:

====================
ethtool netlink interface, part 2

This shorter series adds support for getting and setting of wake-on-lan
settings and message mask (originally message level). Together with the
code already in net-next, this will allow full implementation of
"ethtool <dev>" and "ethtool -s <dev> ...".

Older versions of the ethtool netlink series allowed getting WoL settings
by unprivileged users and only filtered out the password but this was
a source of controversy so for now, ETHTOOL_MSG_WOL_GET request always
requires CAP_NET_ADMIN as ETHTOOL_GWOL ioctl request does.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Michal Kubecek [Sun, 26 Jan 2020 22:11:19 +0000 (23:11 +0100)]

ethtool: add WOL_NTF notification

Send ETHTOOL_MSG_WOL_NTF notification whenever wake-on-lan settings of
a device are modified using ETHTOOL_MSG_WOL_SET netlink message or
ETHTOOL_SWOL ioctl request.

As notifications can be received by anyone, do not include SecureOn(tm)
password in notification messages.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Michal Kubecek [Sun, 26 Jan 2020 22:11:16 +0000 (23:11 +0100)]

ethtool: set wake-on-lan settings with WOL_SET request

Implement WOL_SET netlink request to set wake-on-lan settings. This is
equivalent to ETHTOOL_SWOL ioctl request.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Michal Kubecek [Sun, 26 Jan 2020 22:11:13 +0000 (23:11 +0100)]

ethtool: provide WoL settings with WOL_GET request

Implement WOL_GET request to get wake-on-lan settings for a device,
traditionally available via ETHTOOL_GWOL ioctl request.

As part of the implementation, provide symbolic names for wake-on-line
modes as ETH_SS_WOL_MODES string set.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Michal Kubecek [Sun, 26 Jan 2020 22:11:10 +0000 (23:11 +0100)]

ethtool: add DEBUG_NTF notification

Send ETHTOOL_MSG_DEBUG_NTF notification message whenever debugging message
mask for a device are modified using ETHTOOL_MSG_DEBUG_SET netlink message
or ETHTOOL_SMSGLVL ioctl request.

The notification message has the same format as reply to DEBUG_GET request.
As with other ethtool notifications, netlink requests only trigger the
notification if the mask is actually changed while ioctl request trigger it
whenever the request results in calling the ethtool_ops handler.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Michal Kubecek [Sun, 26 Jan 2020 22:11:07 +0000 (23:11 +0100)]

ethtool: set message mask with DEBUG_SET request

Implement DEBUG_SET netlink request to set debugging settings for a device.
At the moment, only message mask corresponding to message level as set by
ETHTOOL_SMSGLVL ioctl request can be set. (It is called message level in
ioctl interface but almost all drivers interpret it as a bit mask.)

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Michal Kubecek [Sun, 26 Jan 2020 22:11:04 +0000 (23:11 +0100)]

ethtool: provide message mask with DEBUG_GET request

Implement DEBUG_GET request to get debugging settings for a device. At the
moment, only message mask corresponding to message level as reported by
ETHTOOL_GMSGLVL ioctl request is provided. (It is called message level in
ioctl interface but almost all drivers interpret it as a bit mask.)

As part of the implementation, provide symbolic names for message mask bits
as ETH_SS_MSG_CLASSES string set.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Michal Kubecek [Sun, 26 Jan 2020 22:11:01 +0000 (23:11 +0100)]

ethtool: fix kernel-doc descriptions

Fix missing or incorrect function argument and struct member descriptions.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Mon, 27 Jan 2020 10:25:36 +0000 (11:25 +0100)]

Merge tag 'wireless-drivers-next-2020-01-26' of git://git./linux/kernel/git/kvalo/wireless-drivers-next

Kalle Valo says:

====================
wireless-drivers-next patches for v5.6

Second set of patches for v5.6. Nothing special standing out, smaller
new features and fixes allover.

Major changes:

ar5523

* add support for SMCWUSBT-G2 USB device

iwlwifi

* support new versions of the FTM FW APIs

* support new version of the beacon template FW API

* print some extra information when the driver is loaded

rtw88

* support wowlan feature for 8822c

* add support for WIPHY_WOWLAN_NET_DETECT

brcmfmac

* add initial support for monitor mode

qtnfmac

* add module parameter to enable DFS offloading in firmware

* add support for STA HE rates

* add support for TWT responder and spatial reuse
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Daniel Borkmann [Mon, 27 Jan 2020 10:25:07 +0000 (11:25 +0100)]

Merge branch 'bpf-flow-dissector-fix-port-ranges'

Yoshiki Komachi says:

====================
When I tried a test based on the selftest program for BPF flow dissector
(test_flow_dissector.sh), I observed unexpected result as below:

$ tc filter add dev lo parent ffff: protocol ip pref 1337 flower ip_proto \
udp src_port 8-10 action drop
$ tools/testing/selftests/bpf/test_flow_dissector -i 4 -f 9 -F
inner.dest4: 127.0.0.1
inner.source4: 127.0.0.3
pkts: tx=10 rx=10

The last rx means the number of received packets. I expected rx=0 in this
test (i.e., all received packets should have been dropped), but it resulted
in acceptance.

Although the previous commit 8ffb055beae5 ("cls_flower: Fix the behavior
using port ranges with hw-offload") added new flag and field toward filtering
based on port ranges with hw-offload, it missed applying for BPF flow dissector
then. As a result, BPF flow dissector currently stores data extracted from
packets in incorrect field used for exact match whenever packets are classified
by filters based on port ranges. Thus, they never match rules in such cases
because flow dissector gives rise to generating incorrect flow keys.

This series fixes the issue by replacing incorrect flag and field with new
ones in BPF flow dissector, and adds a test for filtering based on specified
port ranges to the existing selftest program.

Changes in v2:
- set key_ports to NULL at the top of __skb_flow_bpf_to_target()
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

commit | commitdiff | tree

Yoshiki Komachi [Fri, 17 Jan 2020 07:05:33 +0000 (16:05 +0900)]

selftests/bpf: Add test based on port range for BPF flow dissector

Add a simple test to make sure that a filter based on specified port
range classifies packets correctly.

Signed-off-by: Yoshiki Komachi <komachi.yoshiki@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Petar Penkov <ppenkov@google.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200117070533.402240-3-komachi.yoshiki@gmail.com

commit | commitdiff | tree

Yoshiki Komachi [Fri, 17 Jan 2020 07:05:32 +0000 (16:05 +0900)]

flow_dissector: Fix to use new variables for port ranges in bpf hook

This patch applies new flag (FLOW_DISSECTOR_KEY_PORTS_RANGE) and
field (tp_range) to BPF flow dissector to generate appropriate flow
keys when classified by specified port ranges.

Fixes: 8ffb055beae5 ("cls_flower: Fix the behavior using port ranges with hw-offload")
Signed-off-by: Yoshiki Komachi <komachi.yoshiki@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Petar Penkov <ppenkov@google.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200117070533.402240-2-komachi.yoshiki@gmail.com

commit | commitdiff | tree

David S. Miller [Mon, 27 Jan 2020 10:24:46 +0000 (11:24 +0100)]

Merge branch 'for-upstream' of git://git./linux/kernel/git/bluetooth/bluetooth-next

Johan Hedberg says:

====================
pull request: bluetooth-next 2020-01-26

Here's (probably) the last bluetooth-next pull request for the 5.6 kernel.

- Initial pieces of Bluetooth 5.2 Isochronous Channels support
- mgmt: Various cleanups and a new Set Blocked Keys command
- btusb: Added support for 04ca:3021 QCA_ROME device
- hci_qca: Multiple fixes & cleanups
- hci_bcm: Fixes & improved device tree support
- Fixed attempts to create duplicate debugfs entries

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Christophe JAILLET [Sun, 26 Jan 2020 10:44:29 +0000 (11:44 +0100)]

drivers: net: xgene: Fix the order of the arguments of 'alloc_etherdev_mqs()'

'alloc_etherdev_mqs()' expects first 'tx', then 'rx'. The semantic here
looks reversed.

Reorder the arguments passed to 'alloc_etherdev_mqs()' in order to keep
the correct semantic.

In fact, this is a no-op because both XGENE_NUM_[RT]X_RING are 8.

Fixes: 107dec2749fe ("drivers: net: xgene: Add support for multiple queues")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

John Fastabend [Mon, 27 Jan 2020 00:14:02 +0000 (16:14 -0800)]

bpf, xdp: Remove no longer required rcu_read_{un}lock()

Now that we depend on rcu_call() and synchronize_rcu() to also wait
for preempt_disabled region to complete the rcu read critical section
in __dev_map_flush() is no longer required. Except in a few special
cases in drivers that need it for other reasons.

These originally ensured the map reference was safe while a map was
also being free'd. And additionally that bpf program updates via
ndo_bpf did not happen while flush updates were in flight. But flush
by new rules can only be called from preempt-disabled NAPI context.
The synchronize_rcu from the map free path and the rcu_call from the
delete path will ensure the reference there is safe. So lets remove
the rcu_read_lock and rcu_read_unlock pair to avoid any confusion
around how this is being protected.

If the rcu_read_lock was required it would mean errors in the above
logic and the original patch would also be wrong.

Now that we have done above we put the rcu_read_lock in the driver
code where it is needed in a driver dependent way. I think this
helps readability of the code so we know where and why we are
taking read locks. Most drivers will not need rcu_read_locks here
and further XDP drivers already have rcu_read_locks in their code
paths for reading xdp programs on RX side so this makes it symmetric
where we don't have half of rcu critical sections define in driver
and the other half in devmap.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/1580084042-11598-4-git-send-email-john.fastabend@gmail.com

commit | commitdiff | tree

John Fastabend [Mon, 27 Jan 2020 00:14:01 +0000 (16:14 -0800)]

bpf, xdp: virtio_net use access ptr macro for xdp enable check

virtio_net currently relies on rcu critical section to access the xdp
program in its xdp_xmit handler. However, the pointer to the xdp program
is only used to do a NULL pointer comparison to determine if xdp is
enabled or not.

Use rcu_access_pointer() instead of rcu_dereference() to reflect this.
Then later when we drop rcu_read critical section virtio_net will not
need in special handling.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/1580084042-11598-3-git-send-email-john.fastabend@gmail.com

commit | commitdiff | tree

John Fastabend [Mon, 27 Jan 2020 00:14:00 +0000 (16:14 -0800)]

bpf, xdp: Update devmap comments to reflect napi/rcu usage

Now that we rely on synchronize_rcu and call_rcu waiting to
exit perempt-disable regions (NAPI) lets update the comments
to reflect this.

Fixes: 0536b85239b84 ("xdp: Simplify devmap cleanup")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/1580084042-11598-2-git-send-email-john.fastabend@gmail.com

commit | commitdiff | tree

Heiner Kallweit [Sun, 26 Jan 2020 09:40:44 +0000 (10:40 +0100)]

r8169: don't set min_mtu/max_mtu if not needed

Defaults for min_mtu and max_mtu are set by ether_setup(), which is
called from devm_alloc_etherdev(). Let rtl_jumbo_max() only return
a positive value if actually jumbo packets are supported. This also
allows to remove constant Jumbo_1K which is a little misleading anyway.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Christophe JAILLET [Sat, 25 Jan 2020 21:18:47 +0000 (22:18 +0100)]

mlxsw: minimal: Fix an error handling path in 'mlxsw_m_port_create()'

An 'alloc_etherdev()' called is not ballanced by a corresponding
'free_netdev()' call in one error handling path.

Slighly reorder the error handling code to catch the missed case.

Fixes: c100e47caa8e ("mlxsw: minimal: Add ethtool support")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Vladimir Oltean [Sat, 25 Jan 2020 21:01:11 +0000 (23:01 +0200)]

net: dsa: Fix use-after-free in probing of DSA switch tree

DSA sets up a switch tree little by little. Every switch of the N
members of the tree calls dsa_register_switch, and (N - 1) will just
touch the dst->ports list with their ports and quickly exit. Only the
last switch that calls dsa_register_switch will find all DSA links
complete in dsa_tree_setup_routing_table, and not return zero as a
result but instead go ahead and set up the entire DSA switch tree
(practically on behalf of the other switches too).

The trouble is that the (N - 1) switches don't clean up after themselves
after they get an error such as EPROBE_DEFER. Their footprint left in
dst->ports by dsa_switch_touch_ports is still there. And switch N, the
one responsible with actually setting up the tree, is going to work with
those stale dp, dp->ds and dp->ds->dev pointers. In particular ds and
ds->dev might get freed by the device driver.

Be there a 2-switch tree and the following calling order:
- Switch 1 calls dsa_register_switch
  - Calls dsa_switch_touch_ports, populates dst->ports
  - Calls dsa_port_parse_cpu, gets -EPROBE_DEFER, exits.
- Switch 2 calls dsa_register_switch
  - Calls dsa_switch_touch_ports, populates dst->ports
  - Probe doesn't get deferred, so it goes ahead.
  - Calls dsa_tree_setup_routing_table, which returns "complete == true"
    due to Switch 1 having called dsa_switch_touch_ports before.
  - Because the DSA links are complete, it calls dsa_tree_setup_switches
    now.
  - dsa_tree_setup_switches iterates through dst->ports, initializing
    the Switch 1 ds structure (invalid) and the Switch 2 ds structure
    (valid).
  - Undefined behavior (use after free, sometimes NULL pointers, etc).

Real example below (debugging prints added by me, as well as guards
against NULL pointers):

[    5.477947] dsa_tree_setup_switches: Setting up port 0 of switch ffffff803df0b980 (dev ffffff803f775c00)
[    6.313002] dsa_tree_setup_switches: Setting up port 1 of switch ffffff803df0b980 (dev ffffff803f775c00)
[    6.319932] dsa_tree_setup_switches: Setting up port 2 of switch ffffff803df0b980 (dev ffffff803f775c00)
[    6.329693] dsa_tree_setup_switches: Setting up port 3 of switch ffffff803df0b980 (dev ffffff803f775c00)
[    6.339458] dsa_tree_setup_switches: Setting up port 4 of switch ffffff803df0b980 (dev ffffff803f775c00)
[    6.349226] dsa_tree_setup_switches: Setting up port 5 of switch ffffff803df0b980 (dev ffffff803f775c00)
[    6.358991] dsa_tree_setup_switches: Setting up port 6 of switch ffffff803df0b980 (dev ffffff803f775c00)
[    6.368758] dsa_tree_setup_switches: Setting up port 7 of switch ffffff803df0b980 (dev ffffff803f775c00)
[    6.378524] dsa_tree_setup_switches: Setting up port 8 of switch ffffff803df0b980 (dev ffffff803f775c00)
[    6.388291] dsa_tree_setup_switches: Setting up port 9 of switch ffffff803df0b980 (dev ffffff803f775c00)
[    6.398057] dsa_tree_setup_switches: Setting up port 10 of switch ffffff803df0b980 (dev ffffff803f775c00)
[    6.407912] dsa_tree_setup_switches: Setting up port 0 of switch ffffff803da02f80 (dev 0000000000000000)
[    6.417682] dsa_tree_setup_switches: Setting up port 1 of switch ffffff803da02f80 (dev 0000000000000000)
[    6.427446] dsa_tree_setup_switches: Setting up port 2 of switch ffffff803da02f80 (dev 0000000000000000)
[    6.437212] dsa_tree_setup_switches: Setting up port 3 of switch ffffff803da02f80 (dev 0000000000000000)
[    6.446979] dsa_tree_setup_switches: Setting up port 4 of switch ffffff803da02f80 (dev 0000000000000000)
[    6.456744] dsa_tree_setup_switches: Setting up port 5 of switch ffffff803da02f80 (dev 0000000000000000)
[    6.466512] dsa_tree_setup_switches: Setting up port 6 of switch ffffff803da02f80 (dev 0000000000000000)
[    6.476277] dsa_tree_setup_switches: Setting up port 7 of switch ffffff803da02f80 (dev 0000000000000000)
[    6.486043] dsa_tree_setup_switches: Setting up port 8 of switch ffffff803da02f80 (dev 0000000000000000)
[    6.495810] dsa_tree_setup_switches: Setting up port 9 of switch ffffff803da02f80 (dev 0000000000000000)
[    6.505577] dsa_tree_setup_switches: Setting up port 10 of switch ffffff803da02f80 (dev 0000000000000000)
[    6.515433] dsa_tree_setup_switches: Setting up port 0 of switch ffffff803db15b80 (dev ffffff803d8e4800)
[    7.354120] dsa_tree_setup_switches: Setting up port 1 of switch ffffff803db15b80 (dev ffffff803d8e4800)
[    7.361045] dsa_tree_setup_switches: Setting up port 2 of switch ffffff803db15b80 (dev ffffff803d8e4800)
[    7.370805] dsa_tree_setup_switches: Setting up port 3 of switch ffffff803db15b80 (dev ffffff803d8e4800)
[    7.380571] dsa_tree_setup_switches: Setting up port 4 of switch ffffff803db15b80 (dev ffffff803d8e4800)
[    7.390337] dsa_tree_setup_switches: Setting up port 5 of switch ffffff803db15b80 (dev ffffff803d8e4800)
[    7.400104] dsa_tree_setup_switches: Setting up port 6 of switch ffffff803db15b80 (dev ffffff803d8e4800)
[    7.409872] dsa_tree_setup_switches: Setting up port 7 of switch ffffff803db15b80 (dev ffffff803d8e4800)
[    7.419637] dsa_tree_setup_switches: Setting up port 8 of switch ffffff803db15b80 (dev ffffff803d8e4800)
[    7.429403] dsa_tree_setup_switches: Setting up port 9 of switch ffffff803db15b80 (dev ffffff803d8e4800)
[    7.439169] dsa_tree_setup_switches: Setting up port 10 of switch ffffff803db15b80 (dev ffffff803d8e4800)

The solution is to recognize that the functions that call
dsa_switch_touch_ports (dsa_switch_parse_of, dsa_switch_parse) have side
effects, and therefore one should clean up their side effects on error
path. The cleanup of dst->ports was taken from dsa_switch_remove and
moved into a dedicated dsa_switch_release_ports function, which should
really be per-switch (free only the members of dst->ports that are also
members of ds, instead of all switch ports).

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Heiner Kallweit [Sat, 25 Jan 2020 12:42:14 +0000 (13:42 +0100)]

net: remove eth_change_mtu

All usage of this function was removed three years ago, and the
function was marked as deprecated:
a52ad514fdf3 ("net: deprecate eth_change_mtu, remove usage")
So I think we can remove it now.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Mon, 27 Jan 2020 10:05:42 +0000 (11:05 +0100)]

Merge branch 'XDP-fixes-for-socionext-driver'

Lorenzo Bianconi says:

====================
XDP fixes for socionext driver

Fix possible user-after-in XDP rx path
Fix rx statistics accounting if no bpf program is attached
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Lorenzo Bianconi [Sat, 25 Jan 2020 11:48:51 +0000 (12:48 +0100)]

net: socionext: fix xdp_result initialization in netsec_process_rx

Fix xdp_result initialization in netsec_process_rx in order to not
increase rx counters if there is no bpf program attached to the xdp hook
and napi_gro_receive returns GRO_DROP

Fixes: ba2b232108d3c ("net: netsec: add XDP support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Lorenzo Bianconi [Sat, 25 Jan 2020 11:48:50 +0000 (12:48 +0100)]

net: socionext: fix possible user-after-free in netsec_process_rx

Fix possible use-after-free in in netsec_process_rx that can occurs if
the first packet is sent to the normal networking stack and the
following one is dropped by the bpf program attached to the xdp hook.
Fix the issue defining the skb pointer in the 'budget' loop

Fixes: ba2b232108d3c ("net: netsec: add XDP support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Mon, 27 Jan 2020 10:03:44 +0000 (11:03 +0100)]

Merge branch 'net-allow-per-net-notifier-to-follow-netdev-into-namespace'

Jiri Pirko says:

====================
net: allow per-net notifier to follow netdev into namespace

Currently we have per-net notifier, which allows to get only
notifications relevant to particular network namespace. That is enough
for drivers that have netdevs local in a particular namespace (cannot
move elsewhere).

However if netdev can change namespace, per-net notifier cannot be used.
Introduce dev_net variant that is basically per-net notifier with an
extension that re-registers the per-net notifier upon netdev namespace
change. Basically the per-net notifier follows the netdev into
namespace.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Jiri Pirko [Sat, 25 Jan 2020 11:17:09 +0000 (12:17 +0100)]

mlx5: Use dev_net netdevice notifier registrations

Register the dev_net notifier and allow the per-net notifier to follow
the device into different namespace.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Jiri Pirko [Sat, 25 Jan 2020 11:17:08 +0000 (12:17 +0100)]

net: introduce dev_net notifier register/unregister variants

Introduce dev_net variants of netdev notifier register/unregister functions
and allow per-net notifier to follow the netdevice into the namespace it is
moved to.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Jiri Pirko [Sat, 25 Jan 2020 11:17:07 +0000 (12:17 +0100)]

net: push code from net notifier reg/unreg into helpers

Push the code which is done under rtnl lock in net notifier register and
unregister function into separate helpers.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Jiri Pirko [Sat, 25 Jan 2020 11:17:06 +0000 (12:17 +0100)]

net: call call_netdevice_unregister_net_notifiers from unregister

The function does the same thing as the existing code, so rather call
call_netdevice_unregister_net_notifiers() instead of code duplication.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Kuniyuki Iwashima [Sat, 25 Jan 2020 10:41:02 +0000 (10:41 +0000)]

soreuseport: Cleanup duplicate initialization of more_reuse->max_socks.

reuseport_grow() does not need to initialize the more_reuse->max_socks
again. It is already initialized in __reuseport_alloc().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Mon, 27 Jan 2020 10:00:21 +0000 (11:00 +0100)]

Merge branch 'Support-fraglist-GRO-GSO'

Steffen Klassert says:

====================
Support fraglist GRO/GSO

This patchset adds support to do GRO/GSO by chaining packets
of the same flow at the SKB frag_list pointer. This avoids
the overhead to merge payloads into one big packet, and
on the other end, if GSO is needed it avoids the overhead
of splitting the big packet back to the native form.

Patch 1 adds netdev feature flags to enable fraglist GRO,
this implements one of the configuration options discussed
at netconf 2019.

Patch 2 adds a netdev software feature set that defaults to off
and assigns the new fraglist GRO feature flag to it.

Patch 3 adds the core infrastructure to do fraglist GRO/GSO.

Patch 4 enables UDP to use fraglist GRO/GSO if configured.

I have only meaningful forwarding performance measurements.
I did some tests for the local receive path with netperf and iperf,
but in this case the sender that generates the packets is the
bottleneck. So the benchmarks are not that meaningful for the
receive path.

Paolo Abeni did some benchmarks of the local receive path for the
RFC v2 version of this pachset, results can be found here:

https://www.spinics.net/lists/netdev/msg551158.html

I used my IPsec forwarding test setup for the performance measurements:

           ------------         ------------
        -->| router 1 |-------->| router 2 |--
        |  ------------         ------------  |
        |                                     |
        |       --------------------          |
        --------|Spirent Testcenter|<----------
                --------------------

net-next (September 7th 2019):

Single stream UDP frame size 1460 Bytes: 1.161.000 fps (13.5 Gbps).

----------------------------------------------------------------------

net-next (September 7th 2019) + standard UDP GRO/GSO (not implemented
in this patchset):

Single stream UDP frame size 1460 Bytes: 1.801.000 fps (21 Gbps).

----------------------------------------------------------------------

net-next (September 7th 2019) + fraglist UDP GRO/GSO:

Single stream UDP frame size 1460 Bytes: 2.860.000 fps (33.4 Gbps).

=======================================================================

net-next (January 23th 2020):

Single stream UDP frame size 1460 Bytes: 919.000 fps (10.73 Gbps).

----------------------------------------------------------------------

net-next (January 23th 2020) + fraglist UDP GRO/GSO:

Single stream UDP frame size 1460 Bytes: 2.430.000 fps (28.38 Gbps).

-----------------------------------------------------------------------

Changes from RFC v1:

- Add IPv6 support.
- Split patchset to enable UDP GRO by default before adding
  fraglist GRO support.
- Mark fraglist GRO packets as CHECKSUM_NONE.
- Take a refcount on the first segment skb when doing fraglist
  segmentation. With this we can use the same error handling
  path as with standard segmentation.

Changes from RFC v2:

- Add a netdev feature flag to configure listifyed GRO.
- Fix UDP GRO enabling for IPv6.
- Fix a rcu_read_lock() imbalance.
- Fix error path in skb_segment_list().

Changes from RFC v3:

- Rename NETIF_F_GRO_LIST to NETIF_F_GRO_FRAGLIST and add
  NETIF_F_GSO_FRAGLIST.
- Move introduction of SKB_GSO_FRAGLIST to patch 2.
- Use udpv6_encap_needed_key instead of udp_encap_needed_key in IPv6.
- Move some missplaced code from patch 5 to patch 1 where it belongs to.

Changes from RFC v4:

- Drop the 'UDP: enable GRO by default' patch for now. Standard UDP GRO
  is not changed with this patchset.
- Rebase to net-next current.

Changes fom v1 (December 18th):

- Do a full __copy_skb_header instead of tryng to find the really
  needed subset header fields. Thisa can be done later.
- Mark all fraglist GRO packets with CHECKSUM_UNNECESSARY.
- Rebase to net-next current.

Changes fom v2 (January 24th):

- Do the CHECKSUM_UNNECESSARY setting from IPv4 for IPv6 too.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Steffen Klassert [Sat, 25 Jan 2020 10:26:45 +0000 (11:26 +0100)]

udp: Support UDP fraglist GRO/GSO.

This patch extends UDP GRO to support fraglist GRO/GSO
by using the previously introduced infrastructure.
If the feature is enabled, all UDP packets are going to
fraglist GRO (local input and forward).

After validating the csum, we mark ip_summed as
CHECKSUM_UNNECESSARY for fraglist GRO packets to
make sure that the csum is not touched.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Steffen Klassert [Sat, 25 Jan 2020 10:26:44 +0000 (11:26 +0100)]

net: Support GRO/GSO fraglist chaining.

This patch adds the core functions to chain/unchain
GSO skbs at the frag_list pointer. This also adds
a new GSO type SKB_GSO_FRAGLIST and a is_flist
flag to napi_gro_cb which indicates that this
flow will be GROed by fraglist chaining.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Steffen Klassert [Sat, 25 Jan 2020 10:26:43 +0000 (11:26 +0100)]

net: Add a netdev software feature set that defaults to off.

The previous patch added the NETIF_F_GRO_FRAGLIST feature.
This is a software feature that should default to off.
Current software features default to on, so add a new
feature set that defaults to off.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Steffen Klassert [Sat, 25 Jan 2020 10:26:42 +0000 (11:26 +0100)]

net: Add fraglist GRO/GSO feature flags

This adds new Fraglist GRO/GSO feature flags. They will be used
to configure fraglist GRO/GSO what will be implemented with some
followup paches.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Sven Auhagen [Sat, 25 Jan 2020 08:07:03 +0000 (08:07 +0000)]

mvneta driver disallow XDP program on hardware buffer management

Recently XDP Support was added to the mvneta driver
for software buffer management only.
It is still possible to attach an XDP program if
hardware buffer management is used.
It is not doing anything at that point.

The patch disallows attaching XDP programs to mvneta
if hardware buffer management is used.

I am sorry about that. It is my first submission and I am having
some troubles with the format of my emails.

v4 -> v5:
- Remove extra tabs

v3 -> v4:
- Please ignore v3 I accidentally submitted
my other patch with git-send-mail and v4 is correct

v2 -> v3:
- My mailserver corrupted the patch
resubmission with git-send-email

v1 -> v2:
- Fixing the patches indentation

Signed-off-by: Sven Auhagen <sven.auhagen@voleatech.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David Howells [Fri, 24 Jan 2020 23:08:04 +0000 (23:08 +0000)]

rxrpc: Fix use-after-free in rxrpc_receive_data()

The subpacket scanning loop in rxrpc_receive_data() references the
subpacket count in the private data part of the sk_buff in the loop
termination condition. However, when the final subpacket is pasted into
the ring buffer, the function is no longer has a ref on the sk_buff and
should not be looking at sp->* any more. This point is actually marked in
the code when skb is cleared (but sp is not - which is an error).

Fix this by caching sp->nr_subpackets in a local variable and using that
instead.

Also clear 'sp' to catch accesses after that point.

This can show up as an oops in rxrpc_get_skb() if sp->nr_subpackets gets
trashed by the sk_buff getting freed and reused in the meantime.

Fixes: e2de6c404898 ("rxrpc: Use info in skbuff instead of reparsing a jumbo packet")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Eric Dumazet [Fri, 24 Jan 2020 22:57:20 +0000 (14:57 -0800)]

net_sched: ematch: reject invalid TCF_EM_SIMPLE

It is possible for malicious userspace to set TCF_EM_SIMPLE bit
even for matches that should not have this bit set.

This can fool two places using tcf_em_is_simple()

1) tcf_em_tree_destroy() -> memory leak of em->data
   if ops->destroy() is NULL

2) tcf_em_tree_dump() wrongly report/leak 4 low-order bytes
   of a kernel pointer.

BUG: memory leak
unreferenced object 0xffff888121850a40 (size 32):
  comm "syz-executor927", pid 7193, jiffies 4294941655 (age 19.840s)
  hex dump (first 32 bytes):
    00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<00000000f67036ea>] kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline]
    [<00000000f67036ea>] slab_post_alloc_hook mm/slab.h:586 [inline]
    [<00000000f67036ea>] slab_alloc mm/slab.c:3320 [inline]
    [<00000000f67036ea>] __do_kmalloc mm/slab.c:3654 [inline]
    [<00000000f67036ea>] __kmalloc_track_caller+0x165/0x300 mm/slab.c:3671
    [<00000000fab0cc8e>] kmemdup+0x27/0x60 mm/util.c:127
    [<00000000d9992e0a>] kmemdup include/linux/string.h:453 [inline]
    [<00000000d9992e0a>] em_nbyte_change+0x5b/0x90 net/sched/em_nbyte.c:32
    [<000000007e04f711>] tcf_em_validate net/sched/ematch.c:241 [inline]
    [<000000007e04f711>] tcf_em_tree_validate net/sched/ematch.c:359 [inline]
    [<000000007e04f711>] tcf_em_tree_validate+0x332/0x46f net/sched/ematch.c:300
    [<000000007a769204>] basic_set_parms net/sched/cls_basic.c:157 [inline]
    [<000000007a769204>] basic_change+0x1d7/0x5f0 net/sched/cls_basic.c:219
    [<00000000e57a5997>] tc_new_tfilter+0x566/0xf70 net/sched/cls_api.c:2104
    [<0000000074b68559>] rtnetlink_rcv_msg+0x3b2/0x4b0 net/core/rtnetlink.c:5415
    [<00000000b7fe53fb>] netlink_rcv_skb+0x61/0x170 net/netlink/af_netlink.c:2477
    [<00000000e83a40d0>] rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
    [<00000000d62ba933>] netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
    [<00000000d62ba933>] netlink_unicast+0x223/0x310 net/netlink/af_netlink.c:1328
    [<0000000088070f72>] netlink_sendmsg+0x2c0/0x570 net/netlink/af_netlink.c:1917
    [<00000000f70b15ea>] sock_sendmsg_nosec net/socket.c:639 [inline]
    [<00000000f70b15ea>] sock_sendmsg+0x54/0x70 net/socket.c:659
    [<00000000ef95a9be>] ____sys_sendmsg+0x2d0/0x300 net/socket.c:2330
    [<00000000b650f1ab>] ___sys_sendmsg+0x8a/0xd0 net/socket.c:2384
    [<0000000055bfa74a>] __sys_sendmsg+0x80/0xf0 net/socket.c:2417
    [<000000002abac183>] __do_sys_sendmsg net/socket.c:2426 [inline]
    [<000000002abac183>] __se_sys_sendmsg net/socket.c:2424 [inline]
    [<000000002abac183>] __x64_sys_sendmsg+0x23/0x30 net/socket.c:2424

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot+03c4738ed29d5d366ddf@syzkaller.appspotmail.com
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Vasily Averin [Sat, 25 Jan 2020 09:10:02 +0000 (12:10 +0300)]

bpf: map_seq_next should always increase position index

If seq_file .next fuction does not change position index,
read after some lseek can generate an unexpected output.

See also: https://bugzilla.kernel.org/show_bug.cgi?id=206283

v1 -> v2: removed missed increment in end of function

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/eca84fdd-c374-a154-d874-6c7b55fc3bc4@virtuozzo.com

commit | commitdiff | tree

Stephen Worley [Fri, 24 Jan 2020 21:53:27 +0000 (16:53 -0500)]

net: include struct nhmsg size in nh nlmsg size

Include the size of struct nhmsg size when calculating
how much of a payload to allocate in a new netlink nexthop
notification message.

Without this, we will fail to fill the skbuff at certain nexthop
group sizes.

You can reproduce the failure with the following iproute2 commands:

ip link add dummy1 type dummy
ip link add dummy2 type dummy
ip link add dummy3 type dummy
ip link add dummy4 type dummy
ip link add dummy5 type dummy
ip link add dummy6 type dummy
ip link add dummy7 type dummy
ip link add dummy8 type dummy
ip link add dummy9 type dummy
ip link add dummy10 type dummy
ip link add dummy11 type dummy
ip link add dummy12 type dummy
ip link add dummy13 type dummy
ip link add dummy14 type dummy
ip link add dummy15 type dummy
ip link add dummy16 type dummy
ip link add dummy17 type dummy
ip link add dummy18 type dummy
ip link add dummy19 type dummy

ip ro add 1.1.1.1/32 dev dummy1
ip ro add 1.1.1.2/32 dev dummy2
ip ro add 1.1.1.3/32 dev dummy3
ip ro add 1.1.1.4/32 dev dummy4
ip ro add 1.1.1.5/32 dev dummy5
ip ro add 1.1.1.6/32 dev dummy6
ip ro add 1.1.1.7/32 dev dummy7
ip ro add 1.1.1.8/32 dev dummy8
ip ro add 1.1.1.9/32 dev dummy9
ip ro add 1.1.1.10/32 dev dummy10
ip ro add 1.1.1.11/32 dev dummy11
ip ro add 1.1.1.12/32 dev dummy12
ip ro add 1.1.1.13/32 dev dummy13
ip ro add 1.1.1.14/32 dev dummy14
ip ro add 1.1.1.15/32 dev dummy15
ip ro add 1.1.1.16/32 dev dummy16
ip ro add 1.1.1.17/32 dev dummy17
ip ro add 1.1.1.18/32 dev dummy18
ip ro add 1.1.1.19/32 dev dummy19

ip next add id 1 via 1.1.1.1 dev dummy1
ip next add id 2 via 1.1.1.2 dev dummy2
ip next add id 3 via 1.1.1.3 dev dummy3
ip next add id 4 via 1.1.1.4 dev dummy4
ip next add id 5 via 1.1.1.5 dev dummy5
ip next add id 6 via 1.1.1.6 dev dummy6
ip next add id 7 via 1.1.1.7 dev dummy7
ip next add id 8 via 1.1.1.8 dev dummy8
ip next add id 9 via 1.1.1.9 dev dummy9
ip next add id 10 via 1.1.1.10 dev dummy10
ip next add id 11 via 1.1.1.11 dev dummy11
ip next add id 12 via 1.1.1.12 dev dummy12
ip next add id 13 via 1.1.1.13 dev dummy13
ip next add id 14 via 1.1.1.14 dev dummy14
ip next add id 15 via 1.1.1.15 dev dummy15
ip next add id 16 via 1.1.1.16 dev dummy16
ip next add id 17 via 1.1.1.17 dev dummy17
ip next add id 18 via 1.1.1.18 dev dummy18
ip next add id 19 via 1.1.1.19 dev dummy19

ip next add id 1111 group 1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19
ip next del id 1111

Fixes: 430a049190de ("nexthop: Add support for nexthop groups")
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Andrey Ignatov [Fri, 24 Jan 2020 22:41:42 +0000 (14:41 -0800)]

tools/bpf: Allow overriding llvm tools for runqslower

tools/testing/selftests/bpf/Makefile supports overriding clang, llc and
other tools so that custom ones can be used instead of those from PATH.
It's convinient and heavily used by some users.

Apply same rules to runqslower/Makefile.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200124224142.1833678-1-rdna@fb.com

commit | commitdiff | tree

Cong Wang [Fri, 24 Jan 2020 01:27:08 +0000 (17:27 -0800)]

net_sched: walk through all child classes in tc_bind_tclass()

In a complex TC class hierarchy like this:

tc qdisc add dev eth0 root handle 1:0 cbq bandwidth 100Mbit         \
  avpkt 1000 cell 8
tc class add dev eth0 parent 1:0 classid 1:1 cbq bandwidth 100Mbit  \
  rate 6Mbit weight 0.6Mbit prio 8 allot 1514 cell 8 maxburst 20      \
  avpkt 1000 bounded

tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 match ip \
  sport 80 0xffff flowid 1:3
tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 match ip \
  sport 25 0xffff flowid 1:4

tc class add dev eth0 parent 1:1 classid 1:3 cbq bandwidth 100Mbit  \
  rate 5Mbit weight 0.5Mbit prio 5 allot 1514 cell 8 maxburst 20      \
  avpkt 1000
tc class add dev eth0 parent 1:1 classid 1:4 cbq bandwidth 100Mbit  \
  rate 3Mbit weight 0.3Mbit prio 5 allot 1514 cell 8 maxburst 20      \
  avpkt 1000

where filters are installed on qdisc 1:0, so we can't merely
search from class 1:1 when creating class 1:3 and class 1:4. We have
to walk through all the child classes of the direct parent qdisc.
Otherwise we would miss filters those need reverse binding.

Fixes: 07d79fc7d94e ("net_sched: add reverse binding for tc class")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Cong Wang [Fri, 24 Jan 2020 00:26:18 +0000 (16:26 -0800)]

net_sched: fix ops->bind_class() implementations

The current implementations of ops->bind_class() are merely
searching for classid and updating class in the struct tcf_result,
without invoking either of cl_ops->bind_tcf() or
cl_ops->unbind_tcf(). This breaks the design of them as qdisc's
like cbq use them to count filters too. This is why syzbot triggered
the warning in cbq_destroy_class().

In order to fix this, we have to call cl_ops->bind_tcf() and
cl_ops->unbind_tcf() like the filter binding path. This patch does
so by refactoring out two helper functions __tcf_bind_filter()
and __tcf_unbind_filter(), which are lockless and accept a Qdisc
pointer, then teaching each implementation to call them correctly.

Note, we merely pass the Qdisc pointer as an opaque pointer to
each filter, they only need to pass it down to the helper
functions without understanding it at all.

Fixes: 07d79fc7d94e ("net_sched: add reverse binding for tc class")
Reported-and-tested-by: syzbot+0a0596220218fcb603a8@syzkaller.appspotmail.com
Reported-and-tested-by: syzbot+63bdb6006961d8c917c6@syzkaller.appspotmail.com
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Mon, 27 Jan 2020 09:17:15 +0000 (10:17 +0100)]

Merge git://git./linux/kernel/git/pablo/nf-next

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

This batch contains Netfilter updates for net-next:

1) Add nft_setelem_parse_key() helper function.

2) Add NFTA_SET_ELEM_KEY_END to specify a range with one single element.

3) Add NFTA_SET_DESC_CONCAT to describe the set element concatenation,
   from Stefano Brivio.

4) Add bitmap_cut() to copy n-bits from source to destination,
   from Stefano Brivio.

5) Add set to match on arbitrary concatenations, from Stefano Brivio.

6) Add selftest for this new set type. An extract of Stefano's
   description follows:

"Existing nftables set implementations allow matching entries with
interval expressions (rbtree), e.g. 192.0.2.1-192.0.2.4, entries
specifying field concatenation (hash, rhash), e.g. 192.0.2.1:22,
but not both.

In other words, none of the set types allows matching on range
expressions for more than one packet field at a time, such as ipset
does with types bitmap:ip,mac, and, to a more limited extent
(netmasks, not arbitrary ranges), with types hash:net,net,
hash:net,port, hash:ip,port,net, and hash:net,port,net.

As a pure hash-based approach is unsuitable for matching on ranges,
and "proxying" the existing red-black tree type looks impractical as
elements would need to be shared and managed across all employed
trees, this new set implementation intends to fill the functionality
gap by employing a relatively novel approach.

The fundamental idea, illustrated in deeper detail in patch 5/9, is to
use lookup tables classifying a small number of grouped bits from each
field, and map the lookup results in a way that yields a verdict for
the full set of specified fields.

The grouping bit aspect is loosely inspired by the Grouper algorithm,
by Jay Ligatti, Josh Kuhn, and Chris Gage (see patch 5/9 for the full
reference).

A reference, stand-alone implementation of the algorithm itself is
available at:
        https://pipapo.lameexcu.se

Some notes about possible future optimisations are also mentioned
there. This algorithm reduces the matching problem to, essentially,
a repetitive sequence of simple bitwise operations, and is
particularly suitable to be optimised by leveraging SIMD instruction
sets."
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Stefano Brivio [Tue, 21 Jan 2020 23:17:56 +0000 (00:17 +0100)]

selftests: netfilter: Introduce tests for sets with range concatenation

This test covers functionality and stability of the newly added
nftables set implementation supporting concatenation of ranged
fields.

For some selected set expression types, test:
- correctness, by checking that packets match or don't
- concurrency, by attempting races between insertion, deletion, lookup
- timeout feature, checking that packets don't match expired entries

and (roughly) estimate matching rates, comparing to baselines for
simple drop on netdev ingress hook and for hash and rbtrees sets.

In order to send packets, this needs one of sendip, netcat or bash.
To flood with traffic, iperf3, iperf and netperf are supported. For
performance measurements, this relies on the sample pktgen script
pktgen_bench_xmit_mode_netif_receive.sh.

If none of the tools suitable for a given test are available, specific
tests will be skipped.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

commit | commitdiff | tree

Stefano Brivio [Tue, 21 Jan 2020 23:17:55 +0000 (00:17 +0100)]

nf_tables: Add set type for arbitrary concatenation of ranges

This new set type allows for intervals in concatenated fields,
which are expressed in the usual way, that is, simple byte
concatenation with padding to 32 bits for single fields, and
given as ranges by specifying start and end elements containing,
each, the full concatenation of start and end values for the
single fields.

Ranges are expanded to composing netmasks, for each field: these
are inserted as rules in per-field lookup tables. Bits to be
classified are divided in 4-bit groups, and for each group, the
lookup table contains 4^2 buckets, representing all the possible
values of a bit group. This approach was inspired by the Grouper
algorithm:
http://www.cse.usf.edu/~ligatti/projects/grouper/

Matching is performed by a sequence of AND operations between
bucket values, with buckets selected according to the value of
packet bits, for each group. The result of this sequence tells
us which rules matched for a given field.

In order to concatenate several ranged fields, per-field rules
are mapped using mapping arrays, one per field, that specify
which rules should be considered while matching the next field.
The mapping array for the last field contains a reference to
the element originally inserted.

The notes in nft_set_pipapo.c cover the algorithm in deeper
detail.

A pure hash-based approach is of no use here, as ranges need
to be classified. An implementation based on "proxying" the
existing red-black tree set type, creating a tree for each
field, was considered, but deemed impractical due to the fact
that elements would need to be shared between trees, at least
as long as we want to keep UAPI changes to a minimum.

A stand-alone implementation of this algorithm is available at:
https://pipapo.lameexcu.se
together with notes about possible future optimisations
(in pipapo.c).

This algorithm was designed with data locality in mind, and can
be highly optimised for SIMD instruction sets, as the bulk of
the matching work is done with repetitive, simple bitwise
operations.

At this point, without further optimisations, nft_concat_range.sh
reports, for one AMD Epyc 7351 thread (2.9GHz, 512 KiB L1D$, 8 MiB
L2$):

TEST: performance
  net,port                                                      [ OK ]
    baseline (drop from netdev hook):              10190076pps
    baseline hash (non-ranged entries):             6179564pps
    baseline rbtree (match on first field only):    2950341pps
    set with  1000 full, ranged entries:            2304165pps
  port,net                                                      [ OK ]
    baseline (drop from netdev hook):              10143615pps
    baseline hash (non-ranged entries):             6135776pps
    baseline rbtree (match on first field only):    4311934pps
    set with   100 full, ranged entries:            4131471pps
  net6,port                                                     [ OK ]
    baseline (drop from netdev hook):               9730404pps
    baseline hash (non-ranged entries):             4809557pps
    baseline rbtree (match on first field only):    1501699pps
    set with  1000 full, ranged entries:            1092557pps
  port,proto                                                    [ OK ]
    baseline (drop from netdev hook):              10812426pps
    baseline hash (non-ranged entries):             6929353pps
    baseline rbtree (match on first field only):    3027105pps
    set with 30000 full, ranged entries:             284147pps
  net6,port,mac                                                 [ OK ]
    baseline (drop from netdev hook):               9660114pps
    baseline hash (non-ranged entries):             3778877pps
    baseline rbtree (match on first field only):    3179379pps
    set with    10 full, ranged entries:            2082880pps
  net6,port,mac,proto                                           [ OK ]
    baseline (drop from netdev hook):               9718324pps
    baseline hash (non-ranged entries):             3799021pps
    baseline rbtree (match on first field only):    1506689pps
    set with  1000 full, ranged entries:             783810pps
  net,mac                                                       [ OK ]
    baseline (drop from netdev hook):              10190029pps
    baseline hash (non-ranged entries):             5172218pps
    baseline rbtree (match on first field only):    2946863pps
    set with  1000 full, ranged entries:            1279122pps

v4:
- fix build for 32-bit architectures: 64-bit division needs
   div_u64() (kbuild test robot <lkp@intel.com>)
v3:
- rework interface for field length specification,
   NFT_SET_SUBKEY disappears and information is stored in
   description
- remove scratch area to store closing element of ranges,
   as elements now come with an actual attribute to specify
   the upper range limit (Pablo Neira Ayuso)
- also remove pointer to 'start' element from mapping table,
   closing key is now accessible via extension data
- use bytes right away instead of bits for field lengths,
   this way we can also double the inner loop of the lookup
   function to take care of upper and lower bits in a single
   iteration (minor performance improvement)
- make it clearer that set operations are actually atomic
   API-wise, but we can't e.g. implement flush() as one-shot
   action
- fix type for 'dup' in nft_pipapo_insert(), check for
   duplicates only in the next generation, and in general take
   care of differentiating generation mask cases depending on
   the operation (Pablo Neira Ayuso)
- report C implementation matching rate in commit message, so
   that AVX2 implementation can be compared (Pablo Neira Ayuso)
v2:
- protect access to scratch maps in nft_pipapo_lookup() with
   local_bh_disable/enable() (Florian Westphal)
- drop rcu_read_lock/unlock() from nft_pipapo_lookup(), it's
   already implied (Florian Westphal)
- explain why partial allocation failures don't need handling
   in pipapo_realloc_scratch(), rename 'm' to clone and update
   related kerneldoc to make it clear we're not operating on
   the live copy (Florian Westphal)
- add expicit check for priv->start_elem in
   nft_pipapo_insert() to avoid ending up in nft_pipapo_walk()
   with a NULL start element, and also zero it out in every
   operation that might make it invalid, so that insertion
   doesn't proceed with an invalid element (Florian Westphal)

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

commit | commitdiff | tree

Stefano Brivio [Tue, 21 Jan 2020 23:17:54 +0000 (00:17 +0100)]

bitmap: Introduce bitmap_cut(): cut bits and shift remaining

The new bitmap function bitmap_cut() copies bits from source to
destination by removing the region specified by parameters first
and cut, and remapping the bits above the cut region by right
shifting them.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

commit | commitdiff | tree

Stefano Brivio [Tue, 21 Jan 2020 23:17:53 +0000 (00:17 +0100)]

netfilter: nf_tables: Support for sets with multiple ranged fields

Introduce a new nested netlink attribute, NFTA_SET_DESC_CONCAT, used
to specify the length of each field in a set concatenation.

This allows set implementations to support concatenation of multiple
ranged items, as they can divide the input key into matching data for
every single field. Such set implementations would be selected as
they specify support for NFT_SET_INTERVAL and allow desc->field_count
to be greater than one. Explicitly disallow this for nft_set_rbtree.

In order to specify the interval for a set entry, userspace would
include in NFTA_SET_DESC_CONCAT attributes field lengths, and pass
range endpoints as two separate keys, represented by attributes
NFTA_SET_ELEM_KEY and NFTA_SET_ELEM_KEY_END.

While at it, export the number of 32-bit registers available for
packet matching, as nftables will need this to know the maximum
number of field lengths that can be specified.

For example, "packets with an IPv4 address between 192.0.2.0 and
192.0.2.42, with destination port between 22 and 25", can be
expressed as two concatenated elements:

  NFTA_SET_ELEM_KEY:            192.0.2.0 . 22
  NFTA_SET_ELEM_KEY_END:        192.0.2.42 . 25

and NFTA_SET_DESC_CONCAT attribute would contain:

  NFTA_LIST_ELEM
    NFTA_SET_FIELD_LEN: 4
  NFTA_LIST_ELEM
    NFTA_SET_FIELD_LEN: 2

v4: No changes
v3: Complete rework, NFTA_SET_DESC_CONCAT instead of NFTA_SET_SUBKEY
v2: No changes

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

commit | commitdiff | tree

Pablo Neira Ayuso [Tue, 21 Jan 2020 23:17:52 +0000 (00:17 +0100)]

netfilter: nf_tables: add NFTA_SET_ELEM_KEY_END attribute

Add NFTA_SET_ELEM_KEY_END attribute to convey the closing element of the
interval between kernel and userspace.

This patch also adds the NFT_SET_EXT_KEY_END extension to store the
closing element value in this interval.

v4: No changes
v3: New patch

[sbrivio: refactor error paths and labels; add corresponding
nft_set_ext_type for new key; rebase]
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

commit | commitdiff | tree

Pablo Neira Ayuso [Tue, 21 Jan 2020 23:17:51 +0000 (00:17 +0100)]

netfilter: nf_tables: add nft_setelem_parse_key()

Add helper function to parse the set element key netlink attribute.

v4: No changes
v3: New patch

[sbrivio: refactor error paths and labels; use NFT_DATA_VALUE_MAXLEN
instead of sizeof(*key) in helper, value can be longer than that;
rebase]
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

commit | commitdiff | tree

Kalle Valo [Sun, 26 Jan 2020 15:54:46 +0000 (17:54 +0200)]

Merge ath-next from git://git./linux/kernel/git/kvalo/ath.git

ath.git patches for v5.6. Major changes:

ar5523

* add support for SMCWUSBT-G2 USB device

commit | commitdiff | tree

Colin Ian King [Sun, 26 Jan 2020 00:09:54 +0000 (00:09 +0000)]

iwlegacy: ensure loop counter addr does not wrap and cause an infinite loop

The loop counter addr is a u16 where as the upper limit of the loop
is an int. In the unlikely event that the il->cfg->eeprom_size is
greater than 64K then we end up with an infinite loop since addr will
wrap around an never reach upper loop limit. Fix this by making addr
an int.

Addresses-Coverity: ("Infinite loop")
Fixes: be663ab67077 ("iwlwifi: split the drivers for agn and legacy devices 3945/4965")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Stanislaw Gruszka <stf_xl@wp.pl>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>

commit | commitdiff | tree

Colin Ian King [Wed, 22 Jan 2020 09:33:40 +0000 (09:33 +0000)]

rtlwifi: btcoex: fix spelling mistake "initilized" -> "initialized"

There is a spelling mistake in one of the fields in the btc_coexist struct,
fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>

commit | commitdiff | tree

YueHaibing [Tue, 21 Jan 2020 03:04:28 +0000 (11:04 +0800)]

rtlwifi: rtl8723ae: remove unused variables

drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c:16:18:
warning: ofdmswing_table defined but not used [-Wunused-const-variable=]
drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c:56:17:
warning: cckswing_table_ch1ch13 defined but not used [-Wunused-const-variable=]
drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c:92:17:
warning: cckswing_table_ch14 defined but not used [-Wunused-const-variable=]

These variable is never used, so remove them.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>

commit | commitdiff | tree

YueHaibing [Tue, 21 Jan 2020 02:19:24 +0000 (10:19 +0800)]

rtlwifi: rtl8192ee: remove unused variables

drivers/net/wireless/realtek/rtlwifi/rtl8192ee/dm.c:15:18:
warning: ofdmswing_table defined but not used [-Wunused-const-variable=]
drivers/net/wireless/realtek/rtlwifi/rtl8192ee/dm.c:61:17:
warning: cckswing_table_ch1ch13 defined but not used [-Wunused-const-variable=]
drivers/net/wireless/realtek/rtlwifi/rtl8192ee/dm.c:97:17:
warning: cckswing_table_ch14 defined but not used [-Wunused-const-variable=]

These variable is never used, so remove them.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>

commit | commitdiff | tree

YueHaibing [Tue, 21 Jan 2020 02:09:58 +0000 (10:09 +0800)]

rtlwifi: rtl8821ae: remove unused variables

drivers/net/wireless/realtek/rtlwifi/rtl8821ae/dm.c:142:17:
warning: cckswing_table_ch1ch13 defined but not used [-Wunused-const-variable=]
drivers/net/wireless/realtek/rtlwifi/rtl8821ae/dm.c:178:17:
warning: cckswing_table_ch14 defined but not used [-Wunused-const-variable=]
drivers/net/wireless/realtek/rtlwifi/rtl8821ae/dm.c:96:18:
warning: ofdmswing_table defined but not used [-Wunused-const-variable=]

These variable is never used, so remove them.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>

commit | commitdiff | tree

Colin Ian King [Tue, 14 Jan 2020 16:56:01 +0000 (16:56 +0000)]

rtlwifi: rtl8188ee: remove redundant assignment to variable cond

Variable cond is being assigned with a value that is never
read, it is assigned a new value later on. The assignment is
redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>

commit | commitdiff | tree

Mikhail Karpenko [Thu, 9 Jan 2020 13:17:55 +0000 (16:17 +0300)]

qtnfmac: add support for TWT responder and spatial reuse

Add support for 11ax features: TWT responder and spatial reuse.
Add separate structure for spatial reuse parameters and pass this
structure to firmware along with other parameters in start_ap
command. Pass TWT responder value to firmware. Bump qlink
protocol version.

Signed-off-by: Mikhail Karpenko <mkarpenko@quantenna.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>

commit | commitdiff | tree

Sergey Matyukevich [Thu, 9 Jan 2020 13:17:54 +0000 (16:17 +0300)]

qtnfmac: add support for STA HE rates

Add HE rates into STA info. Report HE Rx/Tx MCS if STA supports them.

Signed-off-by: Sergey Matyukevich <sergey.matyukevich.os@quantenna.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>

commit | commitdiff | tree

Sergey Matyukevich [Thu, 9 Jan 2020 13:17:53 +0000 (16:17 +0300)]

qtnfmac: control qtnfmac wireless interfaces bridging

Bridging qtnfmac interfaces is possible only if the following two
conditions are fulfilled:
- firmware advertises proper support with QLINK_HW_CAPAB_HW_BRIDGE
- kernel is built with CONFIG_NET_SWITCHDEV support

Otherwise adding qtnfmac wireless interfaces into the same bridge
should not be allowed since packets flooded by kernel may break
internal forwarding rules between interfaces.

This patch disables adding qtnfmac wireless interfaces into the
same bridge if no support is provided either by card or by kernel.

Signed-off-by: Sergey Matyukevich <sergey.matyukevich.os@quantenna.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>

commit | commitdiff | tree

Sergey Matyukevich [Thu, 9 Jan 2020 13:17:52 +0000 (16:17 +0300)]

qtnfmac: add module param to configure DFS offload

Firmware may support DFS offload. However the final decision on whether
to use it or not should be up to the user. So even if firmware supports
DFS offload, it should be enabled only if user explicitly requests it.
For this purpose introduce kernel param dfs_offload which is disabled
by default.

Signed-off-by: Sergey Matyukevich <sergey.matyukevich.os@quantenna.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>

commit | commitdiff | tree

Sergey Matyukevich [Thu, 9 Jan 2020 13:17:51 +0000 (16:17 +0300)]

qtnfmac: cleanup slave_radar access function

Currently this parameter is global, it is not specific to mac.
So this function does not need any input parameters.

Signed-off-by: Sergey Matyukevich <sergey.matyukevich.os@quantenna.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>

commit | commitdiff | tree

yuehaibing [Wed, 8 Jan 2020 13:57:48 +0000 (21:57 +0800)]

brcmfmac: Remove always false 'idx < 0' statement

idx is declared as u32, it will never less than 0.

Signed-off-by: yuehaibing <yuehaibing@huawei.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>

commit | commitdiff | tree

Yan-Hsuan Chuang [Tue, 7 Jan 2020 14:27:29 +0000 (22:27 +0800)]

rtw88: use shorter delay time to poll PS state

When TX packet arrives, driver should leave deep PS state to make
sure the DMA is working. After requested to leave deep PS state,
driver needs to poll the PS state to check if the mode has been
changed successfully. The driver used to check the state of the
hardware every 20 msecs, which means upon the first failure of
state check, the CPU is delayed 20 msecs for next check. This is
harmful for some time-sensitive applications such as media players.

So, use shorter delay time each check from 20 msecs to 100 usecs.
The state should be changed in several tries. But we still need
to reserve ~15 msecs in total in case of the state just took too
long to be changed successfully. If the states of driver and the
hardware is not synchronized, the power state could be locked
forever, which mean we could never enter/leave the PS state.

Signed-off-by: Yan-Hsuan Chuang <yhchuang@realtek.com>
Reviewed-by: Chris Chiu <chiu@endlessm.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>

commit | commitdiff | tree

Yan-Hsuan Chuang [Tue, 7 Jan 2020 08:08:07 +0000 (16:08 +0800)]

rtw88: fix potential NULL skb access in TX ISR

Sometimes the TX queue may be empty and we could possible
dequeue a NULL pointer, crash the kernel. If the skb is NULL
then there is nothing to do, just leave the ISR.

And the TX queue should not be empty here, so print an error
to see if there is anything wrong for DMA ring.

Fixes: e3037485c68e ("rtw88: new Realtek 802.11ac driver")
Signed-off-by: Yan-Hsuan Chuang <yhchuang@realtek.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>

commit | commitdiff | tree

Rafał Miłecki [Thu, 26 Dec 2019 13:30:50 +0000 (14:30 +0100)]

brcmfmac: add initial support for monitor mode

Report monitor interface availability using cfg80211 and support it in
the add_virtual_intf() and del_virtual_intf() callbacks. This new
feature is conditional and depends on firmware flagging monitor packets.
Receiving monitor frames is already handled by the brcmf_netif_mon_rx().

Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>

John Crispins staging tree

RSS Atom