openwrt/staging/blogic.git
6 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
David S. Miller [Tue, 7 Aug 2018 18:02:05 +0000 (11:02 -0700)]
Merge git://git./linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2018-08-07

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Add cgroup local storage for BPF programs, which provides a fast
   accessible memory for storing various per-cgroup data like number
   of transmitted packets, etc, from Roman.

2) Support bpf_get_socket_cookie() BPF helper in several more program
   types that have a full socket available, from Andrey.

3) Significantly improve the performance of perf events which are
   reported from BPF offload. Also convert a couple of BPF AF_XDP
   samples overto use libbpf, both from Jakub.

4) seg6local LWT provides the End.DT6 action, which allows to
   decapsulate an outer IPv6 header containing a Segment Routing Header.
   Adds this action now to the seg6local BPF interface, from Mathieu.

5) Do not mark dst register as unbounded in MOV64 instruction when
   both src and dst register are the same, from Arthur.

6) Define u_smp_rmb() and u_smp_wmb() to their respective barrier
   instructions on arm64 for the AF_XDP sample code, from Brian.

7) Convert the tcp_client.py and tcp_server.py BPF selftest scripts
   over from Python 2 to Python 3, from Jeremy.

8) Enable BTF build flags to the BPF sample code Makefile, from Taeung.

9) Remove an unnecessary rcu_read_lock() in run_lwt_bpf(), from Taehee.

10) Several improvements to the README.rst from the BPF documentation
    to make it more consistent with RST format, from Tobin.

11) Replace all occurrences of strerror() by calls to strerror_r()
    in libbpf and fix a FORTIFY_SOURCE build error along with it,
    from Thomas.

12) Fix a bug in bpftool's get_btf() function to correctly propagate
    an error via PTR_ERR(), from Yue.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoieee802154: hwsim: fix rcu address annotation
Alexander Aring [Tue, 7 Aug 2018 14:34:44 +0000 (10:34 -0400)]
ieee802154: hwsim: fix rcu address annotation

This patch fixes the following sparse warning about mismatch rcu
attribute for address space annotation:

...
error: incompatible types in comparison expression (different modifiers)
error: incompatible types in comparison expression (different address spaces)
...

Some __rcu annotation was at non-pointers list head structures and one was
missing in edge information which is used by rcu_assign_pointer() to
update edge setting information.

Cc: Stefan Schmidt <stefan@datenfreihafen.org>
Fixes: f25da51fdc38 ("ieee802154: hwsim: add replacement for fakelb")
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobpf: introduce update_effective_progs()
Roman Gushchin [Mon, 6 Aug 2018 21:27:28 +0000 (14:27 -0700)]
bpf: introduce update_effective_progs()

__cgroup_bpf_attach() and __cgroup_bpf_detach() functions have
a good amount of duplicated code, which is possible to eliminate
by introducing the update_effective_progs() helper function.

The update_effective_progs() calls compute_effective_progs()
and then in case of success it calls activate_effective_progs()
for each descendant cgroup. In case of failure (OOM), it releases
allocated prog arrays and return the error code.

Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
6 years agoMerge branch 'ieee802154-for-davem-2018-08-06' of git://git.kernel.org/pub/scm/linux...
David S. Miller [Mon, 6 Aug 2018 20:17:48 +0000 (13:17 -0700)]
Merge branch 'ieee802154-for-davem-2018-08-06' of git://git./linux/kernel/git/sschmidt/wpan-next

Stefan Schmidt says:

====================
pull-request: ieee802154-next 2018-08-06

An update from ieee802154 for *net-next*

Romuald added a socket option to get the LQI value of the received datagram.
Alexander added a new hardware simulation driver modelled after hwsim of the
wireless people. It allows runtime configuration for new nodes and edges over a
netlink interface (a config utlity is making its way into wpan-tools).
We also have three fixes in here. One from Colin which is more of a cleanup and
two from Alex fixing tailroom and single frame space problems.
I would normally put the last two into my fixes tree, but given we are already
in -rc8 I simply put them here and added a cc: stable to them.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoipv4: frags: precedence bug in ip_expire()
Dan Carpenter [Mon, 6 Aug 2018 19:17:35 +0000 (22:17 +0300)]
ipv4: frags: precedence bug in ip_expire()

We accidentally removed the parentheses here, but they are required
because '!' has higher precedence than '&'.

Fixes: fa0f527358bd ("ip: use rb trees for IP frag queue.")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoptp_qoriq: use div_u64/div_u64_rem for 64-bit division
Yangbo Lu [Mon, 6 Aug 2018 04:39:11 +0000 (12:39 +0800)]
ptp_qoriq: use div_u64/div_u64_rem for 64-bit division

This is a fix-up patch for below build issue with multi_v7_defconfig.

drivers/ptp/ptp_qoriq.o: In function `qoriq_ptp_probe':
ptp_qoriq.c:(.text+0xd0c): undefined reference to `__aeabi_uldivmod'

Fixes: 91305f281262 ("ptp_qoriq: support automatic configuration for ptp timer")
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: avoid unnecessary sock_flag() check when enable timestamp
Yafang Shao [Mon, 6 Aug 2018 03:57:02 +0000 (11:57 +0800)]
net: avoid unnecessary sock_flag() check when enable timestamp

The sock_flag() check is alreay inside sock_enable_timestamp(), so it is
unnecessary checking it in the caller.

    void sock_enable_timestamp(struct sock *sk, int flag)
    {
        if (!sock_flag(sk, flag)) {
            ...
        }
    }

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agovhost: switch to use new message format
Jason Wang [Mon, 6 Aug 2018 03:17:47 +0000 (11:17 +0800)]
vhost: switch to use new message format

We use to have message like:

struct vhost_msg {
int type;
union {
struct vhost_iotlb_msg iotlb;
__u8 padding[64];
};
};

Unfortunately, there will be a hole of 32bit in 64bit machine because
of the alignment. This leads a different formats between 32bit API and
64bit API. What's more it will break 32bit program running on 64bit
machine.

So fixing this by introducing a new message type with an explicit
32bit reserved field after type like:

struct vhost_msg_v2 {
__u32 type;
__u32 reserved;
union {
struct vhost_iotlb_msg iotlb;
__u8 padding[64];
};
};

We will have a consistent ABI after switching to use this. To enable
this capability, introduce a new ioctl (VHOST_SET_BAKCEND_FEATURE) for
userspace to enable this feature (VHOST_BACKEND_F_IOTLB_V2).

Fixes: 6b1e6cc7855b ("vhost: new device IOTLB API")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/bridge/br_multicast: remove redundant variable "err"
zhong jiang [Mon, 6 Aug 2018 03:07:23 +0000 (11:07 +0800)]
net/bridge/br_multicast: remove redundant variable "err"

The err is not modified after initalization, So remove it and make
it to be void function.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomellanox: fix the dport endianness in call of __inet6_lookup_established()
Al Viro [Sat, 4 Aug 2018 20:41:27 +0000 (21:41 +0100)]
mellanox: fix the dport endianness in call of __inet6_lookup_established()

__inet6_lookup_established() expect th->dport passed in host-endian,
not net-endian.  The reason is microoptimization in __inet6_lookup(),
but if you use the lower-level helpers, you have to play by their
rules...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoieee802154: fakelb: add deprecated msg while probe
Alexander Aring [Sat, 14 Jul 2018 16:33:06 +0000 (12:33 -0400)]
ieee802154: fakelb: add deprecated msg while probe

Since mac802154_hwsim the fakelb driver will get deprecated. This patch will
notifier all users of fakelb to switch to the new mac802154_hwsim driver.

Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
6 years agoieee802154: hwsim: add replacement for fakelb
Alexander Aring [Sat, 14 Jul 2018 16:33:05 +0000 (12:33 -0400)]
ieee802154: hwsim: add replacement for fakelb

This patch adds a new virtual driver mac802154_hwsim which is based on
the fakelb driver.
The fakelb driver will get deprecated and hopefully removed someday.
The main reason for doing this step is to rename the driver to
mac802154_hwsim to have a similar naming scheme as mac80211_hwsim,
which is more popular in the 802.11 wireless word and the idea is the
same behind this driver.

The new features of this driver are to have knowledge about connected
edges, which can be changed during runtime. This offers a testing
environment for routing protocols e.g. RPL.
The default behaviour is still as fakelb: two radios connected to each
other. New added radios during runtime will not be connected to other
wpan_hwsim instances.

The netlink api is not namespace aware on purpose, only the registered
wpan_phy's can be moved to namespaces. The physical layer according to
wiresless "air" communication can be handled across namespaces.

Furthermore the edges can be weighted with the LQI value according IEEE
802.15.4 which offers additional handling to mark bad or good connection
indicators to other connected virtual phys.

Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
6 years agonet: ieee802154: 6lowpan: remove redundant pointers 'fq' and 'net'
Colin Ian King [Tue, 31 Jul 2018 15:45:07 +0000 (16:45 +0100)]
net: ieee802154: 6lowpan: remove redundant pointers 'fq' and 'net'

Pointers fq and net are being assigned but are never used hence they
are redundant and can be removed.

Cleans up clang warnings:
warning: variable 'fq' set but not used [-Wunused-but-set-variable]
warning: variable 'net' set but not used [-Wunused-but-set-variable]

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
6 years agonet: mac802154: tx: expand tailroom if necessary
Alexander Aring [Mon, 2 Jul 2018 20:32:03 +0000 (16:32 -0400)]
net: mac802154: tx: expand tailroom if necessary

This patch is necessary if case of AF_PACKET or other socket interface
which I am aware of it and didn't allocated the necessary room.

Reported-by: David Palma <david.palma@ntnu.no>
Reported-by: Rabi Narayan Sahoo <rabinarayans0828@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
6 years agonet: 6lowpan: fix reserved space for single frames
Alexander Aring [Sat, 14 Jul 2018 16:52:10 +0000 (12:52 -0400)]
net: 6lowpan: fix reserved space for single frames

This patch fixes patch add handling to take care tail and headroom for
single 6lowpan frames. We need to be sure we have a skb with the right
head and tailroom for single frames. This patch do it by using
skb_copy_expand() if head and tailroom is not enough allocated by upper
layer.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=195059
Reported-by: David Palma <david.palma@ntnu.no>
Reported-by: Rabi Narayan Sahoo <rabinarayans0828@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
6 years agoMerge remote-tracking branch 'net-next/master'
Stefan Schmidt [Mon, 6 Aug 2018 07:04:48 +0000 (09:04 +0200)]
Merge remote-tracking branch 'net-next/master'

6 years agotc-testing: remove duplicate spaces in skbedit match patterns
Vlad Buslov [Sun, 5 Aug 2018 19:37:09 +0000 (22:37 +0300)]
tc-testing: remove duplicate spaces in skbedit match patterns

Match patterns for some skbedit tests contain duplicate whitespace that is
not present in actual tc output. This causes tests to fail because they
can't match required action, even when it was successfully created.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotc-testing: remove duplicate spaces in connmark match patterns
Vlad Buslov [Sun, 5 Aug 2018 19:36:44 +0000 (22:36 +0300)]
tc-testing: remove duplicate spaces in connmark match patterns

Match patterns for some connmark tests contain duplicate whitespace that is
not present in actual tc output. This causes tests to fail because they
can't match required action, even when it was successfully created.

Fixes: 1dad0f9ffff7 ("tc-testing: add connmark action tests")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotc-testing: flush gact actions on test teardown
Vlad Buslov [Sun, 5 Aug 2018 19:36:25 +0000 (22:36 +0300)]
tc-testing: flush gact actions on test teardown

Test 6fb4 creates one mirred and one pipe action, but only flushes mirred
on teardown. Leaking pipe action causes failures in other tests.

Add additional teardown command to also flush gact actions.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotc-testing: fix ip address in u32 test
Vlad Buslov [Sun, 5 Aug 2018 19:35:56 +0000 (22:35 +0300)]
tc-testing: fix ip address in u32 test

Fix expected ip address to actually match configured ip address.
Fix test to expect single matched filter.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge tag 'wireless-drivers-next-for-davem-2018-08-05' of git://git.kernel.org/pub...
David S. Miller [Mon, 6 Aug 2018 00:36:01 +0000 (17:36 -0700)]
Merge tag 'wireless-drivers-next-for-davem-2018-08-05' of git://git./linux/kernel/git/kvalo/wireless-drivers-next

Kalle Valo says:

====================
wireless-drivers-next patches for 4.19

This time a bigger pull request as we have two new Mediatek drivers
MT76x2u (CONFIG_MT76x2U) and MT76x0U (CONFIG_MT76x0U). Also iwlwifi got
support for the new IEEE 802.11ax standard, the successor for
802.11ac. And naturally smaller new features and bugfixes all over.

Major changes:

wcn36xx

* fix WEP in client mode

wil6210

* add support for Talyn-MB (Talyn ver 2.0) device

* add support for enhanced DMA firmware feature

iwlwifi

* implement 802.11ax D2.0

* support for the new 22560 device family

* new PCI IDs for 22000 and 22560

qtnfmac

* implement cfg80211 power management callback

* enable multiple SSIDs scan support

* qtnfmac: implement basic WoWLAN support

mt7601u

* fall back to software encryption for hw unsupported ciphers

* enable 802.11 Management Frame Protection (MFP)

mt76

* support setting RTS threshold

* add USB support

* add support for MT76x2u devices

* add support for MT76x0U devices

mwifiex

* allow user space to set all other IEs except WMM IE

rsi

* add firmware support for AP+BT dual mode
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetoot...
David S. Miller [Mon, 6 Aug 2018 00:29:27 +0000 (17:29 -0700)]
Merge branch 'for-upstream' of git://git./linux/kernel/git/bluetooth/bluetooth-next

Johan Hedberg says:

====================
pull request: bluetooth-next 2018-08-05

Here's the main bluetooth-next pull request for the 4.19 kernel.

 - Added support for Bluetooth Advertising Extensions
 - Added vendor driver support to hci_h5 HCI driver
 - Added serdev support to hci_h5 driver
 - Added support for Qualcomm wcn3990 controller
 - Added support for RTL8723BS and RTL8723DS controllers
 - btusb: Added new ID for Realtek 8723DE
 - Several other smaller fixes & cleanups

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'mlxsw-Enable-MC-aware-mode-for-mlxsw-ports'
David S. Miller [Mon, 6 Aug 2018 00:28:22 +0000 (17:28 -0700)]
Merge branch 'mlxsw-Enable-MC-aware-mode-for-mlxsw-ports'

Ido Schimmel says:

====================
mlxsw: Enable MC-aware mode for mlxsw ports

Petr says:

Due to an issue in Spectrum chips, when unicast traffic shares the same
queue as BUM traffic, and there is a congestion, the BUM traffic is
admitted to the queue anyway, thus pushing out all UC traffic. In order
to give unicast traffic precedence over BUM traffic, configure
multicast-aware mode on all ports.

Under multicast-aware regime, when assigning traffic class to a packet,
the switch doesn't merely take the value prescribed by the QTCT
register. For BUM traffic, it instead assigns that value plus 8. That
limits the number of available TCs, but since mlxsw currently only uses
the lower eight anyway, it is no real loss.

The two TCs (UC and MC one) are then mapped to the same subgroup and
strictly prioritized so that UC traffic is preferred in case of
congestion.

In patch #1, introduce a new register, QTCTM, which enables the
multicast-aware mode.

In patch #2, fix a typo in related code.

In patch #3, set up TCs and QTCTM to enable multicast-aware mode.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: spectrum: Configure MC-aware mode on mlxsw ports
Petr Machata [Sun, 5 Aug 2018 06:03:08 +0000 (09:03 +0300)]
mlxsw: spectrum: Configure MC-aware mode on mlxsw ports

In order to give unicast traffic precedence over BUM traffic, configure
multicast-aware mode on all ports.

Under multicast-aware regime, when assigning traffic class to a packet,
the switch doesn't merely take the value prescribed by the QTCT
register. For BUM traffic, it instead assigns that value plus 8.

ETS elements for TCs 8..15 thus need to be configured as well. Extend
mlxsw_sp_port_ets_init() so that it maps each of them to the same
subgroup as their corresponding TC from the range 0..7, such that TCs X
and X+8 map to the same subgroup.

The existing code configures TCs with strict priority. So far this was
immaterial, because each TC had its own subgroup. Now that two TCs share
a subgroup it becomes important. TCs are prioritized in order of 7, 6,
..., 0, 15, 14, ..., 8: the higher TCs used for BUM traffic end up being
deprioritized. Since that's what's needed, keep that configuration as it
is, and configure the new TCs likewise.

Finally in mlxsw_sp_port_create(), invoke configuration of QTCTM to
enable MC-aware mode on each port.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: spectrum: Fix a typo
Petr Machata [Sun, 5 Aug 2018 06:03:07 +0000 (09:03 +0300)]
mlxsw: spectrum: Fix a typo

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: reg: Add QoS Switch Traffic Class Table is Multicast-Aware Register
Petr Machata [Sun, 5 Aug 2018 06:03:06 +0000 (09:03 +0300)]
mlxsw: reg: Add QoS Switch Traffic Class Table is Multicast-Aware Register

This register configures if the Switch Priority to Traffic Class mapping
is based on Multicast packet indication. If so, then multicast packets
will get a Traffic Class that is plus (cap_max_tclass_data/2) the value
configured by QTCT.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agovirtio-net: mark expected switch fall-throughs
Gustavo A. R. Silva [Sun, 5 Aug 2018 02:42:05 +0000 (21:42 -0500)]
virtio-net: mark expected switch fall-throughs

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Addresses-Coverity-ID: 1402059 ("Missing break in switch")
Addresses-Coverity-ID: 1402060 ("Missing break in switch")
Addresses-Coverity-ID: 1402061 ("Missing break in switch")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: sched: cls_flower: Fix an error code in fl_tmplt_create()
Dan Carpenter [Fri, 3 Aug 2018 19:27:55 +0000 (22:27 +0300)]
net: sched: cls_flower: Fix an error code in fl_tmplt_create()

We forgot to set the error code on this path, so we return NULL instead
of an error pointer.  In the current code kzalloc() won't fail for small
allocations so this doesn't really affect runtime.

Fixes: b95ec7eb3b4d ("net: sched: cls_flower: implement chain templates")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: check extack._msg before print
Li RongQing [Fri, 3 Aug 2018 07:45:21 +0000 (15:45 +0800)]
net: check extack._msg before print

dev_set_mtu_ext is able to fail with a valid mtu value, at that
condition, extack._msg is not set and random since it is in stack,
then kernel will crash when print it.

Fixes: 7a4c53bee3324a ("net: report invalid mtu value via netlink extack")
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoipv6: defrag: drop non-last frags smaller than min mtu
Florian Westphal [Fri, 3 Aug 2018 00:22:20 +0000 (02:22 +0200)]
ipv6: defrag: drop non-last frags smaller than min mtu

don't bother with pathological cases, they only waste cycles.
IPv6 requires a minimum MTU of 1280 so we should never see fragments
smaller than this (except last frag).

v3: don't use awkward "-offset + len"
v2: drop IPv4 part, which added same check w. IPV4_MIN_MTU (68).
    There were concerns that there could be even smaller frags
    generated by intermediate nodes, e.g. on radio networks.

Cc: Peter Oskolkov <posk@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'ip-Use-rb-trees-for-IP-frag-queue'
David S. Miller [Mon, 6 Aug 2018 00:16:46 +0000 (17:16 -0700)]
Merge branch 'ip-Use-rb-trees-for-IP-frag-queue'

Peter Oskolkov says:

====================
ip: Use rb trees for IP frag queue.

This patchset
 * changes IPv4 defrag behavior to match that of IPv6: overlapping
   fragments now cause the whole IP datagram to be discarded (suggested
   by David Miller): there are no legitimate use cases for overlapping
   fragments;
 * changes IPv4 defrag queue from a list to a rb tree (suggested
   by Eric Dumazet): this change removes a potential attach vector.

Upcoming patches will contain similar changes for IPv6 frag queue,
as well as a comprehensive IP defrag self-test (temporarily delayed).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoip: use rb trees for IP frag queue.
Peter Oskolkov [Thu, 2 Aug 2018 23:34:39 +0000 (23:34 +0000)]
ip: use rb trees for IP frag queue.

Similar to TCP OOO RX queue, it makes sense to use rb trees to store
IP fragments, so that OOO fragments are inserted faster.

Tested:

- a follow-up patch contains a rather comprehensive ip defrag
  self-test (functional)
- ran neper `udp_stream -c -H <host> -F 100 -l 300 -T 20`:
    netstat --statistics
    Ip:
        282078937 total packets received
        0 forwarded
        0 incoming packets discarded
        946760 incoming packets delivered
        18743456 requests sent out
        101 fragments dropped after timeout
        282077129 reassemblies required
        944952 packets reassembled ok
        262734239 packet reassembles failed
   (The numbers/stats above are somewhat better re:
    reassemblies vs a kernel without this patchset. More
    comprehensive performance testing TBD).

Reported-by: Jann Horn <jannh@google.com>
Reported-by: Juha-Matti Tilli <juha-matti.tilli@iki.fi>
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: modify skb_rbtree_purge to return the truesize of all purged skbs.
Peter Oskolkov [Thu, 2 Aug 2018 23:34:38 +0000 (23:34 +0000)]
net: modify skb_rbtree_purge to return the truesize of all purged skbs.

Tested: see the next patch is the series.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoip: discard IPv4 datagrams with overlapping segments.
Peter Oskolkov [Thu, 2 Aug 2018 23:34:37 +0000 (23:34 +0000)]
ip: discard IPv4 datagrams with overlapping segments.

This behavior is required in IPv6, and there is little need
to tolerate overlapping fragments in IPv4. This change
simplifies the code and eliminates potential DDoS attack vectors.

Tested: ran ip_defrag selftest (not yet available uptream).

Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/tls: Mark the end in scatterlist table
Vakul Garg [Thu, 2 Aug 2018 15:13:10 +0000 (20:43 +0530)]
net/tls: Mark the end in scatterlist table

Function zerocopy_from_iter() unmarks the 'end' in input sgtable while
adding new entries in it. The last entry in sgtable remained unmarked.
This results in KASAN error report on using apis like sg_nents(). Before
returning, the function needs to mark the 'end' in the last entry it
adds.

Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Acked-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoipv6: icmp: Updating pmtu for link local route
Georg Kohmann [Thu, 2 Aug 2018 11:56:58 +0000 (13:56 +0200)]
ipv6: icmp: Updating pmtu for link local route

When a ICMPV6_PKT_TOOBIG is received from a link local address the pmtu will
be updated on a route with an arbitrary interface index. Subsequent packets
sent back to the same link local address may therefore end up not
considering the updated pmtu.

Current behavior breaks TAHI v6LC4.1.4 Reduce PMTU On-link. Referring to RFC
1981: Section 3: "Note that Path MTU Discovery must be performed even in
cases where a node "thinks" a destination is attached to the same link as
itself. In a situation such as when a neighboring router acts as proxy [ND]
for some destination, the destination can to appear to be directly
connected but is in fact more than one hop away."

Using the interface index from the incoming ICMPV6_PKT_TOOBIG when updating
the pmtu.

Signed-off-by: Georg Kohmann <geokohma@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoptp_qoriq: support automatic configuration for ptp timer
Yangbo Lu [Wed, 1 Aug 2018 10:05:54 +0000 (18:05 +0800)]
ptp_qoriq: support automatic configuration for ptp timer

This patch is to support automatic configuration for ptp timer.
If required ptp dts properties are not provided, driver could
try to calculate a set of default configurations to initialize
the ptp timer. This makes the driver work for many boards which
don't have the required ptp dts properties in current kernel.
Also the users could set dts properties by themselves according
to their requirement.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agopowerpc/mpc85xx: add clocks property for fman ptp timer node
Yangbo Lu [Wed, 1 Aug 2018 10:05:53 +0000 (18:05 +0800)]
powerpc/mpc85xx: add clocks property for fman ptp timer node

This patch is to add clocks property for fman ptp timer node.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoarm64: dts: fsl: add clocks property for fman ptp timer node
Yangbo Lu [Wed, 1 Aug 2018 10:05:52 +0000 (18:05 +0800)]
arm64: dts: fsl: add clocks property for fman ptp timer node

This patch is to add clocks property for fman ptp timer node.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'bnxt_en-Updates-for-net-next'
David S. Miller [Mon, 6 Aug 2018 00:08:27 +0000 (17:08 -0700)]
Merge branch 'bnxt_en-Updates-for-net-next'

Michael Chan says:

====================
bnxt_en: Updates for net-next.

This series includes the usual firmware spec update.  The driver has
added external phy loopback test and phy setup retry logic that is
needed during hotplug.  In the SRIOV space, the driver has added a
new VF resource allocation mode that requires the VF driver to
reserve resources during IFUP.  IF state changes are now propagated
to firmware so that firmware can release some resources during IFDOWN.

ethtool method to get firmware core dump and hwmon temperature reading
have been added.  DSCP to user priority support has been added to
the driver's DCBNL interface, and the CoS queue logic has been refined
to make sure that the special RDMA Congestion Notification hardware CoS
queue will not be used for networking traffic.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobnxt_en: Do not use the CNP CoS queue for networking traffic.
Michael Chan [Sun, 5 Aug 2018 20:51:58 +0000 (16:51 -0400)]
bnxt_en: Do not use the CNP CoS queue for networking traffic.

The CNP CoS queue is reserved for internal RDMA Congestion Notification
Packets (CNP) and should not be used for a TC.  Modify the CoS queue
discovery code to skip over the CNP CoS queue and to reduce
bp->max_tc accordingly.  However, if RDMA is disabled in NVRAM, the
the CNP CoS queue can be used for a TC.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobnxt_en: Add DCBNL DSCP application protocol support.
Michael Chan [Sun, 5 Aug 2018 20:51:57 +0000 (16:51 -0400)]
bnxt_en: Add DCBNL DSCP application protocol support.

Expand the .ieee_setapp() and ieee_delapp() DCBNL methods to support
DSCP.  This allows DSCP values to user priority mappings instead
of using VLAN priorities.  Each DSCP mapping is added or deleted one
entry at a time using the firmware API.  The firmware call can only be
made from a PF.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobnxt_en: Add hwmon sysfs support to read temperature
Vasundhara Volam [Sun, 5 Aug 2018 20:51:56 +0000 (16:51 -0400)]
bnxt_en: Add hwmon sysfs support to read temperature

Export temperature sensor reading via hwmon sysfs.

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobnxt_en: Notify firmware about IF state changes.
Michael Chan [Sun, 5 Aug 2018 20:51:55 +0000 (16:51 -0400)]
bnxt_en: Notify firmware about IF state changes.

Use latest firmware API to notify firmware about IF state changes.
Firmware has the option to clean up resources during IF down and
to require the driver to reserve resources again during IF up.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobnxt_en: Move firmware related flags to a new fw_cap field in struct bnxt.
Michael Chan [Sun, 5 Aug 2018 20:51:54 +0000 (16:51 -0400)]
bnxt_en: Move firmware related flags to a new fw_cap field in struct bnxt.

The flags field is almost getting full.  Move firmware capability flags
to a new fw_cap field to better organize these firmware flags.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobnxt_en: Add BNXT_NEW_RM() macro.
Michael Chan [Sun, 5 Aug 2018 20:51:53 +0000 (16:51 -0400)]
bnxt_en: Add BNXT_NEW_RM() macro.

The BNXT_FLAG_NEW_RM flag is checked a lot in the code to determine if
the new resource manager is in effect.  Define a macro to perform
this check.

Signed-off-by: Michael Chan <michael.chan@broadocm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobnxt_en: Add support for ethtool get dump.
Vasundhara Volam [Sun, 5 Aug 2018 20:51:52 +0000 (16:51 -0400)]
bnxt_en: Add support for ethtool get dump.

Add support to collect live firmware coredump via ethtool.

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobnxt_en: Update RSS setup and GRO-HW logic according to the latest spec.
Michael Chan [Sun, 5 Aug 2018 20:51:51 +0000 (16:51 -0400)]
bnxt_en: Update RSS setup and GRO-HW logic according to the latest spec.

Set the default hash mode flag in HWRM_VNIC_RSS_CFG to signal to the
firmware that the driver is compliant with the latest spec.  With
that, the firmware can return expanded RSS profile IDs that the driver
checks to setup the proper gso_type for GRO-HW packets.  But instead
of checking for the new profile IDs, we check the IP_TYPE flag
in TPA_START which is more straight forward than checking a list of
profile IDs.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobnxt_en: Add new VF resource allocation strategy mode.
Michael Chan [Sun, 5 Aug 2018 20:51:50 +0000 (16:51 -0400)]
bnxt_en: Add new VF resource allocation strategy mode.

The new mode is "minimal-static" to be used when resources are more
limited to support a large number of VFs, for example  The PF driver
will provision guaranteed minimum resources of 0.  Each VF has no
guranteed resources until it tries to reserve resources during device
open.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobnxt_en: Add PHY retry logic.
Michael Chan [Sun, 5 Aug 2018 20:51:49 +0000 (16:51 -0400)]
bnxt_en: Add PHY retry logic.

During hotplug, the driver's open function can be called almost
immediately after power on reset.  The PHY may not be ready and the
firmware may return failure when the driver tries to update PHY
settings.  Add retry logic fired from the driver's timer to retry
the operation for 5 seconds.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobnxt_en: Add external loopback test to ethtool selftest.
Michael Chan [Sun, 5 Aug 2018 20:51:48 +0000 (16:51 -0400)]
bnxt_en: Add external loopback test to ethtool selftest.

Add code to detect firmware support for external loopback and the extra
test entry for external loopback.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobnxt_en: Adjust timer based on ethtool stats-block-usecs settings.
Michael Chan [Sun, 5 Aug 2018 20:51:47 +0000 (16:51 -0400)]
bnxt_en: Adjust timer based on ethtool stats-block-usecs settings.

The driver gathers statistics using 2 mechanisms.  Some stats are DMA'ed
directly from hardware and others are polled from the driver's timer.
Currently, we only adjust the DMA frequency based on the ethtool
stats-block-usecs setting.  This patch adjusts the driver's timer
frequency as well to make everything consistent.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobnxt_en: Update firmware interface version to 1.9.2.25.
Michael Chan [Sun, 5 Aug 2018 20:51:46 +0000 (16:51 -0400)]
bnxt_en: Update firmware interface version to 1.9.2.25.

New interface has firmware core dump support, new extended port
statistics, and IF state change notifications to the firmware.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
David S. Miller [Sun, 5 Aug 2018 23:25:22 +0000 (16:25 -0700)]
Merge git://git./linux/kernel/git/pablo/nf-next

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for your net-next tree:

1) Support for transparent proxying for nf_tables, from Mate Eckl.

2) Patchset to add OS passive fingerprint recognition for nf_tables,
   from Fernando Fernandez. This takes common code from xt_osf and
   place it into the new nfnetlink_osf module for codebase sharing.

3) Lightweight tunneling support for nf_tables.

4) meta and lookup are likely going to be used in rulesets, make them
   direct calls. From Florian Westphal.

A bunch of incremental updates:

5) use PTR_ERR_OR_ZERO() from nft_numgen, from YueHaibing.

6) Use kvmalloc_array() to allocate hashtables, from Li RongQing.

7) Explicit dependencies between nfnetlink_cttimeout and conntrack
   timeout extensions, from Harsha Sharma.

8) Simplify NLM_F_CREATE handling in nf_tables.

9) Removed unused variable in the get element command, from
   YueHaibing.

10) Expose bridge hook priorities through uapi, from Mate Eckl.

And a few fixes for previous Netfilter batch for net-next:

11) Use per-netns mutex from flowtable event, from Florian Westphal.

12) Remove explicit dependency on iptables CT target from conntrack
    zones, from Florian.

13) Fix use-after-free in rmmod nf_conntrack path, also from Florian.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net
David S. Miller [Sun, 5 Aug 2018 20:04:31 +0000 (13:04 -0700)]
Merge ra./pub/scm/linux/kernel/git/davem/net

Lots of overlapping changes, mostly trivial in nature.

The mlxsw conflict was resolving using the example
resolution at:

https://github.com/jpirko/linux_mlxsw/blob/combined_queue/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_actions.c

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoLinux 4.18-rc8
Linus Torvalds [Sun, 5 Aug 2018 19:37:41 +0000 (12:37 -0700)]
Linux 4.18-rc8

6 years agoMerge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 5 Aug 2018 16:39:30 +0000 (09:39 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 fix from Thomas Gleixner:
 "A single fix, which addresses boot failures on machines which do not
  report EBDA correctly, which can place the trampoline into reserved
  memory regions. Validating against E820 prevents that"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot/compressed/64: Validate trampoline placement against E820

6 years agoMerge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 5 Aug 2018 16:25:29 +0000 (09:25 -0700)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull timer fixes from Thomas Gleixner:
 "Two oneliners addressing NOHZ failures:

   - Use a bitmask to check for the pending timer softirq and not the
     bit number. The existing code using the bit number checked for
     the wrong bit, which caused timers to either expire late or stop
     completely.

   - Make the nohz evaluation on interrupt exit more robust. The
     existing code did not re-arm the hardware when interrupting a
     running softirq in task context (ksoftirqd or tail of
     local_bh_enable()), which caused timers to either expire late
     or stop completely"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  nohz: Fix missing tick reprogram when interrupting an inline softirq
  nohz: Fix local_timer_softirq_pending()

6 years agoMerge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 5 Aug 2018 16:13:07 +0000 (09:13 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull perf fixes from Thomas Gleixner:
 "A set of fixes for perf:

  Kernel side:

   - Fix the hardcoded index of extra PCI devices on Broadwell which
     caused a resource conflict and triggered warnings on CPU hotplug.

  Tooling:

   - Update the tools copy of several files, including perf_event.h,
     powerpc's asm/unistd.h (new io_pgetevents syscall), bpf.h and x86's
     memcpy_64.s (used in 'perf bench mem'), silencing the respective
     warnings during the perf tools build.

   - Fix the build on the alpine:edge distro"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/uncore: Fix hardcoded index of Broadwell extra PCI devices
  perf tools: Fix the build on the alpine:edge distro
  tools arch: Update arch/x86/lib/memcpy_64.S copy used in 'perf bench mem memcpy'
  tools headers uapi: Refresh linux/bpf.h copy
  tools headers powerpc: Update asm/unistd.h copy to pick new
  tools headers uapi: Update tools's copy of linux/perf_event.h

6 years agoMerge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 5 Aug 2018 15:55:26 +0000 (08:55 -0700)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull irq fix from Thomas Gleixner:
 "A single bugfix for the irq core to prevent silent data corruption and
  malfunction of threaded interrupts under certain conditions"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq: Make force irq threading setup more robust

6 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Sun, 5 Aug 2018 15:20:39 +0000 (08:20 -0700)]
Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) Handle frames in error situations properly in AF_XDP, from Jakub
    Kicinski.

 2) tcp_mmap test case only tests ipv6 due to a thinko, fix from
    Maninder Singh.

 3) Session refcnt fix in l2tp_ppp, from Guillaume Nault.

 4) Fix regression in netlink bind handling of multicast gruops, from
    Dmitry Safonov.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  netlink: Don't shift on 64 for ngroups
  net/smc: no cursor update send in state SMC_INIT
  l2tp: fix missing refcount drop in pppol2tp_tunnel_ioctl()
  mlxsw: core_acl_flex_actions: Remove redundant mirror resource destruction
  mlxsw: core_acl_flex_actions: Remove redundant counter destruction
  mlxsw: core_acl_flex_actions: Remove redundant resource destruction
  mlxsw: core_acl_flex_actions: Return error for conflicting actions
  selftests/bpf: update test_lwt_seg6local.sh according to iproute2
  drivers: net: lmc: fix case value for target abort error
  selftest/net: fix protocol family to work for IPv4.
  net: xsk: don't return frames via the allocator on error
  tools/bpftool: fix a percpu_array map dump problem

6 years agoMerge tag 'usercopy-fix-v4.18-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 5 Aug 2018 01:34:55 +0000 (18:34 -0700)]
Merge tag 'usercopy-fix-v4.18-rc8' of git://git./linux/kernel/git/kees/linux

Pull usercopy whitelisting fix from Kees Cook:
 "Bart Massey discovered that the usercopy whitelist for JFS was
  incomplete: the inline inode data may intentionally "overflow" into
  the neighboring "extended area", so the size of the whitelist needed
  to be raised to include the neighboring field"

* tag 'usercopy-fix-v4.18-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  jfs: Fix usercopy whitelist for inline inode data

6 years agoMerge tag 'xfs-4.18-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Linus Torvalds [Sun, 5 Aug 2018 01:30:58 +0000 (18:30 -0700)]
Merge tag 'xfs-4.18-fixes-5' of git://git./fs/xfs/xfs-linux

Pull xfs bugfix from Darrick Wong:
 "One more patch for 4.18 to fix a coding error in the iomap_bmap()
  function introduced in -rc1: fix incorrect shifting"

* tag 'xfs-4.18-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  fs: fix iomap_bmap position calculation

6 years agoPartially revert "block: fail op_is_write() requests to read-only partitions"
Linus Torvalds [Fri, 3 Aug 2018 19:22:09 +0000 (12:22 -0700)]
Partially revert "block: fail op_is_write() requests to read-only partitions"

It turns out that commit 721c7fc701c7 ("block: fail op_is_write()
requests to read-only partitions"), while obviously correct, causes
problems for some older lvm2 installations.

The reason is that the lvm snapshotting will continue to write to the
snapshow COW volume, even after the volume has been marked read-only.
End result: snapshot failure.

This has actually been fixed in newer version of the lvm2 tool, but the
old tools still exist, and the breakage was reported both in the kernel
bugzilla and in the Debian bugzilla:

  https://bugzilla.kernel.org/show_bug.cgi?id=200439
  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=900442

The lvm2 fix is here

  https://sourceware.org/git/?p=lvm2.git;a=commit;h=a6fdb9d9d70f51c49ad11a87ab4243344e6701a3

but until everybody has updated to recent versions, we'll have to weaken
the "never write to read-only partitions" check.  It now allows the
write to happen, but causes a warning, something like this:

  generic_make_request: Trying to write to read-only block-device dm-3 (partno X)
  Modules linked in: nf_tables xt_cgroup xt_owner kvm_intel iwlmvm kvm irqbypass iwlwifi
  CPU: 1 PID: 77 Comm: kworker/1:1 Not tainted 4.17.9-gentoo #3
  Hardware name: LENOVO 20B6A019RT/20B6A019RT, BIOS GJET91WW (2.41 ) 09/21/2016
  Workqueue: ksnaphd do_metadata
  RIP: 0010:generic_make_request_checks+0x4ac/0x600
  ...
  Call Trace:
   generic_make_request+0x64/0x400
   submit_bio+0x6c/0x140
   dispatch_io+0x287/0x430
   sync_io+0xc3/0x120
   dm_io+0x1f8/0x220
   do_metadata+0x1d/0x30
   process_one_work+0x1b9/0x3e0
   worker_thread+0x2b/0x3c0
   kthread+0x113/0x130
   ret_from_fork+0x35/0x40

Note that this is a "revert" in behavior only.  I'm leaving alone the
actual code cleanups in commit 721c7fc701c7, but letting the previously
uncaught request go through with a warning instead of stopping it.

Fixes: 721c7fc701c7 ("block: fail op_is_write() requests to read-only partitions")
Reported-and-tested-by: WGH <wgh@torlan.ru>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Zdenek Kabelac <zkabelac@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agonetlink: Don't shift on 64 for ngroups
Dmitry Safonov [Sun, 5 Aug 2018 00:35:53 +0000 (01:35 +0100)]
netlink: Don't shift on 64 for ngroups

It's legal to have 64 groups for netlink_sock.

As user-supplied nladdr->nl_groups is __u32, it's possible to subscribe
only to first 32 groups.

The check for correctness of .bind() userspace supplied parameter
is done by applying mask made from ngroups shift. Which broke Android
as they have 64 groups and the shift for mask resulted in an overflow.

Fixes: 61f4b23769f0 ("netlink: Don't shift with UB on nlk->ngroups")
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: netdev@vger.kernel.org
Cc: stable@vger.kernel.org
Reported-and-Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
David S. Miller [Sun, 5 Aug 2018 00:51:55 +0000 (17:51 -0700)]
Merge git://git./pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2018-08-05

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix bpftool percpu_array dump by using correct roundup to next
   multiple of 8 for the value size, from Yonghong.

2) Fix in AF_XDP's __xsk_rcv_zc() to not returning frames back to
   allocator since driver will recycle frame anyway in case of an
   error, from Jakub.

3) Fix up BPF test_lwt_seg6local test cases to final iproute2
   syntax, from Mathieu.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoethtool: Remove trailing semicolon for static inline
Florian Fainelli [Sat, 4 Aug 2018 21:20:40 +0000 (14:20 -0700)]
ethtool: Remove trailing semicolon for static inline

Android's header sanitization tool chokes on static inline functions having a
trailing semicolon, leading to an incorrectly parsed header file. While the
tool should obviously be fixed, also fix the header files for the two affected
functions: ethtool_get_flow_spec_ring() and ethtool_get_flow_spec_ring_vf().

Fixes: 8cf6f497de40 ("ethtool: Add helper routines to pass vf to rx_flow_spec")
Reporetd-by: Blair Prescott <blair.prescott@broadcom.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoBluetooth: h5: Fix missing dependency on BT_HCIUART_SERDEV
Johan Hedberg [Sat, 4 Aug 2018 20:40:26 +0000 (23:40 +0300)]
Bluetooth: h5: Fix missing dependency on BT_HCIUART_SERDEV

This driver was recently updated to use serdev, so add the appropriate
dependency. Without this one can get compiler warnings like this if
CONFIG_SERIAL_DEV_BUS is not enabled:

  CC [M]  drivers/bluetooth/hci_h5.o
drivers/bluetooth/hci_h5.c:934:36: warning: ‘h5_serdev_driver’ defined but not used [-Wunused-variable]
 static struct serdev_device_driver h5_serdev_driver = {
                                    ^~~~~~~~~~~~~~~~

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
6 years agoMerge branch 'net-ARRAY_SIZE'
David S. Miller [Sat, 4 Aug 2018 20:23:15 +0000 (13:23 -0700)]
Merge branch 'net-ARRAY_SIZE'

zhong jiang says:

====================
Use ARRAY_SIZE to replace computing the size

zhong jiang (2):
  net:usb: Use ARRAY_SIZE instead of calculating the array size
  include/net/bond_3ad: Simplify the code by using the ARRAY_SIZE
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoinclude/net/bond_3ad: Simplify the code by using the ARRAY_SIZE
zhong jiang [Fri, 3 Aug 2018 06:53:15 +0000 (14:53 +0800)]
include/net/bond_3ad: Simplify the code by using the ARRAY_SIZE

We prefer to ARRAY_SIZE rather than the open code to calculate size.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet:usb: Use ARRAY_SIZE instead of calculating the array size
zhong jiang [Fri, 3 Aug 2018 06:53:14 +0000 (14:53 +0800)]
net:usb: Use ARRAY_SIZE instead of calculating the array size

We use ARRAY_SIZE to replace open code sizeof(lan78xx_regs) / sizeof(u32).
It make the code concise.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotun: not use hardcoded mask value
Li RongQing [Fri, 3 Aug 2018 07:50:02 +0000 (15:50 +0800)]
tun: not use hardcoded mask value

0x3ff in tun_hashfn is mask of TUN_NUM_FLOW_ENTRIES, instead
of hardcode, define a macro to setup the relationship with
TUN_NUM_FLOW_ENTRIES

Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/smc: no cursor update send in state SMC_INIT
Ursula Braun [Fri, 3 Aug 2018 08:38:33 +0000 (10:38 +0200)]
net/smc: no cursor update send in state SMC_INIT

If a writer blocked condition is received without data, the current
consumer cursor is immediately sent. Servers could already receive this
condition in state SMC_INIT without finished tx-setup. This patch
avoids sending a consumer cursor update in this case.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: cisco: enic: Replace GFP_ATOMIC with GFP_KERNEL
Jia-Ju Bai [Sat, 4 Aug 2018 00:40:09 +0000 (08:40 +0800)]
net: cisco: enic: Replace GFP_ATOMIC with GFP_KERNEL

vnic_dev_register(), vnic_rq_alloc_bufs() and vnic_wq_alloc_bufs()
are never called in atomic context.
They call kzalloc() with GFP_ATOMIC, which is not necessary.
GFP_ATOMIC can be replaced with GFP_KERNEL.

This is found by a static analysis tool named DCNS written by myself.

Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Acked-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Remove some unneeded semicolon
zhong jiang [Sat, 4 Aug 2018 11:41:41 +0000 (19:41 +0800)]
net: Remove some unneeded semicolon

These semicolons are not needed.  Just remove them.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonfp: bpf: xdp_adjust_tail support
Jakub Kicinski [Sat, 4 Aug 2018 05:06:00 +0000 (22:06 -0700)]
nfp: bpf: xdp_adjust_tail support

Add support for adjust_tail.  There are no FW changes needed but add
a FW capability just in case there would be any issue with previously
released FW, or we will have to change the ABI in the future.

The helper is trivial and shouldn't be used too often so just inline
the body of the function.  We add the delta to locally maintained
packet length register and check for overflow, since add of negative
value must overflow if result is positive.  Note that if delta of 0
would be allowed in the kernel this trick stops working and we need
one more instruction to compare lengths before and after the change.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
6 years agojfs: Fix usercopy whitelist for inline inode data
Kees Cook [Fri, 3 Aug 2018 19:52:58 +0000 (12:52 -0700)]
jfs: Fix usercopy whitelist for inline inode data

Bart Massey reported what turned out to be a usercopy whitelist false
positive in JFS when symlink contents exceeded 128 bytes. The inline
inode data (i_inline) is actually designed to overflow into the "extended
area" following it (i_inline_ea) when needed. So the whitelist needed to
be expanded to include both i_inline and i_inline_ea (the whole size
of which is calculated internally using IDATASIZE, 256, instead of
sizeof(i_inline), 128).

$ cd /mnt/jfs
$ touch $(perl -e 'print "B" x 250')
$ ln -s B* b
$ ls -l >/dev/null

[  249.436410] Bad or missing usercopy whitelist? Kernel memory exposure attempt detected from SLUB object 'jfs_ip' (offset 616, size 250)!

Reported-by: Bart Massey <bart.massey@gmail.com>
Fixes: 8d2704d382a9 ("jfs: Define usercopy region in jfs_ip slab cache")
Cc: Dave Kleikamp <shaggy@kernel.org>
Cc: jfs-discussion@lists.sourceforge.net
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
6 years agomt76x0: rename trace symbols
Stanislaw Gruszka [Fri, 3 Aug 2018 11:44:40 +0000 (13:44 +0200)]
mt76x0: rename trace symbols

Rename trace symbols that conflict with mt7601u and remove some
definitions that are not used.

Patch fixes build errors like this:
ld: drivers/net/wireless/mediatek/mt76/mt76x0/trace.o:(__tracepoints+0x0): multiple definition of `__tracepoint_set_shared_key'; drivers/net/wireless/mediatek/mt7601u/trace.o:(__tracepoints+0x0): first defined here

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: 7b4859026ccd ("mt76x0: core files")
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
6 years agomt76x0: rename mt76_* functions
Stanislaw Gruszka [Fri, 3 Aug 2018 11:44:39 +0000 (13:44 +0200)]
mt76x0: rename mt76_* functions

mt76_* functions conflicts with mt7601u driver what prevents to build
those drivers in the kernel or use both drivers modules at once.

Patch fixes build errors like this:
ld: drivers/net/wireless/mediatek/mt76/mt76x0/mac.o:(.opd+0x30): multiple definition of `mt76_mac_tx_rate_val'; drivers/net/wireless/mediatek/mt7601u/mac.o:(.opd+0x30): first defined here

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: 7b4859026ccd ("mt76x0: core files")
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
6 years agotcp: remove unneeded variable 'err'
YueHaibing [Fri, 3 Aug 2018 08:28:48 +0000 (16:28 +0800)]
tcp: remove unneeded variable 'err'

variable 'err' is unmodified after initalization,
so simply cleans up it and returns 0.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoaf_unix: ensure POLLOUT on remote close() for connected dgram socket
Jason Baron [Fri, 3 Aug 2018 21:24:53 +0000 (17:24 -0400)]
af_unix: ensure POLLOUT on remote close() for connected dgram socket

Applications use -ECONNREFUSED as returned from write() in order to
determine that a socket should be closed. However, when using connected
dgram unix sockets in a poll/write loop, a final POLLOUT event can be
missed when the remote end closes. Thus, the poll is stuck forever:

          thread 1 (client)                   thread 2 (server)

connect() to server
write() returns -EAGAIN
unix_dgram_poll()
 -> unix_recvq_full() is true
                                       close()
                                        ->unix_release_sock()
                                         ->wake_up_interruptible_all()
unix_dgram_poll() (due to the
     wake_up_interruptible_all)
 -> unix_recvq_full() still is true
                                         ->free all skbs

Now thread 1 is stuck and will not receive anymore wakeups. In this
case, when thread 1 gets the -EAGAIN, it has not queued any skbs
otherwise the 'free all skbs' step would in fact cause a wakeup and
a POLLOUT return. So the race here is probably fairly rare because
it means there are no skbs that thread 1 queued and that thread 1
schedules before the 'free all skbs' step.

This issue was reported as a hang when /dev/log is closed.

The fix is to signal POLLOUT if the socket is marked as SOCK_DEAD, which
means a subsequent write() will get -ECONNREFUSED.

Reported-by: Ian Lance Taylor <iant@golang.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonetfilter: nft_tunnel: fix sparse errors
Pablo Neira Ayuso [Fri, 3 Aug 2018 22:31:48 +0000 (00:31 +0200)]
netfilter: nft_tunnel: fix sparse errors

[...]
net/netfilter/nft_tunnel.c:117:25:    expected unsigned int [unsigned] [usertype] flags
net/netfilter/nft_tunnel.c:117:25:    got restricted __be16 [usertype] <noident>
[...]
net/netfilter/nft_tunnel.c:246:33:    expected restricted __be16 [addressable] [assigned] [usertype] tp_dst
net/netfilter/nft_tunnel.c:246:33:    got int

Fixes: af308b94a2a4 ("netfilter: nf_tables: add tunnel support")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Fri, 3 Aug 2018 20:43:59 +0000 (13:43 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "Two vmx bugfixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm: x86: vmx: fix vpid leak
  KVM: vmx: use local variable for current_vmptr when emulating VMPTRST

6 years agoppp: mppe: Remove VLA usage
Kees Cook [Fri, 3 Aug 2018 16:37:45 +0000 (09:37 -0700)]
ppp: mppe: Remove VLA usage

In the quest to remove all stack VLA usage from the kernel[1], this
removes the discouraged use of AHASH_REQUEST_ON_STACK (and associated
VLA) by switching to shash directly and keeping the associated descriptor
allocated with the regular state on the heap.

[1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agorxrpc: Push iov_iter up from rxrpc_kernel_recv_data() to caller
David Howells [Fri, 3 Aug 2018 16:06:56 +0000 (17:06 +0100)]
rxrpc: Push iov_iter up from rxrpc_kernel_recv_data() to caller

Push iov_iter up from rxrpc_kernel_recv_data() to its caller to allow
non-contiguous iovs to be passed down, thereby permitting file reading to
be simplified in the AFS filesystem in a future patch.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agol2tp: fix missing refcount drop in pppol2tp_tunnel_ioctl()
Guillaume Nault [Fri, 3 Aug 2018 15:00:11 +0000 (17:00 +0200)]
l2tp: fix missing refcount drop in pppol2tp_tunnel_ioctl()

If 'session' is not NULL and is not a PPP pseudo-wire, then we fail to
drop the reference taken by l2tp_session_get().

Fixes: ecd012e45ab5 ("l2tp: filter out non-PPP sessions in pppol2tp_tunnel_ioctl()")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'mlxsw-Fix-ACL-actions-error-condition-handling'
David S. Miller [Fri, 3 Aug 2018 19:28:02 +0000 (12:28 -0700)]
Merge branch 'mlxsw-Fix-ACL-actions-error-condition-handling'

Ido Schimmel says:

====================
mlxsw: Fix ACL actions error condition handling

Nir says:

Two issues were lately noticed within mlxsw ACL actions error condition
handling. The first patch deals with conflicting actions such as:

 # tc filter add dev swp49 parent ffff: \
   protocol ip pref 10 flower skip_sw dst_ip 192.168.101.1 \
   action goto chain 100 \
   action mirred egress redirect dev swp4

The second action will never execute, however SW model allows this
configuration, while the mlxsw driver cannot allow for it as it
implements actions in sets of up to three actions per set with a single
termination marking. Conflicting actions create a contradiction over
this single marking and thus cannot be configured. The fix replaces a
misplaced warning with an error code to be returned.

Patches 2-4 fix a condition of duplicate destruction of resources. Some
actions require allocation of specific resource prior to setting the
action itself. On error condition this resource was destroyed twice,
leading to a crash when using mirror action, and to a redundant
destruction in other cases, since for error condition rule destruction
also takes care of resource destruction. In order to fix this state a
symmetry in behavior is added and resource destruction also takes care
of removing the resource from rule's resource list.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: core_acl_flex_actions: Remove redundant mirror resource destruction
Nir Dotan [Fri, 3 Aug 2018 12:57:44 +0000 (15:57 +0300)]
mlxsw: core_acl_flex_actions: Remove redundant mirror resource destruction

In previous patch mlxsw_afa_resource_del() was added to avoid a duplicate
resource detruction scenario.
For mirror actions, such duplicate destruction leads to a crash as in:

 # tc qdisc add dev swp49 ingress
 # tc filter add dev swp49 parent ffff: \
   protocol ip chain 100 pref 10 \
   flower skip_sw dst_ip 192.168.101.1 action drop
 # tc filter add dev swp49 parent ffff: \
   protocol ip pref 10 \
   flower skip_sw dst_ip 192.168.101.1 action goto chain 100 \
   action mirred egress mirror dev swp4

Therefore add a call to mlxsw_afa_resource_del() in
mlxsw_afa_mirror_destroy() in order to clear that resource
from rule's resources.

Fixes: d0d13c1858a1 ("mlxsw: spectrum_acl: Add support for mirror action")
Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: core_acl_flex_actions: Remove redundant counter destruction
Nir Dotan [Fri, 3 Aug 2018 12:57:43 +0000 (15:57 +0300)]
mlxsw: core_acl_flex_actions: Remove redundant counter destruction

Each tc flower rule uses a hidden count action. As counter resource may
not be available due to limited HW resources, update _counter_create()
and _counter_destroy() pair to follow previously introduced symmetric
error condition handling, add a call to mlxsw_afa_resource_del() as part
of the counter resource destruction.

Fixes: c18c1e186ba8 ("mlxsw: core: Make counter index allocated inside the action append")
Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: core_acl_flex_actions: Remove redundant resource destruction
Nir Dotan [Fri, 3 Aug 2018 12:57:42 +0000 (15:57 +0300)]
mlxsw: core_acl_flex_actions: Remove redundant resource destruction

Some ACL actions require the allocation of a separate resource
prior to applying the action itself. When facing an error condition
during the setup phase of the action, resource should be destroyed.
For such actions the destruction was done twice which is dangerous
and lead to a potential crash.
The destruction took place first upon error on action setup phase
and then as the rule was destroyed.

The following sequence generated a crash:

 # tc qdisc add dev swp49 ingress
 # tc filter add dev swp49 parent ffff: \
   protocol ip chain 100 pref 10 \
   flower skip_sw dst_ip 192.168.101.1 action drop
 # tc filter add dev swp49 parent ffff: \
   protocol ip pref 10 \
   flower skip_sw dst_ip 192.168.101.1 action goto chain 100 \
   action mirred egress mirror dev swp4

Therefore add mlxsw_afa_resource_del() as a complement of
mlxsw_afa_resource_add() to add symmetry to resource_list membership
handling. Call this from mlxsw_afa_fwd_entry_ref_destroy() to make the
_fwd_entry_ref_create() and _fwd_entry_ref_destroy() pair of calls a
NOP.

Fixes: 140ce421217e ("mlxsw: core: Convert fwd_entry_ref list to be generic per-block resource list")
Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomlxsw: core_acl_flex_actions: Return error for conflicting actions
Nir Dotan [Fri, 3 Aug 2018 12:57:41 +0000 (15:57 +0300)]
mlxsw: core_acl_flex_actions: Return error for conflicting actions

Spectrum switch ACL action set is built in groups of three actions
which may point to additional actions. A group holds a single record
which can be set as goto record for pointing at a following group
or can be set to mark the termination of the lookup. This is perfectly
adequate for handling a series of actions to be executed on a packet.
While the SW model allows configuration of conflicting actions
where it is clear that some actions will never execute, the mlxsw
driver must block such configurations as it creates a conflict
over the single terminate/goto record value.

For a conflicting actions configuration such as:

 # tc filter add dev swp49 parent ffff: \
   protocol ip pref 10 \
   flower skip_sw dst_ip 192.168.101.1 \
   action goto chain 100 \
   action mirred egress mirror dev swp4

Where it is clear that the last action will never execute, the
mlxsw driver was issuing a warning instead of returning an error.
Therefore replace that warning with an error for this specific
case.

Fixes: 4cda7d8d7098 ("mlxsw: core: Introduce flexible actions support")
Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonetfilter: conntrack: avoid use-after free on rmmod
Florian Westphal [Fri, 3 Aug 2018 16:40:21 +0000 (18:40 +0200)]
netfilter: conntrack: avoid use-after free on rmmod

When the conntrack module is removed, we call nf_ct_iterate_destroy via
nf_ct_l4proto_unregister().

Problem is that nf_conntrack_proto_fini() gets called after the
conntrack hash table has already been freed.

Just remove the l4proto unregister call, its unecessary as the
nf_ct_protos[] array gets free'd right after anyway.

v2: add comment wrt. missing unreg call.

Fixes: a0ae2562c6c4b2 ("netfilter: conntrack: remove l3proto abstraction")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 years agonetfilter: kconfig: remove ct zone/label dependencies
Florian Westphal [Fri, 3 Aug 2018 15:56:12 +0000 (17:56 +0200)]
netfilter: kconfig: remove ct zone/label dependencies

connection tracking zones currently depend on the xtables CT target.
The reasoning was that it makes no sense to support zones if they can't
be configured (which needed CT target).

Nowadays zones can also be used by OVS and configured via nftables,
so remove the dependency.

connection tracking labels are handled via hidden dependency that gets
auto-selected by the connlabel match.
Make it a visible knob, as labels can be attached via ctnetlink
or via nftables rules (nft_ct expression) too.

This allows to use conntrack labels and zones with nftables-only build.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 years agonetfilter: nf_tables: simplify NLM_F_CREATE handling
Pablo Neira Ayuso [Fri, 3 Aug 2018 11:35:36 +0000 (13:35 +0200)]
netfilter: nf_tables: simplify NLM_F_CREATE handling

* From nf_tables_newchain(), codepath provides context that allows us to
  infer if we are updating a chain (in that case, no module autoload is
  required) or adding a new one (then, module autoload is indeed
  needed).
* We only need it in one single spot in nf_tables_newrule().
* Not needed for nf_tables_newset() at all.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 years agonetfilter: bridge: Expose nf_tables bridge hook priorities through uapi
Máté Eckl [Fri, 3 Aug 2018 11:36:13 +0000 (13:36 +0200)]
netfilter: bridge: Expose nf_tables bridge hook priorities through uapi

Netfilter exposes standard hook priorities in case of ipv4, ipv6 and
arp but not in case of bridge.

This patch exposes the hook priority values of the bridge family (which are
different from the formerly mentioned) via uapi so that they can be used by
user-space applications just like the others.

Signed-off-by: Máté Eckl <ecklm94@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 years agonetfilter: nf_tables: match on tunnel metadata
Pablo Neira Ayuso [Thu, 2 Aug 2018 18:51:46 +0000 (20:51 +0200)]
netfilter: nf_tables: match on tunnel metadata

This patch allows us to match on the tunnel metadata that is available
of the packet. We can use this to validate if the packet comes from/goes
to tunnel and the corresponding tunnel ID.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 years agonetfilter: nf_tables: add tunnel support
Pablo Neira Ayuso [Thu, 2 Aug 2018 18:51:39 +0000 (20:51 +0200)]
netfilter: nf_tables: add tunnel support

This patch implements the tunnel object type that can be used to
configure tunnels via metadata template through the existing lightweight
API from the ingress path.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 years agoMerge branch 'dsa-systemport-WoL'
David S. Miller [Fri, 3 Aug 2018 19:11:43 +0000 (12:11 -0700)]
Merge branch 'dsa-systemport-WoL'

Florian Fainelli says:

====================
net: dsa and systemport WoL changes

This patch series extracts what was previously submitted as part of the
"WAKE_FILTER" Wake-on-LAN patch series into patches that do not.

Changes in this series:

- properly align the dsa_is_cpu_port() check in first patch
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: systemport: Create helper to set MPD
Florian Fainelli [Fri, 3 Aug 2018 18:08:44 +0000 (11:08 -0700)]
net: systemport: Create helper to set MPD

Create a helper function to turn on/off MPD, this will be used to avoid
duplicating code as we are going to add additional types of wake-up
types.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: systemport: Do not re-configure upon WoL interrupt
Florian Fainelli [Fri, 3 Aug 2018 18:08:43 +0000 (11:08 -0700)]
net: systemport: Do not re-configure upon WoL interrupt

We already properly resume from Wake-on-LAN whether such a condition
occured or not, no need to process the WoL interrupt for functional
changes since that could race with other settings.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>