git.openwrt.org Git - openwrt/staging/blogic.git/log

tuntap: fix possible deadlock when fail to register netdev

Private destructor could be called when register_netdev() fail with
rtnl lock held. This will lead deadlock in tun_free_netdev() who tries
to hold rtnl_lock. Fixing this by switching to use spinlock to
synchronize.

Fixes: 96f84061620c ("tun: add eBPF based queue selection method")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'smc-fixes-next'

Ursula Braun says:

====================
smc: fixes 2017-12-07

here are some smc-patches. The initial 4 patches are cleanups.
Patch 5 gets rid of ib_post_sends in tasklet context to avoid peer drops due
to out-of-order receivals.
Patch 6 makes sure, the Linux SMC code understands variable sized CLC proposal
messages built according to RFC7609.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

smc: support variable CLC proposal messages

According to RFC7609 [1] the CLC proposal message contains an area of
unknown length for future growth. Additionally it may contain up to
8 IPv6 prefixes. The current version of the SMC-code does not
understand CLC proposal messages using these variable length fields and,
thus, is incompatible with SMC implementations in other operating
systems.

This patch makes sure, SMC understands incoming CLC proposals
* with arbitrary length values for future growth
* with up to 8 IPv6 prefixes

[1] SMC-R Informational RFC: http://www.rfc-editor.org/info/rfc7609

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Reviewed-by: Hans Wippel <hwippel@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

smc: no consumer update in tasklet context

The SMC protocol requires to send a separate consumer cursor update,
if it cannot be piggybacked to updates of the producer cursor.
When receiving a blocked signal from the sender, this update is sent
already in tasklet context. In addition consumer cursor updates are
sent after data receival.
Sending of cursor updates is controlled by sequence numbers.
Assuming receiving stray messages the receiver drops updates with older
sequence numbers than an already received cursor update with a higher
sequence number.
Sending consumer cursor updates in tasklet context may result in
wrong order sends and its corresponding drops at the receiver. Since
it is sufficient to send consumer cursor updates once the data is
received, this patch gets rid of the consumer cursor update in tasklet
context to guarantee in-sequence arrival of cursor updates.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

smc: cleanup close checking during data receival

When waiting for data to be received it must be checked if the
peer signals shutdown. The SMC code uses two different checks
for this purpose, even though just one check is sufficient.
This patch removes the superfluous test for SOCK_DONE.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

smc: no update for unused sk_write_pending

The smc code never checks the sk_write_pending sock field.
Thus there is no need to update it.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

smc: improve smc_clc_send_decline() error handling

Let smc_clc_send_decline() return with an error, if the amount
sent is smaller than the length of an smc decline message.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

smc: make smc_close_active_abort() static

smc_close_active_abort() is used in smc_close.c only.
Make it static.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: Allow compiling out legacy support

Introduce a configuration option: CONFIG_NET_DSA_LEGACY allowing to compile out
support for the old platform device and Device Tree binding registration.
Support for these configurations is scheduled to be removed in 4.17.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Don't print "Link speed -1 no longer supported" messages.

On some dual port NICs, the 2 ports have to be configured with compatible
link speeds.  Under some conditions, a port's configured speed may no
longer be supported.  The firmware will send a message to the driver
when this happens.

Improve this logic that prints out the warning by only printing it if
we can determine the link speed that is no longer supported.  If the
speed is unknown or it is in autoneg mode, skip the warning message.

Reported-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Tested-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipvlan: Eliminate duplicated codes with existing function

The recv flow of ipvlan l2 mode performs as same as l3 mode for
non-multicast packet, so use the existing func ipvlan_handle_mode_l3
instead of these duplicated statements in non-multicast case.

Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: handle NETIF_F_HW_TC changes correctly

Currently, whenever the NETIF_F_HW_TC feature changes, we silently
always allow it, but we actually do not disable the flows in HW
on disable. That breaks user's expectations. So just forbid
the feature disable in case there are any filters offloaded.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tun: avoid unnecessary READ_ONCE in tun_net_xmit

The statement no longer serves a purpose.

Commit fa35864e0bb7 ("tuntap: Fix for a race in accessing numqueues")
added the ACCESS_ONCE to avoid a race condition with skb_queue_len.

Commit 436accebb530 ("tuntap: remove unnecessary sk_receive_queue
length check during xmit") removed the affected skb_queue_len check.

Commit 96f84061620c ("tun: add eBPF based queue selection method")
split the function, reading the field a second time in the callee.
The temp variable is now only read once, so just remove it.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rds: debug: fix null check on static array

t_name cannot be NULL since it is an array field of a struct.
Replacing null check on static array with string length check using
strnlen()

Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

act_mirred: get rid of mirred_list_lock spinlock

TC actions are no longer freed in RCU callbacks and we should
always have RTNL lock, so this spinlock is no longer needed.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

act_mirred: get rid of tcfm_ifindex from struct tcf_mirred

tcfm_dev always points to the correct netdev and we already
hold a refcnt, so no need to use tcfm_ifindex to lookup again.

If we would support moving target netdev across netns, using
pointer would be better than ifindex.

This also fixes dumping obsolete ifindex, now after the
target device is gone we just dump 0 as ifindex.

Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ipv6-add-ip6erspan-collect_md-mode'

William Tu says:

====================
ipv6: add ip6erspan collect_md mode

Similar to erspan collect_md mode in ipv4, the first patch adds
support for ip6erspan collect metadata mode. The second patch
adds the test case using bpf_skb_[gs]et_tunnel_key helpers.

The corresponding iproute2 patch:
https://marc.info/?l=linux-netdev&m=151251545410047&w=2
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

samples/bpf: add ip6erspan sample code

Extend the existing tests for ip6erspan.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip6_gre: add ip6 erspan collect_md mode

Similar to ip6 gretap and ip4 gretap, the patch allows
erspan tunnel to operate in collect metadata mode.
bpf_skb_[gs]et_tunnel_key() helpers can make use of
it right away.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Uninitialized variable in bnxt_tc_parse_actions()

Smatch warns that:

drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c:160 bnxt_tc_parse_actions()
error: uninitialized symbol 'rc'.

"rc" is either uninitialized or set to zero here so we can just remove
the check.

Fixes: 8c95f773b4a3 ("bnxt_en: add support for Flower based vxlan encap/decap offload")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'macb-rx-filter-cleanups'

Julia Cartwright says:

====================
macb rx filter cleanups

Here's a proper patchset based on net-next.

v1 -> v2:
  - Rebased on net-next
  - Add Nicolas's Acks
  - Reorder commits, putting the list_empty() cleanups prior to the
    others.
  - Added commit reverting the GFP_ATOMIC change.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: macb: change GFP_ATOMIC to GFP_KERNEL

Now that the rx_fs_lock is no longer held across allocation, it's safe
to use GFP_KERNEL for allocating new entries.

This reverts commit 81da3bf6e3f88 ("net: macb: change GFP_KERNEL to
GFP_ATOMIC").

Cc: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Julia Cartwright <julia@ni.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: macb: reduce scope of rx_fs_lock-protected regions

Commit ae8223de3df5 ("net: macb: Added support for RX filtering")
introduces a lock, rx_fs_lock which is intended to protect the list of
rx_flow items and synchronize access to the hardware rx filtering
registers.

However, the region protected by this lock is overscoped, unnecessarily
including things like slab allocation. Reduce this lock scope to only
include operations which must be performed atomically: list traversal,
addition, and removal, and hitting the macb filtering registers.

This fixes the use of kmalloc w/ GFP_KERNEL in atomic context.

Fixes: ae8223de3df5 ("net: macb: Added support for RX filtering")
Cc: Rafal Ozieblo <rafalo@cadence.com>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: Julia Cartwright <julia@ni.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: macb: kill useless use of list_empty()

The list_for_each_entry() macro already handles the case where the list
is empty (by not executing the loop body). It's not necessary to handle
this case specially, so stop doing so.

Cc: Rafal Ozieblo <rafalo@cadence.com>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: Julia Cartwright <julia@ni.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net_sched: remove unused parameter from act cleanup ops

No one actually uses it.

Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'dsa-use-per-port-upstream-port'

Vivien Didelot says:

====================
net: dsa: use per-port upstream port

An upstream port is a local switch port used to reach a CPU port.

DSA still considers a unique CPU port in the whole switch fabric and
thus return a unique upstream port for a given switch. This is wrong in
a multiple CPU ports environment.

We are now switching to using the dedicated CPU port assigned to each
port in order to get rid of the deprecated unique tree CPU port.

This patchset makes the dsa_upstream_port() helper take a port argument
and goes one step closer complete support for multiple CPU ports.

Changes in v2:
- reverse-christmas-tree-fy variables
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: return per-port upstream port

The current dsa_upstream_port() helper still assumes a unique CPU port
in the whole switch fabric. This is becoming wrong, as every port in the
fabric has its dedicated CPU port, thus every port has an upstream port.

Add a port argument to the dsa_upstream_port() helper and fetch its CPU
port instead of the deprecated unique fabric CPU port. A CPU or unused
port has no dedicated CPU port, so return itself in this case.

At the same time, change the return value from u8 to unsigned int since
there is no need to limit the size here.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: assign a CPU port to DSA port

DSA ports also need to have a dedicated CPU port assigned to them,
because they need to know where to egress frames targeting the CPU,
e.g. To_Cpu frames received on a Marvell Tag port.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: setup global upstream port

Move the setup of the global upstream port within the
mv88e6xxx_setup_upstream_port function.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: helper to setup upstream port

Add a helper function to setup the upstream port of a given port.

This is the port used to reach the dedicated CPU port. This function
will be extended later to setup the global upstream port as well.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: egress floods all DSA ports

The mv88e6xxx driver currently assumes a single CPU port in the fabric
and thus floods frames with unknown DA on a single DSA port, the one
that is one hop closer to the CPU port.

With multiple CPU ports in mind, this isn't true anymore because CPU
ports could be found behind both DSA ports of a device in-between
others.

For example in a A <-> B <-> C fabric, both A and C having CPU ports,
device B will have to flood such frame to its two DSA ports.

This patch considers both CPU and DSA ports of a device as upstream
ports, where to flood frames with unknown DA addresses.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'sch_api-style'

Alexander Aring says:

====================
net: sched: sch_api: fix coding style issues for extack

this patch prepares to handle extack for qdiscs and fixes checkpatch
issues.

There are a bunch of warnings issued by checkpatch which bothered me.
This first patchset is to get rid of those warnings to make way for
the next patchsets.

I plan to followup with qdiscs, classifiers and actions after this.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: sched: sch_api: rearrange init handling

This patch fixes the following checkpatch error:

ERROR: do not use assignment in if condition

by rearranging the if condition to execute init callback only if init
callback exists. The whole setup afterwards is called in any case,
doesn't matter if init callback is set or not. This patch has the same
behaviour as before, just without assign err variable in if condition.
It also makes the code easier to read.

Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sched: sch_api: fix code style issues

This patch fix checkpatch issues for upcomming patches according to the
sched api file. It changes checking on null pointer, remove unnecessary
brackets, add variable names for parameters and adjust 80 char width.

Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'nfp-enhanced-debug-dump-via-ethtool'

Simon Horman says:

====================
nfp: enhanced debug dump via ethtool

Add debug dump implementation to the NFP driver. This makes use of
existing ethtool infrastructure. ethtool -W is used to select the dump
level and ethtool -w is used to dump NFP state.

The existing behaviour of dump level 0, dumping the arm.diag resource, is
preserved. Dump levels greater than 0 are implemented by this patchset and
optionally supported by firmware providing a _abi_dump_spec rtsym. This
rtsym provides a specification, in TLV format, of the information to be
dumped from the NFP at each supported dump level.

Dumps are also structured using a TLVs. They consist a prolog and the data
described int he corresponding dump.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: dump indirect ME CSRs

- The spec defines CSR address ranges for indirect ME CSRs. For Each TLV
  chunk in the spec, dump a chunk that includes the spec and the data
  over the defined address range.
- Each indirect CSR has 8 contexts. To read one context, first write the
  context to a specific derived address, read it back, and then read the
  register value.
- For each address, read and dump all 8 contexts in this manner.

Signed-off-by: Carl Heymann <carl.heymann@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: dump CPP, XPB and direct ME CSRs

- The spec defines CSR address ranges for these types.
- Dump each TLV chunk in the spec as a chunk that includes the spec and
the data over the defined address range.

Signed-off-by: Carl Heymann <carl.heymann@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: dump firmware name

Dump FW name as TLV, based on dump specification.

Signed-off-by: Carl Heymann <carl.heymann@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: dump single hwinfo field by key

- Add spec TLV for hwinfo field, containing key string as data.
- Add dump TLV for hwinfo field, with data being key and value as packed
zero-terminated strings.
- If specified hwinfo field is not found, dump the spec TLV as -ENOENT
error.

Signed-off-by: Carl Heymann <carl.heymann@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: dump all hwinfo

- Dump hwinfo as separate TLV chunk, in a packed format containing
zero-separated key and value strings.
- This provides additional debug context, if requested by the dumpspec.

Signed-off-by: Carl Heymann <carl.heymann@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: dump rtsyms

- Support rtsym TLVs.
- If specified rtsym is not found, dump the spec TLV as -ENOENT error.

Signed-off-by: Carl Heymann <carl.heymann@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: dumpspec TLV traversal

- Perform dumpspec traversals for calculating size and populating the
dump.
- Initially, wrap all spec TLVs in dump error TLVs (changed by later
patches in the series).

Signed-off-by: Carl Heymann <carl.heymann@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: dump prolog

- Use a TLV structure, with the typed chunks aligned to 8-byte sizes.
- Dump numeric fields as big-endian.
- Prolog contains the dump level.

Signed-off-by: Carl Heymann <carl.heymann@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: load debug dump spec

Load the TLV-based binary specification of what needs to be included in
a dump, from the "_abi_dump_spec" rtsymbol. If the symbol is not defined,
then dumps for levels >= 1 are not supported.

Signed-off-by: Carl Heymann <carl.heymann@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: debug dump ethtool ops

- Skeleton code to perform a binary debug dump via ethtoolops
  "set_dump", "get_dump_flags" and "get_dump_data", i.e. the ethtool
  -W/w mechanism.
- Skeleton functions for debugdump operations provided.
- An integer "dump level" can be specified, this is stored between
  ethtool invocations. Dump level 0 is still the "arm.diag" resource for
  backward compatibility. Other dump levels each define a set of state
  information to include in the dump, driven by a spec from FW.

Signed-off-by: Carl Heymann <carl.heymann@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net_sched: get rid of rcu_barrier() in tcf_block_put_ext()

Both Eric and Paolo noticed the rcu_barrier() we use in
tcf_block_put_ext() could be a performance bottleneck when
we have a lot of tc classes.

Paolo provided the following to demonstrate the issue:

tc qdisc add dev lo root htb
for I in `seq 1 1000`; do
        tc class add dev lo parent 1: classid 1:$I htb rate 100kbit
        tc qdisc add dev lo parent 1:$I handle $((I + 1)): htb
        for J in `seq 1 10`; do
                tc filter add dev lo parent $((I + 1)): u32 match ip src 1.1.1.$J
        done
done
time tc qdisc del dev root

real    0m54.764s
user    0m0.023s
sys     0m0.000s

The rcu_barrier() there is to ensure we free the block after all chains
are gone, that is, to queue tcf_block_put_final() at the tail of workqueue.
We can achieve this ordering requirement by refcnt'ing tcf block instead,
that is, the tcf block is freed only when the last chain in this block is
gone. This also simplifies the code.

Paolo reported after this patch we get:

real    0m0.017s
user    0m0.000s
sys     0m0.017s

Tested-by: Paolo Abeni <pabeni@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ieee802154-for-davem-2017-12-04' of git://git./linux/kernel/git/sschmidt/wpan-next

Stefan Schmidt says:

====================
pull-request: ieee802154-next 2017-12-04

Some update from ieee802154 to *net-next*

Jian-Hong Pan updated our docs to match the APIs in code.
Michael Hennerichs enhanced the adf7242 driver to work with adf7241
devices and reworked the IRQ and packet handling in the driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

netdevsim: make functions nsim_bpf_create_prog and nsim_bpf_destroy_prog static

Functions nsim_bpf_create_prog and nsim_bpf_destroy_prog are local to the
source and do not need to be in global scope, so make them static.

Cleans up sparse warnings:
symbol 'nsim_bpf_create_prog' was not declared. Should it be static?
symbol 'nsim_bpf_destroy_prog' was not declared. Should it be static?

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'phylib-hard-resetting-devices'

Geert Uytterhoeven says:

====================
Teach phylib hard-resetting devices

This patch series adds optional PHY reset support to phylib.

The first two patches are destined for David's net-next tree. They add
core PHY reset code, and update a driver that currently uses its own
reset code.

The last two patches are destined for Simon's renesas tree.  They add
properties to describe the EthernetAVB PHY reset topology to the common
Salvator-X/XS and ULCB DTS files, which solves two issues:
  1. On Salvator-XS, the enable pin of the regulator providing PHY power
     is connected to PRESETn, and PSCI powers down the SoC during system
     suspend.  Hence a PHY reset is needed to restore network
     functionality after system resume.
  2. Linux should not rely on the boot loader having reset the PHY, but
     should reset the PHY during driver probe.

Changes compared to v3:
  - Remove Florian's Acked-by,
  - Add missing #include <linux/gpio/consumer.h>,
  - Re-add the gpiod check, as the dummy gpiod_set_value() for !GPIOLIB
    does not ignore NULL, and calls WARN_ON(1),
  - Do not reassert the reset signal if {mdio,phy}_probe() or
    phy_device_register() succeeded, as that may destroy initial setup,
  - Do not deassert the reset signal in {mdio,phy}_remove(), as it
    should already be deasserted,
  - Bring the PHY back into reset state in phy_device_remove(),
  - Move/consolidate GPIO descriptor acquiring code from
    of_mdiobus_register_phy() and of_mdiobus_register_device() to
    mdiobus_register_device().
    Note that this changes behavior slightly, in that the reset signal
    is now also asserted when called from of_mdiobus_register_device().
  - Add Reviewed-by,

Changes compared to v2, as sent by Sergei Shtylyov:
  - Fix fwnode_get_named_gpiod() call due to added parameters (which
    allowed to eliminate the gpiod_direction_output() call),
  - Rebased, refreshed, reworded,
  - Take over from Sergei,
  - Add Acked-by,
  - Remove unneeded gpiod check, as gpiod_set_value() handles NULL fine,
  - Handle fwnode_get_named_gpiod() errors correctly:
      - -ENOENT is ignored (the GPIO is optional), and turned into NULL,
which allowed to remove all later !IS_ERR() checks,
      - Other errors (incl. -EPROBE_DEFER) are propagated,
  - Extract DTS patches from series "[PATCH 0/4] ravb: Add PHY reset
    support" (https://www.spinics.net/lists/netdev/msg457308.html), and
    incorporate in this series, after moving reset-gpios from the
    ethernet to the ethernet-phy node.

Given (1) the new reset-gpios DT property in the PHY node follows
established practises, (2) the DT binding change in the first patch has
been acked by Rob, and (3) the DTS patch does not cause any regressions
if it is applied before the PHY driver patches, the DTS patches can be
applied independently.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

macb: Kill PHY reset code

With the phylib now being aware of the "reset-gpios" PHY node property,
there should be no need to frob the PHY reset in this driver anymore...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

phylib: Add device reset GPIO support

The PHY devices sometimes do have their reset signal (maybe even power
supply?) tied to some GPIO and sometimes it also does happen that a boot
loader does not leave it deasserted. So far this issue has been attacked
from (as I believe) a wrong angle: by teaching the MAC driver to manipulate
the GPIO in question; that solution, when applied to the device trees, led
to adding the PHY reset GPIO properties to the MAC device node, with one
exception: Cadence MACB driver which could handle the "reset-gpios" prop
in a PHY device subnode. I believe that the correct approach is to teach
the 'phylib' to get the MDIO device reset GPIO from the device tree node
corresponding to this device -- which this patch is doing...

Note that I had to modify the AT803x PHY driver as it would stop working
otherwise -- it made use of the reset GPIO for its own purposes...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Acked-by: Rob Herring <robh@kernel.org>
[geert: Propagate actual errors from fwnode_get_named_gpiod()]
[geert: Avoid destroying initial setup]
[geert: Consolidate GPIO descriptor acquiring code]
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Richard Leitner <richard.leitner@skidata.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

flow_dissector: dissect tunnel info outside __skb_flow_dissect()

Move dissection of tunnel info to outside of the main flow dissection
function, __skb_flow_dissect(). The sole user of this feature, the flower
classifier, is updated to call tunnel info dissection directly, using
skb_flow_dissect_tunnel_info().

This results in a slightly less complex implementation of
__skb_flow_dissect(), in particular removing logic from that call path
which is not used by the majority of users. The expense of this is borne by
the flower classifier which now has to make an extra call for tunnel info
dissection.

This patch should not result in any behavioural change.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tun: add eBPF based queue selection method

This patch introduces an eBPF based queue selection method. With this,
the policy could be offloaded to userspace completely through a new
ioctl TUNSETSTEERINGEBPF.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'hns3-reset-refactor'

Salil Mehta says:

====================
net: hns3: Refactors "reset" handling code in HCLGE layer of HNS3 driver

This patch refactors the code of the reset feature in HCLGE layer
of HNS3 PF driver. Prime motivation to do this change is:
1. To reduce the time for which common miscellaneous Vector 0
   interrupt is disabled because of the reset. Simplification
   of the common miscellaneous interrupt handler routine(for
   Vector 0) used to handle reset and other sources of Vector
   0 interrupt.
2. Separate the task for handling the reset
3. Simplification of reset request submission and pending reset
   logic.

To achieve above below few things have been done:
1. Interrupt is disabled while common miscellaneous interrupt
   handler is entered and re-enabled before it is exit. This
   reduces the interrupt handling latency as compared to older
   interrupt handling scheme where interrupt was being disabled
   in interrupt handler context and re-enabled in task context
   some time later. Made Miscellaneous interrupt handler more
   generic to handle all sources including reset interrupt source.
2. New reset service task has been introduced to service the
   reset handling.
3. Introduces new reset service task for honoring software reset
   requests like from network stack related to timeout and serving
   the pending reset request(to reset the driver and associated
   clients).

Change Log:
Patch V2: Addressed comment by Andrew Lunn
Link: https://lkml.org/lkml/2017/12/1/366
Patch V1: Initial Submit
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: Refactors the requested reset & pending reset handling code

In exisiting code, the way to detect if driver/client reset should
be executed or if hardware should be be soft resetted was overly
complex.

Existing code use to read the interrupt status register from task
context to figure out if the interrupt source event was reset and
then use clear the interrupt source for reset while waiting for the
hardware to finish the reset. This behaviour again was confusing
and overly complex in terms of the flow.

This patch simplifies the handling of the requested reset and the
pending reset(i.e. reset which have already been asserted by the
software and hardware has acknowledged back to driver that it is
processing the hardware reset through interrupt)

Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: Add reset service task for handling reset requests

Existing common service task was being used to service the reset
requests. This patch tries to make the handling of reset cleaner
by separating task to handle the reset requests. This might in
turn help in adapting similar handling approach for other
interrupt events like mailbox, sharing vector 0 interrupt.

Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: Refactor of the reset interrupt handling logic

The reset interrupt event shares common miscellaneous interrupt
Vector 0. In the existing reset interrupt handling we disable
the Vector 0 interrupt in misc interrupt handler and re-enable
them later in context to common service task.

This also means other event sources like mailbox would also be
deferred or if the interrupt event was due to mailbox(which shall
be supported for VF soon), it could delay the reset handling.

This patch reorganizes the reset interrupt handling logic and
makes it more fair to other events.

Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: macb: change GFP_KERNEL to GFP_ATOMIC

Function gem_add_flow_filter called on line 2958 inside lock on line 2949
but uses GFP_KERNEL

Generated by: scripts/coccinelle/locks/call_kern.cocci

Fixes: ae8223de3df5 ("net: macb: Added support for RX filtering")
CC: Rafal Ozieblo <rafalo@cadence.com>
Signed-off-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'SFP-phylink-updates'

Russell King says:

====================
SFP/phylink updates

This series, which follows on from the fixes posted earlier, improves
the phylink/sfp support.  Changes included here are:

- Merge 802.3z and SGMII modes into one "in-band" mode, using the
  PHY_INTERFACE_MODE_xxx definition to determine which should be used.
  This allows more flexibility as more interface modes become
  available.

- Allow 2500base-X and 10GBASE-KR to be requested from SFP.

- Remove unused and unnecessary phylink_init_eee()

- Restart 802.3z autonegotiation when starting the network device to
  ensure that the negotiated parameters are always correct.  It has
  been observed on mvneta that this is not always the case without
  this change.

- Add kerneldoc documentation for phylink and sfp upstream facing APIs
  and link it in to the networking documentation.

- Resolve a sparse warning in sfp-bus.c

- Convert phylink/sfp to use fwnode rather than DT so that other firmware
  systems can take advantage of this - I have received a request for it
  to be usable with ACPI.  The exception to this is our interactions with
  phylib, as phylib itself does not yet support fwnode.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

phylink: convert to fwnode

Convert phylink to fwnode, switching phylink_create() from taking a
device_node to taking a fwnode_handle. This will allow other firmware
systems to take advantage of sfp/phylink support.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfp: convert to fwnode

Convert sfp-bus to use fwnode rather than device_node internally, so
we can support more than just device tree firmware.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfp: fix sparse warning

drivers/net/phy/sfp-bus.c:298:13: warning: context imbalance in 'sfp_bus_release' - wrong count at exit

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfp: add documentation for kernel APIs

Add kernel-doc documentation for sfp kernel APIs, and link it into the
networking kapi documentation under "Network device support".

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

phylink: add documentation for kernel APIs

Add kernel-doc documentation for phylink kernel APIs, and link it into
the networking kapi documentation under "Network device support".

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

phylink: restart 802.3z negotiation when starting net device

Restart 802.3z negotiation when the net device is brought up to ensure
that the link partner has our current link modes.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

phylink: remove phylink_init_eee()

phylink_init_eee() serves no purpose, remove it.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

phylink: add support for 2500baseX and 10GbaseKR

Add support for handling the faster 2.5G and 10G link modes when used
with SFP modules.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

phylink: get rid of separate Cisco SGMII and 802.3z modes

Since the handling of SGMII and 802.3z is now the same, combine the
MLO_AN_xxx constants into a single MLO_AN_INBAND, and use the PHY
interface mode to distinguish between Cisco SGMII and 802.3z.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

phylink: merge SGMII and 802.3z handling

The code handling SGMII and 802.3z is essentially the same, except that
we assume 802.3z has no PHY. Re-organise the code such that these cases
are merged, and exclude 802.3z mode from having a PHY attached. This
results in the same link handling behaviour as before.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

phy: add phy_interface_mode_is_8023z() helper

Add and use phy_interface_mode_is_8023z() helper to identify the
interface modes that use 802.3z negotiation. Use it in phylink's
phylink_mac_an_restart().

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rtnetlink: fix rtnl_link msghandler rcu annotations

Incorrect/missing annotations caused a few sparse warnings:

rtnetlink.c:155:15: incompatible types .. (different address spaces)
rtnetlink.c:157:23: incompatible types .. (different address spaces)
rtnetlink.c:185:15: incompatible types .. (different address spaces)
rtnetlink.c:285:15: incompatible types .. (different address spaces)
rtnetlink.c:317:9: incompatible types .. (different address spaces)
rtnetlink.c:3054:23: incompatible types .. (different address spaces)

no change in generated code.

Fixes: addf9b90de22f7 ("net: rtnetlink: use rcu to free rtnl message handlers")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/davem/net

Small overlapping change conflict ('net' changed a line,
'net-next' added a line right afterwards) in flexcan.c

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'for_linus' of git://git./linux/kernel/git/mst/vhost

Pull virtio fixes from Michael Tsirkin:
"virtio and qemu bugfixes

  A couple of bugfixes that just became ready"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  virtio_balloon: fix increment of vb->num_pfns in fill_balloon()
  virtio: release virtio index when fail to device_register
  fw_cfg: fix driver remove

Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Various TCP control block fixes, including one that crashes with
    SELinux, from David Ahern and Eric Dumazet.

2) Fix ACK generation in rxrpc, from David Howells.

3) ipvlan doesn't set the mark properly in the ipv4 route lookup key,
    from Gao Feng.

4) SIT configuration doesn't take on the frag_off ipv4 field
    configuration properly, fix from Hangbin Liu.

5) TSO can fail after device down/up on stmmac, fix from Lars Persson.

6) Various bpftool fixes (mostly in JSON handling) from Quentin Monnet.

7) Various SKB leak fixes in vhost/tun/tap (mostly observed as
    performance problems). From Wei Xu.

8) mvpps's TX descriptors were not zero initialized, from Yan Markman.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (57 commits)
  tcp: use IPCB instead of TCP_SKB_CB in inet_exact_dif_match()
  tcp: add tcp_v4_fill_cb()/tcp_v4_restore_cb()
  rxrpc: Fix the MAINTAINERS record
  rxrpc: Use correct netns source in rxrpc_release_sock()
  liquidio: fix incorrect indentation of assignment statement
  stmmac: reset last TSO segment size after device open
  ipvlan: Add the skb->mark as flow4's member to lookup route
  s390/qeth: build max size GSO skbs on L2 devices
  s390/qeth: fix GSO throughput regression
  s390/qeth: fix thinko in IPv4 multicast address tracking
  tap: free skb if flags error
  tun: free skb in early errors
  vhost: fix skb leak in handle_rx()
  bnxt_en: Fix a variable scoping in bnxt_hwrm_do_send_msg()
  bnxt_en: fix dst/src fid for vxlan encap/decap actions
  bnxt_en: wildcard smac while creating tunnel decap filter
  bnxt_en: Need to unconditionally shut down RoCE in bnxt_shutdown
  phylink: ensure we take the link down when phylink_stop() is called
  sfp: warn about modules requiring address change sequence
  sfp: improve RX_LOS handling
  ...

arch/tile: mark as orphaned

The chip family of TILEPro and TILE-Gx was developed by Tilera, which
was eventually acquired by Mellanox. The tile architecture was added to
the kernel in 2010 and first appeared in 2.6.36.

Now at Mellanox we are developing new chips based on the ARM64
architecture; our last TILE-Gx chip (the Gx72) was released in 2013, and
our customers using tile architecture products are not, as far as we
know, looking to upgrade to newer kernel releases. In the absence of
someone in the community stepping up to take over maintainership, this
commit marks the architecture as orphaned.

Cc: Chris Metcalf <metcalf@alum.mit.edu>
Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

rtnetlink: ipv6: convert remaining users to rtnl_register_module

convert remaining users of rtnl_register to rtnl_register_module
and un-export rtnl_register.

Requested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'linux-can-next-for-4.16-20171201' of git://git./linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2017-12-01

this is a pull request of 10 patches for net-next/master.

The first two patches are by Arnd Bergmann, they convert the peak_usb
from using "struct timeval" to "ktime_t". The error handling in the
vxcan driver is clean up by Markus Elfring's patch. Bhumika Goyal
contributes a patch for the c_can_pci driver to make the pci data const.
The six patches by Pankaj Bansal for the flexcan driver add LS1021A
support by making the endianness of the driver configurable by the
device tree.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2017-12-03

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Addition of a software model for BPF offloads in order to ease
   testing code changes in that area and make semantics more clear.
   This is implemented in a new driver called netdevsim, which can
   later also be extended for other offloads. SR-IOV support is added
   as well to netdevsim. BPF kernel selftests for offloading are
   added so we can track basic functionality as well as exercising
   all corner cases around BPF offloading, from Jakub.

2) Today drivers have to drop the reference on BPF progs they hold
   due to XDP on device teardown themselves. Change this in order
   to make XDP handling inside the drivers less error prone, and
   move disabling XDP to the core instead, also from Jakub.

3) Misc set of BPF verifier improvements and cleanups as preparatory
   work for upcoming BPF-to-BPF calls. Among others, this set also
   improves liveness marking such that pruning can be slightly more
   effective. Register and stack liveness information is now included
   in the verifier log as well, from Alexei.

4) nfp JIT improvements in order to identify load/store sequences in
   the BPF prog e.g. coming from memcpy lowering and optimizing them
   through the NPU's command push pull (CPP) instruction, from Jiong.

5) Cleanups to test_cgrp2_attach2.c BPF sample code in oder to remove
   bpf_prog_attach() magic values and replacing them with actual proper
   attach flag instead, from David.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'rtnetlink-rework-handler-registration'

Florian Westphal says:

====================
rtnetlink: rework handler (un)registering

Peter Zijlstra reported (referring to commit 019a316992ee0d983,
"rtnetlink: add reference counting to prevent module unload while dump is in progress"):

1) it not in fact a refcount, so using refcount_t is silly
2) there is a distinct lack of memory barriers, so we can easily
    observe the decrement while the msg_handler is still in progress.
3) waiting with a schedule()/yield() loop is complete crap and subject
    life-locks, imagine doing that rtnl_unregister_all() from a RT task.

In ancient times rtnetlink exposed a statically-sized table with
preset doit/dumpit handlers to be called for a protocol/type pair.

Later the rtnl_register interface was added and the table was allocated
on demand.  Eventually these were also used by modules.

Problem is that nothing prevents module unload while a netlink dump
is in progress.  netlink dumps can be span multiple recv calls and
netlink core saves the to-be-repeated dumper address for later invocation.

To prevent rmmod the netlink core expects callers to pass in the owning
module so a reference can be taken.

So far rtnetlink wasn't doing this, add new interface to pass THIS_MODULE.
Moreover, when converting parts of the rtnetlink handling to rcu this code
gained way too many READ_ONCE spots, remove them and the extra refcounting.

Take a module reference when running dumpit and doit callbacks
and never alter content of rtnl_link structures after they have been
published via rcu_assign_pointer.

Based partially on earlier patch from Peter.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

rtnetlink: remove __rtnl_register

This removes __rtnl_register and switches callers to either
rtnl_register or rtnl_register_module.

Also, rtnl_register() will now print an error if memory allocation
failed rather than panic the kernel.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: use rtnl_register_module where needed

all of these can be compiled as a module, so use new
_module version to make sure module can no longer be removed
while callback/dump is in use.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

rtnetlink: get reference on module before invoking handlers

Add yet another rtnl_register function. It will be used by modules
that can be removed.

The passed module struct is used to prevent module unload while
a netlink dump is in progress or when a DOIT_UNLOCKED doit callback
is called.

Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: rtnetlink: use rcu to free rtnl message handlers

rtnetlink is littered with READ_ONCE() because we can have read accesses
while another cpu can write to the structure we're reading by
(un)registering doit or dumpit handlers.

This patch changes this so that (un)registering cpu allocates a new
structure and then publishes it via rcu_assign_pointer, i.e. once
another cpu can see such pointer no modifications will occur anymore.

based on initial patch from Peter Zijlstra.

Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: broadcom: re-add mistakenly removed config settings

Previous patch mistakenly removed three chip-specific config settings.
Add them again.

Fixes: 80274abafc60 "net: phy: remove generic settings for callbacks config_aneg and read_status from drivers"
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ipv6-gre-collect_md'

William Tu says:

====================
add ip6 gre and gretap collect_md mode

Similar to gre, vxlan, geneve, ipip tunnels, allow ip6gretap tunnels to
operate in collect metadata mode. The first patch adds the support to
ip6_gre.c. The second patch enables unsetting the csum for ipv6 tunnel,
when using bpf_skb_[gs]et_tunnel_key() helpers. Finally, the last patch
adds the ip6 gre and gretap tunnel test cases to BPF sample code.

The corresponding iproute2 patch:
https://marc.info/?l=linux-netdev&m=151216943128087&w=2
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

samples/bpf: extend test_tunnel_bpf.sh with ip6gre

Extend existing tests for vxlan, gre, geneve, ipip, erspan,
to include ip6 gre and gretap tunnel.

Signed-off-by: William Tu <u9012063@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: allow disabling tunnel csum for ipv6

Before the patch, BPF_F_ZERO_CSUM_TX can be used only for ipv4 tunnel.
With introduction of ip6gretap collect_md mode, the flag should be also
supported for ipv6.

Signed-off-by: William Tu <u9012063@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip6_gre: add ip6 gre and gretap collect_md mode

Similar to gre, vxlan, geneve, ipip tunnels, allow ip6 gre and gretap
tunnels to operate in collect metadata mode. bpf_skb_[gs]et_tunnel_key()
helpers can make use of it right away. OVS can use it as well in the
future.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: core: don't disable device interrupts in phy_change

If state is not PHY_HALTED I see no need to temporarily disable
interrupts on the device. As long as the current interrupt isn't acked
on the device no new interrupt can happen anyway.

In addition remove a unneeded enabling of interrupts in the state
machine when handling state PHY_CHANGELINK.

Tested on a Odroid-C2 with RTL8211F phy in interrupt mode.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: core: remove now uneeded disabling of interrupts

After commits c974bdbc3e "net: phy: Use threaded IRQ, to allow IRQ from
sleeping devices" and 664fcf123a30 "net: phy: Threaded interrupts allow
some simplification" all relevant code pieces run in process context
anyway and I don't think we need the disabling of interrupts any longer.

Interestingly enough, latter commit already removed the comment
explaining why interrupts need to be temporarily disabled.

On my system phy interrupt mode works fine with this patch.
However I may miss something, especially in the context of shared phy
interrupts, therefore I'd appreciate if more people could test this.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2017-12-02

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix a compilation warning in xdp redirect tracepoint due to
   missing bpf.h include that pulls in struct bpf_map, from Xie.

2) Limit the maximum number of attachable BPF progs for a given
   perf event as long as uabi is not frozen yet. The hard upper
   limit is now 64 and therefore the same as with BPF multi-prog
   for cgroups. Also add related error checking for the sample
   BPF loader when enabling and attaching to the perf event, from
   Yonghong.

3) Specifically set the RLIMIT_MEMLOCK for the test_verifier_log
   case, so that the test case can always pass and not fail in
   some environments due to too low default limit, also from
   Yonghong.

4) Fix up a missing license header comment for kernel/bpf/offload.c,
   from Jakub.

5) Several fixes for bpftool, among others a crash on incorrect
   arguments when json output is used, error message handling
   fixes on unknown options and proper destruction of json writer
   for some exit cases, all from Quentin.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'tcp-cb-selinux-corruption'

Eric Dumazet says:

====================
tcp: add tcp_v4_fill_cb()/tcp_v4_restore_cb()

James Morris reported kernel stack corruption bug that
we tracked back to commit 971f10eca186 ("tcp: better TCP_SKB_CB
layout to reduce cache line misses")

First patch needs to be backported to kernels >= 3.18,
while second patch needs to be backported to kernels >= 4.9, since
this was the time when inet_exact_dif_match appeared.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: use IPCB instead of TCP_SKB_CB in inet_exact_dif_match()

After this fix : ("tcp: add tcp_v4_fill_cb()/tcp_v4_restore_cb()"),
socket lookups happen while skb->cb[] has not been mangled yet by TCP.

Fixes: a04a480d4392 ("net: Require exact match for TCP socket lookups if dif is l3mdev")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: add tcp_v4_fill_cb()/tcp_v4_restore_cb()

James Morris reported kernel stack corruption bug [1] while
running the SELinux testsuite, and bisected to a recent
commit bffa72cf7f9d ("net: sk_buff rbnode reorg")

We believe this commit is fine, but exposes an older bug.

SELinux code runs from tcp_filter() and might send an ICMP,
expecting IP options to be found in skb->cb[] using regular IPCB placement.

We need to defer TCP mangling of skb->cb[] after tcp_filter() calls.

This patch adds tcp_v4_fill_cb()/tcp_v4_restore_cb() in a very
similar way we added them for IPv6.

[1]
[  339.806024] SELinux: failure in selinux_parse_skb(), unable to parse packet
[  339.822505] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff81745af5
[  339.822505]
[  339.852250] CPU: 4 PID: 3642 Comm: client Not tainted 4.15.0-rc1-test #15
[  339.868498] Hardware name: LENOVO 10FGS0VA1L/30BC, BIOS FWKT68A   01/19/2017
[  339.885060] Call Trace:
[  339.896875]  <IRQ>
[  339.908103]  dump_stack+0x63/0x87
[  339.920645]  panic+0xe8/0x248
[  339.932668]  ? ip_push_pending_frames+0x33/0x40
[  339.946328]  ? icmp_send+0x525/0x530
[  339.958861]  ? kfree_skbmem+0x60/0x70
[  339.971431]  __stack_chk_fail+0x1b/0x20
[  339.984049]  icmp_send+0x525/0x530
[  339.996205]  ? netlbl_skbuff_err+0x36/0x40
[  340.008997]  ? selinux_netlbl_err+0x11/0x20
[  340.021816]  ? selinux_socket_sock_rcv_skb+0x211/0x230
[  340.035529]  ? security_sock_rcv_skb+0x3b/0x50
[  340.048471]  ? sk_filter_trim_cap+0x44/0x1c0
[  340.061246]  ? tcp_v4_inbound_md5_hash+0x69/0x1b0
[  340.074562]  ? tcp_filter+0x2c/0x40
[  340.086400]  ? tcp_v4_rcv+0x820/0xa20
[  340.098329]  ? ip_local_deliver_finish+0x71/0x1a0
[  340.111279]  ? ip_local_deliver+0x6f/0xe0
[  340.123535]  ? ip_rcv_finish+0x3a0/0x3a0
[  340.135523]  ? ip_rcv_finish+0xdb/0x3a0
[  340.147442]  ? ip_rcv+0x27c/0x3c0
[  340.158668]  ? inet_del_offload+0x40/0x40
[  340.170580]  ? __netif_receive_skb_core+0x4ac/0x900
[  340.183285]  ? rcu_accelerate_cbs+0x5b/0x80
[  340.195282]  ? __netif_receive_skb+0x18/0x60
[  340.207288]  ? process_backlog+0x95/0x140
[  340.218948]  ? net_rx_action+0x26c/0x3b0
[  340.230416]  ? __do_softirq+0xc9/0x26a
[  340.241625]  ? do_softirq_own_stack+0x2a/0x40
[  340.253368]  </IRQ>
[  340.262673]  ? do_softirq+0x50/0x60
[  340.273450]  ? __local_bh_enable_ip+0x57/0x60
[  340.285045]  ? ip_finish_output2+0x175/0x350
[  340.296403]  ? ip_finish_output+0x127/0x1d0
[  340.307665]  ? nf_hook_slow+0x3c/0xb0
[  340.318230]  ? ip_output+0x72/0xe0
[  340.328524]  ? ip_fragment.constprop.54+0x80/0x80
[  340.340070]  ? ip_local_out+0x35/0x40
[  340.350497]  ? ip_queue_xmit+0x15c/0x3f0
[  340.361060]  ? __kmalloc_reserve.isra.40+0x31/0x90
[  340.372484]  ? __skb_clone+0x2e/0x130
[  340.382633]  ? tcp_transmit_skb+0x558/0xa10
[  340.393262]  ? tcp_connect+0x938/0xad0
[  340.403370]  ? ktime_get_with_offset+0x4c/0xb0
[  340.414206]  ? tcp_v4_connect+0x457/0x4e0
[  340.424471]  ? __inet_stream_connect+0xb3/0x300
[  340.435195]  ? inet_stream_connect+0x3b/0x60
[  340.445607]  ? SYSC_connect+0xd9/0x110
[  340.455455]  ? __audit_syscall_entry+0xaf/0x100
[  340.466112]  ? syscall_trace_enter+0x1d0/0x2b0
[  340.476636]  ? __audit_syscall_exit+0x209/0x290
[  340.487151]  ? SyS_connect+0xe/0x10
[  340.496453]  ? do_syscall_64+0x67/0x1b0
[  340.506078]  ? entry_SYSCALL64_slow_path+0x25/0x25

Fixes: 971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: James Morris <james.l.morris@oracle.com>
Tested-by: James Morris <james.l.morris@oracle.com>
Tested-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Linux 4.15-rc2

Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm

Pull ARM fix from Russell King:
"Just one fix this time around, for the late commit in the merge window
  that triggered a problem with qemu. Qemu is apparently also going to
  receive a fix for the discovered issue"

* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: avoid faulting on qemu

Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
"Here are two bugfixes for I2C, fixing a memleak in the core and irq
  allocation for i801.

  Also three bugfixes for the at24 eeprom driver which Bartosz collected
  while taking over maintainership for this driver"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  eeprom: at24: check at24_read/write arguments
  eeprom: at24: fix reading from 24MAC402/24MAC602
  eeprom: at24: correctly set the size for at24mac402
  i2c: i2c-boardinfo: fix memory leaks on devinfo
  i2c: i801: Fix Failed to allocate irq -2147483648 error

Merge tag 'hwmon-for-linus-v4.15-rc2' of git://git./linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:
"Fixes:

   - Drop reference to obsolete maintainer tree

   - Fix overflow bug in pmbus driver

   - Fix SMBUS timeout problem in jc42 driver

  For the SMBUS timeout handling, we had a brief discussion if this
  should be considered a bug fix or a feature. Peter says "it fixes real
  problems where the application misbehave due to faulty content when
  reading from an eeprom", and he needs the patch in his company's v4.14
  images. This is good enough for me and warrants backport to stable
  kernels"

* tag 'hwmon-for-linus-v4.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (jc42) optionally try to disable the SMBUS timeout
  hwmon: (pmbus) Use 64bit math for DIRECT format values
  hwmon: Drop reference to Jean's tree

Merge branch 'tcp-2nd-listener-hash'

Martin KaFai Lau says:

====================
tcp: Add a 2nd listener hashtable (port+addr)

This patch set adds a 2nd listener hashtable.  It is to resolve
the performance issue when a process is listening at many IP
addresses with the same port (e.g. [IP1]:443, [IP2]:443... [IPN]:443)

v2:
- Move the new lhash2 and lhash2_mask before the existing
  listening_hash to avoid adding another cacheline
  to inet_hashinfo (Suggested by Eric Dumazet, Thanks!)
- I take this chance to plug an existing 4 bytes hole while
  adding 'unsigned int lhash2_mask'.
- Add some comments about lhash2 in inet_hashtables.h
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Enable 2nd listener hashtable in TCP

Enable the second listener hashtable in TCP.
The scale is the same as UDP which is one slot per 2MB.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>