git.openwrt.org Git - openwrt/staging/blogic.git/log

tipc: eliminate race condition at dual link establishment

Despite recent improvements, the establishment of dual parallel
links still has a small glitch where messages can bypass each
other. When the second link in a dual-link configuration is
established, part of the first link's traffic will be steered over
to the new link. Although we do have a mechanism to ensure that
packets sent before and after the establishment of the new link
arrive in sequence to the destination node, this is not enough.
The arriving messages will still be delivered upwards in different
threads, something entailing a risk of message disordering during
the transition phase.

To fix this, we introduce a synchronization mechanism between the
two parallel links, so that traffic arriving on the new link cannot
be added to its input queue until we are guaranteed that all
pre-establishment messages have been delivered on the old, parallel
link.

This problem seems to always have been around, but its occurrence is
so rare that it has not been noticed until recent intensive testing.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: clean up handling of link congestion

After the recent changes in message importance handling it becomes
possible to simplify handling of messages and sockets when we
encounter link congestion.

We merge the function tipc_link_cong() into link_schedule_user(),
and simplify the code of the latter. The code should now be
easier to follow, especially regarding return codes and handling
of the message that caused the situation.

In case the scheduling function is unable to pre-allocate a wakeup
message buffer, it now returns -ENOBUFS, which is a more correct
code than the previously used -EHOSTUNREACH.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: introduce starvation free send algorithm

Currently, we only use a single counter; the length of the backlog
queue, to determine whether a message should be accepted to the queue
or not. Each time a message is being sent, the queue length is compared
to a threshold value for the message's importance priority. If the queue
length is beyond this threshold, the message is rejected. This algorithm
implies a risk of starvation of low importance senders during very high
load, because it may take a long time before the backlog queue has
decreased enough to accept a lower level message.

We now eliminate this risk by introducing a counter for each importance
priority. When a message is sent, we check only the queue level for that
particular message's priority. If that is ok, the message can be added
to the backlog, irrespective of the queue level for other priorities.
This way, each level is guaranteed a certain portion of the total
bandwidth, and any risk of starvation is eliminated.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: Handle non-bridge master change

Master change notifications may occur other than when joining or
leaving a bridge, for example when being added to or removed from
a bond or Open vSwitch. In that case, do nothing instead of asking
the switch driver to remove a port from a bridge that it didn't join.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

crypto: algif - fix warn: unsigned 'used' is never less than zero

Change type from unsigned long to int to fix an issue reported by kbuild robot:
crypto/algif_skcipher.c:596 skcipher_recvmsg_async() warn: unsigned 'used' is
never less than zero.

Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: fix a link reset issue due to retransmission failures

When a node joins a cluster while we are transmitting a fragment
stream over the broadcast link, it's missing the preceding fragments
needed to build a meaningful message. As a result, the node has to
drop it. However, as the fragment message is not acknowledged to
its sender before it's dropped, it accidentally causes link reset
of retransmission failure on the node.

Reported-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Tested-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390: fix /proc/interrupts output

The irqclass_sub_desc array and enum interruption_class are out of sync
thus /proc/interrupts is broken. Remove IRQIO_CLW.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: avoid to repeatedly declare external variables

Move the declaration for external variables to sctp.h file avoiding
to repeatedly declare them with extern keyword.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: fix ipv4 mapped request socks

ss should display ipv4 mapped request sockets like this :

tcp SYN-RECV 0 0 ::ffff:192.168.0.1:8080 ::ffff:192.0.2.1:35261

and not like this :

tcp SYN-RECV 0 0 192.168.0.1:8080 192.0.2.1:35261

We should init ireq->ireq_family based on listener sk_family,
not the actual protocol carried by SYN packet.

This means we can set ireq_family in inet_reqsk_alloc()

Fixes: 3f66b083a5b7 ("inet: introduce ireq_family")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

virtio: change comment in transmit

The original comment was not really informative or funny
as well as sexist. Replace it with a better explanation of
why the driver does stop and what the impacts are.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

tools: bpf_asm: cleanup vlan extension related token

We now have K_VLANT, K_VLANP and K_VLANTPID. Clean them up into more
descriptive token, namely K_VLAN_TCI, K_VLAN_AVAIL and K_VLAN_TPID.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'listener_refactor_16'

Eric Dumazet says:

====================
tcp: listener refactor part 16

A CONFIG_PROVE_RCU=y build revealed an RCU splat I had to fix.

I added const qualifiers to various md5 methods, as I expect
to call them on behalf of request sock traffic even if
the listener socket is not locked. This seems ok, but adding
const makes the contract clearer. Note a good reduction
of code size thanks to request/establish sockets convergence.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: md5: get rid of tcp_v[46]_reqsk_md5_lookup()

With request socks convergence, we no longer need
different lookup methods. A request socket can
use generic lookup function.

Add const qualifier to 2nd tcp_v[46]_md5_lookup() parameter.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: md5: remove request sock argument of calc_md5_hash()

Since request and established sockets now have same base,
there is no need to pass two pointers to tcp_v4_md5_hash_skb()
or tcp_v6_md5_hash_skb()

Also add a const qualifier to their struct tcp_md5sig_key argument.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: md5: input path is run under rcu protected sections

It is guaranteed that both tcp_v4_rcv() and tcp_v6_rcv()
run from rcu read locked sections :

ip_local_deliver_finish() and ip6_input_finish() both
use rcu_read_lock()

Also align tcp_v6_inbound_md5_hash() on tcp_v4_inbound_md5_hash()
by returning a boolean.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: use C99 initializers in new_state[]

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: md5: fix rcu lockdep splat

While timer handler effectively runs a rcu read locked section,
there is no explicit rcu_read_lock()/rcu_read_unlock() annotations
and lockdep can be confused here :

net/ipv4/tcp_ipv4.c-906-        /* caller either holds rcu_read_lock() or socket lock */
net/ipv4/tcp_ipv4.c:907:        md5sig = rcu_dereference_check(tp->md5sig_info,
net/ipv4/tcp_ipv4.c-908-                                       sock_owned_by_user(sk) ||
net/ipv4/tcp_ipv4.c-909-                                       lockdep_is_held(&sk->sk_lock.slock));

Let's explicitely acquire rcu_read_lock() in tcp_make_synack()

Before commit fa76ce7328b ("inet: get rid of central tcp/dccp listener
timer"), we were holding listener lock so lockdep was happy.

Fixes: fa76ce7328b ("inet: get rid of central tcp/dccp listener timer")
Signed-off-by: Eric DUmazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'rhashtable-next'

Thomas Graf says:

====================
rhashtable updates on top of Herbert's work

Patch 1 is a bugfix for an RCU splash I encountered while testing.
Patch 2 & 3 are pure cleanups. Patch 4 disables automatic shrinking
by default as discussed in previous thread. Patch 5 removes some
rhashtable internal knowledge from nft_hash and fixes another RCU
splash.

I've pushed various rhashtable tests (Netlink, nft) together with a
Makefile to a git tree [0] for easier stress testing.

[0] https://github.com/tgraf/rhashtable
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

rhashtable: Add rhashtable_free_and_destroy()

rhashtable_destroy() variant which stops rehashes, iterates over
the table and calls a callback to release resources.

Avoids need for nft_hash to embed rhashtable internals and allows to
get rid of the being_destroyed flag. It also saves a 2nd mutex
lock upon destruction.

Also fixes an RCU lockdep splash on nft set destruction due to
calling rht_for_each_entry_safe() without holding bucket locks.
Open code this loop as we need know that no mutations may occur in
parallel.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

rhashtable: Disable automatic shrinking by default

Introduce a new bool automatic_shrinking to require the
user to explicitly opt-in to automatic shrinking of tables.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

rhashtable: Mark internal/private inline functions as such

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

rhashtable: Use 'unsigned int' consistently

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

rhashtable: Extend RCU read lock into rhashtable_insert_rehash()

rhashtable_insert_rehash() requires RCU locks to be held in order
to access ht->tbl and traverse to the last table.

Fixes: ccd57b1bd324 ("rhashtable: Add immediate rehash during insertion")
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

filter: introduce SKF_AD_VLAN_TPID BPF extension

If vlan offloading takes place then vlan header is removed from frame
and its contents, both vlan_tci and vlan_proto, is available to user
space via TPACKET interface. However, only vlan_tci can be used in BPF
filters.

This commit introduces a new BPF extension. It makes possible to load
the value of vlan_proto (vlan TPID) to register A. Support for classic
BPF and eBPF is being added, analogous to skb->protocol.

Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: Michal Sekletar <msekleta@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'cxgb4-fcoe'

Varun Prakash says:

====================
FCoE support in cxgb4 driver

This patch series enables FCoE support in cxgb4 driver, it enables
FCOE_CRC and FCOE_MTU net device features.

This series is created against net-next tree.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: update Kconfig and Makefile for FCoE support

Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: add cxgb4_fcoe.c for FCoE

This patch adds cxgb4_fcoe.c and enables FCOE_CRC, FCOE_MTU
net device features.

Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: add cxgb4_fcoe.h and macro definitions for FCoE

This patch adds new header file cxgb4_fcoe.h and defines new
macros for FCoE support in cxgb4 driver.

Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: fix sparse warnings in privacy stable addresses generation

Those warnings reported by sparse endianness check (via kbuild test robot)
are harmless, nevertheless fix them up and make the code a little bit
easier to read.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes: 622c81d57b392cc ("ipv6: generation of stable privacy addresses for link-local and autoconf")
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: fix compile error when IPV6=m and TIPC=y

When IPV6=m and TIPC=y, below error will appear during building kernel
image:

net/tipc/udp_media.c:196:
undefined reference to `ip6_dst_lookup'
make: *** [vmlinux] Error 1

As ip6_dst_lookup() is implemented in IPV6 and IPV6 is compiled as
module, ip6_dst_lookup() is not built-in core kernel image. As a
result, compiler cannot find 'ip6_dst_lookup' reference while
compiling TIPC code into core kernel image.

But with the method introduced by commit 5f81bd2e5d80 ("ipv6: export a
stub for IPv6 symbols used by vxlan"), we can avoid the compile error
through "ipv6_stub" pointer to access ip6_dst_lookup().

Fixes: d0f91938bede ("tipc: add ip/udp media type")
Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: allow to delete a whole device group

With dev group, we can change a batch of net devices,
so we should allow to delete them together too.

Group 0 is not allowed to be deleted since it is
the default group.

Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rhashtable: Add comment on choice of elasticity value

This patch adds a comment on the choice of the value 16 as the
maximum chain length before we force a rehash.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

cx82310_eth: fix semicolon.cocci warnings

drivers/net/usb/cx82310_eth.c:175:2-3: Unneeded semicolon

Removes unneeded semicolon.

Generated by: scripts/coccinelle/misc/semicolon.cocci

CC: Ondrej Zary <linux@rainbow-software.org>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: validate length of sockaddr in connect() for dgram/rdm

Commit f2f8036 ("tipc: add support for connect() on dgram/rdm sockets")
hasn't validated user input length for the sockaddr structure which allows
a user to overwrite kernel memory with arbitrary input.

Fixes: f2f8036 ("tipc: add support for connect() on dgram/rdm sockets")
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: remove never used forwarding_accel_ops pointer from net_device

Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/davem/net

Conflicts:
net/netfilter/nf_tables_core.c

The nf_tables_core.c conflict was resolved using a conflict resolution
from Stephen Rothwell as a guide.

Signed-off-by: David S. Miller <davem@davemloft.net>

rhashtable: Fix sleeping inside RCU critical section in walk_stop

The commit 963ecbd41a1026d99ec7537c050867428c397b89 ("rhashtable:
Fix use-after-free in rhashtable_walk_stop") fixed a real bug
but created another one because we may end up sleeping inside an
RCU critical section.

This patch fixes it properly by replacing the mutex with a spin
lock that specifically protects the walker lists.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ipv6_stable_privacy_address'

Hannes Frederic Sowa says:

====================
ipv6: RFC7217 stable privacy addresses implementation

this is an implementation of basic support for RFC7217 stable privacy
addresses. Please review and consider for net-next.

v2:
* Correct references to RFC 7212 -> RFC 7217 in documentation patch (thanks, Eric!)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: add documentation for stable_secret, idgen_delay and idgen_retries knobs

Cc: Erik Kline <ek@google.com>
Cc: Fernando Gont <fgont@si6networks.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: introduce idgen_delay and idgen_retries knobs

This is specified by RFC 7217.

Cc: Erik Kline <ek@google.com>
Cc: Fernando Gont <fgont@si6networks.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: do retries on stable privacy addresses

If a DAD conflict is detected, we want to retry privacy stable address
generation up to idgen_retries (= 3) times with a delay of idgen_delay
(= 1 second). Add the logic to addrconf_dad_failure.

By design, we don't clean up dad failed permanent addresses.

Cc: Erik Kline <ek@google.com>
Cc: Fernando Gont <fgont@si6networks.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: collapse state_lock and lock

Cc: Erik Kline <ek@google.com>
Cc: Fernando Gont <fgont@si6networks.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: introduce IFA_F_STABLE_PRIVACY flag

We need to mark appropriate addresses so we can do retries in case their
DAD failed.

Cc: Erik Kline <ek@google.com>
Cc: Fernando Gont <fgont@si6networks.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: generation of stable privacy addresses for link-local and autoconf

This patch implements the stable privacy address generation for
link-local and autoconf addresses as specified in RFC7217.

RID = F(Prefix, Net_Iface, Network_ID, DAD_Counter, secret_key)

is the RID (random identifier). As the hash function F we chose one
round of sha1. Prefix will be either the link-local prefix or the
router advertised one. As Net_Iface we use the MAC address of the
device. DAD_Counter and secret_key are implemented as specified.

We don't use Network_ID, as it couples the code too closely to other
subsystems. It is specified as optional in the RFC.

As Net_Iface we only use the MAC address: we simply have no stable
identifier in the kernel we could possibly use: because this code might
run very early, we cannot depend on names, as they might be changed by
user space early on during the boot process.

A new address generation mode is introduced,
IN6_ADDR_GEN_MODE_STABLE_PRIVACY. With iproute2 one can switch back to
none or eui64 address configuration mode although the stable_secret is
already set.

We refuse writes to ipv6/conf/all/stable_secret but only allow
ipv6/conf/default/stable_secret and the interface specific file to be
written to. The default stable_secret is used as the parameter for the
namespace, the interface specific can overwrite the secret, e.g. when
switching a network configuration from one system to another while
inheriting the secret.

Cc: Erik Kline <ek@google.com>
Cc: Fernando Gont <fgont@si6networks.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: introduce secret_stable to ipv6_devconf

This patch implements the procfs logic for the stable_address knob:
The secret is formatted as an ipv6 address and will be stored per
interface and per namespace. We track initialized flag and return EIO
errors until the secret is set.

We don't inherit the secret to newly created namespaces.

Cc: Erik Kline <ek@google.com>
Cc: Fernando Gont <fgont@si6networks.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

lib: EXPORT_SYMBOL sha_init

We need this symbol later on in ipv6.ko, thus export it via EXPORT_SYMBOL
like sha_transform already is.

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bcmgenet-next'

Florian Fainelli says:

====================
net: bcmgenet: integrated GPHY power up/down

This patch series implements integrated Gigabit PHY power up/down, which allows
us to save close to 300mW on some designs when the Gigabit PHY is known to be
unused (e.g: during bcmgenet_close or bcmgenet_suspend not doing Wake-on-LAN).

Changes in v2:

- drop an extra bcmgenet_ext_readl in bcmgenet_phy_power_set
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: bcmgenet: power down and up GPHY during suspend/resume

In case the interface is not used, power down the integrated GPHY during
suspend. Similarly to bcmgenet_open(), bcmgenet_resume() powers on the GPHY
prior to any UniMAC activity.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bcmgenet: power up and down integrated GPHY when unused

Power up the GPHY while we are bringing-up the network interface, and
conversely, upon bring down, power the GPHY down. In order to avoid
creating hardware hazards, make sure that the GPHY gets powered on
during bcmgenet_open() prior to the UniMAC being reset as the UniMAC may
start creating activity towards the GPHY if we reverse the steps.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bcmgenet: implement GPHY power down sequence

Implement the GPHY power down sequence by setting all power down bits, putting
the GPHY in reset, and finally cutting the 25Mhz reference clock.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bcmgenet: fix GPHY power-up sequence

We were missing a number of extra steps and delays to power-up the GPHY, update
the sequence to reflect the proper procedure here.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bcmgenet: rename bcmgenet_ephy_power_up

In preparation for implementing the power down GPHY sequence, rename
bcmgenet_ephy_power_up to illustrate that it is not EPHY specific but
PHY agnostic, and add an "enable" argument.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bcmgenet: update bcmgenet_ephy_power_up to clear CK25_DIS bit

The CK25_DIS bit controls whether a 25Mhz clock is fed to the GPHY or
not, in preparation for powering down the integrated GPHY when relevant,
make sure we clear that bit.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bcmgenet: propagate errors from bcmgenet_power_down

If bcmgenet_power_down() fails, we would want to propagate a return
value from bcmgenet_wol_power_down_cfg() to know about this.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'rhashtable-next'

Herbert Xu says:

====================
rhashtable: Multiple rehashing

This series introduces multiple rehashing.

Recall that the original implementation in br_multicast used
two list pointers per hash node and therefore is limited to at
most one rehash at a time since you need one list pointer for
the old table and one for the new table.

Thanks to Josh Triplett's suggestion of using a single list pointer
we're no longer limited by that.  So it is perfectly OK to have
an arbitrary number of tables in existence at any one time.

The reader and removal simply has to walk from the oldest table
to the newest table in order not to miss anything.  Insertion
without lookup are just as easy as we simply go to the last table
that we can find and add the entry there.

However, insertion with uniqueness lookup is more complicated
because we need to ensure that two simultaneous insertions of the
same key do not both succeed.  To achieve this, all insertions
including those without lookups are required to obtain the bucket
lock from the oldest hash table that is still alive.  This is
determined by having the rehasher (there is only one rehashing
thread in the system) keep a pointer of where it is up to.  If
a bucket has already been rehashed then it is dead, i.e., there
cannot be any more insertions to it, otherwise it is considered
alive.  This guarantees that the same key cannot be inserted
in two different tables in parallel.

Patch 1 is actually a bug fix for the walker.

Patch 2-5 eliminates unnecessary out-of-line copies of jhash.

Patch 6 makes rhashtable_shrink shrink to fit.

Patch 7 introduces multiple rehashing.  This means that if we
decide to grow then we will grow regardless of whether the previous
one has finished.  However, this is still asynchronous meaning
that if insertions come fast enough we may still end up with a
table that is overutilised.

Patch 8 adds support for GFP_ATOMIC allocations of struct bucket_table.

Finally patch 9 enables immediate rehashing.  This is done either
when the table reaches 100% utilisation, or when the chain length
exceeds 16 (the latter can be disabled on request, e.g., for
nft_hash.

With these patches the system should no longer have any trouble
dealing with fast insertions on a small table.  In the worst
case you end up with a list of tables that's log N in length
while the rehasher catches up.

v3 restores rhashtable_shrink and fixes a number of bugs in the
multiple rehashing patches (7 and 9).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

rhashtable: Add immediate rehash during insertion

This patch reintroduces immediate rehash during insertion. If
we find during insertion that the table is full or the chain
length exceeds a set limit (currently 16 but may be disabled
with insecure_elasticity) then we will force an immediate rehash.
The rehash will contain an expansion if the table utilisation
exceeds 75%.

If this rehash fails then the insertion will fail. Otherwise the
insertion will be reattempted in the new hash table.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

rhashtable: Allow GFP_ATOMIC bucket table allocation

This patch adds the ability to allocate bucket table with GFP_ATOMIC
instead of GFP_KERNEL. This is needed when we perform an immediate
rehash during insertion.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

rhashtable: Add multiple rehash support

This patch adds the missing bits to allow multiple rehashes. The
read-side as well as remove already handle this correctly. So it's
only the rehasher and insertion that need modification to handle
this.

Note that this patch doesn't actually enable it so for now rehashing
is still only performed by the worker thread.

This patch also disables the explicit expand/shrink interface because
the table is meant to expand and shrink automatically, and continuing
to export these interfaces unnecessarily complicates the life of the
rehasher since the rehash process is now composed of two parts.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

rhashtable: Shrink to fit

This patch changes rhashtable_shrink to shrink to the smallest
size possible rather than halving the table. This is needed
because with multiple rehashing we will defer shrinking until
all other rehashing is done, meaning that when we do shrink
we may be able to shrink a lot.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: Use default rhashtable hashfn

This patch removes the explicit jhash value for the hashfn parameter
of rhashtable. The default is now jhash so removing the setting
makes no difference apart from making one less copy of jhash in
the kernel.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

netlink: Use default rhashtable hashfn

This patch removes the explicit jhash value for the hashfn parameter
of rhashtable. As the key length is a multiple of 4, this means that
we will actually end up using jhash2.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

rhashtable: Allow hashfn to be unset

Since every current rhashtable user uses jhash as their hash
function, the fact that jhash is an inline function causes each
user to generate a copy of its code.

This function provides a solution to this problem by allowing
hashfn to be unset. In which case rhashtable will automatically
set it to jhash. Furthermore, if the key length is a multiple
of 4, we will switch over to jhash2.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

rhashtable: Eliminate unnecessary branch in rht_key_hashfn

When rht_key_hashfn is called from rhashtable itself and params
is equal to ht->p, there is no point in checking params.key_len
and falling back to ht->p.key_len.

For some reason gcc couldn't figure out that params is the same
as ht->p. So let's help it by only checking params.key_len when
it's a constant.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

rhashtable: Add barrier to ensure we see new tables in walker

The walker is a lockless reader so it too needs an smp_rmb before
reading the future_tbl field in order to see any new tables that
may contain elements that we should have walked over.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'linux-can-next-for-4.1-20150323' of git://git./linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2015-03-23

this is a pull request of 6 patches for net-next/master.

A patch by Florian Westphal, converts the skb->destructor to use
sock_efree() instead of own destructor. Ahmed S. Darwish's patch
converts the kvaser_usb driver to use unregister_candev(). A patch by
me removes a return from a void function in the m_can driver. Yegor
Yefremov contributes a patch for combined rx/tx LED trigger support. A
sparse warning in the esd_usb2 driver was fixes by Thomas Körper. Ben
Dooks converts the at91_can driver to use endian agnostic IO accessors.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/pablo/nf-next

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for net-next.
Basically, more incremental updates for br_netfilter from Florian
Westphal, small nf_tables updates (including one fix for rb-tree
locking) and small two-liner to add extra validation for the REJECT6
target.

More specifically, they are:

1) Use the conntrack status flags from br_netfilter to know that DNAT is
   happening. Patch for Florian Westphal.

2) nf_bridge->physoutdev == NULL already indicates that the traffic is
   bridged, so let's get rid of the BRNF_BRIDGED flag. Also from Florian.

3) Another patch to prepare voidization of seq_printf/seq_puts/seq_putc,
   from Joe Perches.

4) Consolidation of nf_tables_newtable() error path.

5) Kill nf_bridge_pad used by br_netfilter from ip_fragment(),
   from Florian Westphal.

6) Access rb-tree root node inside the lock and remove unnecessary
   locking from the get path (we already hold nfnl_lock there), from
   Patrick McHardy.

7) You cannot use a NFT_SET_ELEM_INTERVAL_END when the set doesn't
   support interval, also from Patrick.

8) Enforce IP6T_F_PROTO from ip6t_REJECT to make sure the core is
   actually restricting matches to TCP.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

af_packet: pass checksum validation status to the user

Introduce TP_STATUS_CSUM_VALID tp_status flag to tell the
af_packet user that at least the transport header checksum
has been already validated.

For now, the flag may be set for incoming packets only.

Signed-off-by: Alexander Drozdov <al.drozdov@gmail.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

af_packet: make tpacket_rcv to not set status value before run_filter

It is just an optimization. We don't need the value of status variable
if the packet is filtered.

Signed-off-by: Alexander Drozdov <al.drozdov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

inet: fix double request socket freeing

Eric Hugne reported following error :

I'm hitting this warning on latest net-next when i try to SSH into a machine
with eth0 added to a bridge (but i think the problem is older than that)

Steps to reproduce:
node2 ~ # brctl addif br0 eth0
[  223.758785] device eth0 entered promiscuous mode
node2 ~ # ip link set br0 up
[  244.503614] br0: port 1(eth0) entered forwarding state
[  244.505108] br0: port 1(eth0) entered forwarding state
node2 ~ # [  251.160159] ------------[ cut here ]------------
[  251.160831] WARNING: CPU: 0 PID: 3 at include/net/request_sock.h:102 tcp_v4_err+0x6b1/0x720()
[  251.162077] Modules linked in:
[  251.162496] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 4.0.0-rc3+ #18
[  251.163334] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  251.164078]  ffffffff81a8365c ffff880038a6ba18 ffffffff8162ace4 0000000000009898
[  251.165084]  0000000000000000 ffff880038a6ba58 ffffffff8104da85 ffff88003fa437c0
[  251.166195]  ffff88003fa437c0 ffff88003fa74e00 ffff88003fa43bb8 ffff88003fad99a0
[  251.167203] Call Trace:
[  251.167533]  [<ffffffff8162ace4>] dump_stack+0x45/0x57
[  251.168206]  [<ffffffff8104da85>] warn_slowpath_common+0x85/0xc0
[  251.169239]  [<ffffffff8104db65>] warn_slowpath_null+0x15/0x20
[  251.170271]  [<ffffffff81559d51>] tcp_v4_err+0x6b1/0x720
[  251.171408]  [<ffffffff81630d03>] ? _raw_read_lock_irq+0x3/0x10
[  251.172589]  [<ffffffff81534e20>] ? inet_del_offload+0x40/0x40
[  251.173366]  [<ffffffff81569295>] icmp_socket_deliver+0x65/0xb0
[  251.174134]  [<ffffffff815693a2>] icmp_unreach+0xc2/0x280
[  251.174820]  [<ffffffff8156a82d>] icmp_rcv+0x2bd/0x3a0
[  251.175473]  [<ffffffff81534ea2>] ip_local_deliver_finish+0x82/0x1e0
[  251.176282]  [<ffffffff815354d8>] ip_local_deliver+0x88/0x90
[  251.177004]  [<ffffffff815350f0>] ip_rcv_finish+0xf0/0x310
[  251.177693]  [<ffffffff815357bc>] ip_rcv+0x2dc/0x390
[  251.178336]  [<ffffffff814f5da3>] __netif_receive_skb_core+0x713/0xa20
[  251.179170]  [<ffffffff814f7fca>] __netif_receive_skb+0x1a/0x80
[  251.179922]  [<ffffffff814f97d4>] process_backlog+0x94/0x120
[  251.180639]  [<ffffffff814f9612>] net_rx_action+0x1e2/0x310
[  251.181356]  [<ffffffff81051267>] __do_softirq+0xa7/0x290
[  251.182046]  [<ffffffff81051469>] run_ksoftirqd+0x19/0x30
[  251.182726]  [<ffffffff8106cc23>] smpboot_thread_fn+0x153/0x1d0
[  251.183485]  [<ffffffff8106cad0>] ? SyS_setgroups+0x130/0x130
[  251.184228]  [<ffffffff8106935e>] kthread+0xee/0x110
[  251.184871]  [<ffffffff81069270>] ? kthread_create_on_node+0x1b0/0x1b0
[  251.185690]  [<ffffffff81631108>] ret_from_fork+0x58/0x90
[  251.186385]  [<ffffffff81069270>] ? kthread_create_on_node+0x1b0/0x1b0
[  251.187216] ---[ end trace c947fc7b24e42ea1 ]---
[  259.542268] br0: port 1(eth0) entered forwarding state

Remove the double calls to reqsk_put()

[edumazet] :

I got confused because reqsk_timer_handler() _has_ to call
reqsk_put(req) after calling inet_csk_reqsk_queue_drop(), as
the timer handler holds a reference on req.

Signed-off-by: Fan Du <fan.du@intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Erik Hugne <erik.hugne@ericsson.com>
Fixes: fa76ce7328b2 ("inet: get rid of central tcp/dccp listener timer")
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: simplify if clause in dev_close

Dan Carpenter's static checker warned that in vxlan_stop we are checking
if 'vs' can be NULL while later we simply derreference it.

As after commit 56ef9c909b40 ("vxlan: Move socket initialization to
within rtnl scope") 'vs' just cannot be NULL in vxlan_stop() anymore, as
the interface won't go up if the socket initialization fails. So we are
good to just remove the check and make it consistent.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fib_trie: Fix regression in handling of inflate/halve failure

When I updated the code to address a possible null pointer dereference in
resize I ended up reverting an exception handling fix for the suffix length
in the event that inflate or halve failed. This change is meant to correct
that by reverting the earlier fix and instead simply getting the parent
again after inflate has been completed to avoid the possible null pointer
issue.

Fixes: ddb4b9a13 ("fib_trie: Address possible NULL pointer dereference in resize")
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bgmac: implement scatter/gather support

Always use software checksumming, since the hardware does not have any
checksum offload support.
This significantly improves local TCP tx performance.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bgmac: implement GRO and use build_skb

This improves performance for routing and local rx

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bgmac: fix descriptor frame start/end definitions

The start-of-frame and end-of-frame bits were accidentally swapped.
In the current code it does not make any difference, since they are
always used together.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Move the comment about unsettable socket-level options to default clause and update its reference.

We implement the SO_SNDLOWAT etc not to be settable and return
ENOPROTOOPT per 1003.1g 7. Move the comment to appropriate
position and update the reference.

Signed-off-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'listener_refactor_part_15'

Eric Dumazet says:

====================
tcp listener refactoring part 15

I am trying to make the final patch pushing request socks into ehash
as small as possible. In this patch series, I made various adjustments
for the SYNACK generation, allowing me to reach 1 Mpps SYNACK in my
stress test (still hitting LISTENER spinlock of course, and the syn_wait
spinlock)

I also converted the ICMP handlers a bit ahead of time :

They no longer need to get the LISTENER socket, and can use
only a lookup in ehash table. No big deal if we ignore ICMP
for requests socks before the final steps.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: dccp: handle ICMP messages on DCCP_NEW_SYN_RECV request sockets

dccp_v6_err() can restrict lookups to ehash table, and not to listeners.

Note this patch creates the infrastructure, but this means that ICMP
messages for request sockets are ignored until complete conversion.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: dccp: handle ICMP messages on DCCP_NEW_SYN_RECV request sockets

dccp_v4_err() can restrict lookups to ehash table, and not to listeners.

Note this patch creates the infrastructure, but this means that ICMP
messages for request sockets are ignored until complete conversion.

New dccp_req_err() helper is exported so that we can use it in IPv6
in following patch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: tcp: handle ICMP messages on TCP_NEW_SYN_RECV request sockets

tcp_v6_err() can restrict lookups to ehash table, and not to listeners.

Note this patch creates the infrastructure, but this means that ICMP
messages for request sockets are ignored until complete conversion.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: tcp: handle ICMP messages on TCP_NEW_SYN_RECV request sockets

tcp_v4_err() can restrict lookups to ehash table, and not to listeners.

Note this patch creates the infrastructure, but this means that ICMP
messages for request sockets are ignored until complete conversion.

New tcp_req_err() helper is exported so that we can use it in IPv6
in following patch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: convert syn_wait_lock to a spinlock

This is a low hanging fruit, as we'll get rid of syn_wait_lock eventually.

We hold syn_wait_lock for such small sections, that it makes no sense to use
a read/write lock. A spin lock is simply faster.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

inet: remove some sk_listener dependencies

listener can be source of false sharing. request sock has some
useful information like : ireq->ir_iif, ireq->ir_num, ireq->ireq_net

This patch does not solve the major problem of having to read
sk->sk_protocol which is sharing a cache line with sk->sk_wmem_alloc.
(This same field is read later in ip_build_and_send_pkt())

One idea would be to move sk_protocol close to sk_family
(using 8 bits instead of 16 for sk_family seems enough)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

inet: remove sk_listener parameter from syn_ack_timeout()

It is not needed, and req->sk_listener points to the listener anyway.
request_sock argument can be const.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

inet: cache listen_sock_qlen() and read rskq_defer_accept once

Cache listen_sock_qlen() to limit false sharing, and read
rskq_defer_accept once as it might change under us.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'gigaset_modem_response'

Tilman Schmidt says:

====================
isdn/gigaset: restructure modem response parser

This series of patches restructures the Gigaset ISDN driver's
modem response parser to improve code readability and conform
better to the device's specification and actual behaviour.
Could you please merge these through net-next?
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

isdn/gigaset: restructure modem response parser (4)

Restructure the control structure of the modem response parser
to improve readability and error handling.

Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>

isdn/gigaset: restructure modem response parser (3)

Separate CID detection from main parser loop.

Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>

isdn/gigaset: restructure modem response parser (2)

Separate literal string handling from main parser loop.

Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>

isdn/gigaset: restructure modem response parser (1)

Factor out queueing of modem response events into helper function
add_cid_event().

Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>

switchdev: fix stp update API to work with layered netdevices

make it same as the netdev_switch_port_bridge_setlink/dellink
api (ie traverse lowerdevs to get to the switch port).

removes "WARN_ON(!ops->ndo_switch_parent_id_get)" because
direct bridge ports can be stacked netdevices (like bonds
and team of switch ports) which may not implement this ndo.

v2 to v3:
- remove changes to bond and team. Bring back the
transparently following lowerdevs like i initially
had for setlink/getlink
(http://www.spinics.net/lists/netdev/msg313436.html)
dave and scott feldman also seem to prefer it be that
way and move to non-transparent way of doing things
if we see a problem down the lane.

v3 to v4:
- fix ret initialization

v4 to v5:
- return err on first failure (scott feldman)

v5 to v6:
- change variable name (err) and initialize to
-EOPNOTSUPP (scott feldman).

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: clear skb->priority when forwarding to another netns

skb->priority can be set for two purposes:

1) With respect to IP TOS field, which is computed by a mask.
Ususally used for priority qdisc's (pfifo, prio etc.), on TX
side (we only have ingress qdisc on RX side).

2) Used as a classid or flowid, works in the same way with tc
classid. What's more, this can even override the classid
of tc filters.

For case 1), it has been respected within its netns, I don't
see any point of keeping it for another netns, especially
when packets will be forwarded to Rx path (no matter from TX
path or RX path).

For case 2) we care, our applications run inside a netns,
and we classify the packets by our own filters outside,
If some application sets this priority, it could bypass
our filters, therefore clear it when moving out of a netns,
it makes no sense to bypass tc filters out of its netns.

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'crypto_async'

Tadeusz Struk says:

====================
Add support for async socket operations

After the iocb parameter has been removed from sendmsg() and recvmsg() ops
the socket layer, and the network stack no longer support async operations.
This patch set adds support for asynchronous operations on sockets back.

Changes in v3:
* As sugested by Al Viro instead of adding new functions aio_sendmsg
and aio_recvmsg, added a ptr to iocb into the kernel-side msghdr structure.
This way no change to aio.c is required.

Changes in v2:
* removed redundant total_size param from aio_sendmsg and aio_recvmsg functions
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

crypto: algif - change algif_skcipher to be asynchronous

The way the algif_skcipher works currently is that on sendmsg/sendpage it
builds an sgl for the input data and then on read/recvmsg it sends the job
for encryption putting the user to sleep till the data is processed.
This way it can only handle one job at a given time.
This patch changes it to be asynchronous by adding AIO support.

Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

crypto: af_alg - Allow to link sgl

Allow to link af_alg sgls.

Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: socket: add support for async operations

Add support for async operations.

Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Validate iov ranges before feeding them into iov_iter_init(), from
    Al Viro.

2) We changed copy_from_msghdr_from_user() to zero out the msg_namelen
    is a NULL pointer is given for the msg_name.  Do the same in the
    compat code too.  From Catalin Marinas.

3) Fix partially initialized tuples in netfilter conntrack helper, from
    Ian Wilson.

4) Missing continue; statement in nft_hash walker can lead to crashes,
    from Herbert Xu.

5) tproxy_tg6_check looks for IP6T_INV_PROTO in ->flags instead of
    ->invflags, fix from Pablo Neira Ayuso.

6) Incorrect memory account of TCP FINs can result in negative socket
    memory accounting values.  Fix from Josh Hunt.

7) Don't allow virtual functions to enable VLAN promiscuous mode in
    be2net driver, from Vasundhara Volam.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  netfilter: nft_compat: set IP6T_F_PROTO flag if protocol is set
  cx82310_eth: wait for firmware to become ready
  net: validate the range we feed to iov_iter_init() in sys_sendto/sys_recvfrom
  net: compat: Update get_compat_msghdr() to match copy_msghdr_from_user() behaviour
  be2net: use PCI MMIO read instead of config read for errors
  be2net: restrict MODIFY_EQ_DELAY cmd to a max of 8 EQs
  be2net: Prevent VFs from enabling VLAN promiscuous mode
  tcp: fix tcp fin memory accounting
  ipv6: fix backtracking for throw routes
  net: ethernet: pcnet32: Setup the SRAM and NOUFLO on Am79C97{3, 5}
  ipv6: call ipv6_proxy_select_ident instead of ipv6_select_ident in udp6_ufo_fragment
  netfilter: xt_TPROXY: fix invflags check in tproxy_tg6_check()
  netfilter: restore rule tracing via nfnetlink_log
  netfilter: nf_tables: allow to change chain policy without hook if it exists
  netfilter: Fix potential crash in nft_hash walker
  netfilter: Zero the tuple in nfnl_cthelper_parse_tuple()

Merge git://git./linux/kernel/git/davem/sparc

Pull sparc fixes from David Miller:
"Some perf bug fixes from David Ahern, and the fix for that nasty
  memmove() bug"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc64: Fix several bugs in memmove().
  sparc: Touch NMI watchdog when walking cpus and calling printk
  sparc: perf: Add support M7 processor
  sparc: perf: Make counting mode actually work
  sparc: perf: Remove redundant perf_pmu_{en|dis}able calls

sparc64: Fix several bugs in memmove().

Firstly, handle zero length calls properly.  Believe it or not there
are a few of these happening during early boot.

Next, we can't just drop to a memcpy() call in the forward copy case
where dst <= src.  The reason is that the cache initializing stores
used in the Niagara memcpy() implementations can end up clearing out
cache lines before we've sourced their original contents completely.

For example, considering NG4memcpy, the main unrolled loop begins like
this:

     load   src + 0x00
     load   src + 0x08
     load   src + 0x10
     load   src + 0x18
     load   src + 0x20
     store  dst + 0x00

Assume dst is 64 byte aligned and let's say that dst is src - 8 for
this memcpy() call.  That store at the end there is the one to the
first line in the cache line, thus clearing the whole line, which thus
clobbers "src + 0x28" before it even gets loaded.

To avoid this, just fall through to a simple copy only mildly
optimized for the case where src and dst are 8 byte aligned and the
length is a multiple of 8 as well.  We could get fancy and call
GENmemcpy() but this is good enough for how this thing is actually
used.

Reported-by: David Ahern <david.ahern@oracle.com>
Reported-by: Bob Picco <bpicco@meloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

Linux 4.0-rc5

Merge tag 'md/4.0-rc4-fix' of git://neil.brown.name/md

Pull bugfix for md from Neil Brown:
"One fix for md in 4.0-rc4

Regression in recent patch causes crash on error path"

* tag 'md/4.0-rc4-fix' of git://neil.brown.name/md:
md: fix problems with freeing private data after ->run failure.