openwrt/staging/blogic.git
9 years agobpf: split state from prandom_u32() and consolidate {c, e}BPF prngs
Daniel Borkmann [Wed, 7 Oct 2015 23:20:39 +0000 (01:20 +0200)]
bpf: split state from prandom_u32() and consolidate {c, e}BPF prngs

While recently arguing on a seccomp discussion that raw prandom_u32()
access shouldn't be exposed to unprivileged user space, I forgot the
fact that SKF_AD_RANDOM extension actually already does it for some time
in cBPF via commit 4cd3675ebf74 ("filter: added BPF random opcode").

Since prandom_u32() is being used in a lot of critical networking code,
let's be more conservative and split their states. Furthermore, consolidate
eBPF and cBPF prandom handlers to use the new internal PRNG. For eBPF,
bpf_get_prandom_u32() was only accessible for privileged users, but
should that change one day, we also don't want to leak raw sequences
through things like eBPF maps.

One thought was also to have separate per-bpf_prog states, but due to ABI
reasons this is not easily possible, i.e. the program code currently
cannot access bpf_prog itself, and copying the rnd_state to/from the
stack scratch space whenever a program uses the prng seems not really
worth the trouble and seems too hacky. If needed, taus113 could in such
cases be implemented within eBPF using a map entry to keep the state
space, or get_random_bytes() could become a second helper in cases where
performance would not be critical.

Both sides can trigger a one-time late init via prandom_init_once() on
the shared state. Performance-wise, there should even be a tiny gain
as bpf_user_rnd_u32() saves one function call. The PRNG needs to live
inside the BPF core since kernels could have a NET-less config as well.
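
To illustrate the state split described above, here is a minimal user-space
sketch of a taus113 step that operates on a caller-supplied state. The shift
and mask constants are the ones I recall from lib/random32.c; the function
name taus113_next() and the seed values used with it are illustrative
assumptions, not the kernel's code.

```c
#include <assert.h>
#include <stdint.h>

/* Four-word taus113 state, modelled on the kernel's struct rnd_state. */
struct rnd_state {
    uint32_t s1, s2, s3, s4;
};

/* One Tausworthe component step. */
#define TAUSWORTHE(s, a, b, c, d) \
    ((((s) & (c)) << (d)) ^ ((((s) << (a)) ^ (s)) >> (b)))

/* Advance the given state (and only that state), return the next value. */
uint32_t taus113_next(struct rnd_state *st)
{
    st->s1 = TAUSWORTHE(st->s1,  6U, 13U, 4294967294U, 18U);
    st->s2 = TAUSWORTHE(st->s2,  2U, 27U, 4294967288U,  2U);
    st->s3 = TAUSWORTHE(st->s3, 13U, 21U, 4294967280U,  7U);
    st->s4 = TAUSWORTHE(st->s4,  3U, 12U, 4294967168U, 13U);

    return st->s1 ^ st->s2 ^ st->s3 ^ st->s4;
}
```

Because the generator only touches the state it is handed, values drawn
from a BPF-owned state reveal nothing about, and do not advance, the state
used by other callers.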

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Chema Gonzalez <chema@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agorandom32: add prandom_init_once helper for own rngs
Daniel Borkmann [Wed, 7 Oct 2015 23:20:38 +0000 (01:20 +0200)]
random32: add prandom_init_once helper for own rngs

Add a prandom_init_once() facility that works on the rnd_state, so that
users that are keeping their own state independent from prandom_u32() can
initialize their taus113 per cpu states.

The motivation here is similar to net_get_random_once(): initialize the
state as late as possible in the hope that enough entropy has been
collected for the seeding. prandom_init_once() makes use of the recently
introduced prandom_seed_full_state() helper and is generic enough so that
it could also be used on fast-paths due to the DO_ONCE().

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agorandom32: add prandom_seed_full_state helper
Daniel Borkmann [Wed, 7 Oct 2015 23:20:37 +0000 (01:20 +0200)]
random32: add prandom_seed_full_state helper

Factor out the full reseed handling code that populates the state
through get_random_bytes() and runs prandom_warmup(). The resulting
prandom_seed_full_state() will be used later on in more than the
current __prandom_reseed() user. Fix also two minor whitespace
issues along the way.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoonce: make helper generic for calling functions once
Hannes Frederic Sowa [Wed, 7 Oct 2015 23:20:36 +0000 (01:20 +0200)]
once: make helper generic for calling functions once

Make the get_random_once() helper generic enough, so that functions
in general would only be called once, where one user of this is then
net_get_random_once().

The only implementation specific call is to get_random_bytes(), all
the rest of this *_once() facility would be duplicated among different
subsystems otherwise. The new DO_ONCE() helper will be used by prandom()
later on, but might also be useful for other scenarios/subsystems as
well where a one-time initialization in often-called, possibly fast
path code could occur.
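
As a rough user-space illustration of the once-per-call-site idea, the
sketch below uses a C11 atomic flag. DO_ONCE_SKETCH(), seed_state() and
get_seeded_value() are hypothetical names; the sketch does not reproduce how
the kernel helper is implemented, and unlike a real once facility it does not
make a concurrent caller wait until the first caller has finished.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Every expansion of the macro gets its own static flag, so func runs
 * at most once per call site, however often the surrounding code runs.
 */
#define DO_ONCE_SKETCH(func, ...)                            \
    do {                                                     \
        static atomic_flag done_ = ATOMIC_FLAG_INIT;         \
        if (!atomic_flag_test_and_set(&done_))               \
            func(__VA_ARGS__);                               \
    } while (0)

int seed_calls; /* counts how often the initializer really ran */

/* Stand-in for an expensive, entropy-consuming initializer. */
void seed_state(unsigned int *state)
{
    *state = 12345U;
    seed_calls++;
}

/* Possibly hot path: the seeding cost is paid on the first call only. */
unsigned int get_seeded_value(void)
{
    static unsigned int state;

    DO_ONCE_SKETCH(seed_state, &state);
    return state;
}
```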

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: move net_get_random_once to lib
Hannes Frederic Sowa [Wed, 7 Oct 2015 23:20:35 +0000 (01:20 +0200)]
net: move net_get_random_once to lib

There's no good reason why users outside of networking should not
be using this facility, e.g. for initializing their seeds.

Therefore, make it accessible from there as get_random_once().

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: Do not drop to make_route if oif is l3mdev
David Ahern [Wed, 7 Oct 2015 15:40:13 +0000 (08:40 -0700)]
net: Do not drop to make_route if oif is l3mdev

Commit deaa0a6a930 ("net: Lookup actual route when oif is VRF device")
exposed a bug in __ip_route_output_key_hash for VRF devices: on FIB lookup
failure, if the oif is specified, the current logic drops to make_route on
the assumption that the route tables are wrong. For VRF/L3 master devices
this leads to wrong dst entries and route lookups. For example:
    $ ip route ls table vrf-red
    unreachable default
    broadcast 10.2.1.0 dev eth1  proto kernel  scope link  src 10.2.1.2
    10.2.1.0/24 dev eth1  proto kernel  scope link  src 10.2.1.2
    local 10.2.1.2 dev eth1  proto kernel  scope host  src 10.2.1.2
    broadcast 10.2.1.255 dev eth1  proto kernel  scope link  src 10.2.1.2

    $ ip route get oif vrf-red 1.1.1.1
    1.1.1.1 dev vrf-red  src 10.0.0.2
        cache

With this patch:
    $  ip route get oif vrf-red 1.1.1.1
    RTNETLINK answers: No route to host

which is the correct response given the unreachable default route.
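
The decision described above can be reduced to a small predicate. This is
only a sketch of the rule as stated in this message; the function name and
the boolean parameters are hypothetical and do not mirror the kernel's data
structures.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Called conceptually after a failed FIB lookup: fall through to
 * make_route only when an oif was given and it is not an L3 master
 * (VRF) device. Otherwise the lookup failure is reported as is.
 */
bool falls_back_to_make_route(bool oif_set, bool oif_is_l3mdev)
{
    return oif_set && !oif_is_l3mdev;
}
```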

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobpf, skb_do_redirect: clear sender_cpu before xmit
Daniel Borkmann [Wed, 7 Oct 2015 08:16:09 +0000 (10:16 +0200)]
bpf, skb_do_redirect: clear sender_cpu before xmit

Similar to commit c29390c6dfee ("xps: must clear sender_cpu before
forwarding"), we also need to clear the skb->sender_cpu when moving
from RX to TX via skb_do_redirect() due to the shared location of
napi_id (used on RX) and sender_cpu (used on TX).
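
A minimal sketch of why the shared location matters, using a toy structure
rather than the real sk_buff. The names pkt_sketch, pkt_clear_sender_cpu()
and pkt_redirect_to_tx() are assumptions made for illustration.

```c
#include <assert.h>

/* Toy packet: both fields occupy the same storage, as described above. */
struct pkt_sketch {
    union {
        unsigned int napi_id;    /* meaningful on the RX path */
        unsigned int sender_cpu; /* meaningful on the TX path */
    };
};

void pkt_clear_sender_cpu(struct pkt_sketch *p)
{
    p->sender_cpu = 0;
}

/* Moving a packet from RX to TX: drop the stale RX value first. */
void pkt_redirect_to_tx(struct pkt_sketch *p)
{
    /* Without this, TX would read the old napi_id as a CPU number. */
    pkt_clear_sender_cpu(p);
}
```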

Fixes: 27b29f63058d ("bpf: add bpf_redirect() helper")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: hns: fix 32-bit build warning
Arnd Bergmann [Tue, 6 Oct 2015 21:53:57 +0000 (23:53 +0200)]
net: hns: fix 32-bit build warning

The recently added hns driver causes a build warning in ARM
allmodconfig builds:

drivers/net/ethernet/hisilicon/hns/hnae.c: In function 'handles_show':
drivers/net/ethernet/hisilicon/hns/hnae.c:452:13: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
          j, (u64)h->qs[i]->io_base);
             ^

This removes the pointless cast and prints the pointer address using
the "%p" format string in all three locations.
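
For illustration, a user-space sketch of the same idea with snprintf(): on a
32-bit target a pointer is narrower than a 64-bit integer, so casting it to
u64 triggers the warning, while "%p" takes the pointer as it is. The function
name and the message text are hypothetical, not the driver's.

```c
#include <assert.h>
#include <stdio.h>

/* Format one queue entry; returns the snprintf() result. */
int format_io_base(char *buf, size_t len, int idx, const void *io_base)
{
    /* "%p" expects a void pointer, so no integer cast is involved. */
    return snprintf(buf, len, "queue %d io_base %p\n", idx, io_base);
}
```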

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: Microchip encx24j600 driver
Jon Ringle [Tue, 6 Oct 2015 20:37:46 +0000 (16:37 -0400)]
net: Microchip encx24j600 driver

This ethernet driver supports the Microchip enc424j600/624j600 Ethernet
controller over a SPI bus interface. This driver makes use of the regmap API to
optimize access to registers by caching registers where possible.

Datasheet:
http://ww1.microchip.com/downloads/en/DeviceDoc/39935b.pdf

Signed-off-by: Jon Ringle <jringle@gridpoint.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'broadcom-iproc'
David S. Miller [Thu, 8 Oct 2015 11:46:03 +0000 (04:46 -0700)]
Merge branch 'broadcom-iproc'

Arun Parameswaran says:

====================
Add support for Broadcom's iProc MDIO and Cygnus Ethernet PHY

This patchset adds support for the iProc MDIO interface and the
Broadcom Cygnus SoC's internal Ethernet PHY.

The internal Ethernet PHY(s) in the Cygnus SoCs are accessed
via the MDIO interface found in most of the iProc based chips.

The patch also consolidates the common APIs used by the
Broadcom PHYs into a common library. Existing Broadcom PHY
drivers have been modified to use the common library APIs.

This patch series is based on Linux v4.3-rc1 and is available in:
https://github.com/Broadcom/cygnus-linux/tree/cygnus-net-phy-mdio-v3

The Ethernet driver for the iProc family will be submitted soon,
as will the device tree configurations for the different iProc
family SoCs.

Changes from v2:
- Modified drivers/net/phy/Kconfig so that the BCM_CYGNUS_PHY
  driver uses 'depends on MDIO_BCM_IPROC' instead of 'select'.
- Added github branch to the cover letter

Changes from v1:
- Updated device tree documentation for the iProc MDIO driver
  based on Florian's feedback.
- Moved the core register defines from the Cygnus PHY driver to
  'include/linux/brcmphy.h' based on Florian's feedback.
- Created a new patch/commit to modify the bcm7xxx phy driver
  to use the new core register defines.
- Modified the Kconfig entry for the Broadcom PHY library to
  'tristate' instead of 'bool'
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: phy: bcm7xxx: Modified to use global core register defines
Arun Parameswaran [Tue, 6 Oct 2015 19:25:50 +0000 (12:25 -0700)]
net: phy: bcm7xxx: Modified to use global core register defines

Modified the bcm7xxx phy driver to remove local core register
defines and use the common ones from "include/linux/brcmphy.h"

Signed-off-by: Arun Parameswaran <arunp@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: phy: Broadcom Cygnus internal Etherent PHY driver
Arun Parameswaran [Tue, 6 Oct 2015 19:25:49 +0000 (12:25 -0700)]
net: phy: Broadcom Cygnus internal Etherent PHY driver

Add support for the internal PHYs of the Broadcom Cygnus SoCs.
The PHYs are 1000M/100M/10M capable with support for 'EEE'
and 'APD' (Auto Power Down).

This driver supports the following Broadcom Cygnus SoCs:
 - BCM583XX (BCM58300, BCM58302, BCM58303, BCM58305)
 - BCM113XX (BCM11300, BCM11320, BCM11350, BCM11360)

The PHYs on these SoCs require some workarounds for
stable operation, both during configuration time and
during suspend/resume. This driver handles the
application of the workarounds.

Signed-off-by: Arun Parameswaran <arunp@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: phy: Add Broadcom phy library for common interfaces
Arun Parameswaran [Tue, 6 Oct 2015 19:25:48 +0000 (12:25 -0700)]
net: phy: Add Broadcom phy library for common interfaces

This patch adds the Broadcom PHY library to consolidate common
interfaces shared by Broadcom PHYs.

Moved the common interfaces to 'bcm-phy-lib.c' and updated
the Broadcom PHY drivers to use the new APIs.

Signed-off-by: Arun Parameswaran <arunp@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: phy: Broadcom iProc MDIO bus driver
Arun Parameswaran [Tue, 6 Oct 2015 19:25:47 +0000 (12:25 -0700)]
net: phy: Broadcom iProc MDIO bus driver

This patch adds support for the Broadcom iProc MDIO bus interface.
The MDIO interface can be found in the Broadcom iProc family SoCs.

The MDIO bus is accessed using a combination of command and data
registers. This MDIO driver provides access to the Ethernet GPHYs
connected to the MDIO bus.

Signed-off-by: Arun Parameswaran <arunp@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agodt-bindings: net: Broadcom iProc MDIO bus driver device tree binding
Arun Parameswaran [Tue, 6 Oct 2015 19:25:46 +0000 (12:25 -0700)]
dt-bindings: net: Broadcom iProc MDIO bus driver device tree binding

Add device tree binding documentation for the Broadcom iProc MDIO
bus driver.

Signed-off-by: Arun Parameswaran <arunp@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'net/rds/4.3-v3' of git://git.kernel.org/pub/scm/linux/kernel/git/ssanto...
David S. Miller [Thu, 8 Oct 2015 11:38:37 +0000 (04:38 -0700)]
Merge branch 'net/rds/4.3-v3' of git://git./linux/kernel/git/ssantosh/linux

Santosh Shilimkar says:

====================
RDS: connection scalability and performance improvements

[v4]
Re-sending the same patches from v3 again since my repost of
patch 05/14 from v3 was whitespace damaged.

[v3]
Updated patch "[PATCH v2 05/14] RDS: defer the over_batch work to
send worker" as per David Miller's comment [4] to avoid the magic
value usage. Patch now makes use of already available but unused
send_batch_count module parameter. The rest of the patches are the same
as in the earlier version v2 [3].

[v2]:
Dropped "[PATCH 05/15] RDS: increase size of hash-table to 8K" from
earlier version [1]. I plan to address the hash table scalability using
re-sizable hash tables as suggested by David Laight and David Miller [2]

This series addresses RDS connection bottlenecks on massive workloads and
improves the RDMA performance by almost 3X. RDS TCP also gets a small gain
of about 12%.

RDS is being used in massive systems with high scalability where several
hundred thousand end points and tens of thousands of local processes
are operating on tens of thousands of sockets. Being RC (reliable connection),
socket bind and release happen very often and any inefficiencies in
bind hash lookups hurt the overall system performance. The RDS bind hash-table
uses a global spin-lock, which is the biggest bottleneck. To make matters worse,
it uses rcu inside the global lock for hash buckets.
This is being addressed by simply using a per-bucket rw lock, which makes the
locking simple and very efficient. The hash table size is still an issue and
I plan to address it by using re-sizable hash tables as suggested on the list.

For RDS RDMA improvement, the completion handling is revamped so that we
can do batch completions. Both send and receive completion handlers are
split logically to achieve the same. RDS 8K messages being one of the
key use cases, the mr pool is adapted to have 8K mrs along with the default
1M mrs. While doing this, a few fixes are included and a couple of
bottlenecks seen with rds_sendmsg() are addressed.

The series applies against 4.3-rc1 as well as net-next. It is tested on Oracle
hardware with IB fabric for both bcopy as well as RDMA mode. RDS TCP is
tested with iXGB NIC. Like last time, iWARP transport is untested with
these changes. The patchset is also available at the git repo below:

git://git.kernel.org/pub/scm/linux/kernel/git/ssantosh/linux.git net/rds/4.3-v3

As a side note, the IB HCA driver I used for testing is missing at least 3
important patches upstream that are needed to see the full IB performance,
and I am hoping to get those into mainline with their help.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'pass_net_through_output_path'
David S. Miller [Thu, 8 Oct 2015 11:27:13 +0000 (04:27 -0700)]
Merge branch 'pass_net_through_output_path'

Eric W. Biederman says:

====================
net: Pass net through the output path v2

This is the next installment of my work to pass struct net through the
output path so the code does not need to guess how to figure out which
network namespace it is in, and ultimately routes can have output
devices in another network namespace.

The first patch in this series is a fix for a bug that came in when sk
was passed through the functions in the output path, and as such is
probably a candidate for net.  At the same time my later patches depend
on it so sending the fix separately would be confusing.

The second patch in this series is another fix for an issue that
came in when sk was passed through the output path.  I don't think it
needs a backport as I don't think anyone uses the path where the code
was incorrect.

The rest of the patchset focuses on the path from xxx_local_out to
dst_output and in the end succeeds in passing sock_net(sk) from the
socket a packet locally originates on to the dst->output function.

Given the size reduction in the code I think this counts as a cleanup as
much as feature work.

There remain a number of helper functions (like ip option processing) to
take care of before the network stack can support destination devices in
other network namespaces but with this set of changes the backbone of
the work is done.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agodst: Pass net into dst->output
Eric W. Biederman [Wed, 7 Oct 2015 21:48:47 +0000 (16:48 -0500)]
dst: Pass net into dst->output

The network namespace is already passed into dst_output; pass it into
dst->output, lwt->output and friends as well.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoipv4, ipv6: Pass net into ip_local_out and ip6_local_out
Eric W. Biederman [Wed, 7 Oct 2015 21:48:46 +0000 (16:48 -0500)]
ipv4, ipv6: Pass net into ip_local_out and ip6_local_out

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoipv4, ipv6: Pass net into __ip_local_out and __ip6_local_out
Eric W. Biederman [Wed, 7 Oct 2015 21:48:45 +0000 (16:48 -0500)]
ipv4, ipv6: Pass net into __ip_local_out and __ip6_local_out

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoipvlan: Cache net in ipvlan_process_v4_outbound and ipvlan_process_v6_outbound
Eric W. Biederman [Wed, 7 Oct 2015 21:48:44 +0000 (16:48 -0500)]
ipvlan: Cache net in ipvlan_process_v4_outbound and ipvlan_process_v6_outbound

Compute net once in ipvlan_process_v4_outbound and
ipvlan_process_v6_outbound and store it in a variable so that net does
not need to be recomputed next time it is used.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoppp: Cache net in pptp_xmit
Eric W. Biederman [Wed, 7 Oct 2015 21:48:43 +0000 (16:48 -0500)]
ppp: Cache net in pptp_xmit

Compute net and store it in a variable in pptp_xmit, so that the value
can be reused the next time it is needed.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoipv4: Cache net in ip_build_and_send_pkt and ip_queue_xmit
Eric W. Biederman [Wed, 7 Oct 2015 21:48:42 +0000 (16:48 -0500)]
ipv4: Cache net in ip_build_and_send_pkt and ip_queue_xmit

Compute net and store it in a variable in the functions
ip_build_and_send_pkt and ip_queue_xmit so that it does not need to be
recomputed next time it is needed.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoipv4: Cache net in iptunnel_xmit
Eric W. Biederman [Wed, 7 Oct 2015 21:48:41 +0000 (16:48 -0500)]
ipv4: Cache net in iptunnel_xmit

Store net in a variable in iptunnel_xmit so it does not need
to be recomputed when it is used again.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoipv6: Merge ip6_local_out and ip6_local_out_sk
Eric W. Biederman [Wed, 7 Oct 2015 21:48:40 +0000 (16:48 -0500)]
ipv6: Merge ip6_local_out and ip6_local_out_sk

Stop hiding the sk parameter with an inline helper function and make
all of the callers pass it, so that it is clear what the function is
doing.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoipv6: Merge __ip6_local_out and __ip6_local_out_sk
Eric W. Biederman [Wed, 7 Oct 2015 21:48:39 +0000 (16:48 -0500)]
ipv6: Merge __ip6_local_out and __ip6_local_out_sk

Only __ip6_local_out_sk has callers, so rename __ip6_local_out_sk to
__ip6_local_out and remove the previous __ip6_local_out.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoipv4: Merge ip_local_out and ip_local_out_sk
Eric W. Biederman [Wed, 7 Oct 2015 21:48:38 +0000 (16:48 -0500)]
ipv4: Merge ip_local_out and ip_local_out_sk

Hiding a parameter is confusing and silly, so modify all of
the callers to pass in the appropriate socket or skb->sk if
no socket is known.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoipv4: Merge __ip_local_out and __ip_local_out_sk
Eric W. Biederman [Wed, 7 Oct 2015 21:48:37 +0000 (16:48 -0500)]
ipv4: Merge __ip_local_out and __ip_local_out_sk

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agodst: Pass a sk into .local_out
Eric W. Biederman [Wed, 7 Oct 2015 21:48:36 +0000 (16:48 -0500)]
dst: Pass a sk into .local_out

For consistency with the other similar methods in the kernel pass a
struct sock into the dst_ops .local_out method.

Simplifying the socket passing case is needed as a prelude to passing a
struct net reference into .local_out.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: Pass net into dst_output and remove dst_output_okfn
Eric W. Biederman [Wed, 7 Oct 2015 21:48:35 +0000 (16:48 -0500)]
net: Pass net into dst_output and remove dst_output_okfn

Replace dst_output_okfn with dst_output

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoxfrm: Only compute net once in xfrm_policy_queue_process
Eric W. Biederman [Wed, 7 Oct 2015 21:48:34 +0000 (16:48 -0500)]
xfrm: Only compute net once in xfrm_policy_queue_process

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoipv4: Fix ip_queue_xmit to pass sk into ip_local_out_sk
Eric W. Biederman [Wed, 7 Oct 2015 21:48:33 +0000 (16:48 -0500)]
ipv4: Fix ip_queue_xmit to pass sk into ip_local_out_sk

After a packet has been encapsulated by a tunnel we should use the
tunnel socket's local multicast loopback flag to control whether the
encapsulated packet should be looped back locally.

Pass sk into ip_local_out_sk so that in the rare case we are dealing
with a tunneled packet whose tunnel destination address is a multicast
address the kernel properly decides to loopback this packet.

In practice I don't think this matters, as ip_queue_xmit is used by
tcp, l2tp and sctp, none of which, as far as I am aware, uses IP-level
multicasting, as they are all point-to-point communication protocols.
Let's fix this before someone uses ip_queue_xmit for a tunnel protocol
that does use multicast.

Fixes: aad88724c9d5 ("ipv4: add a sock pointer to dst->output() path.")
Fixes: b0270e91014d ("ipv4: add a sock pointer to ip_queue_xmit()")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoipv4: Fix ip_local_out_sk by passing the sk into __ip_local_out_sk
Eric W. Biederman [Wed, 7 Oct 2015 21:48:32 +0000 (16:48 -0500)]
ipv4: Fix ip_local_out_sk by passing the sk into __ip_local_out_sk

In the rare case where sk != skb->sk, ip_local_out_sk arranges
to call dst->output differently depending on whether the skb is
queued or not. This is a bug.

Fix this bug by passing the sk parameter of ip_local_out_sk through
from ip_local_out_sk to __ip_local_out_sk (skipping __ip_local_out).

Fixes: 7026b1ddb6b8 ("netfilter: Pass socket pointer down through okfn().")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next...
David S. Miller [Thu, 8 Oct 2015 11:21:09 +0000 (04:21 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2015-10-07

This series contains updates to i40e and i40evf only.

Paul updates i40e to simply increase the amount of time we wait for a
reset to complete, since we have seen on rare occasions that the reset
can take longer to complete.

Shannon updates the driver to turn on Wake-on-LAN by default if it is
enabled in the hardware config to begin with, rather than always disabling
it and waiting for the user to expressly turn it on.  Added new device IDs
and support for future devices.  Fixed a possible type compare problem
between a size and a possibly negative number.  Also fixed a shift value
that was wrong, which ended up with a bad bitmask.  Did general house
cleaning of the driver to clean up several pieces of low-hanging fruit.
Fixed an issue where new unicast addresses would be added to
the VSI list and then immediately removed and would never actually
make it down to the hardware.  Resolved the issue by removing the
separation between unicast and multicast in the search for filters to be
deleted.

Mitch fixes an issue where the hardware would continue to access the
memory formerly used by the rings of a VF that has been removed,
causing memory corruption or DMAR errors.  To relieve this condition,
explicitly stop all rings associated with each VF before releasing its
resources.  Also fixed a panic if the driver is unable to enable MSI-X
or is unable to acquire enough vectors, by propagating interrupt
allocation failure information to the calling function.  Cleaned up
an opcode that is not required.

Carolyn extends the size of the text available for the interrupt names
so that none of the descriptive data available for the Flow Director
interrupts is truncated.

Catherine fixes an issue where there was a possibility of speed getting
set to 0 if advertised is set to 0 (which is the case when autoneg is
disabled).

Jesse fixes the checksum on big endian machines by adding code to swap
it correctly.  Also fixed a bug in the return from get_link_status()
where only true or false was being returned, but false could mean
multiple things.  So allow the caller to get all the return values
in the call chain bubbled back to the source so that the reason for
the failure does not get lost.

Anjali adds statistics to keep track of how many times we ask the stack
to linearize the SKB because the hardware cannot handle SKBs with more
than 8 frags per segment/single packet.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge tag 'regmap-offload-update-bits' of git://git.kernel.org/pub/scm/linux/kernel...
David S. Miller [Thu, 8 Oct 2015 11:01:28 +0000 (04:01 -0700)]
Merge tag 'regmap-offload-update-bits' of git://git./linux/kernel/git/broonie/regmap

regmap: Allow buses to provide a custom update_bits() operation

Some buses provide a native _update_bits() operation which for uncached
registers is faster than doing a read/modify/write cycle as it is a
single bus transaction.  Add support for implementing this to regmap.
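
The difference can be sketched against a toy bus that counts transactions.
Everything here (fake_bus, bus_read(), bus_write(), update_bits_rmw(),
update_bits_native()) is a hypothetical model for illustration and not the
regmap API itself.

```c
#include <assert.h>
#include <stdint.h>

/* Toy bus with one register and a transaction counter. */
struct fake_bus {
    uint32_t reg;
    unsigned int transactions;
};

uint32_t bus_read(struct fake_bus *b)
{
    b->transactions++;
    return b->reg;
}

void bus_write(struct fake_bus *b, uint32_t v)
{
    b->transactions++;
    b->reg = v;
}

/* Generic fallback: read, modify, write. Costs two bus transactions. */
void update_bits_rmw(struct fake_bus *b, uint32_t mask, uint32_t val)
{
    uint32_t old = bus_read(b);

    bus_write(b, (old & ~mask) | (val & mask));
}

/* Bus with a native operation: same result in a single transaction. */
void update_bits_native(struct fake_bus *b, uint32_t mask, uint32_t val)
{
    b->transactions++;
    b->reg = (b->reg & ~mask) | (val & mask);
}
```

Only the bits selected by mask change; bits of val outside the mask are
ignored in both variants.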

9 years agoi40e/i40evf: remove unused opcode
Mitch Williams [Thu, 27 Aug 2015 15:42:32 +0000 (11:42 -0400)]
i40e/i40evf: remove unused opcode

This opcode is not required. VFs that program RSS through the firmware
do it by interacting directly with the firmware, and do not need to use
the virtual channel for this functionality.

Change-ID: Iaf17d2600e28ff1b6be8653f2fe9df1facd23b0e
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40evf: propagate interrupt allocation failure
Mitch Williams [Thu, 27 Aug 2015 15:42:31 +0000 (11:42 -0400)]
i40evf: propagate interrupt allocation failure

Lower level functions are properly reporting errors, and higher-level
functions are correctly responding to errors, but the errors aren't
actually getting through. Typically, the middle-manager function seems
to want to shield its boss from any bad news.

This change fixes a panic if the driver is unable to enable MSI-X or is
unable to acquire enough vectors.

Change-ID: Ifd5787ce92519a5d97e4b465902db930d97b71a1
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40e: Additional checks for CEE APP priority validity
Neerav Parikh [Thu, 27 Aug 2015 15:42:30 +0000 (11:42 -0400)]
i40e: Additional checks for CEE APP priority validity

The firmware has added additional status information to allow software
to determine if the APP priority for FCoE/iSCSI/FIP is valid or not in
CEE DCBX mode.

This patch adds support for those additional checks and will only add
applications to the software table that have oper and sync bits set
without any error.

Change-ID: I0a76c52427dadf97d4dba4538a3068d05e4eb56b
Signed-off-by: Neerav Parikh <neerav.parikh@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40e/i40evf: Add a stat to keep track of linearization count
Anjali Singhai Jain [Thu, 27 Aug 2015 15:42:29 +0000 (11:42 -0400)]
i40e/i40evf: Add a stat to keep track of linearization count

Keep track of how many times we ask the stack to linearize the
skb because the HW cannot handle skbs with more than 8 frags per
segment/single packet.
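
A simplified sketch of such a counter follows. It only models the
single-packet case with a plain fragment count; the per-segment accounting is
not modelled, and the names MAX_FRAGS_SKETCH, tx_stats_sketch and
tx_needs_linearize() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_FRAGS_SKETCH 8U /* limit quoted in the message above */

struct tx_stats_sketch {
    unsigned long tx_linearize; /* times linearization was requested */
};

/* Returns true, and bumps the counter, when the packet must be linearized. */
bool tx_needs_linearize(unsigned int nr_frags, struct tx_stats_sketch *stats)
{
    if (nr_frags <= MAX_FRAGS_SKETCH)
        return false;

    stats->tx_linearize++;
    return true;
}
```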

Change-ID: If455452060963a769bbe6112cba952e79e944b52
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40e/i40evf: fix unicast mac address add
Shannon Nelson [Wed, 26 Aug 2015 19:14:20 +0000 (15:14 -0400)]
i40e/i40evf: fix unicast mac address add

When using something like "ip maddr add ..." to add another unicast mac
address to the netdev, the mac address comes into the set_rx_mode handler
in the multicast list whether it is a unicast or multicast address.
This was confusing the code when it was trying to search for addresses
that needed to be deleted from the VSI, because it was looking for the
VSI unicast address in the netdev unicast list.  The result was that a
new unicast address would get added to the VSI list and then immediately
removed, and would never actually make it down into the hardware.

This patch removes the separation between unicast and multicast in the search
for filters to be deleted.  It also simplifies the logic a little with a
jump to the bottom of the loop when an address is found.  Now it doesn't
matter which netdev list the address is hiding in; we'll check them all.

Change-ID: Ie3685a92427ae7d2212bf948919ce295bc7a874c
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40e: fix bug in return from get_link_status and avoid spurious link messages
Jesse Brandeburg [Wed, 26 Aug 2015 19:14:19 +0000 (15:14 -0400)]
i40e: fix bug in return from get_link_status and avoid spurious link messages

Previously, the driver could call this function and have only true/false
returned, but false could mean multiple things, such as a failure to read
or the link being down. This change allows the caller to get all return values
in the call chain bubbled back to the source, which keeps information about
failures from being lost.

Also, in some unlikely scenarios, the firmware can become slow to respond
to admin queue (AQ) queries for link state.  Should the AQ time out,
the driver can detect the state and avoid a link change when there
may have been none.
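
The change in interface shape can be sketched as below. The structure and
function names are hypothetical, and -ETIMEDOUT is only an assumed example of
an error code; the driver's actual status values are not shown in this
message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy hardware model. */
struct hw_sketch {
    bool fw_responds; /* false models an admin queue timeout */
    bool link_up;
};

/* Old shape: false means either "link down" or "could not read". */
bool get_link_status_old(const struct hw_sketch *hw)
{
    return hw->fw_responds && hw->link_up;
}

/* New shape: the status code and the link state travel separately. */
int get_link_status_new(const struct hw_sketch *hw, bool *link_up)
{
    if (!hw->fw_responds)
        return -ETIMEDOUT; /* *link_up is deliberately left untouched */

    *link_up = hw->link_up;
    return 0;
}
```

A caller that sees an error can keep its previous idea of the link state
instead of reporting a link change that may not have happened.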

Change-ID: Ib2ac38407b7880750fb891b392fa77457fe6c21c
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40e: add little endian conversion for checksum
Jesse Brandeburg [Wed, 26 Aug 2015 19:14:18 +0000 (15:14 -0400)]
i40e: add little endian conversion for checksum

The checksum is not correct on big endian machines so add code to swap it
correctly.
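
As a user-space illustration of the idea (the kernel has its own helpers such as cpu_to_le16() for this), the sketch below stores and loads a value in little-endian byte order regardless of host endianness. The 16-bit width and the names store_le16() and load_le16() are assumptions made for the example.

```c
#include <stdint.h>

/* Write a 16-bit value as two bytes, least significant byte first.
 * The result is the same on little- and big-endian hosts. */
static void store_le16(uint8_t out[2], uint16_t value)
{
    out[0] = (uint8_t)(value & 0xff);
    out[1] = (uint8_t)(value >> 8);
}

/* Read a 16-bit value stored least significant byte first. */
static uint16_t load_le16(const uint8_t in[2])
{
    return (uint16_t)(in[0] | ((uint16_t)in[1] << 8));
}
```

Copying the raw in-memory value instead would produce the bytes in the opposite order on a big-endian machine, which is the kind of mismatch the message describes.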

Change-ID: Ic92b886d172a2cbe49f5d7eee1bc78e447023c7b
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40e/i40evf: give up the __func__
Shannon Nelson [Wed, 26 Aug 2015 19:14:17 +0000 (15:14 -0400)]
i40e/i40evf: give up the __func__

During early development, we added the function name to all of the error
strings to make debugging simpler. Now that we've released the driver,
our users should have more comprehensible error messages. So tear the
roof off and give up the __func__. Ow.

Change-ID: I7e1766252c7a032b9af6520da6aff536bdfd533c
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40e: Never let speed get set to 0 in get_settings
Catherine Sullivan [Wed, 26 Aug 2015 19:14:16 +0000 (15:14 -0400)]
i40e: Never let speed get set to 0 in get_settings

In ethtool, there is a possibility of speed getting set to 0
if advertise is set to 0 (which it is when autoneg is disabled).
We never want this to happen as the firmware will actually attempt
to set the speed to 0 sending link down, so add an extra check
to make sure this doesn't happen.

Change-ID: I62e0eeee2cbf043d8e6f5c9c9f0b92794e877f01
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40e: Fix for truncated interrupt name
Carolyn Wyborny [Wed, 26 Aug 2015 19:14:15 +0000 (15:14 -0400)]
i40e: Fix for truncated interrupt name

This patch extends the size of the text available for the interrupt names.
Without this patch, all the descriptive data available for the Flow
Director interrupts is truncated.

Change-ID: I2ac458f23ac3b4ea8f1edf73edc283b1d3704c7f
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40e/i40evf: assure clean asq status report
Shannon Nelson [Wed, 26 Aug 2015 19:14:14 +0000 (15:14 -0400)]
i40e/i40evf: assure clean asq status report

There was a possibility where the asq_last_status could get through without
update and thus report a previous error.  I don't think we've actually seen
this happen, but this patch will help make sure it doesn't.

Change-ID: I9e33927052a5ee6ea21f80b66d4c4b76c2760b17
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Christopher Pau <christopher.pau@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40e: make i40e_init_pf_fcoe to void
Shannon Nelson [Wed, 26 Aug 2015 19:14:13 +0000 (15:14 -0400)]
i40e: make i40e_init_pf_fcoe to void

i40e_init_pf_fcoe() didn't return anything except 0, it prints enough
error info already, and no driver logic depends on the return value,
so this can be void.

Change-ID: Ie6afad849857d87a7064c42c3cce14c74c2f29d8
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40e: fix bad CEE status shift value
Shannon Nelson [Wed, 26 Aug 2015 19:14:12 +0000 (15:14 -0400)]
i40e: fix bad CEE status shift value

Fix a shift value that was wrong, ending up with a bad bitmask.  Also add
a blank line between two sets of #defines for better readability.

Change-ID: I3e41fa2a2ab904d3a4e6cbf13972ab0036a10601
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40e/i40evf: fix a potential type compare issue
Shannon Nelson [Wed, 26 Aug 2015 19:14:11 +0000 (15:14 -0400)]
i40e/i40evf: fix a potential type compare issue

Rework an if expression to assure there is no type compare problem between
a size and a possible negative number.
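
The message does not say which expression was reworked, so the following is only a generic sketch of this class of problem with hypothetical function names. When a signed value is compared with a size_t, it is converted to unsigned first, so a negative number turns into a very large one.

```c
#include <stdbool.h>
#include <stddef.h>

/* Naive check. The explicit cast spells out the conversion the compiler
 * would otherwise apply implicitly: -1 becomes SIZE_MAX, so a negative
 * (error) value is never reported as a short read. */
static bool is_short_read_naive(int got, size_t want)
{
    return (size_t)got < want;
}

/* Reworked check: handle the negative case before comparing sizes. */
static bool is_short_read(int got, size_t want)
{
    if (got < 0)
        return true;    /* an error is certainly not a full read */
    return (size_t)got < want;
}
```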

Change-ID: I4921fcc96abfcf69490efce020a9e4007f251c99
Reported-by: Helin Zhang <helin.zhang@intel.com>
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40e/i40evf: add driver support for new device ids
Shannon Nelson [Wed, 26 Aug 2015 19:14:10 +0000 (15:14 -0400)]
i40e/i40evf: add driver support for new device ids

Early addition of a new device id.

Change-ID: I61a8c8556fdf4f5714be4e4089689e374f30293c
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40e: stop VF rings
Mitch Williams [Wed, 26 Aug 2015 19:14:09 +0000 (15:14 -0400)]
i40e: stop VF rings

Explicitly stop the rings belonging to each VF when disabling SR-IOV.
Even though the VFs were gone, and the associated VSIs were removed, the
rings were not stopped, and in some circumstances the hardware would
continue to access the memory formerly used by the rings, causing memory
corruption or DMAR errors, both of which would lead to general malaise
of the kernel.

To relieve this condition, explicitly stop all the rings associated with
each VF before releasing its resources.

Change-ID: I78c05d562c66e7b594b7e48d67860f49b3e5b6ec
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40e: enable WoL operation if config bit show WoL capable
Shannon Nelson [Wed, 26 Aug 2015 19:14:08 +0000 (15:14 -0400)]
i40e: enable WoL operation if config bit show WoL capable

The driver was disabling Wake-on-LAN by default and waiting for the user
to expressly turn it on.  This patch has the driver turning on WoL from
the start if enabled in the hardware config, which matches the behavior
of our other drivers.

Change-ID: I43faedb907f8ba4d1a61b72a7c86072b97af12b1
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agoi40e: Increase the amount of time we wait for reset to be done
Paul M Stillwell Jr [Wed, 26 Aug 2015 19:14:07 +0000 (15:14 -0400)]
i40e: Increase the amount of time we wait for reset to be done

In some rare cases the reset can take longer to complete so increase the
amount of time we wait.

Change-ID: Ib5628ec54b526a811ee33d1214fe763226406671
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
9 years agotcp: ensure prior synack rtx behavior with small backlogs
Eric Dumazet [Tue, 6 Oct 2015 21:49:58 +0000 (14:49 -0700)]
tcp: ensure prior synack rtx behavior with small backlogs

Some applications use a listen() backlog of 1.

Prior kernels were silently enforcing a qlen_log of 4, so that we were
sending up to /proc/sys/net/ipv4/tcp_synack_retries SYNACK messages.

Fixes: ef547f2ac16b ("tcp: remove max_qlen_log")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: ipv4: tcp.c Fixed an assignment coding style issue
Yuvaraja Mariappan [Tue, 6 Oct 2015 17:53:29 +0000 (10:53 -0700)]
net: ipv4: tcp.c Fixed an assignment coding style issue

Fixed an assignment coding style issue

Signed-off-by: Yuvaraja Mariappan <ymariappan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 's390-net'
David S. Miller [Wed, 7 Oct 2015 11:52:14 +0000 (04:52 -0700)]
Merge branch 's390-net'

Ursula Braun says:

====================
s390: qeth patches for net-next

here are some s390 related patches for net-next. The qeth patches
are performance optimizations in the driver. The qdio patch corrects
a warning condition.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agos390/qdio: fix WARN_ON_ONCE condition
Eugene Crosser [Tue, 6 Oct 2015 13:12:29 +0000 (15:12 +0200)]
s390/qdio: fix WARN_ON_ONCE condition

If HiperSockets Completion Queueing is enabled, qdio always
issues a warning, since the condition is always met.
This patch fixes the condition in WARN_ON_ONCE that was always
true.

Signed-off-by: Eugene Crosser <Eugene.Crosser@ru.ibm.com>
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agos390/qeth: optimize MAC handling in rx_mode callback
Lakhvich Dmitriy [Tue, 6 Oct 2015 13:12:28 +0000 (15:12 +0200)]
s390/qeth: optimize MAC handling in rx_mode callback

In layer2 mode of the qeth driver, MAC address lists
from struct net_device require mapping to the OSA-card.
The existing implementation is inefficient for lists with
more than several MAC addresses, since for every
ndo_set_rx_mode callback it removes all MAC addresses first,
and then registers the current MAC address list.
This patch changes implementation of ndo_set_rx_mode callback
in qeth, only performing hardware registration/removal for
new/deleted addresses. To shorten lookup of MAC addresses,
registered addresses are kept in a hashtable instead of a
linear list.

Signed-off-by: Lakhvich Dmitriy <ldmitriy@ru.ibm.com>
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Reviewed-by: Eugene Crosser <Eugene.Crosser@ru.ibm.com>
Reviewed-by: Thomas Richter <tmricht@de.ibm.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agos390/qeth: switch to napi_gro_receive
Thomas Richter [Tue, 6 Oct 2015 13:12:27 +0000 (15:12 +0200)]
s390/qeth: switch to napi_gro_receive

Add support for GRO (generic receive offload) in the layer 2
part of device driver qeth. This results in a performance
improvement when GRO and RX are turned on.

Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'bridge-netlink-port-attrs'
David S. Miller [Wed, 7 Oct 2015 11:49:39 +0000 (04:49 -0700)]
Merge branch 'bridge-netlink-port-attrs'

Nikolay Aleksandrov says:

====================
bridge: netlink: complete port attribute support

This is the second set that completes the bridge port's netlink support and
makes everything from sysfs available via netlink. I've used sysfs as a
guide of what and how to set again. I've tested setting/getting every
option and this time also tested with KASAN enabled. Again there are a few long
line warnings about the ifla attribute names in br_port_info_size() but,
as with the previous set, it's good to know what's been accounted for.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobridge: netlink: add support for port's multicast_router attribute
Nikolay Aleksandrov [Tue, 6 Oct 2015 12:12:02 +0000 (14:12 +0200)]
bridge: netlink: add support for port's multicast_router attribute

Add IFLA_BRPORT_MULTICAST_ROUTER to allow setting/getting port's
multicast_router via netlink.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobridge: netlink: allow to flush port's fdb
Nikolay Aleksandrov [Tue, 6 Oct 2015 12:12:01 +0000 (14:12 +0200)]
bridge: netlink: allow to flush port's fdb

Add IFLA_BRPORT_FLUSH to allow flushing port's fdb similar to sysfs's
flush.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobridge: netlink: export port's timer values
Nikolay Aleksandrov [Tue, 6 Oct 2015 12:12:00 +0000 (14:12 +0200)]
bridge: netlink: export port's timer values

Add the following attributes in order to export port's timer values:
IFLA_BRPORT_MESSAGE_AGE_TIMER, IFLA_BRPORT_FORWARD_DELAY_TIMER and
IFLA_BRPORT_HOLD_TIMER.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobridge: netlink: export port's topology_change_ack and config_pending
Nikolay Aleksandrov [Tue, 6 Oct 2015 12:11:59 +0000 (14:11 +0200)]
bridge: netlink: export port's topology_change_ack and config_pending

Add IFLA_BRPORT_TOPOLOGY_CHANGE_ACK and IFLA_BRPORT_CONFIG_PENDING to
allow getting port's topology_change_ack and config_pending respectively
via netlink.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobridge: netlink: export port's id and number
Nikolay Aleksandrov [Tue, 6 Oct 2015 12:11:58 +0000 (14:11 +0200)]
bridge: netlink: export port's id and number

Add IFLA_BRPORT_(ID|NO) to allow getting port's port_id and port_no
respectively via netlink.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobridge: netlink: export port's designated cost and port
Nikolay Aleksandrov [Tue, 6 Oct 2015 12:11:57 +0000 (14:11 +0200)]
bridge: netlink: export port's designated cost and port

Add IFLA_BRPORT_DESIGNATED_(COST|PORT) to allow getting the port's
designated cost and port respectively via netlink.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobridge: netlink: export port's bridge id
Nikolay Aleksandrov [Tue, 6 Oct 2015 12:11:56 +0000 (14:11 +0200)]
bridge: netlink: export port's bridge id

Add IFLA_BRPORT_BRIDGE_ID to allow getting the designated bridge id via
netlink.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobridge: netlink: export port's root id
Nikolay Aleksandrov [Tue, 6 Oct 2015 12:11:55 +0000 (14:11 +0200)]
bridge: netlink: export port's root id

Add IFLA_BRPORT_ROOT_ID to allow getting the designated root id via
netlink.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: Lookup actual route when oif is VRF device
David Ahern [Mon, 5 Oct 2015 17:49:04 +0000 (10:49 -0700)]
net: Lookup actual route when oif is VRF device

If the user specifies a VRF device in a get route query the custom route
pointing to the VRF device is returned:

    $ ip route ls table vrf-red
    unreachable default
    broadcast 10.2.1.0 dev eth1  proto kernel  scope link  src 10.2.1.2
    10.2.1.0/24 dev eth1  proto kernel  scope link  src 10.2.1.2
    local 10.2.1.2 dev eth1  proto kernel  scope host  src 10.2.1.2
    broadcast 10.2.1.255 dev eth1  proto kernel  scope link  src 10.2.1.2

    $ ip route get oif vrf-red 10.2.1.40
    10.2.1.40 dev vrf-red
        cache

Add the flags to skip the custom route and go directly to the FIB. With
this patch the actual route is returned:

    $ ip route get oif vrf-red 10.2.1.40
    10.2.1.40 dev eth1  src 10.2.1.2
        cache

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge tag 'mac80211-next-for-davem-2015-10-05' of git://git.kernel.org/pub/scm/linux...
David S. Miller [Wed, 7 Oct 2015 11:29:18 +0000 (04:29 -0700)]
Merge tag 'mac80211-next-for-davem-2015-10-05' of git://git./linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
For the current cycle, we have the following right now:
 * many internal fixes, API improvements, cleanups, etc.
 * full AP client state tracking in cfg80211/mac80211 from Ayala
 * VHT support (in mac80211) for mesh
 * some A-MSDU in A-MPDU support from Emmanuel
 * show current TX power to userspace (from Rafał)
 * support for netlink dump in vendor commands (myself)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'l3mdev_saddr_op'
David S. Miller [Wed, 7 Oct 2015 11:27:51 +0000 (04:27 -0700)]
Merge branch 'l3mdev_saddr_op'

David Ahern says:

====================
net: Add saddr op to l3mdev and vrf

First 2 patches are re-sends of patches that got lost in the ethosphere
Tuesday; they were part of the first round of l3mdev conversions.
Next 3 handle the source address lookup for raw and datagram sockets
bound to a VRF device.

The conversion to the get_saddr op also fixes locally originated TCP
packets showing up at the VRF device. The use of the FLOWI_FLAG_L3MDEV_SRC
flag in ip_route_connect_init was causing locally generated packets
to skip the VRF device.

v2
- rebased to top of net-next per device delete fix and hash based
  multipath patches
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: Add l3mdev saddr lookup to raw_sendmsg
David Ahern [Mon, 5 Oct 2015 15:51:27 +0000 (08:51 -0700)]
net: Add l3mdev saddr lookup to raw_sendmsg

ping originated on box through a VRF device is showing up in tcpdump
without a source address:
    $ tcpdump -n -i vrf-blue
    08:58:33.311303 IP 0.0.0.0 > 10.2.2.254: ICMP echo request, id 2834, seq 1, length 64
    08:58:33.311562 IP 10.2.2.254 > 10.2.2.2: ICMP echo reply, id 2834, seq 1, length 64

Add the call to l3mdev_get_saddr to raw_sendmsg.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: Add source address lookup op for VRF
David Ahern [Mon, 5 Oct 2015 15:51:26 +0000 (08:51 -0700)]
net: Add source address lookup op for VRF

Add operation to l3mdev to lookup source address for a given flow.
Add support for the operation to VRF driver and convert existing
IPv4 hooks to use the new lookup.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: Refactor path selection in __ip_route_output_key_hash
David Ahern [Mon, 5 Oct 2015 15:51:25 +0000 (08:51 -0700)]
net: Refactor path selection in __ip_route_output_key_hash

VRF device needs the same path selection following lookup to set source
address. Rather than duplicating code, move existing code into a
function that is exported to modules.

Code move only; no functional change.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: Add netif_is_l3_slave
David Ahern [Mon, 5 Oct 2015 15:51:24 +0000 (08:51 -0700)]
net: Add netif_is_l3_slave

IPv6 addrconf keys off of IFF_SLAVE, so it cannot be used for L3 slaves.
Add a new private flag and add netif_is_l3_slave function for checking
it.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: Rename FLOWI_FLAG_VRFSRC to FLOWI_FLAG_L3MDEV_SRC
David Ahern [Mon, 5 Oct 2015 15:51:23 +0000 (08:51 -0700)]
net: Rename FLOWI_FLAG_VRFSRC to FLOWI_FLAG_L3MDEV_SRC

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: Fix vti use case with oif in dst lookups for IPv6
David Ahern [Mon, 5 Oct 2015 14:32:51 +0000 (08:32 -0600)]
net: Fix vti use case with oif in dst lookups for IPv6

It occurred to me yesterday that 741a11d9e4103 ("net: ipv6: Add
RT6_LOOKUP_F_IFACE flag if oif is set") means that xfrm6_dst_lookup
needs the FLOWI_FLAG_SKIP_NH_OIF flag set. This latest commit causes
the oif to be considered in lookups which is known to break vti. This
explains why 58189ca7b274 did not include the IPv6 change at the time it was
submitted.

Fixes: 42a7b32b73d6 ("xfrm: Add oif to dst lookups")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agogianfar: Add WAKE_UCAST and "wake-on-filer" support
Claudiu Manoil [Mon, 5 Oct 2015 14:19:59 +0000 (17:19 +0300)]
gianfar: Add WAKE_UCAST and "wake-on-filer" support

This enables eTSEC's filer (Rx parser) and the FGPI Rx
interrupt (Filer General Purpose Interrupt) as a wakeup
source event.

Upon entering suspend state, the eTSEC filer is given
a rule to match incoming L2 unicast packets.  A packet
matching the rule will be enqueued in the Rx ring and
a FGPI Rx interrupt will be asserted by the filer to
wakeup the system.  Other packet types will be dropped.
On resume the filer table is restored to the content
before entering suspend state.
The set of rules from gfar_filer_config_wol() could be
extended to implement other WoL capabilities as well.

The "fsl,wake-on-filer" DT binding enables this capability
on certain platforms that feature the necessary power
management infrastructure, targeting mainly printing and
imaging applications.
(refer to Power Management section of the SoC Ref Man)

Cc: Li Yang <leoli@freescale.com>
Cc: Zhao Chenhui <chenhui.zhao@freescale.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agopowerpc: dts: p1022si: Add fsl,wake-on-filer for eTSEC
Claudiu Manoil [Mon, 5 Oct 2015 14:19:58 +0000 (17:19 +0300)]
powerpc: dts: p1022si: Add fsl,wake-on-filer for eTSEC

Enable the "wake-on-filer" (aka. wake on user defined packet)
wake on lan capability for the eTSEC ethernet nodes.

Cc: Li Yang <leoli@freescale.com>
Cc: Zhao Chenhui <chenhui.zhao@freescale.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agodoc: dt: net: Add fsl,wake-on-filer for eTSEC
Claudiu Manoil [Mon, 5 Oct 2015 14:19:57 +0000 (17:19 +0300)]
doc: dt: net: Add fsl,wake-on-filer for eTSEC

Add the "fsl,wake-on-filer" property for eTSEC nodes to
indicate that the system has the power management
infrastructure needed to be able to wake up the system
via FGPI (filer, aka. h/w rx parser) interrupt.

Cc: Li Yang <leoli@freescale.com>
Cc: Zhao Chenhui <chenhui.zhao@freescale.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'ovs-ipv6-tunnel'
David S. Miller [Wed, 7 Oct 2015 11:18:04 +0000 (04:18 -0700)]
Merge branch 'ovs-ipv6-tunnel'

Jiri Benc says:

====================
openvswitch: add IPv6 tunneling support

This builds on the previous work that added IPv6 support to lwtunnels and
adds IPv6 tunneling support to ovs.

To use IPv6 tunneling, there needs to be a metadata based tunnel net_device
created and added to the ovs bridge. Currently, only vxlan is supported by
the kernel, with geneve to follow shortly. There is neither need nor intent to
add support for this to the vport-vxlan (etc.) compat layer.

v3: dropped the last two patches added in v2.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoopenvswitch: netlink attributes for IPv6 tunneling
Jiri Benc [Mon, 5 Oct 2015 11:09:47 +0000 (13:09 +0200)]
openvswitch: netlink attributes for IPv6 tunneling

Add netlink attributes for IPv6 tunnel addresses. This enables IPv6 support
for tunnels.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoopenvswitch: add tunnel protocol to sw_flow_key
Jiri Benc [Mon, 5 Oct 2015 11:09:46 +0000 (13:09 +0200)]
openvswitch: add tunnel protocol to sw_flow_key

Store tunnel protocol (AF_INET or AF_INET6) in sw_flow_key. This field now
also acts as an indicator whether the flow contains tunnel data (this was
previously indicated by tun_key.u.ipv4.dst being set, but with IPv6 addresses
in a union with IPv4 ones this no longer works).

The new field was added to a hole in sw_flow_key.
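
A simplified sketch of why a separate protocol field is needed once the addresses share a union. The struct and function names below are hypothetical and the layout is reduced to the essentials; it is not the real sw_flow_key.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/socket.h>   /* AF_INET, AF_INET6 */

/* Simplified tunnel key: IPv4 and IPv6 addresses share storage. */
struct tun_key_sketch {
    union {
        struct { uint32_t src, dst; } ipv4;
        struct { uint8_t src[16], dst[16]; } ipv6;
    } u;
};

/* Simplified flow key: tun_proto is 0 (no tunnel data), AF_INET or
 * AF_INET6. */
struct flow_key_sketch {
    uint8_t tun_proto;
    struct tun_key_sketch tun_key;
};

/* Tunnel presence is decided by the protocol field alone. */
static bool flow_has_tunnel(const struct flow_key_sketch *key)
{
    return key->tun_proto != 0;
}
```

With an IPv6 tunnel whose destination is set, the bytes that alias u.ipv4.dst can still be zero, so testing u.ipv4.dst would wrongly report "no tunnel", while the tun_proto check does not depend on the address layout.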

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobridge: netlink: make br_fill_info's frame size smaller
Nikolay Aleksandrov [Mon, 5 Oct 2015 10:11:21 +0000 (12:11 +0200)]
bridge: netlink: make br_fill_info's frame size smaller

When KASAN is enabled the frame size grows > 2048 bytes and we get a
warning, so make it smaller.
net/bridge/br_netlink.c: In function 'br_fill_info':
>> net/bridge/br_netlink.c:1110:1: warning: the frame size of 2160 bytes
>> is larger than 2048 bytes [-Wframe-larger-than=]

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: Add support for filtering neigh dump by device index
David Ahern [Sat, 3 Oct 2015 18:43:46 +0000 (11:43 -0700)]
net: Add support for filtering neigh dump by device index

Add support for filtering neighbor dumps by device by adding the
NDA_IFINDEX attribute to the dump request.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next...
David S. Miller [Wed, 7 Oct 2015 10:01:53 +0000 (03:01 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2015-10-03

This series contains updates to i40e and i40evf, some of which are to
resolve more Red Hat bugzilla issues.

Jiang Liu updates the i40e and i40evf drivers to use numa_mem_id()
instead of numa_node_id() to get the nearest node with memory which
better supports memoryless nodes.

Anjali fixes an issue from Dan Carpenter <dan.carpenter@oracle.com>,
to resolve a memory leak in X722 RSS configuration path, where we should
free the memory allocated before exiting.

Shannon modifies the drivers to ensure we have the spinlocks before we
clear the ARQ and ASQ management registers.  In addition, we widen the
locked portion and insert a sanity check to ensure we are working with safe
register values.

Mitch fixes an issue where under certain circumstances, we can get an
extra VF_RESOURCES message from the PF driver at runtime.  When this
occurs, we need to parse it because our VSI may have changed and that
will affect the relationship with the PF driver.  But this parsing also
blows away our current MAC address, so resolve the issue by restoring
the current MAC address from the netdev struct after we parse the
resource message.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: dsa: better error reporting
Russell King [Sat, 3 Oct 2015 17:09:07 +0000 (18:09 +0100)]
net: dsa: better error reporting

Add additional error reporting to the generic DSA code, so it's easier
to debug when things go wrong.  This was useful when initially bringing
up 88e6176 on a new board.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agonet: dsa: mv88e6xxx: remove link polling
Russell King [Sat, 3 Oct 2015 17:09:01 +0000 (18:09 +0100)]
net: dsa: mv88e6xxx: remove link polling

The link status is polled by the generic phy layer, there's no need to
duplicate that polling with additional polling.  This additional polling
adds additional MDIO traffic, and races with the generic phy layer,
resulting in missing or duplicated link status messages.

Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoregmap: Allow installing custom reg_update_bits function
Jon Ringle [Thu, 1 Oct 2015 16:38:07 +0000 (12:38 -0400)]
regmap: Allow installing custom reg_update_bits function

This commit allows installing a custom reg_update_bits function for cases where
the hardware provides a mechanism to set or clear register bits without a
read/modify/write cycle. Such is the case with the Microchip ENCX24J600.

If a custom reg_update_bits function is provided, it will only be used against
volatile registers.
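
The difference between the two update strategies can be sketched without the regmap API. Everything below is hypothetical (struct fake_dev, the bus accessors and both update functions), and the sketch does not model register caching or the volatile-register restriction.

```c
#include <stdint.h>

/* Hypothetical device with one 16-bit register and bus-access counters. */
struct fake_dev {
    uint16_t reg;
    unsigned reads;
    unsigned writes;
};

static uint16_t dev_read(struct fake_dev *d)
{
    d->reads++;
    return d->reg;
}

static void dev_write(struct fake_dev *d, uint16_t v)
{
    d->writes++;
    d->reg = v;
}

/* Generic update: read, modify, write back. */
static void update_bits_rmw(struct fake_dev *d, uint16_t mask, uint16_t val)
{
    uint16_t tmp = dev_read(d);

    tmp = (uint16_t)((tmp & ~mask) | (val & mask));
    dev_write(d, tmp);
}

/* Modelled hardware commands that set or clear bits without a read. */
static void dev_set_bits(struct fake_dev *d, uint16_t bits)
{
    d->writes++;
    d->reg |= bits;
}

static void dev_clear_bits(struct fake_dev *d, uint16_t bits)
{
    d->writes++;
    d->reg &= (uint16_t)~bits;
}

/* Custom update built on the set/clear commands: no bus read needed. */
static void update_bits_hw(struct fake_dev *d, uint16_t mask, uint16_t val)
{
    uint16_t set = (uint16_t)(val & mask);
    uint16_t clear = (uint16_t)(~val & mask);

    if (clear)
        dev_clear_bits(d, clear);
    if (set)
        dev_set_bits(d, set);
}
```

Both functions leave the register with the same contents; the second one never reads the register back over the bus.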

Signed-off-by: Jon Ringle <jringle@gridpoint.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
9 years agoRevert "regmap: Allow installing custom reg_update_bits function"
David S. Miller [Tue, 6 Oct 2015 13:25:43 +0000 (06:25 -0700)]
Revert "regmap: Allow installing custom reg_update_bits function"

This reverts commit 7741c373cf3ea1f5383fa97fb7a640a429d3dd7c.

9 years agoRevert "net: Microchip encx24j600 driver"
David S. Miller [Tue, 6 Oct 2015 13:25:36 +0000 (06:25 -0700)]
Revert "net: Microchip encx24j600 driver"

This reverts commit 04fbfce7a222327b97ca165294ef19f0faa45960.

9 years agoRevert "net: encx24j600_exit() can be static"
David S. Miller [Tue, 6 Oct 2015 13:25:29 +0000 (06:25 -0700)]
Revert "net: encx24j600_exit() can be static"

This reverts commit 9886ce2b9d4e5a8bb3d78d0f7eff3c0f1ed58d67.

9 years agoipv4: Fix compilation errors in fib_rebalance
Peter Nørlund [Tue, 6 Oct 2015 05:24:47 +0000 (07:24 +0200)]
ipv4: Fix compilation errors in fib_rebalance

This fixes

net/built-in.o: In function `fib_rebalance':
fib_semantics.c:(.text+0x9df14): undefined reference to `__divdi3'

and

net/built-in.o: In function `fib_rebalance':
net/ipv4/fib_semantics.c:572: undefined reference to `__aeabi_ldivmod'

Fixes: 0e884c78ee19 ("ipv4: L3 hash-based multipath")
Signed-off-by: Peter Nørlund <pch@ordbogen.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoRDS: IB: split mr pool to improve 8K messages performance
Santosh Shilimkar [Fri, 11 Sep 2015 04:20:57 +0000 (21:20 -0700)]
RDS: IB: split mr pool to improve 8K messages performance

8K messages are an important use case for current RDS workloads,
so we make provision to have 8K mrs available from the pool.
Based on the number of SGs in the RDS message, we pick a pool to use.

Also, to make sure that we don't underutilise mrs when, say, 8k messages
are dominating, which could lead to the 8k pool being exhausted, we fall back
to the 1m pool until the 8k pool recovers for use.

This helps to at least push ~55 kB/s bidirectional data which
is a nice improvement.
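
The selection and fallback logic can be sketched as follows. The names (struct mr_pool_sketch, pick_pool) and the threshold SMALL_MSG_MAX_SG are assumptions for illustration; the value 2 simply assumes an 8K message maps to two 4K pages.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed threshold: scatter/gather entries needed for an 8K message. */
#define SMALL_MSG_MAX_SG 2

/* Hypothetical pool descriptor: only the count of free mrs matters here. */
struct mr_pool_sketch {
    unsigned free_mrs;
};

/* Pick the 8k pool for small messages while it has free mrs; otherwise
 * (large message, or 8k pool exhausted) use the 1m pool. */
static struct mr_pool_sketch *pick_pool(struct mr_pool_sketch *pool_8k,
                                        struct mr_pool_sketch *pool_1m,
                                        unsigned nr_sg)
{
    assert(pool_8k != NULL && pool_1m != NULL);
    if (nr_sg <= SMALL_MSG_MAX_SG && pool_8k->free_mrs > 0)
        return pool_8k;
    return pool_1m;
}
```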

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
9 years agoRDS: IB: use max_mr from HCA caps than max_fmr
Santosh Shilimkar [Sat, 19 Sep 2015 17:06:08 +0000 (13:06 -0400)]
RDS: IB: use max_mr from HCA caps than max_fmr

All HCA drivers seem to populate max_mr caps and a few of
them do both max_mr and max_fmr.

Hence update RDS code to make use of max_mr.

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
9 years agoRDS: IB: mark rds_ib_fmr_wq static
Santosh Shilimkar [Sat, 19 Sep 2015 21:21:22 +0000 (17:21 -0400)]
RDS: IB: mark rds_ib_fmr_wq static

Fix below warning by marking rds_ib_fmr_wq static

net/rds/ib_rdma.c:87:25: warning: symbol 'rds_ib_fmr_wq' was not declared. Should it be static?

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
9 years agoRDS: IB: use already available pool handle from ibmr
Santosh Shilimkar [Wed, 16 Sep 2015 01:20:35 +0000 (18:20 -0700)]
RDS: IB: use already available pool handle from ibmr

rds_ib_mr already keeps the pool handle with which it is associated.
Let's use that instead of the roundabout way of fetching
it from rds_ib_device.

No functional change.

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
9 years agoRDS: IB: fix the rds_ib_fmr_wq kick call
Santosh Shilimkar [Mon, 14 Sep 2015 05:34:37 +0000 (22:34 -0700)]
RDS: IB: fix the rds_ib_fmr_wq kick call

RDS IB mr pool has its own workqueue 'rds_ib_fmr_wq', so we need
to use queue_delayed_work() to kick the work. This was hurting
performance, since pool maintenance was triggered less often
from other paths.

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
9 years agoRDS: IB: handle rds_ibdev release case instead of crashing the kernel
Santosh Shilimkar [Sat, 19 Sep 2015 18:01:09 +0000 (14:01 -0400)]
RDS: IB: handle rds_ibdev release case instead of crashing the kernel

Just in case we are still handling the QP receive completion while the
rds_ibdev is released, drop the connection instead of crashing the kernel.

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
9 years agoRDS: IB: split send completion handling and do batch ack
Santosh Shilimkar [Sun, 6 Sep 2015 06:18:51 +0000 (02:18 -0400)]
RDS: IB: split send completion handling and do batch ack

Similar to what we did with receive CQ completion handling, we split
the transmit completion handler so that it lets us implement batched
work completion handling.

We re-use the cq_poll routine and make use of RDS_IB_SEND_OP to
identify the send vs receive completion event handler invocation.

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>