git.openwrt.org Git - openwrt/staging/blogic.git/log

tcp: switch rtt estimations to usec resolution

Upcoming congestion controls for TCP require usec resolution for RTT
estimations. Millisecond resolution is simply not enough these days.

FQ/pacing in DC environments also require this change for finer control
and removal of bimodal behavior due to the current hack in
tcp_update_pacing_rate() for 'small rtt'

TCP_CONG_RTT_STAMP is no longer needed.

As Julian Anastasov pointed out, we need to keep user compatibility :
tcp_metrics used to export RTT and RTTVAR in msec resolution,
so we added RTT_US and RTTVAR_US. An iproute2 patch is needed
to use the new attributes if provided by the kernel.

In this example ss command displays a srtt of 32 usecs (10Gbit link)

lpk51:~# ./ss -i dst lpk52
Netid  State      Recv-Q Send-Q   Local Address:Port       Peer
Address:Port
tcp    ESTAB      0      1         10.246.11.51:42959
10.246.11.52:64614
         cubic wscale:6,6 rto:201 rtt:0.032/0.001 ato:40 mss:1448
cwnd:10 send
3620.0Mbps pacing_rate 7240.0Mbps unacked:1 rcv_rtt:993 rcv_space:29559

Updated iproute2 ip command displays :

lpk51:~# ./ip tcp_metrics | grep 10.246.11.52
10.246.11.52 age 561.914sec cwnd 10 rtt 274us rttvar 213us source
10.246.11.51

Old binary displays :

lpk51:~# ip tcp_metrics | grep 10.246.11.52
10.246.11.52 age 561.914sec cwnd 10 rtt 250us rttvar 125us source
10.246.11.51

With help from Julian Anastasov, Stephen Hemminger and Yuchung Cheng

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Larry Brakmo <brakmo@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add skb_mstamp infrastructure

ktime_get() is too expensive on some cases, and we'd like to get
usec resolution timestamps in TCP stack.

This patch adds a light weight facility using a combination of
local_clock() and jiffies samples.

Instead of :

        u64 t0, t1;

        t0 = ktime_get();
        // stuff
        t1 = ktime_get();
        delta_us = ktime_us_delta(t1, t0);

use :
        struct skb_mstamp t0, t1;

        skb_mstamp_get(&t0);
        // stuff
        skb_mstamp_get(&t1);
        delta_us = skb_mstamp_us_delta(&t1, &t0);

Note : local_clock() might have a (bounded) drift between cpus.

Do not use this infra in place of ktime_get() without understanding the
issues.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Larry Brakmo <brakmo@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>

phy: micrel: add of configuration for LED mode

Add support for the led-mode property for the following PHYs
which have a single LED mode configuration value.

KSZ8001 and KSZ8041 which both use register 0x1e bits 15,14 and
KSZ8021, KSZ8031 and KSZ8051 which use register 0x1f bits 5,4
to control the LED configuration.

Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

isdn: fix multiple sleep_on races

The isdn core code uses a couple of wait queues with
interruptible_sleep_on, which is racy and about to get
removed from the kernel. Fortunately, we know for each case
what we are waiting for, so they can all be converted to
the better wait_event_interruptible interface.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Karsten Keil <isdn@linux-pingi.de>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

isdn: divert, hysdn: fix interruptible_sleep_on race

These two drivers use identical code for their procfs status
file handling, which contains a small race against status
data becoming available while reading the file.

This uses wait_event_interruptible instead to fix this
particular race and eventually get rid of all sleep_on
instances. There seems to be another race involving
multiple concurrent readers of the same procfs file, which
I don't try to fix here.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Karsten Keil <isdn@linux-pingi.de>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

isdn: hisax/elsa: fix sleep_on race in elsa FSM

The state machine code in the elsa driver uses interruptible_sleep_on
to wait for state changes, which is racy. A closer look at the possible
states reveals that it is always used to wait for getting back into
ARCOFI_NOP, so we can use wait_event_interruptible instead.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Karsten Keil <isdn@linux-pingi.de>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

isdn: pcbit: fix interruptible_sleep_on race

interruptible_sleep_on is racy and going away. In case of pcbit,
the driver would run into a timeout if the card is initialized
before we start waiting for it. This uses wait_event to fix the
race. In order to do this, the state machine handling for the
timeout case has to get trivially reorganized so we actually know
whether the timeout has occorred or not.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Karsten Keil <isdn@linux-pingi.de>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

atm: firestream: fix interruptible_sleep_on race

interruptible_sleep_on is racy and going away. This replaces the one use
in the firestream driver with the appropriate wait_event_interruptible
variant.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Chas Williams <chas@cmf.nrl.navy.mil>
Cc: linux-atm-general@lists.sourceforge.net
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'intel-next'

Aaron Brown says:

====================
Intel Wired LAN Driver Updates

This series contains updates to ixgbe, igb and documentation.  The
first four have been sent up as part of other series where 1 or more
in the series were rejected and either dropped or still being worked
on for reasons unrelated to these patches.

Don makes recovery from a HW ECC error just schedule a reset as it turns
out the previous behaviour of forcing the user to reload is not necessary.
Mark adds WoL support to port 0 of a new device.  Jacob removes a magic
number from the ptp_caps.name and updates the SubmittingPatches
documentation with details on the Fixed: tag.  And Carolyn updates igb
files to remove the FSF physical mail address.

[ DaveM Note: SubmittingPatches change omitted, will go via LKML ]
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

igb: Update license text to remove FSF address and update copyright.

This patch updates the license text to remove address of Free Software
Foundation and refer users to www.gnu.org instead. This patch also updates
the copyright dates in appropriate igb driver files.

Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

igb: make local functions static and remove dead code

Based on Stephen Hemminger's original patch.
Make local functions static, and remove unused functions.

Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ixgbe: Add WoL support for a new device

Add WoL support for port 0 of a new 82599-based device.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ixgbe: don't use magic size number to assign ptp_caps.name

Rather than using a magic size number, just use sizeof since that will
work and is more robust to future changes.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ixgbe: modify behavior on receiving a HW ECC error.

Currently when we noticed a HW ECC error we would request the use reload
the driver to force a reset of the part. This was done due to the mistaken
believe that a normal reset would not be sufficient. Well it turns out it
would be so now we just schedule a reset upon seeing the ECC.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: yet another new IPV6_MTU_DISCOVER option IPV6_PMTUDISC_OMIT

This option has the same semantic as IP_PMTUDISC_OMIT for IPv4 which
got recently introduced. It doesn't honor the path mtu discovered by the
host but in contrary to IPV6_PMTUDISC_INTERFACE allows the generation of
fragments if the packet size exceeds the MTU of the outgoing interface
MTU.

Fixes: 93b36cf3425b9b ("ipv6: support IPV6_PMTU_INTERFACE on sockets")
Cc: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: yet another new IP_MTU_DISCOVER option IP_PMTUDISC_OMIT

IP_PMTUDISC_INTERFACE has a design error: because it does not allow the
generation of fragments if the interface mtu is exceeded, it is very
hard to make use of this option in already deployed name server software
for which I introduced this option.

This patch adds yet another new IP_MTU_DISCOVER option to not honor any
path mtu information and not accepting new icmp notifications destined for
the socket this option is enabled on. But we allow outgoing fragmentation
in case the packet size exceeds the outgoing interface mtu.

As such this new option can be used as a drop-in replacement for
IP_PMTUDISC_DONT, which is currently in use by most name server software
making the adoption of this option very smooth and easy.

The original advantage of IP_PMTUDISC_INTERFACE is still maintained:
ignoring incoming path MTU updates and not honoring discovered path MTUs
in the output path.

Fixes: 482fc6094afad5 ("ipv4: introduce new IP_MTU_DISCOVER mode IP_PMTUDISC_INTERFACE")
Cc: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: use ip_skb_dst_mtu to determine mtu in ip_fragment

ip_skb_dst_mtu mostly falls back to ip_dst_mtu_maybe_forward if no socket
is attached to the skb (in case of forwarding) or determines the mtu like
we do in ip_finish_output, which actually checks if we should branch to
ip_fragment. Thus use the same function to determine the mtu here, too.

This is important for the introduction of IP_PMTUDISC_OMIT, where we
want the packets getting cut in pieces of the size of the outgoing
interface mtu. IPv6 already does this correctly.

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

neigh: probe application via netlink in NUD_PROBE

iproute2 arpd seems to expect this as there's code and comments
to handle netlink probes with NUD_PROBE set. It is used to flush
the arpd cached mappings.

opennhrp instead turns off unicast probes (so it can handle all
neighbour discovery). Without this change it will not see NUD_PROBE
probes and cannot reconfirm the mapping. Thus currently neigh entry
will just fail and can cause few packets dropped until broadcast
discovery is restarted.

Earlier discussion on the subject:
http://marc.info/?t=139305877100001&r=1&w=2

Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

ieee802154: fix new function declaration

The commit 8fad346f366a7 ("eee802154: add basic support for RF212 to
at86rf230 driver") introduced the new function is_rf212() with some
minor issues in declaration:

1) Fix the function type by changing it to bool as the function
   definition returns a boolean value. Additionally both callers of
   is_rf212() are expected to return a boolean value.

2) Fix the function specifier by deleting the inline keyword as the
   compiler takes care of that.

Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Cc: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: log src and dst along with "udp checksum is 0"

These info messages are rather pointless without any means to identify
the source of the bogus packets. Logging the src and dst addresses and
ports may help a bit.

Cc: Joe Perches <joe@perches.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: remove unused port variable in vxlan_udp_encap_recv()

Signed-off-by: Pablo Neira Ayuso <pablo@gnumonks.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlx4'

Amir Vadai says:

====================
net, net/mlx4: Add sysfs file for port number

Modern distro's are using biosdevname to rename interface to a name based on
slot/port number.
biosdevname can't get the port number of devices that have multiple ports that
share the same PCI function.
This patch adds a sysfs file under: /sys/devices/.../net/<interface>/dev_port,
that contains the port number (0 based) - to be used by biosdevname.
Also, dev_id was wrongly used in mlx4_en driver - added a patch that fix it.

This patch was tested and applied over commit 51adfcc "net: bcmgenet: remove
unused bh_lock member"
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: Fix bad use of dev_id

dev_id should be set for multiple netdev's sharing the same MAC, which
is not the case here.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: Expose port number through sysfs

Initialize dev_port with port number (0 based) to be accessed through
sysfs from user space.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Add sysfs file for port number

Add a sysfs file to enable user space to query the device
port number used by a netdevice instance. This is needed for
devices that have multiple ports on the same PCI function.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bnx2x'

Michal Schmidt says:

====================
bnx2x: minimize RAM usage in kdump

kdump kernels usually have only a small amount of memory reserved.
bnx2x can be memory-hungry. Let's minimize its memory usage when
running in kdump.

I detect kdump by looking at the "reset_devices" flag. A couple of
storage drivers (cciss, hpsa) use it for the same purpose. I am not sure
this is the best way to solve the problem, but it works.

Should it be made more generic by, say, looking at the total amount
of lowmem instead? Not using TPA by default when lowmem is small and/or
defaulting to fewer queues would help 32bit systems where a driver for
a multi-function multi-queue NIC can consume a significant amount
of available memory. Or do we want no such heuristics?

Is this something to consider doing for other network drivers too?
====================

Acked-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: save RAM in kdump kernel by disabling TPA

When running in a kdump kernel, disable TPA. This saves memory, which
tends to be scarce in kdump.

TPA, being a receive acceleration, is unlikely to be useful for kdump,
whose purpose is to send the memory image out.

This saves additional 5 MB in the kdump environment.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: save RAM in kdump kernel by using a single queue

When running in a kdump kernel, make sure to use only a single ethernet
queue even if a num_queues option in /etc/modprobe.d/*.conf would specify
otherwise. This saves memory, which tends to be scarce in kdump.

This saves about 40 MB in the kdump environment on a setup with
num_queues=8 in the config file.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: clamp num_queues to prevent passing a negative value

Use the clamp() macro to make the calculation of the number of queues
slightly easier to understand. It also avoids a crash when someone
accidentally passes a negative value in num_queues= module parameter.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: tcp: add mib counters to track zero window transitions

Three counters are added:
- one to track when we went from non-zero to zero window
- one to track the reverse
- one counter incremented when we want to announce zero window,
but can't because we would shrink current window.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: order MPLS ethertypes numerically

All ethertypes other than ETH_P_MPLS_UC, ETH_P_MPLS_MC and
ETH_P_ATMMPOA were already ordered numerically. This commit moves
those three ETH_P_... values into correct numerical order too.

Signed-off-by: Neil Jerram <Neil.Jerram@metaswitch.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Remove hidden flow control goto from BNX2X_ALLOC macros

BNX2X_ALLOC macros use "goto alloc_mem_err"
so these labels appear unused in some functions.

Expand these macros in-place via coccinelle and
some typing.

Update the macros to use statement expressions
and remove the BNX2X_ALLOC macro.

This adds some > 80 char lines.

$ cat bnx2x_pci_alloc.cocci
@@
expression e1;
expression e2;
expression e3;
@@
- BNX2X_PCI_ALLOC(e1, e2, e3);
+ e1 = BNX2X_PCI_ALLOC(e2, e3); if (!e1) goto alloc_mem_err;

@@
expression e1;
expression e2;
expression e3;
@@
- BNX2X_PCI_FALLOC(e1, e2, e3);
+ e1 = BNX2X_PCI_FALLOC(e2, e3); if (!e1) goto alloc_mem_err;

@@
expression e1;
expression e2;
@@
- BNX2X_ALLOC(e1, e2);
+ e1 = kzalloc(e2, GFP_KERNEL); if (!e1) goto alloc_mem_err;

@@
expression e1;
expression e2;
expression e3;
@@
- kzalloc(sizeof(e1) * e2, e3)
+ kcalloc(e2, sizeof(e1), e3)

@@
expression e1;
expression e2;
expression e3;
@@
- kzalloc(e1 * sizeof(e2), e3)
+ kcalloc(e1, sizeof(e2), e3)

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bcmgenet: remove unused bh_lock member

bh_lock spinlock is unused, remove it from the private driver structure.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bcmgenet: remove commented code in bcmgenet_xmit()

This code is commented since it is unused, left-over from the very first
time this driver was merged.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bcmgenet: drop checks on priv->phydev

Drop all the checks on priv->phydev since we will refuse probing the
driver if we cannot attach to a PHY device. Drop all checks on
priv->phydev. This also fixes some smatch issues reported by Dan
Carpenter.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'gianfar'

Claudiu Manoil says:

====================
gianfar: Device reset and reconfig fixes

These patches end up fixing some notable device reset & reconfig
related problems. One issue is on-the-fly (Rx/Tx on) programming
of interrupt coalescing (IC) registers on the processing path,
against HW recommendation. This is an old issue that became visible
after BQL introduction, as under certain conditions (low traffic)
one TX interrupt gets lost and BQL fires Tx timeout as a result.
Another notable issue is a race on the Tx path (xmit, clean_tx)
during device reset (i.e. during Tx timeout watchdog firing)
that leads to NULL access.
Fixing the problematic on-thy-fly register writes (i.e. the IC regs)
required the implementation of a MAC soft reset procedure.
The race leading to NULL access was addressed by fixing the
stop_gfar()/startup_gfar() pair (disable/enable napi a.s.o.)
and adding the device state DOWN to sync with the TX path.

v2: Refactored if() clauses from gfar_set_features(), PATCH 2.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

gianfar: Fix Tx int miss, dont write IC on-the-fly

Programming the interrupt coalescing (IC) registers while
the controller/DMA is on may incur the loss of one Tx
confirmation interrupt, under certain conditions.  This is
a subtle hw race because it does not occur during a burst
of Tx packets.  It has been observed on p2020 devices that,
if just one packet is being xmit'ed, the Tx confirmation
doesn't trigger and BQL evetually blocks the Tx queues,
followed by Tx timeout and an un-responsive device.
This issue was not apparent prior to introducing BQL
support, as a late Tx confirmation was not an issue back then
and the next burst of Tx frames would have triggered the
Tx confirmation/ Tx ring cleanup anyway.

Bottom line, the hw specifications state that the IC registers
should not be programmed while the Rx/Tx blocks (the DMA) are
enabled. Further more, these registers are currently re-written
with the same values on the processing path, over and over again.
To fix this, rewriting the IC registers has been removed from
the processing path (napi poll).  A complete MAC reset procedure
has been implemented for the ethtool -c option instead, to
reliably update these registers while the controller is stopped.

Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

gianfar: Fix device reset races (oops) for Tx

The device reset procedure, stop_gfar()/startup_gfar(), has
concurrency issues.
"Kernel access of bad area" oopses show up during Tx timeout
device reset or other reset cases (like changing MTU) that
happen while the interface still has traffic. The oopses
happen in start_xmit and clean_tx_ring when accessing tx_queue->
tx_skbuff which is NULL. The race comes from de-allocating the
tx_skbuff while transmission and napi processing are still
active. Though the Tx queues get temoprarily stopped when Tx
timeout occurs, they get re-enabled as a result of Tx congestion
handling inside the napi context (see clean_tx_ring()). Not
disabling the napi during reset is also a bug, because
clean_tx_ring() will try to access tx_skbuff while it is being
de-alloc'ed and re-alloc'ed.

To fix this, stop_gfar() needs to disable napi processing
after stopping the Tx queues. However, in order to prevent
clean_tx_ring() to re-enable the Tx queue before the napi
gets disabled, the device state DOWN has been introduced.
It prevents the Tx congestion management from re-enabling the
de-congested Tx queue while the device is brought down.
An additional locking state, RESETTING, has been introduced
to prevent simultaneous resets or to prevent configuring the
device while it is resetting.
The bogus 'rxlock's (for each Rx queue) have been removed since
their purpose is not justified, as they don't prevent nor are
suited to prevent device reset/reconfig races (such as this one).

Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

gianfar: Don't free/request irqs on device reset

Resetting the device (stop_gfar()/startup_gfar()) should
be fast and to the point, in order to timely recover
from an error condition (like Tx timeout) or during
device reconfig. The irq free/ request routines are just
redundant here, and they should be part of the device
close/ open routines instead.

Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

gianfar: Fix on-the-fly vlan and mtu updates

The RCTRL and TCTRL registers should not be changed
on-the-fly, while the controller is running, otherwise
unexpected behaviour occurs.  But that's exactly what
gfar_vlan_mode() does, updating the VLAN acceleration
bits inside RCTRL/TCTRL.  The attempt to lock these
operations doesn't help, but only adds to the confusion.
There's also a dependency for Rx FCB insertion (activating
/de-activating the TOE offload block on Rx) which might
change the required rx buffer size.  This makes matters
worse as gfar_vlan_mode() ends up calling gfar_change_mtu(),
though the MTU size remains the same.  Note that there are
other situations that may affect the required rx buffer size,
like changing RXCSUM or rx hw timestamping, but errorneously
the rx buffer size is not recomputed/ updated in the process.

To fix this, do the vlan updates properly inside the MAC
reset and reconfiguration procedure, which takes care of
the rx buffer size dependecy and the rx TOE block (PRSDEP)
activation/deactivation as well (in the correct order).
As a consequence, MTU/ rx buff size updates are done now
by the same MAC reset and reconfig procedure, so that out
of context updates to MAXFRM, MRBLR, and MACCFG inside
change_mtu() are no longer needed.  The rx buffer size
dependecy to Rx FCB is now handled for the other cases too
(RXCSUM and rx hw timestamping).

Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

gianfar: Implement MAC reset and reconfig procedure

The main MAC config registers like: RCTRL/TCTRL, MRBLR,
MAXFRM, RXIC/TXIC, most fields of MACCFG1/2, should not
be changed on-the-fly, but at least after stopping the
DMA and disabling the Rx/Tx blocks and, for increased
reliability, after a MAC soft reset.

Impelement a complete MAC soft reset and reconfig procedure
following the latest HW advisories - gfar_mac_reset() - to
replace gfar_mac_init() and (the confusing) init_registers()
functions.

Factor out separate config functions for RCTRL and TCTRL,
insure programming order of the relevant config regs after
MAC soft reset.

Split gfar_hw_init() into gfar_mac_reset() and the remaining
global regs that don't need to be reconfigured after MAC soft
reset (FIFOCFG, ATTRELI, HW counters a.s.o).

As gfar_hw_init() now makes all the register writes @probe()
time, based on all the device flags and config options, it
must be moved further down, just before register_netdev(),
as the last config step when the config values are comitted
to HW. Also, move netif_carrier_off() after register_netdev(),
because it has no effect if called before.

Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bcmgenet: Deleted unnecessary select_queue method.

Signed-off-by: David S. Miller <davem@davemloft.net>

net: bcmgenet: Use devm_ioremap_resource()

According to Documentation/driver-model/devres.txt, devm_request_and_ioremap()
is deprecated, so use devm_ioremap_resource() instead.

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: netfilter: Use ether_addr_copy

Convert the uses of memcpy to ether_addr_copy because
for some architectures it is smaller and faster.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: Use ether_addr_copy and ETH_ALEN

Convert the more obvious uses of memcpy to ether_addr_copy.

There are still uses of memcpy that could be converted but
these addresses are __aligned(2).

Convert a couple uses of 6 in gr_private.h to ETH_ALEN.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cgxb4: Stop using ethtool SPEED_* constants

ethtool speed values are just numbers of megabits and there is no need
to add SPEED_40000. To be consistent, use integer constants directly
for all speeds.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

tools: bpf_dbg: various misc code cleanups

Lets clean up bpf_dbg a bit and improve its code slightly
in various areas: i) Get rid of some macros as there's no
good reason for keeping them, ii) remove one unused variable
and reduce scope of various variables found by cppcheck,
iii) Close non-default file descriptors when exiting the shell.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

loopback: sctp: add NETIF_F_SCTP_CSUM to device features

Drivers are allowed to set NETIF_F_SCTP_CSUM if they have
hardware crc32c checksumming support for the SCTP protocol.
Currently, NETIF_F_SCTP_CSUM flag is available in igb,
ixgbe, i40e/i40evf drivers and for vlan devices.

If we don't have NETIF_F_SCTP_CSUM then crc32c is done
through CPU instructions, invoked from crypto layer, or
if not available as slow-path fallback in software.

Currently, loopback device propagates checksum offloading
feature flags in dev->features, but is missing SCTP checksum
offloading. Therefore, account for NETIF_F_SCTP_CSUM as
well.

Before patch:

./netperf_sctp -H 192.168.0.100 -t SCTP_STREAM_MANY
SCTP 1-TO-MANY STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.100 () port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

4194304 4194304   4096    10.00    4683.50

After patch:

./netperf_sctp -H 192.168.0.100 -t SCTP_STREAM_MANY
SCTP 1-TO-MANY STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.100 () port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

4194304 4194304   4096    10.00    15348.26

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

pktgen: document all supported flags

The documentation misses a few of the supported flags. Fix this. Also
respect the dependency to CONFIG_XFRM for the IPSEC flag.

Cc: Fan Du <fan.du@windriver.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

pktgen: simplify error handling in pgctrl_write()

The 'out' label is just a relict from previous times as pgctrl_write()
had multiple error paths. Get rid of it and simply return right away
on errors.

Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

pktgen: fix out-of-bounds access in pgctrl_write()

If a privileged user writes an empty string to /proc/net/pktgen/pgctrl
the code for stripping the (then non-existent) '\n' actually writes the
zero byte at index -1 of data[]. The then still uninitialized array will
very likely fail the command matching tests and the pr_warning() at the
end will therefore leak stack bytes to the kernel log.

Fix those issues by simply ensuring we're passed a non-empty string as
the user API apparently expects a trailing '\n' for all commands.

Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'qlcnic-next'

Shahed Shaikh says:

====================
qlcnic: Re-factoring and enhancements

This patch series includes following changes -
* Re-factored firmware minidump template header handling
* Support to make 8 vNIC mode application to work with 16 vNIC mode
* Enhance error message logging when adapter is in failed state and
when adapter lock access fails.
* Allow vlan0 traffic
* update MAINTAINERS

Please apply this series to net-next.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Update MAINTAINERS for qlcnic driver

Keep myself as only maintainer for qlcnic driver and update
group email alias to Dept-HSGLinuxNICDev@qlogic.com

Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlcnic: Update version to 5.3.56

Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlcnic: Enhance semaphore lock access failure error message

Signed-off-by: Harish Patil <harish.patil@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlcnic: Allow vlan0 traffic

o Adapter allows vlan0 traffic in case of SR-IOV after setting
QLC_SRIOV_ALLOW_VLAN0 bit even though we do not add vlan0 filters.

Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlcnic: Enhance driver message in failed state.

Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlcnic: Updates to QLogic application/driver interface for virtual NIC configuration

Qlogic application interface in the driver which has larger than 8 vNIC
configuration support has been updated to handle the following cases:

o Only 8 or lower total vNICs were enabled within the vNIC 0-7 range
o vNICs were enabled in the vNIC 0-15 range such that enabled vNICs were
not contiguous and only 8 or lower number of total VNICs were enabled
o Disconnect in the vNIC mapping between application and driver when the
enabled VNICs were dis contiguous

Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlcnic: Re-factor firmware minidump template header handling

Treat firmware minidump template headers for 82xx and 83xx/84xx adapters separately,
as it may change for 82xx and 83xx/84xx adapter type independently.

Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlx4'

Amir Vadai says:

====================
net/mlx4: Mellanox driver update 01-01-2014

This small patchset has a fix to a bogus usage of
netif_get_num_default_rss_queues() in mlx4_en driver.

Changes from V1:
- Removed affinity_hint patch, to make it a generic instead of mlx specific

Changes from V0:
- Instead of reverting the netif_get_num_default_rss_queues() in mlx4_en,
fixing it to limit the actual number of receive queues instead of limiting
the number of IRQ's.

Patchset was applied and tested against commit: cb6e926 "ipv6:fix checkpatch
errors with assignment in if condition"
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4: Fix limiting number of IRQ's instead of RSS queues

This fix a performance bug introduced by commit 90b1ebe "mlx4: set
maximal number of default RSS queues", which limits the numbers of IRQs
opened by core module.
The limit should be on the number of queues in the indirection table -
rx_rings, and not on the number of IRQ's. Also, limiting on mlx4_core
initialization instead of in mlx4_en, prevented using "ethtool -L" to
utilize all the CPU's, when performance mode is prefered, since limiting
this number to 8 reduces overall packet rate by 15%-50% in multiple TCP
streams applications.

For example, after running ethtool -L <ethx> rx 16

          Packet rate
Before the fix  897799
After the fix   1142070

Results were obtained using netperf:

S=200 ; ( for i in $(seq 1 $S) ; do ( \
  netperf -H 11.7.13.55 -t TCP_RR -l 30 &) ; \
  wait ; done | grep "1        1" | awk '{SUM+=$6} END {print SUM}' )

CC: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4: Set number of RX rings in a utility function

mlx4_en_add() is too long.
Moving set number of RX rings to a utiltity function to improve
readability and modulization of the code.

Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: remove no longer needed lock for bond_xxx_info_query()

The bond_xxx_info_query() was already in RTNL, so no need to use
bond lock to protect the bond slave list, so remove it.

Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: use rcu_dereference() to access curr_active_slave

The bond_info_show_master already in RCU read-side critical section,
and the we access curr_active_slave without the curr_slave_lock, we
could not sure whether the curr_active_slave will be changed during
the processing, so use RCU to protected the pointer.

Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: netpoll: remove unwanted slave_dev_support_netpoll()

The __netpoll_setup() will check the slave's flag and ndo_poll_controller just
like the slave_dev_support_netpoll() does, and slave_dev_support_netpoll() was
not used by any place, so remove it.

Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git./linux/kernel/git/klassert/ipsec-next

Steffen Klassert says:

====================
1) Introduce skb_to_sgvec_nomark function to add further data to the sg list
   without calling sg_unmark_end first. Needed to add extended sequence
   number informations. From Fan Du.

2) Add IPsec extended sequence numbers support to the Authentication Header
   protocol for ipv4 and ipv6. From Fan Du.

3) Make the IPsec flowcache namespace aware, from Fan Du.

4) Avoid creating temporary SA for every packet when no key manager is
   registered. From Horia Geanta.

5) Support filtering of SA dumps to show only the SAs that match a
   given filter. From Nicolas Dichtel.

6) Remove caching of xfrm_policy_sk_bundles. The cached socket policy bundles
   are never used, instead we create a new cache entry whenever xfrm_lookup()
   is called on a socket policy. Most protocols cache the used routes to the
   socket, so this caching is not needed.

7)  Fix a forgotten SADB_X_EXT_FILTER length check in pfkey, from Nicolas
    Dichtel.

8) Cleanup error handling of xfrm_state_clone.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'i40evf'

Aaron Brown says:

====================
Intel Wired LAN Driver Updates

This series contains updates to i40e and (mostly to) i40evf.

Mitch provides most the work for this series. For the vf driver he
requests a reset on a tx hang, removes vlan filtes on close since we
already remove the MAC filters, fixes some crashes, gets rid of PCI DAC
as it does not mean much on virtualized PCIe parts, skips assigning the
device name that just gets renamed anyway, stores the descriptor ring
size in a manner that allows the use of common tx and rx code with the
PF driver and makes a handful of cosmetic fixes. For i40e he removes
a delay left over from debugging and changes a do/while loop to a for
loop to avoid hitting another delay each time.

Catherine fixes inconsistent MSI and MSI-X messages and bumps the
driver version.

v2: Removed unnecessary periods and redundant OOM message.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

i40e and i40evf: Bump driver versions

Update the driver versions.

Change-ID: I3fe23024d17da0e614ce126edb365bb2c428d482
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i40e: Change MSIX to MSI-X

Fix inconsistent use of MSIX and MSI-X in messages.

Change-ID: Iae9ffb42819677c34544719044ed77632e06147d
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i40e: tighten up ring enable/disable flow

Change the do/while to a for loop, so we don't hit the delay each
time, even when the register is ready for action.
Don't bother to set or clear the QENA_STAT bit as it is
read-only.

Change-ID: Ie464718804dd79f6d726f291caa9b0c872b49978
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i40e: remove unnecessary delay

Ain't nothing gonna break my stride, nobody's gonna slow me down,
oh no. I got to keep on moving.

This was originally put in for debugging just-in-case purposes
and never removed.

Change-ID: Ic12c2e179c3923f54e6ba0a9e4ab05d25c3bab29
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i40evf: remove errant space

Remove a bogus space.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i40evf: update version and copyright date

A bunch of changes merit a new version number, and since these were
made in the new year, update the copyright date.

Change-ID: Ic3f282bf0c20679b9fb06860211afa7c78055bc2
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i40evf: store ring size in ring structs

Keep the descriptor ring size in the actual ring structs instead of in
the adapter struct. This enables us to use common tx and rx code with
the i40e PF driver.

Also update copyrights.

Change-ID: I2861e599b2b4c76441c062ea14400f4750f54d0e
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i40evf: don't guess device name

We don't need to set an interface name here; the net core will do
that, and then it will get renamed by udev anyway.

Change-ID: I839a17837d19bedd1f490bff32ac5b85b4bfd97f
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i40evf: remove bogus comment

This comment is simply not true.

Change-ID: If006b02b60984601a24257a951ae873dff568008
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i40evf: fix up strings in init task

Make sure errors are reported at the correct log level, quit printing
the function name every time, and make the messages more consistent in
format.

v2: Removed unnecessary periods and redundant OOM message.

Change-ID: I50e443467519ad3850def131d84626c50612c611
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i40evf: get rid of pci_using_dac

PCI DAC doesn't really mean much on a virtualized PCI Express part, so
get rid of that check and just always set the HIGHDMA flag in the net
device.

Change-ID: I2040272be0e7934323f470c2bc73fbdd4f93e2b6
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i40evf: fix multiple crashes on remove

Depending upon the state of the driver, there are several potential
pitfalls on remove. Kill the watchdog task so rmmod doesn't hang.
Check the adapter->msix_entries field, not the num_msix_vectors field,
which is never cleared.

Change-ID: I0546048477f09fc19e481bd37efa30daae4faa88
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i40evf: remove VLAN filters on close

We remove all the MAC filters, so remove the VLAN filters, too.

Change-ID: I4f7559acdf005dc3f359bf6460ce32d183c8878b
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i40evf: request reset on tx hang

If the kernel watchdog bites us, ask the PF to reset us and attempt to
reinit the driver.

Change-ID: Ic97665aeeed71ce712b9c4f057e78ff8372522b9
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xfrm: Cleanup error handling of xfrm_state_clone

The error pointer passed to xfrm_state_clone() is unchecked,
so remove it and indicate an error by returning a null pointer.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

pfkey: fix SADB_X_EXT_FILTER length check

This patch fixes commit d3623099d350 ("ipsec: add support of limited SA dump").

sadb_ext_min_len array should be updated with the new type (SADB_X_EXT_FILTER).

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

Merge branch 'for-davem' of git://git./linux/kernel/git/linville/wireless-next

John W. Linville says:

====================
Please pull this batch of wireless updates intended for the 3.15
stream!

For the mac80211 bits, Johannes says:

"We have some cleanups and minor fixes as well as userspace API
improvements from a lot of people, extended VHT support for radiotap
from Emmanuel, CSA improvements from Andrei, Luca and Michal. I've also
included my work on hwsim to make dynamic registration of radios
possible."

Along with that, we get the usual round of updates to ath9k,
brcmfmac, mwifiex, wcn36xx, and the ti drivers -- nothing particularly
noteworthy, mostly just random updates and refactoring.

Also included is a pull of the wireless tree, intended to resolve
some potential merge issues.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git./linux/kernel/git/linville/wireless-next into for-davem

bonding: fix bond_arp_rcv() race of curr_active_slave

bond->curr_active_slave can be changed between its deferences, even to
NULL, and thus we might panic.

We're always holding the rcu (rx_handler->bond_handle_frame()->bond_arp_rcv())
so fix this by rcu_dereferencing() it and using the saved.

Reported-by: Ding Tianhong <dingtianhong@huawei.com>
Fixes: aeea64a ("bonding: don't trust arp requests unless active slave really works")
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Acked-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

hyperv: Add latest NetVSP versions to auto negotiation

It auto negotiates the highest NetVSP version supported by both guest and host.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: use zero-window when free_space is low

Currently the kernel tries to announce a zero window when free_space
is below the current receiver mss estimate.

When a sender is transmitting small packets and reader consumes data
slowly (or not at all), receiver might be unable to shrink the receive
win because

a) we cannot withdraw already-commited receive window, and,
b) we have to round the current rwin up to a multiple of the wscale
   factor, else we would shrink the current window.

This causes the receive buffer to fill up until the rmem limit is hit.
When this happens, we start dropping packets.

Moreover, tcp_clamp_window may continue to grow sk_rcvbuf towards rmem[2]
even if socket is not being read from.

As we cannot avoid the "current_win is rounded up to multiple of mss"
issue [we would violate a) above] at least try to prevent the receive buf
growth towards tcp_rmem[2] limit by attempting to move to zero-window
announcement when free_space becomes less than 1/16 of the current
allowed receive buffer maximum.  If tcp_rmem[2] is large, this will
increase our chances to get a zero-window announcement out in time.

Reproducer:
On server:
$ nc -l -p 12345
<suspend it: CTRL-Z>

Client:
#!/usr/bin/env python
import socket
import time

sock = socket.socket()
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
sock.connect(("192.168.4.1", 12345));
while True:
   sock.send('A' * 23)
   time.sleep(0.005)

socket buffer on server-side will grow until tcp_rmem[2] is hit,
at which point the client rexmits data until -EDTIMEOUT:

tcp_data_queue invokes tcp_try_rmem_schedule which will call
tcp_prune_queue which calls tcp_clamp_window().  And that function will
grow sk->sk_rcvbuf up until it eventually hits tcp_rmem[2].

Thanks to Eric Dumazet for running regression tests.

Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Tested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: failed transmissions should return error

When a message could not be sent out because the destination node
or link could not be found, the full message size is returned from
sendmsg() as if it had been sent successfully. An application will
then get a false indication that it's making forward progress. This
problem has existed since the initial commit in 2.6.16.

We change this to return -ENETUNREACH if the message cannot be
delivered due to the destination node/link being unavailable. We
also get rid of the redundant tipc_reject_msg call since freeing
the buffer and doing a tipc_port_iovec_reject accomplishes exactly
the same thing.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

atm: solos-pci: make solos_bh() as static

sparse says:

drivers/atm/solos-pci.c:763:6: warning:
symbol 'solos_bh' was not declared. Should it be static?

Signed-off-by: Daeseok Youn <daeseok.youn@gmail.com>
Acked-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>

atm: nicstar: use NULL instead of 0 for pointer

sparse says:

drivers/atm/nicstar.c:642:27: warning:
Using plain integer as NULL pointer
drivers/atm/nicstar.c:644:27:
warning: Using plain integer as NULL pointer
drivers/atm/nicstar.c:982:51:
warning: Using plain integer as NULL pointer
drivers/atm/nicstar.c:996:51:
warning: Using plain integer as NULL pointer

Signed-off-by: Daeseok Youn <daeseok.youn@gmail.com>
Acked-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>

atm: ambassador: use NULL instead of 0 for pointer

sparse says:

drivers/atm/ambassador.c:1928:24: warning:
Using plain integer as NULL pointer

Signed-off-by: Daeseok Youn <daeseok.youn@gmail.com>
Acked-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: honor IPV6_PKTINFO with v4 mapped addresses on sendmsg

In case we decide in udp6_sendmsg to send the packet down the ipv4
udp_sendmsg path because the destination is either of family AF_INET or
the destination is an ipv4 mapped ipv6 address, we don't honor the
maybe specified ipv4 mapped ipv6 address in IPV6_PKTINFO.

We simply can check for this option in ip_cmsg_send because no calls to
ipv6 module functions are needed to do so.

Reported-by: Gert Doering <gert@space.net>
Cc: Tore Anderson <tore@fud.no>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: Invert test

Make the error case return early.
Make the normal return at the bottom of the function.
Reduces indent for readability.

Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: Remove unnecessary else

It's unnecessary and less readable after a clause ending in a goto.

Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: More use of ether_addr_copy

It's smaller and faster for some architectures.

Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

pcnet32: add missing check for pci_dma_mapping_error

The pci_map_single calls never checked for error. Add error checking
as requested by someone last year. Tested on 79c972.

Signed-off-by: Don Fry <pcnet32@frontier.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

pcnet32: fix reallocation error

pcnet32_realloc_rx_ring() only worked on the first log2 number of
entries in the receive ring instead of the all the entries.
Replaced "1 << size" with more descriptive variable.
This is my original bug from 2006. Found while testing another problem.
Tested on 79C972 and 79C976.

Signed-off-by: Don Fry <pcnet32@frontier.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xfrm: Remove caching of xfrm_policy_sk_bundles

We currently cache socket policy bundles at xfrm_policy_sk_bundles.
These cached bundles are never used. Instead we create and cache
a new one whenever xfrm_lookup() is called on a socket policy.

Most protocols cache the used routes to the socket, so let's
remove the unused caching of socket policy bundles in xfrm.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

Merge git://git./linux/kernel/git/davem/net

Conflicts:
drivers/net/bonding/bond_3ad.h
drivers/net/bonding/bond_main.c

Two minor conflicts in bonding, both of which were overlapping
changes.

Signed-off-by: David S. Miller <davem@davemloft.net>