git.openwrt.org Git - openwrt/staging/blogic.git/log

Merge branch 'phys_port'

Jiri Pirko says:

====================
This patchset is based on patch by Narendra_K@Dell.com
Once device which can change phys port id during its lifetime adopts this,
NETDEV_CHANGEPHYSPORTID event will be added and driver will call
call_netdevice_notifiers(NETDEV_NETDEV_CHANGEPHYSPORTID, dev) to propagate
the change to userspace.

v1->v2: as suggested by Ben, handle -EOPNOTSUPP in rtnl code (wrapped up ndo call)
v2->v3: adjusted patch 1 commit message
v3->v4: used "%phN" for sysfs printf as suggested by DaveM
        added igb/igbvf implementation as requested by Or Gerlitz
v4->v5: used prandom_u32 to generate id in igb_probe
        removed duplicate code in ibgvf_probe
        pushed dev_err string into one line in igbvf_refresh_ppid
v5->v6: use uuid_le_gen for generating 16-byte phys port id for igb/igbvf
as suggested by BenH

1) Why do we need this, and why do existing facilities fail to provide
   a way to accomplish this?

Currenty there's very hard to tell if two netdevs are using the same physical
port. For sr-iov this can be get by sysfs. For other mechanisms, like NPAR
there's very hard to do it (one must learn it from NIC BIOS). But even for
sr-iov there's no way to say if two netdevs are using the same phys port when
these are passed through to virtual guests.

This patchset provides the generic way of letting this information know to
userspace. This info can be used by apps like NetworkManager, teamd, Wicked,
ovs daemon, etc, to do smarter bonding decisions.

2) Why is the physical port ID defined as a 32 byte opaque cookie?
   What formats and layouts need to be accomodated, and which
   influenced the design of the ID?

For user to distinguish if two netdevs are using the same port, he only needs
to compare their phys port ids. Nothing else is needed. This id has no
structure for security reasons. VF should not know anything about PF.

3) Are IDs globally unique?  Why or why not?  If IDs should be
   globally unique, but only in certain cases, what exactly are those
   cases.

Most of the time only uniqueness needed is in scope of single machine.
There might be case when the id should be unique between couple of machines
in virtualization environment. Given that for example for igb/igbvf 16B uuid
is used, there is no problem for this case as well. But each driver can
implement this differently focusing the hw capabilities and needs.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: export physical port id via sysfs

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Narendra K <narendra_k@dell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rtnl: export physical port id via RT netlink

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Narendra K <narendra_k@dell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add ndo to get id of physical port of the device

This patch adds a ndo for getting physical port of the device. Driver
which is aware of being virtual function of some physical port should
implement this ndo. This is applicable not only for IOV, but for other
solutions (NPAR, multichannel) as well. Basically if there is possible
to have multiple netdevs on the single hw port.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvneta: support big endian

Use the "swap descriptor" feature of the hardware to properly swap the
descriptors when running in big endian mode. Since the swapping occurs
on 64 bits words, we also need to provide a separate structure layout
for the DMA descriptors between little endian and big endian mode,
like is done in the mv643xx_eth driver.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvneta: move the RX and TX desc macros outside of the structs

The macros used for the various fields of the RX and TX descriptions
are currently declared next to those fields within the structure
definitions of the RX and TX descriptors.

However, in order to support big endian, we'll have to use the "swap
descriptors" features of the hardware, which swaps every byte within
each 64 bits word of the descriptors. This requires a separate
definition of the RX and TX descriptor structures for little and big
endian, as is done in the mv643xx_eth. Those macros can therefore no
longer be defined inside those structures.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

pktgen: Require CONFIG_INET due to use of IPv4 checksum function

Unlike for IPv6, the IPv4 checksum functions are only available
if CONFIG_INET is set.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: add tcp_syncookies mode to allow unconditionally generation of syncookies

| If you want to test which effects syncookies have to your
| network connections you can set this knob to 2 to enable
| unconditionally generation of syncookies.

Original idea and first implementation by Eric Dumazet.

Cc: Florian Westphal <fw@strlen.de>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers: net: cpsw: Add support for set MAC address

Adding support for setting MAC address to cpsw device via ndo_set_mac_address

Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tile: handle 64-bit statistics in tilepro network driver

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9p: client: remove unused code and any reference to "cancelled" function

This patch reverts commit

80b45261a0b263536b043c5ccfc4ba4fc27c2acc

which was implementing a 'cancelled' functionality to notify that
a cancelled request will not be replied.

This implementation was not used anywhere and therefore removed.

Signed-off-by: Andi Shyti <andi@etezian.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: don't use dev_err when AER enabling fails

The driver uses dev_err when enabling of AER fails (e.g. PCIe AER is not
supported). The dev_info is more appropriate to avoid console pollution.

Cc: sathya.perla@emulex.com
Cc: subbu.seetharaman@emulex.com
Cc: ajit.khaparde@emulex.com
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tg3: Update version to 3.133

Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tg3: Fix UDP fragments treated as RMCP

The 5762 devices sometimes incorrectly treat udp fragments as RMCP
packets and route to the APE. This patch sets the RX_MODE_IPV4_FRAG_FIX
bit for these devices which enables the proper behaviour.

Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tg3: Enable support for timesync gpio output

The PTP_CAPABLE tg3 devices have a gpio output that is toggled when the
free running counter matches a watchdog value. This patch adds support
to set the watchdog and enable this feature.

Since the output is controlled via bits in the EAV_REF_CLCK_CTL
register, we have to read-modify-write it when we stop/resume.

Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tg3: Implement the shutdown handler

Also remove the call to tg3_power_down_prepare() in tg3_power_down()
since tg3_close() calls it.

Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tg3: Allow NVRAM programming when interface is down

Previously, when the interface was brought down, the driver would set
the power state to D3hot. In D3hot, we don't have access to the NVRAM.
This patch removes the call to set the power state to PCI_D3hot in
close. A following patch will implement the shutdown handler to properly
set the D3hot state when the system is going down.

Doing the above means that the TG3_PHYFLG_IS_LOW_POWER should not be
checked to validate access to the NVRAM.

Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tg3: Remove incorrect switch to aux power

During probe, the driver is incorrectly switching the power to Vaux on
the 5717 and later devices. At this point, we are in D0 state and
drawing maximum power. We also definitely have Vmain available. It
doesn't make sense to switch to Vaux since it has a lesser maximum power
draw and we might go over the limit. On a new system, we observe that
not all ports are recognized in some of the slots with this call in
place.

Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cnic: Update version to 2.5.17 and copyright year.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cnic: Add missing error checking for RAMROD_CMD_ID_CLOSE

Completion status field should also be checked for non-zero error
condition.

Signed-off-by: Eddie Wai <eddie.wai@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cnic: Update TCP options setup for iSCSI.

Update TCP delayed ACK and timestamp options setup to match latest bnx2x
firmware.

Signed-off-by: Eddie Wai <eddie.wai@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cnic: Reset tcp_flags during cnic_cm_create().

Without resetting it, the bnx2i driver cannot use different options for
different iSCSI connections.

Signed-off-by: Eddie Wai <eddie.wai@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cnic: Simplify cnic_release().

Since unregister_netdevice_notifier() will replay the NETDEV_DOWN and
NETDEV_UNREGISTER_EVENTS, the cnic_dev_list will be cleaned up automatically.
The loop to cleanup the cnic_dev_list can be removed in cnic_release().

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cnic: Simplify netdev events handling.

After this earlier commit to simplify probing:

    commit 4bd9b0fffb193d2e288f67f81821af32df8d4349
    cnic, bnx2x, bnx2: Simplify cnic probing.

we can now reliably receive netdev events and we can simplify the handling
of these events.  We now remove the logic that tries to handle missed
NETDEV_REGISTER events.

This change will allow cleanup to be simplified in the next patch.  We can
now rely on the play back of netdev events during
unregister_netdevice_notifier() to cleanup the structures.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_core: Respond to operation request by firmware

This commit adds new firmware command and new firmware event. The firmware
raises the MLX4_EVENT_TYPE_OP_REQUIRED event in order to signal the driver it
needs to perform an administrative operation throughout the MLX4_CMD_GET_OP_REQ
command. At the moment the supported operation is adding/removing multicast
entries which are used by the firmware for handling NCSI traffic in B0
steering mode.

Also, had to swap the order of mlx4_init_mcg_table() and
mlx4_init_eq_table() to make sure that driver will get events only after
resources are initialized to handle it.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: Fix BlueFlame race

Fix a race between BlueFlame flow and stamping in post send flow.
Example:
SW: Build WQE 0 on the TX buffer, except the ownership bit
SW: Set ownership for WQE 0 on the TX buffer
SW: Ring doorbell for WQE 0
SW: Build WQE 1 on the TX buffer, except the ownership bit
SW: Set ownership for WQE 1 on the TX buffer
HW: Read WQE 0 and then WQE 1, before doorbell was rung/BF was done for WQE 1
HW: Produce CQEs for WQE 0 and WQE 1
SW: Process the CQEs, and stamp WQE 0 and WQE 1 accordingly (on the TX buffer)
SW: Copy WQE 1 from the TX buffer to the BF register - ALREADY STAMPED!
HW: CQE error with index 0xFFFF - the BF WQE's control segment is STAMPED,
so the BF index is 0xFFFF. Error: Invalid Opcode.
As a result QP enters the error state and no traffic can be sent.

Solution:
When stamping - do not stamp last completed wqe.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

pktgen: add needed include file

Fixes this on PowerPC (at least):

net/core/pktgen.c: In function 'fill_packet_ipv6':
net/core/pktgen.c:2906:3: error: implicit declaration of function 'csum_ipv6_magic' [-Werror=implicit-function-declaration]
udph->check = ~csum_ipv6_magic(&iph->saddr, &iph->daddr, udplen, IPPROTO_UDP, 0);
^

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git./linux/kernel/git/jkirsher/net-next

Jeff Kirsher says:

====================
This series contains updates to e100 and e1000e.

The e100 patch from Andy simply updates the netif_printk() to use
%*ph to dump small buffers.

The changes to e1000e include a fix from Dean Nelson to resolve a
issue where a pci_clear_master() was accidentally dropped during a
conflict resolution. Wei Young provides 2 patches, one removes an
assignment of the default ring size because it was a duplicate. The
second changes the packet split receive structure to use
PS_PAGE_BUFFERS macro for the length so that problems won't occur
when the length is changed.

The remaining patches for e1000e are from Bruce Allan, where he
provides a number of fixes and updates for I218. In addition, a
fix for 82583 which can disappear off the PCIe bus, to resolve this,
disable ASPM L1. Bruce also provides a fix to a previous commit
(commit e60b22c5b7 e1000e: fix accessing to suspended device) so that
devices are only taken out of runtime power management for those
ethtool operations that must access device registers.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4, ipv6: send igmpv3/mld packets with TC_PRIO_CONTROL

v2:
a) Also send ipv4 igmp messages with TC_PRIO_CONTROL

Cc: William Manley <william.manley@youview.com>
Cc: Lukas Tribus <luky-37@hotmail.com>
Acked-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1000e: fix I217/I218 PHY initialization flow

The initialization of the PHY on I217/I218, while similar to 82579, must
also check to see if the MAC and PHY are in the same mode (PCIe vs. SMBus)
otherwise the PHY will be inaccessible by the MAC.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: do not resume device from RPM suspend to read PHY status registers

When the device is runtime suspended (e.g. when there is no link), do not
wake it from D3 to read the PHY status; just set the values to typical
power-on defaults as is done when runtime PM is not enabled and there is no
link.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: enable support for new device IDs

The device IDs 0x15a0 and 0x15a1 are new SKUs that contain the same MAC as
I217 and same PHY as I218.

The device IDs 0x15a2 and 0x15a3 are the same as existing I218 SKUs.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: ethtool unnecessarily takes device out of RPM suspend

A previous patch (commit e60b22c5b7 e1000e: fix accessing to suspended
device) added .begin and .complete ethtool driver callbacks so that the
device was resumed from Runtime Power Management (RPM) suspend state for
all ethtool operations. This is overkill for operations which do not need
to access any registers in the device. This patch makes it so that the
device is taken out of RPM suspend only for those ethtool operations that
must access device registers.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: Tx hang on I218 when linked at 100Half and slow response at 10Mbps

Tx hang is an unintended consequence of another workaround that is in the
EEPROM for an issue with the firmware at 10Mbps when K1 (a power mode of
the MAC-PHY interconnect) is enabled. The issue is resolved by setting
appropriate Tx re-transmission timeouts in the PHY and associated K1 entry
times in the MAC to allow enough transmissions to occur without triggering
a Tx hang. A similar change is needed when linked at 10Mbps to improve
latency.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: low throughput using 4K jumbos on I218

Alter the packet buffer allocation accordingly.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: iAMT connections drop on driver unload when jumbo frames enabled

The jumbo frame configuration in the MAC/PHY should be reverted on 82579
and newer parts when the interface is brought down (not just when the MTU
is changed back to standard frame size) otherwise iAMT connections (e.g.
SoL, IDE-R) will be dropped and cannot be re-acquired until the MTU is
changed again.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: disable ASPM L1 on 82583

The 82583 can disappear off the PCIe bus. This device is a modified 82574
which had the same problem which was fixed by disabling ASPM L1; disabling
it on 82583 fixes the issue on this device.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: Use marco instead of digit for defining e1000_rx_desc_packet_split

In structure e1000_rx_desc_packet_split, the size of wb.upper.length is
defined by a digit. This may introduce some problem when the length is
changed.

This patch use the macro PS_PAGE_BUFFERS for the definition. And move the
definition to hw.h.

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: Remove duplicate assignment of default rx/tx ring size

tx_ring/rx_ring size is assigned in function e1000_alloc_queues(), which is
called by e1000_sw_init() in the early stage of e1000_probe().

This patch just remove the duplicate assignment of this default ring size
value.

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Reviewed-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Reviewed-by: Da Yu Qiu <qiudayu@cn.ibm.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Acked-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: restore call to pci_clear_master()

In attempting to resolve a minor merge conflict, commit e5f2ef7ab4690d2e8faa
(Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net) accidentally
dropped a call to pci_clear_master() that was intended to remain in place.

Commit 4e0855dff094b0d56d6b (e1000e: fix pci-device enable-counter balance)
replaced a call to pci_disable_device() by one to pci_clear_master(). And then
commit 66148babe728f3e00e13 (e1000e: fix runtime power management transitions)
deleted a number of lines starting two lines following that call.

This patch restores the call to pci_clear_master() in __e1000_shutdown().

v2: added summary lines (enclosed in parens) following commit IDs

Signed-off-by: Dean Nelson <dnelson@redhat.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Acked-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e100: dump small buffers via %*ph

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

bonding: remove bond_resend_igmp_join_requests read_unlock leftover

After commit 4aa5dee4d9 ("net: convert resend IGMP to notifier event") we
have 1 read_unlock in bond_resend_igmp_join_requests which isn't paired
with a read_lock because it's removed by that commit.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

pktgen: Use ip_send_check() to compute checksum

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

pktgen: Add UDPCSUM flag to support UDP checksums

UDP checksums are optional, hence pktgen has been omitting them in
favour of performance. The optional flag UDPCSUM enables UDP
checksumming. If the output device supports hardware checksumming
the skb is prepared and marked CHECKSUM_PARTIAL, otherwise the
checksum is generated in software.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

VSOCK: Move af_vsock.h and vsock_addr.h to include/net

This is useful for other VSOCK transport implemented outside the
net/vmw_vsock/ directory to use these headers.

Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Andy King <acking@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'minnow/net-next' of git://git.infradead.org/users/dvhart/linux-2.6 into minnow

Darren Hart says:

====================
Add support for the MinnowBoard in the pch_gbe driver. This was
originally sent to LKML as part of the MinnowBoard support series. That
is now partially merged and this version of the patch has been isolated
from those changes and is now completely self-contained.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

USBNET: increase max rx/tx qlen for improving USB3 thoughtput

The default RX_QLEN()/TX_QLEN() didn't consider super speed
USB device, so only max 4 URBs are scheduled at the same time
for tx/rx, then USB3 NIC can't perform very well.

With this patch, both rx and tx thoughput are increased more than
100Mbps when doing iperf test on ax88179_178a USB 3.0 NIC.

Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

USBNET: centralize computing of max rx/tx qlen

This patch centralizes computing of max rx/tx qlen, because:

- RX_QLEN()/TX_QLEN() is called in hot path
- computing depends on device's usb speed, now we have ls/fs, hs, ss,
so more checks need to be involved
- in fact, max rx/tx qlen should not only depend on device USB
speed, but also depend on ethernet link speed, so we need to
consider that in future.
- if SG support is done, max tx qlen may need change too

Generally, hard_mtu and rx_urb_size are changed in bind(), reset()
and link_reset() callback, and change mtu network operation, this
patches introduces the API of usbnet_update_max_qlen(), and calls
it in above path.

Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tuntap: hardware vlan tx support

Inspired by commit f09e2249c4f5c7c13261ec73f5a7807076af0c8e (macvtap: restore
vlan header on user read). This patch adds hardware vlan tx support for
tuntap. This is done by copying vlan header directly into userspace in
tun_put_user() instead of doing it through __vlan_put_tag() in
dev_hard_start_xmit(). This eliminates one unnecessary memmove() in
vlan_insert_tag() for 802.1ad and 802.1q traffic.

pktgen test shows about 20% improvement for 802.1q traffic:

Before:
662149pps 317Mb/sec (317831520bps) errors: 0
After:
801033pps 384Mb/sec (384495840bps) errors: 0

Cc: Basil Gor <basil.gor@gmail.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sctp: Refactor SCTP skb checksum computation

This patch consolidates the SCTP checksum calculation code from various
places to a single new function, sctp_compute_cksum(skb, offset).

Signed-off-by: Joe Stringer <joe@wand.net.nz>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

virtio-net: put virtio net header inline with data

For small packets we can simplify xmit processing
by linearizing buffers with the header:
most packets seem to have enough head room
we can use for this purpose.
Since existing hypervisors require that header
is the first s/g element, we need a feature bit
for this.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

bond: cleanup netpoll code

This started out with fixing a sparse warning, then I realized that
the wrapper function bond_netpoll_info could just be removed
by rolling it into the enable code.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

team: cleanup netpoll clode

This started out with fixing a sparse warning, then I realized that
the wrapper function team_netpoll_info could just be collapsed away
by rolling it into the enable code.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: cleanup netpoll code

This started out with fixing a sparse warning, then I realized that
the wrapper function br_netpoll_info could just be collapsed away
by rolling it into the enable code.

Also, eliminate unnecessary goto's

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: use pre-defined macro in bond_mode_name instead of magic number 0

We have BOND_MODE_ROUNDROBIN pre-defined as 0, and it's the lowest
mode number.
Use it to check the arg lower bound instead of magic number 0 in
bond_mode_name.

Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

pch_gbe: Add MinnowBoard support

The MinnowBoard uses an AR803x PHY with the PCH GBE which requires
special handling. Use the MinnowBoard PCI Subsystem ID to detect this
and add a pci_device_id.driver_data structure and functions to handle
platform setup.

The AR803x does not implement the RGMII 2ns TX clock delay in the trace
routing nor via strapping. Add a detection method for the board and the
PHY and enable the TX clock delay via the registers.

This PHY will hibernate without link for 10 seconds. Ensure the PHY is
awake for probe and then disable hibernation. A future improvement would
be to convert pch_gbe to using PHYLIB and making sure we can wake the
PHY at the necessary times rather than permanently disabling it.

Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Waskiewicz <peter.p.waskiewicz.jr@intel.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Joe Perches <joe@perches.com>
Cc: netdev@vger.kernel.org

drivers/net/ethernet/stmicro/stmmac: don't check resource with devm_ioremap_resource

devm_ioremap_resource does sanity checks on the given resource. No need to
duplicate this in the driver.

Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

pch_gbe: Use PCH_GBE_PHY_REGS_LEN instead of 32

Avoid using magic numbers when we have perfectly good defines just lying
around.

Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Waskiewicz <peter.p.waskiewicz.jr@intel.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: netdev@vger.kernel.org

net: Make devnet_rename_seq static

No users outside net/core/dev.c.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: TCP_NOTSENT_LOWAT socket option

Idea of this patch is to add optional limitation of number of
unsent bytes in TCP sockets, to reduce usage of kernel memory.

TCP receiver might announce a big window, and TCP sender autotuning
might allow a large amount of bytes in write queue, but this has little
performance impact if a large part of this buffering is wasted :

Write queue needs to be large only to deal with large BDP, not
necessarily to cope with scheduling delays (incoming ACKS make room
for the application to queue more bytes)

For most workloads, using a value of 128 KB or less is OK to give
applications enough time to react to POLLOUT events in time
(or being awaken in a blocking sendmsg())

This patch adds two ways to set the limit :

1) Per socket option TCP_NOTSENT_LOWAT

2) A sysctl (/proc/sys/net/ipv4/tcp_notsent_lowat) for sockets
not using TCP_NOTSENT_LOWAT socket option (or setting a zero value)
Default value being UINT_MAX (0xFFFFFFFF), meaning this has no effect.

This changes poll()/select()/epoll() to report POLLOUT
only if number of unsent bytes is below tp->nosent_lowat

Note this might increase number of sendmsg()/sendfile() calls
when using non blocking sockets,
and increase number of context switches for blocking sockets.

Note this is not related to SO_SNDLOWAT (as SO_SNDLOWAT is
defined as :
Specify the minimum number of bytes in the buffer until
the socket layer will pass the data to the protocol)

Tested:

netperf sessions, and watching /proc/net/protocols "memory" column for TCP

With 200 concurrent netperf -t TCP_STREAM sessions, amount of kernel memory
used by TCP buffers shrinks by ~55 % (20567 pages instead of 45458)

lpq83:~# echo -1 >/proc/sys/net/ipv4/tcp_notsent_lowat
lpq83:~# (super_netperf 200 -t TCP_STREAM -H remote -l 90 &); sleep 60 ; grep TCP /proc/net/protocols
TCPv6     1880      2   45458   no     208   yes  ipv6        y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y
TCP       1696    508   45458   no     208   yes  kernel      y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y

lpq83:~# echo 131072 >/proc/sys/net/ipv4/tcp_notsent_lowat
lpq83:~# (super_netperf 200 -t TCP_STREAM -H remote -l 90 &); sleep 60 ; grep TCP /proc/net/protocols
TCPv6     1880      2   20567   no     208   yes  ipv6        y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y
TCP       1696    508   20567   no     208   yes  kernel      y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y

Using 128KB has no bad effect on the throughput or cpu usage
of a single flow, although there is an increase of context switches.

A bonus is that we hold socket lock for a shorter amount
of time and should improve latencies of ACK processing.

lpq83:~# echo -1 >/proc/sys/net/ipv4/tcp_notsent_lowat
lpq83:~# perf stat -e context-switches ./netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.7.84 () port 0 AF_INET : +/-2.500% @ 99% conf.
Local       Remote      Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service
Send Socket Recv Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units
Final       Final                                             %     Method %      Method
1651584     6291456     16384  20.00   17447.90   10^6bits/s  3.13  S      -1.00  U      0.353   -1.000  usec/KB

Performance counter stats for './netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3':

           412,514 context-switches

     200.034645535 seconds time elapsed

lpq83:~# echo 131072 >/proc/sys/net/ipv4/tcp_notsent_lowat
lpq83:~# perf stat -e context-switches ./netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.7.84 () port 0 AF_INET : +/-2.500% @ 99% conf.
Local       Remote      Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service
Send Socket Recv Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units
Final       Final                                             %     Method %      Method
1593240     6291456     16384  20.00   17321.16   10^6bits/s  3.35  S      -1.00  U      0.381   -1.000  usec/KB

Performance counter stats for './netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3':

         2,675,818 context-switches

     200.029651391 seconds time elapsed

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-By: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add sk_stream_is_writeable() helper

Several call sites use the hardcoded following condition :

sk_stream_wspace(sk) >= sk_stream_min_wspace(sk)

Lets use a helper because TCP_NOTSENT_LOWAT support will change this
condition for TCP sockets.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sctp: trivial: add uapi/linux/sctp.h into maintainers

After this file has moved to the uapi section, we also need to update
this in the maintainers file.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sctp: trivial: update mailing list address

The SCTP mailing list address to send patches or questions
to is linux-sctp@vger.kernel.org and not
lksctp-developers@lists.sourceforge.net anymore. Therefore,
update all occurences.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers: net: cpsw: add support to show hw stats via ethtool

Add support to show CPSW hardware statistics to user via ethtool
so user can find if there were any error reported by hardware or
the system is over loaded duing high data rate transfer.

Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: Fixed up a error "do not initialise statics to 0 or NULL" in bond_main.c

The error is found by the checkpatch.pl tools.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: add rtnl protection for bonding_store_fail_over_mac

We need rtnl protection while reading slave_cnt and updating
the .fail_over_mac, and it also follows the logic "don't change
anything slave-related without rtnl". :)

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: bond_sysfs.c checkpatch cleanup

net/bonding/bond_sysfs.c:1302: ERROR: else should follow close brace '}'
net/bonding/bond_sysfs.c:1314: ERROR: else should follow close brace '}'

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: don't call slave_xxx_netpoll under spinlocks

The slave_xxx_netpoll will call synchronize_rcu_bh(),
so the function may schedule and sleep, it should't be
called under spinlocks.

bond_netpoll_setup() and bond_netpoll_cleanup() are always
protected by rtnl lock, it is no need to take the read lock,
as the slave list couldn't be changed outside rtnl lock.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers/net: enic: Move ethtool code to a separate file

This patch moves all enic ethtool hooks from enic_main.c to a new file
enic_ethtool.c

Signed-off-by: Neel Patel <neepatel@cisco.com>
Signed-off-by: Christian Benvenuti <benve@cisco.com>
Signed-off-by: Nishank Trivedi <nistrive@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: trans_rdma: remove unused function

This patch gets rid of the following warning:

net/9p/trans_rdma.c:594:12: warning: ‘rdma_cancelled’ defined but not used [-Wunused-function]
static int rdma_cancelled(struct p9_client *client, struct p9_req_t *req)

The rdma_cancelled function is not called anywhere in the kernel

Signed-off-by: Andi Shyti <andi@etezian.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'be2net'

Sathya Perla says:

====================
The following patches are mostly for providing MAC
filtering ability for VFs. Pls apply. Thanks!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: delete primary MAC address while unloading

Currently the UC-list is being deleted from the HW MAC table, but the primary
MAC is not.

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: use SET/GET_MAC_LIST for SH-R

On SH-R and Lancer-R, GET_MAC_LIST cmd is better supported
(instead of NTWK_MAC_QUERY cmd) to query provisioned MAC addresses.
Similiarly, (on SH-R and Lancer-R) SET_MAC_LIST must be used by the PF to
provision a permanent MAC addresses to the VF.

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: refactor MAC-addr setup code

The code to configure the permanent MAC in be_setup() has become quite
complicated, with different FW cmds being used for BEx, SH-R and Lancer.
Simplify the logic by moving some of this complexity to be_cmds.c. This
makes the code in be_setup() a little more readable.

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: fix pmac_id for BE3 VFs

For BE3 VFs, the permanent MAC is added by its PF. The VF can retrieve its
pmac_id only via the IFACE_CREATE cmd. This is not true for Lancer and SH-R
VFs which get the pmac_id by issuing a ADD_IFACE_MAC cmd. So, use this
hack only for BE3 VFs.

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: allow VFs to program MAC and VLAN filters

In the current design VFs were not allowed to program MAC/VLAN filters.
Only the PF driver was allowed to configure/provision MAC and transparent
VLANs to a VF. Change this to support MAC/VLAN filtering on a VF by a VM.

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: fix MAC address modification for VF

Currently, the VFs by default don't have the privilege to modify MAC address.
This will change in a subsequent fix wherein VFs will have the ability to
modify MAC/VLAN filters.

Fix be_mac_addr_set() logic to support MAC address modification on a
privileged VF too.

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sh_eth: Add support for r8a7790 SoC

This is a copy of support for r8a7778/9 with the .rmiimode mode bit
of struct sh_eth_cpu_data set.

Also update R8A7779 to R8A777x.

Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

sh_eth: add support for RMIIMODE register

This register is prsent on the r8a7790 SoC.

Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipv6 eliminate parameter "int addrlen" in function fib6_add_1

The "int addrlen" in fib6_add_1 is rebundant, as we can get it from
parameter "struct in6_addr *addr" once we modified its type.
And also fix some coding style issues in fib6_add_1

Signed-off-by: Fan Du <fan.du@windriver.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net ipv6: Remove rebundant rt6i_nsiblings initialization

Seting rt->rt6i_nsiblings to zero is rebundant, because above memset
zeroed the rest of rt excluding the first dst memember.

Signed-off-by: Fan Du <fan.du@windriver.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

vti: switch to new ip tunnel code

GRE tunnel and IPIP tunnel already switched to the new
ip tunnel code, VTI tunnel can use it too.

Cc: Pravin B Shelar <pshelar@nicira.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Saurabh Mohan <saurabh.mohan@vyatta.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip6mr: change the prototype of ip6_mr_forward().

This patch changes the prototpye of the ip6_mr_forward() method to return void
instead of int.

The ip6_mr_forward() method always returns 0; moreover, the return value of this
method is not checked anywhere.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipmr: change the prototype of ip_mr_forward().

This patch changes the prototpye of the ip_mr_forward() method to return void
instead of int.

The ip_mr_forward() method always returns 0; moreover, the return value of this
method is not checked anywhere.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'team' ("add support for peer notifications and igmp rejoins for team")

Jiri Pirko says:

====================
The middle patch adjusts core infrastructure so the bonding code can be
generalized and reused by team.

v1->v2: using msecs_to_jiffies() as suggested by Eric

Jiri Pirko (3):
  team: add peer notification
  net: convert resend IGMP to notifier event
  team: add support for sending multicast rejoins
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

team: add support for sending multicast rejoins

Similar to what is implemented in bonding. User is able to ask team
driver to send IGMP rejoins in case port is enabled or disabled. Using
previously introduced netdev notifier.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: convert resend IGMP to notifier event

Until now, bond_resend_igmp_join_requests() looks for vlans attached to
bonding device, bridge where bonding act as port manually. It does not
care of other scenarios, like stacked bonds or team device above. Make
this more generic and use netdev notifier to propagate the event to
upper devices and to actually call ip_mc_rejoin_groups().

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

team: add peer notification

When port is enabled or disabled, allow to notify peers by unsolicitated
NAs or gratuitous ARPs. Disabled by default.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

macvlan fdb replace support

Add support for iproute2 command 'bridge fdb replace ...'.
The rtnletlink call back function ndo_fdb_add will be called
with the NLM_F_REPLACE flag set.
Simply return -EOPNOTSUP.

Resubmitted because net-next was closed last week.

Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan fdb replace an existing entry

Add support to replace an existing entry found in the
vxlan fdb database. The entry in question is identified
by its unicast mac address and the destination information
is changed. If the entry is not found, it is added in the
forwarding database. This is similar to changing an entry
in the neighbour table.

Multicast mac addresses can not be changed with the replace
option.

This is useful for virtual machine migration when the
destination of a target virtual machine changes. The replace
feature can be used instead of delete followed by add.

Resubmitted because net-next was closed last week.

Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'tcp'

Yuchung Cheng says:

====================
This patch series improve RTT sampling in three ways:
1. Sample RTT during fast recovery and reordering events.
2. Favor ack-based RTT to timestamps because of broken TS ECR fields
3. Consolidate the RTT measurement logic.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: use RTT from SACK for RTO

If RTT is not available because Karn's check has failed or no
new packet is acked, use the RTT measured from SACK to estimate
the RTO. The sender can continue to estimate the RTO during loss
recovery or reordering event upon receiving non-partial ACKs.

This also changes when the RTO is re-armed. Previously it is
only re-armed when some data is cummulatively acknowledged (i.e.,
SND.UNA advances), but now it is re-armed whenever RTT estimator
is updated. This feature is particularly useful to reduce spurious
timeout for buffer bloat including cellular carriers [1], and
RTT estimation on reordering events.

[1] "An In-depth Study of LTE: Effect of Network Protocol and
Application Behavior on Performance", In Proc. of SIGCOMM 2013

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: measure RTT from new SACK

Take RTT sample if an ACK selectively acks some sequences that
have never been retransmitted. The Karn's algorithm does not apply
even if that ACK (s)acks other retransmitted sequences, because it
must been generated by an original but perhaps out-of-order packet.
There is no ambiguity. In case when multiple blocks are newly
sacked because of ACK losses the earliest block is used to
measure RTT, similar to cummulative ACKs.

Such RTT samples allow the sender to estimate the RTO during loss
recovery and packet reordering events. It is still useful even with
TCP timestamps. That's because during these events the SND.UNA may
not advance preventing RTT samples from TS ECR (thus the FLAG_ACKED
check before calling tcp_ack_update_rtt()). Therefore this new
RTT source is complementary to existing ACK and TS RTT mechanisms.

This patch does not update the RTO. It is done in the next patch.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: prefer packet timing to TS-ECR for RTT

Prefer packet timings to TS-ecr for RTT measurements when both
sources are available. That's because broken middle-boxes and remote
peer can return packets with corrupted TS ECR fields. Similarly most
congestion controls that require RTT signals favor timing-based
sources as well. Also check for bad TS ECR values to avoid RTT
blow-ups. It has happened on production Web servers.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: consolidate SYNACK RTT sampling

The first patch consolidates SYNACK and other RTT measurement to use a
central function tcp_ack_update_rtt(). A (small) bonus is now SYNACK
RTT measurement happens after PAWS check, potentially reducing the
impact of RTO seeding on bad TCP timestamps values.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'fec'

Fabio Estevam says:

====================
This series improves clock handling in the driver by not enabling/disabling
the optional ptp and enet_out clocks unconditionally, check for the return value
of clk_prepare_enable and also handle clk_ptp in suspend/resume.

Remove an unneeded check in platform_get_resource() and also use
devm_request_irq() that can help to simplify the code.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Frank Li <lznuaa@gmail.com>

fec: Use devm_request_irq()

Using devm_request_irq() can make the code smaller and cleaner.

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fec: Remove unneeded check in platform_get_resource()

As devm_ioremap_resource() is used, there is no need to explicitely check the
return value from platform_get_resource(), as this is something that
devm_ioremap_resource() takes care by itself.

Also, place platform_get_resource() prior to devm_ioremap_resource() for
better code readability.

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fec: Check the return value from clk_prepare_enable()

clk_prepare_enable() may fail, so let's check its return value and propagate it
in the case of error.

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fec: Enable/disable clk_ptp in suspend/resume

clk_ptp should also be enabled in fec_resume() and disabled in fec_suspend().

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>