Eric Dumazet [Thu, 1 Aug 2013 18:43:08 +0000 (11:43 -0700)]
net: add a temporary sanity check in skb_orphan()
David suggested to add a BUG_ON() to catch if some layer
sets skb->sk pointer without a corresponding destructor.
As skb can sit in a queue, it's mandatory to make sure the
socket cannot disappear, and it's usually done by taking a
reference on the socket, then releasing it from the skb
destructor.
This patch is a follow-up to commit
c34a761231b5
("net: skb_orphan() changes") and will be reverted after
catching all possible offenders if any.
Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hannes Frederic Sowa [Thu, 1 Aug 2013 06:54:47 +0000 (08:54 +0200)]
ipv6: fib6_rules should return exact return value
With the addition of the suppress operation
(
7764a45a8f1fe74d4f7d301eaca2e558e7e2831a ("fib_rules: add .suppress
operation") we rely on accurate error reporting of the fib_rules.actions.
fib6_rule_action always returned -EAGAIN in case we could not find a
matching route and 0 if a rule was matched. This also included a match
for blackhole or prohibited rule actions which could get suppressed by
the new logic.
So adapt fib6_rule_action to always return the correct error code as
its counterpart fib4_rule_action does. This also fixes a possiblity of
nullptr-deref where we don't find a table, thus rt == NULL. Because
the condition rt != ip6_null_entry still holdes it seems we could later
get a nullptr bug on dereference rt->dst.
v2:
a) Fixed a brain fart in the commit msg (the rule => a table, etc). No
changes to the patch.
Cc: Stefan Tomanek <stefan.tomanek@wertarbyte.de>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Thu, 1 Aug 2013 00:31:39 +0000 (17:31 -0700)]
cls_cgroup.h netprio_cgroup.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources. Standardize on not using extern for
function prototypes.
Function prototypes don't need to be written with extern.
extern is assumed by the compiler. Its use is as unnecessary as
using auto to declare automatic/local variables in a block.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Thu, 1 Aug 2013 00:31:38 +0000 (17:31 -0700)]
checksum: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources. Standardize on not using extern for
function prototypes.
Function prototypes don't need to be written with extern.
extern is assumed by the compiler. Its use is as unnecessary as
using auto to declare automatic/local variables in a block.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Thu, 1 Aug 2013 00:31:37 +0000 (17:31 -0700)]
cfg80211.h/mac80211.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources. Standardize on not using extern for
function prototypes.
Function prototypes don't need to be written with extern.
extern is assumed by the compiler. Its use is as unnecessary as
using auto to declare automatic/local variables in a block.
Reflow modified prototypes to 80 columns.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Thu, 1 Aug 2013 00:31:36 +0000 (17:31 -0700)]
ax25.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources. Standardize on not using extern for
function prototypes.
Function prototypes don't need to be written with extern.
extern is assumed by the compiler. Its use is as unnecessary as
using auto to declare automatic/local variables in a block.
Reflow modified prototypes to 80 columns.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Thu, 1 Aug 2013 00:31:35 +0000 (17:31 -0700)]
arp/neighbour.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources. Standardize on not using extern for
function prototypes.
Function prototypes don't need to be written with extern.
extern is assumed by the compiler. Its use is as unnecessary as
using auto to declare automatic/local variables in a block.
Reflow modified prototypes to 80 columns.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Thu, 1 Aug 2013 00:31:34 +0000 (17:31 -0700)]
af_rxrpc.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources. Standardize on not using extern for
function prototypes.
Function prototypes don't need to be written with extern.
extern is assumed by the compiler. Its use is as unnecessary as
using auto to declare automatic/local variables in a block.
Reflow modified prototypes to 80 columns.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Thu, 1 Aug 2013 00:31:33 +0000 (17:31 -0700)]
af_unix.h: Remove extern from function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources. Standardize on not using extern for
function prototypes.
Function prototypes don't need to be written with extern.
extern is assumed by the compiler. Its use is as unnecessary as
using auto to declare automatic/local variables in a block.
Reflow modified prototypes to 80 columns.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Thu, 1 Aug 2013 00:31:32 +0000 (17:31 -0700)]
addrconf.h: Remove extern function prototypes
There are a mix of function prototypes with and without extern
in the kernel sources. Standardize on not using extern for
function prototypes.
Function prototypes don't need to be written with extern.
extern is assumed by the compiler. Its use is as unnecessary as
using auto to declare automatic/local variables in a block.
Reflow modified prototypes to 80 columns.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Gortmaker [Wed, 31 Jul 2013 19:16:20 +0000 (15:16 -0400)]
Documentation: add networking/netdev-FAQ.txt
A collection of expectations and operational details about how
networking development takes place in the context of the netdev
mailing list.
The content is meant to capture specific items that are unique
to netdev workflow, and not re-document generic linux expectations
that are already captured elsewhere.
This was originally proposed[1] as a regular posting mailing list
FAQ, but it probably is more universally accessible here in tree.
[1] https://lwn.net/Articles/559211/
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Tomanek [Thu, 1 Aug 2013 00:17:15 +0000 (02:17 +0200)]
fib_rules: add .suppress operation
This change adds a new operation to the fib_rules_ops struct; it allows the
suppression of routing decisions if certain criteria are not met by its
results.
The first implemented constraint is a minimum prefix length added to the
structures of routing rules. If a rule is added with a minimum prefix length
>0, only routes meeting this threshold will be considered. Any other (more
general) routing table entries will be ignored.
When configuring a system with multiple network uplinks and default routes, it
is often convinient to reference the main routing table multiple times - but
omitting the default route. Using this patch and a modified "ip" utility, this
can be achieved by using the following command sequence:
$ ip route add table secuplink default via 10.42.23.1
$ ip rule add pref 100 table main prefixlength 1
$ ip rule add pref 150 fwmark 0xA table secuplink
With this setup, packets marked 0xA will be processed by the additional routing
table "secuplink", but only if no suitable route in the main routing table can
be found. By using a minimal prefixlength of 1, the default route (/0) of the
table "main" is hidden to packets processed by rule 100; packets traveling to
destinations with more specific routing entries are processed as usual.
Signed-off-by: Stefan Tomanek <stefan.tomanek@wertarbyte.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Wed, 31 Jul 2013 05:47:13 +0000 (22:47 -0700)]
net: Remove extern from include/net/ scheduling prototypes
There are a mix of function prototypes with and without extern
in the kernel sources. Standardize on not using extern for
function prototypes.
Function prototypes don't need to be written with extern.
extern is assumed by the compiler. Its use is as unnecessary as
using auto to declare automatic/local variables in a block.
Reflow modified prototypes to 80 columns.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 30 Jul 2013 23:11:15 +0000 (16:11 -0700)]
net: skb_orphan() changes
It is illegal to set skb->sk without corresponding destructor.
Its therefore safe for skb_orphan() to not clear skb->sk if
skb->destructor is not set.
Also avoid clearing skb->destructor if already NULL.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 31 Jul 2013 00:55:08 +0000 (17:55 -0700)]
netem: Introduce skb_orphan_partial() helper
Commit
547669d483e578 ("tcp: xps: fix reordering issues") added
unexpected reorders in case netem is used in a MQ setup for high
performance test bed.
ETH=eth0
tc qd del dev $ETH root 2>/dev/null
tc qd add dev $ETH root handle 1: mq
for i in `seq 1 32`
do
tc qd add dev $ETH parent 1:$i netem delay 100ms
done
As all tcp packets are orphaned by netem, TCP stack believes it can
set skb->ooo_okay on all packets.
In order to allow producers to send more packets, we want to
keep sk_wmem_alloc from reaching sk_sndbuf limit.
We can do that by accounting one byte per skb in netem queues,
so that TCP stack is not fooled too much.
Tested:
With above MQ/netem setup, scaling number of concurrent flows gives
linear results and no reorders/retransmits
lpq83:~# for n in 1 10 20 30 40 50 60 70 80 90 100
do echo -n "n:$n " ; ./super_netperf $n -H 10.7.7.84; done
n:1 198.46
n:10 2002.69
n:20 4000.98
n:30 6006.35
n:40 8020.93
n:50 10032.3
n:60 12081.9
n:70 13971.3
n:80 16009.7
n:90 17117.3
n:100 17425.5
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
fan.du [Tue, 30 Jul 2013 00:33:53 +0000 (08:33 +0800)]
net: split rt_genid for ipv4 and ipv6
Current net name space has only one genid for both IPv4 and IPv6, it has below
drawbacks:
- Add/delete an IPv4 address will invalidate all IPv6 routing table entries.
- Insert/remove XFRM policy will also invalidate both IPv4/IPv6 routing table
entries even when the policy is only applied for one address family.
Thus, this patch attempt to split one genid for two to cater for IPv4 and IPv6
separately in a fine granularity.
Signed-off-by: Fan Du <fan.du@windriver.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Laurent Pinchart [Wed, 31 Jul 2013 07:42:11 +0000 (16:42 +0900)]
sh_eth: r8a7790: Handle the RFE (Receive FIFO overflow Error) interrupt
The RFE interrupt is enabled for the r8a7790 but isn't handled,
resulting in the interrupts core noticing unhandled interrupts, and
eventually disabling the ethernet IRQ.
Fix it by adding RFE to the bitmask of error interrupts to be handled
for r8a7790.
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 31 Jul 2013 20:37:47 +0000 (13:37 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/jkirsher/net-next
Jeff Kirsher says:
====================
This series contains updates to ixgbe and pci.
The first patch for ixgbe from Greg Rose is the second submission. The
first submission of "ixgbe: Retain VLAN filtering in promiscuous + VT
mode" had a typo, which Joe Perches pointed out and is fixed in this
submission.
Alex updates the ixgbe driver to use the generic helper pci_vfs_assigned
instead of the driver specific function ixgbe_vfs_are_assigned.
Don Skidmore provides 4 patches for ixgbe, the first being a fix for
flow control ethtool reporting. Originally ixgbe_device_supports_autoneg_fc()
was expected to be called by only copper devices, which lead to false
information being displayed via ethtool. Two other patches add support
for fixed fiber for SFP+ devices and the addition of a quad-port x520
adapter. The last patch simply bumps the driver version.
Emil Tantilov provides 3 fixes for ixgbe, two of which resolve
semaphore lock issues. The third fix resolves several issues in the
previous implementation of the SFF data dumps of SFP+ modules.
The remaining ixgbe and pci patches are from Jacob Keller. The pci
patches exposes bus speed, link speed and bus width so that drivers
can take advantage of this information. In addition, adds a pci function
which obtains minimum link width and speed. Jacob also provides the
ixgbe patch to incorporate the pci function. He provides a patch that
fixes a lockdep issue created due to ixgbe_ptp_stop always running
cancel_work_sync even if the work item had not been created properly with
INIT_WORK. This issue was found and reported by Stephen Hemminger.
-v2-
* fix patch 3 to be a bool function based on David Miller's feedback
* fix patch 4 debug message based on David Miller's feedback
* fix patch 8 moved the extern declarations to pci.h based on Bjorn
Helgaas's feedback
* fix patch 11 update the error message to include encoding loss based
* fix patch 8/9/10 title based on Bjorn's feedback
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Popov [Wed, 31 Jul 2013 09:39:45 +0000 (13:39 +0400)]
tcp: Remove unused tcpct declarations and comments
Remove declaration, 4 defines and confusing comment that are no longer used
since
1a2c6181c4 ("tcp: Remove TCPCT").
Signed-off-by: Dmitry Popov <dp@highloadlab.com>
Acked-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don Skidmore [Sat, 27 Jul 2013 06:25:38 +0000 (06:25 +0000)]
ixgbe: add support for quad-port x520 adapter
This is a x520 based quad-port (4x10Gbps) NIC with a single QSFP+
connector. Changes were required to our identify functions due to
different eeprom address which is also included here.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Emil Tantilov [Tue, 23 Jul 2013 01:57:03 +0000 (01:57 +0000)]
ixgbe: clear semaphore bits on timeouts
This patch changes the error code path in ixgbe_acquire_swfw_sync() to deal
with cases where acquiring SW semaphore times out.
In cases where the SW/FW semaphore bits were set (i.e. due to a crash) the
driver will hang on load. With this patch the driver will clear
the stuck bits if the semaphore was not acquired in the allotted time.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Tue, 16 Jul 2013 07:57:46 +0000 (07:57 +0000)]
ixgbe: rename LL_EXTENDED_STATS to use queue instead of q
This patch renames the stats introduced by the busy poll feature so that they
are more inline with the current statistics naming schemes.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 21 Jun 2013 08:14:32 +0000 (08:14 +0000)]
ixgbe: fix lockdep annotation issue for ptp's work item
This patch fixes a lockdep issue created due to ixgbe_ptp_stop always running
cancel_work_sync even if the work item had not been created properly with
INIT_WORK. This is caused because ixgbe_ptp_stop did not check to actually
ensure PTP was running first. The new implementation introduces a state in the
&adapter->state field which is used to indicate that PTP is running. (This
replaces the IXGBE_FLAG2_PTP_ENABLED field). This state will use the atomic
set_bit, test_bit, and test_and_clear_bit functions. ixgbe_ptp_stop will check
to ensure that PTP was enabled, (and if not, it will not attempt to do any
cleanup work from ixgbe_ptp_init). This resolves the lockdep annotation warning
found by Stephen Hemminger
Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Wed, 31 Jul 2013 06:53:31 +0000 (06:53 +0000)]
ixgbe: call pcie_get_mimimum_link to check if device has enough bandwidth
This patch uses the new pcie_get_minimum_link function to perform a check to
ensure that the adapter is hooked into a slot which is capable of providing the
necessary bandwidth. This check supersedes the original method which only
checked the current pci device. The new method is capable of determining the
minimum speed and link of an entire PCI chain.
-v2-
* update the error message to include encoding loss
CC: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Wed, 31 Jul 2013 06:53:26 +0000 (06:53 +0000)]
PCI: Add function to obtain minimum link width and speed
A PCI Express device can potentially report a link width and speed which it will
not properly fulfill due to being plugged into a slower link higher in the
chain. This function walks up the PCI bus chain and calculates the minimum link
width and speed of this entire chain. This can be useful to enable a device to
determine if it has enough bandwidth for optimum functionality.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Dan Carpenter [Mon, 29 Jul 2013 19:15:19 +0000 (22:15 +0300)]
net: remove an unneeded check
"ifa->ifa_label" is an array inside the in_ifaddr struct. It can never
be NULL so we can remove this check.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Mon, 29 Jul 2013 18:07:42 +0000 (11:07 -0700)]
flow_dissector: add support for IPPROTO_IPV6
Support IPPROTO_IPV6 similar to IPPROTO_IPIP
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Mon, 29 Jul 2013 18:07:36 +0000 (11:07 -0700)]
flow_dissector: clean up IPIP case
Explicitly set proto to ETH_P_IP and jump directly to ip processing.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jacob Keller [Wed, 31 Jul 2013 06:53:21 +0000 (06:53 +0000)]
PCI: move enum pcie_link_width into pci.h
pcie_link_width is the enum used to define the link width values for a pcie
device. This enum should not be contained solely in pci_hotplug.h, and this
patch moves it next to pci_bus_speed in pci.h
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Wed, 31 Jul 2013 06:53:16 +0000 (06:53 +0000)]
PCI: expose pcie_link_speed and pcix_bus_speed arrays
pcie_link_speed and pcix_bus_speed are arrays used by probe.c to correctly
convert lnksta register values into the pci_bus_speed enum. These static arrays
are useful outside probe for this purpose. This patch makes these defines into
conist arrays and exposes them with an extern header in drivers/pci/pci.h
-v2-
* move extern declarations to drivers/pci/pci.h
CC: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Emil Tantilov [Wed, 29 May 2013 06:23:10 +0000 (06:23 +0000)]
ixgbe: fix SFF data dumps of SFP+ modules
This patch fixes several issues with the previous implementation of the
SFF data dump of SFP+ modules:
- removed the __IXGBE_READ_I2C flag - I2C access locking is handled in the
HW specific routines
- fixed the read loop to read data from ee->offset to ee->len
- the reads fail if __IXGBE_IN_SFP_INIT is set in the process - this is
needed because on some HW I2C operations can take long time and disrupt
the SFP and link detection process
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Reported-by: Ben Hutchings <bhutchings@solarflare.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Emil Tantilov [Wed, 29 May 2013 06:23:05 +0000 (06:23 +0000)]
ixgbe: fix semaphore lock for I2C read/writes on 82598
ixgbe_read/write_i2c_phy_82598() does not hold the SWFW_SYNC
semaphore for the entire function. Instead the lock is held only
during the phy.ops.read/write_reg operations. As result when the
function is being called simultaneously the I2C read/writes can
be corrupted.
The following patch introduces the SWFW_SYNC semaphore for the
entire ixgbe_read/write_i2c_phy_82598() function. To accomplish
this I had to create 2 separate functions:
ixgbe_read_phy_reg_mdi()
ixgbe_write_phy_reg_mdi()
Those functions are identical to ixgbe_read/write_phy_reg_generic()
sans the locking, and can be used in ixgbe_read/write_i2c_phy_82598()
with the SWFW_SYNC semaphore being held.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Don Skidmore [Wed, 15 May 2013 07:34:50 +0000 (07:34 +0000)]
ixgbe: bump version number
Bump the version number to better match with a similar version of the
out of tree driver.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Don Skidmore [Wed, 31 Jul 2013 02:17:40 +0000 (02:17 +0000)]
ixgbe: add new media type.
This patch adds support for a new media type fiber_fixed. This is useful
to avoid all the SFP+ hot plug support path on devices who's fix fiber need
not worry about such things. This patch is needed for a following patch
that adds support for "fiber_fixed" devices.
v2: cleaned up logging message based on feedback from David Miller
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
David S. Miller [Wed, 31 Jul 2013 00:32:12 +0000 (17:32 -0700)]
Merge branch 'phys_port'
Jiri Pirko says:
====================
This patchset is based on patch by Narendra_K@Dell.com
Once device which can change phys port id during its lifetime adopts this,
NETDEV_CHANGEPHYSPORTID event will be added and driver will call
call_netdevice_notifiers(NETDEV_NETDEV_CHANGEPHYSPORTID, dev) to propagate
the change to userspace.
v1->v2: as suggested by Ben, handle -EOPNOTSUPP in rtnl code (wrapped up ndo call)
v2->v3: adjusted patch 1 commit message
v3->v4: used "%phN" for sysfs printf as suggested by DaveM
added igb/igbvf implementation as requested by Or Gerlitz
v4->v5: used prandom_u32 to generate id in igb_probe
removed duplicate code in ibgvf_probe
pushed dev_err string into one line in igbvf_refresh_ppid
v5->v6: use uuid_le_gen for generating 16-byte phys port id for igb/igbvf
as suggested by BenH
1) Why do we need this, and why do existing facilities fail to provide
a way to accomplish this?
Currenty there's very hard to tell if two netdevs are using the same physical
port. For sr-iov this can be get by sysfs. For other mechanisms, like NPAR
there's very hard to do it (one must learn it from NIC BIOS). But even for
sr-iov there's no way to say if two netdevs are using the same phys port when
these are passed through to virtual guests.
This patchset provides the generic way of letting this information know to
userspace. This info can be used by apps like NetworkManager, teamd, Wicked,
ovs daemon, etc, to do smarter bonding decisions.
2) Why is the physical port ID defined as a 32 byte opaque cookie?
What formats and layouts need to be accomodated, and which
influenced the design of the ID?
For user to distinguish if two netdevs are using the same port, he only needs
to compare their phys port ids. Nothing else is needed. This id has no
structure for security reasons. VF should not know anything about PF.
3) Are IDs globally unique? Why or why not? If IDs should be
globally unique, but only in certain cases, what exactly are those
cases.
Most of the time only uniqueness needed is in scope of single machine.
There might be case when the id should be unique between couple of machines
in virtualization environment. Given that for example for igb/igbvf 16B uuid
is used, there is no problem for this case as well. But each driver can
implement this differently focusing the hw capabilities and needs.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Mon, 29 Jul 2013 16:16:51 +0000 (18:16 +0200)]
net: export physical port id via sysfs
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Narendra K <narendra_k@dell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Mon, 29 Jul 2013 16:16:50 +0000 (18:16 +0200)]
rtnl: export physical port id via RT netlink
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Narendra K <narendra_k@dell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Mon, 29 Jul 2013 16:16:49 +0000 (18:16 +0200)]
net: add ndo to get id of physical port of the device
This patch adds a ndo for getting physical port of the device. Driver
which is aware of being virtual function of some physical port should
implement this ndo. This is applicable not only for IOV, but for other
solutions (NPAR, multichannel) as well. Basically if there is possible
to have multiple netdevs on the single hw port.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don Skidmore [Wed, 31 Jul 2013 02:19:24 +0000 (02:19 +0000)]
ixgbe: fix fc autoneg ethtool reporting.
Originally ixgbe_device_supports_autoneg_fc() was only expected to
be called by copper devices. This would lead to false information
to be displayed via ethtool.
v2: changed ixgbe_device_supports_autoneg_fc() to a bool function,
it returns bool. Based on feedback from David Miller
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Tue, 26 Mar 2013 00:03:21 +0000 (00:03 +0000)]
ixgbe: Use pci_vfs_assigned instead of ixgbe_vfs_are_assigned
This change makes it so that the ixgbe driver uses the generic helper
pci_vfs_assigned instead of the ixgbe specific function
ixgbe_vfs_are_assigned.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Greg Rose [Fri, 22 Feb 2013 02:14:39 +0000 (02:14 +0000)]
ixgbe: Retain VLAN filtering in promiscuous + VT mode
When using the new bridge FDB interface to allow SR-IOV virtual function
network devices to communicate with SW bridged network devices the
physical function is placed into promiscuous mode and hardware VLAN
filtering is disabled. This defeats the ability to use VLAN tagging
to isolate user networks. When the device is in promiscuous mode and
VT mode simultaneously ensure that VLAN hardware filtering remains
enabled.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Thomas Petazzoni [Mon, 29 Jul 2013 13:21:28 +0000 (15:21 +0200)]
net: mvneta: support big endian
Use the "swap descriptor" feature of the hardware to properly swap the
descriptors when running in big endian mode. Since the swapping occurs
on 64 bits words, we also need to provide a separate structure layout
for the DMA descriptors between little endian and big endian mode,
like is done in the mv643xx_eth driver.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Petazzoni [Mon, 29 Jul 2013 13:21:27 +0000 (15:21 +0200)]
net: mvneta: move the RX and TX desc macros outside of the structs
The macros used for the various fields of the RX and TX descriptions
are currently declared next to those fields within the structure
definitions of the RX and TX descriptors.
However, in order to support big endian, we'll have to use the "swap
descriptors" features of the hardware, which swaps every byte within
each 64 bits word of the descriptors. This requires a separate
definition of the RX and TX descriptor structures for little and big
endian, as is done in the mv643xx_eth. Those macros can therefore no
longer be defined inside those structures.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Graf [Mon, 29 Jul 2013 11:44:15 +0000 (13:44 +0200)]
pktgen: Require CONFIG_INET due to use of IPv4 checksum function
Unlike for IPv6, the IPv4 checksum functions are only available
if CONFIG_INET is set.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hannes Frederic Sowa [Fri, 26 Jul 2013 15:43:23 +0000 (17:43 +0200)]
tcp: add tcp_syncookies mode to allow unconditionally generation of syncookies
| If you want to test which effects syncookies have to your
| network connections you can set this knob to 2 to enable
| unconditionally generation of syncookies.
Original idea and first implementation by Eric Dumazet.
Cc: Florian Westphal <fw@strlen.de>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mugunthan V N [Thu, 25 Jul 2013 18:14:01 +0000 (23:44 +0530)]
drivers: net: cpsw: Add support for set MAC address
Adding support for setting MAC address to cpsw device via ndo_set_mac_address
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chris Metcalf [Thu, 25 Jul 2013 16:41:15 +0000 (12:41 -0400)]
tile: handle 64-bit statistics in tilepro network driver
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andi Shyti [Thu, 25 Jul 2013 08:54:24 +0000 (10:54 +0200)]
9p: client: remove unused code and any reference to "cancelled" function
This patch reverts commit
80b45261a0b263536b043c5ccfc4ba4fc27c2acc
which was implementing a 'cancelled' functionality to notify that
a cancelled request will not be replied.
This implementation was not used anywhere and therefore removed.
Signed-off-by: Andi Shyti <andi@etezian.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Vecera [Thu, 25 Jul 2013 14:10:55 +0000 (16:10 +0200)]
be2net: don't use dev_err when AER enabling fails
The driver uses dev_err when enabling of AER fails (e.g. PCIe AER is not
supported). The dev_info is more appropriate to avoid console pollution.
Cc: sathya.perla@emulex.com
Cc: subbu.seetharaman@emulex.com
Cc: ajit.khaparde@emulex.com
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nithin Sujir [Mon, 29 Jul 2013 20:58:40 +0000 (13:58 -0700)]
tg3: Update version to 3.133
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nithin Sujir [Mon, 29 Jul 2013 20:58:39 +0000 (13:58 -0700)]
tg3: Fix UDP fragments treated as RMCP
The 5762 devices sometimes incorrectly treat udp fragments as RMCP
packets and route to the APE. This patch sets the RX_MODE_IPV4_FRAG_FIX
bit for these devices which enables the proper behaviour.
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nithin Sujir [Mon, 29 Jul 2013 20:58:38 +0000 (13:58 -0700)]
tg3: Enable support for timesync gpio output
The PTP_CAPABLE tg3 devices have a gpio output that is toggled when the
free running counter matches a watchdog value. This patch adds support
to set the watchdog and enable this feature.
Since the output is controlled via bits in the EAV_REF_CLCK_CTL
register, we have to read-modify-write it when we stop/resume.
Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nithin Sujir [Mon, 29 Jul 2013 20:58:37 +0000 (13:58 -0700)]
tg3: Implement the shutdown handler
Also remove the call to tg3_power_down_prepare() in tg3_power_down()
since tg3_close() calls it.
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nithin Sujir [Mon, 29 Jul 2013 20:58:36 +0000 (13:58 -0700)]
tg3: Allow NVRAM programming when interface is down
Previously, when the interface was brought down, the driver would set
the power state to D3hot. In D3hot, we don't have access to the NVRAM.
This patch removes the call to set the power state to PCI_D3hot in
close. A following patch will implement the shutdown handler to properly
set the D3hot state when the system is going down.
Doing the above means that the TG3_PHYFLG_IS_LOW_POWER should not be
checked to validate access to the NVRAM.
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nithin Sujir [Mon, 29 Jul 2013 20:58:35 +0000 (13:58 -0700)]
tg3: Remove incorrect switch to aux power
During probe, the driver is incorrectly switching the power to Vaux on
the 5717 and later devices. At this point, we are in D0 state and
drawing maximum power. We also definitely have Vmain available. It
doesn't make sense to switch to Vaux since it has a lesser maximum power
draw and we might go over the limit. On a new system, we observe that
not all ports are recognized in some of the slots with this call in
place.
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Mon, 29 Jul 2013 02:04:00 +0000 (19:04 -0700)]
cnic: Update version to 2.5.17 and copyright year.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eddie Wai [Mon, 29 Jul 2013 02:03:59 +0000 (19:03 -0700)]
cnic: Add missing error checking for RAMROD_CMD_ID_CLOSE
Completion status field should also be checked for non-zero error
condition.
Signed-off-by: Eddie Wai <eddie.wai@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eddie Wai [Mon, 29 Jul 2013 02:03:58 +0000 (19:03 -0700)]
cnic: Update TCP options setup for iSCSI.
Update TCP delayed ACK and timestamp options setup to match latest bnx2x
firmware.
Signed-off-by: Eddie Wai <eddie.wai@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eddie Wai [Mon, 29 Jul 2013 02:03:57 +0000 (19:03 -0700)]
cnic: Reset tcp_flags during cnic_cm_create().
Without resetting it, the bnx2i driver cannot use different options for
different iSCSI connections.
Signed-off-by: Eddie Wai <eddie.wai@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Mon, 29 Jul 2013 02:03:56 +0000 (19:03 -0700)]
cnic: Simplify cnic_release().
Since unregister_netdevice_notifier() will replay the NETDEV_DOWN and
NETDEV_UNREGISTER_EVENTS, the cnic_dev_list will be cleaned up automatically.
The loop to cleanup the cnic_dev_list can be removed in cnic_release().
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Mon, 29 Jul 2013 02:03:55 +0000 (19:03 -0700)]
cnic: Simplify netdev events handling.
After this earlier commit to simplify probing:
commit
4bd9b0fffb193d2e288f67f81821af32df8d4349
cnic, bnx2x, bnx2: Simplify cnic probing.
we can now reliably receive netdev events and we can simplify the handling
of these events. We now remove the logic that tries to handle missed
NETDEV_REGISTER events.
This change will allow cleanup to be simplified in the next patch. We can
now rely on the play back of netdev events during
unregister_netdevice_notifier() to cleanup the structures.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yevgeny Petrilin [Sun, 28 Jul 2013 15:54:21 +0000 (18:54 +0300)]
net/mlx4_core: Respond to operation request by firmware
This commit adds new firmware command and new firmware event. The firmware
raises the MLX4_EVENT_TYPE_OP_REQUIRED event in order to signal the driver it
needs to perform an administrative operation throughout the MLX4_CMD_GET_OP_REQ
command. At the moment the supported operation is adding/removing multicast
entries which are used by the firmware for handling NCSI traffic in B0
steering mode.
Also, had to swap the order of mlx4_init_mcg_table() and
mlx4_init_eq_table() to make sure that driver will get events only after
resources are initialized to handle it.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eugenia Emantayev [Thu, 25 Jul 2013 16:21:23 +0000 (19:21 +0300)]
net/mlx4_en: Fix BlueFlame race
Fix a race between BlueFlame flow and stamping in post send flow.
Example:
SW: Build WQE 0 on the TX buffer, except the ownership bit
SW: Set ownership for WQE 0 on the TX buffer
SW: Ring doorbell for WQE 0
SW: Build WQE 1 on the TX buffer, except the ownership bit
SW: Set ownership for WQE 1 on the TX buffer
HW: Read WQE 0 and then WQE 1, before doorbell was rung/BF was done for WQE 1
HW: Produce CQEs for WQE 0 and WQE 1
SW: Process the CQEs, and stamp WQE 0 and WQE 1 accordingly (on the TX buffer)
SW: Copy WQE 1 from the TX buffer to the BF register - ALREADY STAMPED!
HW: CQE error with index 0xFFFF - the BF WQE's control segment is STAMPED,
so the BF index is 0xFFFF. Error: Invalid Opcode.
As a result QP enters the error state and no traffic can be sent.
Solution:
When stamping - do not stamp last completed wqe.
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Rothwell [Mon, 29 Jul 2013 06:21:26 +0000 (16:21 +1000)]
pktgen: add needed include file
Fixes this on PowerPC (at least):
net/core/pktgen.c: In function 'fill_packet_ipv6':
net/core/pktgen.c:2906:3: error: implicit declaration of function 'csum_ipv6_magic' [-Werror=implicit-function-declaration]
udph->check = ~csum_ipv6_magic(&iph->saddr, &iph->daddr, udplen, IPPROTO_UDP, 0);
^
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 28 Jul 2013 20:18:49 +0000 (13:18 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/jkirsher/net-next
Jeff Kirsher says:
====================
This series contains updates to e100 and e1000e.
The e100 patch from Andy simply updates the netif_printk() to use
%*ph to dump small buffers.
The changes to e1000e include a fix from Dean Nelson to resolve a
issue where a pci_clear_master() was accidentally dropped during a
conflict resolution. Wei Young provides 2 patches, one removes an
assignment of the default ring size because it was a duplicate. The
second changes the packet split receive structure to use
PS_PAGE_BUFFERS macro for the length so that problems won't occur
when the length is changed.
The remaining patches for e1000e are from Bruce Allan, where he
provides a number of fixes and updates for I218. In addition, a
fix for 82583 which can disappear off the PCIe bus, to resolve this,
disable ASPM L1. Bruce also provides a fix to a previous commit
(commit
e60b22c5b7 e1000e: fix accessing to suspended device) so that
devices are only taken out of runtime power management for those
ethtool operations that must access device registers.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Hannes Frederic Sowa [Fri, 26 Jul 2013 15:05:16 +0000 (17:05 +0200)]
ipv4, ipv6: send igmpv3/mld packets with TC_PRIO_CONTROL
v2:
a) Also send ipv4 igmp messages with TC_PRIO_CONTROL
Cc: William Manley <william.manley@youview.com>
Cc: Lukas Tribus <luky-37@hotmail.com>
Acked-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bruce Allan [Sat, 29 Jun 2013 07:42:39 +0000 (07:42 +0000)]
e1000e: fix I217/I218 PHY initialization flow
The initialization of the PHY on I217/I218, while similar to 82579, must
also check to see if the MAC and PHY are in the same mode (PCIe vs. SMBus)
otherwise the PHY will be inaccessible by the MAC.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Bruce Allan [Sat, 29 Jun 2013 07:42:25 +0000 (07:42 +0000)]
e1000e: do not resume device from RPM suspend to read PHY status registers
When the device is runtime suspended (e.g. when there is no link), do not
wake it from D3 to read the PHY status; just set the values to typical
power-on defaults as is done when runtime PM is not enabled and there is no
link.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Bruce Allan [Sat, 29 Jun 2013 01:15:16 +0000 (01:15 +0000)]
e1000e: enable support for new device IDs
The device IDs 0x15a0 and 0x15a1 are new SKUs that contain the same MAC as
I217 and same PHY as I218.
The device IDs 0x15a2 and 0x15a3 are the same as existing I218 SKUs.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Bruce Allan [Thu, 27 Jun 2013 02:44:44 +0000 (02:44 +0000)]
e1000e: ethtool unnecessarily takes device out of RPM suspend
A previous patch (commit
e60b22c5b7 e1000e: fix accessing to suspended
device) added .begin and .complete ethtool driver callbacks so that the
device was resumed from Runtime Power Management (RPM) suspend state for
all ethtool operations. This is overkill for operations which do not need
to access any registers in the device. This patch makes it so that the
device is taken out of RPM suspend only for those ethtool operations that
must access device registers.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Bruce Allan [Fri, 21 Jun 2013 09:07:13 +0000 (09:07 +0000)]
e1000e: Tx hang on I218 when linked at 100Half and slow response at 10Mbps
Tx hang is an unintended consequence of another workaround that is in the
EEPROM for an issue with the firmware at 10Mbps when K1 (a power mode of
the MAC-PHY interconnect) is enabled. The issue is resolved by setting
appropriate Tx re-transmission timeouts in the PHY and associated K1 entry
times in the MAC to allow enough transmissions to occur without triggering
a Tx hang. A similar change is needed when linked at 10Mbps to improve
latency.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Bruce Allan [Fri, 21 Jun 2013 09:07:07 +0000 (09:07 +0000)]
e1000e: low throughput using 4K jumbos on I218
Alter the packet buffer allocation accordingly.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Bruce Allan [Fri, 21 Jun 2013 09:07:02 +0000 (09:07 +0000)]
e1000e: iAMT connections drop on driver unload when jumbo frames enabled
The jumbo frame configuration in the MAC/PHY should be reverted on 82579
and newer parts when the interface is brought down (not just when the MTU
is changed back to standard frame size) otherwise iAMT connections (e.g.
SoL, IDE-R) will be dropped and cannot be re-acquired until the MTU is
changed again.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Bruce Allan [Fri, 21 Jun 2013 09:07:18 +0000 (09:07 +0000)]
e1000e: disable ASPM L1 on 82583
The 82583 can disappear off the PCIe bus. This device is a modified 82574
which had the same problem which was fixed by disabling ASPM L1; disabling
it on 82583 fixes the issue on this device.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Wei Yang [Sat, 25 May 2013 06:23:45 +0000 (06:23 +0000)]
e1000e: Use marco instead of digit for defining e1000_rx_desc_packet_split
In structure e1000_rx_desc_packet_split, the size of wb.upper.length is
defined by a digit. This may introduce some problem when the length is
changed.
This patch use the macro PS_PAGE_BUFFERS for the definition. And move the
definition to hw.h.
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Wei Yang [Mon, 20 May 2013 17:31:09 +0000 (17:31 +0000)]
e1000e: Remove duplicate assignment of default rx/tx ring size
tx_ring/rx_ring size is assigned in function e1000_alloc_queues(), which is
called by e1000_sw_init() in the early stage of e1000_probe().
This patch just remove the duplicate assignment of this default ring size
value.
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Reviewed-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Reviewed-by: Da Yu Qiu <qiudayu@cn.ibm.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Acked-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Dean Nelson [Thu, 13 Jun 2013 03:55:44 +0000 (03:55 +0000)]
e1000e: restore call to pci_clear_master()
In attempting to resolve a minor merge conflict, commit
e5f2ef7ab4690d2e8faa
(Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net) accidentally
dropped a call to pci_clear_master() that was intended to remain in place.
Commit
4e0855dff094b0d56d6b (e1000e: fix pci-device enable-counter balance)
replaced a call to pci_disable_device() by one to pci_clear_master(). And then
commit
66148babe728f3e00e13 (e1000e: fix runtime power management transitions)
deleted a number of lines starting two lines following that call.
This patch restores the call to pci_clear_master() in __e1000_shutdown().
v2: added summary lines (enclosed in parens) following commit IDs
Signed-off-by: Dean Nelson <dnelson@redhat.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Acked-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Andy Shevchenko [Wed, 29 May 2013 18:40:36 +0000 (18:40 +0000)]
e100: dump small buffers via %*ph
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
nikolay@redhat.com [Sat, 27 Jul 2013 17:10:10 +0000 (19:10 +0200)]
bonding: remove bond_resend_igmp_join_requests read_unlock leftover
After commit
4aa5dee4d9 ("net: convert resend IGMP to notifier event") we
have 1 read_unlock in bond_resend_igmp_join_requests which isn't paired
with a read_lock because it's removed by that commit.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Graf [Thu, 25 Jul 2013 12:08:04 +0000 (14:08 +0200)]
pktgen: Use ip_send_check() to compute checksum
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Graf [Thu, 25 Jul 2013 16:12:18 +0000 (18:12 +0200)]
pktgen: Add UDPCSUM flag to support UDP checksums
UDP checksums are optional, hence pktgen has been omitting them in
favour of performance. The optional flag UDPCSUM enables UDP
checksumming. If the output device supports hardware checksumming
the skb is prepared and marked CHECKSUM_PARTIAL, otherwise the
checksum is generated in software.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Asias He [Thu, 25 Jul 2013 09:39:34 +0000 (17:39 +0800)]
VSOCK: Move af_vsock.h and vsock_addr.h to include/net
This is useful for other VSOCK transport implemented outside the
net/vmw_vsock/ directory to use these headers.
Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Andy King <acking@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 28 Jul 2013 03:22:56 +0000 (20:22 -0700)]
Merge branch 'minnow/net-next' of git://git.infradead.org/users/dvhart/linux-2.6 into minnow
Darren Hart says:
====================
Add support for the MinnowBoard in the pch_gbe driver. This was
originally sent to LKML as part of the MinnowBoard support series. That
is now partially merged and this version of the patch has been isolated
from those changes and is now completely self-contained.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ming Lei [Thu, 25 Jul 2013 05:47:54 +0000 (13:47 +0800)]
USBNET: increase max rx/tx qlen for improving USB3 thoughtput
The default RX_QLEN()/TX_QLEN() didn't consider super speed
USB device, so only max 4 URBs are scheduled at the same time
for tx/rx, then USB3 NIC can't perform very well.
With this patch, both rx and tx thoughput are increased more than
100Mbps when doing iperf test on ax88179_178a USB 3.0 NIC.
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ming Lei [Thu, 25 Jul 2013 05:47:53 +0000 (13:47 +0800)]
USBNET: centralize computing of max rx/tx qlen
This patch centralizes computing of max rx/tx qlen, because:
- RX_QLEN()/TX_QLEN() is called in hot path
- computing depends on device's usb speed, now we have ls/fs, hs, ss,
so more checks need to be involved
- in fact, max rx/tx qlen should not only depend on device USB
speed, but also depend on ethernet link speed, so we need to
consider that in future.
- if SG support is done, max tx qlen may need change too
Generally, hard_mtu and rx_urb_size are changed in bind(), reset()
and link_reset() callback, and change mtu network operation, this
patches introduces the API of usbnet_update_max_qlen(), and calls
it in above path.
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Thu, 25 Jul 2013 05:00:33 +0000 (13:00 +0800)]
tuntap: hardware vlan tx support
Inspired by commit
f09e2249c4f5c7c13261ec73f5a7807076af0c8e (macvtap: restore
vlan header on user read). This patch adds hardware vlan tx support for
tuntap. This is done by copying vlan header directly into userspace in
tun_put_user() instead of doing it through __vlan_put_tag() in
dev_hard_start_xmit(). This eliminates one unnecessary memmove() in
vlan_insert_tag() for 802.1ad and 802.1q traffic.
pktgen test shows about 20% improvement for 802.1q traffic:
Before:
662149pps 317Mb/sec (317831520bps) errors: 0
After:
801033pps 384Mb/sec (384495840bps) errors: 0
Cc: Basil Gor <basil.gor@gmail.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Stringer [Thu, 25 Jul 2013 01:52:05 +0000 (10:52 +0900)]
net/sctp: Refactor SCTP skb checksum computation
This patch consolidates the SCTP checksum calculation code from various
places to a single new function, sctp_compute_cksum(skb, offset).
Signed-off-by: Joe Stringer <joe@wand.net.nz>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael S. Tsirkin [Thu, 25 Jul 2013 00:50:23 +0000 (10:20 +0930)]
virtio-net: put virtio net header inline with data
For small packets we can simplify xmit processing
by linearizing buffers with the header:
most packets seem to have enough head room
we can use for this purpose.
Since existing hypervisors require that header
is the first s/g element, we need a feature bit
for this.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Wed, 24 Jul 2013 18:53:57 +0000 (11:53 -0700)]
bond: cleanup netpoll code
This started out with fixing a sparse warning, then I realized that
the wrapper function bond_netpoll_info could just be removed
by rolling it into the enable code.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Wed, 24 Jul 2013 18:52:44 +0000 (11:52 -0700)]
team: cleanup netpoll clode
This started out with fixing a sparse warning, then I realized that
the wrapper function team_netpoll_info could just be collapsed away
by rolling it into the enable code.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Wed, 24 Jul 2013 18:51:41 +0000 (11:51 -0700)]
bridge: cleanup netpoll code
This started out with fixing a sparse warning, then I realized that
the wrapper function br_netpoll_info could just be collapsed away
by rolling it into the enable code.
Also, eliminate unnecessary goto's
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wang Sheng-Hui [Wed, 24 Jul 2013 06:53:26 +0000 (14:53 +0800)]
bonding: use pre-defined macro in bond_mode_name instead of magic number 0
We have BOND_MODE_ROUNDROBIN pre-defined as 0, and it's the lowest
mode number.
Use it to check the arg lower bound instead of magic number 0 in
bond_mode_name.
Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Darren Hart [Sat, 18 May 2013 21:46:00 +0000 (14:46 -0700)]
pch_gbe: Add MinnowBoard support
The MinnowBoard uses an AR803x PHY with the PCH GBE which requires
special handling. Use the MinnowBoard PCI Subsystem ID to detect this
and add a pci_device_id.driver_data structure and functions to handle
platform setup.
The AR803x does not implement the RGMII 2ns TX clock delay in the trace
routing nor via strapping. Add a detection method for the board and the
PHY and enable the TX clock delay via the registers.
This PHY will hibernate without link for 10 seconds. Ensure the PHY is
awake for probe and then disable hibernation. A future improvement would
be to convert pch_gbe to using PHYLIB and making sure we can wake the
PHY at the necessary times rather than permanently disabling it.
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Waskiewicz <peter.p.waskiewicz.jr@intel.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Joe Perches <joe@perches.com>
Cc: netdev@vger.kernel.org
Wolfram Sang [Tue, 23 Jul 2013 18:01:45 +0000 (20:01 +0200)]
drivers/net/ethernet/stmicro/stmmac: don't check resource with devm_ioremap_resource
devm_ioremap_resource does sanity checks on the given resource. No need to
duplicate this in the driver.
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Darren Hart [Sat, 18 May 2013 21:45:55 +0000 (14:45 -0700)]
pch_gbe: Use PCH_GBE_PHY_REGS_LEN instead of 32
Avoid using magic numbers when we have perfectly good defines just lying
around.
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Waskiewicz <peter.p.waskiewicz.jr@intel.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: netdev@vger.kernel.org
Thomas Gleixner [Tue, 23 Jul 2013 14:13:17 +0000 (16:13 +0200)]
net: Make devnet_rename_seq static
No users outside net/core/dev.c.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 23 Jul 2013 03:27:07 +0000 (20:27 -0700)]
tcp: TCP_NOTSENT_LOWAT socket option
Idea of this patch is to add optional limitation of number of
unsent bytes in TCP sockets, to reduce usage of kernel memory.
TCP receiver might announce a big window, and TCP sender autotuning
might allow a large amount of bytes in write queue, but this has little
performance impact if a large part of this buffering is wasted :
Write queue needs to be large only to deal with large BDP, not
necessarily to cope with scheduling delays (incoming ACKS make room
for the application to queue more bytes)
For most workloads, using a value of 128 KB or less is OK to give
applications enough time to react to POLLOUT events in time
(or being awaken in a blocking sendmsg())
This patch adds two ways to set the limit :
1) Per socket option TCP_NOTSENT_LOWAT
2) A sysctl (/proc/sys/net/ipv4/tcp_notsent_lowat) for sockets
not using TCP_NOTSENT_LOWAT socket option (or setting a zero value)
Default value being UINT_MAX (0xFFFFFFFF), meaning this has no effect.
This changes poll()/select()/epoll() to report POLLOUT
only if number of unsent bytes is below tp->nosent_lowat
Note this might increase number of sendmsg()/sendfile() calls
when using non blocking sockets,
and increase number of context switches for blocking sockets.
Note this is not related to SO_SNDLOWAT (as SO_SNDLOWAT is
defined as :
Specify the minimum number of bytes in the buffer until
the socket layer will pass the data to the protocol)
Tested:
netperf sessions, and watching /proc/net/protocols "memory" column for TCP
With 200 concurrent netperf -t TCP_STREAM sessions, amount of kernel memory
used by TCP buffers shrinks by ~55 % (20567 pages instead of 45458)
lpq83:~# echo -1 >/proc/sys/net/ipv4/tcp_notsent_lowat
lpq83:~# (super_netperf 200 -t TCP_STREAM -H remote -l 90 &); sleep 60 ; grep TCP /proc/net/protocols
TCPv6 1880 2 45458 no 208 yes ipv6 y y y y y y y y y y y y y n y y y y y
TCP 1696 508 45458 no 208 yes kernel y y y y y y y y y y y y y n y y y y y
lpq83:~# echo 131072 >/proc/sys/net/ipv4/tcp_notsent_lowat
lpq83:~# (super_netperf 200 -t TCP_STREAM -H remote -l 90 &); sleep 60 ; grep TCP /proc/net/protocols
TCPv6 1880 2 20567 no 208 yes ipv6 y y y y y y y y y y y y y n y y y y y
TCP 1696 508 20567 no 208 yes kernel y y y y y y y y y y y y y n y y y y y
Using 128KB has no bad effect on the throughput or cpu usage
of a single flow, although there is an increase of context switches.
A bonus is that we hold socket lock for a shorter amount
of time and should improve latencies of ACK processing.
lpq83:~# echo -1 >/proc/sys/net/ipv4/tcp_notsent_lowat
lpq83:~# perf stat -e context-switches ./netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.7.84 () port 0 AF_INET : +/-2.500% @ 99% conf.
Local Remote Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service
Send Socket Recv Socket Send Time Units CPU CPU CPU CPU Service Service Demand
Size Size Size (sec) Util Util Util Util Demand Demand Units
Final Final % Method % Method
1651584 6291456 16384 20.00 17447.90 10^6bits/s 3.13 S -1.00 U 0.353 -1.000 usec/KB
Performance counter stats for './netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3':
412,514 context-switches
200.
034645535 seconds time elapsed
lpq83:~# echo 131072 >/proc/sys/net/ipv4/tcp_notsent_lowat
lpq83:~# perf stat -e context-switches ./netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.7.84 () port 0 AF_INET : +/-2.500% @ 99% conf.
Local Remote Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service
Send Socket Recv Socket Send Time Units CPU CPU CPU CPU Service Service Demand
Size Size Size (sec) Util Util Util Util Demand Demand Units
Final Final % Method % Method
1593240 6291456 16384 20.00 17321.16 10^6bits/s 3.35 S -1.00 U 0.381 -1.000 usec/KB
Performance counter stats for './netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3':
2,675,818 context-switches
200.
029651391 seconds time elapsed
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-By: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 23 Jul 2013 03:26:31 +0000 (20:26 -0700)]
net: add sk_stream_is_writeable() helper
Several call sites use the hardcoded following condition :
sk_stream_wspace(sk) >= sk_stream_min_wspace(sk)
Lets use a helper because TCP_NOTSENT_LOWAT support will change this
condition for TCP sockets.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Tue, 23 Jul 2013 12:51:48 +0000 (14:51 +0200)]
net: sctp: trivial: add uapi/linux/sctp.h into maintainers
After this file has moved to the uapi section, we also need to update
this in the maintainers file.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Tue, 23 Jul 2013 12:51:47 +0000 (14:51 +0200)]
net: sctp: trivial: update mailing list address
The SCTP mailing list address to send patches or questions
to is linux-sctp@vger.kernel.org and not
lksctp-developers@lists.sourceforge.net anymore. Therefore,
update all occurences.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>