Joe Perches [Sun, 3 Feb 2013 17:28:11 +0000 (17:28 +0000)]
drivers: net: usb: Remove unnecessary alloc/OOM messages
alloc failures already get standardized OOM
messages and a dump_stack.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Sun, 3 Feb 2013 17:43:58 +0000 (17:43 +0000)]
ethernet: Remove unnecessary alloc/OOM messages, alloc cleanups
alloc failures already get standardized OOM
messages and a dump_stack.
Convert kzalloc's with multiplies to kcalloc.
Convert kmalloc's with multiplies to kmalloc_array.
Fix a few whitespace defects.
Convert a constant 6 to ETH_ALEN.
Use parentheses around sizeof.
Convert vmalloc/memset to vzalloc.
Remove now unused size variables.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Sun, 3 Feb 2013 17:28:09 +0000 (17:28 +0000)]
can: Remove unnecessary alloc/OOM messages
alloc failures already get standardized OOM
messages and a dump_stack.
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Sun, 3 Feb 2013 17:28:08 +0000 (17:28 +0000)]
caif: Remove unnecessary alloc/OOM messages
alloc failures already get standardized OOM
messages and a dump_stack.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Fri, 1 Feb 2013 04:37:43 +0000 (04:37 +0000)]
sctp: sctp_close: fix release of bindings for deferred call_rcu's
It seems due to RCU usage, i.e. within SCTP's address binding list,
a, say, ``behavioral change'' was introduced which does actually
not conform to the RFC anymore. In particular consider the following
(fictional) scenario to demonstrate this:
do:
Two SOCK_SEQPACKET-style sockets are opened (S1, S2)
S1 is bound to 127.0.0.1, port 1024 [server]
S2 is bound to 127.0.0.1, port 1025 [client]
listen(2) is invoked on S1
From S2 we call one sendmsg(2) with msg.msg_name and
msg.msg_namelen parameters set to the server's
address
S1, S2 are closed
goto do
The first pass of this loop passes successful, while the second round
fails during binding of S1 (address still in use). What is happening?
In the first round, the initial handshake is being done, and, at the
time close(2) is called on S1, a non-graceful shutdown is performed via
ABORT since in S1's receive queue an unprocessed packet is present,
thus stating an error condition. This can be considered as a correct
behavior.
During close also all bound addresses are freed, thus nothing *must*
be active anymore. In reference to RFC2960:
After checking the Verification Tag, the receiving endpoint shall
remove the association from its record, and shall report the
termination to its upper layer. (9.1 Abort of an Association)
Also, no half-open states are supported, thus after an ungraceful
shutdown, we leave nothing behind. However, this seems not to be
happening though. In a real-world scenario, this is exactly where
it breaks the lksctp-tools functional test suite, *for instance*:
./test_sockopt
test_sockopt.c 1 PASS : getsockopt(SCTP_STATUS) on a socket with no assoc
test_sockopt.c 2 PASS : getsockopt(SCTP_STATUS)
test_sockopt.c 3 PASS : getsockopt(SCTP_STATUS) with invalid associd
test_sockopt.c 4 PASS : getsockopt(SCTP_STATUS) with NULL associd
test_sockopt.c 5 BROK : bind: Address already in use
The underlying problem is that sctp_endpoint_destroy() hasn't been
triggered yet while the next bind attempt is being done. It will be
triggered eventually (but too late) by sctp_transport_destroy_rcu()
after one RCU grace period:
sctp_transport_destroy()
sctp_transport_destroy_rcu() ----.
sctp_association_put() [*] <--+--> sctp_packet_free()
sctp_association_destroy() [...]
sctp_endpoint_put() skb->destructor
sctp_endpoint_destroy() sctp_wfree()
sctp_bind_addr_free() sctp_association_put() [*]
Thus, we move out the condition with sctp_association_put() as well as
the sctp_packet_free() invocation and the issue can be solved. We also
better free the SCTP chunks first before putting the ref of the association.
With this patch, the example above (which simulates a similar scenario
as in the implementation of this test case) and therefore also the test
suite run successfully through. Tested by myself.
Cc: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vipul Pandya [Fri, 1 Feb 2013 00:03:47 +0000 (00:03 +0000)]
cxgb3: Update VLAN extraction stats in the GRO path
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gao feng [Thu, 31 Jan 2013 16:31:00 +0000 (16:31 +0000)]
netns: bond: allow unprivileged users to control bond device
reduce the permission check of bond device's ioctl.
allow the userns root to control the bond device.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gao feng [Thu, 31 Jan 2013 16:30:59 +0000 (16:30 +0000)]
netns: bridge: allow unprivileged users add/delete mdb entry
since the mdb table is belong to bridge device,and the
bridge device can only be seen in one netns.
So it's safe to allow unprivileged user which is the
creator of userns and netns to modify the mdb table.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gao feng [Thu, 31 Jan 2013 16:30:58 +0000 (16:30 +0000)]
netns: ebtable: allow unprivileged users to operate ebtables
ebt_table is a private resource of netns, operating ebtables
in one netns will not affect other netns, we can allow the
creator user of userns and netns to change the ebtables.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gao feng [Thu, 31 Jan 2013 16:30:57 +0000 (16:30 +0000)]
netns: fdb: allow unprivileged users to add/del fdb entries
Right now,only ixgdb,macvlan,vxlan and bridge implement
fdb_add/fdb_del operations.
these operations only operate the private data of net
device. So allowing the unprivileged users who creates
the userns and netns to add/del fdb entries will do no
harm to other netns.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexey Khoroshilov [Fri, 1 Feb 2013 12:09:19 +0000 (12:09 +0000)]
stmmac: don't return zero on failure path in stmmac_pci_probe()
If stmmac_dvr_probe() fails in stmmac_pci_probe(), it breaks off initialization,
deallocates all resources, but returns zero.
The patch adds -ENODEV as return value in this case.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 3 Feb 2013 04:13:00 +0000 (23:13 -0500)]
Merge branch 'delete-wanrouter' of git://git./linux/kernel/git/paulg/linux
Paul Gortmaker says:
====================
The removal of wanrouter code was originally listed in the (now
gone) feature removal file since May 2012, and an RFC of the
deletion was posted[1] in late 2012. The overall concept was given
an OK, but defconfig contamination, build failures, etc. meant that
it didn't quite make it into mainline for 3.8.
Since that time, Dan discovered (via code audit) a runtime bug that
proves nobody has been using this for over four years[2]. With that
new information, I think it makes sense for someone to follow through
on Joe's original RFC and get this done for the 3.9 release.
In addition to resolving the build failures of the RFC by keeping
stub headers, this also splits the change into two parts, just like
the token ring removal did. Part #1 decouples the mainline kernel
from the expired subsystem, and part #2 does the large scale
deletion of the subsystem content.
The advantage of the above, is that a "git blame" will never lead
you to a 4000+ line deletion commit. The large scale deletion will
never show up in a "git blame" and hence the same advantages that we
get from the "--irreversible-delete" in the review stage of "git
format-patch" are also embedded into the git history itself. This
may seem like a moot point to some, but for those who spend a
considerable amount of time data mining in the git history, this is
probably worth doing.
I have done build tests of all[mod/yes]config for both the stage 1
(Makefile and Kconfig) and stage 2 (full driver delete) as a sanity
check, and the issues with the previously posted RFC should be gone.
Speaking of "--irreversible-delete" -- these patches were created
with that option, so if you want to use them locally, you are going
to have to pull (location below) the content instead of doing a
"git am" of the mailed out content.
[1] http://patchwork.ozlabs.org/patch/198794/
[2] http://www.spinics.net/lists/netdev/msg218670.html
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Wed, 30 Jan 2013 22:14:10 +0000 (22:14 +0000)]
qlcnic: silence false positive overflow warning
We actually store the MAC address as well as the board_name here. The
longest board_name is 75 characters so there is more than enough room
to hold the 17 character MAC and the ": " divider. But making this
buffer larger silences a static checker warning.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-By: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mahesh Bandewar [Wed, 30 Jan 2013 07:00:12 +0000 (07:00 +0000)]
bnx2x: Force link UP when the interface is in LOOPBACK mode
When the interface does not have carrier but when it's put into
loopback mode (for tests), it does not make sense to not have
the carrier. So force it!
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Acked-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 3 Feb 2013 03:55:16 +0000 (22:55 -0500)]
Merge branch 'intel'
Jeff Kirsher says:
====================
This series contains updates to ixgbe and e1000e. The ixgbe patches are
a mix of fixes, cleanup and added functionality. The first fix is for
traffic classes, where if the mapping has changed reset the NIC. The other
ixgbe fix resolves an issue where the device lookup neglected to do a
pci_dev_put() to decrement the device reference count.
The ixgbe cleanup was done by Josh, where the auto-negotiation variables
were renamed/cleaned up and refactored.
The remaining patches are from Bruce to do additional cleanup on e1000e as
well as bump the driver version. Most notably is the cleanup to use the
kernel IEEE MII definitions where possible instead of the local MII
definitions.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Bruce Allan [Wed, 16 Jan 2013 08:54:35 +0000 (08:54 +0000)]
e1000e: use generic IEEE MII definitions
For standard IEEE MII-compatible transceivers, the kernel has generic
register and bit definitions. Use those instead of redundant local
defines.
Do not replace references of MII_CR_SPEED_10 with BMCR_SPEED10 (0x0000)
when it is not necessary (i.e. when it is bitwise OR'ed with another
value).
Some whitespace issues in the surrounding context of the above changes are
also cleaned up.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Bruce Allan [Wed, 16 Jan 2013 08:46:49 +0000 (08:46 +0000)]
e1000e: resolve -Wunused-parameter compile warnings
Remove the unused parameter when possible, otherwise use __always_unused
attribute.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Bruce Allan [Sat, 12 Jan 2013 07:28:54 +0000 (07:28 +0000)]
e1000e: update driver version string
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Bruce Allan [Sat, 12 Jan 2013 07:28:24 +0000 (07:28 +0000)]
e1000e: cleanup some whitespace and indentation issues
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Bruce Allan [Sat, 12 Jan 2013 07:27:53 +0000 (07:27 +0000)]
e1000e: cleanup: group OR'ed bit settings with parens
For clarity, wrap OR'ed bit settings with parentheses.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Bruce Allan [Sat, 12 Jan 2013 07:27:23 +0000 (07:27 +0000)]
e1000e: cleanup defines.h
Remove redundant defines which are defined elsewhere.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Josh Hay [Sat, 15 Dec 2012 03:28:30 +0000 (03:28 +0000)]
ixgbe: autoneg variable refactoring
Removes the autoneg parameter from the setup_link functions.
Adds local variable autoneg to setup_link functions to be passed
to get_link_capabilities functions if needed.
Signed-off-by: Josh Hay <joshua.a.hay@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Josh Hay [Sat, 15 Dec 2012 03:28:24 +0000 (03:28 +0000)]
ixgbe: removed unused variable from setup_link_speed
Removes the autoneg parameter from the setup_link_speed functions. These
functions do nothing with this parameter.
Signed-off-by: Josh Hay <joshua.a.hay@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Josh Hay [Sat, 15 Dec 2012 03:28:19 +0000 (03:28 +0000)]
ixgbe: rename autoneg variables
Renames some autoneg/speed variables to be more consistent with check_link,
get_link_capabilities, and setup_link function calls. Initializes instances
of autoneg.
Signed-off-by: Josh Hay <joshua.a.hay@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Greg Rose [Thu, 13 Dec 2012 01:14:06 +0000 (01:14 +0000)]
ixgbe: Fix device ref count bug
The device lookup neglected to do a pci_dev_put() to decrement the
device reference count.
Reported-by: Elena Gurevich <elena.gurevich@toganetworks.com>
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Amir Hanania [Tue, 4 Dec 2012 03:03:03 +0000 (03:03 +0000)]
ixgbe: Reset the NIC if up2tc has changed
Check for up2tc change and call ixgbe_dcbnl_devreset() if the mapping has
changed but the number of TC's in use has not changed.
Signed-off-by: Amir Hanania <amir.hanania@intel.com>
Tested-by: Jack Morgan <jack.morgan@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Paul Gortmaker [Thu, 31 Jan 2013 02:50:08 +0000 (21:50 -0500)]
wanrouter: delete now orphaned header content, files/drivers
The wanrouter support was identified earlier as unused for years,
and so the previous commit totally decoupled it from the kernel,
leaving the related wanrouter files present, but totally inert.
Here we take the final step in that cleanup, by doing a wholesale
removal of these files. The two step process is used so that the
large deletion is decoupled from the git history of files that we
still care about.
The drivers deleted here all were dependent on the Kconfig setting
CONFIG_WAN_ROUTER_DRIVERS.
A stub wanrouter.h header (kernel & uapi) are left behind so that
drivers/isdn/i4l/isdn_x25iface.c continues to compile, and so that
we don't accidentally break userspace that expected these defines.
Cc: Joe Perches <joe@perches.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Paul Gortmaker [Thu, 31 Jan 2013 02:49:29 +0000 (21:49 -0500)]
wanrouter: completely decouple obsolete code from kernel.
The original suggestion to delete wanrouter started earlier
with the mainline commit
f0d1b3c2bcc5de8a17af5f2274f7fcde8292b5fc
("net/wanrouter: Deprecate and schedule for removal") in May 2012.
More importantly, Dan Carpenter found[1] that the driver had a
fundamental breakage introduced back in 2008, with commit
7be6065b39c3 ("netdevice wanrouter: Convert directly reference of
netdev->priv"). So we know with certainty that the code hasn't been
used by anyone willing to at least take the effort to send an e-mail
report of breakage for at least 4 years.
This commit does a decouple of the wanrouter subsystem, by going
after the Makefile/Kconfig and similar files, so that these mainline
files that we are keeping do not have the big wanrouter file/driver
deletion commit tied into their history.
Once this commit is in place, we then can remove the obsolete cyclomx
drivers and similar that have a dependency on CONFIG_WAN_ROUTER_DRIVERS.
[1] http://www.spinics.net/lists/netdev/msg218670.html
Originally-by: Joe Perches <joe@perches.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
David S. Miller [Thu, 31 Jan 2013 17:49:10 +0000 (12:49 -0500)]
Merge branch 'mlx4'
Merge mlx4 bug fixes from Amir Vadai.
Signed-off-by: David S. Miller <davem@davemloft.net>
Amir Vadai [Wed, 30 Jan 2013 23:07:11 +0000 (23:07 +0000)]
net/mlx4_en: Fix transmit timeout when driver restarts port
Under heavy CPU load, changing, ring size/mtu/etc. could result in transmit
timeout, since stop-start port might take more than 10 seconds.
Calling netif_detach_device to prevent tx queue transmit timeout.
netif_detach_device() is not called under ndo_stop, because netif_carrier_off
will prevent the timeout, and device should not be marked as not present, or
else user won't be able to start it later on.
CC: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Matan Barak [Wed, 30 Jan 2013 23:07:10 +0000 (23:07 +0000)]
net/mlx4_en: Don't reassign port mac address on firmware that supports it
Mac reassignments should only be done when not supported by the firmware. To
accomplish that, checking firmware capability bit to know whether we should
reassign macs in the driver.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Wed, 30 Jan 2013 23:07:09 +0000 (23:07 +0000)]
net/mlx4_core: Use firmware driven flow steering hash mode
The Firmware dynamically changes flow steering hash configuration from covering
L2 only to "full" L2/L3/L4 mode needed. The dynamic change allows the driver
to set hard coded hash configuration which is changed by the firmware from L2
to L2/L3/L4 when attaching the first L3/L4 flow steering rule and back to L2
when there are no more such rules.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Wed, 30 Jan 2013 23:07:08 +0000 (23:07 +0000)]
net/mlx4_en: Fix ethtool rules leftovers after module unloaded
As part of the driver unload flow, all steering rules must be deleted,
make sure to remove the rules that were set through ethtool.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Wed, 30 Jan 2013 23:07:07 +0000 (23:07 +0000)]
net/mlx4_en: Block insertion of ethtool steering rules while the interface is down
Attaching steering rules while the interface is down is an invalid operation, block it.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Wed, 30 Jan 2013 23:07:06 +0000 (23:07 +0000)]
net/mlx4_en: Fix vlan mask for ethtool steering rules
The vlan mask field should be validated and assigned according to the field
size which is 12 bits. Also replace the numeric 0xfff mask with existing kernel
macro.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Wed, 30 Jan 2013 23:07:05 +0000 (23:07 +0000)]
net/mlx4_en: Validate VLAN IDs provided in ethtool flow steering rules
When attaching flow steering rules via Ethtool accept only valid vlans IDs e.g
in the range: [0,4095].
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Wed, 30 Jan 2013 23:07:04 +0000 (23:07 +0000)]
net/mlx4_en: Fix ip/udp steering rules multicast mac when attached via ethtool
Destination mac is a mandatory specification for ip/udp steering rules.
When attaching multicast steering rules via ethtool the unicast mac of the
interface was added to the rule specification instead of the multicast mac.
The following commit sets the corresponding multicast mac for the rule multicast ip.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Wed, 30 Jan 2013 23:07:03 +0000 (23:07 +0000)]
net/mlx4_core: Set correctly allow_loopback flag
The allow_loopback flag was wrongly set using arithmetic bit operation, change
the code to use logical bit operation.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Wed, 30 Jan 2013 23:07:02 +0000 (23:07 +0000)]
net/mlx4_core: Directly expose fields of HW flow steering rule control segment
Some of the fields for struct mlx4_net_trans_rule_hw_ctrl were packed into u32
and accessed through bit field operations. Expose and access them directly as
u8.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yan Burman [Tue, 29 Jan 2013 23:43:07 +0000 (23:43 +0000)]
net/vxlan: Add ethtool drvinfo
Implement ethtool get_drvinfo.
Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
YOSHIFUJI Hideaki / 吉藤英明 [Wed, 30 Jan 2013 09:27:58 +0000 (09:27 +0000)]
ipv6 anycast: Convert ipv6_sk_ac_lock to spinlock.
Since all users are write-lock, it does not make sense to use
rwlock here. Use simple spinlock.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
YOSHIFUJI Hideaki / 吉藤英明 [Wed, 30 Jan 2013 09:27:52 +0000 (09:27 +0000)]
ipv6 flowlabel: Convert np->ipv6_fl_list to RCU.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
YOSHIFUJI Hideaki / 吉藤英明 [Wed, 30 Jan 2013 09:27:47 +0000 (09:27 +0000)]
ipv6 flowlabel: Convert hash list to RCU.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
YOSHIFUJI Hideaki / 吉藤英明 [Wed, 30 Jan 2013 09:26:42 +0000 (09:26 +0000)]
ipv6 flowlabel: Ensure to take lock when modifying np->ip6_sk_fl_list.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 31 Jan 2013 01:51:44 +0000 (17:51 -0800)]
x86: bpf_jit_comp: add pkt_type support
Supporting access to skb->pkt_type is a bit tricky if we want
to have a generic code, allowing pkt_type to be moved in struct sk_buff
pkt_type is a bit field, so compiler cannot really help us to find
its offset. Let's use a helper for this : It will throw a one time
message if pkt_type no longer starts at a byte boundary or is
no longer a 3bit field.
Reported-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jitendra Kalsaria [Wed, 30 Jan 2013 12:47:19 +0000 (12:47 +0000)]
qlcnic: Bump up the version to 5.1.33
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Hemminger [Wed, 30 Jan 2013 12:47:18 +0000 (12:47 +0000)]
qlcnic: make pci_error_handlers const
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Manish chopra [Wed, 30 Jan 2013 12:47:17 +0000 (12:47 +0000)]
qlcnic: Fix RX/TX checksum setting for some adapter types
Signed-off-by: Manish chopra <manish.chopra@qlogic.com>
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shahed Shaikh [Wed, 30 Jan 2013 12:47:16 +0000 (12:47 +0000)]
qlcnic: Fix minidump in NPAR mode
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Manish chopra [Wed, 30 Jan 2013 12:47:15 +0000 (12:47 +0000)]
qlcnic: driver LRO bug fix
o ipv4 address was not getting programmed properly because of
improper byte order conversion
Signed-off-by: Manish chopra <manish.chopra@qlogic.com>
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Manish chopra [Wed, 30 Jan 2013 12:47:14 +0000 (12:47 +0000)]
qlcnic: Free irq for mailbox interrupts
Signed-off-by: Manish chopra <manish.chopra@qlogic.com>
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Manish chopra [Wed, 30 Jan 2013 12:47:13 +0000 (12:47 +0000)]
qlcnic: Fix bug in reading HW reset template
Signed-off-by: Manish chopra <manish.chopra@qlogic.com>
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shahed Shaikh [Wed, 30 Jan 2013 12:47:12 +0000 (12:47 +0000)]
qlcnic: Fix sparse check endian warnings
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 30 Jan 2013 10:08:11 +0000 (11:08 +0100)]
bond: have random dev address by default instead of zeroes
Makes more sense to have randomly generated address by default than to
have all zeroes. It also allows user to for example put the bond into
bridge without need to have any slaves in it.
Also note that this changes only behaviour of bonds with no slaves. Once
the first slave device is enslaved, its address will be used (no change
here).
Also, fix dev_assign_type values on the way.
Reported-by: Pavel Šimerda <psimerda@redhat.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michał Mirosław [Tue, 29 Jan 2013 15:14:16 +0000 (15:14 +0000)]
net: disallow drivers with buggy VLAN accel to register_netdevice()
Instead of jumping aroung bugs that are easily fixed just don't let them in:
affected drivers should be either fixed or have NETIF_F_HW_VLAN_FILTER
removed from advertised features.
Quick grep in drivers/net shows two drivers that have NETIF_F_HW_VLAN_FILTER
but not ndo_vlan_rx_add/kill_vid(), but those are false-positives (features
are commented out).
OTOH two drivers have ndo_vlan_rx_add/kill_vid() implemented but don't
advertise NETIF_F_HW_VLAN_FILTER. Those are:
+ethernet/cisco/enic/enic_main.c
+ethernet/qlogic/qlcnic/qlcnic_main.c
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
YOSHIFUJI Hideaki / 吉藤英明 [Tue, 29 Jan 2013 12:49:03 +0000 (12:49 +0000)]
netfilter ipset: Use ipv6_addr_equal() where appropriate.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
YOSHIFUJI Hideaki / 吉藤英明 [Tue, 29 Jan 2013 12:48:58 +0000 (12:48 +0000)]
netfilter ip6table_mangle: Use ipv6_addr_equal() where appropriate.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
YOSHIFUJI Hideaki / 吉藤英明 [Tue, 29 Jan 2013 12:48:50 +0000 (12:48 +0000)]
xfrm: Convert xfrm_addr_cmp() to boolean xfrm_addr_equal().
All users of xfrm_addr_cmp() use its result as boolean.
Introduce xfrm_addr_equal() (which is equal to !xfrm_addr_cmp())
and convert all users.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
YOSHIFUJI Hideaki / 吉藤英明 [Tue, 29 Jan 2013 12:48:31 +0000 (12:48 +0000)]
xfrm: Use ipv6_addr_equal() where appropriate.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
YOSHIFUJI Hideaki / 吉藤英明 [Tue, 29 Jan 2013 12:48:23 +0000 (12:48 +0000)]
ipv6 mcast: Use ipv6_addr_equal() in ip6_mc_source().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 29 Jan 2013 20:59:45 +0000 (15:59 -0500)]
Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge
Included changes:
- fix recently introduced output behaviour
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 29 Jan 2013 20:32:13 +0000 (15:32 -0500)]
Merge git://git./linux/kernel/git/davem/net
Bring in the 'net' tree so that we can get some ipv4/ipv6 bug
fixes that some net-next work will build upon.
Signed-off-by: David S. Miller <davem@davemloft.net>
Hannes Frederic Sowa [Tue, 29 Jan 2013 08:24:25 +0000 (08:24 +0000)]
ipv6: add anti-spoofing checks for 6to4 and 6rd
This patch adds anti-spoofing checks in sit.c as specified in RFC3964
section 5.2 for 6to4 and RFC5969 section 12 for 6rd. I left out the
checks which could easily be implemented with netfilter.
Specifically this patch adds following logic (based loosely on the
pseudocode in RFC3964 section 5.2):
if prefix (inner_src_v6) == rd6_prefix (2002::/16 is the default)
and outer_src_v4 != embedded_ipv4 (inner_src_v6)
drop
if prefix (inner_dst_v6) == rd6_prefix (or 2002::/16 is the default)
and outer_dst_v4 != embedded_ipv4 (inner_dst_v6)
drop
accept
To accomplish the specified security checks proposed by above RFCs,
it is still necessary to employ uRPF filters with netfilter. These new
checks only kick in if the employed addresses are within the 2002::/16 or
another range specified by the 6rd-prefix (which defaults to 2002::/16).
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Claudiu Manoil [Tue, 29 Jan 2013 03:55:12 +0000 (03:55 +0000)]
gianfar: Pack struct gfar_priv_grp into three cachelines
* remove unused members(!): imask, ievent
* move space consuming interrupt name strings (int_name_* members) to
external structures, unessential for the driver's hot path
* keep high priority hot path data within the first 2 cache lines
This reduces struct gfar_priv_grp from 6 to 3 cache lines.
(Also fixed checkpatch warnings for the old code, in the process.)
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Claudiu Manoil [Tue, 29 Jan 2013 03:55:11 +0000 (03:55 +0000)]
gianfar: Cleanup gfar_parse_group() code
Factor out redundant code (improve readability, source code size).
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Claudiu Manoil [Tue, 29 Jan 2013 03:55:10 +0000 (03:55 +0000)]
gianfar: Optimize struct gfar_priv_tx_q for two cache lines
Resize and regroup structure members to eliminate memory holes and
to pack the structure into 2 cache lines (from 3).
tx_ring_size was resized from 4 to 2 bytes and few members were re-grouped
in order to eliminate byte holes and achieve compactness.
Where possible, few members were grouped according to their usage and access
order (i.e. start_xmit vs. clean_tx_ring members), less important members
were pushed at the end.
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Tue, 29 Jan 2013 02:16:18 +0000 (02:16 +0000)]
ipv6: Fix inet6_csk_bind_conflict so it builds with user namespaces enabled
When attempting to build linux-next with user namespaces enabled I ran
into this fun build error.
CC net/ipv6/inet6_connection_sock.o
.../net/ipv6/inet6_connection_sock.c: In function ‘inet6_csk_bind_conflict’:
.../net/ipv6/inet6_connection_sock.c:37:12: error: incompatible types when initializing type ‘int’ using
type ‘kuid_t’
.../net/ipv6/inet6_connection_sock.c:54:30: error: incompatible type for argument 1 of ‘uid_eq’
.../include/linux/uidgid.h:48:20: note: expected ‘kuid_t’ but argument is of type ‘int’
make[3]: *** [net/ipv6/inet6_connection_sock.o] Error 1
make[2]: *** [net/ipv6] Error 2
make[2]: *** Waiting for unfinished jobs....
Using kuid_t instead of int to hold the uid fixes this.
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Mon, 28 Jan 2013 19:55:53 +0000 (19:55 +0000)]
pktgen: support net namespace
v3: make pktgen_threads list per-namespace
v2: remove a useless check
This patch add net namespace to pktgen, so that
we can use pktgen in different namespaces.
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Frank Li [Mon, 28 Jan 2013 18:31:42 +0000 (18:31 +0000)]
net: fec: add napi support to improve proformance
Add napi support
Before this patch
iperf -s -i 1
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[ 4] local 10.192.242.153 port 5001 connected with 10.192.242.138 port 50004
[ ID] Interval Transfer Bandwidth
[ 4] 0.0- 1.0 sec 41.2 MBytes 345 Mbits/sec
[ 4] 1.0- 2.0 sec 43.7 MBytes 367 Mbits/sec
[ 4] 2.0- 3.0 sec 42.8 MBytes 359 Mbits/sec
[ 4] 3.0- 4.0 sec 43.7 MBytes 367 Mbits/sec
[ 4] 4.0- 5.0 sec 42.7 MBytes 359 Mbits/sec
[ 4] 5.0- 6.0 sec 43.8 MBytes 367 Mbits/sec
[ 4] 6.0- 7.0 sec 43.0 MBytes 361 Mbits/sec
After this patch
[ 4] 2.0- 3.0 sec 51.6 MBytes 433 Mbits/sec
[ 4] 3.0- 4.0 sec 51.8 MBytes 435 Mbits/sec
[ 4] 4.0- 5.0 sec 52.2 MBytes 438 Mbits/sec
[ 4] 5.0- 6.0 sec 52.1 MBytes 437 Mbits/sec
[ 4] 6.0- 7.0 sec 52.1 MBytes 437 Mbits/sec
[ 4] 7.0- 8.0 sec 52.3 MBytes 439 Mbits/sec
Signed-off-by: Frank Li <Frank.Li@freescale.com>
Signed-off-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Barry Grussling [Sun, 27 Jan 2013 18:44:36 +0000 (18:44 +0000)]
ethoc: Cleanup driver format
Cleanup the format of ethoc.c to meet network driver style as
per checkpatch.pl.
Signed-off-by: Barry Grussling <barry@grussling.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ward [Sun, 27 Jan 2013 13:04:58 +0000 (13:04 +0000)]
ip_gre: When TOS is inherited, use configured TOS value for non-IP packets
A GRE tunnel can be configured so that outgoing tunnel packets inherit
the value of the TOS field from the inner IP header. In doing so, when
a non-IP packet is transmitted through the tunnel, the TOS field will
always be set to 0.
Instead, the user should be able to configure a different TOS value as
the fallback to use for non-IP packets. This is helpful when the non-IP
packets are all control packets and should be handled by routers outside
the tunnel as having Internet Control precedence. One example of this is
the NHRP packets that control a DMVPN-compatible mGRE tunnel; they are
encapsulated directly by GRE and do not contain an inner IP header.
Under the existing behavior, the IFLA_GRE_TOS parameter must be set to
'1' for the TOS value to be inherited. Now, only the least significant
bit of this parameter must be set to '1', and when a non-IP packet is
sent through the tunnel, the upper 6 bits of this same parameter will be
copied into the TOS field. (The ECN bits get masked off as before.)
This behavior is backwards-compatible with existing configurations and
iproute2 versions.
Signed-off-by: David Ward <david.ward@ll.mit.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 24 Jan 2013 09:41:41 +0000 (09:41 +0000)]
ipv4: introduce address lifetime
There are some usecase when lifetime of ipv4 addresses might be helpful.
For example:
1) initramfs networkmanager uses a DHCP daemon to learn network
configuration parameters
2) initramfs networkmanager addresses, routes and DNS configuration
3) initramfs networkmanager is requested to stop
4) initramfs networkmanager stops all daemons including dhclient
5) there are addresses and routes configured but no daemon running. If
the system doesn't start networkmanager for some reason, addresses and
routes will be used forever, which violates RFC 2131.
This patch is essentially a backport of ivp6 address lifetime mechanism
for ipv4 addresses.
Current "ip" tool supports this without any patch (since it does not
distinguish between ipv4 and ipv6 addresses in this perspective.
Also, this should be back-compatible with all current netlink users.
Reported-by: Pavel Šimerda <psimerda@redhat.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 29 Jan 2013 18:37:29 +0000 (13:37 -0500)]
Merge branch 'ipfrags'
Jesper Dangaard Brouer says:
====================
This patchset is V2, with some trivial code fixes, which were noticed
by DaveM. It is still a partly respin of my fragmentation optimization
patches: http://thread.gmane.org/gmane.linux.network/250914
This is not the complete patchset, from the gmane link above. In this
patchset, I primarily focus on adjusting cacheline for better SMP/NUMA
performance.
Once this patchset have been agreed upon, I will continue and respin
the rest of my patches.
This time around, I have created a frag DoS generator, via the tool
trafgen (http://netsniff-ng.org/). To create a stable DoS scenario
(no longer relying on frame dropping due to disabled flow-control).
Two 10G interfaces are under-test, and uses Ethernet flow-control. A
third interface is used for generating the DoS attack (this interface
is also 10G, but it does not need to be, as 500Kpps DoS is enough).
Test types summary (netperf):
Test-20G64K == 2x10G with 65K fragments
Test-20G3F == 2x10G with 3x fragments (3*1472 bytes)
Test-20G64K+DoS == Same as 20G64K with frag DoS
Test-20G3F+DoS == Same as 20G3F with frag DoS
Patch list:
Patch-01 - net: cacheline adjust struct netns_frags for better frag performance
Patch-02 - net: cacheline adjust struct inet_frags for better frag performance
Patch-03 - net: cacheline adjust struct inet_frag_queue
Patch-04 - net: frag helper functions for mem limit tracking
Patch-05 - net: use lib/percpu_counter API for fragmentation mem accounting
Patch-06 - net: frag, move LRU list maintenance outside of rwlock
Performance table summary:
Test-type: Test-20G64K Test-20G3F 20G64K+DoS 20G3F+DoS
---------- ----------- ---------- ---------- ---------
net-next: 15114.5 Mbit/s 8954.21 2444.28 3918.01 Mbit/s
Patch-01: 16075.8 Mbit/s 8976.18 2621.49 4072.79 Mbit/s
Patch-02: 17806.9 Mbit/s 9280.32 2478.62 4274.59 Mbit/s
Patch-03: 17317.4 Mbit/s 9308.62 2546.05 4336.59 Mbit/s
Patch-04: 17635.9 Mbit/s 9256.16 2535.25 4327.63 Mbit/s
Patch-05: 18027.0 Mbit/s 9918.99 2492.62 3621.68 Mbit/s
Patch-06: 18486.7 Mbit/s 10723.20 3657.85 4560.64 Mbit/s
I cannot explain the under-DoS regression that patch-05/percpu_counter
introduces. But patch-06/LRU-lock corrects the situation again.
Below is a testlab setup description, with links to the trafgen DoS
packet config used.
Testlab
=======
Server setup
------------
The machine acting as a server:
- 2x CPU (E5-2630)
- Thus a NUMA arch/machine
- 4x 10Gbit/s ports
- NICs 2x Intel Dual port 82599 based (driver ixgbe)
Setup:
- Interfaces uses Ethernet flow control
- Flush all iptables
- Remove all iptables related module.
- Kill irqbalance
- Pin each 10G NIC port to a *single* CPU each
Pinning can easily be done by command hacks::
for x in /proc/irq/*/eth8*/../smp_affinity_list ; do echo 1 > $x; done
for x in /proc/irq/*/eth9*/../smp_affinity_list ; do echo 3 > $x; done
for x in /proc/irq/*/eth31*/../smp_affinity_list; do echo 6 > $x; done
for x in /proc/irq/*/eth32*/../smp_affinity_list; do echo 8 > $x; done
Notice NUMA setting: The CPU to NIC tying is carefully choosen
according to the NUMA node setup. Thus, NICs connected to a PCI-e
slot that is connected to a physical CPU socket are tied together.
Choosing only a single CPU per NIC (port) is just to ease provoking
and debugging this performance issue. (In real setups, you can choose
more CPU, just remember the NUMA node in the equation).
Tools
-----
Netperf is used, with option -T to ensure CPU binding.
The netserver processes, are NAPI pinned::
numactl -m0 -c0 netserver
numactl -m1 -c 1 netserver -p 1337
I now have a frag DoS generator, created via the tool:
trafgen (see: http://netsniff-ng.org/)
Trafgen packet config file:
http://people.netfilter.org/hawk/frag_work/trafgen/frag_packet03_small_frag.txf
Notice, I'm using features of trafgen, recently developed by Daniel
Borkmann, thus you need the latest git tree to use my trafgen packet
config.
git://github.com/borkmann/netsniff-ng.git
Command line:
trafgen --dev eth51 --conf frag_packet03_small_frag.txf -V -k 100 --cpus 2
Tests types
-----------
Test(20G64K) UDP-64K 2x 10Gbit/s with no DoS traffic:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
export SIZE=$((65507)); export TIME=$((20)); export LOG=/tmp/netperf.log ;\
netperf -p 1337 -H 192.168.31.2 -T7,7 -t UDP_STREAM -l $TIME -- -m $SIZE >> ${LOG}.31 &\
netperf -H 192.168.81.2 -T2,2 -t UDP_STREAM -l $TIME -- -m $SIZE >> ${LOG}.81 && \
wait $! && tail -n3 ${LOG}.* && \
tail -n3 ${LOG}.{31,81} | awk 'BEGIN{sum=0;} /212992 / {sum+=$4; print " +"$4} /==/ {print " file:"$2} END{print "sum:"sum" Mbit/s"}'
Test(20G3F) UDP-3xfrags 2x 10Gbit/s with no DoS traffic:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
export SIZE=$((3*1472)); export TIME=$((20)); export LOG=/tmp/netperf.log ;\
netperf -p 1337 -H 192.168.31.2 -T7,7 -t UDP_STREAM -l $TIME -- -m $SIZE >> ${LOG}.31 &\
netperf -H 192.168.81.2 -T2,2 -t UDP_STREAM -l $TIME -- -m $SIZE >> ${LOG}.81 && \
wait $! && tail -n3 ${LOG}.* && \
tail -n3 ${LOG}.{31,81} | awk 'BEGIN{sum=0;} /212992 / {sum+=$4; print " +"$4} /==/ {print " file:"$2} END{print "sum:"sum" Mbit/s"}'
Awk script for summming results:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
tail -n3 ${LOG}.{31,81} | awk 'BEGIN{sum=0;} /212992 / {sum+=$4; print " +"$4} /==/ {print " file:"$2} END{print "sum:"sum" Mbit/s"}'
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesper Dangaard Brouer [Mon, 28 Jan 2013 23:45:51 +0000 (23:45 +0000)]
net: frag, move LRU list maintenance outside of rwlock
Updating the fragmentation queues LRU (Least-Recently-Used) list,
required taking the hash writer lock. However, the LRU list isn't
tied to the hash at all, so we can use a separate lock for it.
Original-idea-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesper Dangaard Brouer [Mon, 28 Jan 2013 23:45:33 +0000 (23:45 +0000)]
net: use lib/percpu_counter API for fragmentation mem accounting
Replace the per network namespace shared atomic "mem" accounting
variable, in the fragmentation code, with a lib/percpu_counter.
Getting percpu_counter to scale to the fragmentation code usage
requires some tweaks.
At first view, percpu_counter looks superfast, but it does not
scale on multi-CPU/NUMA machines, because the default batch size
is too small, for frag code usage. Thus, I have adjusted the
batch size by using __percpu_counter_add() directly, instead of
percpu_counter_sub() and percpu_counter_add().
The batch size is increased to 130.000, based on the largest 64K
fragment memory usage. This does introduce some imprecise
memory accounting, but its does not need to be strict for this
use-case.
It is also essential, that the percpu_counter, does not
share cacheline with other writers, to make this scale.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesper Dangaard Brouer [Mon, 28 Jan 2013 23:45:12 +0000 (23:45 +0000)]
net: frag helper functions for mem limit tracking
This change is primarily a preparation to ease the extension of memory
limit tracking.
The change does reduce the number atomic operation, during freeing of
a frag queue. This does introduce a some performance improvement, as
these atomic operations are at the core of the performance problems
seen on NUMA systems.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesper Dangaard Brouer [Mon, 28 Jan 2013 23:44:49 +0000 (23:44 +0000)]
net: cacheline adjust struct inet_frag_queue
Fragmentation code cacheline adjusting of struct inet_frag_queue.
Take advantage of the size of struct timer_list, and move all but
spinlock_t lock, below the timer struct. On 64-bit 'lru_list',
'list' and 'refcnt', fits exactly into the next cacheline, and a
new cacheline starts at 'fragments'.
The netns_frags *net pointer is moved to the end of the struct,
because its used in a compare, with "next/close-by" elements of
which this struct is embedded into.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesper Dangaard Brouer [Mon, 28 Jan 2013 23:44:37 +0000 (23:44 +0000)]
net: cacheline adjust struct inet_frags for better frag performance
The globally shared rwlock, of struct inet_frags, shares
cacheline with the 'rnd' number, which is used by the hash
calculations. Fix this, as this obviously is a bad idea, as
unnecessary cache-misses will occur when accessing the 'rnd'
number.
Also small note that, moving function ptr (*match) up in struct,
is to avoid it lands on the next cacheline (on 64-bit).
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesper Dangaard Brouer [Mon, 28 Jan 2013 23:44:14 +0000 (23:44 +0000)]
net: cacheline adjust struct netns_frags for better frag performance
This small cacheline adjustment of struct netns_frags improves
performance significantly for the fragmentation code.
Struct members 'lru_list' and 'mem' are both hot elements, and it
hurts performance, due to cacheline bouncing at every call point,
when they share a cacheline. Also notice, how mem is placed
together with 'high_thresh' and 'low_thresh', as they are used in
the compare operations together.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Felipe Balbi [Tue, 29 Jan 2013 07:16:30 +0000 (09:16 +0200)]
net: ks8851: convert to threaded IRQ
just as it should have been. It also helps
removing the, now unnecessary, workqueue.
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
YOSHIFUJI Hideaki / 吉藤英明 [Thu, 24 Jan 2013 00:44:23 +0000 (00:44 +0000)]
net neigh: Optimize neighbor entry size calculation.
When allocating memory for neighbour cache entry, if
tbl->entry_size is not set, we always calculate
sizeof(struct neighbour) + tbl->key_len, which is common
in the same table.
With this change, set tbl->entry_size during the table
initialization phase, if it was not set, and use it in
neigh_alloc() and neighbour_priv().
This change also allow us to have both of protocol private
data and device priate data at tha same time.
Note that the only user of prototcol private is DECnet
and the only user of device private is ATM CLIP.
Since those are exclusive, we have not been facing issues
here.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
bingtian.ly@taobao.com [Wed, 23 Jan 2013 20:35:28 +0000 (20:35 +0000)]
net: avoid to hang up on sending due to sysctl configuration overflow.
I found if we write a larger than 4GB value to some sysctl
variables, the sending syscall will hang up forever, because these
variables are 32 bits, such large values make them overflow to 0 or
negative.
This patch try to fix overflow or prevent from zero value setup
of below sysctl variables:
net.core.wmem_default
net.core.rmem_default
net.core.rmem_max
net.core.wmem_max
net.ipv4.udp_rmem_min
net.ipv4.udp_wmem_min
net.ipv4.tcp_wmem
net.ipv4.tcp_rmem
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Li Yu <raise.sail@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Tue, 29 Jan 2013 00:23:07 +0000 (16:23 -0800)]
Merge branch 'merge' of git://git./linux/kernel/git/benh/powerpc
Pull powerpc fixes from Benjamin Herrenschmidt:
"Whenever you have a chance between two dives, you might want to
consider pulling my merge branch to pickup a few fixes for 3.8 that
have been accumulating for the last couple of weeks (I was myself
travelling then on vacation).
Nothing major, just a handful of powerpc bug fixes that I consider
worth getting in before 3.8 goes final."
And I'll have everybody know that I'm not diving for several days yet.
Snif.
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc: Max next_tb to prevent from replaying timer interrupt
powerpc: kernel/kgdb.c: Fix memory leakage
powerpc/book3e: Disable interrupt after preempt_schedule_irq
powerpc/oprofile: Fix error in oprofile power7_marked_instr_event() function
powerpc/pasemi: Fix crash on reboot
powerpc: Fix MAX_STACK_TRACE_ENTRIES too low warning for ppc32
Jamie Gloudon [Wed, 23 Jan 2013 18:05:04 +0000 (18:05 +0000)]
via-rhine: add 64bit statistics.
Switch to use ndo_get_stats64 to get 64bit statistics.
Signed-off-by: Jamie Gloudon <jamie.gloudon@gmail.com>
Tested-by: Jamie Gloudon <jamie.gloudon@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David J. Choi [Wed, 23 Jan 2013 14:05:15 +0000 (14:05 +0000)]
drivers/net/phy/micrel_phy: Add support for new PHYs
Summary of changes:
.Newly added phys
-KSZ8081/KSZ8091, which has some phy ids.
-KSZ8061
-KSZ9031, which is Gigabit phy.
-KSZ886X, which has a switch function.
-KSZ8031, which has a same phy ids with KSZ8021.
Signed-off-by: David J. Choi <david.choi@micrel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Giuseppe CAVALLARO [Wed, 23 Jan 2013 00:30:03 +0000 (00:30 +0000)]
net: phy: realtek: add rtl8211e driver
This patch adds the minimal driver to manage the
Realtek RTL8211E 10/100/1000 Transceivers.
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Sun, 27 Jan 2013 15:55:21 +0000 (15:55 +0000)]
netpoll: use the net namespace of current process instead of init_net
This will allow us to setup netconsole in a different namespace
rather than where init_net is.
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Sun, 27 Jan 2013 15:55:20 +0000 (15:55 +0000)]
netpoll: use ipv6_addr_equal() to compare ipv6 addr
ipv6_addr_equal() is faster.
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Tue, 22 Jan 2013 21:29:39 +0000 (21:29 +0000)]
netpoll: add RCU annotation to npinfo field
dev->npinfo is protected by RCU.
This fixes the following sparse warnings:
net/core/netpoll.c:177:48: error: incompatible types in comparison expression (different address spaces)
net/core/netpoll.c:200:35: error: incompatible types in comparison expression (different address spaces)
net/core/netpoll.c:221:35: error: incompatible types in comparison expression (different address spaces)
net/core/netpoll.c:327:18: error: incompatible types in comparison expression (different address spaces)
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 28 Jan 2013 23:21:38 +0000 (18:21 -0500)]
Merge branch 'for-davem' of git://git./linux/kernel/git/linville/wireless-next
John W. Linville says:
====================
Included is an NFC pull. Samuel says:
"It brings the following goodies:
- LLCP socket timestamping (To be used e.g with the recently released nfctool
application for a more efficient skb timestamping when sniffing).
- A pretty big pn533 rework from Waldemar, preparing the driver to support
more flavours of pn533 based devices.
- HCI changes from Eric in preparation for the microread driver support.
- Some LLCP memory leak fixes, cleanups and slight improvements.
- pn544 and nfcwilink move to the devm_kzalloc API.
- An initial Secure Element (SE) API.
- An nfc.h license change from the original author, allowing non GPL
application code to safely include it."
Also included are a pair of mac80211 pulls. Johannes says:
"We found two bugs in the previous code, so I'm sending you a pull
request again this soon.
This contains two regulatory bug fixes, some of Thomas's hwsim beacon
timer work and a documentation fix from Bob."
"Another pull request for mac80211-next. This time, I have a number of
things, the patches are mostly self-explanatory. There are a few fixes
from Felix and myself, and random cleanups & improvements. The biggest
thing is the partial patchset from Marco preparing for mesh powersave."
Additionally, there are a pair of iwlwifi pulls. Johannes says:
"For iwlwifi-next, I have a few cleanups/improvements as well as a few
not very important fixes and more preparations for new devices."
"Please pull a few updates for iwlwifi. These are just some cleanups and
a debug improvement."
On top of that, there is a slew of driver updates. This includes
brcmfmac, mwifiex, ath9k, carl9170, and mwl8k as well as a handful
of others. The bcma and ssb busses get some attention as well.
Still, I don't see any big headliners here.
Also included is a pull of the wireless tree, in order to resolve
some merge conflicts.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 28 Jan 2013 23:18:17 +0000 (18:18 -0500)]
Merge branch 'master' of git://git./linux/kernel/git/jkirsher/net-next
Jeff Kirsher says:
====================
This series contains updates to e1000e, ixgbevf, igb and igbvf.
Majority of the patches are code cleanups of e1000e where code
is removed (Yeah!). The other two e1000e patches are fixes. The
first is to fix the maximum frame size for 82579 devices. The second
fix is to resolve an issue with devices other than 82579 that suffer
from dropped transactions on platforms with deep C-states when
jumbo frames are enabled.
The ixgbevf patch is to ensure that the driver fetches the correct,
refreshed value for link status and speed when the values have changed.
The igb and igbvf patches are a solution to an issue Stefan Assmann
reported, where when the PF is up and igbvf is loaded, the MAC address
is not generated using eth_hw_addr_random().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tiejun Chen [Tue, 15 Jan 2013 17:01:19 +0000 (17:01 +0000)]
powerpc: Max next_tb to prevent from replaying timer interrupt
With lazy interrupt, we always call __check_irq_replaysome with
decrementers_next_tb to check if we need to replay timer interrupt.
So in hotplug case we also need to set decrementers_next_tb as MAX
to make sure __check_irq_replay don't replay timer interrupt
when return as we expect, otherwise we'll trap here infinitely.
Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cong Ding [Mon, 14 Jan 2013 07:26:32 +0000 (07:26 +0000)]
powerpc: kernel/kgdb.c: Fix memory leakage
the variable backup_current_thread_info isn't freed before existing the
function.
Signed-off-by: Cong Ding <dinggnu@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tiejun Chen [Sun, 6 Jan 2013 00:49:34 +0000 (00:49 +0000)]
powerpc/book3e: Disable interrupt after preempt_schedule_irq
In preempt case current arch_local_irq_restore() from
preempt_schedule_irq() may enable hard interrupt but we really
should disable interrupts when we return from the interrupt,
and so that we don't get interrupted after loading SRR0/1.
Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Carl E. Love [Thu, 29 Nov 2012 06:42:03 +0000 (06:42 +0000)]
powerpc/oprofile: Fix error in oprofile power7_marked_instr_event() function
The calculation for the left shift of the mask OPROFILE_PM_PMCSEL_MSK has an
error. The calculation is should be to shift left by (max_cntrs - cntr) times
the width of the pmsel field width. However, the #define OPROFILE_MAX_PMC_NUM
was used instead of OPROFILE_PMSEL_FIELD_WIDTH. This patch fixes the
calculation.
Signed-off-by: Carl Love <cel@us.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Steven Rostedt [Mon, 21 Jan 2013 17:23:26 +0000 (17:23 +0000)]
powerpc/pasemi: Fix crash on reboot
commit
f96972f2dc "kernel/sys.c: call disable_nonboot_cpus() in
kernel_restart()"
added a call to disable_nonboot_cpus() on kernel_restart(), which tries
to shutdown all the CPUs except the first one. The issue with the PA
Semi, is that it does not support CPU hotplug.
When the call is made to __cpu_down(), it calls the notifiers
CPU_DOWN_PREPARE, and then tries to take the CPU down.
One of the notifiers to the CPU hotplug code, is the cpufreq. The
DOWN_PREPARE will call __cpufreq_remove_dev() which calls
cpufreq_driver->exit. The PA Semi exit handler unmaps regions of I/O
that is used by an interrupt that goes off constantly
(system_reset_common, but it goes off during normal system operations
too). I'm not sure exactly what this interrupt does.
Running a simple function trace, you can see it goes off quite a bit:
# tracer: function
#
# TASK-PID CPU# TIMESTAMP FUNCTION
# | | | | |
<idle>-0 [001] 1558.859363: .pasemi_system_reset_exception <-.system_reset_exception
<idle>-0 [000] 1558.860112: .pasemi_system_reset_exception <-.system_reset_exception
<idle>-0 [000] 1558.861109: .pasemi_system_reset_exception <-.system_reset_exception
<idle>-0 [001] 1558.861361: .pasemi_system_reset_exception <-.system_reset_exception
<idle>-0 [000] 1558.861437: .pasemi_system_reset_exception <-.system_reset_exception
When the region is unmapped, the system crashes with:
Disabling non-boot CPUs ...
Error taking CPU1 down: -38
Unable to handle kernel paging request for data at address 0xd0000800903a0100
Faulting instruction address: 0xc000000000055fcc
Oops: Kernel access of bad area, sig: 11 [#1]
PREEMPT SMP NR_CPUS=64 NUMA PA Semi PWRficient
Modules linked in: shpchp
NIP:
c000000000055fcc LR:
c000000000055fb4 CTR:
c0000000000df1fc
REGS:
c0000000012175d0 TRAP: 0300 Not tainted (3.8.0-rc4-test-dirty)
MSR:
9000000000009032 <SF,HV,EE,ME,IR,DR,RI> CR:
24000088 XER:
00000000
SOFTE: 0
DAR:
d0000800903a0100, DSISR:
42000000
TASK =
c0000000010e9008[0] 'swapper/0' THREAD:
c000000001214000 CPU: 0
GPR00:
d0000800903a0000 c000000001217850 c0000000012167e0 0000000000000000
GPR04:
0000000000000000 0000000000000724 0000000000000724 0000000000000000
GPR08:
0000000000000000 0000000000000000 0000000000000001 0000000000a70000
GPR12:
0000000024000080 c00000000fff0000 ffffffffffffffff 000000003ffffae0
GPR16:
ffffffffffffffff 0000000000a21198 0000000000000060 0000000000000000
GPR20:
00000000008fdd35 0000000000a21258 000000003ffffaf0 0000000000000417
GPR24:
0000000000a226d0 c000000000000000 0000000000000000 0000000000000000
GPR28:
c00000000138b358 0000000000000000 c000000001144818 d0000800903a0100
NIP [
c000000000055fcc] .set_astate+0x5c/0xa4
LR [
c000000000055fb4] .set_astate+0x44/0xa4
Call Trace:
[
c000000001217850] [
c000000000055fb4] .set_astate+0x44/0xa4 (unreliable)
[
c0000000012178f0] [
c00000000005647c] .restore_astate+0x2c/0x34
[
c000000001217980] [
c000000000054668] .pasemi_system_reset_exception+0x6c/0x88
[
c000000001217a00] [
c000000000019ef0] .system_reset_exception+0x48/0x84
[
c000000001217a80] [
c000000000001e40] system_reset_common+0x140/0x180
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Oliver Hartkopp [Mon, 28 Jan 2013 08:33:33 +0000 (08:33 +0000)]
can: rework skb reserved data handling
Added accessor and skb_reserve helpers for struct can_skb_priv.
Removed pointless skb_headroom() check.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
CC: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 28 Jan 2013 23:15:34 +0000 (15:15 -0800)]
Merge tag 'md-3.8-fixes' of git://neil.brown.name/md
Pull dmraid fix from NeilBrown:
"Just one fix for md in 3.8
dmraid assess redundancy and replacements slightly inaccurately which
could lead to some degraded arrays failing to assemble."
* tag 'md-3.8-fixes' of git://neil.brown.name/md:
DM-RAID: Fix RAID10's check for sufficient redundancy
Li Zhong [Sun, 2 Dec 2012 20:19:22 +0000 (20:19 +0000)]
powerpc: Fix MAX_STACK_TRACE_ENTRIES too low warning for ppc32
This patch fixes MAX_STACK_TRACE_ENTRIES too low warning for ppc32,
which is similar to commit
12660b17.
Reported-by: Christian Kujau <lists@nerdbynature.de>
Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Tested-by: Christian Kujau <lists@nerdbynature.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Linus Torvalds [Mon, 28 Jan 2013 19:53:49 +0000 (11:53 -0800)]
Merge git://git./linux/kernel/git/steve/gfs2-3.0-fixes
Pull GFS2 fix from Steven Whitehouse.
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes:
GFS2: fix skip unlock condition