Eric Dumazet [Mon, 4 Oct 2010 06:15:44 +0000 (06:15 +0000)]
net neigh: RCU conversion of neigh hash table
David
This is the first step for RCU conversion of neigh code.
Next patches will convert hash_buckets[] and "struct neighbour" to RCU
protected objects.
Thanks
[PATCH net-next] net neigh: RCU conversion of neigh hash table
Instead of storing hash_buckets, hash_mask and hash_rnd in "struct
neigh_table", a new structure is defined :
struct neigh_hash_table {
struct neighbour **hash_buckets;
unsigned int hash_mask;
__u32 hash_rnd;
struct rcu_head rcu;
};
And "struct neigh_table" has an RCU protected pointer to such a
neigh_hash_table.
This means the signature of (*hash)() function changed: We need to add a
third parameter with the actual hash_rnd value, since this is not
anymore a neigh_table field.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 4 Oct 2010 04:27:36 +0000 (04:27 +0000)]
net neigh: neigh_delete() and neigh_add() changes
neigh_delete() and neigh_add() dont need to touch device refcount,
we hold RTNL when calling them, so device cannot disappear under us.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 30 Sep 2010 21:06:55 +0000 (21:06 +0000)]
net: add a core netdev->rx_dropped counter
In various situations, a device provides a packet to our stack and we
drop it before it enters protocol stack :
- softnet backlog full (accounted in /proc/net/softnet_stat)
- bad vlan tag (not accounted)
- unknown/unregistered protocol (not accounted)
We can handle a per-device counter of such dropped frames at core level,
and automatically adds it to the device provided stats (rx_dropped), so
that standard tools can be used (ifconfig, ip link, cat /proc/net/dev)
This is a generalization of commit
8990f468a (net: rx_dropped
accounting), thus reverting it.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 5 Oct 2010 08:36:52 +0000 (01:36 -0700)]
ppp: Use a real SKB control block in fragmentation engine.
Do this instead of subverting fields in skb proper.
The macros that could very easily match variable or function
names were also just asking for trouble.
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Mon, 4 Oct 2010 20:17:53 +0000 (20:17 +0000)]
ipv6: make __ipv6_isatap_ifid static
Another exported symbol only used in one file
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Mon, 4 Oct 2010 20:14:17 +0000 (20:14 +0000)]
fib: fib_rules_cleanup can be static
fib_rules_cleanup_ups is only defined and used in one place.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 4 Oct 2010 20:00:18 +0000 (20:00 +0000)]
fib: cleanups
Code style cleanups before upcoming functional changes.
C99 initializer for fib_props array.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Mon, 4 Oct 2010 19:59:59 +0000 (19:59 +0000)]
wimax: make functions local
Make wimax variables and functions local if possible.
Compile tested only.
This also removes a couple of unused EXPORT_SYMBOL.
If this breaks some out of tree code, please fix that
by putting the code in the kernel tree.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Mon, 4 Oct 2010 15:44:30 +0000 (15:44 +0000)]
qlcnic: remove dead code
This driver has several pieces of dead code (found by running
make namespacecheck). This patch removes them.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Kaiser [Mon, 4 Oct 2010 04:35:39 +0000 (04:35 +0000)]
caif: remove duplicated include
Remove duplicated include.
Signed-off-by: Nicolas Kaiser <nikai@nikai.net>
Acked-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Uwe Kleine-König [Sun, 3 Oct 2010 23:43:33 +0000 (23:43 +0000)]
don't let BCM63XX_PHY depend on non-existant symbol
The kernel doesn't have a symbol called BCM63XX. There is a symbol
BCM63XX_ENET (introduced in
9b1fc55a0500, 6 weeks after
09bb9aa0ed that
introduced BCM63XX_PHY), but the driver compiles without that, too.
Cc: Maxime Bizon <mbizon@freebox.fr>
Cc: Florian Fainelli <florian@openwrt.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Uwe Kleine-König [Sun, 3 Oct 2010 23:43:32 +0000 (23:43 +0000)]
net/phy: fix many "defined but unused" warnings
MODULE_DEVICE_TABLE only expands to something if it's compiled
for a module. So when building-in support for the phys, the
mdio_device_id tables are unused. Marking them with __maybe_unused
fixes the following warnings:
drivers/net/phy/bcm63xx.c:134: warning: 'bcm63xx_tbl' defined but not used
drivers/net/phy/broadcom.c:933: warning: 'broadcom_tbl' defined but not used
drivers/net/phy/cicada.c:162: warning: 'cicada_tbl' defined but not used
drivers/net/phy/davicom.c:222: warning: 'davicom_tbl' defined but not used
drivers/net/phy/et1011c.c:114: warning: 'et1011c_tbl' defined but not used
drivers/net/phy/icplus.c:137: warning: 'icplus_tbl' defined but not used
drivers/net/phy/lxt.c:226: warning: 'lxt_tbl' defined but not used
drivers/net/phy/marvell.c:724: warning: 'marvell_tbl' defined but not used
drivers/net/phy/micrel.c:234: warning: 'micrel_tbl' defined but not used
drivers/net/phy/national.c:154: warning: 'ns_tbl' defined but not used
drivers/net/phy/qsemi.c:141: warning: 'qs6612_tbl' defined but not used
drivers/net/phy/realtek.c:82: warning: 'realtek_tbl' defined but not used
drivers/net/phy/smsc.c:257: warning: 'smsc_tbl' defined but not used
drivers/net/phy/ste10Xp.c:135: warning: 'ste10Xp_tbl' defined but not used
drivers/net/phy/vitesse.c:195: warning: 'vitesse_tbl' defined but not used
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 5 Oct 2010 07:29:48 +0000 (00:29 -0700)]
net: relax rtnl_dereference()
rtnl_dereference() is used in contexts where RTNL is held, to fetch an
RCU protected pointer.
Updates to this pointer are prevented by RTNL, so we dont need
smp_read_barrier_depends() and the ACCESS_ONCE() provided in
rcu_dereference_check().
rtnl_dereference() is mainly a macro to document the locking invariant.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 5 Oct 2010 07:27:05 +0000 (00:27 -0700)]
ipvs: Use frag walker helper in SCTP proto support.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Simon Horman <horms@verge.net.au>
Eric Dumazet [Sat, 2 Oct 2010 06:11:55 +0000 (06:11 +0000)]
net: dynamic ingress_queue allocation
ingress being not used very much, and net_device->ingress_queue being
quite a big object (128 or 256 bytes), use a dynamic allocation if
needed (tc qdisc add dev eth0 ingress ...)
dev_ingress_queue(dev) helper should be used only with RTNL taken.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sritej Velaga [Mon, 4 Oct 2010 04:20:16 +0000 (04:20 +0000)]
qlcnic: set mtu lower limit
Setting mtu < 68 is not supported.
Signed-off-by: Sritej Velaga <sritej.velaga@qlogic.com>
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sritej Velaga [Mon, 4 Oct 2010 04:20:15 +0000 (04:20 +0000)]
qlcnic: cleanup port mode setting
Port mode setting is not required for Qlogic CNA adapters.
Signed-off-by: Sritej Velaga <sritej.velaga@qlogic.com>
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sucheta Chakraborty [Mon, 4 Oct 2010 04:20:14 +0000 (04:20 +0000)]
qlcnic: sparse warning fixes
Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sucheta Chakraborty [Mon, 4 Oct 2010 04:20:13 +0000 (04:20 +0000)]
qlcnic: fix vlan TSO on big endian machine
o desc->vlan_tci is in __le16 format. Doing htons and
cpu_to_le64 again on vlan_tci, result in invalid value on ppc.
Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sucheta Chakraborty [Mon, 4 Oct 2010 04:20:12 +0000 (04:20 +0000)]
qlcnic: fix endianess for lro
ipaddress in ifa->ifa_address field are in big endian format.
Also device requires ip address in big endian only.
Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Kumar Salecha [Mon, 4 Oct 2010 04:20:11 +0000 (04:20 +0000)]
qlcnic: fix diag register
regs_buff[i] and diag_registers[j] array should use different index
variable.
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Kumar Salecha [Mon, 4 Oct 2010 04:20:10 +0000 (04:20 +0000)]
qlcnic: fix eswitch stats
Some of the counters are not implemented in fw.
Fw return NOT AVAILABLE VALUE as (0xffffffffffffffff).
Adding these counters, result in invalid value.
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Kumar Salecha [Mon, 4 Oct 2010 04:20:09 +0000 (04:20 +0000)]
qlcnic: fix internal loopback test
o Loop 10 times with delay of 1 ms to rcv packet.
o Print garbage packet.
o Try send/receive MAX(16) packet, instead of exit from test,
if a packet is not received.
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 4 Oct 2010 18:56:38 +0000 (11:56 -0700)]
Merge branch 'master' of /linux/kernel/git/davem/net-2.6
Conflicts:
net/ipv4/Kconfig
net/ipv4/tcp_timer.c
Eric Dumazet [Mon, 4 Oct 2010 05:17:54 +0000 (22:17 -0700)]
net: introduce DST_NOCACHE flag
While doing stress tests with IP route cache disabled, and multi queue
devices, I noticed a very high contention on one rwlock used in
neighbour code.
When many cpus are trying to send frames (possibly using a high
performance multiqueue device) to the same neighbour, they fight for the
neigh->lock rwlock in order to call neigh_hh_init(), and fight on
hh->hh_refcnt (a pair of atomic_inc/atomic_dec_and_test())
But we dont need to call neigh_hh_init() for dst that are used only
once. It costs four atomic operations at least, on two contended cache
lines, plus the high contention on neigh->lock rwlock.
Introduce a new dst flag, DST_NOCACHE, that is set when dst was not
inserted in route cache.
With the stress test bench, sending
160000000 frames on one neighbour,
results are :
Before patch:
real 2m28.406s
user 0m11.781s
sys 36m17.964s
After patch:
real 1m26.532s
user 0m12.185s
sys 20m3.903s
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 4 Oct 2010 05:14:37 +0000 (22:14 -0700)]
sctp: Fix break indentation in sctp_ioctl().
Signed-off-by: David S. Miller <davem@davemloft.net>
Sathya Perla [Mon, 4 Oct 2010 05:12:27 +0000 (22:12 -0700)]
be2net: add multiple RX queue support
This patch adds multiple RX queue support to be2net. There are
upto 4 extra rx-queues per port into which TCP/UDP traffic can be hashed into.
Some of the ethtool stats are now displayed on a per queue basis.
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 4 Oct 2010 05:09:32 +0000 (22:09 -0700)]
Merge branch 'for-davem' of git://git./linux/kernel/git/linville/wireless-next-2.6
Ursula Braun [Fri, 1 Oct 2010 02:51:13 +0000 (02:51 +0000)]
qeth: tagging with VLAN-ID 0
This patch adapts qeth to handle tagged frames with VLAN-ID 0 and
with or without priority information in the tag. It enables qeth to
receive priority-tagged frames on a base interface, for example from
z/OS, without configuring an additional VLAN interface.
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dimitris Michailidis [Thu, 30 Sep 2010 09:17:12 +0000 (09:17 +0000)]
cxgb4: remove a bogus PCI function number check
Remove a bogus PCI function number check from the driver's .remove
method that causes pci_release_regions not to be called for function 0
if additional functions are attached and one of them is used as primary.
Signed-off-by: Dimitris Michailidis <dm@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julia Lawall [Sat, 2 Oct 2010 04:37:07 +0000 (04:37 +0000)]
drivers/atm/idt77252.c: Remove unnecessary error check
This code does not call deinit_card(card); in an error case, as done in
other error-handling code in the same function. But actually, the called
function init_sram can only return 0, so there is no need for the error
check at all.
init_sram is also given a void return type, and its single return statement
at the end of the function is dropped.
A simplified version of the sematic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)
// <smpl>
@r exists@
@r@
statement S1,S2,S3;
constant C1,C2,C3;
@@
*if (...)
{... S1 return -C1;}
...
*if (...)
{... when != S1
return -C2;}
...
*if (...)
{... S1 return -C3;}
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Morton [Fri, 1 Oct 2010 11:17:12 +0000 (11:17 +0000)]
drivers-net-tulip-de4x5c-fix-copy-length-in-de4x5_ioctl-checkpatch-fixes
ERROR: trailing statements should be on next line
#23: FILE: drivers/net/tulip/de4x5.c:5477:
+ if (copy_to_user(ioc->data, tmp.lval, ioc->len)) return -EFAULT;
total: 1 errors, 0 warnings, 8 lines checked
./patches/drivers-net-tulip-de4x5c-fix-copy-length-in-de4x5_ioctl.patch has style problems, please review. If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Please run checkpatch prior to sending patches
Cc: Dan Rosenberg <dan.j.rosenberg@gmail.com>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Rosenberg [Fri, 1 Oct 2010 11:51:47 +0000 (11:51 +0000)]
sctp: Fix out-of-bounds reading in sctp_asoc_get_hmac()
The sctp_asoc_get_hmac() function iterates through a peer's hmac_ids
array and attempts to ensure that only a supported hmac entry is
returned. The current code fails to do this properly - if the last id
in the array is out of range (greater than SCTP_AUTH_HMAC_ID_MAX), the
id integer remains set after exiting the loop, and the address of an
out-of-bounds entry will be returned and subsequently used in the parent
function, causing potentially ugly memory corruption. This patch resets
the id integer to 0 on encountering an invalid id so that NULL will be
returned after finishing the loop if no valid ids are found.
Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Rosenberg [Fri, 1 Oct 2010 11:16:58 +0000 (11:16 +0000)]
sctp: prevent reading out-of-bounds memory
Two user-controlled allocations in SCTP are subsequently dereferenced as
sockaddr structs, without checking if the dereferenced struct members fall
beyond the end of the allocated chunk. There doesn't appear to be any
information leakage here based on how these members are used and
additional checking, but it's still worth fixing.
[akpm@linux-foundation.org: remove unfashionable newlines, fix gmail tab->space conversion]
Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Stevens [Thu, 30 Sep 2010 14:29:40 +0000 (14:29 +0000)]
ipv4: correct IGMP behavior on v3 query during v2-compatibility mode
A recent patch to allow IGMPv2 responses to IGMPv3 queries
bypasses length checks for valid query lengths, incorrectly
resets the v2_seen timer, and does not support IGMPv1.
The following patch responds with a v2 report as required
by IGMPv2 while correcting the other problems introduced
by the patch.
Signed-Off-By: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 1 Oct 2010 16:15:29 +0000 (16:15 +0000)]
ipmr: cleanups
Various code style cleanups
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 1 Oct 2010 16:15:08 +0000 (16:15 +0000)]
ipmr: RCU protection for mfc_cache_array
Use RCU & RTNL protection for mfc_cache_array[]
ipmr_cache_find() is called under rcu_read_lock();
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 1 Oct 2010 16:15:01 +0000 (16:15 +0000)]
ipmr: RCU conversion of mroute_sk
Use RCU and RTNL to protect (struct mr_table)->mroute_sk
Readers use RCU, writers use RTNL.
ip_ra_control() already use an RCU grace period before
ip_ra_destroy_rcu(), so we dont need synchronize_rcu() in
mrtsock_destruct()
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 1 Oct 2010 16:14:55 +0000 (16:14 +0000)]
ipmr: __pim_rcv() is called under rcu_read_lock
No need to get a reference on reg_dev and release it, we are in a
rcu_read_lock() protected section.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Fri, 1 Oct 2010 13:58:00 +0000 (13:58 +0000)]
gre: protocol table can be static
This table is only used in gre.c
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Sun, 3 Oct 2010 15:42:05 +0000 (15:42 +0000)]
netdev: Depend on INET before selecting INET_LRO
Since 'select' ignores dependencies, drivers that select INET_LRO must
depend on INET. This fixes the broken configuration reported in
<http://article.gmane.org/gmane.linux.kernel/825646>.
Reported-by: Subrata Modak <subrata@linux.vnet.ibm.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Sun, 3 Oct 2010 15:37:42 +0000 (15:37 +0000)]
Revert "ipv4: Make INET_LRO a bool instead of tristate."
This reverts commit
e81963b180ac502fda0326edf059b1e29cdef1a2.
LRO is now deprecated in favour of GRO, and only a few drivers use it,
so it is desirable to build it as a module in distribution kernels.
The original change to prevent building it as a module was made in an
attempt to avoid the case where some dependents are set to y and some
to m, and INET_LRO can be set to m rather than y. However, the
Kconfig system will reliably set INET_LRO=y in this case.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nagendra Tomar [Sat, 2 Oct 2010 23:45:06 +0000 (23:45 +0000)]
net: Fix the condition passed to sk_wait_event()
This patch fixes the condition (3rd arg) passed to sk_wait_event() in
sk_stream_wait_memory(). The incorrect check in sk_stream_wait_memory()
causes the following soft lockup in tcp_sendmsg() when the global tcp
memory pool has exhausted.
>>> snip <<<
localhost kernel: BUG: soft lockup - CPU#3 stuck for 11s! [sshd:6429]
localhost kernel: CPU 3:
localhost kernel: RIP: 0010:[sk_stream_wait_memory+0xcd/0x200] [sk_stream_wait_memory+0xcd/0x200] sk_stream_wait_memory+0xcd/0x200
localhost kernel:
localhost kernel: Call Trace:
localhost kernel: [sk_stream_wait_memory+0x1b1/0x200] sk_stream_wait_memory+0x1b1/0x200
localhost kernel: [<
ffffffff802557c0>] autoremove_wake_function+0x0/0x40
localhost kernel: [ipv6:tcp_sendmsg+0x6e6/0xe90] tcp_sendmsg+0x6e6/0xce0
localhost kernel: [sock_aio_write+0x126/0x140] sock_aio_write+0x126/0x140
localhost kernel: [xfs:do_sync_write+0xf1/0x130] do_sync_write+0xf1/0x130
localhost kernel: [<
ffffffff802557c0>] autoremove_wake_function+0x0/0x40
localhost kernel: [hrtimer_start+0xe3/0x170] hrtimer_start+0xe3/0x170
localhost kernel: [vfs_write+0x185/0x190] vfs_write+0x185/0x190
localhost kernel: [sys_write+0x50/0x90] sys_write+0x50/0x90
localhost kernel: [system_call+0x7e/0x83] system_call+0x7e/0x83
>>> snip <<<
What is happening is, that the sk_wait_event() condition passed from
sk_stream_wait_memory() evaluates to true for the case of tcp global memory
exhaustion. This is because both sk_stream_memory_free() and vm_wait are true
which causes sk_wait_event() to *not* call schedule_timeout().
Hence sk_stream_wait_memory() returns immediately to the caller w/o sleeping.
This causes the caller to again try allocation, which again fails and again
calls sk_stream_wait_memory(), and so on.
[ Bug introduced by commit
c1cbe4b7ad0bc4b1d98ea708a3fecb7362aa4088
("[NET]: Avoid atomic xchg() for non-error case") -DaveM ]
Signed-off-by: Nagendra Singh Tomar <tomer_iisc@yahoo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Maciej Żenczykowski [Sun, 3 Oct 2010 21:49:00 +0000 (14:49 -0700)]
net: Fix IPv6 PMTU disc. w/ asymmetric routes
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville [Fri, 1 Oct 2010 15:12:36 +0000 (11:12 -0400)]
Merge branch 'master' of git://git./linux/kernel/git/linville/wireless-next-2.6 into for-davem
Vasanthy Kolluri [Thu, 30 Sep 2010 13:36:05 +0000 (13:36 +0000)]
enic: Update MAINTAINERS
Update MAINTAINERS list
Signed-off-by: Vasanthy Kolluri <vkolluri@cisco.com>
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: David Wang <dwang2@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasanthy Kolluri [Thu, 30 Sep 2010 13:35:45 +0000 (13:35 +0000)]
enic: Make local functions static
Make functions used locally in a file as static
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Vasanthy Kolluri <vkolluri@cisco.com>
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: David Wang <dwang2@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasanthy Kolluri [Thu, 30 Sep 2010 13:35:34 +0000 (13:35 +0000)]
enic: Remove dead code
Removed code that is unused
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Vasanthy Kolluri <vkolluri@cisco.com>
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: David Wang <dwang2@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 30 Sep 2010 05:36:29 +0000 (05:36 +0000)]
neigh: reorder fields in struct neighbour
On 64bit arches, there are two 32bit holes that we can remove.
sizeof(struct neighbour) shrinks from 0xf8 to 0xf0 bytes
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tilman Schmidt [Thu, 30 Sep 2010 13:35:52 +0000 (13:35 +0000)]
isdn/gigaset: improve bas_gigaset USB error reporting
Rephrase some USB error messages to make them clearer and more consistent.
Downgrade some warning messages that may occur during normal operation to
debug messages.
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tilman Schmidt [Thu, 30 Sep 2010 13:35:42 +0000 (13:35 +0000)]
isdn/gigaset: fix bas_gigaset interrupt read error handling
Rework the handling of USB errors in interrupt input reads
to clear halts correctly, delay URB resubmission after errors,
limit retries, and improve error recovery.
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tilman Schmidt [Thu, 30 Sep 2010 13:35:31 +0000 (13:35 +0000)]
isdn/gigaset: unclog bas_gigaset AT response pipe
Recover from a lost HD_RECEIVEATDATA_ACK message by sending a
zero-length HD_READ_ATMESSAGE command when ev_layer sends "+++".
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tilman Schmidt [Thu, 30 Sep 2010 13:35:21 +0000 (13:35 +0000)]
isdn/gigaset: try USB reset for bas_gigaset error recovery
In error_reset(), if sending HD_RESET_INTERRUPT_PIPE to the device
fails, try performing an USB reset.
Also correct an error in the leading comment.
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tilman Schmidt [Thu, 30 Sep 2010 13:35:11 +0000 (13:35 +0000)]
isdn/gigaset: bas_gigaset timer cleanup
Use setup_timer() and mod_timer() instead of direct assignment to
timer structure members, simplify the argument of one timer routine,
and make extra sure all timers are stopped during suspend.
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tilman Schmidt [Thu, 30 Sep 2010 13:35:01 +0000 (13:35 +0000)]
isdn/gigaset: drop obsolete debug option
Remove the debug flag DEBUG_DRIVER and associated code.
It doesn't serve any useful purpose anymore.
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tilman Schmidt [Thu, 30 Sep 2010 13:34:51 +0000 (13:34 +0000)]
isdn/gigaset: correct bas_gigaset rx buffer handling
In transparent data reception, avoid a NULL pointer dereference
in case an skbuff cannot be allocated, remove an inappropriate
call to the HDLC flush routine, and correct the accounting of
received bytes for continued buffers.
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
CC: stable <stable@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tilman Schmidt [Thu, 30 Sep 2010 13:34:40 +0000 (13:34 +0000)]
isdn/gigaset: fix bas_gigaset AT read error handling
Rework the handling of USB errors in AT response reads
to fix a possible infinite retry loop and a memory leak,
and silence a few overly verbose kernel messages.
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
CC: stable <stable@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tilman Schmidt [Thu, 30 Sep 2010 13:34:30 +0000 (13:34 +0000)]
isdn/gigaset: bas_gigaset locking fix
Unlock cs->lock before calling error_hangup() which is marked
"cs->lock must not be held".
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
CC: stable <stable@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Matt Carlson [Thu, 30 Sep 2010 10:34:37 +0000 (10:34 +0000)]
tg3: Update version to 3.114
This patch updates the tg3 version to 3.114.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Matt Carlson [Thu, 30 Sep 2010 10:34:36 +0000 (10:34 +0000)]
tg3: Add extend rx ring sizes for 5717 and 5719
This patch increases the rx ring sizes for those asic revs that support
them.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Matt Carlson [Thu, 30 Sep 2010 10:34:35 +0000 (10:34 +0000)]
tg3: Prepare for larger rx ring sizes
This patch adds two new variables to track the size of the standard and
jumbo rx producer ring sizes. The code is then pivoted to these
variables from preprocessor constants.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Matt Carlson [Thu, 30 Sep 2010 10:34:34 +0000 (10:34 +0000)]
tg3: Futureproof the loopback test
There are other multiqueue modes 5717 and 5719 devices can assume. This
patch makes sure that the loopback test is safe, should those other
modes be enabled in the future.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Matt Carlson [Thu, 30 Sep 2010 10:34:33 +0000 (10:34 +0000)]
tg3: Cleanup missing VPD partno section
This patch cleans up the default VPD partno section. New entries for
5717 asic rev devices were also added.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Matt Carlson [Thu, 30 Sep 2010 10:34:32 +0000 (10:34 +0000)]
tg3: Remove 5724 device ID
This product was never released to the public.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Matt Carlson [Thu, 30 Sep 2010 10:34:31 +0000 (10:34 +0000)]
tg3: 5719: Prevent tx data corruption
This patch enables a bit that prevents read DMA overflows and adjusts
the txmbuf margin from the hardware default. The combination of these
modifications prevents a tx data corruption issue we were seeing on the
5719.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Matt Carlson [Thu, 30 Sep 2010 10:34:30 +0000 (10:34 +0000)]
tg3: Fix potential netpoll crash
Up until now the tg3 driver would call netif_napi_add() for the maximum
number of NAPI instances the driver could use. The problem is that
netpoll could call tg3_poll() on instances that are not active. The net
effect is that the driver will crash attempting to dereference
uninitialized pointers.
The fix is to only allocate as many NAPI instances as the driver would
use in tg3_open() and deleted them in tg3_close().
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 30 Sep 2010 03:33:58 +0000 (03:33 +0000)]
ipv4: rcu conversion in ip_route_output_slow
ip_route_output_slow() is enclosed in an rcu_read_lock() protected
section, so that no references are taken/released on device, thanks to
__ip_dev_find() & dev_get_by_index_rcu()
Tested with ip route cache disabled, and a stress test :
Before patch:
elapsed time :
real 1m38.347s
user 0m11.909s
sys 23m51.501s
Profile:
13788.00 22.7% ip_route_output_slow [kernel]
7875.00 13.0% dst_destroy [kernel]
3925.00 6.5% fib_semantic_match [kernel]
3144.00 5.2% fib_rules_lookup [kernel]
3061.00 5.0% dst_alloc [kernel]
2276.00 3.7% rt_set_nexthop [kernel]
1762.00 2.9% fib_table_lookup [kernel]
1538.00 2.5% _raw_read_lock [kernel]
1358.00 2.2% ip_output [kernel]
After patch:
real 1m28.808s
user 0m13.245s
sys 20m37.293s
10950.00 17.2% ip_route_output_slow [kernel]
10726.00 16.9% dst_destroy [kernel]
5170.00 8.1% fib_semantic_match [kernel]
3937.00 6.2% dst_alloc [kernel]
3635.00 5.7% rt_set_nexthop [kernel]
2900.00 4.6% fib_rules_lookup [kernel]
2240.00 3.5% fib_table_lookup [kernel]
1427.00 2.2% _raw_read_lock [kernel]
1157.00 1.8% kmem_cache_alloc [kernel]
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 30 Sep 2010 03:31:56 +0000 (03:31 +0000)]
ipv4: introduce __ip_dev_find()
ip_dev_find(net, addr) finds a device given an IPv4 source address and
takes a reference on it.
Introduce __ip_dev_find(), taking a third argument, to optionally take
the device reference. Callers not asking the reference to be taken
should be in an rcu_read_lock() protected section.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 30 Sep 2010 02:16:44 +0000 (02:16 +0000)]
vlan: dont drop packets from unknown vlans in promiscuous mode
Roger Luethi noticed packets for unknown VLANs getting silently dropped
even in promiscuous mode.
Check for promiscuous mode in __vlan_hwaccel_rx() and vlan_gro_common()
before drops.
As suggested by Patrick, mark such packets to have skb->pkt_type set to
PACKET_OTHERHOST to make sure they are dropped by IP stack.
Reported-by: Roger Luethi <rl@hellgate.ch>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bruce Allan [Wed, 29 Sep 2010 21:39:37 +0000 (21:39 +0000)]
e1000e: 82579 performance improvements
The initial support for 82579 was tuned poorly for performance. Adjust the
packet buffer allocation appropriately for both standard and jumbo frames;
and for jumbo frames increase the receive descriptor pre-fetch, disable
adaptive interrupt moderation and set the DMA latency tolerance.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesse Brandeburg [Wed, 29 Sep 2010 21:38:49 +0000 (21:38 +0000)]
e1000e: use hardware writeback batching
Most e1000e parts support batching writebacks. The problem with this is
that when some of the TADV or TIDV timers are not set, Tx can sit forever.
This is solved in this patch with write flushes using the Flush Partial
Descriptors (FPD) bit in TIDV and RDTR.
This improves bus utilization and removes partial writes on e1000e,
particularly from 82571 parts in S5500 chipset based machines.
Only ES2LAN and 82571/2 parts are included in this optimization, to reduce
testing load.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Emil Tantilov [Wed, 29 Sep 2010 21:35:23 +0000 (21:35 +0000)]
ixgbe: fix link issues and panic with shared interrupts for 82598
Fix possible panic/hang with shared Legacy interrupts by not enabling
interrupts when interface is down.
Also fixes an intermittent link by enabling LSC upon exit from ixgbe_intr()
This patch adds flags to ixgbe_irq_enable() to allow for some flexibility
when enabling interrupts.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 29 Sep 2010 11:53:50 +0000 (11:53 +0000)]
ipv4: __mkroute_output() speedup
While doing stress tests with a disabled IP route cache, I found
__mkroute_output() was touching three times in_device atomic refcount.
Use RCU to touch it once to reduce cache line ping pongs.
Before patch
time to perform the test
real 1m42.009s
user 0m12.545s
sys 25m0.726s
Profile :
16109.00 26.4% ip_route_output_slow vmlinux
7434.00 12.2% dst_destroy vmlinux
3280.00 5.4% fib_rules_lookup vmlinux
3252.00 5.3% fib_semantic_match vmlinux
2622.00 4.3% fib_table_lookup vmlinux
2535.00 4.1% dst_alloc vmlinux
1750.00 2.9% _raw_read_lock vmlinux
1532.00 2.5% rt_set_nexthop vmlinux
After patch
real 1m36.503s
user 0m12.977s
sys 23m25.608s
14234.00 22.4% ip_route_output_slow vmlinux
8717.00 13.7% dst_destroy vmlinux
4052.00 6.4% fib_rules_lookup vmlinux
3951.00 6.2% fib_semantic_match vmlinux
3191.00 5.0% dst_alloc vmlinux
1764.00 2.8% fib_table_lookup vmlinux
1692.00 2.7% _raw_read_lock vmlinux
1605.00 2.5% rt_set_nexthop vmlinux
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rémi Denis-Courmont [Wed, 29 Sep 2010 22:33:50 +0000 (22:33 +0000)]
Phonet: restore flow control credits when sending fails
This patch restores the below flow control patch submitted by Rémi
Denis-Courmont, which accidentaly got lost due to Pipe controller patch
on Phonet.
commit
1a98214feef2221cd7c24b17cd688a5a9d85b2ea
Author: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Date: Mon Aug 30 12:57:03 2010 +0000
Phonet: restore flow control credits when sending fails
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Kumar Sanghvi <kumar.sanghvi@stericsson.com>
Acked-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Philip Rakity [Tue, 28 Sep 2010 04:26:30 +0000 (04:26 +0000)]
net: pxa168_etc.c recognize additional contributors
Signed-off-by: Philip Rakity <prakity@marvell.com>
Signed-off-by: Sachin Sanap <ssanap@marvell.com>
Signed-off-by: Mark Brown <markb@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 30 Sep 2010 19:02:22 +0000 (12:02 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/linville/wireless-2.6
Eric Dumazet [Thu, 30 Sep 2010 06:35:10 +0000 (23:35 -0700)]
ip_gre: comments change
HARD_TX_LOCK no longer protects tunnels from dead loops,
but xmit_recursion percpu counter.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ondrej Zary [Tue, 28 Sep 2010 08:46:17 +0000 (08:46 +0000)]
de2104x: remove experimental status
It should be ready after 8 years...remove the experimental dependency.
Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ondrej Zary [Tue, 28 Sep 2010 08:18:55 +0000 (08:18 +0000)]
de2104x: disable media debug messages by default
Print media debug messages only when HW debug is enabled.
Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Gallatin [Tue, 28 Sep 2010 08:13:12 +0000 (08:13 +0000)]
myri10ge: DCA update (resubmit)
This patch contains the following DCA improvements to myri10ge:
1) Finally move myri10ge to use dca3 API
2) Disable PCIe relaxed ordering when enabling DCA on
myri10ge. This provides a performance boost on Nehalem
based Xeons
3) Make sure to properly initialize NIC's DCA state when it is enabled,
rather than giving the NIC a bogus tag (0) and waiting for
the first received packet to trigger an update. Not using a
real tag can cause hardware exceptions on some motherboards
when a CPU socket is empty.
3) Always update the cached CPU when our interrupt affinity changes
so as to avoid excessive calls to dca3_get_tag()
Signed-off-by: Andrew Gallatin <gallatin@myri.com>
Signed-off-by: Loic Prylli <loic@myri.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Kravkov [Wed, 29 Sep 2010 01:05:37 +0000 (01:05 +0000)]
bnx2x: Moved enabling of MSI to the bnx2x_set_num_queues()
Moved enabling of MSI to the bnx2x_set_num_queues() - the same functions that
handles the initialization of the MSI-X.
From: Vladislav Zolotarov <vladz@broadcom.com>
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Vladislav Zolotarov <vladz@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Tue, 28 Sep 2010 19:30:14 +0000 (19:30 +0000)]
tcp: tcp_enter_quickack_mode can be static
Function only used in tcp_input.c
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Tue, 28 Sep 2010 17:08:02 +0000 (17:08 +0000)]
arp: remove unnecessary export of arp_broken_ops
arp_broken_ops is only used in arp.c
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kumar Sanghvi [Mon, 27 Sep 2010 23:10:42 +0000 (23:10 +0000)]
Phonet: Correct header retrieval after pskb_may_pull
Retrieve the header after doing pskb_may_pull since, pskb_may_pull
could change the buffer structure.
This is based on the comment given by Eric Dumazet on Phonet
Pipe controller patch for a similar problem.
Signed-off-by: Kumar Sanghvi <kumar.sanghvi@stericsson.com>
Acked-by: Linus Walleij <linus.walleij@stericsson.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Boaz Harrosh [Wed, 29 Sep 2010 08:34:27 +0000 (08:34 +0000)]
um: Proper Fix for
f25c80a4: remove duplicate structure field initialization
uml_net_set_mac() was broken and luckily it was never used, before.
What it was trying to do is spin_lock before memcopy the mac address.
Linus attempted to fix it in assumption that someone decided the
lock was needed. But since it was never ever used at all, and was
just dead code, I think we can assume that it is not needed, after
all.
On the other hand patch [
f25c80a4] was trying to use eth_mac_addr()
in eth_configure(), *which was the real fallout*. Because of state
checks done inside eth_mac_addr() the address was never set. I have
not reintroduced the memcpy wrapper, but I've put a comment for future
cats.
The code now is back to exactly as it was before [
f25c80a4]. With
the cleanup applied. If the spin_lock is indeed needed then a contender
should supply a test case that fails, then fix it with the proper
locking, as a separate unrelated patch.
CC: Julia Lawall <julia@diku.dk>
CC: David S. Miller <davem@davemloft.net>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Al Viro <viro@ZenIV.linux.org.uk>
Tested-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 28 Sep 2010 05:58:37 +0000 (05:58 +0000)]
net: rename netdev rx_queue to ingress_queue
There is some confusion with rx_queue name after RPS, and net drivers
private rx_queue fields.
I suggest to rename "struct net_device"->rx_queue to ingress_queue.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 28 Sep 2010 03:23:34 +0000 (03:23 +0000)]
ip6tnl: percpu stats accounting
Maintain per_cpu tx_bytes, tx_packets, rx_bytes, rx_packets.
Other seldom used fields are kept in netdev->stats structure, possibly
unsafe.
This is a preliminary work to support lockless transmit path, and
correct RX stats, that are already unsafe.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 28 Sep 2010 00:17:17 +0000 (00:17 +0000)]
ipip: enable lockless xmits
IPIP tunnels can benefit from lockless xmits, using NETIF_F_LLTX
Bench on a 16 cpus machine (dual E5540 cpus), 16 threads sending
10000000 UDP frames via one ipip tunnel (size:200 bytes per frame)
Before patch :
real 2m53.321s
user 0m10.277s
sys 46m0.597s
After patch:
real 0m32.063s
user 0m9.237s
sys 8m16.255s
Last problem to solve is the contention on dst :
16118.00 28.3% __ip_route_output_key vmlinux
6135.00 10.8% dst_release vmlinux
3220.00 5.6% ip_finish_output vmlinux
2149.00 3.8% ip_route_output_flow vmlinux
1575.00 2.8% ip_append_data vmlinux
1481.00 2.6% ip_push_pending_frames vmlinux
1349.00 2.4% __xfrm_lookup vmlinux
1216.00 2.1% csum_partial_copy_generic vmlinux
1208.00 2.1% udp_sendmsg vmlinux
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 27 Sep 2010 23:05:47 +0000 (23:05 +0000)]
ip_gre: lockless xmit
GRE tunnels can benefit from lockless xmits, using NETIF_F_LLTX
Note: If tunnels are created with the "oseq" option, LLTX is not
enabled :
Even using an atomic_t o_seq, we would increase chance for packets being
out of order at receiver.
Bench on a 16 cpus machine (dual E5540 cpus), 16 threads sending
10000000 UDP frames via one gre tunnel (size:200 bytes per frame)
Before patch :
real 3m0.094s
user 0m9.365s
sys 47m50.103s
After patch:
real 0m29.756s
user 0m11.097s
sys 7m33.012s
Last problem to solve is the contention on dst :
38660.00 21.4% __ip_route_output_key vmlinux
20786.00 11.5% dst_release vmlinux
14191.00 7.8% __xfrm_lookup vmlinux
12410.00 6.9% ip_finish_output vmlinux
4540.00 2.5% ip_push_pending_frames vmlinux
4427.00 2.4% ip_append_data vmlinux
4265.00 2.4% __alloc_skb vmlinux
4140.00 2.3% __ip_local_out vmlinux
3991.00 2.2% dev_queue_xmit vmlinux
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 28 Sep 2010 02:53:41 +0000 (02:53 +0000)]
sit: enable lockless xmits
SIT tunnels can benefit from lockless xmits, using NETIF_F_LLTX
Bench on a 16 cpus machine (dual E5540 cpus), 16 threads sending
10000000 UDP frames via one sit tunnel (size:220 bytes per frame)
Before patch :
real 3m15.399s
user 0m9.185s
sys 51m55.403s
75029.00 87.5% _raw_spin_lock vmlinux
1090.00 1.3% dst_release vmlinux
902.00 1.1% dev_queue_xmit vmlinux
627.00 0.7% sock_wfree vmlinux
613.00 0.7% ip6_push_pending_frames ipv6.ko
505.00 0.6% __ip_route_output_key vmlinux
After patch:
real 1m1.387s
user 0m12.489s
sys 15m58.868s
28239.00 23.3% dst_release vmlinux
13570.00 11.2% ip6_push_pending_frames ipv6.ko
13118.00 10.8% ip6_append_data ipv6.ko
7995.00 6.6% __ip_route_output_key vmlinux
7924.00 6.5% sk_dst_check vmlinux
5015.00 4.1% udpv6_sendmsg ipv6.ko
3594.00 3.0% sock_alloc_send_pskb vmlinux
3135.00 2.6% sock_wfree vmlinux
3055.00 2.5% ip6_sk_dst_lookup ipv6.ko
2473.00 2.0% ip_finish_output vmlinux
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 28 Sep 2010 02:17:58 +0000 (02:17 +0000)]
sit: fix percpu stats accounting
commit
15fc1f7056ebd (sit: percpu stats accounting) forgot the fallback
tunnel case (sit0), and can crash pretty fast.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 27 Sep 2010 23:56:46 +0000 (23:56 +0000)]
ipip: fix percpu stats accounting
commit
3c97af99a5aa1 (ipip: percpu stats accounting) forgot the fallback
tunnel case (tunl0), and can crash pretty fast.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 27 Sep 2010 20:50:33 +0000 (20:50 +0000)]
dummy: percpu stats and lockless xmit
Converts dummy network device driver to :
- percpu stats
- 64bit stats
- lockless xmit (NETIF_F_LLTX)
- performance features added (NETIF_F_SG | NETIF_F_FRAGLIST |
NETIF_F_TSO | NETIF_F_NO_CSUM | NETIF_F_HIGHDMA)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 29 Sep 2010 20:23:09 +0000 (13:23 -0700)]
net: add a recursion limit in xmit path
As tunnel devices are going to be lockless, we need to make sure a
misconfigured machine wont enter an infinite loop.
Add a percpu variable, and limit to three the number of stacked xmits.
Reported-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Maciej Żenczykowski [Mon, 27 Sep 2010 00:07:02 +0000 (00:07 +0000)]
ipv6: Implement Any-IP support for IPv6.
AnyIP is the capability to receive packets and establish incoming
connections on IPs we have not explicitly configured on the machine.
An example use case is to configure a machine to accept all incoming
traffic on eth0, and leave the policy of whether traffic for a given IP
should be delivered to the machine up to the load balancer.
Can be setup as follows:
ip -6 rule from all iif eth0 lookup 200
ip -6 route add local default dev lo table 200
(in this case for all IPv6 addresses)
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Sun, 23 May 2010 19:54:12 +0000 (19:54 +0000)]
ipv4: Allow configuring subnets as local addresses
This patch allows a host to be configured to respond to any address in
a specified range as if it were local, without actually needing to
configure the address on an interface. This is done through routing
table configuration. For instance, to configure a host to respond
to any address in 10.1/16 received on eth0 as a local address we can do:
ip rule add from all iif eth0 lookup 200
ip route add local 10.1/16 dev lo proto kernel scope host src 127.0.0.1 table 200
This host is now reachable by any 10.1/16 address (route lookup on
input for packets received on eth0 can find the route). On output, the
rule will not be matched so that this host can still send packets to
10.1/16 (not sent on loopback). Presumably, external routing can be
configured to make sense out of this.
To make this work, we needed to modify the logic in finding the
interface which is assigned a given source address for output
(dev_ip_find). We perform a normal fib_lookup instead of just a
lookup on the local table, and in the lookup we ignore the input
interface for matching.
This patch is useful to implement IP-anycast for subnets of virtual
addresses.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 29 Sep 2010 05:37:56 +0000 (22:37 -0700)]
ip_gre: Fix dependencies wrt. ipv6.
The GRE tunnel driver needs to invoke icmpv6 helpers in the
ipv6 stack when ipv6 support is enabled.
Therefore if IPV6 is enabled, we have to enforce that GRE's
enabling (modular or static) matches that of ipv6.
Reported-by: Patrick McHardy <kaber@trash.net>
Reported-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Damian Lukowski [Tue, 28 Sep 2010 20:08:32 +0000 (13:08 -0700)]
net-2.6: SYN retransmits: Add new parameter to retransmits_timed_out()
Fixes kernel Bugzilla Bug 18952
This patch adds a syn_set parameter to the retransmits_timed_out()
routine and updates its callers. If not set, TCP_RTO_MIN is taken
as the calculation basis as before. If set, TCP_TIMEOUT_INIT is
used instead, so that sysctl_syn_retries represents the actual
amount of SYN retransmissions in case no SYNACKs are received when
establishing a new connection.
Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville [Tue, 28 Sep 2010 20:01:47 +0000 (16:01 -0400)]
Merge branch 'master' of git://git./linux/kernel/git/linville/wireless-2.6
Conflicts:
drivers/net/wireless/iwlwifi/iwl-agn-lib.c
drivers/net/wireless/iwlwifi/iwl3945-base.c
Juuso Oikarinen [Tue, 28 Sep 2010 11:39:32 +0000 (14:39 +0300)]
mac80211: Fix WMM driver queue configuration
The WMM parameter configuration function (ieee80211_sta_wmm_params) only
configures the WMM parameters to the driver is the wmm_last_param_set
counter value is changed by the AP.
The wmm_last_param_set is initialized to -1 on association in order to ensure
the configuration is made to the driver at least once on association, but
currently this initialization is done *after* the WMM parameter configuration
function was called.
This leads to unreliability in the driver getting properly configured on first
association (depending on what counter value the AP happens to use.) When
disassociating (the wmm default parameters are configured to the driver) and
then reassociating, due to the above the WMM configuration is not set to the
driver at all.
On drivers without beacon filtering the problem is corrected by later beacons,
but on drivers with beacon filtering the WMM will remain permanently incorrectly
configured.
Fix this by moving the initialization of wmm_last_param_set to -1 before
ieee80211_sta_wmm_params is called on association.
Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>