Linus Torvalds [Fri, 9 May 2008 15:01:19 +0000 (08:01 -0700)]
Merge git://git./linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
sit: Add missing kfree_skb() on pskb_may_pull() failure.
tipc: Increase buffer header to support worst-case device
Rusty Russell [Fri, 9 May 2008 06:25:28 +0000 (16:25 +1000)]
module: don't ignore vermagic string if module doesn't have modversions
Linus found a logic bug: we ignore the version number in a module's
vermagic string if we have CONFIG_MODVERSIONS set, but modversions
also lets through a module with no __versions section for modprobe
--force (with tainting, but still).
We should only ignore the start of the vermagic string if the module
actually *has* crcs to check. Rather than (say) having an
entertaining hissy fit and creating a config option to work around the
buggy code.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rusty Russell [Fri, 9 May 2008 06:24:21 +0000 (16:24 +1000)]
module: be more picky about allowing missing module versions
We allow missing __versions sections, because modprobe --force strips
it. It makes less sense to allow sections where there's no version
for a specific symbol the module uses, so disallow that.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rusty Russell [Fri, 9 May 2008 06:23:17 +0000 (16:23 +1000)]
module: put modversions in vermagic
Don't allow a module built without versions altogether to be inserted
into a kernel which expects modversions.
modprobe --force will strip vermagic as well as modversions, so it
won't be effected, but this will make sure that a
non-CONFIG_MODVERSIONS module won't be accidentally inserted into a
CONFIG_MODVERSIONS kernel.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David S. Miller [Fri, 9 May 2008 06:40:26 +0000 (23:40 -0700)]
sit: Add missing kfree_skb() on pskb_may_pull() failure.
Noticed by Paul Marks <paul@pmarks.net>.
Signed-off-by: David S. Miller <davem@davemloft.net>
Allan Stephens [Fri, 9 May 2008 04:38:24 +0000 (21:38 -0700)]
tipc: Increase buffer header to support worst-case device
This patch increases the headroom TIPC reserves in each sk_buff
to accommodate the largest possible link level device header.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 9 May 2008 02:03:26 +0000 (19:03 -0700)]
Merge git://git./linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (32 commits)
net: Added ASSERT_RTNL() to dev_open() and dev_close().
can: Fix can_send() handling on dev_queue_xmit() failures
netns: Fix arbitrary net_device-s corruptions on net_ns stop.
netfilter: Kconfig: default DCCP/SCTP conntrack support to the protocol config values
netfilter: nf_conntrack_sip: restrict RTP expect flushing on error to last request
macvlan: Fix memleak on device removal/crash on module removal
net/ipv4: correct RFC 1122 section reference in comment
tcp FRTO: SACK variant is errorneously used with NewReno
e1000e: don't return half-read eeprom on error
ucc_geth: Don't use RX clock as TX clock.
cxgb3: Use CAP_SYS_RAWIO for firmware
pcnet32: delete non NAPI code from driver.
fs_enet: Fix a memory leak in fs_enet_mdio_probe
[netdrvr] eexpress: IPv6 fails - multicast problems
3c59x: use netstats in net_device structure
3c980-TX needs EXTRA_PREAMBLE
fix warning in drivers/net/appletalk/cops.c
e1000e: Add support for BM PHYs on ICH9
uli526x: fix endianness issues in the setup frame
uli526x: initialize the hardware prior to requesting interrupts
...
Linus Torvalds [Fri, 9 May 2008 02:03:19 +0000 (19:03 -0700)]
Merge git://git./linux/kernel/git/davem/sparc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
sparc: Fix SA_ONSTACK signal handling.
Linus Torvalds [Fri, 9 May 2008 01:41:48 +0000 (18:41 -0700)]
Revert "PCI: remove default PCI expansion ROM memory allocation"
This reverts commit
9f8daccaa05c14e5643bdd4faf5aed9cc8e6f11e, which was
reported to break X startup (xf86-video-ati-6.8.0). See
http://bugs.freedesktop.org/show_bug.cgi?id=15523
for details.
Reported-by: Laurence Withers <l@lwithers.me.uk>
Cc: Gary Hade <garyhade@us.ibm.com>
Cc: Greg KH <greg@kroah.com>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: "Jun'ichi Nomura" <j-nomura@ce.jp.nec.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 8 May 2008 18:31:07 +0000 (11:31 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/mingo/linux-2.6-sched-fixes
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-fixes:
sched: fix weight calculations
semaphore: fix
Linus Torvalds [Thu, 8 May 2008 17:58:45 +0000 (10:58 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/tiwai/sound-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
[ALSA] soc at91 minor bug fixes
[ALSA] soc - at91-pcm - Fix line wrapping
pcspkr: fix dependancies
Huang Weiyi [Thu, 8 May 2008 14:48:31 +0000 (22:48 +0800)]
Remove duplicated include in net/sunrpc/svc.c
<linux/sched.h> we included twice.
Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Huang Weiyi [Thu, 8 May 2008 14:36:27 +0000 (22:36 +0800)]
fs/proc/task_mmu.c: remove duplicated include files
Removed duplicated include files <linux/ptrace.h> and <linux/seq_file.h> in
fs/proc/task_mmu.c.
Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ingo Molnar [Wed, 30 Apr 2008 07:48:07 +0000 (09:48 +0200)]
Fix drivers/media build for modular builds
Fix allmodconfig build bug introduced in latest -git by commit
7c91f0624a9 ("V4L/DVB(7767): Move tuners to common/tuners"):
LD kernel/built-in.o
LD drivers/built-in.o
ld: drivers/media/built-in.o: No such file: No such file or directory
which happens if all media drivers are modular:
http://redhat.com/~mingo/misc/config-Wed_Apr_30_09_24_48_CEST_2008.bad
In that case there's no obj-y rule connecting all the built-in.o files and
the link tree breaks.
The fix is to add a guaranteed obj-y rule for the core vmlinux to build.
(which results in an empty object file if all media drivers are modular)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 8 May 2008 17:50:34 +0000 (10:50 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IB/ehca: Wait for async events to finish before destroying QP
IB/ipath: Fix SDMA error recovery in absence of link status change
IB/ipath: Need to always request and handle PIO avail interrupts
IB/ipath: Fix count of packets received by kernel
IB/ipath: Return the correct opcode for RDMA WRITE with immediate
IB/ipath: Fix bug that can leave sends disabled after freeze recovery
IB/ipath: Only increment SSN if WQE is put on send queue
IB/ipath: Only warn about prototype chip during init
RDMA/cxgb3: Fix severe limit on userspace memory registration size
RDMA/cxgb3: Don't add PBL memory to gen_pool in chunks
David Howells [Wed, 7 May 2008 14:31:54 +0000 (15:31 +0100)]
MN10300: Make cpu_relax() invoke barrier()
Make cpu_relax() invoke barrier() to be the same as other arches.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 8 May 2008 17:48:36 +0000 (10:48 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
Revert "relay: fix splice problem"
docbook: fix bio missing parameter
block: use unitialized_var() in bio_alloc_bioset()
block: avoid duplicate calls to get_part() in disk stat code
cfq-iosched: make io priorities inherit CPU scheduling class as well as nice
block: optimize generic_unplug_device()
block: get rid of likely/unlikely predictions in merge logic
vfs: splice remove_suid() cleanup
cfq-iosched: fix RCU race in the cfq io_context destructor handling
block: adjust tagging function queue bit locking
block: sysfs store function needs to grab queue_lock and use queue_flag_*()
Linus Torvalds [Thu, 8 May 2008 17:48:03 +0000 (10:48 -0700)]
Merge branch 'for_linus' of git://git./linux/kernel/git/jack/linux-udf-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6:
udf: Fix memory corruption when fs mounted with noadinicb option
udf: Make udf exportable
udf: fs/udf/partition.c:udf_get_pblock() mustn't be inline
Linus Torvalds [Thu, 8 May 2008 17:47:39 +0000 (10:47 -0700)]
Merge branch 'for-linus' of git://git390.osdl.marist.edu/linux-2.6
* 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6:
[S390] guest page hinting light
[S390] tty3270: fix put_char fail/success conversion.
[S390] compat ptrace cleanup
[S390] s390mach compile warning
[S390] cio: Fix parsing mechanism for blacklisted devices.
[S390] cio: Remove cio_msg kernel parameter.
[S390] s390-kvm: leave sie context on work. Removes preemption requirement
[S390] s390: Optimize user and work TIF check
Andrew Morton [Wed, 7 May 2008 03:42:42 +0000 (20:42 -0700)]
drivers/scsi/dpt_i2o.c: fix build on alpha
alpha:
drivers/scsi/dpt_i2o.c:1997: error: implicit declaration of function 'adpt_alpha_info'
drivers/scsi/dpt_i2o.c: At top level:
drivers/scsi/dpt_i2o.c:2032: warning: conflicting types for 'adpt_alpha_info'
drivers/scsi/dpt_i2o.c:2032: error: static declaration of 'adpt_alpha_info' follows non-static declaration
drivers/scsi/dpt_i2o.c:1997: error: previous implicit declaration of 'adpt_alpha_info' was here
Due to a copy-n-paste error in drivers/scsi/dpti.h.
Fix that up and remove some of the many daft static-declarations-in-a-header
which this driver enjoys.
Cc: Miquel van Smoorenburg <miquels@cistron.nl>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paul Menage [Wed, 7 May 2008 03:42:41 +0000 (20:42 -0700)]
Fix cpuset sched_relax_domain_level control file
Due to a merge conflict, the sched_relax_domain_level control file was marked
as being handled by cpuset_read/write_u64, but the code to handle it was
actually in cpuset_common_file_read/write.
Since the value being written/read is in fact a signed integer, it should be
treated as such; this patch adds cpuset_read/write_s64 functions, and uses
them to handle the sched_relax_domain_level file.
With this patch, the sched_relax_domain_level can be read and written, and the
correct contents seen/updated.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Benjamin Herrenschmidt [Wed, 7 May 2008 03:42:39 +0000 (20:42 -0700)]
slub: fix atomic usage in any_slab_objects()
any_slab_objects() does an atomic_read on an atomic_long_t, this
fixes it to use atomic_long_read instead.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ulrich Drepper [Wed, 7 May 2008 03:42:38 +0000 (20:42 -0700)]
sys_pipe(): fix file descriptor leaks
Remember to close the files if copy_to_user() failed.
Spotted by dm.n9107@gmail.com.
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Cc: DM <dm.n9107@gmail.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Samuel Thibault [Wed, 7 May 2008 03:42:37 +0000 (20:42 -0700)]
vt: fix canonical input in UTF-8 mode
For e.g. proper TTY canonical support, IUTF8 termios flag has to be set as
appropriate. Linux used to not care about setting that flag for VT TTYs.
This patch fixes that by activating it according to the current mode of the
VT, and sets the default value according to the vt.default_utf8 parameter.
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Cc: Willy Tarreau <w@1wt.eu>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mattia Dongili [Wed, 7 May 2008 03:42:35 +0000 (20:42 -0700)]
usb/asix: add Buffalo LUA-U2-GT 10/100/1000
The USB net adapter Buffalo LUA-U2-GT (0411:006e) carries a AX88178 chip.
Tested on the above HW.
Signed-off-by: Mattia Dongili <malattia@linux.it>
Acked-off-by: David Hollis <dhollis@davehollis.com>
Cc: Greg KH <greg@kroah.com>
Acked-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Wed, 7 May 2008 03:42:35 +0000 (20:42 -0700)]
sx.c: fix printk warnings on sparc32
drivers/char/sx.c: In function 'sx_set_real_termios':
drivers/char/sx.c:973: warning: format '%u' expects type 'unsigned int', but argument 2 has type 'long unsigned int'
drivers/char/sx.c:999: warning: format '%x' expects type 'unsigned int', but argument 2 has type 'tcflag_t'
drivers/char/sx.c:1012: warning: format '%x' expects type 'unsigned int', but argument 2 has type 'tcflag_t'
sparc32 seems to use weird types for its tty things.
[ Fine by me but this is ancient debug and most of the debug in sx just
wants deleting eventually. - Alan ]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
WANG Cong [Wed, 7 May 2008 03:42:33 +0000 (20:42 -0700)]
uml: fix inconsistence due to tty_operation change
'put_char' of 'struct tty_operations' has changed from 'void' into 'int'.
This can also shut up compiler warnings.
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: WANG Cong <wangcong@zeuux.org>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Harvey Harrison [Wed, 7 May 2008 03:42:32 +0000 (20:42 -0700)]
misc: fix integer as NULL pointer warnings
drivers/md/raid10.c:889:17: warning: Using plain integer as NULL pointer
drivers/media/video/cx18/cx18-driver.c:616:12: warning: Using plain integer as NULL pointer
sound/oss/kahlua.c:70:12: warning: Using plain integer as NULL pointer
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Steven Rostedt [Wed, 7 May 2008 03:42:31 +0000 (20:42 -0700)]
fix irq flags for iuu_phoenix.c
The file drivers/usb/serial/iuu_phoenix.c uses "int" for flags. This can
cause hard to find bugs on some architectures. This patch converts the flags
to use "long" instead.
This bug was discovered by doing an allyesconfig make on the -rt kernel where
checks are done to ensure all flags are of size sizeof(long).
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Steven Rostedt [Wed, 7 May 2008 03:42:30 +0000 (20:42 -0700)]
fix irq flags in rtc-ds1511
The file in drivers/rtc/rtc-ds1551.c uses "int" for flags. This can cause
hard to find bugs on some architectures. This patch converts the flags to use
"long" instead.
This bug was discovered by doing an allyesconfig make on the -rt kernel where
checks are done to ensure all flags are of size sizeof(long).
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Steven Rostedt [Wed, 7 May 2008 03:42:29 +0000 (20:42 -0700)]
fix irq flags in saa7134
Some files in the drivers/media/video/saa7134 directory uses "int" for flags.
This can cause hard to find bugs on some architectures. This patch converts
the flags to use "long" instead.
This bug was discovered by doing an allyesconfig make on the -rt kernel where
checks are done to ensure all flags are of size sizeof(long).
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Acked-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Steven Rostedt [Wed, 7 May 2008 03:42:28 +0000 (20:42 -0700)]
fix irq flags in mac80211 code
A file in the net/mac80211 directory uses "int" for flags. This can cause
hard to find bugs on some architectures. This patch converts the flags to use
"long" instead.
This bug was discovered by doing an allyesconfig make on the -rt kernel where
checks are done to ensure all flags are of size sizeof(long).
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: "John W. Linville" <linville@tuxdriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tetsuo Handa [Wed, 7 May 2008 03:42:27 +0000 (20:42 -0700)]
serial: access after NULL check in uart_flush_buffer()
I noticed that
static void uart_flush_buffer(struct tty_struct *tty)
{
struct uart_state *state = tty->driver_data;
struct uart_port *port = state->port;
unsigned long flags;
/*
* This means you called this function _after_ the port was
* closed. No cookie for you.
*/
if (!state || !state->info) {
WARN_ON(1);
return;
}
is too late for checking state != NULL.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Samuel Thibault [Wed, 7 May 2008 03:42:26 +0000 (20:42 -0700)]
Kconfig: improved help for CONFIG_ACCESSIBILITY
Add a small explanation of what accessibility is.
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Galbraith [Thu, 8 May 2008 15:00:42 +0000 (17:00 +0200)]
sched: fix weight calculations
The conversion between virtual and real time is as follows:
dvt = rw/w * dt <=> dt = w/rw * dvt
Since we want the fair sleeper granularity to be in real time, we actually
need to do:
dvt = - rw/w * l
This bug could be related to the regression reported by Yanmin Zhang:
| Comparing with kernel 2.6.25, sysbench+mysql(oltp, readonly) has lots
| of regressions with 2.6.26-rc1:
|
| 1) 8-core stoakley: 28%;
| 2) 16-core tigerton: 20%;
| 3) Itanium Montvale: 50%.
Reported-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Thu, 8 May 2008 09:53:48 +0000 (11:53 +0200)]
semaphore: fix
Yanmin Zhang reported:
| Comparing with kernel 2.6.25, AIM7 (use tmpfs) has more th
| regression under 2.6.26-rc1 on my 8-core stoakley, 16-core tigerton,
| and Itanium Montecito. Bisect located the patch below:
|
|
64ac24e738823161693bf791f87adc802cf529ff is first bad commit
| commit
64ac24e738823161693bf791f87adc802cf529ff
| Author: Matthew Wilcox <matthew@wil.cx>
| Date: Fri Mar 7 21:55:58 2008 -0500
|
| Generic semaphore implementation
|
| After I manually reverted the patch against 2.6.26-rc1 while fixing
| lots of conflicts/errors, aim7 regression became less than 2%.
i reproduced the AIM7 workload and can confirm Yanmin's findings that
-.26-rc1 regresses over .25 - by over 67% here.
Looking at the workload i found and fixed what i believe to be the real
bug causing the AIM7 regression: it was inefficient wakeup / scheduling
/ locking behavior of the new generic semaphore code, causing suboptimal
performance.
The problem comes from the following code. The new semaphore code does
this on down():
spin_lock_irqsave(&sem->lock, flags);
if (likely(sem->count > 0))
sem->count--;
else
__down(sem);
spin_unlock_irqrestore(&sem->lock, flags);
and this on up():
spin_lock_irqsave(&sem->lock, flags);
if (likely(list_empty(&sem->wait_list)))
sem->count++;
else
__up(sem);
spin_unlock_irqrestore(&sem->lock, flags);
where __up() does:
list_del(&waiter->list);
waiter->up = 1;
wake_up_process(waiter->task);
and where __down() does this in essence:
list_add_tail(&waiter.list, &sem->wait_list);
waiter.task = task;
waiter.up = 0;
for (;;) {
[...]
spin_unlock_irq(&sem->lock);
timeout = schedule_timeout(timeout);
spin_lock_irq(&sem->lock);
if (waiter.up)
return 0;
}
the fastpath looks good and obvious, but note the following property of
the contended path: if there's a task on the ->wait_list, the up() of
the current owner will "pass over" ownership to that waiting task, in a
wake-one manner, via the waiter->up flag and by removing the waiter from
the wait list.
That is all and fine in principle, but as implemented in
kernel/semaphore.c it also creates a nasty, hidden source of contention!
The contention comes from the following property of the new semaphore
code: the new owner owns the semaphore exclusively, even if it is not
running yet.
So if the old owner, even if just a few instructions later, does a
down() [lock_kernel()] again, it will be blocked and will have to wait
on the new owner to eventually be scheduled (possibly on another CPU)!
Or if another task gets to lock_kernel() sooner than the "new owner"
scheduled, it will be blocked unnecessarily and for a very long time
when there are 2000 tasks running.
I.e. the implementation of the new semaphores code does wake-one and
lock ownership in a very restrictive way - it does not allow
opportunistic re-locking of the lock at all and keeps the scheduler from
picking task order intelligently.
This kind of scheduling, with 2000 AIM7 processes running, creates awful
cross-scheduling between those 2000 tasks, causes reduced parallelism, a
throttled runqueue length and a lot of idle time. With increasing number
of CPUs it causes an exponentially worse behavior in AIM7, as the chance
for a newly woken new-owner task to actually run anytime soon is less
and less likely.
Note that it takes just a tiny bit of contention for the 'new-semaphore
catastrophy' to happen: the wakeup latencies get added to whatever small
contention there is, and quickly snowball out of control!
I believe Yanmin's findings and numbers support this analysis too.
The best fix for this problem is to use the same scheduling logic that
the kernel/mutex.c code uses: keep the wake-one behavior (that is OK and
wanted because we do not want to over-schedule), but also allow
opportunistic locking of the lock even if a wakee is already "in
flight".
The patch below implements this new logic. With this patch applied the
AIM7 regression is largely fixed on my quad testbox:
# v2.6.25 vanilla:
..................
Tasks Jobs/Min JTI Real CPU Jobs/sec/task
2000 56096.4 91 207.5 789.7 0.4675
2000 55894.4 94 208.2 792.7 0.4658
#
v2.6.26-rc1-166-gc0a1811 vanilla:
...................................
Tasks Jobs/Min JTI Real CPU Jobs/sec/task
2000 33230.6 83 350.3 784.5 0.2769
2000 31778.1 86 366.3 783.6 0.2648
#
v2.6.26-rc1-166-gc0a1811 + semaphore-speedup:
...............................................
Tasks Jobs/Min JTI Real CPU Jobs/sec/task
2000 55707.1 92 209.0 795.6 0.4642
2000 55704.4 96 209.0 796.0 0.4642
i.e. a 67% speedup. We are now back to within 1% of the v2.6.25
performance levels and have zero idle time during the test, as expected.
Btw., interactivity also improved dramatically with the fix - for
example console-switching became almost instantaneous during this
workload (which after all is running 2000 tasks at once!), without the
patch it was stuck for a minute at times.
There's another nice side-effect of this speedup patch, the new generic
semaphore code got even smaller:
text data bss dec hex filename
1241 0 0 1241 4d9 semaphore.o.before
1207 0 0 1207 4b7 semaphore.o.after
(because the waiter.up complication got removed.)
Longer-term we should look into using the mutex code for the generic
semaphore code as well - but i's not easy due to legacies and it's
outside of the scope of v2.6.26 and outside the scope of this patch as
well.
Bisected-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Jens Axboe [Thu, 8 May 2008 12:06:19 +0000 (14:06 +0200)]
Revert "relay: fix splice problem"
This reverts commit
c3270e577c18b3d0e984c3371493205a4807db9d.
Patrik Sevallius [Thu, 8 May 2008 12:04:08 +0000 (14:04 +0200)]
[ALSA] soc at91 minor bug fixes
Found these two bugs while browsing through the code. The first one is
a cut-n-paste bug, instead of disabling the clock when request_irq()
fails, it enabled it once more. The second one fixes a debug printout,
AT91_SSC_IER is write only, AT91_SSC_IMR is readable (the printed string
actually says imr).
Frank Mandarino was busy so he asked me to send these to this list.
/Patrik
Signed-off-by: Patrik Sevallius <patrik.sevallius@enea.com>
Acked-by: Frank Mandarino <fmandarino@endrelia.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Mark Brown [Thu, 8 May 2008 12:03:30 +0000 (14:03 +0200)]
[ALSA] soc - at91-pcm - Fix line wrapping
There's more checkpatch stuff to fix in the driver, this just fixes the
minimum required for the following patch to be clean.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Ben Hutchings [Thu, 8 May 2008 09:53:17 +0000 (02:53 -0700)]
net: Added ASSERT_RTNL() to dev_open() and dev_close().
dev_open() and dev_close() must be called holding the RTNL, since they
call device functions and netdevice notifiers that are promised the RTNL.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Oliver Hartkopp [Thu, 8 May 2008 09:49:55 +0000 (02:49 -0700)]
can: Fix can_send() handling on dev_queue_xmit() failures
The tx packet counting and the local loopback of CAN frames should
only happen in the case that the CAN frame has been enqueued to the
netdevice tx queue successfully.
Thanks to Andre Naujoks <nautsch@gmail.com> for reporting this issue.
Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net>
Signed-off-by: Urs Thuermann <urs@isnogud.escape.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 8 May 2008 09:35:54 +0000 (02:35 -0700)]
Merge branch 'upstream-davem' of /linux/kernel/git/jgarzik/netdev-2.6
Pavel Emelyanov [Thu, 8 May 2008 08:24:25 +0000 (01:24 -0700)]
netns: Fix arbitrary net_device-s corruptions on net_ns stop.
When a net namespace is destroyed, some devices (those, not killed
on ns stop explicitly) are moved back to init_net.
The problem, is that this net_ns change has one point of failure -
the __dev_alloc_name() may be called if a name collision occurs (and
this is easy to trigger). This allocator performs a likely-to-fail
GFP_ATOMIC allocation to find a suitable number. Other possible
conditions that may cause error (for device being ns local or not
registered) are always false in this case.
So, when this call fails, the device is unregistered. But this is
*not* the right thing to do, since after this the device may be
released (and kfree-ed) improperly. E. g. bridges require more
actions (sysfs update, timer disarming, etc.), some other devices
want to remove their private areas from lists, etc.
I. e. arbitrary use-after-free cases may occur.
The proposed fix is the following: since the only reason for the
dev_change_net_namespace to fail is the name generation, we may
give it a unique fall-back name w/o %d-s in it - the dev<ifindex>
one, since ifindexes are still unique.
So make this change, raise the failure-case printk loglevel to
EMERG and replace the unregister_netdevice call with BUG().
[ Use snprintf() -DaveM ]
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Thu, 8 May 2008 08:16:04 +0000 (01:16 -0700)]
netfilter: Kconfig: default DCCP/SCTP conntrack support to the protocol config values
When conntrack and DCCP/SCTP protocols are enabled, chances are good
that people also want DCCP/SCTP conntrack and NAT support.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Thu, 8 May 2008 08:15:21 +0000 (01:15 -0700)]
netfilter: nf_conntrack_sip: restrict RTP expect flushing on error to last request
Some Inovaphone PBXs exhibit very stange behaviour: when dialing for
example "123", the device sends INVITE requests for "1", "12" and
"123" back to back. The first requests will elicit error responses
from the receiver, causing the SIP helper to flush the RTP
expectations even though we might still see a positive response.
Note the sequence number of the last INVITE request that contained a
media description and only flush the expectations when receiving a
negative response for that sequence number.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Thu, 8 May 2008 08:13:31 +0000 (01:13 -0700)]
macvlan: Fix memleak on device removal/crash on module removal
As noticed by Ben Greear, macvlan crashes the kernel when unloading the
module. The reason is that it tries to clean up the macvlan_port pointer
on the macvlan device itself instead of the underlying device. A non-NULL
pointer is taken as indication that the macvlan_handle_frame_hook is
valid, when receiving the next packet on the underlying device it tries
to call the NULL hook and crashes.
Clean up the macvlan_port on the correct device to fix this.
Signed-off-by; Patrick McHardy <kaber@trash.net>
Tested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
J.H.M. Dassen (Ray) [Thu, 8 May 2008 08:11:04 +0000 (01:11 -0700)]
net/ipv4: correct RFC 1122 section reference in comment
RFC 1122 does not have a section 3.1.2.2. The requirement to silently
discard datagrams with a bad checksum is in section 3.2.1.2 instead.
Addresses http://bugzilla.kernel.org/show_bug.cgi?id=10611
Signed-off-by: J.H.M. Dassen (Ray) <jdassen@debian.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ilpo Järvinen [Thu, 8 May 2008 08:09:11 +0000 (01:09 -0700)]
tcp FRTO: SACK variant is errorneously used with NewReno
Note: there's actually another bug in FRTO's SACK variant, which
is the causing failure in NewReno case because of the error
that's fixed here. I'll fix the SACK case separately (it's
a separate bug really, though related, but in order to fix that
I need to audit tp->snd_nxt usage a bit).
There were two places where SACK variant of FRTO is getting
incorrectly used even if SACK wasn't negotiated by the TCP flow.
This leads to incorrect setting of frto_highmark with NewReno
if a previous recovery was interrupted by another RTO.
An eventual fallback to conventional recovery then incorrectly
considers one or couple of segments as forward transmissions
though they weren't, which then are not LOST marked during
fallback making them "non-retransmittable" until the next RTO.
In a bad case, those segments are really lost and are the only
one left in the window. Thus TCP needs another RTO to continue.
The next FRTO, however, could again repeat the same events
making the progress of the TCP flow extremely slow.
In order for these events to occur at all, FRTO must occur
again in FRTOs step 3 while the key segments must be lost as
well, which is not too likely in practice. It seems to most
frequently with some small devices such as network printers
that *seem* to accept TCP segments only in-order. In cases
were key segments weren't lost, things get automatically
resolved because those wrongly marked segments don't need to be
retransmitted in order to continue.
I found a reproducer after digging up relevant reports (few
reports in total, none at netdev or lkml I know of), some
cases seemed to indicate middlebox issues which seems now
to be a false assumption some people had made. Bugzilla
#10063 _might_ be related. Damon L. Chesser <damon@damtek.com>
had a reproducable case and was kind enough to tcpdump it
for me. With the tcpdump log it was quite trivial to figure
out.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 8 May 2008 01:54:05 +0000 (18:54 -0700)]
sparc: Fix SA_ONSTACK signal handling.
We need to be more liberal about the alignment of the buffer given to
us by sigaltstack(). The user should not need to be mindful of all of
the alignment constraints we have for the stack frame.
This mirrors how we handle this situation in clone() as well.
Also, we align the stack even in non-SA_ONSTACK cases so that signals
due to bad stack alignment can be delivered properly. This makes such
errors easier to debug and recover from.
Finally, add the sanity check x86 has to make sure we won't overflow
the signal stack.
This fixes glibc testcases nptl/tst-cancel20.c and
nptl/tst-cancelx20.c
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 8 May 2008 00:04:49 +0000 (17:04 -0700)]
Merge git://git./linux/kernel/git/davem/sparc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
sparc: Fix fork/clone/vfork system call restart.
sparc: Fix mmap VA span checking.
David S. Miller [Wed, 7 May 2008 23:21:28 +0000 (16:21 -0700)]
sparc: Fix fork/clone/vfork system call restart.
We clobber %i1 as well as %i0 for these system calls,
because they give two return values.
Therefore, on error, we have to restore %i1 properly
or else the restart explodes since it uses the wrong
arguments.
This fixes glibc's nptl/tst-eintr1.c testcase.
Signed-off-by: David S. Miller <davem@davemloft.net>
Auke Kok [Wed, 7 May 2008 20:42:33 +0000 (13:42 -0700)]
[MAINTAINERS] New maintainer for Intel ethernet adapters
I'm handing over maintainership to Jeff Kirsher and moving on
to other Linux/Open Source work within Intel. Good luck to Jeff ;)
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stefan Roscher [Wed, 7 May 2008 18:35:06 +0000 (11:35 -0700)]
IB/ehca: Wait for async events to finish before destroying QP
This is necessary because, in a multicore environment, a race between
uverbs async handler and destroy QP could occur.
Signed-off-by: Stefan Roscher <stefan.roscher at de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
John Gregor [Wed, 7 May 2008 18:01:10 +0000 (11:01 -0700)]
IB/ipath: Fix SDMA error recovery in absence of link status change
What's fixed:
in ipath_cancel_sends()
We need to unconditionally set ABORTING. So, swap the tests
so the set_bit() isn't shadowed by the &&.
If we've disarmed the piobufs, then we need to unconditionally
set DISARMED. So, move it out from the overly protective if
at the bottom.
in sdma_abort_task()
Abort_task was written knowing that the SDMA engine would always
be reset (and restarted) on error. A recent change broke that
fundamental assumption by taking the restart portion and making
it conditional on a link status change. But, SDMA can go boom
without a link status change in some conditions.
Signed-off-by: John Gregor <john.gregor@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Dave Olson [Wed, 7 May 2008 18:00:15 +0000 (11:00 -0700)]
IB/ipath: Need to always request and handle PIO avail interrupts
Now that we always use PIO for vl15 on 7220, we could get stuck forever
if we happened to run out of PIO buffers from the verbs code, because
the setup code wouldn't run; the interrupt was also ignored if SDMA was
supported. We also have to reduce the pio update threshold if we have
fewer kernel buffers than the existing threshold.
Clean up the initialization a bit to get ordering safer and more
sensible, and use the existing ipath_chg_kernavail call to do init,
rather than doing it separately.
Drop unnecessary clearing of pio buffer on pio parity error.
Drop incorrect updating of pioavailshadow when exitting freeze mode
(software state may not match chip state if buffer has been allocated
and not yet written).
If we couldn't get a kernel buffer for a while, make sure we are
in sync with hardware, mainly to handle the exitting freeze case.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Michael Albaugh [Wed, 7 May 2008 17:59:23 +0000 (10:59 -0700)]
IB/ipath: Fix count of packets received by kernel
The loop in ipath_kreceive() that processes packets increments the
loop-index 'i' once too often, because the exit condition does not
depend on it, and is checked after the increment. By adding a check for
!last to the iterator in the for loop, we correct that in a way that is
not so likely to be re-broken by changes in the loop body.
Signed-off-by: Michael Albaugh <micheal.albaugh@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Ralph Campbell [Wed, 7 May 2008 17:58:50 +0000 (10:58 -0700)]
IB/ipath: Return the correct opcode for RDMA WRITE with immediate
This patch fixes a bug in the RC responder which generates a completion
entry with the wrong opcode when an RDMA WRITE with immediate is received.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Dave Olson [Wed, 7 May 2008 17:57:48 +0000 (10:57 -0700)]
IB/ipath: Fix bug that can leave sends disabled after freeze recovery
The semantics of cancel_sends changed, but the code using it was missed.
Don't leave sends and pioavail updates disabled, and add a comment as to
why the force update is needed.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Ralph Campbell [Wed, 7 May 2008 17:57:14 +0000 (10:57 -0700)]
IB/ipath: Only increment SSN if WQE is put on send queue
If a send work request has immediate errors and is not put on the
send queue, we shouldn't update any of the QP state.
The increment of the SSN wasn't obeying this.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Michael Albaugh [Wed, 7 May 2008 17:56:47 +0000 (10:56 -0700)]
IB/ipath: Only warn about prototype chip during init
We warn about prototype chips, but the function that checks for
support is also called as a result of a get_portinfo request, which
can clutter the logs.
Restrict warning to only appear during initialization.
Signed-off-by: Michael Albaugh <michael.albaugh@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Randy Dunlap [Wed, 30 Apr 2008 07:08:54 +0000 (09:08 +0200)]
docbook: fix bio missing parameter
Fix fs/bio.c kernel-doc parameter warning:
Warning(linux-2.6.25-git14//fs/bio.c:972): No description found for parameter 'reading'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Wed, 7 May 2008 11:26:27 +0000 (13:26 +0200)]
block: use unitialized_var() in bio_alloc_bioset()
Better than setting idx to some random value and it silences the
same bogus gcc warning.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Stas Sergeev [Wed, 7 May 2008 10:39:56 +0000 (12:39 +0200)]
pcspkr: fix dependancies
fix pcspkr dependancies: make the pcspkr platform
drivers to depend on a platform device, and
not the other way around.
Signed-off-by: Stas Sergeev <stsp@aknet.ru>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Dmitry Torokhov <dtor@mail.ru>
CC: Vojtech Pavlik <vojtech@suse.cz>
CC: Michael Opdenacker <michael-lists@free-electrons.com>
[fixed for 2.6.26-rc1 by tiwai]
Signed-off-by: Takashi Iwai <tiwai@suse.de>
David S. Miller [Wed, 7 May 2008 09:24:28 +0000 (02:24 -0700)]
sparc: Fix mmap VA span checking.
We should not conditionalize VA range checks on MAP_FIXED.
Signed-off-by: David S. Miller <davem@davemloft.net>
Jens Axboe [Wed, 7 May 2008 08:15:46 +0000 (10:15 +0200)]
block: avoid duplicate calls to get_part() in disk stat code
get_part() is fairly expensive, as it O(N) loops over partitions
to find the right one. In lots of normal IO paths we end up looking
up the partition twice, to make matters even worse. Change the
stat add code to accept a passed in partition instead.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Wed, 7 May 2008 07:51:23 +0000 (09:51 +0200)]
cfq-iosched: make io priorities inherit CPU scheduling class as well as nice
We currently set all processes to the best-effort scheduling class,
regardless of what CPU scheduling class they belong to. Improve that
so that we correctly track idle and rt scheduling classes as well.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jan Kara [Tue, 6 May 2008 16:26:17 +0000 (18:26 +0200)]
udf: Fix memory corruption when fs mounted with noadinicb option
When UDF filesystem is mounted with noadinicb mount option, it
happens that we extend an empty directory with a block. A code in
udf_add_entry() didn't count with this possibility and used
uninitialized data leading to memory and filesystem corruption.
Add a check whether file already has some extents before operating
on them.
Signed-off-by: Jan Kara <jack@suse.cz>
Rasmus Rohde [Wed, 30 Apr 2008 15:22:06 +0000 (17:22 +0200)]
udf: Make udf exportable
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Rasmus Rohde <rohde@duff.dk>
Signed-off-by: Jan Kara <jack@suse.cz>
Jens Axboe [Wed, 7 May 2008 07:48:17 +0000 (09:48 +0200)]
block: optimize generic_unplug_device()
Original patch from Mikulas Patocka <mpatocka@redhat.com>
Mike Anderson was doing an OLTP benchmark on a computer with 48 physical
disks mapped to one logical device via device mapper.
He found that there was a slowdown on request_queue->lock in function
generic_unplug_device. The slowdown is caused by the fact that when some
code calls unplug on the device mapper, device mapper calls unplug on all
physical disks. These unplug calls take the lock, find that the queue is
already unplugged, release the lock and exit.
With the below patch, performance of the benchmark was increased by 18%
(the whole OLTP application, not just block layer microbenchmarks).
So I'm submitting this patch for upstream. I think the patch is correct,
because when more threads call simultaneously plug and unplug, it is
unspecified, if the queue is or isn't plugged (so the patch can't make
this worse). And the caller that plugged the queue should unplug it
anyway. (if it doesn't, there's 3ms timeout).
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Wed, 7 May 2008 07:33:55 +0000 (09:33 +0200)]
block: get rid of likely/unlikely predictions in merge logic
They tend to depend a lot on the workload, so not a clear-cut
likely or unlikely fit.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Miklos Szeredi [Wed, 7 May 2008 07:22:39 +0000 (09:22 +0200)]
vfs: splice remove_suid() cleanup
generic_file_splice_write() duplicates remove_suid() just because it
doesn't hold i_mutex. But it grabs i_mutex inside splice_from_pipe()
anyway, so this is rather pointless.
Move locking to generic_file_splice_write() and call remove_suid() and
__splice_from_pipe() instead.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Wed, 7 May 2008 07:17:12 +0000 (09:17 +0200)]
cfq-iosched: fix RCU race in the cfq io_context destructor handling
put_io_context() drops the RCU read lock before calling into cfq_dtor(),
however we need to hold off freeing there before grabbing and
dereferencing the first object on the list.
So extend the rcu_read_lock() scope to cover the calling of cfq_dtor(),
and optimize cfq_free_io_context() to use a new variant for
call_for_each_cic() that assumes the RCU read lock is already held.
Hit in the wild by Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Wed, 7 May 2008 07:27:43 +0000 (09:27 +0200)]
block: adjust tagging function queue bit locking
For most initialization purposes, calling blk_queue_init_tags() without
the queue lock held is OK. Only if called for resizing an existing map
must the lock be held. Ditto for tag cleanup, the maps are reference
counted.
So switch the general queue flag setting to the unlocked variant, but
retain the locked variant for resizing.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Martin Schwidefsky [Wed, 7 May 2008 07:22:59 +0000 (09:22 +0200)]
[S390] guest page hinting light
Use the existing arch_alloc_page/arch_free_page callbacks to do
the guest page state transitions between stable and unused.
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Heiko Carstens [Wed, 7 May 2008 07:22:58 +0000 (09:22 +0200)]
[S390] tty3270: fix put_char fail/success conversion.
The wrong function got coverted ;)
CC drivers/s390/char/tty3270.o
drivers/s390/char/tty3270.c:1747:
warning: initialization from incompatible pointer type
Acked-by: Alan Cox <alan@redhat.com>
Cc: Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Roland McGrath [Wed, 7 May 2008 07:22:57 +0000 (09:22 +0200)]
[S390] compat ptrace cleanup
This removes redundant arch code for generic ptrace requests
already handled by ptrace_request and compat_ptrace_request.
It simplifies things to just have the standard entry points,
and use the generic compat_sys_ptrace.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Martin Schwidefsky [Wed, 7 May 2008 07:22:56 +0000 (09:22 +0200)]
[S390] s390mach compile warning
Fix the following compile warning:
drivers/s390/s390mach.c: In function 's390_collect_crw_info':
drivers/s390/s390mach.c:77: warning: ignoring return value of 'down_interruptibl
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Michael Ernst [Wed, 7 May 2008 07:22:55 +0000 (09:22 +0200)]
[S390] cio: Fix parsing mechanism for blacklisted devices.
New format cssid.ssid.devno is now parsed correctly.
Signed-off-by: Michael Ernst <mernst@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Michael Ernst [Wed, 7 May 2008 07:22:54 +0000 (09:22 +0200)]
[S390] cio: Remove cio_msg kernel parameter.
The only sporadically used CIO_DEBUG messages are replaced by ordinary
CIO_MSG_EVENT messages. The CIO_MSG_EVENT messages debug levels are
consolidated.
Signed-off-by: Michael Ernst <mernst@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Christian Borntraeger [Wed, 7 May 2008 07:22:53 +0000 (09:22 +0200)]
[S390] s390-kvm: leave sie context on work. Removes preemption requirement
From: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch fixes a bug with cpu bound guest on kvm-s390. Sometimes it
was impossible to deliver a signal to a spinning guest. We used
preemption as a circumvention. The preemption notifiers called
vcpu_load, which checked for pending signals and triggered a host
intercept. But even with preemption, a sigkill was not delivered
immediately.
This patch changes the low level host interrupt handler to check for the
SIE instruction, if TIF_WORK is set. In that case we change the
instruction pointer of the return PSW to rerun the vcpu_run loop. The kvm
code sees an intercept reason 0 if that happens. This patch adds accounting
for these types of intercept as well.
The advantages:
- works with and without preemption
- signals are delivered immediately
- much better host latencies without preemption
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Martin Schwidefsky [Wed, 7 May 2008 07:22:52 +0000 (09:22 +0200)]
[S390] s390: Optimize user and work TIF check
On return from syscall or interrupt, we have to check if we return to
userspace (likely) and if there is work todo (less likely) to decide
if we handle the work. We can optimize this check: we first check for
the less likely work case and then check for userspace.
This patch is also a preparation for an additional patch, that fixes a bug
in KVM dealing with cpu bound guests.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Jens Axboe [Wed, 7 May 2008 07:09:39 +0000 (09:09 +0200)]
block: sysfs store function needs to grab queue_lock and use queue_flag_*()
Concurrency isn't a big deal here since we have requests in flight
at this point, but do the locked variant to set a better example.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Linus Torvalds [Wed, 7 May 2008 01:18:43 +0000 (18:18 -0700)]
Merge git://git./linux/kernel/git/davem/sparc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
sparc64: Fix initrd regression.
usb: Sparc build fix, make USB_ISP1760_OF depend on PPC_OF
sparc64: remove online_page()
sparc64: use compat_sys_utimes instead of home-grown local copy.
sbus: Fix bpp driver build.
sparc video: make blank use proper constant
Revert "[SPARC64]: Wrap SMP IPIs with irq_enter()/irq_exit()."
sparc: tcx.c remove unnecessary function
Linus Torvalds [Wed, 7 May 2008 00:09:27 +0000 (17:09 -0700)]
Revert "uml: fix gcc problem"
This reverts commit
22eecde2f9034764a3fd095eecfa3adfb8ec9a98. Uli
reports that it breaks UML on x86-64 with the Fedora 8 gcc (gcc 4.1.2),
causing a crash on startup. See
http://marc.info/?l=linux-kernel&m=
121011722806093&w=2
for a trace.
Reported-by: Ulrich Drepper <drepper@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roland Dreier [Tue, 6 May 2008 22:56:22 +0000 (15:56 -0700)]
RDMA/cxgb3: Fix severe limit on userspace memory registration size
Currently, iw_cxgb3 is severely limited on the amount of userspace
memory that can be registered in in a single memory region, which
causes big problems for applications that expect to be able to
register 100s of MB.
The problem is that the driver uses a single kmalloc()ed buffer to
hold the physical buffer list (PBL) for the entire memory region
during registration, which means that 8 bytes of contiguous memory are
required for each page of memory being registered. For example, a 64
MB registration will require 128 KB of contiguous memory with 4 KB
pages, and it unlikely that such an allocation will succeed on a busy
system.
This is purely a driver problem: the temporary page list buffer is not
needed by the hardware, so we can fix this by writing the PBL to the
hardware in page-sized chunks rather than all at once. We do this by
splitting the memory registration operation up into several steps:
- Allocate PBL space in adapter memory for the full registration
- Copy PBL to adapter memory in chunks
- Allocate STag and enable memory region
This also allows several other cleanups to the __cxio_tpt_op()
interface and related parts of the driver.
This change leaves the reregister memory region and memory window
operations broken, but they already didn't work due to other
longstanding bugs, so fixing them will be left to a later patch.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
David S. Miller [Tue, 6 May 2008 22:19:54 +0000 (15:19 -0700)]
sparc64: Fix initrd regression.
We die because we forget to convert initrd_start and
initrd_end to virtual addresses.
Reported by Mikael Pettersson
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 6 May 2008 22:15:12 +0000 (15:15 -0700)]
usb: Sparc build fix, make USB_ISP1760_OF depend on PPC_OF
Sparc doesn't have some of the OF interfaces this driver
wants to use.
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roland Dreier [Tue, 6 May 2008 22:03:38 +0000 (15:03 -0700)]
RDMA/cxgb3: Don't add PBL memory to gen_pool in chunks
Current iw_cxgb3 code adds PBL memory to the driver's gen_pool in 2 MB
chunks. This limits the largest single allocation that can be done to
the same size, which means that with 4 KB pages, each of which takes 8
bytes of PBL memory, the largest memory region that can be allocated
is 1 GB (256K PBL entries * 4 KB/entry).
Remove this limit by adding all the PBL memory in a single gen_pool
chunk, if possible. Add code that falls back to smaller chunks if
gen_pool_add() fails, which can happen if there is not sufficient
contiguous lowmem for the internal gen_pool bitmap.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
OGAWA Hirofumi [Tue, 6 May 2008 19:02:53 +0000 (04:02 +0900)]
Fix bogus warning in sysdev_driver_register()
if ((drv->entry.next != drv->entry.prev) ||
(drv->entry.next != NULL)) {
warns list_empty(&drv->entry).
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Greg KH <gregkh@suse.de>
Cc: Len Brown <lenb@kernel.org>
[ Version 2 totally redone based on suggestions from Linus & Greg ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 6 May 2008 20:13:37 +0000 (13:13 -0700)]
VFS: fix unused variable warning
Commit
33dcdac2df54e66c447ae03f58c95c7251aa5649 ("kill ->put_inode")
removed the final use of i_op->put_inode, but left the now totally
unused "op" variable in iput().
Get rid of it.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Tue, 6 May 2008 19:49:23 +0000 (20:49 +0100)]
x86: fix PAE pmd_bad bootup warning
Fix warning from pmd_bad() at bootup on a HIGHMEM64G HIGHPTE x86_32.
That came from
9fc34113f6880b215cbea4e7017fc818700384c2 x86: debug pmd_bad();
but we understand now that the typecasting was wrong for PAE in the previous
version: pagetable pages above 4GB looked bad and stopped Arjan from booting.
And revert that
cded932b75ab0a5f9181ee3da34a0a488d1a14fd x86: fix pmd_bad
and pud_bad to support huge pages. It was the wrong way round: we shouldn't
weaken every pmd_bad and pud_bad check to let huge pages slip through - in
part they check that we _don't_ have a huge page where it's not expected.
Put the x86 pmd_bad() and pud_bad() definitions back to what they have long
been: they can be improved (x86_32 should use PTE_MASK, to stop PAE thinking
junk in the upper word is good; and x86_64 should follow x86_32's stricter
comparison, to stop thinking any subset of required bits is good); but that
should be a later patch.
Fix Hans' good observation that follow_page() will never find pmd_huge()
because that would have already failed the pmd_bad test: test pmd_huge in
between the pmd_none and pmd_bad tests. Tighten x86's pmd_huge() check?
No, once it's a hugepage entry, it can get quite far from a good pmd: for
example, PROT_NONE leaves it with only ACCESSED of the KERN_PGTABLE bits.
However... though follow_page() contains this and another test for huge
pages, so it's nice to keep it working on them, where does it actually get
called on a huge page? get_user_pages() checks is_vm_hugetlb_page(vma) to
to call alternative hugetlb processing, as does unmap_vmas() and others.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Earlier-version-tested-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jeff Chua <jeff.chua.linux@gmail.com>
Cc: Hans Rosenfeld <hans.rosenfeld@amd.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 6 May 2008 18:39:57 +0000 (11:39 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
[PATCH] fix SMP ordering hole in fcntl_setlk()
[PATCH] kill ->put_inode
[PATCH] fix reservation discarding in affs
Al Viro [Tue, 6 May 2008 17:58:34 +0000 (13:58 -0400)]
[PATCH] fix SMP ordering hole in fcntl_setlk()
fcntl_setlk()/close() race prevention has a subtle hole - we need to
make sure that if we *do* have an fcntl/close race on SMP box, the
access to descriptor table and inode->i_flock won't get reordered.
As it is, we get STORE inode->i_flock, LOAD descriptor table entry vs.
STORE descriptor table entry, LOAD inode->i_flock with not a single
lock in common on both sides. We do have BKL around the first STORE,
but check in locks_remove_posix() is outside of BKL and for a good
reason - we don't want BKL on common path of close(2).
Solution is to hold ->file_lock around fcheck() in there; that orders
us wrt removal from descriptor table that preceded locks_remove_posix()
on close path and we either come first (in which case eviction will be
handled by the close side) or we'll see the effect of close and do
eviction ourselves. Note that even though it's read-only access,
we do need ->file_lock here - rcu_read_lock() won't be enough to
order the things.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Christoph Hellwig [Tue, 29 Apr 2008 15:46:26 +0000 (17:46 +0200)]
[PATCH] kill ->put_inode
And with that last patch to affs killing the last put_inode instance we
can finally, after many years of transition kill this racy and awkward
interface.
(It's kinda funny that even the description in
Documentation/filesystems/vfs.txt was entirely wrong..)
Also remove a very misleading comment above the defintion of
struct super_operations.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Roman Zippel [Tue, 29 Apr 2008 15:02:20 +0000 (17:02 +0200)]
[PATCH] fix reservation discarding in affs
- remove affs_put_inode, so preallocations aren't discared unnecessarily
often.
- remove affs_drop_inode, it's called with a spinlock held, so it can't
use a mutex.
- make i_opencnt atomic
- avoid direct b_count manipulations
- a few allocation failure fixes, so that these are more gracefully
handled now.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Kok, Auke [Tue, 29 Apr 2008 18:18:55 +0000 (11:18 -0700)]
e1000e: don't return half-read eeprom on error
On a read error, e1000e might have returned uninitialized block of
eeprom data back to userspace. The convention is that 0xff is "empty",
so mark the entire eeprom as empty in case of an error.
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Joakim Tjernlund [Tue, 29 Apr 2008 11:03:57 +0000 (13:03 +0200)]
ucc_geth: Don't use RX clock as TX clock.
Commit
9fb1e350e16164d56990dde036ae9c0a2fd3f634,
ucc_geth: use rx-clock-name and tx-clock-name device tree properties
Introduced a typo that made the driver use the RX clock
as TX clock, causing massive TX errors.
Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Alan Cox [Tue, 29 Apr 2008 13:29:30 +0000 (14:29 +0100)]
cxgb3: Use CAP_SYS_RAWIO for firmware
Otherwise theoretically at least
CAP_NET_ADMIN
Reload new firmware
Wait..
Firmware patches kernel
So it should be CAY_SYS_RAWIO - not that I suspect this is in fact a
credible attack vector!
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Don Fry [Tue, 29 Apr 2008 20:49:58 +0000 (13:49 -0700)]
pcnet32: delete non NAPI code from driver.
Delete the non-napi code from the driver and Kconfig.
Tested x86_64. Apply at next open opportunity.
Signed-off-by: Don Fry <pcnet32@verizon.net>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Scott Wood [Fri, 2 May 2008 18:42:41 +0000 (13:42 -0500)]
fs_enet: Fix a memory leak in fs_enet_mdio_probe
There are more memory leaks in the !PPC_CPM_NEW_BINDING case, but that code
will disappear soon along with arch/ppc.
Reported by Daniel Marjamki <danielm77@spray.se> at
http://bugzilla.kernel.org/show_bug.cgi?id=10591
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>