Linus Torvalds [Fri, 11 Sep 2009 16:17:05 +0000 (09:17 -0700)]
Merge branch 'writeback' of git://git.kernel.dk/linux-2.6-block
* 'writeback' of git://git.kernel.dk/linux-2.6-block:
writeback: check for registered bdi in flusher add and inode dirty
writeback: add name to backing_dev_info
writeback: add some debug inode list counters to bdi stats
writeback: get rid of pdflush completely
writeback: switch to per-bdi threads for flushing data
writeback: move dirty inodes from super_block to backing_dev_info
writeback: get rid of generic_sync_sb_inodes() export
Linus Torvalds [Fri, 11 Sep 2009 16:16:39 +0000 (09:16 -0700)]
Merge branch 'for-linus' of git://git390.marist.edu/linux-2.6
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6: (54 commits)
[S390] tape: Use pr_xxx instead of dev_xxx in shared driver code
[S390] Wire up page fault events for software perf counters.
[S390] Remove smp_cpu_not_running.
[S390] Get rid of cpuid.h header file.
[S390] Limit cpu detection to 256 physical cpus.
[S390] tape: Fix device online messages
[S390] Enable guest page hinting by default.
[S390] use generic scatterlist.h
[S390] s390dbf: Add description for usage of "%s" in sprintf events
[S390] Initialize __LC_THREAD_INFO early.
[S390] fix recursive locking on page_table_lock
[S390] kvm: use console_initcall() to initialize s390 virtio console
[S390] tape: reversed order of labels
[S390] hypfs: Use "%u" instead of "%d" for unsigned ints in snprintf
[S390] kernel: Print an error message if kernel NSS cannot be defined
[S390] zcrypt: Free ap_device if dev_set_name fails.
[S390] zcrypt: Use spin_lock_bh in suspend callback
[S390] xpram: Remove checksum validation for suspend/resume
[S390] vmur: Invalid allocation sequence for vmur class
[S390] hypfs: remove useless variable qname
...
Linus Torvalds [Fri, 11 Sep 2009 16:16:22 +0000 (09:16 -0700)]
Merge branch 'kmemleak' of git://linux-arm.org/linux-2.6
* 'kmemleak' of git://linux-arm.org/linux-2.6:
kmemleak: Improve the "Early log buffer exceeded" error message
kmemleak: fix sparse warning for static declarations
kmemleak: fix sparse warning over overshadowed flags
kmemleak: move common painting code together
kmemleak: add clear command support
kmemleak: use bool for true/false questions
kmemleak: Do no create the clean-up thread during kmemleak_disable()
kmemleak: Scan all thread stacks
kmemleak: Don't scan uninitialized memory when kmemcheck is enabled
kmemleak: Ignore the aperture memory hole on x86_64
kmemleak: Printing of the objects hex dump
kmemleak: Do not report alloc_bootmem blocks as leaks
kmemleak: Save the stack trace for early allocations
kmemleak: Mark the early log buffer as __initdata
kmemleak: Dump object information on request
kmemleak: Allow rescheduling during an object scanning
Linus Torvalds [Fri, 11 Sep 2009 15:58:32 +0000 (08:58 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (48 commits)
RDMA/iwcm: Reject the connection when the cm_id is destroyed
RDMA/cxgb3: Clean up properly on FW mismatch failures
RDMA/cxgb3: Don't ignore insert_handle() failures
MAINTAINERS: InfiniBand/RDMA mailing list transition to vger
IB/mad: Allow tuning of QP0 and QP1 sizes
IB/mad: Fix possible lock-lock-timer deadlock
RDMA/nes: Map MTU to IB_MTU_* and correctly report link state
RDMA/nes: Rework the disconn routine for terminate and flushing
RDMA/nes: Use the flush code to fill in cqe error
RDMA/nes: Make poll_cq return correct number of wqes during flush
RDMA/nes: Use flush mechanism to set status for wqe in error
RDMA/nes: Implement Terminate Packet
RDMA/nes: Add CQ error handling
RDMA/nes: Clean out CQ completions when QP is destroyed
RDMA/nes: Change memory allocation for cqp request to GFP_ATOMIC
RDMA/nes: Allocate work item for disconnect event handling
RDMA/nes: Update refcnt during disconnect
IB/mthca: Don't allow userspace open while recovering from catastrophic error
IB/mthca: Distinguish multiple devices in /proc/interrupts
IB/mthca: Annotate CQ locking
...
Linus Torvalds [Fri, 11 Sep 2009 15:55:49 +0000 (08:55 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/security-testing-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (57 commits)
binfmt_elf: fix PT_INTERP bss handling
TPM: Fixup boot probe timeout for tpm_tis driver
sysfs: Add labeling support for sysfs
LSM/SELinux: inode_{get,set,notify}secctx hooks to access LSM security context information.
VFS: Factor out part of vfs_setxattr so it can be called from the SELinux hook for inode_setsecctx.
KEYS: Add missing linux/tracehook.h #inclusions
KEYS: Fix default security_session_to_parent()
Security/SELinux: includecheck fix kernel/sysctl.c
KEYS: security_cred_alloc_blank() should return int under all circumstances
IMA: open new file for read
KEYS: Add a keyctl to install a process's session keyring on its parent [try #6]
KEYS: Extend TIF_NOTIFY_RESUME to (almost) all architectures [try #6]
KEYS: Do some whitespace cleanups [try #6]
KEYS: Make /proc/keys use keyid not numread as file position [try #6]
KEYS: Add garbage collection for dead, revoked and expired keys. [try #6]
KEYS: Flag dead keys to induce EKEYREVOKED [try #6]
KEYS: Allow keyctl_revoke() on keys that have SETATTR but not WRITE perm [try #6]
KEYS: Deal with dead-type keys appropriately [try #6]
CRED: Add some configurable debugging [try #6]
selinux: Support for the new TUN LSM hooks
...
Catalin Marinas [Fri, 11 Sep 2009 09:42:09 +0000 (10:42 +0100)]
kmemleak: Improve the "Early log buffer exceeded" error message
Based on a suggestion from Jaswinder, clarify what the user would need
to do to avoid this error message from kmemleak.
Reported-by: Jaswinder Singh Rajput <jaswinder@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Michael Holzheu [Fri, 11 Sep 2009 08:29:07 +0000 (10:29 +0200)]
[S390] tape: Use pr_xxx instead of dev_xxx in shared driver code
For messages from the tape core that is shared between the 3590 and 34xx
tape disciplines, we want to have the "tape" prefix instead of "tape_3590"
or "tape_34xx". In order to fix this, we now use the pr_xxx printk macros.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Heiko Carstens [Fri, 11 Sep 2009 08:29:06 +0000 (10:29 +0200)]
[S390] Wire up page fault events for software perf counters.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Heiko Carstens [Fri, 11 Sep 2009 08:29:05 +0000 (10:29 +0200)]
[S390] Remove smp_cpu_not_running.
smp_cpu_not_running() and cpu_stopped() are doing the same.
Remove one and also get rid of the last hard_smp_processor_id() leftover.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Heiko Carstens [Fri, 11 Sep 2009 08:29:04 +0000 (10:29 +0200)]
[S390] Get rid of cpuid.h header file.
Merge cpuid.h header file into cpu.h.
While at it convert from typedef to struct declaration and also
convert cio code to use proper lowcore structure instead of casts.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Heiko Carstens [Fri, 11 Sep 2009 08:29:03 +0000 (10:29 +0200)]
[S390] Limit cpu detection to 256 physical cpus.
Saves us more than 65k pointless IPIs.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Michael Holzheu [Fri, 11 Sep 2009 08:29:02 +0000 (10:29 +0200)]
[S390] tape: Fix device online messages
Currently, when a tape device is set online and no cartridge is loaded, we
get the messages "The tape cartridge has been successfully unloaded" and
"Determining the size of the recorded area". These messages are not correct.
To fix this, we now print the "cartridge loaded/unloaded" messages only,
when the load/unload event really occurs. In addition to that, the message
"Determining the size of the recorded area" is only printed, if a cartridge
is loaded.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Heiko Carstens [Fri, 11 Sep 2009 08:29:01 +0000 (10:29 +0200)]
[S390] Enable guest page hinting by default.
Get rid of the PAGE_STATES config option and enable guest page hinting
by default.
It can be disabled by specifying "cmma=off" at the command line.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Heiko Carstens [Fri, 11 Sep 2009 08:29:00 +0000 (10:29 +0200)]
[S390] use generic scatterlist.h
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Michael Holzheu [Fri, 11 Sep 2009 08:28:59 +0000 (10:28 +0200)]
[S390] s390dbf: Add description for usage of "%s" in sprintf events
Using "%s" in sprintf event functions is dangerous. This patch adds a short
description for this issue to the s390 debug feature documentation.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Heiko Carstens [Fri, 11 Sep 2009 08:28:58 +0000 (10:28 +0200)]
[S390] Initialize __LC_THREAD_INFO early.
"lockdep: Fix backtraces" reveales a bug in early setup code: when
lockdep tries to save a stack backtrace before setup_arch has been
called the lowcore pointer for the current thread info pointer isn't
initialized yet.
However our save stack backtrace code relies on it. If the pointer
isn't initialized the saved backtrace will have zero entries.
lockdep however relies (correctly) on the fact that that cannot
happen.
A write access to some random memory region is the result.
Fix this by initializing the thread info pointer early.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Martin Schwidefsky [Fri, 11 Sep 2009 08:28:57 +0000 (10:28 +0200)]
[S390] fix recursive locking on page_table_lock
Suzuki Poulose reported the following recursive locking bug on s390:
Here is the stack trace : (see Appendix I for more info)
[<
0000000000406ed6>] _spin_lock+0x52/0x94
[<
0000000000103bde>] crst_table_free+0x14e/0x1a4
[<
00000000001ba684>] __pmd_alloc+0x114/0x1ec
[<
00000000001be8d0>] handle_mm_fault+0x2cc/0xb80
[<
0000000000407d62>] do_dat_exception+0x2b6/0x3a0
[<
0000000000114f8c>] sysc_return+0x0/0x8
[<
00000200001642b2>] 0x200001642b2
The page_table_lock is already acquired in __pmd_alloc (mm/memory.c) and
it tries to populate the pud/pgd with a new pmd allocated. If another
thread populates it before we get a chance, we free the pmd using
pmd_free().
On s390x, pmd_free(even pud_free ) is #defined to crst_table_free(),
which acquires the page_table_lock to protect the crst_table index updates.
Hence this ends up in a recursive locking of the page_table_lock.
The solution suggested by Dave Hansen is to use a new spin lock in the mmu
context to protect the access to the crst_list and the pgtable_list.
Reported-by: Suzuki Poulose <suzuki@in.ibm.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Hendrik Brueckner [Fri, 11 Sep 2009 08:28:56 +0000 (10:28 +0200)]
[S390] kvm: use console_initcall() to initialize s390 virtio console
Use a console_initcall() to initialize the s390 virtio console and
clean up s390 console initialization in setup.c.
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Roel Kluin [Fri, 11 Sep 2009 08:28:55 +0000 (10:28 +0200)]
[S390] tape: reversed order of labels
Fix the order of goto labels in tape_generic_online.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Michael Holzheu [Fri, 11 Sep 2009 08:28:54 +0000 (10:28 +0200)]
[S390] hypfs: Use "%u" instead of "%d" for unsigned ints in snprintf
For printing unsigned integers hypfs uses "%d" in snprintf(). This is wrong.
With this patch "%u" is used instead.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Hendrik Brueckner [Fri, 11 Sep 2009 08:28:53 +0000 (10:28 +0200)]
[S390] kernel: Print an error message if kernel NSS cannot be defined
If a named saved system (NSS) cannot be defined or saved, print out an
error message with the return code of the underlying z/VM CP command.
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Felix Beck [Fri, 11 Sep 2009 08:28:52 +0000 (10:28 +0200)]
[S390] zcrypt: Free ap_device if dev_set_name fails.
If dev_set_name fails during scanning the AP bus, the reserved memory
has to be freed.
Signed-off-by: Felix Beck <felix.beck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Felix Beck [Fri, 11 Sep 2009 08:28:51 +0000 (10:28 +0200)]
[S390] zcrypt: Use spin_lock_bh in suspend callback
Fix lock dependency warning.
inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
bash/1442 [HC0[0]:SC0[0]:HE1:SE1] takes:
(&ap_dev->lock){+.?...}, at: [<
000003e001280404>] __ap_poll_device+0x40/0x3e8 [ap]
{IN-SOFTIRQ-W} state was registered at:
[<
000000000017f094>] __lock_acquire+0xb78/0x182c
[<
000000000017fe8e>] lock_acquire+0x146/0x178
[<
0000000000549cf2>] _spin_lock+0x5a/0x98
[<
000003e001280404>] __ap_poll_device+0x40/0x3e8 [ap]
[<
000003e001280afe>] ap_poll_all+0xaa/0x1a4 [ap]
[<
000000000014fa82>] tasklet_action+0xfe/0x1f4
[<
0000000000150a56>] __do_softirq+0x116/0x284
[<
0000000000111058>] do_softirq+0xe4/0xe8
[<
00000000001504ba>] irq_exit+0xba/0xd8
[<
00000000003dd04a>] do_IRQ+0x176/0x1fc
[<
000000000011823c>] io_return+0x0/0x8
[<
0000004bfbfd2c0e>] 0x4bfbfd2c0e
Signed-off-by: Felix Beck <felix.beck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Michael Holzheu [Fri, 11 Sep 2009 08:28:50 +0000 (10:28 +0200)]
[S390] xpram: Remove checksum validation for suspend/resume
Currently in the suspend process checksums for the XPRAM partitions are
created and stored. During the resume process it is checked,
if the checksums are still the same. If this is not the case, a kernel panic
is triggered. Unfortunately this prevents XPRAM from beeing used as suspend
device, because in this case after the checksum has been created, the
memory image is written to XPRAM and therefore the contents of the suspend
partition is changed. In order to allow XPRAM to be used as suspend device,
this patch removes the checksum validation.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Michael Holzheu [Fri, 11 Sep 2009 08:28:49 +0000 (10:28 +0200)]
[S390] vmur: Invalid allocation sequence for vmur class
The vmur class is allocated after the CCW driver is registered
and it is destroyed before the CCW driver is unregistered.
This is not the correct sequence, because the vmur class can be used
via driver core callbacks that are triggered during the CCW driver
deregistration. For Example:
1. vmur device is online
2. vmur module is unloaded
This leads to the following function call stack:
<4> [<
0000000000387286>] device_destroy+0x36/0x5c
<4> [<
000003e000209714>] ur_set_offline_force+0x9c/0x10c [vmur]
<4> [<
000003e00020a928>] ur_remove+0x64/0xbc [vmur]
<4> [<
00000000003e4d2e>] ccw_device_remove+0x42/0x1ac
<4> [<
000000000038a1aa>] __device_release_driver+0x9a/0xe4
<4> [<
000000000038a2da>] driver_detach+0xe6/0xec
<4> [<
0000000000388ee4>] bus_remove_driver+0xc0/0x108
<4> [<
000003e00020ad5a>] ur_exit+0x52/0x84 [vmur]
In device_destroy() the vmur class is used. Since it is already freed,
this can lead to a kernel panic.
To fix the problem, the vmur class has to be allocated before the CCW
driver is registered and destroyed after the CCW driver has ben unregistered.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Vitaliy Gusev [Fri, 11 Sep 2009 08:28:48 +0000 (10:28 +0200)]
[S390] hypfs: remove useless variable qname
Local variable 'qname' in the function hypfs_create_file() really is not
used for any purpose.
Signed-off-by: Vitaliy Gusev <vgusev@openvz.org>
Cc: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Hans-Joachim Picht [Fri, 11 Sep 2009 08:28:47 +0000 (10:28 +0200)]
[S390] add call home support
Signed-off-by: Hans-Joachim Picht <hans@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Christoph Hellwig [Fri, 11 Sep 2009 08:28:46 +0000 (10:28 +0200)]
[S390] remove unused irq_cpustat_t defintion
No need to defined a irq_cpustat_t type if __ARCH_IRQ_STAT is defined.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Heiko Carstens [Fri, 11 Sep 2009 08:28:45 +0000 (10:28 +0200)]
[S390] kernel: always keep machine flags in lowcore
Eleminate the local variable machine_flags and always change machine
flags directly in the lowcore.
This avoids confusion about when and why the two variables have to be
synchronized.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Nelson Elhage [Fri, 11 Sep 2009 08:28:44 +0000 (10:28 +0200)]
[S390] clean up linker script using new linker script macros.
Note that this patch moves .data.init_task inside _edata. In
addition, the alignment of .init.ramfs changes: It is now PAGE_ALIGNED
and __initramfs_end is arbitrarily aligned; Previously it was
only aligned to a 0x100-byte boundary, and always ended on an even
byte.
This change results in fewer output sections and in some data being
reordered, but should have no functional effect.
Signed-off-by: Nelson Elhage <nelhage@ksplice.com>
Signed-off-by: Tim Abbott <tabbott@ksplice.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux-s390@vger.kernel.org
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Tim Abbott [Fri, 11 Sep 2009 08:28:43 +0000 (10:28 +0200)]
[S390] Use macros for .data.page_aligned.
.data.page_aligned should not need a separate output section, so as
part of this cleanup I moved into the .data output section in the
linker scripts in order to eliminate unnecessary references to the
section name.
Remove the reference to .data.idt, since nothing is put into the
.data.idt section on the s390 architecture. It looks like Cyrill
Gorcunov posted a patch to remove the .data.idt code on s390
previously:
<http://lkml.indiana.edu/hypermail/linux/kernel/0802.2/2536.html>
CCing him and the people who acked that patch in case there's a reason
it wasn't applied.
Signed-off-by: Tim Abbott <tabbott@ksplice.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Hendrik Brueckner [Fri, 11 Sep 2009 08:28:42 +0000 (10:28 +0200)]
[S390] move (io|sysc)_restore_trace_psw into .data section
The sysc_restore_trace_psw and io_restore_trace_psw storage locations
are created in the .text section. When creating and IPLing from a named
saved system (NSS), writing to these locations causes a protection exception
(because the .text section is mapped as shared read-only in the NSS).
To permit write access, move the storage locations into the .data section.
The problem occurs only when CONFIG_TRACE_IRQFLAGS is set.
The git commmit that has introduced these variables is:
411788ea7fca01ee803af8225ac35807b4d02050
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Hendrik Brueckner [Fri, 11 Sep 2009 08:28:41 +0000 (10:28 +0200)]
[S390] kernel: Convert upper case scpdata to lower case
If the CP SET LOADDEV on the 3215 console has been used to specify
SCPdata, all data is converted to upper case letters.
When scpdata contains upper case letters only, convert all letters
to lower case.
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Hendrik Brueckner [Fri, 11 Sep 2009 08:28:40 +0000 (10:28 +0200)]
[S390] kernel: Append scpdata to kernel boot command line
Append scpdata to the kernel boot command line. If scpdata starts
with the equal sign (=), the kernel boot command line is replaced.
(For consistency with zIPL and IPL PARM parameters.)
To use scpdata for the kernel boot command line, scpdata must consist
of ascii characters only. If scpdata contains other characters,
scpdata is not appended to the kernel boot command line.
In addition, re-IPL is extended for setting scpdata for the next
Linux reboot.
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Frank Munzert [Fri, 11 Sep 2009 08:28:39 +0000 (10:28 +0200)]
[S390] tape: use init_timer_on_stack() rather than init_timer()
With CONFIG_DEBUG_OBJECTS_TIMERS=y "chccwdev --online" for a tape device
will fail with message "ODEBUG: object is on stack, but not annotated".
We now use init_timer_on_stack.
Signed-off-by: Frank Munzert <munzert@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Sebastian Ott [Fri, 11 Sep 2009 08:28:38 +0000 (10:28 +0200)]
[S390] proper use of device register
Don't use kfree directly after device registration started.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Heiko Carstens [Fri, 11 Sep 2009 08:28:37 +0000 (10:28 +0200)]
[S390] hibernation: merge files and move to kernel/
Merge the nearly empty C files and move everything from power/ to
kernel/. That way the files are easier to handle.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Heiko Carstens [Fri, 11 Sep 2009 08:28:36 +0000 (10:28 +0200)]
[S390] hibernation: remove dead file
There is no caller of do_after_copyback() anywhere. Remove it.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Heiko Carstens [Fri, 11 Sep 2009 08:28:35 +0000 (10:28 +0200)]
[S390] atomic ops: small cleanups
Couple of coding style fixes, replace __inline__ with inline and
remove #ifdef __KERNEL_- since the header file isn't exported.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Heiko Carstens [Fri, 11 Sep 2009 08:28:34 +0000 (10:28 +0200)]
[S390] atomic ops: add effecient atomic64 support for 31 bit
Use compare double and swap to implement efficient atomic64 ops for 31 bit.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Martin Schwidefsky [Fri, 11 Sep 2009 08:28:33 +0000 (10:28 +0200)]
[S390] improve mcount code
Move the 64 bit mount code from mcount.S into mcount64.S and avoid
code duplication.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Heiko Carstens [Fri, 11 Sep 2009 08:28:32 +0000 (10:28 +0200)]
[S390] convert/optimize csum_fold() to C
In the meantime gcc generates better code than the old inline
assemblies do. Original inline assembly results in:
lr %r1,%r2
sr %r3,%r3
lr %r2,%r1
srdl %r2,16
alr %r2,%r3
alr %r1,%r2
srl %r1,16
xilf %r1,65535
llghr %r2,%r1
br %r14
Out of the C code gcc generates this:
rll %r1,%r2,16
ar %r1,%r2
srl %r1,16
xilf %r1,65535
llghr %r2,%r1
br %r14
In addition we don't have any static register allocations anymore and
gcc is free to shuffle instructions around for better pipeline usage.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Heiko Carstens [Fri, 11 Sep 2009 08:28:31 +0000 (10:28 +0200)]
[S390] introduce get_clock_monotonic
Introduce get_clock_monotonic() function which can be used to get a
(fast) timestamp. Resolution is the same as for get_clock(). The
only difference is that the timestamps are monotonic and don't jump
backward or forward.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Stefan Haberland [Fri, 11 Sep 2009 08:28:30 +0000 (10:28 +0200)]
[S390] dasd: fix message naming
This patch fixes message naming so that generic dasd messages do not
contain the device discipline. For this purpose the dev_ makros are
replaced by pr_ makros for generic dasd messages.
Signed-off-by: Stefan Haberland <stefan.haberland@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Stefan Haberland [Fri, 11 Sep 2009 08:28:29 +0000 (10:28 +0200)]
[S390] dasd: optimize cpu usage in goodcase
remove unnecessary dbf call, remove string operations for magic
Signed-off-by: Stefan Haberland <stefan.haberland@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Stefan Weinhuber [Fri, 11 Sep 2009 08:28:28 +0000 (10:28 +0200)]
[S390] dasd: fail requests when device state is less then ready
A DASD device that is not ready or online has no defined disk layout,
so all requests that arrive in such a state need to be returned as
failed.
Signed-off-by: Stefan Weinhuber <wein@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Sebastian Ott [Fri, 11 Sep 2009 08:28:27 +0000 (10:28 +0200)]
[S390] cio: remove ccw_device init_name
We used the init_name to set the console ccw_device's name early
at the boot stage. This patch moves the name setting (for all ccw
devices) to the point where we actually register the device. At this
time we can do dynamic allocations and therefore use dev_set_name.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Sebastian Ott [Fri, 11 Sep 2009 08:28:26 +0000 (10:28 +0200)]
[S390] cio: move final put_device to ccw_device_unregister
We use a test_and_clear_bit to prevent a device from being
unregistered twice. Unfortunately in this cases the "final"
put_device (from device_initialize) was issued more than once,
resulting in an use after free error. Fix this by moving this
put_device to ccw_device_unregister.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Sebastian Ott [Fri, 11 Sep 2009 08:28:25 +0000 (10:28 +0200)]
[S390] cio: remove subchannel init_name
We used the init_name to set the console subchannels name early
at the boot stage. With the patch cio: fix memleak in subchannel validation
we moved the name setting to the point where we actually register the
console subchannel. At this time we can do dynamic allocations and therefore
use dev_set_name.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Sebastian Ott [Fri, 11 Sep 2009 08:28:24 +0000 (10:28 +0200)]
[S390] cio: fix memleak in subchannel validation
When scanning for new subchannels we have a code path where we allocate
memory for a struct subchannel, set the device name (which is dynamically
allocated now) and do a check if the underlying device is blacklisted - if
so we free the subchannel structure.
Since we have not set up refcounting at this stage, the device name's memory
is lost. Fix this by moving the dev_set_name after the blacklist test.
Note: With this patch the init_name for the console subchannel becomes
virtually obsolete.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Sebastian Ott [Fri, 11 Sep 2009 08:28:23 +0000 (10:28 +0200)]
[S390] cio: fix use after free in s390 debug feature
When using s390dbf with "%s" in sprintf format strings the string itself
is not copied to the dbf buffer.
Since in this case only pointers are stored in the s390dbf, we should
not use dev_name - which is bound to the lifetime of the device.
Reading this entry from s390dbf after the device was released will cause
an use after free error.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Jan Glauber [Fri, 11 Sep 2009 08:28:22 +0000 (10:28 +0200)]
[S390] qdio: remove limited number of debugfs entries
The number of qdio debugfs entries was limited. Remove this limit
and group the queue files in a per device directory.
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Michael Ernst [Fri, 11 Sep 2009 08:28:21 +0000 (10:28 +0200)]
[S390] cio: failing set online/offline processing.
When unit checks trigger sensing the device state is set to W4SENSE
until sense completion; then the device state is set back to
ONLINE. If a unit check occurs while set online or set offline
requests are processed then it might happen that the device's
temporary W4SENSE state causes these functions to terminate,
leaving the device in an inconsistent state when the state is set
back to ONLINE later on so that the device cannot be set online or
offline any longer.
To solve this, set online/offline and related rollback or error
routines are processed only if the device is in a final or
DISCONNECTED state.
Signed-off-by: Michael Ernst <mernst@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Sebastian Ott [Fri, 11 Sep 2009 08:28:20 +0000 (10:28 +0200)]
[S390] cio: ensure to hold a reference for deferred deregistration
Ensure to always hold an extra device reference for scheduling a
subchannel deregistration, by moving the get_device to
ccw_device_schedule_sch_unregister. This fixes an use after free
error in ccw_device_call_sch_unregister where put_device was called
on an already freed device structure.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Jan Glauber [Fri, 11 Sep 2009 08:28:19 +0000 (10:28 +0200)]
[S390] qdio: continue polling if the queue is not finished
With commit
c38f96080955854e54df9cb392bc674e1ae330e1 polling was
stopped for the queue even if new data is available.
Return immediately after scheduling the queue tasklet if the queue
is not done.
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Sebastian Ott [Fri, 11 Sep 2009 08:28:18 +0000 (10:28 +0200)]
[S390] cio: increase trace level
Move debug traces for start I/O and interrupt events to exclusive
trace levels. Also change tracing in hot-path from sprintf (costly)
to hex.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Sebastian Ott [Fri, 11 Sep 2009 08:28:17 +0000 (10:28 +0200)]
[S390] cio: fix not oper handling after failed [on|off]line processing
If online/offline processing of a ccw device fails, resulting in not
operational state, notify the driver and unregister the device in case
the driver dosn't want to keep it.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Peter Oberparleiter [Fri, 11 Sep 2009 08:28:16 +0000 (10:28 +0200)]
[S390] cio: consolidate subchannel intparm reset
Ensure that the hardware interruption parameter for a subchannel is
reset when the associated subchannel data structure is freed.
Signed-off-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Heiko Carstens [Fri, 11 Sep 2009 08:28:15 +0000 (10:28 +0200)]
[S390] cio: move scsw helper functions to header file
All scsw helper functions are very short and usage of them shouldn't
result in function calls. Therefore we move them to a separate header
file.
Also saves a lot of EXPORT_SYMBOLs.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Peter Oberparleiter [Fri, 11 Sep 2009 08:28:14 +0000 (10:28 +0200)]
[S390] cio: fix ineffective verify event
Path verification events occurring for offline devices are currently
ignored. As a result, offline devices are not removed, even though
they might no longer be accessible (for example because the last path
to the device was varied offline). Fix this by scheduling a status
evaluation for the affected subchannel when a path verification event
occurs.
Signed-off-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Jens Axboe [Wed, 9 Sep 2009 07:10:25 +0000 (09:10 +0200)]
writeback: check for registered bdi in flusher add and inode dirty
Also a debugging aid. We want to catch dirty inodes being added to
backing devices that don't do writeback.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Fri, 12 Jun 2009 12:45:52 +0000 (14:45 +0200)]
writeback: add name to backing_dev_info
This enables us to track who does what and print info. Its main use
is catching dirty inodes on the default_backing_dev_info, so we can
fix that up.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Mon, 25 May 2009 07:08:21 +0000 (09:08 +0200)]
writeback: add some debug inode list counters to bdi stats
Add some debug entries to be able to inspect the internal state of
the writeback details.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Mon, 18 May 2009 06:20:32 +0000 (08:20 +0200)]
writeback: get rid of pdflush completely
It is now unused, so kill it off.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Wed, 9 Sep 2009 07:08:54 +0000 (09:08 +0200)]
writeback: switch to per-bdi threads for flushing data
This gets rid of pdflush for bdi writeout and kupdated style cleaning.
pdflush writeout suffers from lack of locality and also requires more
threads to handle the same workload, since it has to work in a
non-blocking fashion against each queue. This also introduces lumpy
behaviour and potential request starvation, since pdflush can be starved
for queue access if others are accessing it. A sample ffsb workload that
does random writes to files is about 8% faster here on a simple SATA drive
during the benchmark phase. File layout also seems a LOT more smooth in
vmstat:
r b swpd free buff cache si so bi bo in cs us sy id wa
0 1 0 608848 2652 375372 0 0 0 71024 604 24 1 10 48 42
0 1 0 549644 2712 433736 0 0 0 60692 505 27 1 8 48 44
1 0 0 476928 2784 505192 0 0 4 29540 553 24 0 9 53 37
0 1 0 457972 2808 524008 0 0 0 54876 331 16 0 4 38 58
0 1 0 366128 2928 614284 0 0 4 92168 710 58 0 13 53 34
0 1 0 295092 3000 684140 0 0 0 62924 572 23 0 9 53 37
0 1 0 236592 3064 741704 0 0 4 58256 523 17 0 8 48 44
0 1 0 165608 3132 811464 0 0 0 57460 560 21 0 8 54 38
0 1 0 102952 3200 873164 0 0 4 74748 540 29 1 10 48 41
0 1 0 48604 3252 926472 0 0 0 53248 469 29 0 7 47 45
where vanilla tends to fluctuate a lot in the creation phase:
r b swpd free buff cache si so bi bo in cs us sy id wa
1 1 0 678716 5792 303380 0 0 0 74064 565 50 1 11 52 36
1 0 0 662488 5864 319396 0 0 4 352 302 329 0 2 47 51
0 1 0 599312 5924 381468 0 0 0 78164 516 55 0 9 51 40
0 1 0 519952 6008 459516 0 0 4 78156 622 56 1 11 52 37
1 1 0 436640 6092 541632 0 0 0 82244 622 54 0 11 48 41
0 1 0 436640 6092 541660 0 0 0 8 152 39 0 0 51 49
0 1 0 332224 6200 644252 0 0 4 102800 728 46 1 13 49 36
1 0 0 274492 6260 701056 0 0 4 12328 459 49 0 7 50 43
0 1 0 211220 6324 763356 0 0 0 106940 515 37 1 10 51 39
1 0 0 160412 6376 813468 0 0 0 8224 415 43 0 6 49 45
1 1 0 85980 6452 886556 0 0 4 113516 575 39 1 11 54 34
0 2 0 85968 6452 886620 0 0 0 1640 158 211 0 0 46 54
A 10 disk test with btrfs performs 26% faster with per-bdi flushing. A
SSD based writeback test on XFS performs over 20% better as well, with
the throughput being very stable around 1GB/sec, where pdflush only
manages 750MB/sec and fluctuates wildly while doing so. Random buffered
writes to many files behave a lot better as well, as does random mmap'ed
writes.
A separate thread is added to sync the super blocks. In the long term,
adding sync_supers_bdi() functionality could get rid of this thread again.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Wed, 2 Sep 2009 07:19:46 +0000 (09:19 +0200)]
writeback: move dirty inodes from super_block to backing_dev_info
This is a first step at introducing per-bdi flusher threads. We should
have no change in behaviour, although sb_has_dirty_inodes() is now
ridiculously expensive, as there's no easy way to answer that question.
Not a huge problem, since it'll be deleted in subsequent patches.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Jens Axboe [Wed, 2 Sep 2009 10:34:32 +0000 (12:34 +0200)]
writeback: get rid of generic_sync_sb_inodes() export
This adds two new exported functions:
- writeback_inodes_sb(), which only attempts to writeback dirty inodes on
this super_block, for WB_SYNC_NONE writeout.
- sync_inodes_sb(), which writes out all dirty inodes on this super_block
and also waits for the IO to complete.
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Roland Dreier [Fri, 11 Sep 2009 04:19:45 +0000 (21:19 -0700)]
Merge branch 'mad' into for-linus
Conflicts:
drivers/infiniband/core/mad.c
Roland Dreier [Fri, 11 Sep 2009 04:18:07 +0000 (21:18 -0700)]
Merge branches 'cxgb3', 'ehca', 'ipath', 'ipoib', 'misc', 'mlx4', 'mthca' and 'nes' into for-linus
James Morris [Thu, 10 Sep 2009 22:04:49 +0000 (08:04 +1000)]
Merge branch 'next' into for-linus
Geert Uytterhoeven [Thu, 10 Sep 2009 21:13:28 +0000 (23:13 +0200)]
md: Fix "strchr" [drivers/md/dm-log-userspace.ko] undefined!
Commit
b8313b6da7e2e7c7f47d93d8561969a3ff9ba0ea ("dm log: remove incorrect
field from userspace table output") added a call to strstr() with a
single-character "needle" string parameter.
Unfortunately some versions of gcc replace such calls to strstr() by calls
to strchr() behind our back. This causes linking errors if strchr() is
defined as an inline function in <asm/string.h> (e.g. on m68k):
| WARNING: "strchr" [drivers/md/dm-log-userspace.ko] undefined!
Avoid this by explicitly calling strchr() instead.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roland McGrath [Wed, 9 Sep 2009 02:49:40 +0000 (19:49 -0700)]
binfmt_elf: fix PT_INTERP bss handling
In fs/binfmt_elf.c, load_elf_interp() calls padzero() for .bss even if
the PT_LOAD has no PROT_WRITE and no .bss. This generates EFAULT.
Here is a small test case. (Yes, there are other, useful PT_INTERP
which have only .text and no .data/.bss.)
----- ptinterp.S
_start: .globl _start
nop
int3
-----
$ gcc -m32 -nostartfiles -nostdlib -o ptinterp ptinterp.S
$ gcc -m32 -Wl,--dynamic-linker=ptinterp -o hello hello.c
$ ./hello
Segmentation fault # during execve() itself
After applying the patch:
$ ./hello
Trace trap # user-mode execution after execve() finishes
If the ELF headers are actually self-inconsistent, then dying is fine.
But having no PROT_WRITE segment is perfectly normal and correct if
there is no segment with p_memsz > p_filesz (i.e. bss). John Reiser
suggested checking for PROT_WRITE in the bss logic. I think it makes
most sense to simply apply the bss logic only when there is bss.
This patch looks less trivial than it is due to some reindentation.
It just moves the "if (last_bss > elf_bss) {" test up to include the
partial-page bss logic as well as the more-pages bss logic.
Reported-by: John Reiser <jreiser@bitwagon.com>
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Jason Gunthorpe [Wed, 9 Sep 2009 23:22:18 +0000 (17:22 -0600)]
TPM: Fixup boot probe timeout for tpm_tis driver
When probing the device in tpm_tis_init the call request_locality
uses timeout_a, which wasn't being initalized until after
request_locality. This results in request_locality falsely timing
out if the chip is still starting. Move the initialization to before
request_locality.
This probably only matters for embedded cases (ie mine), a BIOS likely
gets the TPM into a state where this code path isn't necessary.
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Acked-by: Rajiv Andrade <srajiv@linux.vnet.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
Linus Torvalds [Thu, 10 Sep 2009 03:04:54 +0000 (20:04 -0700)]
Merge branch 'lookup-permissions-cleanup'
* lookup-permissions-cleanup:
jffs2/jfs/xfs: switch over to 'check_acl' rather than 'permission()'
ext[234]: move over to 'check_acl' permission model
shmfs: use 'check_acl' instead of 'permission'
Make 'check_acl()' a first-class filesystem op
Simplify exec_permission_lite(), part 3
Simplify exec_permission_lite() further
Simplify exec_permission_lite() logic
Do not call 'ima_path_check()' for each path component
Roland McGrath [Wed, 9 Sep 2009 02:49:40 +0000 (19:49 -0700)]
binfmt_elf: fix PT_INTERP bss handling
In fs/binfmt_elf.c, load_elf_interp() calls padzero() for .bss even if
the PT_LOAD has no PROT_WRITE and no .bss. This generates EFAULT.
Here is a small test case. (Yes, there are other, useful PT_INTERP
which have only .text and no .data/.bss.)
----- ptinterp.S
_start: .globl _start
nop
int3
-----
$ gcc -m32 -nostartfiles -nostdlib -o ptinterp ptinterp.S
$ gcc -m32 -Wl,--dynamic-linker=ptinterp -o hello hello.c
$ ./hello
Segmentation fault # during execve() itself
After applying the patch:
$ ./hello
Trace trap # user-mode execution after execve() finishes
If the ELF headers are actually self-inconsistent, then dying is fine.
But having no PROT_WRITE segment is perfectly normal and correct if
there is no segment with p_memsz > p_filesz (i.e. bss). John Reiser
suggested checking for PROT_WRITE in the bss logic. I think it makes
most sense to simply apply the bss logic only when there is bss.
This patch looks less trivial than it is due to some reindentation.
It just moves the "if (last_bss > elf_bss) {" test up to include the
partial-page bss logic as well as the more-pages bss logic.
Reported-by: John Reiser <jreiser@bitwagon.com>
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David P. Quigley [Wed, 9 Sep 2009 18:25:37 +0000 (14:25 -0400)]
sysfs: Add labeling support for sysfs
This patch adds a setxattr handler to the file, directory, and symlink
inode_operations structures for sysfs. The patch uses hooks introduced in the
previous patch to handle the getting and setting of security information for
the sysfs inodes. As was suggested by Eric Biederman the struct iattr in the
sysfs_dirent structure has been replaced by a structure which contains the
iattr, secdata and secdata length to allow the changes to persist in the event
that the inode representing the sysfs_dirent is evicted. Because sysfs only
stores this information when a change is made all the optional data is moved
into one dynamically allocated field.
This patch addresses an issue where SELinux was denying virtd access to the PCI
configuration entries in sysfs. The lack of setxattr handlers for sysfs
required that a single label be assigned to all entries in sysfs. Granting virtd
access to every entry in sysfs is not an acceptable solution so fine grained
labeling of sysfs is required such that individual entries can be labeled
appropriately.
[sds: Fixed compile-time warnings, coding style, and setting of inode security init flags.]
Signed-off-by: David P. Quigley <dpquigl@tycho.nsa.gov>
Signed-off-by: Stephen D. Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
David P. Quigley [Thu, 3 Sep 2009 18:25:57 +0000 (14:25 -0400)]
LSM/SELinux: inode_{get,set,notify}secctx hooks to access LSM security context information.
This patch introduces three new hooks. The inode_getsecctx hook is used to get
all relevant information from an LSM about an inode. The inode_setsecctx is
used to set both the in-core and on-disk state for the inode based on a context
derived from inode_getsecctx.The final hook inode_notifysecctx will notify the
LSM of a change for the in-core state of the inode in question. These hooks are
for use in the labeled NFS code and addresses concerns of how to set security
on an inode in a multi-xattr LSM. For historical reasons Stephen Smalley's
explanation of the reason for these hooks is pasted below.
Quote Stephen Smalley
inode_setsecctx: Change the security context of an inode. Updates the
in core security context managed by the security module and invokes the
fs code as needed (via __vfs_setxattr_noperm) to update any backing
xattrs that represent the context. Example usage: NFS server invokes
this hook to change the security context in its incore inode and on the
backing file system to a value provided by the client on a SETATTR
operation.
inode_notifysecctx: Notify the security module of what the security
context of an inode should be. Initializes the incore security context
managed by the security module for this inode. Example usage: NFS
client invokes this hook to initialize the security context in its
incore inode to the value provided by the server for the file when the
server returned the file's attributes to the client.
Signed-off-by: David P. Quigley <dpquigl@tycho.nsa.gov>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
David P. Quigley [Thu, 3 Sep 2009 18:25:56 +0000 (14:25 -0400)]
VFS: Factor out part of vfs_setxattr so it can be called from the SELinux hook for inode_setsecctx.
This factors out the part of the vfs_setxattr function that performs the
setting of the xattr and its notification. This is needed so the SELinux
implementation of inode_setsecctx can handle the setting of the xattr while
maintaining the proper separation of layers.
Signed-off-by: David P. Quigley <dpquigl@tycho.nsa.gov>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
Linus Torvalds [Wed, 9 Sep 2009 22:13:59 +0000 (15:13 -0700)]
Linux 2.6.31
Steve Wise [Wed, 9 Sep 2009 18:37:38 +0000 (11:37 -0700)]
RDMA/iwcm: Reject the connection when the cm_id is destroyed
If the cm_id of a connect request is destroyed prior to the ULP
accepting or rejecting the connection, then the provider never cleans
up the connection. The iwcm should explicitly reject these
connections if the cm_id is destroyed.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Steve Wise [Wed, 9 Sep 2009 18:25:56 +0000 (11:25 -0700)]
RDMA/cxgb3: Clean up properly on FW mismatch failures
FW mismatches can cause a crash in the iw_cxgb3 event handler.
- NULL the t3cdev->ulp pointer on failures in cxio_rdev_open()
- Silently ignore events when the ulp ptr is NULL in iwch_err_handler()
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Steve Wise [Wed, 9 Sep 2009 18:25:55 +0000 (11:25 -0700)]
RDMA/cxgb3: Don't ignore insert_handle() failures
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Ed Cashin [Wed, 9 Sep 2009 12:10:18 +0000 (14:10 +0200)]
aoe: allocate unused request_queue for sysfs
Andy Whitcroft reported an oops in aoe triggered by use of an
incorrectly initialised request_queue object:
[ 2645.959090] kobject '<NULL>' (
ffff880059ca22c0): tried to add
an uninitialized object, something is seriously wrong.
[ 2645.959104] Pid: 6, comm: events/0 Not tainted 2.6.31-5-generic #24-Ubuntu
[ 2645.959107] Call Trace:
[ 2645.959139] [<
ffffffff8126ca2f>] kobject_add+0x5f/0x70
[ 2645.959151] [<
ffffffff8125b4ab>] blk_register_queue+0x8b/0xf0
[ 2645.959155] [<
ffffffff8126043f>] add_disk+0x8f/0x160
[ 2645.959161] [<
ffffffffa01673c4>] aoeblk_gdalloc+0x164/0x1c0 [aoe]
The request queue of an aoe device is not used but can be allocated in
code that does not sleep.
Bruno bisected this regression down to
cd43e26f071524647e660706b784ebcbefbd2e44
block: Expose stacked device queues in sysfs
"This seems to generate /sys/block/$device/queue and its contents for
everyone who is using queues, not just for those queues that have a
non-NULL queue->request_fn."
Addresses http://bugs.launchpad.net/bugs/410198
Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13942
Note that embedding a queue inside another object has always been
an illegal construct, since the queues are reference counted and
must persist until the last reference is dropped. So aoe was
always buggy in this respect (Jens).
Signed-off-by: Ed Cashin <ecashin@coraid.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Bruno Premont <bonbons@linux-vserver.org>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
David Howells [Wed, 9 Sep 2009 07:30:21 +0000 (08:30 +0100)]
KEYS: Add missing linux/tracehook.h #inclusions
Add #inclusions of linux/tracehook.h to those arch files that had the tracehook
call for TIF_NOTIFY_RESUME added when support for that flag was added to that
arch.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Linus Torvalds [Wed, 9 Sep 2009 00:09:24 +0000 (17:09 -0700)]
i915: disable interrupts before tearing down GEM state
Reinette Chatre reports a frozen system (with blinking keyboard LEDs)
when switching from graphics mode to the text console, or when
suspending (which does the same thing). With netconsole, the oops
turned out to be
BUG: unable to handle kernel NULL pointer dereference at
0000000000000084
IP: [<
ffffffffa03ecaab>] i915_driver_irq_handler+0x26b/0xd20 [i915]
and it's due to the i915_gem.c code doing drm_irq_uninstall() after
having done i915_gem_idle(). And the i915_gem_idle() path will do
i915_gem_idle() ->
i915_gem_cleanup_ringbuffer() ->
i915_gem_cleanup_hws() ->
dev_priv->hw_status_page = NULL;
but if an i915 interrupt comes in after this stage, it may want to
access that hw_status_page, and gets the above NULL pointer dereference.
And since the NULL pointer dereference happens from within an interrupt,
and with the screen still in graphics mode, the common end result is
simply a silently hung machine.
Fix it by simply uninstalling the irq handler before idling rather than
after. Fixes
http://bugzilla.kernel.org/show_bug.cgi?id=13819
Reported-and-tested-by: Reinette Chatre <reinette.chatre@intel.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 28 Aug 2009 19:29:03 +0000 (12:29 -0700)]
jffs2/jfs/xfs: switch over to 'check_acl' rather than 'permission()'
This avoids an indirect call in the VFS for each path component lookup.
Well, at least as long as you own the directory in question, and the ACL
check is unnecessary.
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 28 Aug 2009 19:12:24 +0000 (12:12 -0700)]
ext[234]: move over to 'check_acl' permission model
Don't implement per-filesystem 'extX_permission()' functions that have
to be called for every path component operation, and instead just expose
the actual ACL checking so that the VFS layer can now do it for us.
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 28 Aug 2009 19:04:28 +0000 (12:04 -0700)]
shmfs: use 'check_acl' instead of 'permission'
shmfs wants purely standard POSIX ACL semantics, so we can use the new
generic VFS layer POSIX ACL checking rather than cooking our own
'permission()' function.
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 28 Aug 2009 18:51:25 +0000 (11:51 -0700)]
Make 'check_acl()' a first-class filesystem op
This is stage one in flattening out the callchains for the common
permission testing. Rather than have most filesystem implement their
own inode->i_op->permission function that just calls back down to the
VFS layers 'generic_permission()' with the per-filesystem ACL checking
function, the filesystem can just expose its 'check_acl' function
directly, and let the VFS layer do everything for it.
This is all just preparatory - no filesystem actually enables this yet.
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 28 Aug 2009 18:08:31 +0000 (11:08 -0700)]
Simplify exec_permission_lite(), part 3
Don't call down to the generic inode_permission() function just to
call the inode-specific permission function - just do it directly.
The generic inode_permission() code does things like checking MAY_WRITE
and devcgroup_inode_permission(), neither of which are relevant for the
light pathname walk permission checks (we always do just MAY_EXEC, and
the inode is never a special device).
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 28 Aug 2009 17:53:56 +0000 (10:53 -0700)]
Simplify exec_permission_lite() further
This function is only called for path components that are already known
to be directories (they have a '->lookup' method). So don't bother
doing that whole S_ISDIR() testing, the whole point of the 'lite()'
version is that we know that we are looking at a directory component,
and that we're only checking name lookup permission.
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 28 Aug 2009 17:50:37 +0000 (10:50 -0700)]
Simplify exec_permission_lite() logic
Instead of returning EAGAIN and having the caller do something
special for that case, just do the special case directly.
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 28 Aug 2009 17:05:33 +0000 (10:05 -0700)]
Do not call 'ima_path_check()' for each path component
Not only is that a supremely timing-critical path, but it's hopefully
some day going to be lockless for the common case, and ima can't do
that.
Plus the integrity code doesn't even care about non-regular files, so it
was always a total waste of time and effort.
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Zhenyu Wang [Tue, 8 Sep 2009 06:52:25 +0000 (14:52 +0800)]
drm/i915: fix mask bits setting
eDP is exclusive connector too, and add missing crtc_mask
setting for TV.
This fixes
http://bugzilla.kernel.org/show_bug.cgi?id=14139
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Reported-and-tested-by: Carlos R. Mafra <crmafra2@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Luis R. Rodriguez [Tue, 8 Sep 2009 16:31:45 +0000 (17:31 +0100)]
kmemleak: fix sparse warning for static declarations
This fixes these sparse warnings:
mm/kmemleak.c:1179:6: warning: symbol 'start_scan_thread' was not declared. Should it be static?
mm/kmemleak.c:1194:6: warning: symbol 'stop_scan_thread' was not declared. Should it be static?
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Luis R. Rodriguez [Tue, 8 Sep 2009 16:32:34 +0000 (17:32 +0100)]
kmemleak: fix sparse warning over overshadowed flags
A secondary irq_save is not required as a locking before it was
already disabling irqs.
This fixes this sparse warning:
mm/kmemleak.c:512:31: warning: symbol 'flags' shadows an earlier one
mm/kmemleak.c:448:23: originally declared here
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Luis R. Rodriguez [Sat, 5 Sep 2009 00:44:52 +0000 (17:44 -0700)]
kmemleak: move common painting code together
When painting grey or black we do the same thing, bring
this together into a helper and identify coloring grey or
black explicitly with defines. This makes this a little
easier to read.
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Luis R. Rodriguez [Sat, 5 Sep 2009 00:44:51 +0000 (17:44 -0700)]
kmemleak: add clear command support
In an ideal world your kmemleak output will be small, when its
not (usually during initial bootup) you can use the clear command
to ingore previously reported and unreferenced kmemleak objects. We
do this by painting all currently reported unreferenced objects grey.
We paint them grey instead of black to allow future scans on the same
objects as such objects could still potentially reference newly
allocated objects in the future.
To test a critical section on demand with a clean
/sys/kernel/debug/kmemleak you can do:
echo clear > /sys/kernel/debug/kmemleak
test your kernel or modules
echo scan > /sys/kernel/debug/kmemleak
Then as usual to get your report with:
cat /sys/kernel/debug/kmemleak
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Luis R. Rodriguez [Tue, 8 Sep 2009 15:34:50 +0000 (16:34 +0100)]
kmemleak: use bool for true/false questions
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Catalin Marinas [Mon, 7 Sep 2009 09:14:42 +0000 (10:14 +0100)]
kmemleak: Do no create the clean-up thread during kmemleak_disable()
The kmemleak_disable() function could be called from various contexts
including IRQ. It creates a clean-up thread but the kthread_create()
function has restrictions on which contexts it can be called from,
mainly because of the kthread_create_lock. The patch changes the
kmemleak clean-up thread to a workqueue.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Eric Paris <eparis@redhat.com>