Ingo Molnar [Tue, 26 Apr 2011 17:36:14 +0000 (19:36 +0200)]
Merge branch 'perf/urgent' into perf/stat
Merge reason: We want to queue up dependent changes.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Wed, 27 Apr 2011 09:51:41 +0000 (11:51 +0200)]
perf events, x86: Work around the Nehalem AAJ80 erratum
On Nehalem CPUs the retired branch-misses event can be completely bogus,
when there are no branch-misses occuring. When there are a lot of branch
misses then the count is pretty accurate. Still, this leaves us with an
event that over-counts a lot.
Detect this erratum and work it around by using BR_MISP_EXEC.ANY events.
These will also count speculated branches but still it's a lot more
precise in practice than the architectural event.
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-yyfg0bxo9jsqxd6a0ovfny27@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Tue, 26 Apr 2011 11:24:33 +0000 (13:24 +0200)]
perf, x86: Fix BTS condition
Currently the x86 backend incorrectly assumes that any BRANCH_INSN
with sample_period==1 is a BTS request. This is not true when we do
frequency driven profiling such as 'perf record -e branches'.
Solves this error:
$ perf record -e branches ./array
Error: sys_perf_event_open() syscall returned with 95 (Operation not supported).
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reported-by: Ingo Molnar <mingo@elte.hu>
Cc: "Metzger, Markus T" <markus.t.metzger@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-rd2y4ct71hjawzz6fpvsy9hg@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Linus Torvalds [Tue, 26 Apr 2011 02:01:12 +0000 (19:01 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/ecryptfs/ecryptfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6:
eCryptfs: Flush dirty pages in setattr
eCryptfs: Handle failed metadata read in lookup
eCryptfs: Add reference counting to lower files
eCryptfs: dput dentries returned from dget_parent
eCryptfs: Remove extra d_delete in ecryptfs_rmdir
Linus Torvalds [Tue, 26 Apr 2011 02:00:55 +0000 (19:00 -0700)]
Merge branch 'for-torvalds' of git://git./linux/kernel/git/linusw/linux-stericsson
* 'for-torvalds' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-stericsson:
rtc: fix coh901331 startup crash
mach-ux500: fix i2c0 device setup regression
Eric Paris [Mon, 25 Apr 2011 20:26:29 +0000 (16:26 -0400)]
SELINUX: Make selinux cache VFS RCU walks safe
Now that the security modules can decide whether they support the
dcache RCU walk or not it's possible to make selinux a bit more
RCU friendly. The SELinux AVC and security server access decision
code is RCU safe. A specific piece of the LSM audit code may not
be RCU safe.
This patch makes the VFS RCU walk retry if it would hit the non RCU
safe chunk of code. It will normally just work under RCU. This is
done simply by passing the VFS RCU state as a flag down into the
avc_audit() code and returning ECHILD there if it would have an issue.
Based-on-patch-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Hellwig [Mon, 25 Apr 2011 18:01:36 +0000 (14:01 -0400)]
add hlist_bl_lock/unlock helpers
Now that the whole dcache_hash_bucket crap is gone, go all the way and
also remove the weird locking layering violations for locking the hash
buckets. Add hlist_bl_lock/unlock helpers to move the locking into the
list abstraction instead of requiring each caller to open code it.
After all allowing for the bit locks is the whole point of these helpers
over the plain hlist variant.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 26 Apr 2011 01:10:58 +0000 (18:10 -0700)]
bit_spinlock: don't play preemption games inside the busy loop
When we are waiting for the bit-lock to be released, and are looping
over the 'cpu_relax()' should not be doing anything else - otherwise we
miss the point of trying to do the whole 'cpu_relax()'.
Do the preemption enable/disable around the loop, rather than inside of
it.
Noticed when I was looking at the code generation for the dcache
__d_drop usage, and the code just looked very odd.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tyler Hicks [Fri, 22 Apr 2011 18:08:00 +0000 (13:08 -0500)]
eCryptfs: Flush dirty pages in setattr
After
57db4e8d73ef2b5e94a3f412108dff2576670a8a changed eCryptfs to
write-back caching, eCryptfs page writeback updates the lower inode
times due to the use of vfs_write() on the lower file.
To preserve inode metadata changes, such as 'cp -p' does with
utimensat(), we need to flush all dirty pages early in
ecryptfs_setattr() so that the user-updated lower inode metadata isn't
clobbered later in writeback.
https://bugzilla.kernel.org/show_bug.cgi?id=33372
Reported-by: Rocko <rockorequin@hotmail.com>
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
Tyler Hicks [Tue, 15 Mar 2011 19:54:00 +0000 (14:54 -0500)]
eCryptfs: Handle failed metadata read in lookup
When failing to read the lower file's crypto metadata during a lookup,
eCryptfs must continue on without throwing an error. For example, there
may be a plaintext file in the lower mount point that the user wants to
delete through the eCryptfs mount.
If an error is encountered while reading the metadata in lookup(), the
eCryptfs inode's size could be incorrect. We must be sure to reread the
plaintext inode size from the metadata when performing an open() or
setattr(). The metadata is already being read in those paths, so this
adds minimal performance overhead.
This patch introduces a flag which will track whether or not the
plaintext inode size has been read so that an incorrect i_size can be
fixed in the open() or setattr() paths.
https://bugs.launchpad.net/bugs/509180
Cc: <stable@kernel.org>
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
Tyler Hicks [Thu, 14 Apr 2011 20:35:11 +0000 (15:35 -0500)]
eCryptfs: Add reference counting to lower files
For any given lower inode, eCryptfs keeps only one lower file open and
multiplexes all eCryptfs file operations through that lower file. The
lower file was considered "persistent" and stayed open from the first
lookup through the lifetime of the inode.
This patch keeps the notion of a single, per-inode lower file, but adds
reference counting around the lower file so that it is closed when not
currently in use. If the reference count is at 0 when an operation (such
as open, create, etc.) needs to use the lower file, a new lower file is
opened. Since the file is no longer persistent, all references to the
term persistent file are changed to lower file.
Locking is added around the sections of code that opens the lower file
and assign the pointer in the inode info, as well as the code the fputs
the lower file when all eCryptfs users are done with it.
This patch is needed to fix issues, when mounted on top of the NFSv3
client, where the lower file is left silly renamed until the eCryptfs
inode is destroyed.
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
Tyler Hicks [Tue, 12 Apr 2011 16:23:09 +0000 (11:23 -0500)]
eCryptfs: dput dentries returned from dget_parent
Call dput on the dentries previously returned by dget_parent() in
ecryptfs_rename(). This is needed for supported eCryptfs mounts on top
of the NFSv3 client.
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
Tyler Hicks [Tue, 12 Apr 2011 16:21:36 +0000 (11:21 -0500)]
eCryptfs: Remove extra d_delete in ecryptfs_rmdir
vfs_rmdir() already calls d_delete() on the lower dentry. That was being
duplicated in ecryptfs_rmdir() and caused a NULL pointer dereference
when NFSv3 was the lower filesystem.
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
Linus Torvalds [Sun, 24 Apr 2011 15:51:15 +0000 (08:51 -0700)]
Merge branch 'dcache-cleanup'
* dcache-cleanup:
vfs: get rid of insane dentry hashing rules
Linus Torvalds [Sun, 24 Apr 2011 15:45:37 +0000 (08:45 -0700)]
Merge branch 'upstream-linus' of git://git./linux/kernel/git/jgarzik/libata-dev
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
libata: ahci_start_engine compliant to AHCI spec
ata: pata_at91.c bugfix for initial_timing initialisation
ata: pata_at91.c bugfix for high master clock
ahci: AHCI-mode SATA patch for Intel Panther Point DeviceIDs
ata_piix: IDE-mode SATA patch for Intel Panther Point DeviceIDs
libata: Pioneer DVR-216D can't do SETXFER
ahci: don't enable port irq before handler is registered
libata: Implement ATA_FLAG_NO_DIPM and apply it to mcp65
libata: Kill unused ATA_DFLAG_{H|D}IPM flags
ahci: EM supported message type sysfs attribute
Linus Torvalds [Sun, 24 Apr 2011 15:42:15 +0000 (08:42 -0700)]
Merge branch 'for-linus' of git://git.infradead.org/ubifs-2.6
* 'for-linus' of git://git.infradead.org/ubifs-2.6:
UBIFS: fix master node recovery
UBIFS: fix false assertion warning in case of I/O failures
UBIFS: fix false space checking failure
Jian Peng [Sat, 23 Apr 2011 06:58:10 +0000 (23:58 -0700)]
libata: ahci_start_engine compliant to AHCI spec
At the end of section 10.1 of AHCI spec (rev 1.3), it states
Software shall not set PxCMD.ST to 1 until it is determined that
a functoinal device is present on the port as determined by
PxTFD.STS.BSY=0, PxTFD.STS.DRQ=0 and PxSSTS.DET=3h
Even though most AHCI host controller works without this check,
specific controller will fail under this condition.
Signed-off-by: Jian Peng <jipeng2005@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Igor Plyatov [Mon, 28 Mar 2011 12:56:14 +0000 (16:56 +0400)]
ata: pata_at91.c bugfix for initial_timing initialisation
The "struct ata_timing" must contain 10 members, but ".dmack_hold" member was
forgotten for "initial_timing" initialisation. This patch fixes such a problem.
Signed-off-by: Igor Plyatov <plyatov@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Igor Plyatov [Mon, 28 Mar 2011 12:56:15 +0000 (16:56 +0400)]
ata: pata_at91.c bugfix for high master clock
The AT91SAM9 microcontrollers with master clock higher then 105 MHz
and PIO0, have overflow of the NCS_RD_PULSE value in the MSB. This
lead to "NCS_RD_PULSE" pulse longer then "NRD_CYCLE" pulse and driver
does not detect ATA device.
Signed-off-by: Igor Plyatov <plyatov@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Seth Heasley [Wed, 20 Apr 2011 15:45:20 +0000 (08:45 -0700)]
ahci: AHCI-mode SATA patch for Intel Panther Point DeviceIDs
The previously submitted patch was word-wrapped.
This patch adds the AHCI-mode SATA DeviceIDs for the Intel Panther Point PCH.
Signed-off-by: Seth Heasley <seth.heasley@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Seth Heasley [Wed, 20 Apr 2011 15:43:37 +0000 (08:43 -0700)]
ata_piix: IDE-mode SATA patch for Intel Panther Point DeviceIDs
The previously submitted patch was word-wrapped.
This patch adds the IDE-mode SATA DeviceIDs for the Intel Panther
Point PCH.
Signed-off-by: Seth Heasley <seth.heasley@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Jeff Mahoney [Tue, 19 Apr 2011 15:13:32 +0000 (11:13 -0400)]
libata: Pioneer DVR-216D can't do SETXFER
Commit
4a5610a04d415ed94af75bb1159d2621d62c8328 fixed an issue with
the Pioneer DVR-212D not handling SETXFER correctly. An openSUSE user
reported a similar issue with his DVR-216D that the NOSETXFER horkage
worked around for him as well.
This patch adds the DVR-216D (1.08) to the horkage list for NOSETXFER.
The issue was reported at:
https://bugzilla.novell.com/show_bug.cgi?id=679143
Reported-by: Volodymyr Kyrychenko <vladimir.kirichenko@gmail.com>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Maxime Bizon [Wed, 16 Mar 2011 13:58:32 +0000 (14:58 +0100)]
ahci: don't enable port irq before handler is registered
The ahci_pmp_attach() & ahci_pmp_detach() unmask port irqs, but they
are also called during port initialization, before ahci host irq
handler is registered. On ce4100 platform, this sometimes triggers
"irq 4: nobody cared" message when loading driver.
Fixed this by not touching the register if the port is in frozen
state, and mark all uninitialized port as frozen.
Signed-off-by: Maxime Bizon <mbizon@freebox.fr>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Tejun Heo [Wed, 16 Mar 2011 10:14:55 +0000 (11:14 +0100)]
libata: Implement ATA_FLAG_NO_DIPM and apply it to mcp65
NVIDIA mcp65 familiy of controllers cause command timeouts when DIPM
is used. Implement ATA_FLAG_NO_DIPM and apply it.
This problem was reported by Stefan Bader in the following thread.
http://thread.gmane.org/gmane.linux.ide/48841
stable: applicable to 2.6.37 and 38.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Stefan Bader <stefan.bader@canonical.com>
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Tejun Heo [Wed, 16 Mar 2011 10:14:25 +0000 (11:14 +0100)]
libata: Kill unused ATA_DFLAG_{H|D}IPM flags
ATA_DFLAG_{H|D}IPM flags are no longer used. Kill them.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Hannes Reinecke [Fri, 4 Mar 2011 08:54:52 +0000 (09:54 +0100)]
ahci: EM supported message type sysfs attribute
This patch adds an sysfs attribute 'em_message_supported' to the
ahci host device which prints out the supported enclosure management
message types.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Ben Hutchings [Sat, 23 Apr 2011 17:42:56 +0000 (18:42 +0100)]
kconfig: Avoid buffer underrun in choice input
Commit
40aee729b350 ('kconfig: fix default value for choice input')
fixed some cases where kconfig would select the wrong option from a
choice with a single valid option and thus enter an infinite loop.
However, this broke the test for user input of the form 'N?', because
when kconfig selects the single valid option the input is zero-length
and the test will read the byte before the input buffer. If this
happens to contain '?' (as it will in a mips build on Debian unstable
today) then kconfig again enters an infinite loop.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: stable@kernel.org [2.6.17+]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 24 Apr 2011 14:58:46 +0000 (07:58 -0700)]
vfs: get rid of insane dentry hashing rules
The dentry hashing rules have been really quite complicated for a long
while, in odd ways. That made functions like __d_drop() very fragile
and non-obvious.
In particular, whether a dentry was hashed or not was indicated with an
explicit DCACHE_UNHASHED bit. That's despite the fact that the hash
abstraction that the dentries use actually have a 'is this entry hashed
or not' model (which is a simple test of the 'pprev' pointer).
The reason that was done is because we used the normal 'is this entry
unhashed' model to mark whether the dentry had _ever_ been hashed in the
dentry hash tables, and that logic goes back many years (commit
b3423415fbc2: "dcache: avoid RCU for never-hashed dentries").
That, in turn, meant that __d_drop had totally different unhashing logic
for the dentry hash table case and for the anonymous dcache case,
because in order to use the "is this dentry hashed" logic as a flag for
whether it had ever been on the RCU hash table, we had to unhash such a
dentry differently so that we'd never think that it wasn't 'unhashed'
and wouldn't be free'd correctly.
That's just insane. It made the logic really hard to follow, when there
were two different kinds of "unhashed" states, and one of them (the one
that used "list_bl_unhashed()") really had nothing at all to do with
being unhashed per se, but with a very subtle lifetime rule instead.
So turn all of it around, and make it logical.
Instead of having a DENTRY_UNHASHED bit in d_flags to indicate whether
the dentry is on the hash chains or not, use the hash chain unhashed
logic for that. Suddenly "d_unhashed()" just uses "list_bl_unhashed()",
and everything makes sense.
And for the lifetime rule, just use an explicit DENTRY_RCUACCEES bit.
If we ever insert the dentry into the dentry hash table so that it is
visible to RCU lookup, we mark it DENTRY_RCUACCESS to show that it now
needs the RCU lifetime rules. Now suddently that test at dentry free
time makes sense too.
And because unhashing now is sane and doesn't depend on where the dentry
got unhashed from (because the dentry hash chain details doesn't have
some subtle side effects), we can re-unify the __d_drop() logic and use
common code for the unhashing.
Also fix one more open-coded hash chain bit_spin_lock() that I missed in
the previous chain locking cleanup commit.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Justin P. Mattock [Fri, 22 Apr 2011 17:08:52 +0000 (10:08 -0700)]
perf events, x86, P4: Fix typo in comment
Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Acked-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: trivial@kernel.org
Link: http://lkml.kernel.org/r/1303492132-3004-1-git-send-email-justinmattock@gmail.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Linus Torvalds [Sun, 24 Apr 2011 05:35:16 +0000 (22:35 -0700)]
Merge branch 'pm-fixes' of git://git./linux/kernel/git/rafael/suspend-2.6
* 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
PM: Add missing syscore_suspend() and syscore_resume() calls
PM: Fix error code paths executed after failing syscore_suspend()
Linus Torvalds [Sun, 24 Apr 2011 05:32:03 +0000 (22:32 -0700)]
vfs: get rid of 'struct dcache_hash_bucket' abstraction
It's a useless abstraction for 'hlist_bl_head', and it doesn't actually
help anything - quite the reverse. All the users end up having to know
about the hlist_bl_head details anyway, using 'struct hlist_bl_node *'
etc. So it just makes the code look confusing.
And the cost of it is extra '&b->head' syntactic noise, but more
importantly it spuriously makes the hash table dentry list look
different from the per-superblock DCACHE_DISCONNECTED dentry list.
As a result, the code ended up using ad-hoc locking for one case and
special helper functions for what is really another totally identical
case in the very same function.
Make it all look and work the same.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 22 Apr 2011 23:19:19 +0000 (16:19 -0700)]
Merge branch 'tty-linus' of git://git./linux/kernel/git/gregkh/tty-2.6
* 'tty-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6:
tty/n_gsm: fix bug in CRC calculation for gsm1 mode
serial/imx: read cts state only after acking cts change irq
parport_pc.c: correctly release the requested region for the IT887x
Andi Kleen [Fri, 22 Apr 2011 00:23:19 +0000 (17:23 -0700)]
SECURITY: Move exec_permission RCU checks into security modules
Right now all RCU walks fall back to reference walk when CONFIG_SECURITY
is enabled, even though just the standard capability module is active.
This is because security_inode_exec_permission unconditionally fails
RCU walks.
Move this decision to the low level security module. This requires
passing the RCU flags down the security hook. This way at least
the capability module and a few easy cases in selinux/smack work
with RCU walks with CONFIG_SECURITY=y
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 22 Apr 2011 21:59:07 +0000 (14:59 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/tiwai/sound-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
ALSA: hda - Fix unused warnings when !SND_HDA_NEEDS_RESUME
ALSA: hda - Add a fix-up for Acer dmic with ALC271x codec
ASoC: add a module alias to the FSI driver
ALSA: emu10k1 - Fix "Music" controls to "Synth" controls in documents
ARM: s3c2440: gta02; Register dfbmcs320 device for BT audio interface
ASoC: codecs: JZ4740: Fix OOPS
ASoC: Fix output PGA enabling in wm_hubs CODECs
ASoC: sn95031: decorate function with __devexit_p()
ASoC: SAMSUNG: Fix the inverted clocks handling for pcm driver
ASoC: sst_platform: Fix lock acquring
ASoC: fsi: driver safely remove for against irq
ASoC: fsi: modify vague PM control on probe
ASoC: fsi: take care in failing case of dai register
MAINTAINERS: Update Samsung ASoC maintainer's id
ASoC: WM8903: HP and Line out PGA/mixer DAPM fixes
ASoC: Set left channel volume update bits for WM8994
ASoC: fix config error path
ASoC: check channel mismatch between cpu_dai and codec_dai
ASoC: Tegra: Suspend/resume support
Linus Torvalds [Fri, 22 Apr 2011 18:31:27 +0000 (11:31 -0700)]
Merge branch 'perf-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf, x86: Update/fix Intel Nehalem cache events
perf, x86: P4 PMU - Don't forget to clear cpuc->active_mask on overflow
x86, perf event: Turn off unstructured raw event access to offcore registers
perf: Support Xeon E7's via the Westmere PMU driver
Linus Torvalds [Fri, 22 Apr 2011 18:31:21 +0000 (11:31 -0700)]
Merge branch 'irq-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
xtensa: Fixup irq conversion fallout and nmi_count
Peter Zijlstra [Fri, 22 Apr 2011 11:39:56 +0000 (13:39 +0200)]
perf, x86: Update/fix Intel Nehalem cache events
Change the Nehalem cache events to use retired memory instruction counters
(similar to Westmere), this greatly improves the provided stats.
Using:
main ()
{
int i;
for (i = 0; i <
1000000000; i++) {
asm("mov (%%rsp), %%rbx;"
"mov %%rbx, (%%rsp);" : : : "rbx");
}
}
We find:
$ perf stat --repeat 10 -e instructions:u -e l1-dcache-loads:u -e l1-dcache-stores:u ./loop_1b_loads+stores
Performance counter stats for './loop_1b_loads+stores' (10 runs):
4,000,081,056 instructions:u # 0.000 IPC ( +- 0.000% )
4,999,502,846 l1-dcache-loads:u ( +- 0.008% )
1,000,034,832 l1-dcache-stores:u ( +- 0.000% )
1.
565184942 seconds time elapsed ( +- 0.005% )
The 5b is surprising - we'd expect 1b:
$ perf stat --repeat 10 -e instructions:u -e r10b:u -e l1-dcache-stores:u ./loop_1b_loads+stores
Performance counter stats for './loop_1b_loads+stores' (10 runs):
4,000,081,054 instructions:u # 0.000 IPC ( +- 0.000% )
1,000,021,961 r10b:u ( +- 0.000% )
1,000,030,951 l1-dcache-stores:u ( +- 0.000% )
1.
565055422 seconds time elapsed ( +- 0.003% )
Which this patch thus fixes.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Lin Ming <ming.m.lin@intel.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Link: http://lkml.kernel.org/n/tip-q9rtru7b7840tws75xzboapv@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cyrill Gorcunov [Thu, 21 Apr 2011 15:03:21 +0000 (11:03 -0400)]
perf, x86: P4 PMU - Don't forget to clear cpuc->active_mask on overflow
It's not enough to simply disable event on overflow the
cpuc->active_mask should be cleared as well otherwise counter
may stall in "active" even in real being already disabled (which
potentially may lead to the situation that user may not use this
counter further).
Don pointed out that:
" I also noticed this patch fixed some unknown NMIs
on a P4 when I stressed the box".
Tested-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Link: http://lkml.kernel.org/r/1303398203-2918-3-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cyrill Gorcunov [Thu, 21 Apr 2011 15:03:20 +0000 (11:03 -0400)]
perf, x86: P4 PMU -- Use perf_sample_data_init helper
Instead of opencoded assignments better to use
perf_sample_data_init helper.
Tested-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Link: http://lkml.kernel.org/r/1303398203-2918-2-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Fri, 22 Apr 2011 08:19:26 +0000 (10:19 +0200)]
Merge branch 'linus' into perf/core
Merge reason: Pick up upstream fixes.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Fri, 22 Apr 2011 06:44:38 +0000 (08:44 +0200)]
x86, perf event: Turn off unstructured raw event access to offcore registers
Andi Kleen pointed out that the Intel offcore support patches were merged
without user-space tool support to the functionality:
|
| The offcore_msr perf kernel code was merged into 2.6.39-rc*, but the
| user space bits were not. This made it impossible to set the extra mask
| and actually do the OFFCORE profiling
|
Andi submitted a preliminary patch for user-space support, as an
extension to perf's raw event syntax:
|
| Some raw events -- like the Intel OFFCORE events -- support additional
| parameters. These can be appended after a ':'.
|
| For example on a multi socket Intel Nehalem:
|
| perf stat -e r1b7:20ff -a sleep 1
|
| Profile the OFFCORE_RESPONSE.ANY_REQUEST with event mask REMOTE_DRAM_0
| that measures any access to DRAM on another socket.
|
But this kind of usability is absolutely unacceptable - users should not
be expected to type in magic, CPU and model specific incantations to get
access to useful hardware functionality.
The proper solution is to expose useful offcore functionality via
generalized events - that way users do not have to care which specific
CPU model they are using, they can use the conceptual event and not some
model specific quirky hexa number.
We already have such generalization in place for CPU cache events,
and it's all very extensible.
"Offcore" events measure general DRAM access patters along various
parameters. They are particularly useful in NUMA systems.
We want to support them via generalized DRAM events: either as the
fourth level of cache (after the last-level cache), or as a separate
generalization category.
That way user-space support would be very obvious, memory access
profiling could be done via self-explanatory commands like:
perf record -e dram ./myapp
perf record -e dram-remote ./myapp
... to measure DRAM accesses or more expensive cross-node NUMA DRAM
accesses.
These generalized events would work on all CPUs and architectures that
have comparable PMU features.
( Note, these are just examples: actual implementation could have more
sophistication and more parameter - as long as they center around
similarly simple usecases. )
Now we do not want to revert *all* of the current offcore bits, as they
are still somewhat useful for generic last-level-cache events, implemented
in this commit:
e994d7d23a0b: perf: Fix LLC-* events on Intel Nehalem/Westmere
But we definitely do not yet want to expose the unstructured raw events
to user-space, until better generalization and usability is implemented
for these hardware event features.
( Note: after generalization has been implemented raw offcore events can be
supported as well: there can always be an odd event that is marginally
useful but not useful enough to generalize. DRAM profiling is definitely
*not* such a category so generalization must be done first. )
Furthermore, PERF_TYPE_RAW access to these registers was not intended
to go upstream without proper support - it was a side-effect of the above
e994d7d23a0b commit, not mentioned in the changelog.
As v2.6.39 is nearing release we go for the simplest approach: disable
the PERF_TYPE_RAW offcore hack for now, before it escapes into a released
kernel and becomes an ABI.
Once proper structure is implemented for these hardware events and users
are offered usable solutions we can revisit this issue.
Reported-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1302658203-4239-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Andi Kleen [Thu, 21 Apr 2011 23:48:35 +0000 (16:48 -0700)]
perf: Support Xeon E7's via the Westmere PMU driver
There's a new model number public, 47, for Xeon E7 (aka Westmere EX).
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: a.p.zijlstra@chello.nl
Link: http://lkml.kernel.org/r/1303429715-10202-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Linus Torvalds [Thu, 21 Apr 2011 17:50:56 +0000 (10:50 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
ide: unexport DISK_EVENT_MEDIA_CHANGE for ide-gd and ide-cd
block: don't propagate unlisted DISK_EVENTs to userland
elevator: check for ELEVATOR_INSERT_SORT_MERGE in !elvpriv case too
Tejun Heo [Thu, 21 Apr 2011 17:43:59 +0000 (19:43 +0200)]
ide: unexport DISK_EVENT_MEDIA_CHANGE for ide-gd and ide-cd
check_events() implementations in both ide-gd and ide-cd are
inadequate for in-kernel event polling. Both generate media change
events continuously when certain conditions are met causing infinite
event loop between the driver and userland event handler.
As disk event now supports suppression of unlisted events, simply
de-listing DISK_EVENT_MEDIA_CHANGE from disk->events resolves the
problem. Internal handling around media revalidation will behave the
same while userland will fall back to userland event polling after
detecting the device doesn't support disk events.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jens Axboe <jaxboe@fusionio.com>
Acked-by: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Tejun Heo [Thu, 21 Apr 2011 17:43:58 +0000 (19:43 +0200)]
block: don't propagate unlisted DISK_EVENTs to userland
DISK_EVENT_MEDIA_CHANGE is used for both userland visible event and
internal event for revalidation of removeable devices. Some legacy
drivers don't implement proper event detection and continuously
generate events under certain circumstances. For example, ide-cd
generates media changed continuously if there's no media in the drive,
which can lead to infinite loop of events jumping back and forth
between the driver and userland event handler.
This patch updates disk event infrastructure such that it never
propagates events not listed in disk->events to userland. Those
events are processed the same for internal purposes but uevent
generation is suppressed.
This also ensures that userland only gets events which are advertised
in the @events sysfs node lowering risk of confusion.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Jens Axboe [Thu, 21 Apr 2011 17:28:35 +0000 (19:28 +0200)]
elevator: check for ELEVATOR_INSERT_SORT_MERGE in !elvpriv case too
The sort insert is the one that goes to the IO scheduler. With
the SORT_MERGE addition, we could bypass IO scheduler setup
but still ask the IO scheduler to insert the request. This would
cause an oops on switching IO schedulers through the sysfs
interface, unless the disk just happened to be idle while it
occured.
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Linus Torvalds [Thu, 21 Apr 2011 17:01:26 +0000 (10:01 -0700)]
Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: fix duplicate message output
Linus Torvalds [Thu, 21 Apr 2011 17:01:03 +0000 (10:01 -0700)]
Merge branch 'x86-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, numa: Fix cpu nodemasks for NUMA emulation and CONFIG_DEBUG_PER_CPU_MAPS
Revert "x86, NUMA: Fix fakenuma boot failure"
Randy Dunlap [Thu, 21 Apr 2011 16:07:26 +0000 (09:07 -0700)]
raid5: fix build error, sector_t usage
Change <sectors> from unsigned long long to sector_t.
This matches its source field.
ERROR: "__udivdi3" [drivers/md/raid456.ko] undefined!
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 21 Apr 2011 16:58:42 +0000 (09:58 -0700)]
Merge git://git./linux/kernel/git/rusty/linux-2.6-for-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
virtio: console: Enable call to hvc_remove() on console port remove
virtio_pci: Prevent double-free of pci regions after device hot-unplug
virtio: Decrement avail idx on buffer detach
Linus Torvalds [Thu, 21 Apr 2011 16:57:56 +0000 (09:57 -0700)]
Merge branch 'drm-fixes' of git://git./linux/kernel/git/airlied/drm-2.6
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
agp: fix arbitrary kernel memory writes
agp: fix OOM and buffer overflow
drm/radeon/kms: fix IH writeback on r6xx+ on big endian machines
Linus Torvalds [Thu, 21 Apr 2011 16:57:13 +0000 (09:57 -0700)]
Merge branch 'drm-intel-fixes' of git://git./linux/kernel/git/keithp/linux-2.6
* 'drm-intel-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/keithp/linux-2.6:
drm/i915: Initialise g4x watermarks for disabled pipes
drm/i915: Sanitize the output registers after resume
drm/i915/tv: Fix modeset flickering introduced in
7f58aabc3
drm/i915/tv: Only poll for TV connections
drm/i915/tv: Remember the detected TV type
Linus Torvalds [Thu, 21 Apr 2011 16:56:35 +0000 (09:56 -0700)]
Merge git://git.infradead.org/iommu-2.6
* git://git.infradead.org/iommu-2.6:
intel_iommu: disable all VT-d PMRs when TXT launched
intel-iommu: Fix get_domain_for_dev() error path
intel-iommu: Unlink domain from iommu
intel-iommu: Fix use after release during device attach
Jan Kara [Wed, 20 Apr 2011 18:30:40 +0000 (20:30 +0200)]
vfs: Pass setxattr(2) flags properly
For some reason generic_setxattr() did not pass flags (XATTR_CREATE,
XATTR_REPLACE) to the filesystem specific helper. This caused that
setxattr(2) syscall just ignored these flags.
Fix the bug by passing flags correctly.
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Amit Shah [Mon, 14 Mar 2011 12:15:48 +0000 (17:45 +0530)]
virtio: console: Enable call to hvc_remove() on console port remove
This call was disabled as hot-unplugging one virtconsole port led to
another virtconsole port freezing.
Upon testing it again, this now works, so enable it.
In addition, a bug was found in qemu wherein removing a port of one type
caused the guest output from another port to stop working. I doubt it
was just this bug that caused it (since disabling the hvc_remove() call
did allow other ports to continue working), but since it's all solved
now, we're fine with hot-unplugging of virtconsole ports.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Amit Shah [Mon, 14 Mar 2011 12:15:02 +0000 (17:45 +0530)]
virtio_pci: Prevent double-free of pci regions after device hot-unplug
In the case where a virtio-console port is in use (opened by a program)
and a virtio-console device is removed, the port is kept around but all
the virtio-related state is assumed to be gone.
When the port is finally released (close() called), we call
device_destroy() on the port's device. This results in the parent
device's structures to be freed as well. This includes the PCI regions
for the virtio-console PCI device.
Once this is done, however, virtio_pci_release_dev() kicks in, as the
last ref to the virtio device is now gone, and attempts to do
pci_iounmap(pci_dev, vp_dev->ioaddr);
pci_release_regions(pci_dev);
pci_disable_device(pci_dev);
which results in a double-free warning.
Move the code that releases regions, etc., to the virtio_pci_remove()
function, and all that's now left in release_dev is the final freeing of
the vp_dev.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Amit Shah [Wed, 16 Mar 2011 13:42:10 +0000 (19:12 +0530)]
virtio: Decrement avail idx on buffer detach
When detaching a buffer from a vq, the avail.idx value should be
decremented as well.
This was noticed by hot-unplugging a virtio console port and then
plugging in a new one on the same number (re-using the vqs which were
just 'disowned'). qemu reported
'Guest moved used index from 0 to 256'
when any IO was attempted on the new port.
CC: stable@kernel.org
Reported-by: juzhang <juzhang@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Joseph Cihula [Mon, 21 Mar 2011 18:04:24 +0000 (11:04 -0700)]
intel_iommu: disable all VT-d PMRs when TXT launched
Intel VT-d Protected Memory Regions (PMRs) are supposed to be disabled,
on each VT-d engine, after DMA remapping is enabled on the engines.
This is because the behavior of having both enabled is not deterministic
and because, if TXT has been used to launch the kernel, the PMRs may be
programmed to cover memory regions that will be used for DMA.
Under some circumstances (certain quirks detected, lack of multiple
devices, etc.), the current code does not set up DMA remapping on some
VT-d engines. In such cases it also skips disabling the PMRs. This
causes failures when the kernel is launched with TXT (most often this
occurs on the graphics engine and results in colored vertical bars on
the display).
This patch detects when the kernel has been launched with TXT and then
disables the PMRs on all VT-d engines. In some cases where the reason
that remapping is not being enabled is due to possible ACPI DMAR table
errors, the VT-d engine addresses may not be correct and thus not able
to be safely programmed even to disable PMRs. Because part of the TXT
launch process is the verification of these addresses, it will always be
safe to disable PMRs if the TXT launch has succeeded and hence only
doing this in such cases.
Signed-off-by: Joseph Cihula <joseph.cihula@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Artem Bityutskiy [Thu, 21 Apr 2011 11:49:55 +0000 (14:49 +0300)]
UBIFS: fix master node recovery
This patch fixes the following symptoms:
1. Unmount UBIFS cleanly.
2. Start mounting UBIFS R/W and have a power cut immediately
3. Start mounting UBIFS R/O, this succeeds
4. Try to re-mount UBIFS R/W - this fails immediately or later on,
because UBIFS will write the master node to the flash area
which has been written before.
The analysis of the problem:
1. UBIFS is unmounted cleanly, both copies of the master node are clean.
2. UBIFS is being mounter R/W, starts changing master node copy 1, and
a power cut happens. The copy N1 becomes corrupted.
3. UBIFS is being mounted R/O. It notices the copy N1 is corrupted and
reads copy N2. Copy N2 is clean.
4. Because of R/O mode, UBIFS cannot recover copy 1.
5. The mount code (ubifs_mount()) sees that the master node is clean,
so it decides that no recovery is needed.
6. We are re-mounting R/W. UBIFS believes no recovery is needed and
starts updating the master node, but copy N1 is still corrupted
and was not recovered!
Fix this problem by marking the master node as dirty every time we
recover it and we are in R/O mode. This forces further recovery and
the UBIFS cleans-up the corruptions and recovers the copy N1 when
re-mounting R/W later.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Cc: stable@kernel.org
Artem Bityutskiy [Thu, 21 Apr 2011 07:39:54 +0000 (10:39 +0300)]
UBIFS: fix false assertion warning in case of I/O failures
When UBIFS switches to R/O mode because it detects I/O failures, then
when we unmount, we still may have allocated budget, and the assertions
which verify that we have not budget will fire. But it is expected to
have the budget in case of I/O failures, so the assertion warnings will
be false. Suppress them for the I/O failure case.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Takashi Iwai [Thu, 21 Apr 2011 10:44:38 +0000 (12:44 +0200)]
Merge branch 'fix/hda' into for-linus
David Rientjes [Thu, 21 Apr 2011 02:19:13 +0000 (19:19 -0700)]
x86, numa: Fix cpu nodemasks for NUMA emulation and CONFIG_DEBUG_PER_CPU_MAPS
The cpu<->node mappings under CONFIG_DEBUG_PER_CPU_MAPS=y
when NUMA emulation is enabled is currently broken because it does
not iterate through every emulated node and bind cpus that have
affinity to it.
NUMA emulation should bind each cpu to every local node to
accurately represent the true NUMA topology of the underlying
machine.
debug_cpumask_set_cpu() needs to be fixed at the same time so
that the debugging information that it emits shows the new
cpumask of the node being assigned when the cpu is being added
or removed.
It can now take responsibility of setting or clearing the cpu
itself to remove the need for duplicate code.
Also change its last parameter, "enable", to have the correct bool
type since it can only be true or false.
-v2: Fix the return statements, by Kosaki Motohiro
Acked-and-Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Andreas Herrmann <herrmann.der.user@googlemail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1104201918470.12634@chino.kir.corp.google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
David Rientjes [Thu, 21 Apr 2011 02:19:10 +0000 (19:19 -0700)]
Revert "x86, NUMA: Fix fakenuma boot failure"
Andreas Herrmann reported that
7d6b46707f24 ("x86, NUMA: Fix fakenuma
boot failure") causes certain physical NUMA topologies (for example
AMD Magny-Cours) to move sibling cpus to a single node when in reality
they are in separate domains.
This may result in some nodes being completely void of cpus, which
doesn't accurately represent the correct topology. The system will
boot, but will have suboptimal NUMA performance.
This commit was intended as a fix for NUMA emulation, but should
not cause a regression for real NUMA machines as a side effect.
( There will be a separate fix for the numa-debug code, which
will not affect physical topologies. )
Reported-by: Andreas Herrmann <herrmann.der.user@googlemail.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1104201918110.12634@chino.kir.corp.google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Vasiliy Kulikov [Thu, 14 Apr 2011 16:55:16 +0000 (20:55 +0400)]
agp: fix arbitrary kernel memory writes
pg_start is copied from userspace on AGPIOC_BIND and AGPIOC_UNBIND ioctl
cmds of agp_ioctl() and passed to agpioc_bind_wrap(). As said in the
comment, (pg_start + mem->page_count) may wrap in case of AGPIOC_BIND,
and it is not checked at all in case of AGPIOC_UNBIND. As a result, user
with sufficient privileges (usually "video" group) may generate either
local DoS or privilege escalation.
Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Vasiliy Kulikov [Thu, 14 Apr 2011 16:55:19 +0000 (20:55 +0400)]
agp: fix OOM and buffer overflow
page_count is copied from userspace. agp_allocate_memory() tries to
check whether this number is too big, but doesn't take into account the
wrap case. Also agp_create_user_memory() doesn't check whether
alloc_size is calculated from num_agp_pages variable without overflow.
This may lead to allocation of too small buffer with following buffer
overflow.
Another problem in agp code is not addressed in the patch - kernel memory
exhaustion (AGPIOC_RESERVE and AGPIOC_ALLOCATE ioctls). It is not checked
whether requested pid is a pid of the caller (no check in agpioc_reserve_wrap()).
Each allocation is limited to 16KB, though, there is no per-process limit.
This might lead to OOM situation, which is not even solved in case of the
caller death by OOM killer - the memory is allocated for another (faked) process.
Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Linus Torvalds [Thu, 21 Apr 2011 01:18:19 +0000 (18:18 -0700)]
Merge branch 'hwmon-for-linus' of git://git./linux/kernel/git/groeck/staging
* 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/staging:
hwmon: (max34440) Add driver documentation
hwmon: (max16064) Add driver documentation
hwmon: (max8688) Add driver documentation
hwmon: (pmbus) Documentation updates
hwmon: (smm665) Fix spelling error in driver documentation
hwmon: (pmbus) Removed unused variable from struct pmbus_data
hwmon: Add submitting-patches checklist to documentation
Linus Torvalds [Thu, 21 Apr 2011 00:41:06 +0000 (17:41 -0700)]
Merge branch 'for-2.6.39' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.39' of git://linux-nfs.org/~bfields/linux:
Open with O_CREAT flag set fails to open existing files on non writable directories
nfsd4: Fix filp leak
nfsd4: fix struct file leak on delegation
Linus Torvalds [Thu, 21 Apr 2011 00:40:45 +0000 (17:40 -0700)]
Merge branch 'fixes' of /home/rmk/linux-2.6-arm
* 'fixes' of master.kernel.org:/home/rmk/linux-2.6-arm:
ARM: 6881/1: cputype.h uses __attribute_const__ which requires including kernel.h
ARM: Add new syscalls
Linus Torvalds [Thu, 21 Apr 2011 00:40:25 +0000 (17:40 -0700)]
Merge branch 'stable/bug-fixes-rc4' of git://git./linux/kernel/git/konrad/xen
* 'stable/bug-fixes-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen: mask_rw_pte: do not apply the early_ioremap checks on x86_32
xen: do not create the extra e820 region at an addr lower than 4G
Linus Torvalds [Thu, 21 Apr 2011 00:40:02 +0000 (17:40 -0700)]
Merge branch 'for-linus' of git://neil.brown.name/md
* 'for-linus' of git://neil.brown.name/md:
md: Update documentation for sync_min and sync_max entries
md: Cleanup after raid45->raid0 takeover
md: Fix dev_sectors on takeover from raid0 to raid4/5
md/raid5: remove setting of ->queue_lock
Mike Waychison [Wed, 20 Apr 2011 19:04:36 +0000 (12:04 -0700)]
ALSA: hda - Fix unused warnings when !SND_HDA_NEEDS_RESUME
When SND_HDA_NEEDS_RESUME is not defined, the compiler identifies that
the following symbols are static but not used:
restore_shutup_pins
hda_cleanup_all_streams
Fix warnings by adding SND_HDA_NEEDS_RESUME guards.
Signed-off-by: Mike Waychison <mikew@google.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Linus Torvalds [Wed, 20 Apr 2011 16:48:52 +0000 (09:48 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
block: Remove the extra check in queue_requests_store
block, blk-sysfs: Fix an err return path in blk_register_queue()
block: remove stale kerneldoc member from __blk_run_queue()
block: get rid of QUEUE_FLAG_REENTER
cfq-iosched: read_lock() does not always imply rcu_read_lock()
block: kill blk_flush_plug_list() export
Linus Walleij [Sun, 17 Apr 2011 18:32:19 +0000 (20:32 +0200)]
rtc: fix coh901331 startup crash
The rtc_device_register() call has changed semantics so that it
will immediately call out to rtc_read_alarm() and since the
callbacks require the drvdata to be set, we need to set it before
the registration call to avoid NULL dereference.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Linus Walleij [Wed, 30 Mar 2011 12:31:42 +0000 (14:31 +0200)]
mach-ux500: fix i2c0 device setup regression
Adding two sets of I2C devices to the same bus doesn't quite work,
atleast not anymore. Stash one array and determine how much of it
shall be added instead.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Dave Chinner [Tue, 12 Apr 2011 13:34:20 +0000 (13:34 +0000)]
xfs: fix duplicate message output
Commit
957935dc ("xfs: fix xfs_debug warnings" broke the logic in
__xfs_printk(). Instead of only printing one of two possible output
strings based on whether the fs has a name or not, it outputs both.
Fix it to only output one message again.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Artem Bityutskiy [Wed, 20 Apr 2011 15:02:45 +0000 (18:02 +0300)]
UBIFS: fix false space checking failure
This patch fixes UBIFS mount failure when the debugging support is enabled,
we are recovering from a power cut, we were first mounter R/O and we are
re-mounting R/W. In this case we should not assume that the amount of free
space before we have re-mounted R/W and after are equivalent, because
when we have mounted R/O the file-system is in a non-committed state so
the amount of free space is slightly smaller, due to the fact that we cannot
predict the amount of free space precisely before we commit.
This patch fixes the issue by skipping the debugging check in case of
recovery. This issue was reported by Caizhiyong <caizhiyong@huawei.com>
here: http://thread.gmane.org/gmane.linux.drivers.mtd/34350/focus=34387
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Reported-by: Caizhiyong <caizhiyong@huawei.com>
Cc: stable@kernel.org [2.6.30+]
Sachin Prabhu [Wed, 20 Apr 2011 12:09:35 +0000 (13:09 +0100)]
Open with O_CREAT flag set fails to open existing files on non writable directories
An open on a NFS4 share using the O_CREAT flag on an existing file for
which we have permissions to open but contained in a directory with no
write permissions will fail with EACCES.
A tcpdump shows that the client had set the open mode to UNCHECKED which
indicates that the file should be created if it doesn't exist and
encountering an existing flag is not an error. Since in this case the
file exists and can be opened by the user, the NFS server is wrong in
attempting to check create permissions on the parent directory.
The patch adds a conditional statement to check for create permissions
only if the file doesn't exist.
Signed-off-by: Sachin S. Prabhu <sprabhu@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Stefano Stabellini [Tue, 19 Apr 2011 13:47:31 +0000 (14:47 +0100)]
xen: mask_rw_pte: do not apply the early_ioremap checks on x86_32
The two "is_early_ioremap_ptep" checks in mask_rw_pte are only used on
x86_64, in fact early_ioremap is not used at all to setup the initial
pagetable on x86_32.
Moreover on x86_32 the two checks are wrong because the range
pgt_buf_start..pgt_buf_end initially should be mapped RW because
the pages in the range are not pagetable pages yet and haven't been
cleared yet. Afterwards considering the pgt_buf_start..pgt_buf_end is
part of the initial mapping, xen_alloc_pte is capable of turning
the ptes RO when they become pagetable pages.
Fix the issue and improve the readability of the code providing two
different implementation of mask_rw_pte for x86_32 and x86_64.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Stefano Stabellini [Tue, 12 Apr 2011 11:19:52 +0000 (12:19 +0100)]
xen: do not create the extra e820 region at an addr lower than 4G
Do not add the extra e820 region at a physical address lower than 4G
because it breaks e820_end_of_low_ram_pfn().
It is OK for us to move the xen_extra_mem_start up and down because this
is the index of the memory that can be ballooned in/out - it is memory
not available to the kernel during bootup.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
David Ahern [Thu, 7 Apr 2011 03:54:20 +0000 (21:54 -0600)]
perf script: improve validation of sample attributes for output fields
Check for required sample attributes using evsel rather than sample_type
in the session header. If the attribute for a default field is not
present for the event type (e.g., new command operating on file from
older kernel) the field is removed from the output list.
Expected event types must exist. For example, if a user specifies
-f trace:time,trace -f sw:time,cpu,sym
the perf.data file must contain both tracepoints and software events
(ie., it is an error if either does not exist in the file).
Attribute checking is done once at the beginning of perf-script rather
than for each sample.
v1 -> v2:
- addressed comments from acme
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1302148460-570-1-git-send-email-daahern@cisco.com
Signed-off-by: David Ahern <daahern@cisco.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
CoolCold [Wed, 20 Apr 2011 05:40:01 +0000 (15:40 +1000)]
md: Update documentation for sync_min and sync_max entries
linux/Documentation/md.txt is missing description for sync_min and
sync_max entries.
This patch adds description for sync_min and sync_max entries.
Signed-off-by: Roman Ovchinnikov <coolthecold@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Krzysztof Wojcik [Wed, 20 Apr 2011 05:39:53 +0000 (15:39 +1000)]
md: Cleanup after raid45->raid0 takeover
Problem:
After raid4->raid0 takeover operation, another takeover operation
(e.g raid0->raid10) results "kernel oops".
Root cause:
Variables 'degraded' in mddev structure is not cleared
on raid45->raid0 takeover.
This patch reset this variable.
Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Wed, 20 Apr 2011 05:38:18 +0000 (15:38 +1000)]
md: Fix dev_sectors on takeover from raid0 to raid4/5
A raid0 array doesn't set 'dev_sectors' as each device might
contribute a different number of sectors.
So when converting to a RAID4 or RAID5 we need to set dev_sectors
as they need the number.
We have already verified that in fact all devices do contribute
the same number of sectors, so use that number.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Wed, 20 Apr 2011 05:38:07 +0000 (15:38 +1000)]
md/raid5: remove setting of ->queue_lock
We previously needed to set ->queue_lock to match the raid5
device_lock so we could safely use queue_flag_* operations (e.g. for
plugging). which test the ->queue_lock is in fact locked.
However that need has completely gone away and is unlikely to come
back to remove this now-pointless setting.
Signed-off-by: NeilBrown <neilb@suse.de>
Linus Torvalds [Wed, 20 Apr 2011 01:32:57 +0000 (18:32 -0700)]
Merge branch 'drm-fixes' of git://git./linux/kernel/git/airlied/drm-2.6
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm/radeon/kms: pll tweaks for r7xx
drm/nouveau: fix allocation of notifier object
drm/nouveau: fix notifier memory corruption bug
drm/nouveau: fix pinning of notifier block
drm/nouveau: populate ttm_alloced with false, when it's not
drm/nouveau: fix nv30 pcie boards
drm/nouveau: split ramin_lock into two locks, one hardirq safe
drm/radeon/kms: adjust evergreen display watermark setup
drm/radeon/kms: add connectors even if i2c fails
drm/radeon/kms: fix bad shift in atom iio table parser
Cédric Cano [Tue, 19 Apr 2011 15:07:13 +0000 (11:07 -0400)]
drm/radeon/kms: fix IH writeback on r6xx+ on big endian machines
agd5f: fix commit message.
Signed-off-by: Cedric Cano <ccano@interfaceconcept.com>
Reviewed-by: Michel Dänzer <michel@daenzer.net>
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Alex Deucher [Tue, 19 Apr 2011 19:24:59 +0000 (15:24 -0400)]
drm/radeon/kms: pll tweaks for r7xx
Prefer min m to max p only on pre-r7xx asics.
Fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=36197
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
Mikhail Kshevetskiy [Sun, 27 Mar 2011 00:05:00 +0000 (04:05 +0400)]
tty/n_gsm: fix bug in CRC calculation for gsm1 mode
Problem description:
gsm_queue() calculate a CRC for arrived frames. As a last step of
CRC calculation it call
gsm->fcs = gsm_fcs_add(gsm->fcs, gsm->received_fcs);
This work perfectly for the case of GSM0 mode as gsm->received_fcs
contain the last piece of data required to generate final CRC.
gsm->received_fcs is not used for GSM1 mode. Thus we put an
additional byte to CRC calculation. As result we get a wrong CRC
and reject incoming frame.
Signed-off-by: Mikhail Kshevetskiy <mikhail.kshevetskiy@gmail.com>
Acked-by: Alan Cox <alan@linux.intel.com>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Uwe Kleine-König [Mon, 11 Apr 2011 08:59:09 +0000 (10:59 +0200)]
serial/imx: read cts state only after acking cts change irq
If cts changes between reading the level at the cts input (USR1_RTSS)
and acking the irq (USR1_RTSD) the last edge doesn't generate an irq and
uart_handle_cts_change is called with a outdated value for cts.
The race was introduced by commit
ceca629 ([ARM] 2971/1: i.MX uart handle rts irq)
Reported-by: Arwed Springer <Arwed.Springer@de.trumpf.com>
Tested-by: Arwed Springer <Arwed.Springer@de.trumpf.com>
Cc: stable@kernel.org # 2.6.14+
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Niels de Vos [Mon, 18 Apr 2011 14:26:03 +0000 (15:26 +0100)]
parport_pc.c: correctly release the requested region for the IT887x
Replace release_resource() by release_region() and also fix the
inconsistency in the size of the requested/released region.
The size of the resource should be 32, not 0x8 like it was corrected in
commit
e7c310c36e5fdf1b83a459e5db167bfbd86137db already.
CC: linux-serial@vger.kernel.org
Reported-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Dave Airlie [Tue, 19 Apr 2011 23:21:34 +0000 (09:21 +1000)]
Merge remote branch 'nouveau/drm-nouveau-fixes' of /ssd/git/drm-nouveau-next into drm-fixes
* 'nouveau/drm-nouveau-fixes' of /ssd/git/drm-nouveau-next:
drm/nouveau: fix allocation of notifier object
drm/nouveau: fix notifier memory corruption bug
drm/nouveau: fix pinning of notifier block
drm/nouveau: populate ttm_alloced with false, when it's not
drm/nouveau: fix nv30 pcie boards
drm/nouveau: split ramin_lock into two locks, one hardirq safe
Marcin Slusarz [Tue, 19 Apr 2011 21:52:42 +0000 (23:52 +0200)]
drm/nouveau: fix allocation of notifier object
Commit
73412c3854c877e5f37ad944ee8977addde4d35a ("drm/nouveau: allocate
kernel's notifier object at end of block") intended to align end of
notifier block to page boundary, but start of block was miscalculated
to be off by -16 bytes. Fix it.
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Marcin Slusarz [Tue, 19 Apr 2011 21:50:48 +0000 (23:50 +0200)]
drm/nouveau: fix notifier memory corruption bug
nouveau_bo_wr32 expects offset to be in words, but we pass value in bytes,
so after commit
73412c3854c877e5f37ad944ee8977addde4d35a ("drm/nouveau: allocate
kernel's notifier object at end of block") we started to overwrite some memory
after notifier buffer object (previously m2mf_ntfy was always 0, so it didn't
matter it was a value in bytes).
Reported-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reported-by: Nigel Cunningham <lkml@nigelcunningham.com.au>
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Pekka Paalanen <pq@iki.fi>
Cc: stable@kernel.org [2.6.38]
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sun, 17 Apr 2011 23:12:25 +0000 (09:12 +1000)]
drm/nouveau: fix pinning of notifier block
Problem introduced with commit
6ba9a68317781537d6184d3fdb2d0f20c97da3a4
Reported-by: Bob Gleitsmann <rjgleits@bellsouth.net>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Mon, 11 Apr 2011 06:37:44 +0000 (16:37 +1000)]
drm/nouveau: populate ttm_alloced with false, when it's not
Caught with kmemcheck on unrelated business.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Fri, 8 Apr 2011 00:07:34 +0000 (10:07 +1000)]
drm/nouveau: fix nv30 pcie boards
Wasn't aware they even existed, apparently they do! They're actually
AGP chips with a bridge as far as I can tell, which puts them in the
same boat as nv40/nv45.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Wed, 6 Apr 2011 03:28:35 +0000 (13:28 +1000)]
drm/nouveau: split ramin_lock into two locks, one hardirq safe
Fixes a possible lock ordering reversal between context_switch_lock
and ramin_lock.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Rafael J. Wysocki [Tue, 19 Apr 2011 22:36:11 +0000 (00:36 +0200)]
PM: Add missing syscore_suspend() and syscore_resume() calls
Device suspend/resume infrastructure is used not only by the suspend
and hibernate code in kernel/power, but also by APM, Xen and the
kexec jump feature. However, commit
40dc166cb5dddbd36aa4ad11c03915ea
(PM / Core: Introduce struct syscore_ops for core subsystems PM)
failed to add syscore_suspend() and syscore_resume() calls to that
code, which generally leads to breakage when the features in question
are used.
To fix this problem, add the missing syscore_suspend() and
syscore_resume() calls to arch/x86/kernel/apm_32.c, kernel/kexec.c
and drivers/xen/manage.c.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Thomas Gleixner [Tue, 19 Apr 2011 20:52:58 +0000 (22:52 +0200)]
xtensa: Fixup irq conversion fallout and nmi_count
Some unnamed moron fatfingered the arguments of the irq chip callbacks
to irq_chip instead of irq_data.
While at it remove the nmi_count() print in arch_show_interrupts()
which has been broken before the irq conversion already.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Linus Torvalds [Tue, 19 Apr 2011 22:16:41 +0000 (15:16 -0700)]
Merge git://git./linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (51 commits)
netfilter: ipset: Fix the order of listing of sets
ip6_pol_route panic: Do not allow VLAN on loopback
bnx2x: Fix port identification problem
r8169: add Realtek as maintainer.
ip: ip_options_compile() resilient to NULL skb route
bna: fix memory leak during RX path cleanup
bna: fix for clean fw re-initialization
usbnet: Fix up 'FLAG_POINTTOPOINT' and 'FLAG_MULTI_PACKET' overlaps.
iwlegacy: fix tx_power initialization
Revert "tcp: disallow bind() to reuse addr/port"
qlcnic: limit skb frags for non tso packet
net: can: mscan: fix build breakage in mpc5xxx_can
netfilter: ipset: set match and SET target fixes
netfilter: ipset: bitmap:ip,mac type requires "src" for MAC
sctp: fix oops while removed transport still using as retran path
sctp: fix oops when updating retransmit path with DEBUG on
net: Disable NETIF_F_TSO_ECN when TSO is disabled
net: Disable all TSO features when SG is disabled
sfc: Use rmb() to ensure reads occur in order
ieee802154: Remove hacked CFLAGS in net/ieee802154/Makefile
...