Lee Schermerhorn [Tue, 15 Dec 2009 01:58:23 +0000 (17:58 -0800)]
hugetlb: add generic definition of NUMA_NO_NODE
Move definition of NUMA_NO_NODE from ia64 and x86_64 arch specific headers
to generic header 'linux/numa.h' for use in generic code. NUMA_NO_NODE
replaces bare '-1' where it's used in this series to indicate "no node id
specified". Ultimately, it can be used to replace the -1 elsewhere where
it is used similarly.
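As a rough userspace illustration of the idea (not the kernel header itself), a named constant can stand in for the bare -1. The value -1 comes from the text above; pick_node() and its parameters are hypothetical names used only for this sketch.

```c
/* Named constant for "no node id specified", instead of a bare -1. */
#define NUMA_NO_NODE (-1)

/*
 * Hypothetical helper: return the node to use.
 * If the caller did not specify a node, fall back to local_node.
 */
static int pick_node(int requested_node, int local_node)
{
	if (requested_node == NUMA_NO_NODE)
		return local_node;
	return requested_node;
}
```

A call such as pick_node(NUMA_NO_NODE, 3) then reads as "no node requested" rather than leaving the reader to guess what -1 means.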
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Andi Kleen <andi@firstfloor.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: Adam Litke <agl@us.ibm.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Eric Whitney <eric.whitney@hp.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lee Schermerhorn [Tue, 15 Dec 2009 01:58:21 +0000 (17:58 -0800)]
hugetlb: derive huge pages nodes allowed from task mempolicy
This patch derives a "nodes_allowed" node mask from the numa mempolicy of
the task modifying the number of persistent huge pages to control the
allocation, freeing and adjusting of surplus huge pages when the pool page
count is modified via the new sysctl or sysfs attribute
"nr_hugepages_mempolicy". The nodes_allowed mask is derived as follows:
* For "default" [NULL] task mempolicy, a NULL nodemask_t pointer
is produced. This will cause the hugetlb subsystem to use
node_online_map as the "nodes_allowed". This preserves the
behavior before this patch.
* For "preferred" mempolicy, including explicit local allocation,
a nodemask with the single preferred node will be produced.
"local" policy will NOT track any internode migrations of the
task adjusting nr_hugepages.
* For "bind" and "interleave" policy, the mempolicy's nodemask
will be used.
* Other than to inform the construction of the nodes_allowed node
mask, the actual mempolicy mode is ignored. That is, all modes
behave like interleave over the resulting nodes_allowed mask
with no "fallback".
See the updated documentation [next patch] for more information
about the implications of this patch.
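The derivation rules listed above can be sketched in plain C roughly as follows. This is a simplified model under stated assumptions, not the patch itself: the mask is a single unsigned long, and struct policy_sketch, enum policy_mode and derive_nodes_allowed() are hypothetical stand-ins for the kernel's mempolicy and nodemask types.

```c
#include <stddef.h>

/* Simplified stand-in for a node mask: one bit per node (assumption). */
typedef unsigned long nodemask_sketch_t;

enum policy_mode { POL_DEFAULT, POL_PREFERRED, POL_BIND, POL_INTERLEAVE };

struct policy_sketch {
	enum policy_mode mode;
	int preferred_node;		/* POL_PREFERRED only; -1 means "local" */
	nodemask_sketch_t nodes;	/* POL_BIND and POL_INTERLEAVE only */
};

/*
 * Fill *out from the task policy and return out, or return NULL for the
 * default policy so that the caller falls back to all online nodes.
 * local_node is the node the task runs on at the time of the call, so a
 * "local" policy does not follow later migrations of the task.
 */
static nodemask_sketch_t *derive_nodes_allowed(const struct policy_sketch *pol,
					       int local_node,
					       nodemask_sketch_t *out)
{
	if (pol == NULL || pol->mode == POL_DEFAULT)
		return NULL;

	switch (pol->mode) {
	case POL_PREFERRED: {
		int nid = pol->preferred_node < 0 ? local_node
						  : pol->preferred_node;
		*out = 1UL << nid;	/* mask with the single preferred node */
		break;
	}
	case POL_BIND:
	case POL_INTERLEAVE:
		*out = pol->nodes;	/* use the policy's own mask */
		break;
	default:
		return NULL;
	}
	return out;
}
```

Note that the mode only selects which mask is built; nothing in the result records whether the policy was bind, interleave or preferred, which matches the last bullet above.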
Examples:
Starting with:
Node 0 HugePages_Total: 0
Node 1 HugePages_Total: 0
Node 2 HugePages_Total: 0
Node 3 HugePages_Total: 0
Default behavior [with or without this patch] balances persistent
hugepage allocation across nodes [with sufficient contiguous memory]:
sysctl vm.nr_hugepages[_mempolicy]=32
yields:
Node 0 HugePages_Total: 8
Node 1 HugePages_Total: 8
Node 2 HugePages_Total: 8
Node 3 HugePages_Total: 8
Of course, we only have nr_hugepages_mempolicy with the patch,
but with default mempolicy, nr_hugepages_mempolicy behaves the
same as nr_hugepages.
Applying mempolicy--e.g., with numactl [using '-m' a.k.a.
'--membind' because it allows multiple nodes to be specified
and it's easy to type]--we can allocate huge pages on
individual nodes or sets of nodes. So, starting from the
condition above, with 8 huge pages per node, add 8 more to
node 2 using:
numactl -m 2 sysctl vm.nr_hugepages_mempolicy=40
This yields:
Node 0 HugePages_Total: 8
Node 1 HugePages_Total: 8
Node 2 HugePages_Total: 16
Node 3 HugePages_Total: 8
The incremental 8 huge pages were restricted to node 2 by the
specified mempolicy.
Similarly, we can use mempolicy to free persistent huge pages
from specified nodes:
numactl -m 0,1 sysctl vm.nr_hugepages_mempolicy=32
yields:
Node 0 HugePages_Total: 4
Node 1 HugePages_Total: 4
Node 2 HugePages_Total: 16
Node 3 HugePages_Total: 8
The 8 huge pages freed were balanced over nodes 0 and 1.
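The arithmetic in the examples above can be reproduced with a small userspace model. It is only a sketch: it assumes pages are added or removed one at a time, round-robin over the allowed nodes, and it ignores surplus pages, reservations and allocation failures. set_pool_size() and NR_NODES are names made up for the illustration.

```c
#define NR_NODES 4

/*
 * Adjust the per-node counts so that they sum to target, touching only
 * nodes whose bit is set in allowed. Pages are added or removed one at a
 * time, round-robin, which balances the change over the allowed nodes.
 */
static void set_pool_size(int pages[NR_NODES], unsigned allowed, int target)
{
	int total = 0, i;

	for (i = 0; i < NR_NODES; i++)
		total += pages[i];

	while (total != target) {
		int progressed = 0;

		for (i = 0; i < NR_NODES && total != target; i++) {
			if (!(allowed & (1u << i)))
				continue;
			if (total < target) {
				pages[i]++;
				total++;
				progressed = 1;
			} else if (pages[i] > 0) {
				pages[i]--;
				total--;
				progressed = 1;
			}
		}
		if (!progressed)
			break;	/* nothing more can change on the allowed nodes */
	}
}
```

Starting from all zeros, calling it with all four nodes allowed and a target of 32, then with only node 2 allowed and a target of 40, then with nodes 0 and 1 allowed and a target of 32, walks through the same three states as the tables above.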
[rientjes@google.com: accommodate reworked NODEMASK_ALLOC]
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Andi Kleen <andi@firstfloor.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: Adam Litke <agl@us.ibm.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Eric Whitney <eric.whitney@hp.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lee Schermerhorn [Tue, 15 Dec 2009 01:58:17 +0000 (17:58 -0800)]
hugetlb: factor init_nodemask_of_node()
Factor init_nodemask_of_node() out of the nodemask_of_node() macro.
This will be used to populate the huge pages "nodes_allowed" nodemask for
a single node when basing nodes_allowed on a preferred/local mempolicy or
when a persistent huge page pool page count is modified via a per node
sysfs attribute.
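A minimal userspace sketch of what such a helper does: clear the whole mask, then set the one bit for the given node. The bitmap type mask_sketch, MAX_NODES and node_isset_sketch() are assumptions made for this example; the kernel operates on nodemask_t with its own helpers.

```c
#include <limits.h>
#include <string.h>

#define MAX_NODES	64
#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define MASK_WORDS	((MAX_NODES + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Simplified stand-in for nodemask_t (assumption for this sketch). */
struct mask_sketch {
	unsigned long bits[MASK_WORDS];
};

/* Reset mask so that it contains exactly one node. */
static void init_nodemask_of_node(struct mask_sketch *mask, int node)
{
	memset(mask, 0, sizeof(*mask));
	mask->bits[node / BITS_PER_WORD] |= 1UL << (node % BITS_PER_WORD);
}

/* Hypothetical test helper: is node set in mask? */
static int node_isset_sketch(const struct mask_sketch *mask, int node)
{
	return (mask->bits[node / BITS_PER_WORD] >> (node % BITS_PER_WORD)) & 1UL;
}
```

Having this as a function rather than only as part of a macro expression lets a caller fill in a mask it already owns, which is what the nodes_allowed use cases described above need.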
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Andi Kleen <andi@firstfloor.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Nishanth Aravamudan <nacc@us.ibm.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Adam Litke <agl@us.ibm.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Eric Whitney <eric.whitney@hp.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lee Schermerhorn [Tue, 15 Dec 2009 01:58:16 +0000 (17:58 -0800)]
hugetlb: add nodemask arg to huge page alloc, free and surplus adjust functions
In preparation for constraining huge page allocation and freeing by the
controlling task's numa mempolicy, add a "nodes_allowed" nodemask pointer
to the allocate, free and surplus adjustment functions. For now, pass
NULL to indicate default behavior--i.e., use node_online_map. A
subsequent patch will derive a non-default mask from the controlling
task's numa mempolicy.
Note that this method of updating the global hstate nr_hugepages under the
constraint of a nodemask simplifies keeping the global state
consistent--especially the number of persistent and surplus pages relative
to reservations and overcommit limits. There are undoubtedly other ways
to do this, but this works for both interfaces: mempolicy and per node
attributes.
[rientjes@google.com: fix HIGHMEM compile error]
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Reviewed-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Andi Kleen <andi@firstfloor.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Adam Litke <agl@us.ibm.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Eric Whitney <eric.whitney@hp.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lee Schermerhorn [Tue, 15 Dec 2009 01:58:15 +0000 (17:58 -0800)]
hugetlb: rework hstate_next_node_* functions
Modify the hstate_next_node* functions to allow them to be called to
obtain the "start_nid". Then, whereas prior to this patch we
unconditionally called hstate_next_node_to_{alloc|free}(), whether or not
we successfully allocated/freed a huge page on the node, now we only call
these functions on failure to alloc/free to advance to next allowed node.
Factor out the next_node_allowed() function to handle wrap at end of
node_online_map. In this version, the allowed nodes include all of the
online nodes.
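The wrap-around behaviour of next_node_allowed() can be sketched as below. This is an illustrative model with a plain bit mask, not the kernel code; MAX_NODES and the unsigned mask parameter are assumptions.

```c
#define MAX_NODES 8

/*
 * Return the next node after nid whose bit is set in allowed, wrapping
 * from the last node back to the first. Returns nid itself if it is the
 * only allowed node, and -1 if the mask is empty.
 */
static int next_node_allowed(int nid, unsigned allowed)
{
	int i;

	for (i = 1; i <= MAX_NODES; i++) {
		int candidate = (nid + i) % MAX_NODES;

		if (allowed & (1u << candidate))
			return candidate;
	}
	return -1;
}
```

With nodes 0, 2 and 3 allowed (mask 0x0d), stepping from node 0 gives 2, then 3, then wraps back to 0.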
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Reviewed-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Andi Kleen <andi@firstfloor.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Adam Litke <agl@us.ibm.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Eric Whitney <eric.whitney@hp.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Rientjes [Tue, 15 Dec 2009 01:58:13 +0000 (17:58 -0800)]
nodemask: make NODEMASK_ALLOC more general
This is a series of patches to provide control over the location of the
allocation and freeing of persistent huge pages on a NUMA platform.
Please consider for merging into mmotm.
This series uses two mechanisms to constrain the nodes from which
persistent huge pages are allocated: 1) the task NUMA mempolicy of the
task modifying a new sysctl "nr_hugepages_mempolicy", based on a
suggestion by Mel Gorman; and 2) a subset of the hugepages hstate sysfs
attributes have been added [in V4] to each node system device under:
/sys/devices/node/node[0-9]*/hugepages
The per node attributes allow direct assignment of a huge page count on a
specific node, regardless of the task's mempolicy or cpuset constraints.
This patch:
NODEMASK_ALLOC(x, m) assumes x is a type of struct, which is unnecessary.
It's perfectly reasonable to use this macro to allocate a nodemask_t,
which is anonymous, either dynamically or on the stack depending on
NODES_SHIFT.
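To make the idea concrete, here is a userspace sketch of a macro that takes a complete type name, so it works for a typedef as well as for a struct. The threshold of 8, the use of malloc()/free() in place of kernel allocators, the value chosen for NODES_SHIFT and the helper mask_size_in_bytes() are all assumptions for illustration.

```c
#include <stdlib.h>

#define NODES_SHIFT	10	/* assumed value for this sketch */
#define MAX_NODES	(1 << NODES_SHIFT)

/* Simplified stand-in for the anonymous nodemask_t type. */
typedef struct {
	unsigned long bits[MAX_NODES / (8 * sizeof(unsigned long))];
} nodemask_sketch_t;

#if NODES_SHIFT > 8	/* large mask: allocate it dynamically */
#define NODEMASK_ALLOC(type, name)	type *name = malloc(sizeof(*name))
#define NODEMASK_FREE(m)		free(m)
#else			/* small mask: keep it on the stack */
#define NODEMASK_ALLOC(type, name)	type name##_storage, *name = &name##_storage
#define NODEMASK_FREE(m)		do { } while (0)
#endif

/* Hypothetical caller: the macro is given a typedef, not a struct tag. */
static int mask_size_in_bytes(void)
{
	NODEMASK_ALLOC(nodemask_sketch_t, mask);
	int size;

	if (!mask)
		return -1;
	size = (int)sizeof(*mask);
	NODEMASK_FREE(mask);
	return size;
}
```

Because the macro pastes the type as given, the caller decides whether to write "struct foo" or a typedef name; the macro no longer prepends "struct" itself.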
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Adam Litke <agl@us.ibm.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Eric Whitney <eric.whitney@hp.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KOSAKI Motohiro [Tue, 15 Dec 2009 01:58:11 +0000 (17:58 -0800)]
mm: move inc_zone_page_state(NR_ISOLATED) to just isolated place
Christoph pointed out that inc_zone_page_state(NR_ISOLATED) should be placed
right after isolate_page().
This patch does so.
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wu Fengguang [Tue, 15 Dec 2009 01:58:10 +0000 (17:58 -0800)]
/dev/mem: remove redundant parameter from do_write_kmem()
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Avi Kivity <avi@qumranet.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wu Fengguang [Tue, 15 Dec 2009 01:58:10 +0000 (17:58 -0800)]
/dev/mem: remove the "written" variable in write_kmem()
Also rename "len" to "sz". No behavior change.
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Avi Kivity <avi@qumranet.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wu Fengguang [Tue, 15 Dec 2009 01:58:09 +0000 (17:58 -0800)]
/dev/mem: make size_inside_page() logic straight
Also convert more size_inside_page() users.
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Avi Kivity <avi@qumranet.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wu Fengguang [Tue, 15 Dec 2009 01:58:08 +0000 (17:58 -0800)]
/dev/mem: cleanup unxlate_dev_mem_ptr() calls
No behaviour change.
[akpm@linux-foundation.org: cleanuplets]
[akpm@linux-foundation.org: remove unused `ret']
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Mark Brown <broonie@opensource.wolfsonmicro.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wu Fengguang [Tue, 15 Dec 2009 01:58:07 +0000 (17:58 -0800)]
/dev/mem: introduce size_inside_page()
Introduce size_inside_page() to replace duplicate /dev/mem code.
Also apply it to /dev/kmem, whose alignment logic was buggy.
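One plausible shape for such a helper, shown as a userspace sketch: given a start address and a requested size, return how many bytes can be handled without crossing the end of the page that contains start. The signature and the fixed 4096-byte page size are assumptions for this example.

```c
#define PAGE_SIZE_SKETCH 4096UL	/* assumed page size for the sketch */

/*
 * Clamp size so that [start, start + result) stays inside the page
 * containing start.
 */
static unsigned long size_inside_page(unsigned long start, unsigned long size)
{
	unsigned long room = PAGE_SIZE_SKETCH - (start & (PAGE_SIZE_SKETCH - 1));

	return size < room ? size : room;
}
```

A read or write loop can then process size_inside_page(p, count) bytes per iteration and advance, instead of repeating the alignment arithmetic at every call site.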
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Mark Brown <broonie@opensource.wolfsonmicro.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wu Fengguang [Tue, 15 Dec 2009 01:57:57 +0000 (17:57 -0800)]
/dev/mem: remove redundant test on len
The len test in write_kmem() is always true, so can be reduced.
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Mark Brown <broonie@opensource.wolfsonmicro.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KOSAKI Motohiro [Tue, 15 Dec 2009 01:57:56 +0000 (17:57 -0800)]
mmap: don't return ENOMEM when mapcount is temporarily exceeded in munmap()
On ia64, the following test program exits abnormally because the glibc thread
library calls abort().
========================================================
(gdb) bt
#0 0xa000000000010620 in __kernel_syscall_via_break ()
#1 0x20000000003208e0 in raise () from /lib/libc.so.6.1
#2 0x2000000000324090 in abort () from /lib/libc.so.6.1
#3 0x200000000027c3e0 in __deallocate_stack () from /lib/libpthread.so.0
#4 0x200000000027f7c0 in start_thread () from /lib/libpthread.so.0
#5 0x200000000047ef60 in __clone2 () from /lib/libc.so.6.1
========================================================
The fact is, glibc calls munmap() at thread exit time to free the
stack, and it assumes munmap() never fails. However, munmap() often has to
split a vma, and with a high mapcount the split fails with -ENOMEM.
Oh well, that's crazy, because stack unmapping never increases the mapcount.
The mapcount limit is only exceeded temporarily. An internal, temporary
excess shouldn't produce ENOMEM.
This patch does it.
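The reasoning can be modelled with a small predicate. This is a sketch of the rule as described here, not the kernel code: it assumes that only an unmap that punches a hole in the middle of a mapping leaves one more mapping behind, so only that case needs to be refused at the limit. unmap_must_fail() and its parameters are hypothetical names.

```c
/*
 * Return 1 if unmapping [start, end) from a mapping that covers
 * [vma_start, vma_end) should be refused with ENOMEM.
 *
 * Unmapping the head, the tail or the whole mapping leaves the final
 * number of mappings unchanged or smaller, so a temporary excess during
 * the split is tolerated. Punching a hole leaves one extra mapping.
 */
static int unmap_must_fail(unsigned long start, unsigned long end,
			   unsigned long vma_start, unsigned long vma_end,
			   int map_count, int max_map_count)
{
	int punches_hole = start > vma_start && end < vma_end;

	return punches_hole && map_count >= max_map_count;
}
```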
test_max_mapcount.c
==================================================================
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>
#include <errno.h>
#include <unistd.h>

#define THREAD_NUM 30000
#define MAL_SIZE (8*1024*1024)

void *wait_thread(void *args)
{
	void *addr;

	addr = malloc(MAL_SIZE);
	sleep(10);

	return NULL;
}

void *wait_thread2(void *args)
{
	sleep(60);

	return NULL;
}

int main(int argc, char *argv[])
{
	int i;
	pthread_t thread[THREAD_NUM], th;
	int ret, count = 0;
	pthread_attr_t attr;

	ret = pthread_attr_init(&attr);
	if (ret) {
		perror("pthread_attr_init");
	}

	ret = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
	if (ret) {
		perror("pthread_attr_setdetachstate");
	}

	for (i = 0; i < THREAD_NUM; i++) {
		ret = pthread_create(&th, &attr, wait_thread, NULL);
		if (ret) {
			fprintf(stderr, "[%d] ", count);
			perror("pthread_create");
		} else {
			printf("[%d] create OK.\n", count);
		}
		count++;

		ret = pthread_create(&thread[i], &attr, wait_thread2, NULL);
		if (ret) {
			fprintf(stderr, "[%d] ", count);
			perror("pthread_create");
		} else {
			printf("[%d] create OK.\n", count);
		}
		count++;
	}
	sleep(3600);
	return 0;
}
==================================================================
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alex Chiang [Tue, 15 Dec 2009 01:57:54 +0000 (17:57 -0800)]
page-types: exit early when invoked with -d|--describe
On a system with a large amount of memory (256GB), invoking page-types can
take quite a long time, which is unreasonable considering the user only
wants a description of the flags:
# time ./page-types -d 0x10
0x0000000000000010 ____D_____________________________ dirty
real 0m34.285s
user 0m1.966s
sys 0m32.313s
This is because we still walk the entire address range.
Exiting early seems like a reasonable solution:
# time ./page-types -d 0x10
0x0000000000000010 ____D_____________________________ dirty
real 0m0.007s
user 0m0.001s
sys 0m0.005s
Signed-off-by: Alex Chiang <achiang@hp.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Haicheng Li <haicheng.li@intel.com>
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alex Chiang [Tue, 15 Dec 2009 01:57:53 +0000 (17:57 -0800)]
page-types: whitespace alignment
Align the output when page-types -h is invoked.
Signed-off-by: Alex Chiang <achiang@hp.com>
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alex Chiang [Tue, 15 Dec 2009 01:57:52 +0000 (17:57 -0800)]
page-types: learn to describe flags directly from command line
Teach page-types to describe page flags directly from the command line.
Why is this useful? For instance, if you're using memory hotplug and see
this in /var/log/messages:
kernel: removing from LRU failed
3836dd0/1/
1e00000000000010
It would be nice to decode those page flags without staring at the source.
Example usage and output:
# Documentation/vm/page-types -d 0x10
0x0000000000000010 ____D_____________________________ dirty
# Documentation/vm/page-types -d anon
0x0000000000001000 ____________a_____________________ anonymous
# Documentation/vm/page-types -d anon,0x10
0x0000000000001010 ____D_______a_____________________ dirty,anonymous
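For illustration, decoding a flags value into names can be sketched as below. Only the two flags visible in the output above are included (bit 4 "dirty", bit 12 "anonymous"); the table layout and describe_flags() are assumptions for this sketch and are not taken from the page-types source.

```c
#include <stdio.h>
#include <string.h>

struct flag_name {
	unsigned bit;
	const char *name;
};

/* Only the two flags that appear in the examples above. */
static const struct flag_name known_flags[] = {
	{ 4,  "dirty" },
	{ 12, "anonymous" },
};

/* Write a comma-separated list of the known flags set in flags to buf. */
static void describe_flags(unsigned long long flags, char *buf, size_t len)
{
	size_t i, used;

	if (len == 0)
		return;
	buf[0] = '\0';
	for (i = 0; i < sizeof(known_flags) / sizeof(known_flags[0]); i++) {
		if (!(flags & (1ULL << known_flags[i].bit)))
			continue;
		used = strlen(buf);
		/* snprintf() truncates safely if buf is too small */
		snprintf(buf + used, len - used, "%s%s",
			 used ? "," : "", known_flags[i].name);
	}
}
```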
[achiang@hp.com: documentation]
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Haicheng Li <haicheng.li@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roel Kluin [Tue, 15 Dec 2009 01:57:49 +0000 (17:57 -0800)]
page-types: unsigned cannot be less than 0 in add_page()
If not signed, testing of the read() return value in this function
will not work.
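The problem can be shown in isolation with a short sketch. failing_read() is a hypothetical stand-in for a read() call that returns -1; the two helpers contrast storing that result in an unsigned and in a signed variable.

```c
/* Stand-in for a failing read(): always reports an error. */
static long failing_read(void)
{
	return -1;
}

/*
 * Buggy pattern: the result is stored in an unsigned variable, so -1
 * wraps to a huge positive value and looks like a successful read.
 */
static int looks_successful_when_unsigned(void)
{
	unsigned long n = (unsigned long)failing_read();

	return n > 0;
}

/* Fixed pattern: keep the result signed so the error test works. */
static int error_detected_when_signed(void)
{
	long n = failing_read();

	return n < 0;
}
```

Both helpers return 1: the first because the error is silently mistaken for data, the second because the error is actually detected.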
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tommi Rantala [Tue, 15 Dec 2009 01:57:48 +0000 (17:57 -0800)]
page-types: constify read only arrays
Signed-off-by: Tommi Rantala <tt.rantala@gmail.com>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Rientjes [Tue, 15 Dec 2009 01:57:47 +0000 (17:57 -0800)]
oom: dump stack and VM state when oom killer panics
The oom killer header, including information such as the allocation order
and gfp mask, current's cpuset and memory controller, call trace, and VM
state information is currently only shown when the oom killer has selected
a task to kill.
This information is omitted, however, when the oom killer panics either
because of panic_on_oom sysctl settings or when no killable task was
found. It is still relevant to know crucial pieces of information such as
the allocation order and VM state when diagnosing such issues, especially
at boot.
This patch displays the oom killer header whenever it panics so that bug
reports can include pertinent information to debug the issue, if possible.
Signed-off-by: David Rientjes <rientjes@google.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Marek [Tue, 15 Dec 2009 01:57:43 +0000 (17:57 -0800)]
MAINTAINERS: new kbuild maintainer
Sam was fine with handing over kbuild maintainership to me. The git
trees are already in linux-next, a merge request will follow shortly.
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Michal Marek <mmarek@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Amerigo Wang [Tue, 15 Dec 2009 01:57:37 +0000 (17:57 -0800)]
hfs: fix a potential buffer overflow
A specially-crafted Hierarchical File System (HFS) filesystem could cause
a buffer overflow to occur in a process's kernel stack during a memcpy()
call within the hfs_bnode_read() function (at fs/hfs/bnode.c:24). The
attacker can provide the source buffer and length, and the destination
buffer is a local variable of a fixed length. This local variable (passed
as "&entry" from fs/hfs/dir.c:112 and allocated on line 60) is stored in
the stack frame of hfs_bnode_read()'s caller, which is hfs_readdir().
Because the hfs_readdir() function executes upon any attempt to read a
directory on the filesystem, it gets called whenever a user attempts to
inspect any filesystem contents.
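The general defence against this class of bug is to validate the untrusted length against the destination size before copying. The following is a generic sketch of that pattern, not the actual patch; struct entry_sketch, its 64-byte size and read_entry_checked() are invented for the example.

```c
#include <string.h>

/* Fixed-size destination, standing in for the on-stack entry buffer. */
struct entry_sketch {
	unsigned char data[64];
};

/*
 * Copy len bytes from src into *dst only if they fit.
 * Returns 0 on success, -1 if the untrusted length is out of range.
 */
static int read_entry_checked(struct entry_sketch *dst, const void *src, int len)
{
	if (len < 0 || (size_t)len > sizeof(*dst))
		return -1;
	memcpy(dst, src, (size_t)len);
	return 0;
}
```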
[amwang@redhat.com: modify this patch and fix coding style problems]
Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Eugene Teo <eteo@redhat.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Dave Anderson <anderson@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Tue, 15 Dec 2009 01:57:34 +0000 (17:57 -0800)]
bsdacct: fix uid/gid misreporting
Commit d8e180dcd5bbbab9cd3ff2e779efcf70692ef541 ("bsdacct: switch
credentials for writing to the accounting file") introduced credential
switching during final acct data collection. However, the uid/gid pair
continued to be collected from current, whose credentials had by then become
those of whoever created the acct file, not those of the exiting task.
Addresses http://bugzilla.kernel.org/show_bug.cgi?id=14676
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reported-by: Juho K. Juopperi <jkj@kapsi.fi>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: David Howells <dhowells@redhat.com>
Reviewed-by: Michal Schmidt <mschmidt@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 14 Dec 2009 22:11:56 +0000 (14:11 -0800)]
Merge branch 'i2c-for-linus' of git://git./linux/kernel/git/jdelvare/staging
* 'i2c-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
i2c-core: i2c bus should support PM entries in struct dev_pm_ops
i2c: Get rid of I2C_CLIENT_MODULE_PARM
i2c: Drop I2C_CLIENT_INSMOD_2 to 8
i2c: Drop I2C_CLIENT_INSMOD_1
i2c: Get rid of struct i2c_client_address_data
i2c: Drop the kind parameter from detect callbacks
Linus Torvalds [Mon, 14 Dec 2009 20:50:25 +0000 (12:50 -0800)]
Merge branch 'for_linus' of git://git./linux/kernel/git/jack/linux-udf-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6:
udf: Avoid IO in udf_clear_inode
udf: Try harder when looking for VAT inode
udf: Fix compilation with UDFFS_DEBUG enabled
Jan Kara [Thu, 3 Dec 2009 12:39:28 +0000 (13:39 +0100)]
udf: Avoid IO in udf_clear_inode
It is not very good to do IO in udf_clear_inode. First, VFS does not really
expect inode to become dirty there and thus we have to write it ourselves,
second, memory reclaim gets blocked waiting for IO when it does not really
expect it, third, the IO pattern (e.g. on umount) resulting from writes in
udf_clear_inode is bad and it slows down writing a lot.
The reason why UDF needed to do IO in udf_clear_inode is that UDF standard
mandates extent length to exactly match inode size. But when we allocate
extents to a file or directory, we don't really know what exactly the final
file size will be and thus temporarily set it to block boundary and later
truncate it to exact length in udf_clear_inode. Now, this is changed to
truncate to final file size in udf_release_file for regular files. For
directories and symlinks, we do the truncation at the moment when we learn
what the final file size will be.
Signed-off-by: Jan Kara <jack@suse.cz>
Jan Kara [Mon, 30 Nov 2009 18:47:55 +0000 (19:47 +0100)]
udf: Try harder when looking for VAT inode
Some disks do not contain VAT inode in the last recorded block as required
by the standard but a few blocks earlier (or the number of recorded blocks
is wrong). So look for the VAT inode a bit before the end of the media.
Signed-off-by: Jan Kara <jack@suse.cz>
Jan Kara [Mon, 30 Nov 2009 18:47:10 +0000 (19:47 +0100)]
udf: Fix compilation with UDFFS_DEBUG enabled
Signed-off-by: Jan Kara <jack@suse.cz>
Linus Torvalds [Mon, 14 Dec 2009 20:36:46 +0000 (12:36 -0800)]
Merge branch 'x86-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, mce: Clean up thermal init by introducing intel_thermal_supported()
x86, mce: Thermal monitoring depends on APIC being enabled
x86: Gart: fix breakage due to IOMMU initialization cleanup
x86: Move swiotlb initialization before dma32_free_bootmem
x86: Fix build warning in arch/x86/mm/mmio-mod.c
x86: Remove usedac in feature-removal-schedule.txt
x86: Fix duplicated UV BAU interrupt vector
nvram: Fix write beyond end condition; prove to gcc copy is safe
mm: Adjust do_pages_stat() so gcc can see copy_from_user() is safe
x86: Limit the number of processor bootup messages
x86: Remove enabling x2apic message for every CPU
doc: Add documentation for bootloader_{type,version}
x86, msr: Add support for non-contiguous cpumasks
x86: Use find_e820() instead of hard coded trampoline address
x86, AMD: Fix stale cpuid4_info shared_map data in shared_cpu_map cpumasks
Trivial percpu-naming-introduced conflicts in arch/x86/kernel/cpu/intel_cacheinfo.c
Linus Torvalds [Mon, 14 Dec 2009 20:33:02 +0000 (12:33 -0800)]
Merge git://git./linux/kernel/git/brodo/pcmcia-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/brodo/pcmcia-2.6:
pcmcia: CodingStyle fixes
pcmcia: remove unused IRQ_FIRST_SHARED
sonic zhang [Mon, 14 Dec 2009 20:17:30 +0000 (21:17 +0100)]
i2c-core: i2c bus should support PM entries in struct dev_pm_ops
Struct dev_pm_ops is not configured in the current i2c bus type. i2c drivers
that only depend on the suspend/resume entries in struct dev_pm_ops are not
informed of PM suspend and resume events by the i2c framework.
Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Jean Delvare [Mon, 14 Dec 2009 20:17:29 +0000 (21:17 +0100)]
i2c: Get rid of I2C_CLIENT_MODULE_PARM
There is no user left of I2C_CLIENT_MODULE_PARM, so we can finally
get rid of this ugly macro.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Tested-by: Wolfram Sang <w.sang@pengutronix.de>
Jean Delvare [Mon, 14 Dec 2009 20:17:27 +0000 (21:17 +0100)]
i2c: Drop I2C_CLIENT_INSMOD_2 to 8
These macros simply declare an enum, so drivers might as well declare
it themselves. This puts an end to the arbitrary limit of 8 chip types
per i2c driver.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Tested-by: Wolfram Sang <w.sang@pengutronix.de>
Jean Delvare [Mon, 14 Dec 2009 20:17:26 +0000 (21:17 +0100)]
i2c: Drop I2C_CLIENT_INSMOD_1
This macro simply declares an enum, so drivers might as well declare
it themselves.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Tested-by: Wolfram Sang <w.sang@pengutronix.de>
Jean Delvare [Mon, 14 Dec 2009 20:17:25 +0000 (21:17 +0100)]
i2c: Get rid of struct i2c_client_address_data
Struct i2c_client_address_data only contains one field at this point,
which makes its usefulness questionable. Get rid of it and pass simple
address lists around instead.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Tested-by: Wolfram Sang <w.sang@pengutronix.de>
Jean Delvare [Mon, 14 Dec 2009 20:17:23 +0000 (21:17 +0100)]
i2c: Drop the kind parameter from detect callbacks
The "kind" parameter always has value -1, and nobody is using it any
longer, so we can remove it.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Tested-by: Wolfram Sang <w.sang@pengutronix.de>
Linus Torvalds [Mon, 14 Dec 2009 18:22:11 +0000 (10:22 -0800)]
Merge branch 'next-spi' of git://git.secretlab.ca/git/linux-2.6
* 'next-spi' of git://git.secretlab.ca/git/linux-2.6: (23 commits)
spi: fix probe/remove section markings
Add OMAP spi100k driver
spi-imx: don't access struct device directly but use dev_get_platdata
spi-imx: Add mx25 support
spi-imx: use positive logic to distinguish cpu variants
spi-imx: correct check for platform_get_irq failing
ARM: NUC900: Add spi driver support for nuc900
spi: SuperH MSIOF SPI Master driver V2
spi: fix spidev compilation failure when VERBOSE is defined
spi/au1550_spi: fix setupxfer not to override cfg with zeros
spi/mpc8xxx: don't use __exit_p to wrap plat_mpc8xxx_spi_remove
spi/i.MX: fix broken error handling for gpio_request
spi/i.mx: drain MXC SPI transfer buffer when probing device
MAINTAINERS: add SPI co-maintainer.
spi/xilinx_spi: fix incorrect casting
spi/mpc52xx-spi: minor cleanups
xilinx_spi: add a platform driver using the xilinx_spi common module.
xilinx_spi: add support for the DS570 IP.
xilinx_spi: Switch to iomem functions and support little endian.
xilinx_spi: Split into of driver and generic part.
...
Linus Torvalds [Mon, 14 Dec 2009 18:13:22 +0000 (10:13 -0800)]
Merge branch 'perf-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf sched: Fix build failure on sparc
perf bench: Add "all" pseudo subsystem and "all" pseudo suite
perf tools: Introduce perf_session class
perf symbols: Ditch dso->find_symbol
perf symbols: Allow lookups by symbol name too
perf symbols: Add missing "Variables" entry to map_type__name
perf symbols: Add support for 'variable' symtabs
perf symbols: Introduce ELF counterparts to symbol_type__is_a
perf symbols: Introduce symbol_type__is_a
perf symbols: Rename kthreads to kmaps, using another abstraction for it
perf tools: Allow building for ARM
hw-breakpoints: Handle bad modify_user_hw_breakpoint off-case return value
perf tools: Allow cross compiling
tracing, slab: Fix no callsite ifndef CONFIG_KMEMTRACE
tracing, slab: Define kmem_cache_alloc_notrace ifdef CONFIG_TRACING
Trivial conflict due to different fixes to modify_user_hw_breakpoint()
in include/linux/hw_breakpoint.h
David Howells [Mon, 14 Dec 2009 14:13:44 +0000 (14:13 +0000)]
PCI: Global variable decls must match the defs in section attributes
Global variable declarations must match the definitions in section attributes
as the compiler is at liberty to vary the method it uses to access a variable,
depending on the section it is in.
When building the FRV arch, I now see:
drivers/built-in.o: In function `pci_apply_final_quirks':
drivers/pci/quirks.c:2606: relocation truncated to fit: R_FRV_GPREL12 against symbol `pci_dfl_cache_line_size' defined in .devinit.data section in drivers/built-in.o
drivers/pci/quirks.c:2623: relocation truncated to fit: R_FRV_GPREL12 against symbol `pci_dfl_cache_line_size' defined in .devinit.data section in drivers/built-in.o
drivers/pci/quirks.c:2630: relocation truncated to fit: R_FRV_GPREL12 against symbol `pci_dfl_cache_line_size' defined in .devinit.data section in drivers/built-in.o
because the declaration of pci_dfl_cache_line_size in linux/pci.h does not
match the definition in drivers/pci/pci.c.
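A minimal sketch of the rule described above, assuming GCC on an ELF target. The section name, the macro and the variable are hypothetical stand-ins, not the kernel's own annotation or pci_dfl_cache_line_size:

```c
#include <assert.h>

/* Hypothetical stand-in for a kernel section annotation. */
#define DEMO_DEVINIT_DATA __attribute__((section(".demo.devinit.data")))

/* Header side: the declaration carries the same section attribute ... */
extern unsigned char demo_dfl_cache_line_size DEMO_DEVINIT_DATA;

/* ... as the definition, so the compiler sees one section for both. */
unsigned char demo_dfl_cache_line_size DEMO_DEVINIT_DATA = 8;

unsigned char demo_read_cache_line_size(void)
{
	assert(demo_dfl_cache_line_size != 0);	/* the definition sets it to 8 */
	return demo_dfl_cache_line_size;
}
```

If only the definition carried the attribute, the compiler could pick an access method for the declaration's default section that does not fit the real one, which is the kind of mismatch the relocation errors above point at.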
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Mon, 14 Dec 2009 14:03:27 +0000 (14:03 +0000)]
FRV: Fix no-hardware-breakpoint case
If there is no hardware breakpoint support, modify_user_hw_breakpoint()
tries to return a NULL pointer through as an 'int' return value:
In file included from kernel/exit.c:53:
include/linux/hw_breakpoint.h: In function 'modify_user_hw_breakpoint':
include/linux/hw_breakpoint.h:96: warning: return makes integer from pointer without a cast
Return 0 instead.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 14 Dec 2009 18:04:04 +0000 (10:04 -0800)]
Merge branch 'for-linus' of git://git.monstr.eu/linux-2.6-microblaze
* 'for-linus' of git://git.monstr.eu/linux-2.6-microblaze: (46 commits)
microblaze: Remove rt_sigsuspend wrapper
microblaze: nommu: Don't clobber R11 on syscalls
microblaze: Remove show_tmem function
microblaze: Support for WB cache
microblaze: Add PVR for Microblaze v7.30.a
microblaze: Remove ancient and fake microblaze version from cpu_ver table
microblaze: Remove panic_timeout init value
microblaze: Do not count system calls in default
microblaze: Enable DTC compilation
microblaze: Core oprofile configs and hooks
microblaze: Fix level interrupt ACKing
microblaze: Enable futimesat syscall
microblaze: Checking DTS against PVR for write-back cache
microblaze: Remove duplicity from pgalloc.h
microblaze: Futex support
microblaze: Adding dev_arch_data functions
microblaze: Fix the heartbeat gpio to be more robust
microblaze: Simple __copy_tofrom_user for noMMU
microblaze: Export memory_start for modules
microblaze: Use lowest-common-denominator default CPU settings
...
Linus Torvalds [Mon, 14 Dec 2009 18:03:36 +0000 (10:03 -0800)]
Merge branch 'for-linus' of git://neil.brown.name/md
* 'for-linus' of git://neil.brown.name/md: (27 commits)
md: add 'recovery_start' per-device sysfs attribute
md: rcu_read_lock() walk of mddev->disks in md_do_sync()
md: integrate spares into array at earliest opportunity.
md: move compat_ioctl handling into md.c
md: revise Kconfig help for MD_MULTIPATH
md: add MODULE_DESCRIPTION for all md related modules.
raid: improve MD/raid10 handling of correctable read errors.
md/raid10: print more useful messages on device failure.
md/bitmap: update dirty flag when bitmap bits are explicitly set.
md: Support write-intent bitmaps with externally managed metadata.
md/bitmap: move setting of daemon_lastrun out of bitmap_read_sb
md: support updating bitmap parameters via sysfs.
md: factor out parsing of fixed-point numbers
md: support bitmap offset appropriate for external-metadata arrays.
md: remove needless setting of thread->timeout in raid10_quiesce
md: change daemon_sleep to be in 'jiffies' rather than 'seconds'.
md: move offset, daemon_sleep and chunksize out of bitmap structure
md: collect bitmap-specific fields into one structure.
md/raid1: add takeover support for raid5->raid1
md: add honouring of suspend_{lo,hi} to raid1.
...
Linus Torvalds [Mon, 14 Dec 2009 18:02:35 +0000 (10:02 -0800)]
Merge branch 'for-next' of git://git./linux/kernel/git/sameo/mfd-2.6
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6: (58 commits)
mfd: Add twl6030 regulator subdevices
regulator: Add support for twl6030 regulators
rtc: Add twl6030 RTC support
mfd: Add support for twl6030 irq framework
mfd: Rename twl4030_ routines in twl-regulator.c
mfd: Rename twl4030_ routines in rtc-twl.c
mfd: Rename all twl4030_i2c*
mfd: Rename twl4030* driver files to enable re-use
mfd: Clarify twl4030 return value for read and write
mfd: Add all twl4030 regulators to the twl4030 mfd driver
mfd: Don't set mc13783 ADREFMODE for touch conversions
mfd: Remove ezx-pcap defines for custom led gpio encoding
mfd: Near complete mc13783 rewrite
mfd: Remove build time warning for WM835x register default tables
mfd: Force I2C to be built in when building WM831x
mfd: Don't allow wm831x to be built as a module
mfd: Fix incorrect error check for wm8350-core
mfd: Fix twl4030 warning
gpiolib: Implement gpio_to_irq() for wm831x
mfd: Remove default selection of AB4500
...
Linus Torvalds [Mon, 14 Dec 2009 18:01:15 +0000 (10:01 -0800)]
Merge branch 'devel' of /home/rmk/linux-2.6-arm
* 'devel' of master.kernel.org:/home/rmk/linux-2.6-arm:
ARM: fix lh7a40x build
ARM: fix sa1100 build
ARM: fix clps711x, footbridge, integrator, ixp2000, ixp2300 and s3c build bug
ARM: VFP: fix vfp thread init bug and document vfp notifier entry conditions
ARM: pxa: fix now incorrect reference of skt->irq by using skt->socket.pci_irq
[ARM] pxa/zeus: default configuration for Arcom Zeus SBC.
[ARM] pxa/zeus: make Viper pcmcia support more generic to support Zeus
[ARM] pxa/zeus: basic support for Arcom Zeus SBC
[ARM] pxa/em-x270: fix usb hub power up/reset sequence
PCMCIA: fix pxa2xx_lubbock modular build error
ARM: RealView: Fix typo in the RealView/PBX Kconfig entry
ARM: Do not allow the probing of the local timer
ARM: Add an earlyprintk debug console
Linus Torvalds [Mon, 14 Dec 2009 18:00:24 +0000 (10:00 -0800)]
Merge git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (75 commits)
NFS: Fix nfs_migrate_page()
rpc: remove unneeded function parameter in gss_add_msg()
nfs41: Invoke RECLAIM_COMPLETE on all new client ids
SUNRPC: IS_ERR/PTR_ERR confusion
NFSv41: Fix a potential state leakage when restarting nfs4_close_prepare
nfs41: Handle NFSv4.1 session errors in the delegation recall code
nfs41: Retry delegation return if it failed with session error
nfs41: Handle session errors during delegation return
nfs41: Mark stateids in need of reclaim if state manager gets stale clientid
NFS: Fix up the declaration of nfs4_restart_rpc when NFSv4 not configured
nfs41: Don't clear DRAINING flag on NFS4ERR_STALE_CLIENTID
nfs41: nfs41_setup_state_renewal
NFSv41: More cleanups
NFSv41: Fix up some bugs in the NFS4CLNT_SESSION_DRAINING code
NFSv41: Clean up slot table management
NFSv41: Fix nfs4_proc_create_session
nfs41: Invoke RECLAIM_COMPLETE
nfs41: RECLAIM_COMPLETE functionality
nfs41: RECLAIM_COMPLETE XDR functionality
Cleanup some NFSv4 XDR decode comments
...
Linus Torvalds [Mon, 14 Dec 2009 17:58:24 +0000 (09:58 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/tj/percpu
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (34 commits)
m68k: rename global variable vmalloc_end to m68k_vmalloc_end
percpu: add missing per_cpu_ptr_to_phys() definition for UP
percpu: Fix kdump failure if booted with percpu_alloc=page
percpu: make misc percpu symbols unique
percpu: make percpu symbols in ia64 unique
percpu: make percpu symbols in powerpc unique
percpu: make percpu symbols in x86 unique
percpu: make percpu symbols in xen unique
percpu: make percpu symbols in cpufreq unique
percpu: make percpu symbols in oprofile unique
percpu: make percpu symbols in tracer unique
percpu: make percpu symbols under kernel/ and mm/ unique
percpu: remove some sparse warnings
percpu: make alloc_percpu() handle array types
vmalloc: fix use of non-existent percpu variable in put_cpu_var()
this_cpu: Use this_cpu_xx in trace_functions_graph.c
this_cpu: Use this_cpu_xx for ftrace
this_cpu: Use this_cpu_xx in nmi handling
this_cpu: Use this_cpu operations in RCU
this_cpu: Use this_cpu ops for VM statistics
...
Fix up trivial (famous last words) global per-cpu naming conflicts in
arch/x86/kvm/svm.c
mm/slab.c
William Allen Simpson [Sun, 13 Dec 2009 20:12:46 +0000 (15:12 -0500)]
Documentation: rw_lock lessons learned
In recent months, two different network projects erroneously
strayed down the rw_lock path. Update the Documentation
based upon comments by Eric Dumazet and Paul E. McKenney in
those threads.
Further updates await somebody else with more expertise.
Changes:
- Merged with extensive content by Stephen Hemminger.
- Fix one of the comments by Linus Torvalds.
Signed-off-by: William.Allen.Simpson@gmail.com
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hidetoshi Seto [Mon, 14 Dec 2009 08:57:00 +0000 (17:57 +0900)]
x86, mce: Clean up thermal init by introducing intel_thermal_supported()
It looks better to have a common function. No change in functionality.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
LKML-Reference: <4B25FDDC.407@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cyrill Gorcunov [Mon, 14 Dec 2009 08:56:34 +0000 (17:56 +0900)]
x86, mce: Thermal monitoring depends on APIC being enabled
Check that the APIC is not disabled, since thermal
monitoring depends on it. Once the APIC is disabled
we should not try to install the "thermal monitor" vector,
print that thermal monitoring is enabled, and so on.
Note that "Intel Corrected Machine Check Interrupts" already
has such a check.
Also, I decided not to add a cpu_has_apic check to
mcheck_intel_therm_init: even if it calls apic_read on a
disabled APIC, that is safe here, and it saves a few code
bytes.
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
LKML-Reference: <4B25FDC2.3020401@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
David Miller [Mon, 14 Dec 2009 07:56:22 +0000 (23:56 -0800)]
perf sched: Fix build failure on sparc
Here, tvec->tv_usec is "unsigned int" not "unsigned long".
Since the type is different on every platform, it's probably
best to just use long printf formats and cast.
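A small sketch of the "long formats plus cast" approach, assuming a POSIX struct timeval. The helper name demo_format_timeval is made up for illustration and is not taken from perf:

```c
#include <assert.h>
#include <stdio.h>
#include <sys/time.h>

/*
 * Format a timeval as "sec.usec". The field types differ between
 * platforms, so both fields are cast to long and printed with %ld.
 */
int demo_format_timeval(char *buf, size_t len, const struct timeval *tv)
{
	assert(buf != NULL && tv != NULL);
	return snprintf(buf, len, "%ld.%06ld",
			(long)tv->tv_sec, (long)tv->tv_usec);
}
```

The cast keeps the format string and the argument type in agreement regardless of how the platform defines the fields.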
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20091213.235622.53363059.davem@davemloft.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Yinghai Lu [Mon, 14 Dec 2009 02:52:15 +0000 (11:52 +0900)]
x86: Gart: fix breakage due to IOMMU initialization cleanup
This fixes the following breakage of the commit
75f1cdf1dda92cae037ec848ae63690d91913eac:
- GART systems that don't have AGP, with a broken BIOS and more than
4GB of memory, are forced to use swiotlb. They can allocate the
aperture by hand and use GART.
- GART systems without AGP must disable GART on shutdown.
- When swiotlb usage is forced by the boot option,
gart_iommu_hole_init() is not called, so we disable GART in
early_gart_iommu_check().
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
LKML-Reference: <1260759135-6450-3-git-send-email-fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
FUJITA Tomonori [Mon, 14 Dec 2009 02:52:14 +0000 (11:52 +0900)]
x86: Move swiotlb initialization before dma32_free_bootmem
Commit 75f1cdf1dda92cae037ec848ae63690d91913eac introduced a
bug: SWIOTLB is initialized right after dma32_free_bootmem(), so
it wrongly steals the memory area allocated earlier for GART on
systems with a broken BIOS.
This moves swiotlb initialization before dma32_free_bootmem().
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: yinghai@kernel.org
LKML-Reference: <1260759135-6450-2-git-send-email-fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Joe Perches [Mon, 14 Dec 2009 07:24:03 +0000 (23:24 -0800)]
x86: Fix build warning in arch/x86/mm/mmio-mod.c
Stephen Rothwell reported these warnings:
arch/x86/mm/mmio-mod.c: In function 'print_pte':
arch/x86/mm/mmio-mod.c:100: warning: too many arguments for format
arch/x86/mm/mmio-mod.c:106: warning: too many arguments for format
The 'fmt' was left out accidentally.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus <torvalds@linux-foundation.org>
LKML-Reference: <1260775443.18538.16.camel@Joe-Laptop.home>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
FUJITA Tomonori [Mon, 14 Dec 2009 02:06:15 +0000 (11:06 +0900)]
x86: Remove usedac in feature-removal-schedule.txt
The reason given for removal, "replaced by allowdac and no dac
combination", is incorrect. There is no way to do the same thing
with a combination of "allowdac" and "nodac".
The usedac option lets us stop via_no_dac() from setting
forbid_dac to 1. That is, someone who uses VIA bridges can use
DAC with this option even if some VIA bridges seem to have
broken DAC support.
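A rough user-space model of the interaction described above. The tri-state encoding of the flag (0 = undecided, 1 = forbidden, -1 = "usedac" given) and all demo_ names are assumptions made for illustration, not a copy of the kernel code:

```c
#include <assert.h>

static int demo_forbid_dac;	/* 0: undecided, 1: forbidden, -1: usedac given */

/* Models parsing the "usedac" boot option. */
void demo_parse_usedac(void)
{
	demo_forbid_dac = -1;
}

/* Models the VIA bridge quirk: forbid DAC only if nothing was decided yet. */
void demo_via_no_dac(void)
{
	if (demo_forbid_dac == 0)
		demo_forbid_dac = 1;
}

/* DAC is usable unless it has been explicitly forbidden. */
int demo_dac_allowed(void)
{
	assert(demo_forbid_dac >= -1 && demo_forbid_dac <= 1);
	return demo_forbid_dac <= 0;
}
```

In this model neither "allow" nor "forbid" alone can express "do not let the quirk decide", which is the gap the commit message describes.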
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: WANG Cong <amwang@redhat.com>
Cc: gcosta@redhat.com
LKML-Reference: <20091214104423X.fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Hitoshi Mitake [Sun, 13 Dec 2009 08:01:59 +0000 (17:01 +0900)]
perf bench: Add "all" pseudo subsystem and "all" pseudo suite
This patch adds a new "all" pseudo subsystem and an "all" pseudo
suite. They run all suites of all subsystems, or
all suites of one subsystem.
(This patch also contains a few trivial comment fixes for
bench/* and output style fixes. I judged that there is no
need to split them into individual patches.)
Example of use:
| % ./perf bench sched all # Test all suites of sched subsystem
| # Running sched/messaging benchmark...
| # 20 sender and receiver processes per group
| # 10 groups == 400 processes run
|
| Total time: 0.414 [sec]
|
| # Running sched/pipe benchmark...
| # Extecuted 1000000 pipe operations between two tasks
|
| Total time: 10.999 [sec]
|
| 10.999317 usecs/op
| 90914 ops/sec
|
| % ./perf bench all # Test all suites of all subsystems
| # Running sched/messaging benchmark...
| # 20 sender and receiver processes per group
| # 10 groups == 400 processes run
|
| Total time: 0.420 [sec]
|
| # Running sched/pipe benchmark...
| # Extecuted 1000000 pipe operations between two tasks
|
| Total time: 11.741 [sec]
|
| 11.741346 usecs/op
| 85169 ops/sec
|
| # Running mem/memcpy benchmark...
| # Copying 1MB Bytes from 0x7ff33e920010 to 0x7ff3401ae010 ...
|
| 808.407437 MB/Sec
Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1260691319-4683-1-git-send-email-mitake@dcl.info.waseda.ac.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Michal Simek [Fri, 11 Dec 2009 11:54:04 +0000 (12:54 +0100)]
microblaze: Remove rt_sigsuspend wrapper
The generic rt_sigsuspend syscall doesn't need any asm wrapper.
Signed-off-by: Michal Simek <monstr@monstr.eu>
steve@digidescorp.com [Wed, 9 Dec 2009 23:13:42 +0000 (17:13 -0600)]
microblaze: nommu: Don't clobber R11 on syscalls
The noMMU syscall trap has a bug that causes R11 to be zero on return to
userland. Remove the extra "save" of R11 responsible for the bug.
Remove reloading of mode indicator because r11 already contains it.
Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Thu, 10 Dec 2009 11:06:03 +0000 (12:06 +0100)]
microblaze: Remove show_tmem function
The show_tmem function does nothing, which is why I removed it.
This also cleans up some ancient commented-out code.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Thu, 10 Dec 2009 10:43:57 +0000 (11:43 +0100)]
microblaze: Support for WB cache
Microblaze version 7.20.d is the first MB version that can run
Linux with an MMU. Please do not use previous versions because they contain
a HW bug.
Adding WB support made it necessary to redesign the whole cache code.
Microblaze versions from 7.20.a on don't need to disable IRQs and caches
before working with them, which is why there are special structures for them.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Tue, 8 Dec 2009 16:54:07 +0000 (17:54 +0100)]
microblaze: Add PVR for Microblaze v7.30.a
Microblaze v7.30.a will have 0x10 version string.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Tue, 8 Dec 2009 16:51:06 +0000 (17:51 +0100)]
microblaze: Remove ancient and fake microblaze version from cpu_ver table
We need to continue with the next Microblaze PVR version, which is why
I have to remove these ancient versions. Their version strings do not match
any versions. This style can be used from Microblaze v5.00.a on.
I believe that no one uses the ancient versions. If someone does, they will
just be labeled as an unknown version.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Tue, 8 Dec 2009 16:49:21 +0000 (17:49 +0100)]
microblaze: Remove panic_timeout init value
panic_timeout is in the BSS section and is cleared together with it.
This means that its value is already set to 0.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Mon, 7 Dec 2009 07:21:34 +0000 (08:21 +0100)]
microblaze: Do not count system calls in default
It is not necessary to count system calls by default, which is why
I added a DEBUG macro.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Mon, 30 Nov 2009 08:26:09 +0000 (09:26 +0100)]
microblaze: Enable DTC compilation
For the simpleImage format we need to compile DTC. It is still possible
to compile only the Linux kernel without a compiled-in DTB.
Signed-off-by: Michal Simek <monstr@monstr.eu>
John Williams [Tue, 24 Nov 2009 10:27:54 +0000 (20:27 +1000)]
microblaze: Core oprofile configs and hooks
Microblaze uses timer interrupt mode. Microblaze doesn't have
any performance counters, which is why we use just a simple implementation.
Signed-off-by: John Williams <john.williams@petalogix.com>
Signed-off-by: Michal Simek <monstr@monstr.eu>
steve@digidescorp.com [Tue, 17 Nov 2009 14:43:39 +0000 (08:43 -0600)]
microblaze: Fix level interrupt ACKing
Level interrupts need to be ack'd in the unmask handler, as in powerpc.
Among other issues, this bug causes the system clock to appear to run at
double-speed.
Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Mon, 19 Oct 2009 11:50:02 +0000 (13:50 +0200)]
microblaze: Enable futimesat syscall
Futimesat was disabled. LTP testing shows that MB has no
problem with this syscall.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Wed, 21 Oct 2009 10:29:46 +0000 (12:29 +0200)]
microblaze: Checking DTS against PVR for write-back cache
The WB cache has a special flag in the PVR. This adds a mechanism
that checks the PVR against the DTS.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Mon, 23 Nov 2009 09:15:00 +0000 (10:15 +0100)]
microblaze: Remove duplicity from pgalloc.h
just file cleanup
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Mon, 19 Oct 2009 09:58:44 +0000 (11:58 +0200)]
microblaze: Futex support
Microblaze v7.20 provides the new lwx and swx instructions, which make it
possible to implement locking routines.
There are some tests in the open POSIX threads part of LTP, but the current
toolchain does not support it.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Mon, 23 Nov 2009 09:07:51 +0000 (10:07 +0100)]
microblaze: Adding dev_arch_data functions
The functions, dev_arch_data_set_node and get_node are missing
and are needed by some device drivers such as I2C.
Signed-off-by: John Linn <john.linn@xilinx.com>
Signed-off-by: Michal Simek <monstr@monstr.eu>
John Linn [Fri, 5 Jun 2009 17:36:31 +0000 (11:36 -0600)]
microblaze: Fix the heartbeat gpio to be more robust
The device tree handling for the heartbeat gpio did not handle
systems without a gpio, and it did not work with a newer version
of the gpio core, which does not have the is-bidir property.
Signed-off-by: John Linn <john.linn@xilinx.com>
Signed-off-by: Michal Simek <monstr@monstr.eu>
John Williams [Fri, 14 Aug 2009 02:06:46 +0000 (12:06 +1000)]
microblaze: Simple __copy_tofrom_user for noMMU
This is the first patch that cleans up part of uaccess.h.
The rest of uaccess.h will be cleaned up later.
Signed-off-by: John Williams <john.williams@petalogix.com>
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Thu, 23 Jul 2009 06:23:53 +0000 (08:23 +0200)]
microblaze: Export memory_start for modules
memory_start symbol is needed by kernel modules.
Signed-off-by: Michal Simek <monstr@monstr.eu>
John Williams [Mon, 24 Aug 2009 03:52:33 +0000 (13:52 +1000)]
microblaze: Use lowest-common-denominator default CPU settings
This will ensure that kernels built with no custom CPU settings will still boot
OK on hardware that has additional CPU hardware instructions etc.
Signed-off-by: John Williams <john.williams@petalogix.com>
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Fri, 21 Aug 2009 11:47:09 +0000 (13:47 +0200)]
microblaze: Update default generic DTS
It is generated with longer compatible list
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Mon, 26 Oct 2009 08:56:48 +0000 (09:56 +0100)]
microblaze: Enable asm optimization only for HW with barrel-shifter
The asm code uses barrel-shifter instructions, which is why we have
to protect the cases where the HW doesn't have one.
Reported-by: John Linn <john.linn@xilinx.com>
Signed-off-by: Michal Simek <monstr@monstr.eu>
John Williams [Mon, 24 Aug 2009 03:52:32 +0000 (13:52 +1000)]
microblaze: Remove the buggy ALLOW_EDIT_AUTO config option
This was intended to allow manual override of CPU settings copied automatically
to Kconfig.auto, however it's problematic for several reasons, but mostly:
* If the defconfig doesn't have ALLOW_EDIT_AUTO=y, then it's impossible for
that defconfig to override the values in the kernel source tree. This leads
to very strange errors where the kernel is compiled with the wrong CPUFLAGS.
Next patch in the series will back out the default in Kconfig.auto to baseline
settings, so a kernel built with no default values will at least boot on any
hardware, just not make use of additional CPU features.
Signed-off-by: John Williams <john.williams@petalogix.com>
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Thu, 15 Oct 2009 13:18:13 +0000 (15:18 +0200)]
microblaze: Move cache macro from cache.h to cacheflush.h
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Wed, 14 Oct 2009 15:38:26 +0000 (17:38 +0200)]
microblaze: support U-BOOT image format
Two versions are generated:
linux.bin.ub, which is created from the linux.bin file,
and
simpleImage.<dts>.ub, which is created from the stripped simpleImage.<dts> file.
The load address and entry point are the first Microblaze instruction,
which is at CONFIG_KERNEL_BASE_ADDR.
For the simpleImage format it is possible to parse the _start symbol too.
simpleImage.<dts> is still a stripped ELF file.
I cleared out the simpleImage.<dts>.unstrip file because it is so big.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Thu, 15 Oct 2009 09:32:25 +0000 (11:32 +0200)]
microblaze: Ptrace notifying from signal code
After the signal frame is set up on the userspace stack, ptrace() should
be given an opportunity to single-step into the signal handler.
See FRV, Blackfin, mn10300 and UM; those patches are worth a look.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Wed, 14 Oct 2009 09:12:50 +0000 (11:12 +0200)]
microblaze: Extend cpuinfo for support write-back caches
Checking against the PVR is missing, but this is not important
for now. Some other checks are missing too.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Thu, 8 Oct 2009 11:06:42 +0000 (13:06 +0200)]
microblaze: Fix cache_line_lenght
We used cache_line as cache_line_length. For this reason
cache flushing took 4 times longer than necessary.
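A toy model of why the wrong step costs a factor of four. It assumes, purely for illustration, that the value used by mistake was a quarter of the real line length; the helper demo_flush_ops and the numbers in the note below are hypothetical:

```c
#include <assert.h>

/*
 * Count how many per-line cache operations a flush loop issues when it
 * walks [start, end) in increments of 'step' bytes.
 */
unsigned long demo_flush_ops(unsigned long start, unsigned long end,
			     unsigned long step)
{
	unsigned long addr, ops = 0;

	assert(step > 0);
	for (addr = start; addr < end; addr += step)
		ops++;
	return ops;
}
```

Walking a 4096-byte range with a 16-byte step issues 256 operations, while a 4-byte step issues 1024 for the same range.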
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Thu, 15 Oct 2009 11:34:31 +0000 (13:34 +0200)]
microblaze: Detect new 7.20.d version
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Thu, 29 Oct 2009 09:12:59 +0000 (10:12 +0100)]
microblaze: Support both levels for reset
Until this patch, reset was always performed by writing 1.
Now we can use negative logic and perform reset by writing 0.
The level written is the opposite of the one currently read from that pin.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Thu, 29 Oct 2009 07:58:15 +0000 (08:58 +0100)]
microblaze: Fix announce message for reset gpio
I had to change the message for gpio-reset because I kept
overlooking it. The RESET prefix is big and visible.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Fri, 13 Nov 2009 07:26:49 +0000 (08:26 +0100)]
microblaze: Remove saving and restoring before calling signal code
Saving is done in the SAVE_STATE macros, which is why another save discards
the previously saved value.
This change has no effect on normal programs because they end in an exception
and are killed. It does, however, have an effect on debugging.
Signed-off-by: Michal Simek <monstr@monstr.eu>
steve@digidescorp.com [Fri, 13 Nov 2009 22:08:29 +0000 (16:08 -0600)]
microblaze: Fix pfn_valid() for noMMU
Configuring DEBUG_SLAB causes a noMMU kernel to die during initialization
with an invalid virtual address panic in kfree_debugcheck().
The panic is due to an improper definition of pfn_valid().
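For illustration only, a sketch of the kind of range check a pfn_valid()-style test performs when RAM does not start at page frame 0. The structure and names are hypothetical and this is not the definition the patch introduces:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of the RAM range of a noMMU system. */
struct demo_mem_range {
	unsigned long min_pfn;	/* first page frame backed by RAM */
	unsigned long nr_pages;	/* number of page frames in RAM */
};

/* A pfn is valid only if it falls inside [min_pfn, min_pfn + nr_pages). */
int demo_pfn_valid(const struct demo_mem_range *mem, unsigned long pfn)
{
	assert(mem != NULL);
	return pfn >= mem->min_pfn && pfn - mem->min_pfn < mem->nr_pages;
}
```

A check that ignores the lower bound or the offset of RAM would accept or reject the wrong frames, which debug code such as kfree_debugcheck() then trips over.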
Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Mon, 16 Nov 2009 09:34:15 +0000 (10:34 +0100)]
microblaze: ftrace: Add dynamic function graph tracer
This patch adds support for the dynamic function graph tracer.
My one expectation is that I can do flush_icache after
all code modifications. On Microblaze this is safer than doing a
flush for every entry. For the icache the name "flush" is used, but
the correct term is invalidation; this will be fixed in the upcoming
new cache implementation and WB support.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Mon, 16 Nov 2009 09:32:10 +0000 (10:32 +0100)]
microblaze: ftrace: add function graph support
For more information look at Documentation/trace folder.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Thu, 10 Dec 2009 13:15:44 +0000 (14:15 +0100)]
microblaze: ftrace: Add dynamic trace support
With the dynamic function tracer, _mcount is by default defined as an
"empty" function that returns directly without any further action. When
tracing is enabled from user space, it jumps to a real tracing
function (ftrace_caller), which does the real job for us.
Unlike the static function tracer, the dynamic function tracer provides
two functions, ftrace_make_call() and ftrace_make_nop(), to enable and disable
the tracing of selected kernel functions (set_ftrace_filter).
In the kernel version, there is only one "_mcount" string for every
kernel function, so, we just need to match this one in mcount_regex of
scripts/recordmcount.pl.
For more information please look at code and Documentation/trace folder.
Steven ACKed the scripts/recordmcount.pl part.
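The enable/disable idea behind ftrace_make_call() and ftrace_make_nop() can be modelled in user space roughly as follows. This is a toy sketch: the instruction encodings are placeholders rather than real MicroBlaze opcodes, and the demo_ functions do not share signatures with the kernel ones:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_INSN_NOP	0xAAAA0000u	/* placeholder encoding, not a real opcode */
#define DEMO_INSN_CALL	0xBBBB0000u	/* placeholder encoding, not a real opcode */

/* One patchable call site, standing in for the _mcount call in a function. */
struct demo_call_site {
	uint32_t insn;
};

/* Enable tracing of a site: replace the nop with a call. */
int demo_make_call(struct demo_call_site *site)
{
	assert(site != NULL);
	if (site->insn != DEMO_INSN_NOP)
		return -1;	/* unexpected contents, refuse to patch */
	site->insn = DEMO_INSN_CALL;
	return 0;
}

/* Disable tracing of a site: replace the call with a nop. */
int demo_make_nop(struct demo_call_site *site)
{
	assert(site != NULL);
	if (site->insn != DEMO_INSN_CALL)
		return -1;	/* unexpected contents, refuse to patch */
	site->insn = DEMO_INSN_NOP;
	return 0;
}
```

Checking the old contents before patching is a design choice of this sketch; it keeps a site from being patched twice in the same direction.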
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Mon, 16 Nov 2009 08:55:08 +0000 (09:55 +0100)]
microblaze: ftrace: enable HAVE_FUNCTION_TRACE_MCOUNT_TEST
Implement MCOUNT_TEST in asm code; it is faster than using
the generic code.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Mon, 16 Nov 2009 08:40:14 +0000 (09:40 +0100)]
microblaze: ftrace: add static function tracer
If gcc's -pg is enabled with CONFIG_FUNCTION_TRACER=y, a call to
_mcount is inserted into each kernel function, so it is
possible to trace the kernel functions in _mcount.
This patch adds the arch-specific _mcount support for static function
tracing. By default, ftrace_trace_function is initialized to
ftrace_stub (an empty function), so the default _mcount introduces
very little overhead. After ftrace is enabled from user space, it jumps
to a real tracing function and does static function tracing for us.
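The stub-by-default dispatch described above can be sketched in plain C as below. This is a user-space analogy with made-up demo_ names; the real _mcount is written in assembly and is not shown here:

```c
#include <assert.h>
#include <stddef.h>

typedef void (*demo_trace_fn)(unsigned long ip, unsigned long parent_ip);

static unsigned long demo_trace_hits;

/* Default hook: an empty function, so an untraced call costs very little. */
static void demo_ftrace_stub(unsigned long ip, unsigned long parent_ip)
{
	(void)ip;
	(void)parent_ip;
}

/* A "real" tracer for the demo: it only counts invocations. */
void demo_counting_tracer(unsigned long ip, unsigned long parent_ip)
{
	(void)ip;
	(void)parent_ip;
	demo_trace_hits++;
}

static demo_trace_fn demo_trace_function = demo_ftrace_stub;

/* Install a tracer; passing NULL falls back to the stub. */
void demo_set_tracer(demo_trace_fn fn)
{
	demo_trace_function = fn ? fn : demo_ftrace_stub;
}

/* Stand-in for _mcount: called on entry to every instrumented function. */
void demo_mcount(unsigned long ip, unsigned long parent_ip)
{
	assert(demo_trace_function != NULL);
	demo_trace_function(ip, parent_ip);
}

unsigned long demo_hits(void)
{
	return demo_trace_hits;
}
```

Calling demo_mcount() before a tracer is installed only reaches the stub; after demo_set_tracer(demo_counting_tracer), every call is counted.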
Commit message from Wu Zhangjin <wuzhangjin@gmail.com>
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Fri, 30 Oct 2009 11:26:53 +0000 (12:26 +0100)]
microblaze: Add TRACE_IRQFLAGS_SUPPORT
There are just two major changes
Renamed local_irq functions to raw_local_irq in irq.c.
Added TRACE_IRQFLAGS_SUPPORT to Kconfig.debug.
Look at Documentation/irqflags-tracing.txt
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Mon, 16 Nov 2009 08:09:47 +0000 (09:09 +0100)]
microblaze: preliminary enabling for LATENCYTOP support in Kconfig
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Thu, 10 Dec 2009 11:07:02 +0000 (12:07 +0100)]
microblaze: Lockdep support
Microblaze needs to do lock_init very early because MMU init calls lock functions.
Here is the explanation from Peter Zijlstra why we have to enable
__ARCH_WANTS_INTERRUPTS_ON_CTSW.
"So we schedule while holding rq->lock (for obvious reasons), but since
lockdep tracks held locks per tasks, we need to transfer the held state
from the prev to the next task. We do this by explicitly calling
spin_release(&rq->lock) in context_switch() right before switch_to(),
and calling spin_acquire(&rq->lock) in
finish_task_switch()->finish_lock_switch().
Now, for some reason lockdep thinks that interrupts got enabled over the
context switch (git grep __ARCH_WANTS_INTERRUPTS_ON_CTSW arch/microblaze
doesn't seem to turn up anything).
Clearly trying to acquire the rq->lock with interrupts enabled is a bad
idea and lockdep warns you about this."
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Fri, 6 Nov 2009 11:31:00 +0000 (12:31 +0100)]
microblaze: Register timecounter/cyclecounter
It is the same counter as we use as free running one.
I would like to use it for ftrace.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Tue, 10 Nov 2009 14:57:01 +0000 (15:57 +0100)]
microblaze: Stack trace support
This is a working implementation, but the problem is that
Microblaze lacks a frame pointer, which is why there is a
big loop that traces and shows all addresses that are in text.
It also shows addresses that are in registers, etc.
This is a problem, and it is the reason why all Microblaze
traces are wrong. There is an option to do hacks and trace
the kernel code, but this is too complicated.
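A minimal sketch of the "big loop" approach: without a frame pointer, every stack word that happens to lie in the text range is reported. The function name and parameters are hypothetical and the code is a user-space model, not the kernel implementation:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Scan 'nwords' stack words and collect every value that lies inside
 * [text_start, text_end). Returns the number of candidates stored in 'out'.
 */
size_t demo_scan_stack(const unsigned long *stack, size_t nwords,
		       unsigned long text_start, unsigned long text_end,
		       unsigned long *out, size_t max_out)
{
	size_t i, found = 0;

	assert(stack != NULL && out != NULL);
	for (i = 0; i < nwords && found < max_out; i++) {
		if (stack[i] >= text_start && stack[i] < text_end)
			out[found++] = stack[i];
	}
	return found;
}
```

Any saved register or stale value that looks like a text address is reported as well, which is why such traces contain false entries.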
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Fri, 6 Nov 2009 11:27:25 +0000 (12:27 +0100)]
microblaze: Add IRQENTRY_TEXT to lds
It is important for ftrace irqsoff support
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Fri, 30 Oct 2009 13:41:52 +0000 (14:41 +0100)]
microblaze: __init_begin symbol must be aligned
The problem was that free_initmem passed a badly aligned
__init_begin symbol to free_initrd_mem, and free_initrd_mem doesn't care
about __init_end but takes PAGE_SIZE instead.
Here is the behavior in the kernel bootlog.
ramdisk_execute_command (from init/main.c) was overwritten
Freeing unused kernel memory: 6224k freed
Failed to execute ��������������{���
Failed to execute ��������������{����. Attempting defaults...
Mounting proc:
Mounting var:
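The alignment requirement can be illustrated with a little arithmetic. This is a sketch assuming a 4096-byte page; DEMO_PAGE_SIZE and both helpers are hypothetical, and the actual fix belongs in the linker script rather than in C:

```c
#include <assert.h>

#define DEMO_PAGE_SIZE 4096UL	/* assumed page size for the example */

/* Round an address up to the next page boundary. */
unsigned long demo_page_align(unsigned long addr)
{
	return (addr + DEMO_PAGE_SIZE - 1) & ~(DEMO_PAGE_SIZE - 1);
}

/* Number of whole pages in [begin, end); 'begin' must be page aligned. */
unsigned long demo_init_pages(unsigned long begin, unsigned long end)
{
	assert((begin & (DEMO_PAGE_SIZE - 1)) == 0);
	assert(end >= begin);
	return (end - begin) / DEMO_PAGE_SIZE;
}
```

Freeing memory page by page from an unaligned start would release a page that still holds live data in front of the init section, which is consistent with the corrupted string in the bootlog above.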
Signed-off-by: Michal Simek <monstr@monstr.eu>